Forum Archive

Drizzel

Jan 16, 2019 - 17:51

I’m trying to get the content of this website; it simply shows the lessons (I’m still in school) that aren’t going to take place.
When I first manually log in here, and then manually open the previously mentioned website, I get some usable source code.

But, if I then close Safari, reopen it, and repeat these steps without logging in, there is no source code whatsoever.

I didn’t manage to log in with requests first and then scrape the content of the other website, but I’m confident it’s possible. How could I do that?
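
A minimal sketch of the log-in-then-fetch flow asked about here, assuming the site uses an ordinary HTML form login with cookie-based sessions. The URLs and the form field names (`username`, `password`) are hypothetical placeholders, not taken from the actual site; the real names would have to be read from the login form's HTML. `requests.Session` keeps the cookies set by the login response and sends them along with later requests.

```python
import requests

# Hypothetical URLs; replace with the real login and content pages.
LOGIN_URL = "https://example.com/login"
CONTENT_URL = "https://example.com/substitutions"


def build_login_payload(username, password):
    """Build the form fields for the login request (field names are assumptions)."""
    return {"username": username, "password": password}


def fetch_after_login(session, username, password):
    """Log in with the given session, then fetch the protected page's HTML."""
    # The session stores any cookies the server sets during login.
    login = session.post(
        LOGIN_URL, data=build_login_payload(username, password), timeout=10
    )
    login.raise_for_status()

    # The same session sends those cookies with this request.
    page = session.get(CONTENT_URL, timeout=10)
    page.raise_for_status()
    return page.text
```

Usage might look like `with requests.Session() as session: html = fetch_after_login(session, user, password)`. If the login form carries a hidden token (for example a CSRF field), it would have to be fetched and added to the payload first.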

eddo888

Jan 16, 2019 - 22:26

Two excellent modules to use are
* requests, to retrieve HTML content
* Beautiful Soup (bs4), to parse HTML content

You can install these from StaSh with "pip install requests bs4"
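
A rough sketch of how the two modules fit together: requests downloads the HTML and Beautiful Soup pulls text out of it. It assumes, purely for illustration, that the wanted data sits in an HTML table; `extract_rows` and `fetch_rows` are made-up helper names, and the real page may be structured differently.

```python
import requests
from bs4 import BeautifulSoup


def extract_rows(html):
    """Return each table row in the HTML as a list of cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        # Collect header and data cells, stripped of surrounding whitespace.
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
        if cells:
            rows.append(cells)
    return rows


def fetch_rows(url):
    """Download a page and return its table rows (the URL is supplied by the caller)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return extract_rows(response.text)
```

Keeping the parsing in `extract_rows` separate from the download makes it easy to try the parser on HTML saved from the browser before dealing with the login.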

JonB

Jan 17, 2019 - 01:30

bs4 and requests come preinstalled, so there is no reason to update them; updating usually only causes issues.

mikael

Jan 17, 2019 - 05:17

@Drizzel, some sites use so much JS that they are hard to scrape with just requests and bs4. If this is the case here, you can use WebView to act as a browser and run the JS. I have a small helper class for this, discussed here.

In all cases, web scraping seems to be a lot of detective work, trial and error.
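
As one piece of that detective work, a rough heuristic for spotting a JS-heavy page is to compare how many script tags the static HTML contains with how much visible text is left once they are removed. This is only a sketch; `summarize_static_html` is a hypothetical helper and the numbers it returns need human judgment, since there is no fixed threshold.

```python
from bs4 import BeautifulSoup


def summarize_static_html(html):
    """Count script tags and visible text characters in static HTML."""
    soup = BeautifulSoup(html, "html.parser")
    script_count = len(soup.find_all("script"))

    # Drop elements whose contents are not shown as page text.
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()

    visible_text = soup.get_text(" ", strip=True)
    return {"scripts": script_count, "visible_chars": len(visible_text)}
```

Many scripts combined with almost no visible text suggests the content is filled in by JavaScript after loading, in which case requests and bs4 alone will probably not see it.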

Get webpage content