Forum Archive

.pdf links from a website

Jozh

Is it possible to build a workflow that pulls all the links to PDF files off a webpage in the built-in browser?

peterh86

Yes, but I can't help in detail. It probably requires a workflow with just a Python script.

Given the webpage address, you'd use Requests to get the webpage HTML, then search for links ending in .pdf and return them in a list. I imagine you could use Requests to download the PDFs as well.
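A rough sketch of that approach, assuming Python 3 and that the Requests library is available. The function names (find_pdf_links, fetch_pdf_links, download_pdf) and the regular expression are made up for illustration and are not from any existing workflow. The regex only catches quoted href values that end exactly in .pdf, so treat it as a starting point rather than a robust HTML parser.

```python
import re
from urllib.parse import urljoin

# Matches href="..." or href='...' whose value ends in .pdf (case-insensitive).
PDF_HREF = re.compile(r'href\s*=\s*["\']([^"\']+?\.pdf)["\']', re.IGNORECASE)


def find_pdf_links(html, base_url):
    """Return absolute URLs of links in `html` that end in .pdf, without duplicates."""
    links = []
    for href in PDF_HREF.findall(html):
        absolute = urljoin(base_url, href)  # resolve relative links against the page URL
        if absolute not in links:
            links.append(absolute)
    return links


def fetch_pdf_links(page_url):
    """Download the page with Requests and return the PDF links found in it."""
    import requests  # third-party; imported here so find_pdf_links works without it

    response = requests.get(page_url, timeout=30)
    response.raise_for_status()
    return find_pdf_links(response.text, page_url)


def download_pdf(pdf_url, filename):
    """Save one PDF to disk using Requests."""
    import requests

    response = requests.get(pdf_url, timeout=60)
    response.raise_for_status()
    with open(filename, 'wb') as f:
        f.write(response.content)
```

Calling fetch_pdf_links with the page address should give you the list, and you can then loop over it with download_pdf if you want the files themselves.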

Gerzer

You might be able to pull the HTML directly from the built-in browser, but I’m not 100% sure.

ccc

See the two links below. The basic idea is to use requests to get the webpage HTML and use BeautifulSoup to parse that HTML to find the links that end in ".pdf".

http://omz-forums.appspot.com/pythonista/post/5903606662299648

http://omz-forums.appspot.com/pythonista/post/5253563362050048
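
For reference, the requests + BeautifulSoup idea might look roughly like the sketch below. It assumes Python 3 with the requests and beautifulsoup4 packages installed; the function names are hypothetical and this is not the code from the linked posts. Checking only the path part of each URL is a design choice made here so that links like "report.pdf?download=1" still count.

```python
from urllib.parse import urljoin, urlparse


def pdf_links_from_html(html, base_url):
    """Parse `html` with BeautifulSoup and return absolute URLs whose path ends in .pdf."""
    from bs4 import BeautifulSoup  # third-party package: beautifulsoup4

    soup = BeautifulSoup(html, 'html.parser')
    links = []
    for anchor in soup.find_all('a', href=True):
        absolute = urljoin(base_url, anchor['href'])
        # Look at the path only, ignoring any query string or fragment.
        if urlparse(absolute).path.lower().endswith('.pdf'):
            links.append(absolute)
    return links


def pdf_links_from_url(page_url):
    """Fetch the page with requests, then extract its PDF links."""
    import requests  # third-party

    response = requests.get(page_url, timeout=30)
    response.raise_for_status()
    return pdf_links_from_html(response.text, page_url)
```

Compared with searching the raw text, letting BeautifulSoup find the anchor tags should cope better with unusual quoting or attribute order in the page source.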