I made this Python script for web scraping. It can map a website into a nested dictionary, for example:
{url:[url_type,{url1_in_url:[url1_in_url_type,{...}],url2_inurl:[...],...}]}
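To make that shape concrete, here is a small sketch of the nested mapping and one way to walk it. The URLs and the type labels ('html', 'image') are invented for illustration, and `flatten` is a hypothetical helper, not part of the script.

```python
# Hypothetical sample of the nested mapping; URLs and type labels are invented.
site_map = {
    'https://example.com/': ['html', {
        'https://example.com/about': ['html', {}],
        'https://example.com/logo.png': ['image', {}],
    }]
}


def flatten(mapping, depth=0):
    """Yield (depth, url, url_type) for every entry in the nested mapping."""
    for url, (url_type, children) in mapping.items():
        yield depth, url, url_type
        # Recurse into the URLs found inside this URL, one level deeper.
        yield from flatten(children, depth + 1)


for depth, url, url_type in flatten(site_map):
    print('  ' * depth + f'{url} ({url_type})')
```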
If you also want to download all the files on that website, you only need to set the variable 'descargar' to True:
url = 'https://en.wikipedia.org/wiki/Roy_Clark'
descargar = True         # "descargar" = download
profundidad = 2          # "profundidad" = depth
archivo = 'clark.json'   # "archivo" = file
s = Scraper()
s.lineal(url, profundidad, descargar, archivo)
I want to implement some threading algorithm to speed up the analysis.
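One possible direction, as a rough sketch rather than a drop-in change: crawl level by level and fetch all pages of a level in parallel with `concurrent.futures.ThreadPoolExecutor`. Here `crawl` and `get_links` are hypothetical names, `get_links` stands in for whatever function fetches a page and returns its links, and I'm assuming the depth means "number of levels to expand". For brevity it returns a flat `{url: [links]}` dictionary instead of the nested one.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl(start_url, depth, get_links, max_workers=8):
    """Breadth-first crawl; the pages of each level are fetched in parallel.

    get_links is a hypothetical callable: url -> iterable of linked URLs.
    Returns a flat dict {url: [linked urls]} for every page visited.
    """
    seen = {start_url}
    frontier = [start_url]
    result = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for _ in range(depth):
            if not frontier:
                break
            # pool.map keeps input order, so zip pairs each URL with its links.
            pages = list(pool.map(get_links, frontier))
            next_frontier = []
            for url, links in zip(frontier, pages):
                links = list(links)
                result[url] = links
                # Only the main thread touches `seen`, so no lock is needed.
                for link in links:
                    if link not in seen:
                        seen.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
    return result


# Stand-in for real network access, so the sketch runs offline.
fake_site = {'a': ['b', 'c'], 'b': ['c', 'd'], 'c': [], 'd': ['a']}
print(crawl('a', 2, lambda u: fake_site.get(u, [])))
```

Threads suit this kind of work because most of the time is spent waiting on the network. Keeping the bookkeeping (`seen`, `result`) in the main thread avoids having to lock shared state.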
Note:
It sometimes hits an error when downloading files.