Forum Archive

Web scraping script not working

cubflier

The following script returns the current weather model from the URL in the script. It works fine on my Mac, but I'm getting socket errors in Pythonista 3.

from bs4 import BeautifulSoup as bs
import urllib
import re
import urllib2 as ul
import html5lib


base_url = 'https://weather.cod.edu/forecast/menus/gfs_menu.php?type=2018110518-AK-700-spd-0-0'
all_str = ''
def extract_names(filename, td):
        text = re.findall(td,str(filename))
        data_list = []
        for line in text:
            data_list.append(line)
        return data_list

source = ul.urlopen(base_url).read()
tree = bs(source, 'html5lib')
filename = tree.find_all('td')

i = '\*'
td = re.compile(r'(\d+)Z' + i)
line_str = extract_names(filename, td)
line_str = str(line_str).strip('[]')
line_str = str(line_str).strip("'")
#all_str = all_str + line_str + '\n\n'

print line_str

I have used Python 2.7 and also tried Python 3.6, changing line 17 to:

source = urllib.request.urlopen(base_url).read()
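One thing worth checking when moving this script between Python 2 and 3: in Python 3, `import urllib` on its own does not necessarily make `urllib.request` available, so the submodule has to be imported explicitly. A minimal sketch of an import that works on both versions follows; the `fetch` helper is a hypothetical name used only for illustration.

```python
try:
    # Python 3: urlopen lives in the urllib.request submodule,
    # which must be imported explicitly.
    from urllib.request import urlopen
except ImportError:
    # Python 2: the same function is provided by urllib2.
    from urllib2 import urlopen


def fetch(url):
    """Return the raw bytes of the page at url (hypothetical helper)."""
    return urlopen(url).read()
```

With this in place, `source = fetch(base_url)` would replace the version-specific line. It does not address certificate or socket errors by itself.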

I'm just starting out with Pythonista and trying to adapt some scripts that scrape weather sites for information, which will then be used to build custom URLs to access those sites.

Any help is appreciated,

Thanks - Jerry

mikael

@cubflier, intermittent or consistent errors?

In any case, I would suggest using the requests module, which comes standard with Pythonista and usually works with no fuss.

JonB

To use requests, you would use:

source = requests.get(base_url).content

JonB

Also, a stupid question, but is your network available? If you are on cellular, you would need to ensure data is on and that Pythonista is authorized to use it.

cubflier

Requests worked.

Changed code to:

from bs4 import BeautifulSoup as bs
import requests
import re



base_url = 'https://weather.cod.edu/forecast/menus/gfs_menu.php?type=2018110518-AK-700-spd-0-0'
all_str = ''
def extract_names(filename, td):
        text = re.findall(td,str(filename))
        data_list = []
        for line in text:
            data_list.append(line)
        return data_list

source = bs(requests.get(base_url).text)
filename = source.find_all('td')

i = r'\*'
td = re.compile(r'(\d+)Z' + i)
line_str = extract_names(filename, td)
line_str = str(line_str).strip('[]')
line_str = str(line_str).strip("'")

print(line_str)
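For readers following along, the pattern built above is `(\d+)Z\*`: one or more digits followed by a literal `Z*`. Here is a minimal sketch of the same extraction on made-up sample text. The sample HTML and the `current_run` helper are assumptions for illustration, not the actual page content, and it assumes the page marks the current run with a trailing asterisk, as the pattern implies.

```python
import re

# Digits followed by a literal "Z*", e.g. "12Z*"
MODEL_RUN = re.compile(r'(\d+)Z\*')


def current_run(text):
    """Return the digits of the first asterisk-marked run, or None."""
    match = MODEL_RUN.search(text)
    return match.group(1) if match else None


# Hypothetical table cells; only the last one carries the asterisk.
sample = '<td>00Z</td><td>06Z</td><td>12Z*</td>'
print(current_run(sample))
```

Taking `group(1)` directly avoids the `str(...).strip(...)` steps needed to unwrap the list returned by `re.findall`.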

The network was on and available. I'm still not sure why the old code, which has worked on every other platform (Linux and Mac), failed, but then again my Python skills are minimal.

It now returns the two-digit number for the current weather model, which I need in order to go forward.

I sure do appreciate the help.

Thanks - Jerry

JonB

Were you getting SSL: CERTIFICATE_VERIFY_FAILED socket errors?

I recall some issues with the version of OpenSSL used by Pythonista: it either does not support all of the latest protocols, uses an old set of root CAs, or is otherwise unable to validate certificates.
I think requests avoids these issues, probably because it validates against its own bundled set of CA certificates rather than the system defaults.
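If someone wants to stay with urllib on Python 3, one option is to pass an explicit SSL context to `urlopen`. This is only a sketch under assumptions: `build_context` and `fetch` are hypothetical helper names, and whether this clears the error in Pythonista depends on which CA certificates are available there.

```python
import ssl
import urllib.request


def build_context(cafile=None):
    """Create a context that verifies certificates and hostnames.

    cafile is an optional path to a CA bundle; when it is None,
    the platform's default certificates are loaded.
    """
    return ssl.create_default_context(cafile=cafile)


def fetch(url, cafile=None, timeout=10):
    """Fetch url over HTTPS using a verifying SSL context."""
    context = build_context(cafile)
    with urllib.request.urlopen(url, context=context, timeout=timeout) as resp:
        return resp.read()
```

If the certifi package is installed (an assumption about your environment), calling `fetch(base_url, cafile=certifi.where())` would validate against its CA bundle instead of the system defaults.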

cubflier

Yes - that was the error that I was getting with the original code. The error was consistent and the code did not execute.

Jerry