Forum Archive

BeautifulSoup Bug

n8henrie142

I have a script that is running perfectly on my Mac, but giving me an error in Pythonista.

BeautifulSoup is throwing an AttributeError: 'NoneType' object has no attribute 'next_element' on finding all data points in an HTML table: soup.find('table').find_all('td').

I can verify that soup appears correct and has the td that I'm looking for. I can print soup.find('table') in the console and it is correct. I can break it down to table = soup.find('table'); table.find_all('td'); and it still doesn't work. I've tried changing to the old .findAll instead of .find_all and that doesn't work either.

In fact, even soup.find('table').find('td') works correctly; the error only appears when I change .find('td') to .find_all('td').

find_all seems to work in some contexts, e.g. `bs4.BeautifulSoup(requests.get('http://omz-software.com').content).find('p').find_all('a')` works fine.

I can verify the identical code (synced by Dropbox) works fine on Python 2.7.8 in OS X.

Has anyone run into this?

JonB

Have you tried saving the soup and loading it on OS X? The user agent might be different, so you might be comparing different soups.

Also, it is possible that bs4 is an older version on Pythonista.

n8henrie142

Thanks for the response.

Same version on both.

$ python -c 'import bs4; print(bs4.__version__)'
4.3.2

I think the thing that seals it as a bug is that soup.find('table').find('td') works, but soup.find('table').find_all('td') throws an error, on the same soup object.

briarfox

I'd make sure you are getting the same web page on both devices when you build your soup. Maybe you are pulling a mobile version on your iPad. I think this is what JonB means by a different user-agent.
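
One way to rule the user agent in or out would be to send the same User-Agent header from both machines. Below is a rough sketch using only the standard library (Python 3's urllib.request; the scripts in this thread use requests, which also accepts a headers argument). The UA string and the helper names build_request and fetch are made up for illustration, and nothing is downloaded unless fetch is called.

```python
import urllib.request

# Hypothetical desktop-style User-Agent string, chosen only for illustration.
DESKTOP_UA = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5)'


def build_request(url, user_agent=DESKTOP_UA):
    """Build a request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def fetch(url, user_agent=DESKTOP_UA):
    """Download the page as bytes using the explicit User-Agent."""
    with urllib.request.urlopen(build_request(url, user_agent)) as resp:
        return resp.read()


# Only builds the request; call fetch(...) to actually hit the network.
req = build_request('http://omz-software.com')
# urllib stores header names capitalized, hence 'User-agent' here.
print(req.get_header('User-agent'))
```

If the page comes back identical on both devices with the same header, the user agent is probably not the culprit.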

n8henrie142

I understand what he means, and I'll check, but I don't think that would explain in any way why .find('td') would have a result but .find_all('td') would cause an error. It wouldn't even make sense if it came up empty (it should at least find the result that .find() found), but it should definitely not cause an error.
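
For anyone puzzled by the same asymmetry, here is a toy model of how it could arise. It is an assumption about the cause, not bs4's actual code: the error message mentions next_element, which suggests the search walks a chain of next_element links. A search that returns at the first match never reaches a broken link further along, while a search that collects every match has to walk the whole chain. The names Node, link, walk, find_first and find_every are all invented for this sketch.

```python
class Node(object):
    """Toy stand-in for a parsed element; not a bs4 class."""

    def __init__(self, name):
        self.name = name
        self.next_element = None


def link(names):
    """Create nodes and chain them together through next_element."""
    nodes = [Node(n) for n in names]
    for a, b in zip(nodes, nodes[1:]):
        a.next_element = b
    return nodes


def walk(first, stop_node):
    """Yield nodes in document order until stop_node is reached."""
    current = first
    while current is not stop_node:
        yield current
        current = current.next_element  # fails if current is None


def find_first(first, stop_node, name):
    """Return the first matching node; stops walking as soon as it is found."""
    for node in walk(first, stop_node):
        if isinstance(node, Node) and node.name == name:
            return node
    return None


def find_every(first, stop_node, name):
    """Return all matching nodes; has to walk the whole chain."""
    return [node for node in walk(first, stop_node)
            if isinstance(node, Node) and node.name == name]


nodes = link(['tr', 'td', 'td', 'p'])
stop = nodes[-1]               # first node after the "table"
nodes[2].next_element = None   # simulate a damaged link late in the chain

print(find_first(nodes[0], stop, 'td').name)   # prints: td
try:
    find_every(nodes[0], stop, 'td')
except AttributeError as exc:
    print(exc)   # 'NoneType' object has no attribute 'next_element'
```

If something like this is happening, the interesting question becomes why the link chain ends up damaged on one platform and not the other.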

n8henrie142

As suspected, I wrote the HTML from Pythonista to a pickle file, loaded it on OS X, converted it to soup, and had no problem using find_all('td') on OS X.

I also used difflib to inspect the differences between the HTML content of the Pythonista file and that downloaded on OS X, and as far as I can tell the only differences are timestamps (as the content was downloaded minutes apart).
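
Roughly, the pickle-and-compare check can be sketched like this. The two sample pages, the file name and the helper names save_html, load_html and changed_lines are made up for illustration; the real page is not shown here.

```python
import difflib
import os
import pickle
import tempfile


def save_html(html, path):
    """Pickle the raw HTML string so it can be copied to another machine."""
    with open(path, 'wb') as fh:
        pickle.dump(html, fh, protocol=2)


def load_html(path):
    """Load an HTML string previously written by save_html."""
    with open(path, 'rb') as fh:
        return pickle.load(fh)


def changed_lines(html_a, html_b):
    """Return only the lines that differ between the two documents."""
    diff = difflib.ndiff(html_a.splitlines(), html_b.splitlines())
    return [line for line in diff if line.startswith(('- ', '+ '))]


# Made-up sample pages that differ only in a timestamp.
page_ios = '<table>\n<tr><td>42</td></tr>\n</table>\n<p>Updated 13:12:43</p>'
page_osx = '<table>\n<tr><td>42</td></tr>\n</table>\n<p>Updated 13:15:02</p>'

path = os.path.join(tempfile.mkdtemp(), 'page.pickle')
save_html(page_ios, path)
for line in changed_lines(load_html(path), page_osx):
    print(line)
```

Lines prefixed with "- " come from the first document and "+ " from the second, so a diff that only lists timestamp lines supports the conclusion that the markup itself is the same.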

briarfox

That's really odd. I've been using bs4 for a while with no issues. Can you set up a gist of the page so I can try it?

n8henrie142

Unfortunately, I would have done that already except that it's a password-protected site I use for work. I haven't been able to replicate yet on a couple of other sites, but I'll try to find a public site that has the same bug.

n8henrie142

Getting all td from the table at w3schools.com/html/html_tables.asp works fine.

Here's my traceback.

2014-10-27 13:12:43 /var/mobile/Containers/Data/Application/3664C317-2455-4F95-AFC5-EAF05BC6B8BF/Documents/scratchpad.py :: __main__ ERROR    There was an error.
2014-10-27 13:12:44 /var/mobile/Containers/Data/Application/3664C317-2455-4F95-AFC5-EAF05BC6B8BF/Documents/scratchpad.py :: __main__ ERROR    'NoneType' object has no attribute 'next_element'
Traceback (most recent call last):
  File "/var/mobile/Containers/Data/Application/3664C317-2455-4F95-AFC5-EAF05BC6B8BF/Documents/scratchpad.py", line 28, in <module>
    print(len(table.find('td').find_all('td')))
  File "/private/var/mobile/Containers/Bundle/Application/B8E731EE-F2DC-466D-BEEA-D2EF5E76AEAC/Pythonista.app/pylib/site-packages/bs4/element.py", line 1180, in find_all
    return self._find_all(name, attrs, text, limit, generator, **kwargs)
  File "/private/var/mobile/Containers/Bundle/Application/B8E731EE-F2DC-466D-BEEA-D2EF5E76AEAC/Pythonista.app/pylib/site-packages/bs4/element.py", line 497, in _find_all
    return ResultSet(strainer, result)
  File "/private/var/mobile/Containers/Bundle/Application/B8E731EE-F2DC-466D-BEEA-D2EF5E76AEAC/Pythonista.app/pylib/site-packages/bs4/element.py", line 1610, in __init__
    super(ResultSet, self).__init__(result)
  File "/private/var/mobile/Containers/Bundle/Application/B8E731EE-F2DC-466D-BEEA-D2EF5E76AEAC/Pythonista.app/pylib/site-packages/bs4/element.py", line 494, in <genexpr>
    result = (element for element in generator
  File "/private/var/mobile/Containers/Bundle/Application/B8E731EE-F2DC-466D-BEEA-D2EF5E76AEAC/Pythonista.app/pylib/site-packages/bs4/element.py", line 1198, in descendants
    current = current.next_element
AttributeError: 'NoneType' object has no attribute 'next_element'

JonB

Is it possible for you to "sanitize" the HTML so it no longer contains any work info? I.e., just strip out the text and replace it with random text?
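
Something along these lines might do it. This is only a sketch, assuming Python 3's html.parser (the module is named differently on Python 2.7); TextScrambler and scramble_html are invented names. It re-emits every tag as written and swaps letters and digits in the text for random lowercase letters, so the tag structure that triggers the bug should survive. Attribute values (href, title and so on) are left untouched, so check those by hand before posting.

```python
import random
import string
from html.parser import HTMLParser


class TextScrambler(HTMLParser):
    """Re-emit markup unchanged while replacing letters and digits in text."""

    def __init__(self, seed=0):
        # convert_charrefs=False keeps entities such as &amp; as written.
        HTMLParser.__init__(self, convert_charrefs=False)
        self.rng = random.Random(seed)
        self.out = []

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append('</%s>' % tag)

    def handle_data(self, data):
        # Keep whitespace and punctuation so the layout stays recognizable.
        self.out.append(''.join(
            self.rng.choice(string.ascii_lowercase) if ch.isalnum() else ch
            for ch in data))

    def handle_entityref(self, name):
        self.out.append('&%s;' % name)

    def handle_charref(self, name):
        self.out.append('&#%s;' % name)

    def handle_comment(self, data):
        self.out.append('<!---->')  # comment text is dropped entirely

    def handle_decl(self, decl):
        self.out.append('<!%s>' % decl)


def scramble_html(html, seed=0):
    """Return html with its text content replaced by random letters."""
    parser = TextScrambler(seed)
    parser.feed(html)
    parser.close()
    return ''.join(parser.out)


sample = '<table><tr><td class="x">Secret 42</td><td>A&amp;B</td></tr></table>'
print(scramble_html(sample))
```

If find_all('td') still fails on the scrambled page in Pythonista, that page should be safe to share as a gist.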

Have you tried pickling the soup itself (mmmm, pickle soup), going either from OS X to Pythonista or vice versa?