Forum Archive

BeautifulSoup XML

daveg

For some reason, bs4 raises a "FeatureNotFound" error when I try to parse XML documents.

This happens with both BeautifulSoup(markdown,'xml') and BeautifulStoneSoup(markdown).

Is this just because lxml is not installed? Is there any way to get BeautifulSoup to use a different XML parser?

Thanks.

Webmaster4o

Why don't you use the xml module to parse XML? ;) BeautifulSoup is built specifically for HTML parsing. It may handle XML in many cases, but that's not what it's designed for, so it won't work perfectly.

dgelessus

BeautifulSoup isn't an HTML parser (I think); it's a tool for working with an HTML document. Passing it the "xml" argument switches it into XML mode, which (among other things) means that a different parser needs to be used. You are right that BeautifulSoup objects have some similarities to the xml.etree.ElementTree API, but the best that ElementTree can do is recursive searching, whereas with BeautifulSoup you can, e.g., match based on a tag's attributes.
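
For readers comparing the two APIs, here is a minimal sketch of what the standard library can do on its own. ElementTree's find/findall accept a limited XPath subset that includes simple attribute predicates, so basic attribute matching works without lxml. The SAMPLE document and its tag and attribute names are made up for illustration; BeautifulSoup's searching is still considerably more flexible than this.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not taken from the thread.
SAMPLE = """<library>
  <shelf>
    <book id="b1" lang="en">Dune</book>
    <book id="b2" lang="de">Momo</book>
  </shelf>
</library>"""

root = ET.fromstring(SAMPLE)

# ".//book" searches recursively; "[@lang='de']" keeps only elements
# whose lang attribute equals "de".
german_books = root.findall(".//book[@lang='de']")
titles = [book.text for book in german_books]
```

Only exact-match predicates like this are supported; anything fancier (regular expressions, matching on several possible values) needs explicit Python filtering or a fuller XPath implementation.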

daveg

Yeah, I know I could use xml directly or xmltodict, but prefer BeautifulSoup's interface (and have a bunch of useful scripts I've already written that use it).

BeautifulSoup DOES support XML, but it needs a parser (I think it might only support lxml now).
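
One way to keep scripts running where lxml may be missing is to catch the error and fall back to the standard library. The sketch below is only an illustration: parse_xml is a hypothetical helper name, and the fallback returns an ElementTree element rather than a BeautifulSoup object, so calling code has to handle both cases.

```python
import xml.etree.ElementTree as ET


def parse_xml(markup):
    """Return ("bs4", soup) if bs4 can parse XML here, else ("etree", root).

    Hypothetical helper: the name and the tuple return shape are
    assumptions made for this sketch.
    """
    try:
        from bs4 import BeautifulSoup, FeatureNotFound
    except ImportError:
        # bs4 itself is not installed; use the standard library.
        return "etree", ET.fromstring(markup)
    try:
        # The "xml" feature needs an XML-capable parser such as lxml.
        return "bs4", BeautifulSoup(markup, "xml")
    except FeatureNotFound:
        # No XML-capable parser is registered; fall back to ElementTree.
        return "etree", ET.fromstring(markup)


kind, doc = parse_xml("<root><item id='1'>a</item></root>")
```

The kind value tells the caller which API it got back, so existing BeautifulSoup code paths can stay unchanged wherever lxml is available.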

roosterboy

Yet another reason why I'd like to get lxml included in the next release of Pythonista!

[deleted]

Add lxml plz.