Somehow this Custom Action I made for Editorial can get help information on the html2text module, even though a simple
help(html2text)
in the scratchpad can't find the module.
Hmmm
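One possible explanation, and this is only my guess, is that the scratchpad never ran `import html2text`, so the bare name `html2text` was not defined when `help()` was called. Here is a minimal sketch that looks a module up by its name as a string, so nothing has to be imported beforehand. `describe_module` is a made-up helper name, not part of Editorial or html2text.

```python
import importlib
import pydoc


def describe_module(name):
    """Return plain-text help for the module called `name`.

    `describe_module` is a hypothetical helper for illustration only.
    """
    try:
        # Import by string, so the caller needs no prior `import` statement.
        module = importlib.import_module(name)
    except ImportError as exc:
        return "cannot import %s: %s" % (name, exc)
    # render_doc builds the same text help() shows; plain() strips the
    # overstrike bold formatting so the result is ordinary text.
    return pydoc.plain(pydoc.render_doc(module))


if __name__ == "__main__":
    # Prints the help text, or a short note if html2text is not installed.
    print(describe_module("html2text"))
```

Calling `help("html2text")` with the name in quotes should behave much the same way, since `help()` also accepts a module name as a string.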
The output even says it was made by Aaron Swartz (I had to customize the full module myself to work with Editorial), but Ole could just import it.
It also tells you if the modules are built-in or not.
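The built-in check can be sketched without the full help text. This assumes "built-in" means the module is compiled into the interpreter, in which case `help()` shows `(built-in)` instead of a path under FILE. `module_origin` is a made-up name for illustration.

```python
import importlib
import sys


def module_origin(name):
    """Return the file a module was loaded from, or "(built-in)".

    `module_origin` is a hypothetical helper, not an Editorial API.
    """
    # Modules compiled into the interpreter are listed here and have no file.
    if name in sys.builtin_module_names:
        return "(built-in)"
    module = importlib.import_module(name)
    # Fall back to "(built-in)" if the module carries no __file__ attribute.
    return getattr(module, "__file__", None) or "(built-in)"


if __name__ == "__main__":
    for name in ("sys", "json"):
        print(name, module_origin(name))
```

For html2text in Editorial, this should return the same site-packages path that appears under FILE in the help output.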
The help:
Help on module html2text:
NAME
html2text - html2text: Turn HTML into equivalent Markdown-structured text.
FILE
/var/mobile/Applications/ECD5B996-AD70-4852-B4F6-DF54A263C0CD/Editorial.app/pylib/site-packages/html2text.py
CLASSES
HTMLParser.HTMLParser(markupbase.ParserBase)
HTML2Text
class HTML2Text(HTMLParser.HTMLParser)
| Method resolution order:
| HTML2Text
| HTMLParser.HTMLParser
| markupbase.ParserBase
|
| Methods defined here:
|
| __init__(self, out=None, baseurl='', bodywidth=78)
|
| charref(self, name)
|
| close(self)
|
| drop_last(self, nLetters)
|
| entityref(self, c)
|
| feed(self, data)
|
| google_nest_count(self, style)
| calculate the nesting count of google doc lists
|
| handle(self, data)
|
| handle_charref(self, c)
|
| handle_data(self, data)
|
| handle_emphasis(self, start, tag_style, parent_style)
| handles various text emphases
|
| handle_endtag(self, tag)
|
| handle_entityref(self, c)
|
| handle_starttag(self, tag, attrs)
|
| handle_tag(self, tag, attrs, start)
|
| o(self, data, puredata=0, force=0)
|
| optwrap(self, text)
| Wrap all paragraphs in the provided text.
|
| outtextf(self, s)
|
| p(self)
|
| pbr(self)
|
| previousIndex(self, attrs)
| returns the index of certain set of attributes (of a link) in the
| self.a list
|
| If the set of attributes is not found, returns None
|
| replaceEntities(self, s)
|
| soft_br(self)
|
| unescape(self, s)
|
| unknown_decl(self, data)
|
| ----------------------------------------------------------------------
| Data and other attributes defined here:
|
| r_unescape = <_sre.SRE_Pattern object>
|
| ----------------------------------------------------------------------
| Methods inherited from HTMLParser.HTMLParser:
|
| check_for_whole_start_tag(self, i)
| # Internal -- check to see if we have a complete starttag; return end
| # or -1 if incomplete.
|
| clear_cdata_mode(self)
|
| error(self, message)
|
| get_starttag_text(self)
| Return full source of start tag: '<...>'.
|
| goahead(self, end)
| # Internal -- handle data as far as reasonable. May leave state
| # and data to be processed by a subsequent call. If 'end' is
| # true, force handling all data as if followed by EOF marker.
|
| handle_comment(self, data)
| # Overridable -- handle comment
|
| handle_decl(self, decl)
| # Overridable -- handle declaration
|
| handle_pi(self, data)
| # Overridable -- handle processing instruction
|
| handle_startendtag(self, tag, attrs)
| # Overridable -- finish processing of start+end tag:
|
| parse_bogus_comment(self, i, report=1)
| # Internal -- parse bogus comment, return length or -1 if not terminated
| # see http://www.w3.org/TR/html5/tokenization.html#bogus-comment-state
|
| parse_endtag(self, i)
| # Internal -- parse endtag, return end or -1 if incomplete
|
| parse_html_declaration(self, i)
| # Internal -- parse html declarations, return length or -1 if not terminated
| # See w3.org/TR/html5/tokenization.html#markup-declaration-open-state
| # See also parse_declaration in _markupbase
|
| parse_pi(self, i)
| # Internal -- parse processing instr, return end or -1 if not terminated
|
| parse_starttag(self, i)
| # Internal -- handle starttag, return end or -1 if not terminated
|
| reset(self)
| Reset this instance. Loses all unprocessed data.
|
| set_cdata_mode(self, elem)
|
| ----------------------------------------------------------------------
| Data and other attributes inherited from HTMLParser.HTMLParser:
|
| CDATA_CONTENT_ELEMENTS = ('script', 'style')
|
| entitydefs = None
|
| ----------------------------------------------------------------------
| Methods inherited from markupbase.ParserBase:
|
| getpos(self)
| Return current line number and offset.
|
| parse_comment(self, i, report=1)
| # Internal -- parse comment, return length or -1 if not terminated
|
| parse_declaration(self, i)
| # Internal -- parse declaration (for use by subclasses).
|
| parse_marked_section(self, i, report=1)
| # Internal -- parse a marked section
| # Override this to handle MS-word extension syntax content
|
| updatepos(self, i, j)
| # Internal -- update line number and offset. This should be
| # called for each piece of data exactly once, in order -- in other
| # words the concatenation of all the input strings to this
| # function should be exactly the entire input.
FUNCTIONS
dumb_css_parser(data)
returns a hash of css selectors, each of which contains a hash of
css attributes
dumb_property_dict(style)
returns a hash of css attributes
element_style(attrs, style_def, parent_style)
returns a hash of the 'final' style attributes of the element
escape_md(text)
Escapes markdown-sensitive characters within other markdown
constructs.
escape_md_section(text, snob=False)
Escapes markdown-sensitive characters across whole document sections.
google_fixed_width_font(style)
check if the css of the current element defines a fixed width font
google_has_height(style)
check if the style of the element has the 'height' attribute
explicitly defined
google_list_style(style)
finds out whether this is an ordered or unordered list
google_text_emphasis(style)
return a list of all emphasis modifiers of the element
hn(tag)
### End Entity Nonsense ###
html2text(html, baseurl='', bodywidth=78)
list_numbering_start(attrs)
extract numbering from list element attributes
main()
name2cp(k)
skipwrap(para)
unescape(s, unicode_snob=False)
wrapwrite(text)
DATA
BODY_WIDTH = 78
ESCAPE_SNOB = 0
GOOGLE_LIST_INDENT = 36
IGNORE_ANCHORS = False
IGNORE_EMPHASIS = False
IGNORE_IMAGES = False
INLINE_LINKS = True
LINKS_EACH_PARAGRAPH = 0
SKIP_INTERNAL_LINKS = True
SPACE_RE = <_sre.SRE_Pattern object>
UNICODE_SNOB = 0
__author__ = 'Aaron Swartz (me@aaronsw.com)'
__contributors__ = ["Martin 'Joey' Schulze", 'Ricardo Reyes', 'Kevin J...
__copyright__ = '(C) 2004-2008 Aaron Swartz. GNU GPL 3.'
__version__ = '2014.4.5'
division = _Feature((2, 2, 0, 'alpha', 2), (3, 0, 0, 'alpha', 0), 8192...
k = 'icirc'
md_backslash_matcher = <_sre.SRE_Pattern object>
md_chars_matcher = <_sre.SRE_Pattern object>
md_chars_matcher_all = <_sre.SRE_Pattern object>
md_dash_matcher = <_sre.SRE_Pattern object>
md_dot_matcher = <_sre.SRE_Pattern object>
md_plus_matcher = <_sre.SRE_Pattern object>
ordered_list_matcher = <_sre.SRE_Pattern object>
slash_chars = r'\`*_{}[]()#+-.!'
unifiable = {'aacute': 'a', 'acirc': 'a', 'aelig': 'ae', 'agrave': 'a'...
unifiable_n = {160: ' ', 169: '(C)', 183: '*', 224: 'a', 225: 'a', 226...
unordered_list_matcher = <_sre.SRE_Pattern object>
VERSION
2014.4.5
AUTHOR
Aaron Swartz (me@aaronsw.com)
None
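Going by the signature listed under FUNCTIONS above, `html2text(html, baseurl='', bodywidth=78)`, a call could look roughly like the sketch below. I am only relying on that signature. `to_markdown` and `fake_converter` are made-up names, and the converter is passed in so the sketch can still be tried with a stand-in where the real module can't be found.

```python
def to_markdown(html, converter=None, bodywidth=78):
    """Convert HTML to Markdown text with an html2text-style converter.

    `to_markdown` is a hypothetical wrapper for illustration only.
    """
    if converter is None:
        # Imported lazily, so a missing module only fails when it is needed.
        import html2text
        converter = html2text.html2text
    # Argument names follow the signature shown in the help output.
    return converter(html, baseurl='', bodywidth=bodywidth)


def fake_converter(html, baseurl='', bodywidth=78):
    # Hypothetical stand-in, NOT the real module: it does no conversion
    # and just labels its input so the call path can be checked.
    return "converted: " + html


if __name__ == "__main__":
    print(to_markdown("<p>Hello</p>", converter=fake_converter))
```

With the real module available, leaving out `converter` makes the wrapper call `html2text.html2text` directly.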