Forum Archive

TutorialDoctor

Nov 25, 2014 - 16:20

Ole just posted a workflow that imports the html2text module, which isn't documented anywhere. Also, in the Pythonista forum there wasn't any documentation on the function that lets you set a badge on the Editorial app icon.

How would I go about profiling Editorial or Pythonista for undocumented modules?

JonB

Nov 24, 2014 - 23:43

I've been considering something like this, but never bothered.

Here was my thought:
There is an index of the docs at the path below. Each row contains an object name, type, and doc ref (tab-delimited).

  /var/mobile/Applications/88D6E0F3-4BB7-4D36-8AD6-BAF532976A46/Pythonista.app/Documentation/objects_py.inv
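A rough sketch of how that index could be read into a set of documented names. It assumes the file really is plain tab-delimited text with the name in the first column, as described above; the sample rows are made up for illustration and are not taken from the real file.

```python
def load_documented_names(lines):
    """Collect object names from tab-delimited rows: name, type, doc ref."""
    names = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank or malformed rows instead of guessing at their meaning.
        if len(fields) >= 3 and fields[0]:
            names.add(fields[0])
    return names


# Hypothetical sample rows standing in for the contents of objects_py.inv.
sample_rows = [
    "math.sqrt\tfunction\tlibrary/math.html\n",
    "math.pi\tdata\tlibrary/math.html\n",
    "\n",
]
documented = load_documented_names(sample_rows)
print(sorted(documented))
```

With the real file you would pass an open file object instead of the sample list, since iterating a file yields its lines.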

Packages live in one of the paths in sys.path, though mostly you'll be interested in pylib/site-packages and pylib_ext. If you just want to look for undocumented modules, call os.listdir on those directories, compare the list against the index, and you are done.
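The listing step might look something like this. It is only a sketch: the function name list_module_candidates is made up, and it only recognizes .py files and directories containing an __init__.py, so compiled extension modules would be missed.

```python
import os
import sys


def list_module_candidates(paths=None):
    """Return names of .py files and package directories found on the given paths."""
    found = set()
    for path in (paths if paths is not None else sys.path):
        if not os.path.isdir(path):
            continue  # sys.path may contain zip files or directories that do not exist
        for entry in os.listdir(path):
            full = os.path.join(path, entry)
            name, ext = os.path.splitext(entry)
            if ext == ".py":
                found.add(name)
            elif os.path.isdir(full) and os.path.exists(os.path.join(full, "__init__.py")):
                found.add(entry)
    return found


print(len(list_module_candidates()))
```

Subtracting the set of documented top-level names from this result would leave the candidates worth a closer look.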

Finding undocumented stuff inside the custom iOS modules is trickier, but there are only a few of them.
You likely need to import each module, which executes its code and so could cause issues, dunno. Perhaps jedi would be useful here.
If importing, dir(modulename) gives you a list of functions and attributes, which can be compared against the index. You would likely need to traverse packages. The inspect module can give you some other stuff for pure Python code, like function args, etc. The tricky part will be finding instance attributes, or function arguments for C builtins.
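Here is a minimal sketch of the dir() comparison. The helper name undocumented_members and the shape of the documented set (dotted names like "json.dumps") are assumptions for illustration. It uses inspect.signature, which is Python 3 only; on Python 2 inspect.getargspec would play a similar role for pure Python functions.

```python
import importlib
import inspect


def undocumented_members(module_name, documented):
    """Map each public name missing from `documented` to its signature string, or None."""
    module = importlib.import_module(module_name)  # note: this runs the module's top-level code
    missing = {}
    for attr in dir(module):
        if attr.startswith("_"):
            continue
        qualified = module_name + "." + attr
        if qualified in documented:
            continue
        try:
            signature = str(inspect.signature(getattr(module, attr)))
        except (TypeError, ValueError):
            signature = None  # not callable, or a C builtin with no introspectable arguments
        missing[qualified] = signature
    return missing


# Pretend only json.dumps is documented and see what else the module exposes.
for name, signature in sorted(undocumented_members("json", {"json.dumps"}).items()):
    print(name, signature)
```

This only looks at top-level names of one module. Instance attributes and nested packages would need extra traversal, as noted above.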

TutorialDoctor

Nov 25, 2014 - 03:48

Somehow this Custom Action I made for Editorial can get help information on the html2text module, though a simple

help(html2text)

in the scratchpad can't find the module.

Hmmm

The output even says it was made by Aaron Swartz (I had to customize the full module myself to work with Editorial), but Ole could just import it.

It also tells you if the modules are built-in or not.

The help:

Help on module html2text:

NAME
    html2text - html2text: Turn HTML into equivalent Markdown-structured text.

FILE
    /var/mobile/Applications/ECD5B996-AD70-4852-B4F6-DF54A263C0CD/Editorial.app/pylib/site-packages/html2text.py

CLASSES
    HTMLParser.HTMLParser(markupbase.ParserBase)
        HTML2Text

    class HTML2Text(HTMLParser.HTMLParser)
     |  Method resolution order:
     |      HTML2Text
     |      HTMLParser.HTMLParser
     |      markupbase.ParserBase
     |  
     |  Methods defined here:
     |  
     |  __init__(self, out=None, baseurl='', bodywidth=78)
     |  
     |  charref(self, name)
     |  
     |  close(self)
     |  
     |  drop_last(self, nLetters)
     |  
     |  entityref(self, c)
     |  
     |  feed(self, data)
     |  
     |  google_nest_count(self, style)
     |      calculate the nesting count of google doc lists
     |  
     |  handle(self, data)
     |  
     |  handle_charref(self, c)
     |  
     |  handle_data(self, data)
     |  
     |  handle_emphasis(self, start, tag_style, parent_style)
     |      handles various text emphases
     |  
     |  handle_endtag(self, tag)
     |  
     |  handle_entityref(self, c)
     |  
     |  handle_starttag(self, tag, attrs)
     |  
     |  handle_tag(self, tag, attrs, start)
     |  
     |  o(self, data, puredata=0, force=0)
     |  
     |  optwrap(self, text)
     |      Wrap all paragraphs in the provided text.
     |  
     |  outtextf(self, s)
     |  
     |  p(self)
     |  
     |  pbr(self)
     |  
     |  previousIndex(self, attrs)
     |      returns the index of certain set of attributes (of a link) in the
     |      self.a list
     |      
     |      If the set of attributes is not found, returns None
     |  
     |  replaceEntities(self, s)
     |  
     |  soft_br(self)
     |  
     |  unescape(self, s)
     |  
     |  unknown_decl(self, data)
     |  
     |  ----------------------------------------------------------------------
     |  Data and other attributes defined here:
     |  
     |  r_unescape = <_sre.SRE_Pattern object>
     |  
     |  ----------------------------------------------------------------------
     |  Methods inherited from HTMLParser.HTMLParser:
     |  
     |  check_for_whole_start_tag(self, i)
     |      # Internal -- check to see if we have a complete starttag; return end
     |      # or -1 if incomplete.
     |  
     |  clear_cdata_mode(self)
     |  
     |  error(self, message)
     |  
     |  get_starttag_text(self)
     |      Return full source of start tag: '<...>'.
     |  
     |  goahead(self, end)
     |      # Internal -- handle data as far as reasonable.  May leave state
     |      # and data to be processed by a subsequent call.  If 'end' is
     |      # true, force handling all data as if followed by EOF marker.
     |  
     |  handle_comment(self, data)
     |      # Overridable -- handle comment
     |  
     |  handle_decl(self, decl)
     |      # Overridable -- handle declaration
     |  
     |  handle_pi(self, data)
     |      # Overridable -- handle processing instruction
     |  
     |  handle_startendtag(self, tag, attrs)
     |      # Overridable -- finish processing of start+end tag: 
     |  
     |  parse_bogus_comment(self, i, report=1)
     |      # Internal -- parse bogus comment, return length or -1 if not terminated
     |      # see http://www.w3.org/TR/html5/tokenization.html#bogus-comment-state
     |  
     |  parse_endtag(self, i)
     |      # Internal -- parse endtag, return end or -1 if incomplete
     |  
     |  parse_html_declaration(self, i)
     |      # Internal -- parse html declarations, return length or -1 if not terminated
     |      # See w3.org/TR/html5/tokenization.html#markup-declaration-open-state
     |      # See also parse_declaration in _markupbase
     |  
     |  parse_pi(self, i)
     |      # Internal -- parse processing instr, return end or -1 if not terminated
     |  
     |  parse_starttag(self, i)
     |      # Internal -- handle starttag, return end or -1 if not terminated
     |  
     |  reset(self)
     |      Reset this instance.  Loses all unprocessed data.
     |  
     |  set_cdata_mode(self, elem)
     |  
     |  ----------------------------------------------------------------------
     |  Data and other attributes inherited from HTMLParser.HTMLParser:
     |  
     |  CDATA_CONTENT_ELEMENTS = ('script', 'style')
     |  
     |  entitydefs = None
     |  
     |  ----------------------------------------------------------------------
     |  Methods inherited from markupbase.ParserBase:
     |  
     |  getpos(self)
     |      Return current line number and offset.
     |  
     |  parse_comment(self, i, report=1)
     |      # Internal -- parse comment, return length or -1 if not terminated
     |  
     |  parse_declaration(self, i)
     |      # Internal -- parse declaration (for use by subclasses).
     |  
     |  parse_marked_section(self, i, report=1)
     |      # Internal -- parse a marked section
     |      # Override this to handle MS-word extension syntax content
     |  
     |  updatepos(self, i, j)
     |      # Internal -- update line number and offset.  This should be
     |      # called for each piece of data exactly once, in order -- in other
     |      # words the concatenation of all the input strings to this
     |      # function should be exactly the entire input.

FUNCTIONS
    dumb_css_parser(data)
        returns a hash of css selectors, each of which contains a hash of
        css attributes

    dumb_property_dict(style)
        returns a hash of css attributes

    element_style(attrs, style_def, parent_style)
        returns a hash of the 'final' style attributes of the element

    escape_md(text)
        Escapes markdown-sensitive characters within other markdown
        constructs.

    escape_md_section(text, snob=False)
        Escapes markdown-sensitive characters across whole document sections.

    google_fixed_width_font(style)
        check if the css of the current element defines a fixed width font

    google_has_height(style)
        check if the style of the element has the 'height' attribute
        explicitly defined

    google_list_style(style)
        finds out whether this is an ordered or unordered list

    google_text_emphasis(style)
        return a list of all emphasis modifiers of the element

    hn(tag)
        ### End Entity Nonsense ###

    html2text(html, baseurl='', bodywidth=78)

    list_numbering_start(attrs)
        extract numbering from list element attributes

    main()

    name2cp(k)

    skipwrap(para)

    unescape(s, unicode_snob=False)

    wrapwrite(text)

DATA
    BODY_WIDTH = 78
    ESCAPE_SNOB = 0
    GOOGLE_LIST_INDENT = 36
    IGNORE_ANCHORS = False
    IGNORE_EMPHASIS = False
    IGNORE_IMAGES = False
    INLINE_LINKS = True
    LINKS_EACH_PARAGRAPH = 0
    SKIP_INTERNAL_LINKS = True
    SPACE_RE = <_sre.SRE_Pattern object>
    UNICODE_SNOB = 0
    __author__ = 'Aaron Swartz (me@aaronsw.com)'
    __contributors__ = ["Martin 'Joey' Schulze", 'Ricardo Reyes', 'Kevin J...
    __copyright__ = '(C) 2004-2008 Aaron Swartz. GNU GPL 3.'
    __version__ = '2014.4.5'
    division = _Feature((2, 2, 0, 'alpha', 2), (3, 0, 0, 'alpha', 0), 8192...
    k = 'icirc'
    md_backslash_matcher = <_sre.SRE_Pattern object>
    md_chars_matcher = <_sre.SRE_Pattern object>
    md_chars_matcher_all = <_sre.SRE_Pattern object>
    md_dash_matcher = <_sre.SRE_Pattern object>
    md_dot_matcher = <_sre.SRE_Pattern object>
    md_plus_matcher = <_sre.SRE_Pattern object>
    ordered_list_matcher = <_sre.SRE_Pattern object>
    slash_chars = r'\`*_{}[]()#+-.!'
    unifiable = {'aacute': 'a', 'acirc': 'a', 'aelig': 'ae', 'agrave': 'a'...
    unifiable_n = {160: ' ', 169: '(C)', 183: '*', 224: 'a', 225: 'a', 226...
    unordered_list_matcher = <_sre.SRE_Pattern object>

VERSION
    2014.4.5

AUTHOR
    Aaron Swartz (me@aaronsw.com)


None

JonB

Nov 25, 2014 - 10:30

You must import the module first, before calling help:

import html2text
help(html2text)

TutorialDoctor

Nov 25, 2014 - 15:51

Right! That was the difference

import workflow
params = workflow.get_parameters()

module = __import__(params['module'])

#print params['module']
help(module)  # help() prints the text itself and returns None, so no print is needed

I made the imported module a variable, so the module name comes from the workflow parameters.

Thanks JonB. I will figure out the rest

omz

Nov 25, 2014 - 16:20

Fwiw you can also just pass a string to the help function, and it'll try to find the module automatically, e.g. help('math') should work, even if you haven't imported the math module.
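If you want the help text as a string rather than printed output, the standard pydoc module can look up a module by name in the same way. A small sketch (the wrapper name help_text is made up):

```python
import pydoc


def help_text(name):
    """Return the help text for an object given by dotted name, e.g. 'math'."""
    # render_doc resolves the string itself; plain() strips the bold overstrike characters.
    return pydoc.plain(pydoc.render_doc(name))


text = help_text("math")
print(text.splitlines()[0])
```

Having the text as a string makes it easy to pass along as the output of a workflow action or to search it for a function name.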

Undocumented Modules?