Forum Archive

Open page source in Pythonista

Webmaster4o

There was a recent post about opening page source in Textastic. I don't own Textastic personally, so I wrote a script to open page source in Pythonista:

import appex
import urllib2
from objc_util import *
#Helper functions
def openUrl(url):
    '''Allows webbrowser.open()-esque functionality from the app extension'''
    app=UIApplication.sharedApplication()
    app._openURL_(nsurl(url))
def getDocPath():
    '''Gets the path to ~/Documents'''
    split=__file__.split('/')
    path=split[:split.index('Documents')+1]
    return '/'.join(path)+'/'
#Get the url    
url=appex.get_url()
#Read page contents
f=urllib2.urlopen(url)
source=f.read()
f.close()
#Detect the type of page we're viewing
test=source.lower().strip()
if '<html' in test or test.startswith('<!doctype html'): #Page is HTML (also matches <html lang="..."> and longer doctypes)
    extension='.html'
else: #fallback to .txt
    extension='.txt'    
#Where to save the source
filename='source'+extension
filepath=getPath()+filename
#Save the source
with open(filepath,'w') as f:
    f.write(source)
#Close appex window
appex.finish()
#Open in pythonista
openUrl('pythonista://'+filename)

It's under 50 lines so I can justify not putting it in a Gist for the time being :)

omz

Nice! I would suggest a different approach for detecting the content type though:

# ...
#Read page contents
import requests
r = requests.get(url)
source = r.text
ct = r.headers['Content-Type']
# A fancier version could use the mimetypes module to guess the proper file extension...
extension = '.html' if ct.startswith('text/html') else '.txt'
# ...
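
A rough sketch of the "fancier version" mentioned in the comment above, assuming the standard-library mimetypes module is good enough for this purpose. The helper name extension_for is made up for illustration. It strips parameters such as the charset from the Content-Type value and falls back to .txt when the type is unknown.

```python
import mimetypes

def extension_for(content_type, default='.txt'):
    """Guess a file extension from a Content-Type header value.

    extension_for is a hypothetical helper, not part of the original script.
    """
    # Drop parameters such as "; charset=utf-8" and normalise case
    mime_type = content_type.split(';', 1)[0].strip().lower()
    # guess_extension returns None for types it does not know
    return mimetypes.guess_extension(mime_type) or default

extension = extension_for('text/html; charset=utf-8')
```

The exact extension returned for a given type can differ between Python versions and platforms (for example .htm versus .html), so it may be worth checking the result on the device before relying on it.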

(I'm sure it's also possible to get the response headers with urllib2, I'm just more familiar with requests.)
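
For reference, a minimal sketch of the urllib2 route, assuming only that the object returned by urlopen exposes its headers through info(). The function name fetch_source is made up for this example, and the import fallback is only there so the same snippet also runs on Python 3.

```python
try:
    from urllib2 import urlopen          # Python 2, as in the original script
except ImportError:
    from urllib.request import urlopen   # Python 3 equivalent

def fetch_source(url):
    """Return (source, content_type) for url; hypothetical helper."""
    f = urlopen(url)
    try:
        source = f.read()
        # info() returns the response headers; lookup is case-insensitive
        ct = f.info().get('Content-Type', '')
    finally:
        f.close()
    return source, ct

# Usage, mirroring the requests snippet (url comes from appex.get_url()):
# source, ct = fetch_source(url)
# extension = '.html' if ct.startswith('text/html') else '.txt'
```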

brumm

Line 28 of the script should call getDocPath() instead of getPath(), since getPath() is never defined.