Forum Archive

Is it possible to log in and scrape from a JavaScript-based website?

Raz

I am thinking about buying Pythonista, but I wanted to know if this is possible:
You can't use "requests" and "beautifulsoup" to scrape from a JavaScript-based website. I used Selenium to log in and scrape from those websites.

Is there any way to log in and scrape data from a JavaScript-based website using Pythonista?
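
For illustration, here is a minimal sketch of why a plain HTTP fetch falls short on such pages. The page content, the `price` id, and the helper names are made up for the example, and it uses the standard library's `html.parser` in place of BeautifulSoup. The server-sent HTML contains an empty element that a script would fill in later, so a parser that never runs the script finds no text there.

```python
from html.parser import HTMLParser

# Hypothetical page: the server sends an empty <div>,
# and a script fills it in once the page runs in a browser.
STATIC_HTML = """
<html><body>
<div id="price"></div>
<script>document.getElementById("price").textContent = "42";</script>
</body></html>
"""


class DivText(HTMLParser):
    """Collect the text found inside <div id="price">."""

    def __init__(self):
        super().__init__()
        self.inside = False
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == 'div' and dict(attrs).get('id') == 'price':
            self.inside = True

    def handle_endtag(self, tag):
        if tag == 'div':
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.text.append(data)


def static_price(html):
    """Return the text of the price div as seen without running any script."""
    parser = DivText()
    parser.feed(html)
    return ''.join(parser.text).strip()


print(repr(static_price(STATIC_HTML)))
```

The script is never executed here, so the div stays empty. That is the gap a browser engine (Selenium, or a web view) fills.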

cvp

@Raz said:

Is there any way to log in and scrape data from a JavaScript-based website using Pythonista?

There are a lot of ways and examples in this forum... Pythonista is the best buy on iOS.

Edit: or I don't understand the question (as too often 😢)

JonB

You can use a WebView to execute JavaScript on a page.
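
A minimal sketch of that idea, assuming the object passed in exposes `eval_js` the way Pythonista's `ui.WebView` does. The helper names are hypothetical. Reading `outerHTML` after the page has finished loading returns the DOM as modified by the page's scripts, which can then be parsed like ordinary HTML.

```python
def page_title(webview):
    """Return the page title by evaluating JavaScript in the web view."""
    return webview.eval_js('document.title')


def rendered_html(webview):
    """Return the DOM as it stands after the page's scripts have run."""
    return webview.eval_js('document.documentElement.outerHTML')


# Possible use inside Pythonista (not runnable elsewhere):
#   import ui
#   wv = ui.WebView()
#   wv.load_url('https://some.url')
#   wv.present()
#   # ...once the page has finished loading:
#   print(page_title(wv))
#   html = rendered_html(wv)
```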

cvp

Or @mikael's WKWebView

Raz

@cvp said:

@Raz said:

Is there any way to log in and scrape data from a JavaScript-based website using Pythonista?

There are a lot of ways and examples in this forum... Pythonista is the best buy on iOS.

Edit: or I don't understand the question (as too often 😢)

Thanks for the reply. I know Pythonista is definitely worth it, but I just wanted to know if it can do what I want.
I searched before posting, and I think it's possible, but I wanted to confirm.

Raz

@JonB said:

You can use a WebView to execute JavaScript on a page.

@cvp said:

Or @mikael's WKWebView

Thanks, I will check it out. Is there any example code?

mikael

@Raz, I found this old project of mine that is intended to help with scraping.

You define a handler for URLs like this:

class DemoScraper(WebScraper):

  def __init__(self, webview):
    super().__init__(webview)
    # Map each handler method to the URL it handles
    self.url_map = {
      self.login_page: 'https://some.url',
    }
    # Handler for the first page
    self.handler = self.login_page

Hit the first page with something like this:

import ui

wv = ui.WebView()
ds = DemoScraper(wv)
# load_url fetches the page; load_html would render the URL string itself as HTML
wv.load_url('https://some.url')
wv.present('fullscreen')

Then, in the handler functions, you can access and manipulate the page through JavaScript with these chainable helpers:

def login_page(self):
  # Look up elements by id or XPath
  assert self.by_id('test').to_string() == '[object HTMLDivElement]'
  assert self.xpath('head/title').to_string() == '[object HTMLTitleElement]'
  assert self.xpath('head/title').value() == 'Test document'
  assert self.value('head/title') == 'Test document'
  assert self.xpath('*[@class="test_class"]').to_string() == '[object HTMLDivElement]'

  # Read and set styles
  test_div = self.by_id('test')
  assert test_div.style('top') == 100.0
  assert test_div.style('backgroundColor') == 'inherit'
  assert test_div.abs_style('backgroundColor') == 'rgb(0, 0, 255)'
  test_div.set_style('left', 5)
  assert test_div.abs_style('left') == 5.0

  # Collect table cells into a dict keyed by the first column
  cell_values = self.for_each('table//tr').map(
    key='td[1]',
    some_value='td[3]'
  )
  assert cell_values == {'A1': {'some_value': 'A3'}, 'B1': {'some_value': 'B3'}}

  names = self.list_each('input/@name')
  assert names == ['username', 'passwd']

  # Fill in the login form
  self.set_field('username', 'your username')
  self.set_field('passwd', 'your password')

  # Explicitly set the handler for the
  # next page
  self.handler = self.other_page
  self.by_name('form1').submit()

cvp

@Raz, here is the very short code I use to log in to a particular site:

import ui
from wkwebview import WKWebView

class MyWKWebViewDelegate:
    def webview_should_start_load(self, webview, url, nav_type):
        print('Will start loading', url)
        return True

    # Run off the main UI thread so the eval_js calls below do not block it
    @ui.in_background
    def webview_did_finish_load(self, webview):
        title = str(webview.eval_js('document.title'))
        print('Finished loading ' + title)
        # Only act on the login page: fill in the form fields and submit
        if 'login' in title.lower():
            #print(webview.eval_js('document.documentElement.innerHTML'))
            webview.eval_js('document.getElementsByName("Callsign")[0].value="myuser";')
            webview.eval_js('document.getElementsByName("EnteredPassword")[0].value="mypassword";')
            webview.eval_js('document.getElementById("CFForm_1").submit();')

web = WKWebView(delegate=MyWKWebViewDelegate())
web.frame = (0,0,400,400)
web.present('sheet')
web.load_url('https://www.eqsl.cc/QSLCard/login.cfm')
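
One possible refinement, sketched under assumptions: building the JavaScript with `json.dumps` so that quotes or backslashes in a username or password cannot break the generated statement. `js_set_field` is a hypothetical helper and not part of WKWebView.

```python
import json


def js_set_field(name, value):
    """Build a JS statement that sets the first element with this name attribute.

    json.dumps produces a quoted, escaped string that is also a valid
    JavaScript string literal.
    """
    return 'document.getElementsByName({})[0].value = {};'.format(
        json.dumps(name), json.dumps(value))


print(js_set_field('Callsign', 'myuser'))
```

Inside the delegate above, it could be used as `webview.eval_js(js_set_field('Callsign', 'myuser'))`.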