PyIE: Easy Screen-Scraping with Python and IE

I always like it when information I’m interested in just flows over to me.


The first thing I did about this was start sending myself daily reports about my investment account. I wrote a script that logged into the bank, fetched the info I wanted, and spat it out. A simple batch file was all it took to pipe this into another tool that simply SMSes the data over to me. This essentially makes any worthy piece of data instantly deliverable to me wherever I am.

This may just sound like your typical Matrix-hackerish paragraph, but if you’re a moderate computer programmer, the next few paragraphs will clear it all up ;-)
If you’re not a programmer, the Matrix stuff is coming in masses.


Screen scraping is the act of capturing data from a system or program by capturing and interpreting the contents of some display that is not actually intended for data transport or inspection by programs.

Source: Wikipedia


The first thing you’ll want when you’re scraping for data is an easy way to control and fetch data from, let’s say, a browser.

I picked IE. I know it sucks but it’s easy to automate.

Anyway, any web page is easily accessible through what’s called the DOM. To find out where in the DOM the piece of data you’re after lies, use the priceless DOM Inspector: a click here and a click there, and you suddenly know that your daily loss/gain is neatly located at:
document.frames.middle.document.frames.IFrames.document.
getElementById("QTable2").childNodes.item(0).childNodes.
item(0).childNodes.item(0).childNodes.item(0).childNodes.
item(2).innerText
I would have never guessed this myself.
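A chain like that is nicer to read from a script if the repeated childNodes.item(...) hops are folded into a small helper. This is only a sketch: descend is a name I made up for illustration (it is not part of PyIE), and it assumes the node objects expose childNodes.item(i) the way IE's DOM objects do.

```python
def descend(node, indexes):
    """Follow childNodes.item(i) once per index and return the node reached."""
    for i in indexes:
        node = node.childNodes.item(i)
    return node


# Hypothetical usage against a COM document object called `doc`:
#   table = doc.getElementById("QTable2")
#   cell = descend(table, [0, 0, 0, 0, 2])
#   print(cell.innerText)
```

The index list [0, 0, 0, 0, 2] mirrors the five item(...) calls in the path above, so when the page layout changes there is only one list to update.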

Now how do we automate this whole process? I mean, we need to load up the page, fill in the form with our username/password, click ‘login’, click a few more buttons, fetch data, etc.
This is where Python + win32com + PyIE come to the rescue.
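To give a feel for what driving IE from Python looks like without any wrapper, here is a rough sketch using win32com directly. It is not PyIE's code: open_page and wait_until_loaded are names I picked for illustration, and the timeout and polling values are arbitrary assumptions. The COM pieces it relies on (the "InternetExplorer.Application" object with its Visible, Navigate, Busy and ReadyState members) belong to IE's automation interface.

```python
import time

READYSTATE_COMPLETE = 4  # value IE reports once a page has finished loading


def wait_until_loaded(ie, timeout=30.0, poll=0.1):
    """Block until the browser object says the page is fully loaded."""
    deadline = time.monotonic() + timeout
    while ie.Busy or ie.ReadyState != READYSTATE_COMPLETE:
        if time.monotonic() > deadline:
            raise TimeoutError("page did not finish loading in time")
        time.sleep(poll)


def open_page(url, visible=False):
    """Start IE through COM, navigate to url and return the browser object."""
    # Imported here so the module still loads on machines without pywin32.
    import win32com.client

    ie = win32com.client.Dispatch("InternetExplorer.Application")
    ie.Visible = visible  # keep the window hidden unless you are debugging
    ie.Navigate(url)
    wait_until_loaded(ie)
    return ie
```

After open_page returns, ie.Document is the DOM of the loaded page, so filling a form field is a matter of finding the element and setting its value, then waiting again after each click.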

Python is fun and easy for scripting. PyIE is a small component I’ve written that simply loads up an invisible IE page (or a visible one if you want to debug stuff), and just does whatever you tell it to do. It handles lots of annoying issues like ‘click here, now wait for the stuff to come up’. Most components I tried to use required you to manually time out the requests and handle things yourself, which is tiresome.

PyIE solves almost all of the tedious work of timing, error-handling and stuff like that.

PyIE provides a single method that safely fetches data or fills forms with your data, and is thus called safely.
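The general idea behind a method like this can be sketched as "keep trying the action until it stops failing or you run out of patience". The snippet below is only an illustration of that retry pattern under my own assumptions: retry_until_ok, its parameters and its defaults are hypothetical and do not show the real signature of PyIE's method.

```python
import time


def retry_until_ok(action, timeout=10.0, interval=0.2):
    """Call action() until it returns without raising, or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            return action()
        except Exception:
            # The element may simply not exist yet; give the page more time.
            if time.monotonic() >= deadline:
                raise  # out of time: surface the last error to the caller
            time.sleep(interval)


# Hypothetical usage with a COM document object called `doc`:
#   balance = retry_until_ok(lambda: doc.getElementById("QTable2").innerText)
```

Wrapping each fetch or form fill in a callable like this keeps the waiting logic in one place instead of scattering sleeps through the script.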

Download: PyIE.py
Examples: send_sms.py (send an SMS in IL-Orange), stat_counter.py (fetch info from your account at statcounter).

The two examples are most likely outdated (that’s the bad thing about screen-scraping), but use them to learn how to use PyIE.

Did I mention Python is beautiful?

3 thoughts on “PyIE: Easy Screen-Scraping with Python and IE”

  1. You might also be interested in PAMIE — google python pamie to find it. It is a complete library for manipulating IE.
