Data Scraping From Website: An introduction to data scraping with Scraperwiki

Last week I spent a day playing with the screen scraping website Scraperwiki with a class of MA Online Journalism students and a local blogger or two, led by Scraperwiki’s own Anna Powell-Smith. I thought I'd take the opportunity to explain what screen scraping is, in journalistic terms, through the functionality of Scraperwiki.

Scraperwiki is pretty good.

Why screen scraping is useful for journalists

Screen scraping can cover a range of things, but for journalists it initially boils down to three:

    Getting information from somewhere
    Storing it somewhere you can get to it later
    In a form that makes it easy (or easier) to analyse and interrogate
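
For the curious, here is roughly what those three steps can look like in a short Python script of the kind you might write yourself (Scraperwiki scrapers could be written in Python, among other languages). Treat it as a sketch under stated assumptions, not a description of how Scraperwiki works under the hood: the "page" is a made-up HTML snippet rather than a real website, the ward names and figures are invented, and the table and column names are hypothetical. It sticks to Python's standard library, using html.parser to get the information out and an SQLite database (via sqlite3) to store it in a form you can query.

```python
import sqlite3
from html.parser import HTMLParser

# A made-up snippet standing in for a page you might fetch with
# urllib.request.urlopen(url).read().decode("utf-8").
# Ward names and figures are invented for illustration.
SAMPLE_PAGE = """
<table>
  <tr><th>Ward</th><th>Crimes</th></tr>
  <tr><td>Ward A</td><td>120</td></tr>
  <tr><td>Ward B</td><td>95</td></tr>
  <tr><td>Ward C</td><td>143</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collects the text of each <td>/<th> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Ignore the whitespace between tags; keep only text inside cells
        if self._in_cell:
            self._row[-1] += data.strip()


def scrape(html):
    """Step 1: get the information out of the page."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows  # drop the header row
    return [(ward, int(crimes)) for ward, crimes in body]


def store(rows, conn):
    """Steps 2 and 3: keep it somewhere, in a shape you can query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS crimes (ward TEXT, crime_count INTEGER)"
    )
    conn.executemany("INSERT INTO crimes VALUES (?, ?)", rows)
    conn.commit()


# ":memory:" is thrown away when the script ends; use a filename to keep it
conn = sqlite3.connect(":memory:")
store(scrape(SAMPLE_PAGE), conn)
top = conn.execute(
    "SELECT ward, crime_count FROM crimes ORDER BY crime_count DESC"
).fetchall()
print(top)
```

Once the data is in a database like this, "which ward had the most?" becomes a one-line query rather than a morning with a highlighter pen.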

So, for instance, you might use a screen scraper to gather information from a local police authority website, and store it in a lovely spreadsheet that you can then sort through, average, total up, filter and so on – when the alternative may have been to print off 80 PDFs and get out the highlighter pens, Post-Its and back-of-a-fag-packet calculations.
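
Here is a minimal sketch of the "lovely spreadsheet" end of that process, assuming a scraper has already produced rows of (area, month, incidents). The areas and figures are invented for illustration and the incidents.csv filename is hypothetical. Python's csv module writes a file that any spreadsheet program can open, and the same sorting, totalling, averaging and filtering can also be done in a few lines of code.

```python
import csv
import statistics

# Hypothetical rows, as a scraper might have pulled them from a police
# authority website: (area, month, incidents). The figures are invented.
rows = [
    ("Area A", "2010-05", 31),
    ("Area B", "2010-05", 12),
    ("Area C", "2010-05", 47),
    ("Area A", "2010-06", 28),
    ("Area B", "2010-06", 19),
    ("Area C", "2010-06", 40),
]

# Save as CSV so it opens in Excel, OpenOffice or Google Docs
with open("incidents.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["area", "month", "incidents"])
    writer.writerows(rows)

# ...or do the totalling, averaging, filtering and sorting in code
counts = [n for _, _, n in rows]
total = sum(counts)
average = statistics.mean(counts)
june_only = [r for r in rows if r[1] == "2010-06"]
busiest = sorted(rows, key=lambda r: r[2], reverse=True)[0]

print(total, average, busiest)
```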

But those are just the initial aspects of screen scraping. Screen scraping tools like Scraperwiki, or scripts you write yourself, offer further benefits that are worth outlining:

    Scheduling a scraper to run at regular intervals (Adrian Holovaty compares this to making regular virtual trips to the local police station)
    Re-formatting data to clarify it, filter it, or make it compatible with other sets of data (for example, converting lat-long coordinates to postcodes, or feet to metres)
    Visualising data (for example as a chart, or on a map)
    Combining data from more than one source (for example, scraping a list of company directors and comparing that against a list of donors)
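
A rough sketch of two of those extras, re-formatting and combining, follows. Every name and amount in it is invented, and the helper functions are hypothetical illustrations rather than anything Scraperwiki provides. The conversion uses the exact definition of a foot (0.3048 metres). The name matching is deliberately crude, so treat any match as a lead to check rather than proof that two records refer to the same person.

```python
# Hypothetical scraped data: every name and amount here is invented.
directors = ["Jane SMITH", "Ahmed Khan ", "Peter O'Brien"]
donors = [
    ("jane smith", 5000),
    ("Lucy Jones", 250),
    ("AHMED KHAN", 1200),
]

FEET_TO_METRES = 0.3048  # exact, by international definition


def feet_to_metres(feet):
    """Re-format a measurement to match datasets that use metres."""
    return round(feet * FEET_TO_METRES, 2)


def normalise(name):
    """Crude clean-up so 'Jane SMITH' and 'jane smith' compare equal."""
    return " ".join(name.split()).lower()


def match_directors_to_donors(directors, donors):
    """Combine two sources: which directors also appear as donors?"""
    director_keys = {normalise(d) for d in directors}
    return sorted(
        (name, amount)
        for name, amount in donors
        if normalise(name) in director_keys
    )


print(feet_to_metres(100))
print(match_directors_to_donors(directors, donors))
```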

If you can think of any more, let me know.

Source: http://onlinejournalismblog.com/2010/07/07/an-introduction-to-data-scraping-with-scraperwiki/

Monday, 27 May 2013