Wednesday, 10 April 2013

How I use Outwit Hub for the TILLIN One Name Study

Following my post on how I conduct my One Name Study, I received a tweet from @genejean (aka Pam Smith) asking how I use Outwit Hub, and after a couple more tweets this post was born. (Apologies for the poor-quality photos; I'm not an expert blogger yet!)

If I want to scrape some data, I open the page from within Outwit itself. I like to think of Outwit as a geeky web browser that lets you see what's behind a page as well as what you can normally see. For this example I'm going to search for all instances of WILLIAM TILLIN in the 1841 England census, to keep it small (just 3 records).



Normally, if I wanted to collect the information for each of these records, I would open each individual record, as below, and type the information from the screen into my database, or at best copy and paste it using something like a Firefox add-on.



But Outwit works slightly differently.

This is the page of data that I want to "scrape", and it is the page my scraper will visit to collect the data. I wrote the scraper myself, based on the HTML behind this page, but I won't go into detail about how here, as it would make this a really long post. I can explain it further in another post if that would help anybody.
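The scraper itself isn't shown here. Broadly, an Outwit scraper is a list of fields, each located by a piece of text that appears just before the value in the page's HTML and a piece that appears just after it. As a rough illustration of that idea only (not Outwit's own code), here is a minimal Python sketch; the field names and marker strings in `RULES_1841` are hypothetical and would have to be read off the real record page's HTML.

```python
import re

def extract_between(html, before, after):
    """Return the text between the first `before` marker and the next `after` marker, or None."""
    start = html.find(before)
    if start == -1:
        return None
    start += len(before)
    end = html.find(after, start)
    if end == -1:
        return None
    # Strip any leftover tags (e.g. <b>) and surrounding whitespace from the value
    return re.sub(r"<[^>]+>", "", html[start:end]).strip()

# Hypothetical rules: field name -> (marker before, marker after)
RULES_1841 = {
    "name": ("<th>Name:</th><td>", "</td>"),
    "age": ("<th>Age:</th><td>", "</td>"),
    "parish": ("<th>Parish:</th><td>", "</td>"),
}

def scrape_record(html, rules=RULES_1841):
    """Apply every rule to one record page and return a dict of field values."""
    return {field: extract_between(html, before, after)
            for field, (before, after) in rules.items()}
```

Fields whose markers aren't found come back as `None`, which makes gaps in a record easy to spot before anything goes into the database.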

Using the back button in the top left-hand corner (just like in other web browsers), I return to my original search results. Then I click the links button (circled in red below), which gives me a completely different view of the page.


Then I select the rows labelled View Record. These act as addresses for the scraper, so it knows where to go to find the data. Once it gets there, it applies the rules in the scraper to the data it finds and returns the results to the data catch area.
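Selecting the View Record rows amounts to collecting the record URLs from the results page. A minimal Python sketch of that step using the standard library's `html.parser`, assuming each record link is an `<a>` element whose visible text is exactly "View Record" (the real results page may mark them up differently); `view_record_links` is a hypothetical helper name.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class ViewRecordLinks(HTMLParser):
    """Collect the href of every <a> element whose visible text is 'View Record'."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if "".join(self._text).strip() == "View Record":
                # Turn relative links into full addresses for the scraper
                self.links.append(urljoin(self.base_url, self._href))
            self._href = None

def view_record_links(html, base_url):
    parser = ViewRecordLinks(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```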


So I highlight the rows and tell Outwit to explore them using my 1841 scraper (making sure I don't overload the site by exploring too many records too quickly), and this is the data it brings back in about 10-15 seconds.
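How the pacing is handled isn't spelled out here, but the principle of not hammering the site can be sketched as a loop that pauses between requests. In this hypothetical `explore` function, `fetch` and `scrape` are passed in (so it can be tried without touching a live site), and the 3-second default delay is an arbitrary assumption, not a setting taken from Outwit.

```python
import time
from typing import Callable, Iterable

def explore(urls: Iterable[str], fetch: Callable[[str], str],
            scrape: Callable[[str], dict], delay_seconds: float = 3.0) -> list:
    """Visit each URL in turn, scrape it, and pause between requests."""
    rows = []
    for i, url in enumerate(urls):
        if i > 0:
            # Wait before every request after the first, to keep the load on the site light
            time.sleep(delay_seconds)
        rows.append(scrape(fetch(url)))
    return rows
```

With only a handful of records the extra waiting is negligible, and it scales sensibly if a search returns hundreds of results.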

You can see the detailed information at the bottom, based on the 3 records from the original search. The data can then be exported as a CSV or Excel file and added to the database.
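One of the comments below asks about getting the exported data into a database. Here is one possible route, sketched in Python under the assumption that the database is SQLite (which database is actually used isn't specified). `rows_to_csv` writes scraped rows as CSV text, and `load_csv` loads that text into a table; the table name `census_1841` is hypothetical.

```python
import csv
import io
import sqlite3

def rows_to_csv(rows, fieldnames):
    """Render scraped rows (dicts) as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

def load_csv(csv_text, conn, table="census_1841"):
    """Create the table if needed (one column per CSV header) and insert every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames
    # Table and column names are interpolated, so they must come from a trusted source;
    # the values themselves go through ? placeholders.
    cols = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
    conn.executemany(
        f'INSERT INTO "{table}" ({cols}) VALUES ({placeholders})',
        ([row[c] for c in columns] for row in reader),
    )
    conn.commit()
```

Going through CSV rather than straight from the scraper mirrors the export step above, so the same file can also be opened in Excel for a quick check before loading.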

I hope this makes sense. I've found it exceptionally useful for census data as well as civil registration data; I would not have all the data I have if I'd had to collate the information in the traditional way.

I'd be happy to provide more examples or go into more detail, so please leave any questions in the comments below or ask me on Twitter (you can find me @Wibblingjo) or on my Google+ page, +Wibbling Jo Genealogy.



3 comments:

  1. This looks enormously useful. Thanks!

  2. Indeed, Outwit Hub is a very handy tool. My stumbling block is importing the data into my database. Can you share how you do this?

  3. I forgot to click "notify me" on my last post.
