Big Data: Demonstrating the Value of the UK Web Domain Dataset for Social Science Research

Project date: February 2012 - August 2013

The Oxford Internet Institute Government on the Web team is excited to announce a new big data project: Big Data: Demonstrating the Value of the UK Web Domain Dataset for Social Science Research.

The potential of web archives for link analysis research has been well documented, but it has yet to be realised and demonstrated in high-quality research. This project aims to increase the visibility, accessibility, and ease of use of the JISC UK Web Domain Dataset, a 30-terabyte web archive of the .uk country-code top-level domain (ccTLD) collected between 1996 and 2010. The project will extract link graphs from the data, assess the feasibility and impact of using the .uk ccTLD as a boundary for UK web presence, and conduct and disseminate high-quality examples of social science research using the collection. It will also trial tools and procedures to make the data easier to access, including tools for remote access. It will also assess whether it is feasible to develop code that imports link data from the collection into NodeXL or other network analysis packages, so that researchers can easily access, visualise, and analyse subsets of the corpus.
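The link-graph extraction and the use of the .uk ccTLD as a boundary could be sketched roughly as follows. This is a minimal, hypothetical illustration, not the project's actual tooling. It assumes that page-level links have already been extracted from the archive as (source URL, target URL) pairs; the dataset's real record format is not described here. The sketch collapses those pairs into weighted host-to-host edges and can optionally keep only edges where both hosts fall under .uk. It then writes a plain CSV edge list, a generic format that network packages such as NodeXL can usually import, though the exact import route is an assumption. The function names and the `source,target,weight` columns are our own, chosen for illustration.

```python
import csv
import io
from collections import Counter
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the lower-cased host name of a URL, or '' if it has none."""
    return (urlsplit(url).hostname or "").lower()


def in_uk_cctld(host: str) -> bool:
    """True if the host falls under the .uk country-code top-level domain."""
    return host == "uk" or host.endswith(".uk")


def host_link_graph(links, uk_only=True):
    """Collapse page-level (source, target) URL pairs into weighted host edges.

    Hypothetical helper. With uk_only=True, an edge is kept only when both
    hosts are inside .uk, i.e. the ccTLD acts as the boundary of the graph.
    """
    edges = Counter()
    for src, dst in links:
        s, d = host_of(src), host_of(dst)
        if not s or not d or s == d:
            continue  # skip relative/malformed URLs and links within one host
        if uk_only and not (in_uk_cctld(s) and in_uk_cctld(d)):
            continue  # drop edges that cross the .uk boundary
        edges[(s, d)] += 1  # weight = number of page-level links
    return edges


def write_edge_list(edges, fh):
    """Write edges as a plain CSV edge list (column names are assumptions)."""
    writer = csv.writer(fh)
    writer.writerow(["source", "target", "weight"])
    for (s, d), weight in sorted(edges.items()):
        writer.writerow([s, d, weight])


# Toy input (invented URLs, not drawn from the dataset).
links = [
    ("http://www.ox.ac.uk/a.html", "http://www.bl.uk/"),
    ("http://www.ox.ac.uk/b.html", "http://www.bl.uk/collections"),
    ("http://www.ox.ac.uk/b.html", "http://www.example.com/"),
    ("http://www.bl.uk/", "http://www.bl.uk/about"),
]
uk_edges = host_link_graph(links)
all_edges = host_link_graph(links, uk_only=False)
buf = io.StringIO()
write_edge_list(uk_edges, buf)
print(buf.getvalue())
```

In the toy input, the two Oxford-to-British Library page links merge into one edge of weight 2, and the link within a single host is discarded. The link to a .com host is kept only when `uk_only=False`. Comparing the two graphs gives a crude way to see how much link structure a strict .uk boundary leaves out.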

The transient nature of the Web means that new information constantly replaces older information, without any record of the previous states (or versions) of that information. As new information is added, existing information also disappears from the Web. This leaves a significant gap in our knowledge of the historical Web and, potentially, in social history and our understanding of change over time. The JISC UK Web Domain Dataset is maintained by the British Library, which is partnering with us on this project, and contains web pages within the .uk ccTLD from 1996 to 2010. We are excited to explore this dataset from a Big Data perspective and to enhance the collection to make future use easier.

Further details of this project are available on the Oxford Internet Institute's website.
