New Data Catalogue for Datastore

The eagle-eyed amongst you may have spotted a new menu item appearing in the Datastore navigation recently.  The new Data Catalogue page has been set up in response to user requests and contains a link to download the back-end metadata for all packages on the Datastore in CSV format.

The file is updated daily and should allow developers to navigate the contents of the site programmatically without having to write screen-scraping code.  It should also be of interest to anyone wanting a quick feel for the number (currently 444) and range of datasets on the site without having to browse multiple pages.
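
For developers wondering where to start, here is a minimal Python sketch of reading the catalogue once it has been downloaded. It assumes the CSV has a header row; the "Title" column name and the sample rows are made up for illustration, so check the header of the real file and adjust accordingly.

```python
import csv
import io


def load_catalogue(csv_text):
    """Parse catalogue CSV text into a list of dicts, one per package.

    Assumes the first row of the file is a header row.
    """
    return list(csv.DictReader(io.StringIO(csv_text)))


# Hypothetical sample standing in for the downloaded catalogue file.
# The column names here are assumptions, not the real catalogue schema.
SAMPLE = (
    "Title,Long Description\n"
    "Example Dataset A,Monthly figures\n"
    "Example Dataset B,Annual figures\n"
)

packages = load_catalogue(SAMPLE)
print(len(packages), "packages in the catalogue")
for package in packages:
    print(package["Title"])
```

With the real file you would read the text from disk (for example with `open(path, newline="", encoding="utf-8")`) rather than from a string; the encoding is also an assumption.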

Data catalogue screenshot

Because the datasets on the Datastore are so varied, there are a couple of issues to consider with the way the site, and therefore the catalogue, is structured. The initial design for the site assumed that each dataset would be represented by a single page (called a package) with a fixed number of links to the actual data files, e.g. one link to the data in CSV format, one link to the data in XML format, and so on.

However, it quickly became clear that for datasets that are updated monthly, like the GLA Claimant Count, creating a new package for each release makes the site difficult to manage and awkward for users to browse.  As a result there are instances in the catalogue where multiple links to the constituent datasets for a single package are embedded within the HTML content of the Long Description field.  We recognise this isn’t ideal and hope to be able to modify the site structure at some point in the future to accommodate multiple data files per package.
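
In the meantime, links embedded in a Long Description value can be pulled out with a few lines of code. The following is a rough Python sketch using only the standard library; it assumes the links are ordinary `<a href="...">` tags, and the sample HTML and example.org URLs are invented for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen in the HTML fed to it."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Return the href values found in an HTML fragment, in document order."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


# Hypothetical Long Description content, not taken from the real catalogue.
sample = (
    '<p>Monthly files: '
    '<a href="http://example.org/2010-01.csv">January</a>, '
    '<a href="http://example.org/2010-02.csv">February</a></p>'
)
links = extract_links(sample)
print(links)
```

Applied to each row of the catalogue, this gives a list of data file links per package, which goes some way towards working around the single-package limitation described above.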

Thanks go to a number of colleagues in the GLA’s Information Technology Unit for working on the scripting of the database export and upload to the Datastore web server.

We hope that the data catalogue is of use to people. Please let us know if you have any suggestions as to how we can improve it.


Gareth Baker
London Datastore Team