City DataStore: Data Sharing for Impact from London’s Data Ecosystem

London has taken another step towards a more frictionless data sharing and governance environment this week with the launch of the City DataStore. The simplest way to describe it is as “a private twin” to the London DataStore: an environment in which new forms of data from a wider variety of sources can be shared securely, in a way that protects privacy and allows for more rigorous analysis and greater value creation.

Secure Data Sharing in an Open World

Adding a private twin to the open data store is not a change in our philosophy on data. Where data cannot be made open for valid legal, ethical or commercial reasons, we should still aim to allow it to be shared. We will always be open in the sense that we promote open and common standards, open APIs and open learning exercises. For example, we have made the algorithmic code used in our recent London Office of Data Analytics (LODA) pilot available for further development by other parties, and we have blogged and communicated honestly about the cultural and technical improvements needed to support the application of data science in our city. We think this is critical to success in this new field.

In this wider context, we see the City DataStore as a highly practical tool, supporting London’s data ecosystem and public services, encouraging data-led creativity, and promoting wider economic competitiveness.

Time to get Technical

Development was carried out by staff at the big data specialists Mastodon C and delivered as part of their wider Witan platform. In line with our approach for some time now, the work was funded by the EU Sharing Cities Horizon 2020 programme (Sharing Cities) and contributes to work also funded by Innovate UK.

The City DataStore is based on an event-driven architecture with a central log of all transactions from the users and the system itself. This has a number of benefits:

• building up an understanding of how the City DataStore is being used, initially to inform the roll-out of new functions and, in the longer term, to drive wider adoption of machine learning in public services and in tackling urban challenges.

• allowing the creation of new data (through simple operations such as sub-setting, aggregating or linking), greatly extending the potential uses beyond access to the raw data.

• providing a robust infrastructure that supports version control, roll-back and the ability to recreate files that become corrupted.
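The post does not describe the City DataStore's internals, so what follows is only a minimal Python sketch of the general event-sourcing pattern these bullets describe. It is not Mastodon C's or Witan's implementation. Everything in it is a hypothetical assumption: the `Event` and `EventLog` classes, the `file_uploaded` and `dataset_derived` event kinds, and the in-memory list standing in for a durable log. The sketch shows how a single append-only log can support all three benefits. Each event records which actor caused it, which gives a usage history. A subset is stored as an operation over an existing dataset rather than as a copy. Any earlier state can be recreated by replaying the log up to a chosen point.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Event:
    """One immutable entry in the central log (hypothetical schema)."""
    seq: int                 # position in the log, assigned on append
    actor: str               # user or system component responsible
    kind: str                # "file_uploaded" or "dataset_derived" in this sketch
    payload: dict[str, Any]


@dataclass
class EventLog:
    """Append-only log; dataset state is never stored, only rebuilt from events."""
    events: list[Event] = field(default_factory=list)

    def append(self, actor: str, kind: str, **payload: Any) -> Event:
        event = Event(len(self.events), actor, kind, payload)
        self.events.append(event)
        return event

    def replay(self, upto: int | None = None) -> dict[str, list[dict]]:
        """Rebuild all datasets from events with seq <= upto (all events if None)."""
        datasets: dict[str, list[dict]] = {}
        for event in self.events:
            if upto is not None and event.seq > upto:
                break
            if event.kind == "file_uploaded":
                # Each upload records a complete new version of the named dataset.
                datasets[event.payload["name"]] = list(event.payload["rows"])
            elif event.kind == "dataset_derived":
                # A derived dataset stores the operation (here a simple filter),
                # not the result, so it can always be regenerated from its source.
                column, value = event.payload["where"]
                source = datasets[event.payload["source"]]
                datasets[event.payload["name"]] = [r for r in source if r.get(column) == value]
        return datasets


# Usage: two uploads of the same dataset, with a derived subset in between.
log = EventLog()
log.append("borough-a", "file_uploaded", name="incidents",
           rows=[{"borough": "Camden", "count": 3}, {"borough": "Hackney", "count": 5}])
log.append("analyst-1", "dataset_derived", name="camden_only",
           source="incidents", where=("borough", "Camden"))
log.append("borough-a", "file_uploaded", name="incidents",
           rows=[{"borough": "Camden", "count": 4}])

current = log.replay()              # latest state
before_update = log.replay(upto=1)  # roll back to just before the second upload
```

Because `camden_only` was derived before the second upload, replaying rebuilds it from the first version of `incidents`. A real system would need an explicit policy on whether derived data should be refreshed when its source changes.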

The City DataStore has been built with an open API from the start. Although it supports manual uploading of files through a user interface, it will also allow data to be shared directly between systems (for instance, to support regular processes such as monthly updates).

More features will arrive over the next 12 months, including the ability to recognise and verify data schemas. This will allow London to build up ‘small standards’ for sharing specific datasets (e.g. between the London Boroughs). We see this ‘use case by use case’ development of standards as a complement to more traditional, top-down and large-scale approaches to standards development. In this way we aim to achieve the interoperability of data that is so vital to our mission: creating datasets that, like the shared challenges faced by public services in London, do not stop at organisational boundaries.
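To illustrate what recognising and verifying a schema might involve, here is a minimal Python sketch that rests on stated assumptions. A ‘small standard’ is modelled as a tuple of column definitions. `verify` reports the ways a set of rows fails to conform, and `recognise` returns the registered standards that a file satisfies. The `Column` type, the `MONTHLY_INCIDENTS_V1` standard and both functions are hypothetical. The post does not specify the City DataStore's actual schema format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """One column in a hypothetical 'small standard'."""
    name: str
    dtype: type
    required: bool = True


# A hypothetical standard that a few boroughs might agree for one dataset.
MONTHLY_INCIDENTS_V1 = (
    Column("borough", str),
    Column("month", str),
    Column("count", int),
    Column("notes", str, required=False),
)


def verify(rows: list[dict], schema: tuple[Column, ...]) -> list[str]:
    """Return human-readable problems; an empty list means the rows conform."""
    problems: list[str] = []
    known = {c.name for c in schema}
    for i, row in enumerate(rows):
        # Columns not in the standard are reported rather than silently dropped.
        for extra in sorted(set(row) - known):
            problems.append(f"row {i}: unexpected column {extra!r}")
        for col in schema:
            if row.get(col.name) is None:
                if col.required:
                    problems.append(f"row {i}: missing required column {col.name!r}")
                continue
            value = row[col.name]
            # bool is a subclass of int in Python, so reject it explicitly for int columns.
            if not isinstance(value, col.dtype) or (col.dtype is int and isinstance(value, bool)):
                problems.append(
                    f"row {i}: {col.name!r} should be {col.dtype.__name__}, "
                    f"got {type(value).__name__}"
                )
    return problems


def recognise(rows: list[dict], registry: dict[str, tuple[Column, ...]]) -> list[str]:
    """Names of all registered standards that the rows satisfy without problems."""
    return [name for name, schema in registry.items() if not verify(rows, schema)]
```

Returning a list of problems, rather than raising on the first one, lets a publisher see every issue in a monthly file at once. That matters more when standards are agreed dataset by dataset than when they are imposed from the top down.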

Build it and they will come?

From day one, the City DataStore will not be lying idle. The initial range of activity looks like this:

• SafeStats – By bringing together and analysing data from a range of sources, including the emergency services, transport and policing, SafeStats provides the most comprehensive and current picture of crime and safety in London today. The City DataStore will give partners a user-friendly way to supply monthly updates that feed directly into the GLA analysis process through the API.

• LODA – As discussed in other London DataStore blog posts, establishing the London Office of Data Analytics is an opportunity to draw together projects, ideas, initiatives, expertise and resources from across the public sector in London to answer the most important questions our city faces. The City DataStore will provide practical support to LODA projects by enabling the secure sharing of files.

• Air Quality modelling – The GLA is collaborating with the Alan Turing Institute (ATI), the national institute for data science, to develop new machine learning algorithms and data science platforms to better understand and improve air quality in London. Processing and modelling resources are provided through the ATI; however, the City DataStore will be useful in giving researchers access to GLA data about London and in operationalising the models at the end of the project. In this way, we hope to create a new set of policies and interventions that are better targeted in time and to local areas.

Clearly, and to return to the spirit of openness and organising for impact, we want the City DataStore to be a London-wide resource. Now that it has been built, if you are interested in finding out more about the City DataStore or would like to discuss how it could be used for your project, please contact paul.hodgson@london.gov.uk. We will see what we can do to oblige. And if your project idea requires new forms of data (from data held by central government, to private sector data, to household IoT data), we are more than happy to start a discussion with other parties about how it should and could be shared.

The London DataStore will continue to be the place to search for open data about London. In parallel with the City DataStore, we will keep investing in both new open datasets and new features. The London DataStore will also play an important role in supporting LODA by sharing resources, including the pipeline of LODA projects, code and models, and information about a forthcoming Data Academy aimed at equipping public servants with the skills to engage with the City DataStore and the new disciplines we want it to attract.

We hope you are as excited as we are by this development. As our engagement with London’s data talent strengthens, we prefer to see this not as a narrow technical development but as a serious addition to our efforts to make London’s data ecosystem low on friction and high on impact.