Tube Feed Update

1ST JULY 2010

Owing to overwhelming demand by apps that use the service, the London Underground feed has had to be temporarily suspended. We hope to restore the service as soon as possible but this may take some days. We will keep everyone informed of progress towards a resolution.

Thanks for the update. I can't work out whether this news means you'd consider the venture a success or a failure!

The popularity proves up-to-the-second data is something developers really appreciate, but maybe it is itself causing a lot more demand through the expectation that apps using it will themselves provide that "very latest" info. I'm sure most devs are responsible and cache & manually count down data for a while between calls, but I've not seen any guidelines requesting that this is done.
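As a rough illustration of the "cache and count down between calls" idea, here is a minimal Python sketch. The class name, the 30-second refresh interval and the train identifier are assumptions made for the example, not anything the feed specifies.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CountdownCache:
    """Caches 'seconds to station' values and counts them down locally."""

    def __init__(self, max_age: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_age = max_age          # assumed refresh interval, in seconds
        self.clock = clock              # injectable so the cache can be tested
        self._entries: Dict[str, Tuple[float, float]] = {}

    def store(self, train_id: str, seconds_to: float) -> None:
        """Record a value that has just been fetched from the feed."""
        self._entries[train_id] = (seconds_to, self.clock())

    def estimate(self, train_id: str) -> Optional[float]:
        """Return the counted-down value, or None if a fresh call is due."""
        entry = self._entries.get(train_id)
        if entry is None:
            return None
        seconds_to, fetched_at = entry
        elapsed = self.clock() - fetched_at
        if elapsed > self.max_age:
            return None                 # stale: time to call the feed again
        return max(0.0, seconds_to - elapsed)
```

An app would call `store` after each real request and `estimate` on every screen refresh, only going back to the feed when `estimate` returns None.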

I assume you are "beefing up" capacity, but am curious to know - has the problem been with number of individual calls or overall volume of data sent? Presumably the number of distinct users isn't huge so improving the usage patterns of the biggest 'callers' could possibly help.

Since the biggest hit would come from those running servers for apps who want the whole system picture several times a minute, maybe the pressure could be eased by having data for the whole network generated & published with a timestamp every 15 seconds (rather than each call for each line polling the database).

One feed could provide an entire picture broken down by line and lead car ID with destination & location text once and just "seconds to" for each station (rather than all that repeated location text info per station which you get if you poll an entire line, since we can easily ascertain whether there's a train in each station from the "seconds to" field).

A second feed could provide just "what's changed in the last 15 seconds" for the whole network with any updated LCIDs & locations, so as long as each call is returned successfully you hugely reduce the data being requested from those who constantly want updates.

It certainly looks like the "list by train LCID" approach would reduce the amount of data compared to the status of every station on each line. Serving it as JSON could be a good idea too - although maybe not everyone's favourite!

Finally, is there a place to post feedback or error reports? Some trains occasionally sail through stations unnoticed by the system (eg. a fair number of Edgware Road to Wimbledon services).

Hope to see it back soon - thanks again

 

Drew

Thu, 07/01/2010 13:41

Comment submitted by rew (not verified

I second the idea of having a call that returns the status of all the trains for feeds 4 and 5, just without the time to destinations.

I am trying to build a live tube map and therefore don't need the time to stations, just the state of all the trains in the network as I am plotting based upon TrackCode and Location.

And it would make sense to cache that data for 15-30 seconds in my opinion.

Fri, 07/02/2010 10:45

Comment submitted by Daniel Bartlett

I'll third Drew's request for the list by LCIDs.  I've held off from obtaining an up-to-date list of trains because of the effort of sampling the data regularly.  The count-downs per platform are extremely wasteful and not really the information we want.

 

 

Is there some reason you can only expose Train Lists, and not TrackerNet itself?

 

Fri, 07/02/2010 13:57

Comment submitted by ark (not verified

A lot of people *are* interested in time to station however, so perhaps splitting that feed by providing a less detailed variant is the way to go.
Anyway I look forward to the data coming back soon!

 

Fri, 07/02/2010 14:51

Comment submitted by ragomaskhalos (not verified

I'm really not surprised if it got overloaded. It seemed to be designed to handle personalised requests eg. from someone occasionally wanting info about a single station and needing all the textual descriptions each time.

App developers are bound to be the "biggest callers" and constantly need the whole network picture but without all the duplication, so how could we help here?

Firstly I presume SecondsTo info is summed from a lookup table of (average) time in seconds between each pair of trackcode positions. If that's the case then maybe we don't even need a SecondsTo returned for each & every station, every single time we ask.
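To make the presumed lookup-table approach concrete, here is a small Python sketch. The track codes and timings are invented for illustration; how TrackerNet actually derives SecondsTo is not confirmed anywhere in this thread.

```python
from typing import Dict, List, Tuple

# Hypothetical average running times, in seconds, between adjacent track codes.
SEGMENT_SECONDS: Dict[Tuple[str, str], int] = {
    ("TC001", "TC002"): 40,
    ("TC002", "TC003"): 55,
    ("TC003", "TC004"): 35,
}


def seconds_to(route: List[str], current: str, target: str) -> int:
    """Sum the segment times from `current` to `target` along `route`."""
    start = route.index(current)
    end = route.index(target)
    if end < start:
        raise ValueError("target lies behind the train on this route")
    return sum(
        SEGMENT_SECONDS[(route[i], route[i + 1])] for i in range(start, end)
    )
```

If the table really is static, a client holding a copy could compute these sums itself from nothing more than a train's current track code.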

That along with destination code (tied directly to destination text) and the location text (tied to trackcode) for each & every station was adding a mass of unnecessary returned data and all of that could also be read once and kept locally by developers.

 

By my reckoning, we really need:

1) A feed showing all station codes with their text names and track codes (which we call once)

2) A feed showing each track code with a text description for that location and a breakdown of destination codes possible from that point, with a list of times in seconds to each station (code) on that route (which, again, we call once). Incidentally I'm assuming we couldn't manage with *just* time between raw pairs of trackcodes since we'd have to manually map out routes to each destination, hence suggesting the feed should at least provide summed timings to each station per route (to destination code).

 

Then the regularly published (every 15 seconds?) pages we call once we have that initial data:

3) A feed of all active trains for each line with LCID#, set #, track code, destination code (which we use periodically - already much less data to consume)

and..

4) A timestamped update feed of all changes since the previous feed of this type (15 secs ago) - moved trains, new LCIDs, removed LCIDs with *just* the data that has changed (track code, destination code, set#) - this would cut down hugely on data transfer for those users requiring complete network info constantly. If the timestamp returned was not the one expected then calling the "full feed" would get things.. back on track (sorry!)
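A client for feeds 3 and 4 could look roughly like the Python sketch below. The field names (`timestamp`, `trains`, `changed`, `removed`) and the dictionary layout are assumptions for illustration only; no such feeds exist yet.

```python
from typing import Callable, Dict

Train = Dict[str, str]            # e.g. {"track": "TC001", "dest": "D1"}
FEED_INTERVAL = 15                # assumed seconds between published updates


class NetworkState:
    """Holds the whole-network picture and keeps it current from deltas."""

    def __init__(self, fetch_full: Callable[[], dict]) -> None:
        self.fetch_full = fetch_full      # callable returning the full feed (3)
        self.timestamp: int = 0
        self.trains: Dict[str, Train] = {}
        self.resync()

    def resync(self) -> None:
        """Replace local state with the full feed."""
        full = self.fetch_full()
        self.timestamp = full["timestamp"]
        self.trains = {lcid: dict(t) for lcid, t in full["trains"].items()}

    def apply_update(self, update: dict) -> bool:
        """Apply one delta (4). Returns False if a resync was needed instead."""
        if update["timestamp"] != self.timestamp + FEED_INTERVAL:
            self.resync()                 # an update was missed somewhere
            return False
        for lcid in update.get("removed", []):
            self.trains.pop(lcid, None)
        for lcid, fields in update.get("changed", {}).items():
            self.trains.setdefault(lcid, {}).update(fields)
        self.timestamp = update["timestamp"]
        return True
```

The server needs no per-user memory for this: every caller gets the same two documents, and the timestamp check alone tells a client when it has fallen out of step.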

Have I missed anything? The main purpose is really to find a way to reduce the repetition and per-line calls so this is just one paradigm suggestion.

Fri, 07/02/2010 18:51

Comment submitted by rew (not verified

I've not read anyone suggest this yet, but if I've missed it up there, my apologies.

As to "Time to Station", certainly trains are running more often than they are not, as stops in stations are brief, but one way to reduce the dataload would be to stop sending the value for the duration of station stops.  As there will be many trains stopping simultaneously this reduction in dataload could be noteworthy.  The client can then assume that if a value is not returned, it has not changed.  Once the train starts again, the value would be refreshed.

Also, certainly "Time to Station" would follow definite patterns and could be built into the client instead of forming part of the dataload.  It would mean less accurate reporting, but if the effect is a working Tube map against a non-working Tube map, I'd be inclined to accept it.  Also, how accurate is the reported Time to Station anyway?  If it's not, is it worth reporting beyond an "indicative" time, which is easy to build into the client?

Breaking it down to the essentials, really only "train ID" and location would be needed.  The remainder can be built in to the client.  Train ID could be a 2 byte integer and location the smallest useable geolocator.  Same principle as above but train location would not be uploaded during stops.
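For a sense of scale, a record of that kind can be sketched in Python with the standard `struct` module. Treating the location as a 2-byte index into a track-code table that the client already holds is an assumption made for this example.

```python
import struct
from typing import List, Tuple

# Network byte order: unsigned 16-bit train ID, unsigned 16-bit location index.
RECORD = struct.Struct("!HH")


def encode(positions: List[Tuple[int, int]]) -> bytes:
    """Pack (train_id, location_index) pairs into 4 bytes per train."""
    return b"".join(RECORD.pack(train_id, loc) for train_id, loc in positions)


def decode(payload: bytes) -> List[Tuple[int, int]]:
    """Unpack a payload produced by encode()."""
    return list(RECORD.iter_unpack(payload))
```

Each train then costs 4 bytes on the wire, against the much longer textual description currently repeated for every station.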

Fri, 07/02/2010 19:31

Comment submitted by ndrew (not verified

Sorry, one more suggestion.

Channel 4's 4 On Demand used to, or still does, use the user's broadband to transmit packets.  A requirement for use of the site could be that I agree to transmit to other users like in a bittorrent for as long as I'm on the site.  I would agree to that as long as it's a small amount of bandwidth.  But thousands of users could mean higher speeds.

Fri, 07/02/2010 19:34

Comment submitted by ndrew (not verified

Probably no need to panic too much regarding the massive number of hits requiring the service suspension. Stephen Fry tweeted about Matthew's excellent visualisation and considering that Stephen Fry has c1.6m followers, it was probably what caused the traffic spike.

Roy

Sat, 07/03/2010 12:13

Comment submitted by oy Richards (not verified

OK, there's a bit of confusion in some of these comments.  Let's try to work out what we want and keep a clear message.

 

Drew: the feed is based on the output of a TFL feed.  Using timestamps would mean london.gov.uk storing the data it last sent you (or every update), so it could determine which data to send.  This means extra work server-side, and removes the simplicity (which is the power) of a RESTful API.  It would only be useful if there is a significant trade-off between bandwidth and server load.

Andrew: hiding the stationary trains would be a very small saving for a significant increase in complexity. For instance, how do you tell whether a unit has been taken out of service, or is simply at a platform?  Far more significant savings (most of the bandwidth and probably half the server load) would be provided by Drew's suggestions 1, 2 and 3.

Incidentally, the 4od and iPlayer apps used Kontiki for peer-to-peer sharing.  This worked because the data is large, static and consumed by a large number of users who are happy to install a client.  The APIs here are not.

Roy Richards: Matthew's server can cope with the load (you'll notice it's on :81, because he moved it when it started becoming popular).  He downloads the TFL summary data once a minute (despite the feed updating every 2 seconds), and fills in the gaps using dead-reckoning.  He is not likely to be causing a significant load on the servers.

 

I still agree the best solution would be to send the raw data per train, and document recommended refresh intervals and information on the track codes.  However, I fear the "obvious" solution will be to throw caches and proxies at it.  Changing the data involves more people than adding servers, and it's great that we have any data at all.

 

Sat, 07/03/2010 18:04

Comment submitted by ark (not verified

You virtually can't scale this kind of frequently updated data if each of the endpoints has thousands of subscribers. Twitter tries and keeps failing.

However, there is an easy solution: pushing your content to these subscribers, rather than having them poll you.

A good technology for this: PubSubHubbub.

Let's do this:

- each train is an RSS/Atom feed. New events like "train 1234 got in Paddington Station" are the feed entries

- each station is an RSS/Atom feed as well. The entries are the same.

Now, whenever a train gets into a station, you ping a PubSubHubbub hub. This hub will then fat ping all the subscribers with the information. I'm not so familiar with the tube's traffic, but this shouldn't be more than a few pings per sec... which is very very little!
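On the publisher side, the ping is a small form-encoded POST to the hub carrying `hub.mode=publish` and the URL of the feed that changed, as described in the PubSubHubbub 0.3 spec. The Python sketch below only builds the request; both URLs are hypothetical placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request

HUB_URL = "https://hub.example.org/"                        # hypothetical hub
TOPIC_URL = "https://feeds.example.org/trains/1234.atom"    # hypothetical feed


def build_publish_ping(hub_url: str, topic_url: str) -> Request:
    """Build the 'this feed has new content' notification for a hub."""
    body = urlencode({"hub.mode": "publish", "hub.url": topic_url}).encode("ascii")
    return Request(
        hub_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it. The hub then fetches the feed once and fans the new entries out to subscribers, so the publisher's cost per event stays constant however many apps are listening.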

Superfeedr (full disclosure: my company) already pushes more than 20M Atom entries per day for the hubs we host. We could certainly add the ones for the London trains!

Sat, 07/03/2010 20:41

Comment submitted by Julien

That's a good summary (Mark? I'm showing as "rew" so I assume "ark" is "Mark"!) - and I agree it's great that we have any data at all.

Important to note that while Matthew is being careful with his volume of calls, others may not be and there was nothing to advise them of a good calling policy.

For clarification - I wasn't suggesting the service should remember calls from specific users - quite the contrary since I don't think there should be a need for any server side processing per user call. As far as I can see the pages don't even need to be dynamic, just published frequently - say every 15 seconds. That should surely prevent any burden as it grows in popularity.
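The "published, not dynamic" idea can be sketched in a few lines of Python. The class name, the JSON encoding and the `build` callable are assumptions for illustration; the point is only that the expensive work happens once per interval, however many callers arrive.

```python
import json
import time
from typing import Callable, Optional


class SnapshotPublisher:
    """Rebuilds the network snapshot at most once per interval."""

    def __init__(self, build: Callable[[], dict], interval: float = 15.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.build = build              # stands in for the one database query
        self.interval = interval
        self.clock = clock
        self._body = b""
        self._built_at: Optional[float] = None

    def get(self) -> bytes:
        """Return the current snapshot, rebuilding only when it has expired."""
        now = self.clock()
        if self._built_at is None or now - self._built_at >= self.interval:
            self._body = json.dumps(self.build(), sort_keys=True).encode("utf-8")
            self._built_at = now
        return self._body
```

In practice the same effect could come from a scheduled job writing a static file, which any ordinary web server or cache can then hand out.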

Those who want "all the data all the time" don't need per station or per line data - just one URL for "full network data" and another for rolling changes every 15 seconds. This was where timestamp came in - after getting the initial full picture you could just retrieve the much smaller "what's changed" updates and know if you've missed one, just by the timestamp - if so, then get the full data again.

I hope that's clearer.

Sat, 07/03/2010 20:51

Comment submitted by rew (not verified

Julien: PubSubHubbub is a great model for this kind of data, but I don't think it helps with performance.  Trains are moving all the time, so each train you're following would ping every 10-15 seconds, and you'd still have a large pull request to get the data.  A request for every train on the network would mean sending almost as much data to the server as you get back.  Or is there some way to send the data with the pings?

Drew: Ahh sorry, I completely misunderstood the timestamp as something sent back to the server.  I guess an incrementing sequence number would also do the job?

 

Mark

Sun, 07/04/2010 13:20

Comment submitted by ark (not verified

Keep It Simple:

1. Cached data for low-usage sites such as a single board at a single station.

2. One-time state download plus incremental updates for high-usage sites.

The Trackernet UI runs on numerous machines within TfL's network, and I don't believe it needs anything more complicated than a relatively simple algorithm.

Mon, 07/05/2010 09:46

Comment submitted by Peter Hicks

Mark, the beauty of PubSubHubbub is that it does fat-pings between the hub and subscribers.

For each ping you send, you would get at most one pull (from the hub, which shouldn't trust the subscriber; and it is "at most" because if nobody has subscribed to the feed you ping for, the hub should ignore the ping).

Then, none of the subscribers should poll you :)

You're right, it will not technically increase the throughput, but it will drastically decrease the amount of requested content!

Please get in touch!

 

 

 

Mon, 07/05/2010 14:42

Comment submitted by Julien

Julien: then we'll need some people to commit to providing hubs, and get that coordinated with data.london.gov.uk. I don't think you can rely on users requesting only a few trains each - that's what TfL have found. All the excited programmers want to get back data on every train, and they're all moving at the same time.

(I believe the pshb pulls are actually individual requests for each feed, only in one HTTP session).

Cheers,
Mark

Mon, 07/05/2010 19:54

Comment submitted by ark (not verified

Not surprised. This is what I expected from TFL, just like the endless signal failures and chaotic underground service.

But the question is: where is the £9.2bn spent? There are countries with budgets that size.

Shame on you.

 

Mon, 07/05/2010 21:09

Comment submitted by nonymous (not verified

If you had put your name to it, that could have been a bold statement, 'nonymous'!

I'd suggest the realtime data provision is a nicely transparent move and demonstrated to me very clearly how much of the service was running well at any time. There are so many ways it can help to counter the "chaos" which, after all, has a lot to do with sheer numbers of people using the trains.

Yes, a solution for all the trains all the time is what we're after. I'll leave Mark & Julien to debate fat pings & the like, but it would seem nice simple published pages of all data as it changes would do the job. Mark, a sequence number sounds like a good idea too!

Any news on when access might return?

Tue, 07/06/2010 13:51

Comment submitted by Drew (not verified

An update would be nice.

Wed, 07/07/2010 09:31

Comment submitted by Anonymous (not verified

What's happening TFL?  The service has been down for a week now - could we get an update?

 

Thanks

 

mz

Thu, 07/08/2010 10:37

Comment submitted by ax Zorin (not verified

Over the two week mark of the service being down, any updates at all?

Wed, 07/14/2010 13:28

Comment submitted by Daniel Bartlett

Hi - it seems these comments are being moderated so perhaps somebody reading this could comment on the status of this API.  We're only asking because we like it so much......

Tue, 07/20/2010 13:33

Comment submitted by Anonymous (not verified

Any news? I really love the live map!

Sun, 08/08/2010 09:52

Comment submitted by nonymous (not verified

Update please?

Mon, 08/23/2010 14:58

Comment submitted by ax Zorin (not verified

Is this service ever going to be turned on again?

I can understand turning off the service for whatever reasons you have - but not updating us on what is happening, that is not acceptable.

Wed, 08/25/2010 09:52

Comment submitted by User (not verified

Lending my voice for better communication around what is being planned for this beta API. 

There is significant work involved in implementing the API and TFL displays no respect for the collaborative nature of achieving better services for all by communicating nothing on its future. 

TFL communication is seriously overdue on the API's future availability, what changes we should expect, and the roadmap to go from beta to full production.

Give me an appropriate level of information,

Ben Stewart

Caution Your Blast Ltd.

 

Thu, 08/26/2010 09:53

Comment submitted by Ben Stewart

I'm keen to hear what is happening with the live feed. Also, can you change this display so that it shows the most recent posting first?  We are watching this with interest from Australia.

Wed, 09/01/2010 03:27

Comment submitted by Anonymous

Why are you being so rude and ignoring these requests for more information on what's going on?  Any news is better than no news.  It really is very poor, rude and childish to behave like this.  The comments are (occasionally) moderated so somebody must be reading this.

Fri, 09/03/2010 23:12

Comment submitted by nonymous (not verified

Err, good job deleting the valid comments along with the spam there.  An update, any update, would be greatly appreciated.

Mon, 09/06/2010 18:14

Comment submitted by ark (not verified

So in the middle of a tube strike this would be interesting.  You could see at a glance how effective or otherwise the strike was, and the walk-or-ride decision would be easy.  It would make a great TV backdrop.  Or would it just be supplying "the enemy" with data?

Mon, 09/06/2010 20:43

Comment submitted by nonymous (not verified

Finally - some news!!!! http://data.london.gov.uk/blog/update-trackernet-feed

 

Tue, 09/07/2010 21:50

Comment submitted by User (not verified

Mark - apologies for any accidental deleting of comments. We seem to be getting a lot of spam lately and it can be quite hard to see the wood for the trees.  Hopefully people have seen the blog with the latest update on Trackernet.

Wed, 09/08/2010 09:08

Comment submitted by areth Bake

Is the API available again?

Sun, 10/31/2010 02:32

Comment submitted by Mamun Ahmed

Is the API available again?

Thanks

Mamun Ahmed

Sun, 10/31/2010 02:34

Comment submitted by Mamun Ahmed

We realise that our last deadline has passed, but we will be in a position to give you dates for Trackernet's return fairly soon. The main concern for TfL was making sure that the solution they have is robust, so that when they open it up again it does not fall over like it did last time. So bear with us; it will be back soon (and we might have other good news to announce as well). If you want to keep up, you might think of following @londondatastore. In case we are not checking these comments often enough, you can remind us through Twitter, which is constantly checked.

Thu, 11/04/2010 16:02

Comment submitted by Emer Coleman
