Tube Feed Update
1ST JULY 2010
Owing to overwhelming demand by apps that use the service, the London Underground feed has had to be temporarily suspended. We hope to restore the service as soon as possible but this may take some days. We will keep everyone informed of progress towards a resolution.
Thanks for the update. I can't work out whether you'd consider this news a sign of the venture's success or its failure!
The popularity proves that up-to-the-second data is something developers really appreciate, but that may itself be causing much more demand, through the expectation that apps using it will provide the "very latest" info too. I'm sure most devs are responsible and cache and manually count down data for a while between calls, but I've not seen any guidelines requesting that this be done.
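As a rough illustration of that caching etiquette, here is a minimal Python sketch of a client that fetches predictions at most once per refresh interval and counts the "seconds to" values down locally in between. The `CountdownCache` class, the `fetch` callable, the 30-second interval and the station codes are assumptions for illustration, not part of the feed.

```python
import time
from typing import Callable, Dict, Optional


class CountdownCache:
    """Fetch predictions at most once per refresh interval and count down locally."""

    def __init__(self, fetch: Callable[[], Dict[str, int]],
                 refresh_interval: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch                  # hypothetical: returns {station_code: seconds_to}
        self._refresh_interval = refresh_interval
        self._clock = clock
        self._data: Dict[str, int] = {}
        self._fetched_at: Optional[float] = None

    def predictions(self) -> Dict[str, int]:
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._refresh_interval:
            self._data = self._fetch()       # the only call that hits the feed
            self._fetched_at = now
        elapsed = int(now - self._fetched_at)
        # Between fetches, subtract elapsed time instead of asking the server again.
        return {code: max(0, secs - elapsed) for code, secs in self._data.items()}
```

Injecting `clock` keeps the sketch testable; a real app would simply use the default.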
I assume you are "beefing up" capacity, but I'm curious to know: has the problem been the number of individual calls or the overall volume of data sent? Presumably the number of distinct users isn't huge, so improving the usage patterns of the biggest 'callers' could help.
Since the biggest hit would come from those running servers for apps that want the whole-system picture several times a minute, maybe the pressure could be eased by generating data for the whole network and publishing it with a timestamp every 15 seconds (rather than having each call for each line poll the database).
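A minimal sketch of the publish-once idea, assuming a hypothetical `query_network()` that reads the whole network from the database. This variant rebuilds the timestamped document lazily, at most once every 15 seconds, rather than on a fixed timer; either way, every request in between receives the same pre-serialised bytes and the database is not touched.

```python
import json
import time
from typing import Callable, Dict, List


class SnapshotPublisher:
    """Serve one network-wide JSON document, rebuilt at most once per interval."""

    def __init__(self, query_network: Callable[[], Dict[str, List[dict]]],
                 interval: float = 15.0, clock: Callable[[], float] = time.time):
        self._query = query_network      # hypothetical: returns {line_code: [train, ...]}
        self._interval = interval
        self._clock = clock
        self._built_at = float("-inf")   # forces a build on the first request
        self._body = b""

    def current(self) -> bytes:
        now = self._clock()
        if now - self._built_at >= self._interval:
            doc = {"timestamp": int(now), "lines": self._query()}
            self._body = json.dumps(doc, sort_keys=True).encode("utf-8")
            self._built_at = now
        return self._body                # identical bytes for every caller in the window
```

Writing the same document to a static file from a scheduled job would achieve the same effect without any per-request code at all.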
One feed could provide the entire picture broken down by line and lead car ID (LCID), giving the destination and location text once per train and then just the "seconds to" value for each station. That avoids all the location text repeated per station that you get when you poll an entire line, and we can easily tell whether there's a train in each station from the "seconds to" field.
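To make the proposed shape concrete, this sketch regroups per-station rows (as returned when polling a whole line) into one entry per LCID, so destination and location text appear once per train. The row field names and the `group_by_train` function are illustrative assumptions, not the feed's real schema.

```python
from typing import Dict, Iterable


def group_by_train(line: str, rows: Iterable[dict]) -> dict:
    """Collapse per-station prediction rows into one entry per lead car ID (LCID).

    Each row is assumed to look like
    {"station": ..., "lcid": ..., "destination": ..., "location": ..., "seconds_to": ...}.
    """
    trains: Dict[str, dict] = {}
    for row in rows:
        train = trains.setdefault(row["lcid"], {
            "destination": row["destination"],   # stored once per train
            "location": row["location"],         # stored once per train
            "seconds_to": {},
        })
        train["seconds_to"][row["station"]] = row["seconds_to"]
    return {"line": line, "trains": trains}
```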
A second feed could provide just "what's changed in the last 15 seconds" for the whole network, with any updated LCIDs and locations. As long as each call returns successfully, this hugely reduces the data requested by those who constantly want updates.
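Server-side, a "what's changed" feed amounts to a diff between consecutive snapshots keyed by LCID. A minimal sketch, assuming each snapshot is a plain mapping from LCID to that train's fields (the `diff_snapshots` name and output keys are hypothetical):

```python
from typing import Dict

Snapshot = Dict[str, dict]   # {lcid: {"location": ..., "destination": ...}}


def diff_snapshots(prev: Snapshot, curr: Snapshot) -> dict:
    """Return only what changed between two network snapshots keyed by LCID."""
    changed = {lcid: train for lcid, train in curr.items()
               if lcid in prev and prev[lcid] != train}       # moved or re-routed
    added = {lcid: train for lcid, train in curr.items()
             if lcid not in prev}                             # trains that appeared
    removed = sorted(lcid for lcid in prev if lcid not in curr)  # trains that vanished
    return {"changed": changed, "added": added, "removed": removed}
```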
It certainly looks like the "list by train LCID" approach would reduce the amount of data compared with reporting the status of every station on each line. Serving it as JSON could be a good idea too - although maybe not everyone's favourite!
Finally, is there a place to post feedback or error reports? Some trains occasionally sail through stations unnoticed by the system (e.g. a fair number of Edgware Road to Wimbledon services).
Hope to see it back soon - thanks again
Drew
Thu, 07/01/2010 12:41
Comment submitted by Drew (not verified)
I second the idea of having a call that returns the status of all the trains for feeds 4 and 5, just without the times to destinations.
I am trying to build a live tube map and therefore don't need the time to stations, just the state of all the trains in the network as I am plotting based upon TrackCode and Location.
And it would make sense to cache that data for 15-30 seconds in my opinion.
Fri, 07/02/2010 09:45
Comment submitted by Daniel Bartlett
I'll third Drew's request for the list by LCIDs. I've held off from obtaining an up-to-date list of trains because of the effort of sampling the data regularly. The countdowns per platform are extremely wasteful and not really the information we want.
Is there some reason you can only expose Train Lists, and not TrackerNet itself?
Fri, 07/02/2010 12:57
Comment submitted by Mark (not verified)
A lot of people *are* interested in time to station however, so perhaps splitting that feed by providing a less detailed variant is the way to go.
Anyway, I look forward to the data coming back soon!
Fri, 07/02/2010 13:51
Comment submitted by ragomaskhalos (not verified)
I'm really not surprised it got overloaded. It seemed to be designed to handle personalised requests, e.g. from someone occasionally wanting info about a single station and needing all the textual descriptions each time.
App developers are bound to be the "biggest callers" and constantly need the whole network picture but without all the duplication, so how could we help here?
Firstly, I presume the SecondsTo info is summed from a lookup table of (average) times in seconds between each pair of track code positions. If that's the case, then maybe we don't even need a SecondsTo value returned for each and every station, every single time we ask.
That, along with the destination code (tied directly to the destination text) and the location text (tied to the track code) for each and every station, was adding a mass of unnecessary returned data, and all of it could be read once and kept locally by developers.
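If SecondsTo really is summed from a segment lookup table, as presumed above, a client holding that table could compute the values itself. A sketch under that assumption, with a hypothetical route (track codes in running order towards one destination), per-segment average times, and a map from platform track codes to station codes; none of these structures is published by the current feed.

```python
from typing import Dict, List, Tuple


def seconds_to_stations(route: List[str],
                        segment_seconds: Dict[Tuple[str, str], int],
                        station_at: Dict[str, str],
                        current: str) -> Dict[str, int]:
    """Sum average segment times from the train's current track code to each station ahead."""
    start = route.index(current)
    total = 0
    result: Dict[str, int] = {}
    # Walk consecutive (here, next) pairs from the current position to the end of the route.
    for here, nxt in zip(route[start:], route[start + 1:]):
        total += segment_seconds[(here, nxt)]
        if nxt in station_at:            # this track code is a platform
            result[station_at[nxt]] = total
    return result
```

For a route `T1 -> T2 -> T3 -> T4` with segment times 30, 45 and 60 seconds and platforms at `T2` and `T4`, a train at `T1` would be 30 seconds from the first station and 135 from the second.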
By my reckoning, we really need:
1) A feed showing all station codes with their text names and track codes (which we call once)
2) A feed showing each track code with a text description for that location and a breakdown of the destination codes possible from that point, with a list of times in seconds to each station (code) on that route (which, again, we call once). Incidentally, I'm assuming we couldn't manage with *just* the times between raw pairs of track codes, since we'd have to map out the routes to each destination manually; hence my suggestion that the feed should at least provide summed timings to each station per route (i.e. per destination code).
Then the regularly published (every 15 seconds?) pages we call once we have that initial data:
3) A feed of all active trains for each line with LCID#, set#, track code and destination code (which we use periodically - already much less data to consume)
and..
4) A timestamped update feed of all changes since the previous feed of this type (15 secs ago) - moved trains, new LCIDs and removed LCIDs, with *just* the data that has changed (track code, destination code, set#). This would cut down hugely on data transfer for users who constantly require complete network info. If the timestamp returned was not the one expected, calling the "full feed" would get things... back on track (sorry!)
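A sketch of how a client might consume feeds 3 and 4 together: take the full feed once, then apply each delta only if its timestamp is the one expected (previous timestamp plus 15 seconds), otherwise fall back to the full feed. Both feeds, their field names (`timestamp`, `trains`, `changed`, `removed`) and the `NetworkState` class are hypothetical, since the API offers nothing like them today.

```python
from typing import Callable, Dict, Optional


class NetworkState:
    """Client-side mirror of every active train, built from a full feed plus deltas.

    fetch_full()  -> {"timestamp": int, "trains": {lcid: {field: value}}}
    fetch_delta() -> {"timestamp": int, "changed": {lcid: {field: value}}, "removed": [lcid]}
    """

    def __init__(self, fetch_full: Callable[[], dict],
                 fetch_delta: Callable[[], dict], interval: int = 15):
        self._fetch_full = fetch_full
        self._fetch_delta = fetch_delta
        self._interval = interval            # assumed publishing interval in seconds
        self.timestamp: Optional[int] = None
        self.trains: Dict[str, dict] = {}

    def _resync(self) -> None:
        full = self._fetch_full()
        self.timestamp = full["timestamp"]
        # Copy each record so later field-level merges never alter the fetched data.
        self.trains = {lcid: dict(fields) for lcid, fields in full["trains"].items()}

    def update(self) -> None:
        if self.timestamp is None:
            self._resync()
            return
        delta = self._fetch_delta()
        if delta["timestamp"] == self.timestamp:
            return                           # nothing new published yet
        if delta["timestamp"] != self.timestamp + self._interval:
            self._resync()                   # missed a delta: start again from the full feed
            return
        for lcid in delta["removed"]:
            self.trains.pop(lcid, None)
        for lcid, fields in delta["changed"].items():
            # Deltas carry only changed fields; new LCIDs start from an empty record.
            self.trains.setdefault(lcid, {}).update(fields)
        self.timestamp = delta["timestamp"]
```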
Have I missed anything? The main purpose is really to find a way to reduce the repetition and the per-line calls, so this is just one suggested approach.
Fri, 07/02/2010 17:51
Comment submitted by Drew (not verified)
I've not read anyone suggest this yet, but if I've missed it up there, my apologies.
As to "Time to Station": trains are certainly moving more often than not, as station stops are brief, but one way to reduce the data load would be to stop sending the value for the duration of station stops. As many trains will be stopped at any one time, the reduction could be noteworthy. The client can then assume that if a value is not returned, it has not changed. Once the train moves off again, the value would be refreshed.
Also, "Time to Station" surely follows definite patterns and could be built into the client instead of forming part of the data load. It would mean less accurate reporting, but if the choice is between a working Tube map and a non-working one, I'd be inclined to accept it. Besides, how accurate is the reported Time to Station anyway? If it isn't very accurate, is it worth reporting beyond an "indicative" time, which would be easy to build into the client?
Breaking it down to the essentials, really only "train ID" and location would be needed; the remainder can be built into the client. Train ID could be a 2-byte integer and location the smallest usable geolocator. The same principle as above applies: train location would not be sent during stops.
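To put a number on "just train ID and location": if location were reduced to a 2-byte index into a track-code table that clients download once (an assumption; the feed has no such index), each train fits in four bytes. A sketch using Python's `struct` module, with hypothetical function names:

```python
import struct
from typing import List, Tuple

# Big-endian: train ID (uint16) + location index (uint16) = 4 bytes per train.
RECORD = struct.Struct(">HH")


def pack_positions(positions: List[Tuple[int, int]]) -> bytes:
    """Encode (train_id, location_index) pairs; both must be in 0..65535."""
    return b"".join(RECORD.pack(train_id, loc) for train_id, loc in positions)


def unpack_positions(data: bytes) -> List[Tuple[int, int]]:
    """Decode the fixed-size records produced by pack_positions."""
    return [RECORD.unpack_from(data, offset) for offset in range(0, len(data), RECORD.size)]
```

A 2-byte ID caps the scheme at 65,536 distinct trains, which should be ample for one network.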
Fri, 07/02/2010 18:31
Comment submitted by Andrew (not verified)
Sorry, one more suggestion.
Channel 4's 4 On Demand used to (or still does) use viewers' broadband connections to transmit packets to other viewers. A requirement for using the site could be that I agree to relay data to other users, as with BitTorrent, for as long as I'm on the site. I would agree to that as long as it's a small amount of bandwidth, and thousands of users could mean higher speeds.
Fri, 07/02/2010 18:34
Comment submitted by Andrew (not verified)
Probably no need to panic too much about the massive number of hits that forced the service suspension. Stephen Fry tweeted about Matthew's excellent visualisation, and considering that Stephen Fry has c. 1.6m followers, that was probably what caused the traffic spike.
Roy
Sat, 07/03/2010 11:13
Comment submitted by Roy Richards (not verified)
OK, there's a bit of confusion in some of these comments. Let's try to work out what we want and keep a clear message.
Drew: the feed is based on the output of a TFL feed. Using timestamps would mean london.gov.uk storing the data it last sent you (or every update), so it could determine which data to send. This means extra work server-side, and removes the simplicity (which is the power) of a RESTful API. It would only be useful if there is a significant trade-off between bandwidth and server load.
Andrew: hiding the stationary trains would be a very small saving for a significant increase in complexity. For instance, how do you tell whether a unit has been taken out of service, or is simply at a platform? Far more significant savings (most of the bandwidth and probably half the server load) would be provided by Drew's suggestions 1, 2 and 3.
Incidentally, the 4oD and iPlayer apps used Kontiki for peer-to-peer sharing. This worked because the data is large, static and consumed by a large number of users who are happy to install a client. The data from these APIs is none of those things.
Roy Richards: Matthew's server can cope with the load (you'll notice it's on :81, because he moved it when it started becoming popular). He downloads the TFL summary data once a minute (despite the feed updating every 2 seconds), and fills in the gaps using dead-reckoning. He is not likely to be causing a significant load on the servers.
I still agree the best solution would be to send the raw data per train, and document recommended refresh intervals and information on the track codes. However, I fear the "obvious" solution will be to throw caches and proxies at it. Changing the data involves more people than adding servers, and it's great that we have any data at all.
Sat, 07/03/2010 17:04
Comment submitted by Mark (not verified)
It's virtually impossible to scale this kind of frequently updated data if each endpoint has thousands of subscribers. Twitter tries and keeps failing.
However, there is an easy solution: push your content to these subscribers, rather than having them poll you.
A good technology for this is PubSubHubbub.
Let's do this:
- Each train is an RSS/Atom feed. New events like "train 1234 arrived at Paddington station" are the feed entries.
- Each station is an RSS/Atom feed as well, with the same entries.
Now, whenever a train gets into a station, you ping a PubSubHubbub hub. The hub then fat-pings all the subscribers with the information. I'm not so familiar with the Tube's traffic, but this shouldn't be more than a few pings per second... which is very, very little!
Superfeedr (full disclosure: my company) already pushes more than 20M Atom entries per day for the hubs we host. We could certainly add the ones for the London trains!
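For reference, the publisher's side of that exchange in PubSubHubbub 0.3 is a small form-encoded POST (`hub.mode=publish` plus one or more `hub.url` values) telling the hub which topic feeds have new entries; the hub then fetches them and fat-pings subscribers. A sketch that builds, but does not send, such a ping with Python's standard library; the hub and feed URLs are placeholders and `build_publish_ping` is a hypothetical helper.

```python
from typing import List
from urllib.parse import urlencode
from urllib.request import Request


def build_publish_ping(hub_url: str, topic_urls: List[str]) -> Request:
    """Build a PubSubHubbub 0.3 'new content' notification for the given topic feeds."""
    # One hub.url pair per changed feed, e.g. the train's feed and the station's feed.
    body = urlencode([("hub.mode", "publish")] + [("hub.url", url) for url in topic_urls])
    return Request(hub_url, data=body.encode("ascii"), method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})
```

Sending it would be a matter of passing the request to `urllib.request.urlopen`.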
Sat, 07/03/2010 19:41
Comment submitted by Julien
That's a good summary (Mark? I'm showing as "rew" so I assume "ark" is "Mark"!) - and I agree it's great that we have any data at all.
It's important to note that while Matthew is being careful with his volume of calls, others may not be, and there was nothing advising them of a good calling policy.
For clarification - I wasn't suggesting the service should remember calls from specific users - quite the contrary since I don't think there should be a need for any server side processing per user call. As far as I can see the pages don't even need to be dynamic, just published frequently - say every 15 seconds. That should surely prevent any burden as it grows in popularity.
Those who want "all the data all the time" don't need per-station or per-line data, just one URL for "full network data" and another for rolling changes every 15 seconds. This is where the timestamp came in: after getting the initial full picture, you could retrieve just the much smaller "what's changed" updates and tell from the timestamp whether you've missed one; if so, you'd get the full data again.
I hope that's clearer.
Sat, 07/03/2010 19:51
Comment submitted by Drew (not verified)
Julien: PubSubHubbub is a great model for this kind of data, but I don't think it helps with performance. Trains are moving all the time, so each train you're following would ping every 10-15 seconds, and you'd still have a large pull request to get the data. A request for every train on the network would mean sending almost as much data to the server as you get back. Or is there some way to send the data with the pings?
Drew: Ahh sorry, I completely misunderstood the timestamp as something sent back to the server. I guess an incrementing sequence number would also do the job?
Mark
Sun, 07/04/2010 12:20
Comment submitted by Mark (not verified)
Keep It Simple:
1. Cached data for low-usage sites such as a single board at a single station.
2. One-time state download plus incremental updates for high-usage sites.
The Trackernet UI runs on numerous machines within TfL's network, and I don't believe it needs anything more complicated than a relatively simple algorithm.
Mon, 07/05/2010 08:46
Comment submitted by Peter Hicks
Mark, the beauty of PubSubHubbub is that it does fat pings between the hub and subscribers.
For each ping you send, you would get at most one pull (from the hub, which shouldn't trust the subscriber). It's "at most" because if nobody has subscribed to the feed you ping about, the hub should ignore the ping.
Then, none of the subscribers should poll you :)
You're right, it won't technically increase the throughput, but it will drastically decrease the amount of requested content!
Please get in touch!
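The fan-out argument can be illustrated with a toy in-memory hub: however many subscribers a topic has, one publish ping costs the publisher at most one fetch, and a ping for a topic nobody follows costs nothing. `MiniHub` is a hypothetical simplification for illustration, not a real PubSubHubbub hub (no HTTP, subscription verification or retries).

```python
from collections import defaultdict
from typing import Callable, Dict, List


class MiniHub:
    """Toy hub: one pull per publish ping, then content pushed to every subscriber."""

    def __init__(self, fetch_topic: Callable[[str], str]):
        self._fetch = fetch_topic
        self._subscribers: Dict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str) -> int:
        """Handle one publish ping; return how many times the publisher was fetched."""
        subscribers = self._subscribers.get(topic)   # .get() does not create empty entries
        if not subscribers:
            return 0                     # nobody subscribed: ignore the ping, no pull
        content = self._fetch(topic)     # one pull, however many subscribers
        for deliver in subscribers:
            deliver(content)             # fat ping: content is pushed, not polled
        return 1
```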
Mon, 07/05/2010 13:42
Comment submitted by Julien
Julien: then we'll need some people to commit to providing hubs, and get that coordinated with data.london.gov.uk. I don't think you can rely on users requesting only a few trains each - that's what TfL have found. All the excited programmers want to get back data on every train, and they're all moving at the same time.
(I believe the PSHB pulls are actually individual requests for each feed, just within one HTTP session.)
Cheers,
Mark
Mon, 07/05/2010 18:54
Comment submitted by Mark (not verified)
Not surprised. This is what I expected from TfL, just like the endless signal failures and chaotic Underground service.
But the question is: where is the £9.2bn being spent? There are countries with a budget that size.
Shame on you.
Mon, 07/05/2010 20:09
Comment submitted by Anonymous (not verified)
If you had put your name to it, that could have been a bold statement, 'nonymous'!
I'd suggest the realtime data provision is a nicely transparent move and demonstrated to me very clearly how much of the service was running well at any time. There are so many ways it can help to counter the "chaos" which, after all, has a lot to do with sheer numbers of people using the trains.
Yes, a solution for all the trains all the time is what we're after. I'll leave Mark and Julien to debate fat pings and the like, but it seems that simple published pages of all the data as it changes would do the job. Mark, a sequence number sounds like a good idea too!
Any news on when access might return?
Tue, 07/06/2010 12:51
Comment submitted by Drew (not verified)
An update would be nice.
Wed, 07/07/2010 08:31
Comment submitted by Anonymous (not verified)
What's happening, TfL? The service has been down for a week now; could we get an update?
Thanks
mz
Thu, 07/08/2010 09:37
Comment submitted by Max Zorin (not verified)
We're past the two-week mark of the service being down; any updates at all?
Wed, 07/14/2010 12:28
Comment submitted by Daniel Bartlett
Hi - it seems these comments are being moderated, so perhaps somebody reading this could comment on the status of this API. We're only asking because we like it so much...
Tue, 07/20/2010 12:33
Comment submitted by Anonymous (not verified)
That is a real shame; I love this service.
Fri, 07/23/2010 21:39
Comment submitted by Anonymous (not verified)
Any news? I really love the live map!
Sun, 08/08/2010 08:52
Comment submitted by Anonymous (not verified)
Update please?
Mon, 08/23/2010 13:58
Comment submitted by Max Zorin (not verified)
Is this service ever going to be turned on again?
I can understand turning off the service for whatever reasons you have, but not updating us on what is happening is not acceptable.
Wed, 08/25/2010 08:52
Comment submitted by User (not verified)
Lending my voice for better communication around what is being planned for this beta API.
There is significant work involved in implementing the API, and by communicating nothing about its future, TfL shows no respect for the collaborative nature of achieving better services for all.
Communication from TfL is seriously overdue on the API's future availability, the changes we should expect, and the roadmap from beta to full production.
Give me an appropriate level of information,
Ben Stewart
Caution Your Blast Ltd.
Thu, 08/26/2010 08:53
Comment submitted by Ben Stewart
I'm keen to hear what is happening with the live feed. Also, can you change this display so that it shows the most recent posting first? We are watching this with interest from Australia.
Wed, 09/01/2010 02:27
Comment submitted by Anonymous
Why are you being so rude and ignoring these requests for more information on what's going on? Any news is better than no news. It really is very poor, rude and childish to behave like this. The comments are (occasionally) moderated so somebody must be reading this.
Fri, 09/03/2010 22:12
Comment submitted by Anonymous (not verified)
Err, good job deleting the valid comments along with the spam there. An update, any update, would be greatly appreciated.
Mon, 09/06/2010 17:14
Comment submitted by Mark (not verified)
So, in the middle of a Tube strike, this would be interesting. You could see at a glance how effective (or otherwise) the strike was, and the walk-or-ride decision would be easy. It would make a great TV backdrop. Or would it just be supplying "the enemy" with data?
Mon, 09/06/2010 19:43
Comment submitted by Anonymous (not verified)
Finally - some news!!!! http://data.london.gov.uk/blog/update-trackernet-feed
Tue, 09/07/2010 20:50
Comment submitted by User (not verified)
Mark - apologies for any accidental deletion of comments. We seem to be getting a lot of spam lately and it can be quite hard to see the wood for the trees. Hopefully people have seen the blog with the latest update on Trackernet.
Wed, 09/08/2010 08:08
Comment submitted by areth Bake
Pretty good post. I just stumbled upon your blog and wanted to say that I have really enjoyed reading your blog posts. Anyway, I’ll be subscribing to your feed and I hope you post again soon.
Fri, 10/08/2010 16:19
Comment submitted by neil
Is the API available again?
Sun, 10/31/2010 01:32
Comment submitted by Mamun Ahmed
Is the API available again?
Thanks
Mamun Ahmed
Sun, 10/31/2010 01:34
Comment submitted by Mamun Ahmed
We realise that our last deadline has passed, but we will be in a position to give you dates for Trackernet's return fairly soon. The main concern for TfL was making sure that their solution is robust, so that when they open it up again it does not fall over as it did last time. So bear with us; it will be back soon (and we might have other good news to announce as well). If you want to keep up, you might think of following @londondatastore on Twitter. In case we are not checking these comments often enough, you can remind us through Twitter, which is constantly checked.
Thu, 11/04/2010 15:02
Comment submitted by Emer Coleman
I hope the API is working for me. Thanks.
Mon, 05/02/2011 20:20
Comment submitted by Bestimweb24 Blog