Publishing transport data

iRail has come a long way. Not in time, as we have existed for almost 2 years only, but we have moved from being one student with a legal problem to a community of the most talented people I have ever met. We have been given great opportunities, from speaking at CeBIT, OGDCamp, TEDxUHasselt and Re:Publica to speaking with policy makers about how to publish transport data. The latter has always been a very difficult question, and so far we have come up with this answer:

There’s a lot to publishing transport data in the open. Daniel Dietrich once told me that open transport data is probably the most interesting kind of open data, and he’s totally right. You can publish statistics, you can publish dynamic data (as most of the transport companies are doing today) and you can publish real-time data. But it doesn’t stop at just publishing data: the feedback you can get from the crowd is immense. You can even work together on your data as a common resource, but let’s start at the beginning.

Static statistical data

Thanks to Freedom of Information legislation, you have the right to know and to request information held by your government. Surprisingly, there aren’t a lot of public transport agencies which publish raw data about punctuality, disturbances, and so on. The cost for a public transport company to publish this data is almost nonexistent: any organisation needs to collect this data for internal use anyway, and because the data never changes once published, publishing raw data dumps is enough. A simple link on a data portal should allow anyone to consult that dataset at any time.

Dynamic transport data

Data is dynamic when it needs periodic updates. A dataset may take a month to change slightly (e.g. a list of all bus stations), or it may change weekly (e.g. time schedules of a transport company). Data can be published as static dumps, at the risk of people using an outdated dataset, or through an Application Programming Interface (API) or web-service. With iRail we have developed The DataTank to create a web-service in no time.

For dynamic transport data there are a couple of standards, none leading to complete satisfaction. For publishing static data dumps, GTFS (General Transit Feed Specification) seems to be becoming the most used format. It describes a schema for CSV files. Using GTFS you can integrate your transport service with Google Maps, Mapnificent, OpenTripPlanner and many more services. There are some common pitfalls with this format: for instance, international trains are hard to track when they cross from one railway network to another.
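To make the "schema for CSV files" idea concrete, here is a minimal sketch of reading a GTFS `stops.txt` file in Python. The column names (`stop_id`, `stop_name`, `stop_lat`, `stop_lon`) come from the GTFS specification; the stops themselves and the `load_stops` helper are made up for illustration.

```python
import csv
import io

# Hypothetical excerpt of a GTFS stops.txt file. The stop IDs, names and
# coordinates are invented; only the column names come from GTFS.
STOPS_TXT = """stop_id,stop_name,stop_lat,stop_lon
S1,Example Central,50.8000,4.3000
S2,Example North,51.0000,3.7000
"""


def load_stops(text):
    """Parse the content of a stops.txt file into a dict keyed by stop_id."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        row["stop_id"]: {
            "name": row["stop_name"],
            "lat": float(row["stop_lat"]),
            "lon": float(row["stop_lon"]),
        }
        for row in reader
    }


stops = load_stops(STOPS_TXT)
for stop_id, stop in stops.items():
    print(stop_id, stop["name"], stop["lat"], stop["lon"])
```

A real feed is a zip archive with several such files (stops, routes, trips, stop times, and so on), so a real consumer would repeat this pattern per file and join them on their IDs.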

There are also some dynamic data standards, such as SIRI or BISON. None of them, however, seems to satisfy transit companies: when they do publish an open API, they choose to implement their own specification: api.ns.nl, … But they’re not to blame, as we have our own specification as well at data.iRail.be or api.iRail.be.

Real-time transport data

Let’s get one common misconception out of the way: a RESTful, SOAP or plain-old XML webservice can never be real-time. Real-time data means immediately informing all copies of the dataset about a certain change in the complete dataset, or informing all subscribers about a certain event, with almost no delay. This means we want a publisher-subscriber architecture (pubsubhub) where a subscriber can connect to a publisher, which will inform this subscriber when an event occurs. This architecture can be compared to a chat-box. When you start a conversation with a certain vehicle, you subscribe to it. You say for instance: “Hi train 314, in the future, can you please keep me up to date about your possible delays? Thanks”. After a while, train 314 may tell you that it has some technical difficulties because it hit a llama. Or maybe someone else on the train may inform the servers that the train has hit a llama, resulting in the train telling all subscribers that it hit a llama once this is confirmed.

The best example of this is http://42transit.com/, a user interface where you can subscribe to real-time information about the Dutch railway company.
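The chat-box analogy can be sketched in a few lines of Python. This is only an in-process illustration of the publisher-subscriber idea, not how 42transit or any real service is built: the `Publisher` class and the topic name `train/314` are assumptions, and a real system would push messages over a network connection instead of calling a local function.

```python
class Publisher:
    """Keeps a list of subscriber callbacks per topic and pushes events to them."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, topic, callback):
        # "Hi train 314, please keep me up to date about your delays."
        self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, message):
        # Push the message to everyone who subscribed to this topic.
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


received = []
publisher = Publisher()
publisher.subscribe("train/314", received.append)

# The subscriber is informed without having to ask again.
publisher.publish("train/314", "Train 314: technical difficulties, expect a delay")
# Nobody subscribed to this train, so nobody is informed.
publisher.publish("train/999", "Train 999: on time")
print(received)
```

The point of the design is that the subscriber never polls: it registers once, and the publisher takes the initiative whenever an event occurs.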

Identifying things

Now we have three interfaces which give information about the same things. We can also say these things have different representations. To be able to identify these things, so that we can always be sure we are talking about the same object, we can use a Uniform Resource Identifier (URI). For example: http://railwaycompanyA.com/train/314/. If this URI is always used to talk about this train, then we can also, after creating indexes, fetch all data concerning this train without any problem. The index of all resources could be returned when you direct your browser to this URI. This is a great way to make sure all different departments inside your organisation can request the same information.
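As a rough sketch of such an index, assume each of the three interfaces registers what it knows about a thing under that thing's URI. The `register` and `describe` helpers and the file and endpoint names below are hypothetical; only the example URI comes from the text above.

```python
TRAIN_URI = "http://railwaycompanyA.com/train/314/"

# Maps a URI to the representations that exist for that thing.
index = {}


def register(uri, kind, location):
    """Record that a representation of `kind` for `uri` lives at `location`."""
    index.setdefault(uri, {})[kind] = location


def describe(uri):
    """Return the kinds of representation known for this URI, sorted by name."""
    return sorted(index.get(uri, {}))


# Hypothetical locations, one per interface discussed above.
register(TRAIN_URI, "statistics", "punctuality-2011.csv")
register(TRAIN_URI, "schedule", "/api/schedule?train=314")
register(TRAIN_URI, "realtime", "/subscribe/train/314")

print(describe(TRAIN_URI))
```

Because every interface uses the same identifier as its key, nobody has to guess whether "train 314" in the statistics is the same object as "train 314" in the real-time feed.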

Let’s not stop here. Once we have our URIs pointing to things, we may categorize all our things. These categories need a description, for instance: a train has a location, can hold a number of people, consists of different wagons, and so forth. This description can also be expressed by combining different URIs. This is called an ontology.

And let’s not stop here either: once we have all our things described by ontologies, we can link things from this transport company to things from other transport companies, and thus query data from different companies as if they were one.

The technology for these rather abstract concepts has been in place for a while: RDF.
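RDF describes everything as triples of subject, predicate and object, each identified by a URI where possible. The sketch below mimics that with plain Python tuples rather than a real RDF library. The `rdf:type` URI is the standard one; the ontology URIs under example.org, the second company and the delay values are assumptions made up for illustration.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical shared ontology terms.
TRAIN = "http://example.org/ontology/Train"
DELAY = "http://example.org/ontology/delayMinutes"

# Each company publishes triples about its own things.
company_a = [
    ("http://railwaycompanyA.com/train/314/", RDF_TYPE, TRAIN),
    ("http://railwaycompanyA.com/train/314/", DELAY, 10),
]
company_b = [
    ("http://railwaycompanyB.example/train/7/", RDF_TYPE, TRAIN),
    ("http://railwaycompanyB.example/train/7/", DELAY, 0),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Because both companies use the same ontology, merging is just concatenation.
merged = company_a + company_b
trains = [s for s, _, _ in match(merged, p=RDF_TYPE, o=TRAIN)]
delayed = [s for s, _, minutes in match(merged, p=DELAY) if minutes > 0]
print(trains)
print(delayed)
```

In practice you would use an RDF store and a query language such as SPARQL, but the principle is the same: shared URIs are what let the two datasets be queried as one.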

Open Data is not expensive

A second misconception is that open data is expensive. Everything that we need for an open data policy (structured data, identification, ontologies, meta-data, and so on) is something every organisation should have internally anyway. And a company that doesn’t have this in place yet can still have an open data policy. As Bart Rosseau from the city of Ghent taught us at “V-ICT-OR shops IT”, an open data policy is the perfect tool to get your internal data structure in place. Use it to challenge your organisation. When you can say and prove that you have a 5-star data structure, you have something worth more than a certificate, for free.

Open data engagement

Until now we haven’t really, really spoken about open data. So far we have spoken about structuring the data to enter the 21st century, but we haven’t spoken about licenses, getting data feedback, replying to demand-driven requests, documentation, working together on data as a common resource and, in general, how to get added value from an open data policy as such. But that’s something for a future blog post.

Pieter

 

14. May 2012 by Pieter Colpaert
Categories: Open data

ACTA

Hi everyone,

Remember SOPA? Or PIPA? These are American bills which, thanks to huge protests from sites like Wikipedia, never became law. Had these bills been passed, they would have harmed the Internet to the extent that sites allegedly using copyrighted material could be taken down without any form of trial. As the current copyright laws are open to interpretation, this would mean that practically any site could be taken down by big corporations which are too lazy to innovate their business model.

But SOPA/PIPA wasn’t that bad. At least not when compared with its European brother ACTA (Anti-Counterfeiting Trade Agreement), that is. The central idea of this agreement is to protect products from counterfeiting. However, this isn’t a clear concept: counterfeiting can be interpreted very broadly, ranging from trademark and copyright infringement to patents and other restriction mechanisms. But ACTA also empowers customs officers to check your hard drive and USB sticks for copyrighted material when crossing borders. And of course, it also allows for “counterfeiting” sites to be taken down with minimal effort, closely resembling the late SOPA bill.

It’s especially that last thing that bothers iRail. A lot.

If ACTA had been accepted by the Belgian government, the NMBS/SNCB would not have tried to stop iRail through a lawyer back in 2009. They would have filed a complaint and iRail would have been blocked straight away (without passing through court first). For good. And it wouldn’t have stopped there: we might even have been arrested, just like the people from MegaUpload.

This is not promoting fair competition, nor is it improving the copyright holder’s situation. It is a war against democracy, open knowledge and the freedom of speech.

Tomorrow, the Polish government will be the first to sign ACTA. It’s time for us to wake up. Now.

 

Yours faithfully,
Pieter, Tim & Yeri
iRail npo

25. January 2012 by Pieter Colpaert
Categories: Politics

How iRail members do QR codes

iRail members are creative. That’s a fact. It’s a thing you can’t deny.

Today one of our finest members engineered a pragmatic and cheap way to add iRail.be bookmarks to any wall you like.

 

Hannes is an engineering student. He is part of the iRail.be 3.0 team, a project that will be released in the autumn of 2012.

- Pieter

 

 

26. December 2011 by Pieter Colpaert
Categories: iRail, News & events

A mission, a vision and a visual identity

First of all, as a new blogger on iRail, I have to introduce myself. My name is Miet Claes and I’m a Graphic Design student at the MAD-faculty. For my master’s I was asked to find an NPO with a goal that interested me. I got the assignment to guide them in developing their visual identity and help build a campaign. A few weeks ago I contacted Yeri and Pieter after their talk at TEDxUHasselt, because I wanted to work with their NPO. As you can see, we got along great!

On Thursday, November 24th, Pieter and I met up to discuss the mediaplan that’s going to take iRail a step closer to the creation of a visual identity. From now on you will be able to see the iRail Mediaplan taking form on this blog. To give you an idea, I put together a little presentation. It’s still being revised by the entire board and we’re working hard to complete it, but I’m happy to introduce the start of the iRail Mediaplan!

29. November 2011 by miet
Categories: Uncategorized

iRail servers

As you might have noticed, we’ve been struck by a bit of bad luck lately.

About 6 weeks ago, a major FS crash occurred on one of the main servers of iRail. As this machine was quite old (7 years, an old P4), our host decided to decommission this server and replace it with a brand new Xeon server.

However, the process of transferring all data took a couple of days, and the IBBT was kind enough to provide us with hosting during this transition. This gave us the opportunity to run some load tests on their servers as well.

Not 3 weeks later, another problem occurred on another server. Xen (which was admittedly outdated, but could not be updated at that time) froze the entire server, and refused to restart its networking after resetting the device. I then decided to go for a clean install (upgrade to Xen 4.x and switch from Ubuntu to Debian). This outage affected minor services such as the blog. The API stayed up during this time.

Long story short, everything has been back up since yesterday evening. We’ll be checking out cloud solutions in the near future to prevent these issues from happening again.

 

There will be additional changes in the future (a dedicated VM for a few iRail services), but this is what it looks like now:

Aleph (dom0) proxies HTTP requests (using nginx) to 2 different VMs:

  • dedicated BeLaws VM
  • an Apache server running the api and iRail.be (and a few other iRail services such as Trac).

Four (dom0) hosts several VMs hosting services like liveboards and this blog.

In the near future we’ll add caching again (it’s disabled at the moment), but it will still be managed by TheDataTank.

20. November 2011 by Yeri Tiete
Categories: Errors
