Enhancing GTFS Schedule and Realtime with original_trip_id #534

Open
wants to merge 3 commits into
base: master

davidr1234

This pull request is related to issue #462

Context: In Switzerland we've introduced the Swiss Journey ID (documentation only in DE/FR/IT: https://www.oev-info.ch/de/datenmanagement/sid4pt-swiss-id-public-transport/swiss-journey-identification-sjyid).
This ID is valid for one operating day and across different days of a scheduled year. It therefore maps to one or more trip_ids.

Proposal: Based on the suggestion by @miklcct (in the referenced issue) we propose to use the original_trip_id (as defined in https://developers.google.com/transit/gtfs/reference?hl=en) in GTFS Schedule and GTFS Realtime to represent constructs such as our SJYID. With this it is possible to combine trips from GTFS Schedule and GTFS Realtime with other standards such as SIRI or NeTEx, which have a similar concept.

Implementation: Since 12 December 2024 we have offered the original_trip_id (filled with our SJYID) as part of GTFS Schedule (doc: https://opentransportdata.swiss/en/cookbook/gtfs/#tripstxt) and GTFS Realtime (doc: https://opentransportdata.swiss/en/cookbook/gtfs-rt/#Trip_updates). In one case, our consumers use the original_trip_id in GTFS Realtime to match against timetable data in the (proprietary) HRDF format.
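
As a rough illustration of the one-to-many mapping described under "Context", the following Python sketch groups trip_ids by original_trip_id. It assumes the field appears as an original_trip_id column in trips.txt, as proposed here. The sample rows and ID values are invented for illustration and do not follow the real SJYID format.

```python
import csv
import io
from collections import defaultdict

# Invented sample data: only the column names follow the proposal.
TRIPS_TXT = """route_id,service_id,trip_id,original_trip_id
R1,WEEKDAY,trip_a,journey-0801
R1,WEDNESDAY,trip_b,journey-0801
R1,WEEKDAY,trip_c,journey-0831
"""


def trips_by_original_id(trips_txt: str) -> dict[str, list[str]]:
    """Map each original_trip_id to the trip_ids that share it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(trips_txt)):
        original = row.get("original_trip_id") or ""
        if original:  # the field is optional, so skip rows without it
            groups[original].append(row["trip_id"])
    return dict(groups)


groups = trips_by_original_id(TRIPS_TXT)
```

In this sample, trip_a and trip_b would be two GTFS trips that represent the same journey on different service days.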

Generalizability: Based on early discussions with other public transport providers, we think this enhancement can benefit many other producers and consumers and increase the interoperability of GTFS with other standards.

u239901 added 2 commits January 30, 2025 13:15
The original_trip_id is added to both GTFS Schedule and GTFS Realtime.

This field allows the association of trips across different realtime and schedule standards, e.g., NeTEx and SIRI.

It also allows matching between schedule and realtime.

google-cla bot commented Jan 30, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@eliasmbd added the GTFS Schedule, GTFS Realtime, Status: Discussion, and Change: Addition labels Jan 30, 2025
@miklcct

miklcct commented Jan 31, 2025

You need to update the .proto file for the realtime field. Please use a larger number and avoid the field numbers I am proposing in #504, as I intend to produce it as soon as I can for integration with other systems such as Darwin (my static GTFS has this field already).

@davidr1234
Author

davidr1234 commented Jan 31, 2025

> You need to update the .proto file for the realtime field. Please use a larger number and avoid the field numbers I am proposing in #504, as I intend to produce it as soon as I can for integration with other systems such as Darwin (my static GTFS has this field already).

Thank you @miklcct, I missed that. I looked at #504 and it seems that 8 is available as field number for the original_trip_id (within TripDescriptor, underneath optional ModifiedTripSelector modified_trip = 7;). This is also the number we currently use in our implementation. Would that interfere with your work?

@miklcct

miklcct commented Jan 31, 2025

That's great, so I can continue to use 5 and 6 for trip_headsign and trip_short_name respectively.

Are you using 5 or 6 for something else?

@davidr1234
Author

> That's great, so I can continue to use 5 and 6 for trip_headsign and trip_short_name respectively.
>
> Are you using 5 or 6 for something else?

No, we only add optional string original_trip_id = 8;

I'll push that now.
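
For readers less familiar with protobuf, here is a small Python sketch of why the field number matters: on the wire, a string field is identified only by its number and wire type, so two producers using the same number for different fields would be indistinguishable to a consumer. The helper below is a hypothetical, simplified encoder (single-byte tag and length only) and not part of any protobuf library. The sample value is invented.

```python
def encode_string_field(field_number: int, value: str) -> bytes:
    """Encode one length-delimited protobuf field (wire type 2).

    Simplified sketch: only handles tags and lengths that fit in one byte.
    """
    payload = value.encode("utf-8")
    if not 1 <= field_number <= 15 or len(payload) > 127:
        raise ValueError("sketch only handles single-byte tag and length")
    tag = (field_number << 3) | 2  # field number plus wire type 2
    return bytes([tag, len(payload)]) + payload


# original_trip_id = 8, as discussed above; the value is made up.
encoded = encode_string_field(8, "journey-0801")
tag_byte = encoded[0]  # (8 << 3) | 2 == 0x42
```

The same string encoded with field number 5 or 6 would start with a different tag byte, which is why the numbers chosen here and in #504 must not overlap.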

According to discussions in google#534 to reflect the documentation in reference.md
@skinkie
Contributor

skinkie commented Jan 31, 2025

Critical question: why is this important for passenger information? I read the interoperability argument, but not how it is used.

@miklcct

miklcct commented Jan 31, 2025

> Critical question: why is this important for passenger information? I read the interoperability argument, but not how it is used.

It allows consumers to match the GTFS data with external data from other sources.

@skinkie
Contributor

skinkie commented Jan 31, 2025

Concrete examples, please. The premise of the GTFS ecosystem is that GTFS can be matched with GTFS-RT. In what situation is it valuable to have other trip identifiers? I can think of some, but those should be written down.

@miklcct

miklcct commented Jan 31, 2025

> Concrete examples, please. The premise of the GTFS ecosystem is that GTFS can be matched with GTFS-RT. In what situation is it valuable to have other trip identifiers? I can think of some, but those should be written down.

I am using the field to match upstream data from systems of Network Rail, where their IDs are only unique on a single day.
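
A minimal sketch of that kind of matching, assuming the upstream ID is carried in original_trip_id and is only unique per operating day, so the lookup key also includes the TripDescriptor's start_date. The dictionaries stand in for decoded messages and upstream records. All names and values below are hypothetical.

```python
from typing import Optional

# Hypothetical upstream records keyed by (upstream ID, service date YYYYMMDD).
UPSTREAM = {
    ("X123", "20250203"): {"platform": "4"},
    ("X123", "20250204"): {"platform": "7"},
}


def match_upstream(trip_descriptor: dict, upstream: dict) -> Optional[dict]:
    """Look up an upstream record for a realtime trip, or return None."""
    original = trip_descriptor.get("original_trip_id")
    start_date = trip_descriptor.get("start_date")
    if not original or not start_date:
        return None  # cannot match without both parts of the key
    return upstream.get((original, start_date))


descriptor = {
    "trip_id": "trip_b",
    "start_date": "20250204",
    "original_trip_id": "X123",
}
record = match_upstream(descriptor, UPSTREAM)
```

Keying on the pair rather than on the ID alone avoids returning the record of the same upstream ID from a different day.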

@eliasmbd
Collaborator

> Critical question: why is this important for passenger information? I read the interoperability argument, but not how it is used.

Are you asking why this matters to the rider?

@skinkie
Contributor

skinkie commented Jan 31, 2025

> Are you asking why this matters to the rider?

Exactly.

@davidr1234
Author

> Concrete examples, please. The premise of the GTFS ecosystem is that GTFS can be matched with GTFS-RT. In what situation is it valuable to have other trip identifiers? I can think of some, but those should be written down.

We give an example in the introductory text under "Implementation": "In one case our consumers use the original_trip_id in GTFS Realtime to match with timetable data in the (proprietary) HRDF format."

Our consumers use both GTFS and HRDF. However, HRDF is able to better reflect certain services in Switzerland due to its more comprehensive and complex data structure, such as for linked trips.

This approach allows us to maintain the efficient structure of GTFS (which is one of the reasons for its popularity with our consumers), while providing the full range of our available customer information by combining it with our other formats.

> > Concrete examples, please. The premise of the GTFS ecosystem is that GTFS can be matched with GTFS-RT. In what situation is it valuable to have other trip identifiers? I can think of some, but those should be written down.
>
> I am using the field to match upstream data from systems of Network Rail, where their IDs are only unique on a single day.

I would also like to point out this statement by @miklcct, which would be another motivation for this field. This is also true for our SJYID.

@leonardehrenfried
Contributor

Could you say a little more about why your consumers want to use HRDF together with GTFS rather than going fully GTFS/GTFS-RT, HRDF or NeTEx/SIRI? If you're already dealing with the complexities of linked trips in HRDF, would the extra complexity of (say) SIRI-ET make a difference?

To summarise: I'm a bit sceptical of changing the GTFS specification to accommodate non-GTFS workflows.

@skinkie
Contributor

skinkie commented Feb 3, 2025

> To summarise: I'm a bit sceptical of changing the GTFS specification to accommodate non-GTFS workflows.

I share that scepticism. But there is a fundamental thing both GTFS and NeTEx are overlooking. Every time a property changes in GTFS or NeTEx, a new identifier must be introduced. You could argue that "this makes a lot of sense", but for some organisations (and even implementers) it does not. They instead manage the validity of properties at different levels. That is why virtually everything is in conflict once HRDF is mentioned. This is the absolute root cause.

@miklcct

miklcct commented Feb 3, 2025

The very reason why this field is needed is that a consumer with local knowledge can use it to reference other passenger-facing systems outside the GTFS world using the original_trip_id provided, when other passenger-facing systems use an ID which is not the public code.

@ue71603

ue71603 commented Feb 3, 2025

It is not just HRDF. As Stefan mentions, there is a core problem in NeTEx and GTFS: uniqueness of the trip in the file. Traditional public transport has uniqueness of the ServiceJourney per operating day. Even when the trip is slightly different, e.g. on Wednesdays, it is still the trip that starts at 08:01 to Zürich. This can be expressed by the global ID. We have to split the trip into different ones for GTFS, but for many other use cases (and systems) it is still useful, even crucial, to know that this is indeed the same one.

@ue71603

ue71603 commented Feb 3, 2025

So what the PR really does is accommodate both "worlds".

@leonardehrenfried
Contributor

leonardehrenfried commented Feb 3, 2025

Can you not do the same "split" when converting to GTFS-RT?

BTW, I don't doubt that it would be useful to your consumers but I doubt that it's GTFS's responsibility to deal with other representations of public transport.

@skinkie
Contributor

skinkie commented Feb 3, 2025

From a GTFS standpoint it can also be interesting, for example for aggregating all the "truly unique trips". It is a very specific use case, so I hope some more examples can be provided.

@ue71603

ue71603 commented Feb 3, 2025

Non sequitur, @leonardehrenfried: by that line of argument we could ask why we should produce GTFS at all. HRDF and VDV 454 contain all the necessary information. It can't be the responsibility of Switzerland to facilitate work for others?

We believe this PR is a simple way to simplify interactions between the different formats. In an ideal world, one could use one realtime stream and a timetable stream of one's choice.

@leonardehrenfried
Contributor

I'll grant you that: it's not a complicated proposal.

@miklcct

miklcct commented Feb 3, 2025

> It is not just HRDF. As Stefan mentions, there is a core problem in NeTEx and GTFS: uniqueness of the trip in the file. Traditional public transport has uniqueness of the ServiceJourney per operating day. Even when the trip is slightly different, e.g. on Wednesdays, it is still the trip that starts at 08:01 to Zürich. This can be expressed by the global ID. We have to split the trip into different ones for GTFS, but for many other use cases (and systems) it is still useful, even crucial, to know that this is indeed the same one.

Also, there is currently no facility in GTFS (unlike NeTEx) to specify that the modified Wednesday 08:01 is the same trip as the Wednesday 08:01 in the base timetable. So if the timetable has been modified from the base timetable, a client with a saved pre-planned journey currently cannot know that the timetable has changed. It will just fail to find the trip in the updated timetable.

I have a use case where a traveller can plan journeys up to months beforehand and save them on the user's device. If the timetable is changed, requiring a new ID in the GTFS, it is impossible for the client to find the new ID (however, I think this is worth another PR to associate a trip with a calendar exception, as that is not the purpose of original_trip_id).

Roughly speaking, it can be described in the following way:
Calendar Base: Mon - Fri, 1 Jan to 30 Jun 2025, with exceptions on Good Friday and Easter Monday

In such a case, a trip running on a modified timetable at Easter can be associated with the original Mon-Fri timetable by specifying another new field (NOT original_trip_id) with the ID of the base trip.

@skinkie
Contributor

skinkie commented Feb 3, 2025

> Also, there is currently no facility in GTFS (unlike NeTEx) to specify that the modified Wednesday 08:01 is the same trip as the Wednesday 08:01 in the base timetable. So if the timetable has been modified from the base timetable, a client with a saved pre-planned journey currently cannot know that the timetable has changed. It will just fail to find the trip in the updated timetable.

I honestly think NeTEx did not standardise "global trip id" either. And yes, there is PrivateCode but that is not the "concept" that we mean here?

@davidr1234
Author

> Can you not do the same "split" when converting to GTFS-RT?
>
> BTW, I don't doubt that it would be useful to your consumers but I doubt that it's GTFS's responsibility to deal with other representations of public transport.

Indeed, but this is not a one-way street. Would you argue that the other standards (such as NeTEx/SIRI) should likewise not facilitate interoperability with standards such as GTFS?

Also, and maybe we need to amend the description to clarify this:

> The very reason why this field is needed is that a consumer with local knowledge can use it to reference other passenger-facing systems outside the GTFS world using the original_trip_id provided, when other passenger-facing systems use an ID which is not the public code.

Such a system can be as broad as (in Switzerland) the national identification of journeys across systems, standards, and operators.

@leonardehrenfried
Contributor

Please don't get me wrong. I have doubts but I'm not the Guardian of GTFS Purity. :)

I would probably not vote at all.

@skinkie
Contributor

skinkie commented Feb 3, 2025

OpenGeo has been producing this information as realtime_trip_id for over 10 years now, so you can add us to the list of producers. It is also part of the GTFS-RT extension (https://github.com/laidig/gtfs-rt-autodoc/blob/master/gtfs-realtime-OVapi.proto#L17). We will modify our output once the vote has been done.

Not asking here to change the name in this proposal. But it has been added since 2014.

@ue71603

ue71603 commented Feb 3, 2025

In NeTEx some still think that the id attribute will suffice. They are wrong. In the profiles we will have to make clear that the content of the original_trip_id must be present somewhere (either in a KeyList with a given key or in the privateCodes with a given key).
