Clarify intended use for timepoint in stop_times.txt #474

Merged: 8 commits, Aug 1, 2024

Conversation

@isabelle-dr
Collaborator

isabelle-dr commented Jun 7, 2024

Problem

Related issue: #61

In stop_times.timepoint (ref), the spec says:

0 - Times are considered approximate.
1 or empty - Times are considered exact.

The meaning of "empty" here is interpreted in different ways:

Sample 1 from Greater Glens Falls Transit where timepoint="" means times are exact:

| stop_sequence | arrival_time | departure_time | timepoint |
|---|---|---|---|
| 0 | 13:15:00 | 13:15:00 | |
| 1 | 13:30:00 | 13:30:00 | 0 |
| 2 | 13:40:00 | 13:40:00 | 0 |
| 3 | 13:50:00 | 13:50:00 | |
| 4 | 14:00:00 | 14:00:00 | 0 |

Sample 2 from Squaxin Island Transit where timepoint="" is used when no times are provided:

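| stop_sequence | arrival_time | departure_time | timepoint |
|---|---|---|---|
| 1 | 8:30:00 | 8:30:00 | 1 |
| 2 | 8:31:01 | 8:31:01 | 1 |
| 3 | | | |
| 4 | | | |
| 5 | | | |
| 6 | | | |
| 7 | | | |
| 8 | | | |
| 9 | | | |
| 10 | 8:45:00 | 8:45:00 | 1 |
| 11 | 8:55:00 | 8:55:00 | 1 |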

Solution proposed

A spec amendment to clarify that empty, in this case, means no timepoint values are provided for any record in stop_times.txt, and if provided, all records that have times associated should have timepoint values of 0 or 1.

Historical context (provided by @barbeau here)

Originally, prior to the timepoint field, GTFS spec said you should only provide arrival and departure times for stop_times.txt records that are timepoints. So, if stops 1 and 4 were timepoints, but 2 and 3 were not, you'd have a valid GTFS that looks like this:

| stop_sequence | arrival_time | departure_time |
|---|---|---|
| 1 | 00:00:00 | 00:00:00 |
| 2 | | |
| 3 | | |
| 4 | 00:10:00 | 00:10:00 |

However, producers realized that for multiple consumers to show consistent scheduled arrival and departure times at each stop (i.e., so consumers didn't interpolate them and come up with their own values), they would need to share arrival/departure times for each stop in the trip. A large number of GTFS producers started doing the following, even though technically it was against the GTFS spec:

| stop_sequence | arrival_time | departure_time |
|---|---|---|
| 1 | 00:00:00 | 00:00:00 |
| 2 | 00:02:00 | 00:02:00 |
| 3 | 00:08:00 | 00:08:00 |
| 4 | 00:10:00 | 00:10:00 |

Now, to consumers, all the stops looked like timepoints, even though that wasn't the producer's intent.
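
To illustrate why consumers could disagree when times are omitted, here is a minimal sketch of one possible interpolation strategy. It assumes times are spread evenly by stop count between the surrounding timepoints; the function names are hypothetical, and real consumers may interpolate differently (for example by distance travelled).

```python
from typing import List, Optional


def to_seconds(hhmmss: str) -> int:
    """Convert a GTFS H:MM:SS or HH:MM:SS time to seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hhmmss(total: int) -> str:
    """Format seconds as HH:MM:SS (hours may exceed 23 in GTFS)."""
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def interpolate_times(times: List[Optional[str]]) -> List[Optional[str]]:
    """Fill missing times by spacing them evenly between known neighbours.

    Assumption: the first and last entries are present, so every gap
    has a known time on both sides.
    """
    known = [i for i, t in enumerate(times) if t]
    filled = list(times)
    for left, right in zip(known, known[1:]):
        start, end = to_seconds(times[left]), to_seconds(times[right])
        for i in range(left + 1, right):
            fraction = (i - left) / (right - left)
            filled[i] = to_hhmmss(round(start + fraction * (end - start)))
    return filled
```

For the four-stop trip above, `interpolate_times(["00:00:00", None, None, "00:10:00"])` yields 00:03:20 and 00:06:40 for stops 2 and 3, which differs from the producer's 00:02:00 and 00:08:00.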

The timepoint field was added to give producers a legitimate way to share times for each stop in the trip, while still correctly indicating which stops are timepoints.

So if producers want to provide times for every stop, they shouldn't be doing the above, and instead should provide the timepoint field:

| stop_sequence | arrival_time | departure_time | timepoint |
|---|---|---|---|
| 1 | 00:00:00 | 00:00:00 | 1 |
| 2 | 00:02:00 | 00:02:00 | 0 |
| 3 | 00:08:00 | 00:08:00 | 0 |
| 4 | 00:10:00 | 00:10:00 | 1 |

This means as of today, IMHO, there are two valid ways to share arrival and departure times in GTFS. The first is the original GTFS spec without the timepoint field, where times are omitted for stops that are not timepoints:

| stop_sequence | arrival_time | departure_time |
|---|---|---|
| 1 | 00:00:00 | 00:00:00 |
| 2 | | |
| 3 | | |
| 4 | 00:10:00 | 00:10:00 |

Or, if they want to provide times for every stop, they should provide timepoint values for the entire stop_times.txt.

| stop_sequence | arrival_time | departure_time | timepoint |
|---|---|---|---|
| 1 | 00:00:00 | 00:00:00 | 1 |
| 2 | 00:02:00 | 00:02:00 | 0 |
| 3 | 00:08:00 | 00:08:00 | 0 |
| 4 | 00:10:00 | 00:10:00 | 1 |

@isabelle-dr linked an issue Jun 7, 2024 that may be closed by this pull request
@isabelle-dr added the GTFS Schedule and Change: Best Practice labels Jun 7, 2024
@bdferris-v2
Collaborator

What's your interpretation of this proposed change with respect to existing feeds that use a mix of timepoint specified and timepoint empty values (aka sample 2 above)? Do you consider this a backwards-compatible change?

@isabelle-dr
Collaborator Author

isabelle-dr commented Jul 3, 2024

@bdferris-v2 since we are working with WARNINGS, I consider this change backwards-compatible.

My initial intention was to consider that sample 2 would not trigger any WARNING.
I added the statement: "All records with defined arrival or departure times should have timepoint values populated." to avoid asking for a 0 value for records with no times, as it seems unnecessary.

But now that I think more about it, it creates a conflict: if the field is Recommended and can only be populated with values 0 or 1, then all cases of "empty" (no column provided or partial values) should trigger a warning with no exception.

I see two possible resolutions:

  • Resolution 1: the timepoint field is Recommended if times are provided, and can only take values 0 or 1. Sample 2 would not trigger any WARNING.

  • Resolution 2: the timepoint field is Recommended and can only take values 0 or 1.
    Sample 2 would need to be modified to have timepoint=0 for records with no times provided. If not, they would have a WARNING.

For what it's worth, we interpreted the current spec for the Canonical GTFS Schedule Validator as follows (which is along the lines of resolution 2):

  • if no timepoint column is provided, the feed gets a missing_recommended_column notice.
  • if the timepoint column is provided but not all records are populated, the feed gets missing_timepoint_value notices for all records with no value (regardless of the presence or not of times).
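
The two rules above can be sketched roughly as follows. This is only an illustration of the described behavior, not the validator's actual code: the function name and the `(notice, row_number)` return shape are assumptions, and only the two notice names come from the description above.

```python
import csv
import io
from typing import List, Tuple


def check_timepoint_presence(stop_times_csv: str) -> List[Tuple[str, int]]:
    """Return (notice_name, csv_row_number) pairs.

    Row number 0 is used for file-level notices (hypothetical convention).
    """
    reader = csv.DictReader(io.StringIO(stop_times_csv))
    if "timepoint" not in (reader.fieldnames or []):
        return [("missing_recommended_column", 0)]
    notices = []
    # Row 1 is the header, so data rows start at 2.
    for row_number, row in enumerate(reader, start=2):
        # Every empty value is reported, whether or not the record has times.
        if not (row.get("timepoint") or "").strip():
            notices.append(("missing_timepoint_value", row_number))
    return notices
```

Under this logic, a record with no times and no timepoint (as in sample 2) is still reported.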

We are happy to change this logic to a better one.

Thoughts?

@isabelle-dr
Collaborator Author

isabelle-dr commented Jul 8, 2024

On the Mobility Database:

  • 68% of feeds have timepoint populated for all records. Some of these feeds omit times, and their records with no times are associated with timepoint 0.
  • 31% of feeds have no timepoint header, or a header with no values. This is the original intention for "empty" where times are considered exact.
  • only 1% of feeds have partially specified timepoint values. In this category, there is a bit of everything:
    • all times specified, partially specified timepoint values with only 0s.
    • all times specified, partially specified timepoint values with only 1s.
    • all times specified, partially specified timepoint values with 0s and 1s (impossible to interpret).
    • omitted times, timepoint values 1 only for records that have times.
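
As a rough sketch, the three top-level categories could be computed per feed like this. The function name and the category labels are hypothetical, and this is not the query that produced the percentages above.

```python
import csv
import io


def classify_timepoint_usage(stop_times_csv: str) -> str:
    """Classify a stop_times.txt file by how timepoint is populated."""
    reader = csv.DictReader(io.StringIO(stop_times_csv))
    # A missing column and an empty cell are both treated as "no value".
    values = [(row.get("timepoint") or "").strip() for row in reader]
    populated = sum(1 for value in values if value)
    if populated == 0:
        return "none"     # no timepoint header, or a header with no values
    if populated == len(values):
        return "all"      # timepoint populated for every record
    return "partial"      # mix of specified and empty values
```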

Given how infrequent partially specified timepoint values are, I tend to think resolution 2 is the best, and I've added this statement:

Timepoint should be set to 0 for records with no arrival or departure times.

@bdferris-v2
Collaborator

Coming back to look at this, I ultimately don't have much issue with either of your two proposed resolutions. I have a slight preference for Resolution 1 (recommended if times are provided), but I could see the argument for Resolution 2 as it's more straightforward.

@isabelle-dr
Collaborator Author

I believe we've covered enough ground for this issue, so I'll go ahead and open a vote.

This PR has been open for at least 7 days. As per the Spec Amendment Process, I am opening a vote to clarify what "empty" means for timepoint, because it is interpreted in unintended ways.
Voting ends on 2024-07-30 at 23:59:59 UTC.

Overview of this change
This change affects datasets that use a mix of specified and empty timepoint values, which accounts for less than 1% of the datasets in the Mobility Database.
This change is not related to ERRORS or dataset validity; it is backward-compatible.
This PR essentially adds specification language to support the existing behavior of the Canonical GTFS Schedule Validator, so there are no expected changes with WARNINGS either.

Here are the main cases with partially specified timepoint values and the clarification addressed by this PR:

  1. empty timepoint values used to inform that times are exact -> should be replaced with values 1
  2. empty timepoint values used to inform that times are approximate -> should be replaced with values 0
  3. empty timepoint values because no times are provided -> should be replaced with value 0. This is already what most datasets with omitted times are doing.

You can find previous discussions on the topic in the issue #61.

@westontrillium
Contributor

I am definitely in support of clarifying these recommendations around timepoints and am prepared to vote +1, but I do have one concern which would require modifying the PR—I apologize for coming to the conversation only after a vote has already been called...

It doesn't make sense to recommend timepoints for flexible stop_times, i.e., stop times with no arrival/departure_time defined but with start/end_pickup_drop_off_windows. Can the language be modified to the following?

If no timepoint values are provided, all arrival/departure times are considered exact. If timepoint values are provided, then timepoint values of 0 should be present in stop_times.txt records where arrival_time, departure_time, start_pickup_drop_off_window, and end_pickup_drop_off_window are unspecified.

@isabelle-dr
Collaborator Author

isabelle-dr commented Jul 16, 2024

Ah, thank you for bringing up flexible stop_times @westontrillium!
I think this is an argument to pivot to the resolution 1 I described earlier. In this case, I would rather just remove the Recommended presence Requirement for timepoint and replace the current statement:

If timepoint values are provided and stop_times.txt has records with unspecified arrival and departure times, timepoint values should be set to 0 for these records.

With:

All records with defined arrival or departure times should have timepoint values populated.

With this, datasets with (1) no times or timepoint whatsoever (flex) or (2) partially populated timepoint where the empty values correspond to records without times, won't trigger a WARNING.

Thoughts?

@westontrillium
Contributor

Except, doesn't "All records with defined arrival or departure times should have timepoint values populated" conflict with "...datasets with...no timepoint whatsoever won't trigger a WARNING"?

@isabelle-dr
Collaborator Author

Sorry, I meant datasets with no times or timepoint whatsoever (flex). Edited the comment.

@westontrillium
Contributor

OK yes, I think that your proposed change works.

@isabelle-dr
Collaborator Author

Okay, given that no one has voted yet and that @westontrillium brought a good argument for resolution 1 rather than 2 as described here, I am canceling the first vote and will re-open it with the other proposed change.

@isabelle-dr
Collaborator Author

isabelle-dr commented Jul 17, 2024

And here we go again!

As per the Spec Amendment Process, I am opening a vote to clarify what "empty" means for timepoint, because it is interpreted in unintended ways.
Voting ends on 2024-07-31 at 23:59:59 UTC.

Overview of this change
This change is not related to ERRORS or dataset validity; it is backward-compatible.

This PR essentially changes two things:

  • timepoint goes from recommended at all times to recommended if departure/arrival_time is provided.
  • clarification that when provided, timepoint values should be set explicitly to 0 or 1.

This change affects the following types of datasets:

  1. Datasets that have all departure/arrival_time populated and that use a mix of specified and empty timepoint values (sample 1 above).

    • empty timepoint values used to inform that times are exact (example) -> should be replaced with values 1
    • empty timepoint values used to inform that times are approximate (example) -> should be replaced with values 0
  2. Datasets with omitted times and timepoint values 1 for records with times (sample 2 above, example): they won't trigger a WARNING anymore.

  3. Flex datasets that don't use departure/arrival_time and don't have timepoint defined: they won't trigger a WARNING anymore.

If this PR gets merged, we will modify the canonical validator, as its logic is currently to give a WARNING in all cases of timepoint="".
You can find previous discussions on the topic in the issue #61.
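
A minimal sketch of a check that follows the proposed wording is shown below. It assumes the statement is applied literally per record, so a record is flagged only when it has an arrival or departure time and no explicit 0 or 1 in timepoint. The function name is hypothetical, and the sketch does not decide how a feed with no timepoint column at all should be treated (here, every timed record would be flagged).

```python
import csv
import io
from typing import List


def rows_missing_timepoint(stop_times_csv: str) -> List[int]:
    """Return CSV row numbers that have times but no explicit timepoint."""
    reader = csv.DictReader(io.StringIO(stop_times_csv))
    flagged = []
    # Row 1 is the header, so data rows start at 2.
    for row_number, row in enumerate(reader, start=2):
        has_time = any(
            (row.get(field) or "").strip()
            for field in ("arrival_time", "departure_time")
        )
        # Assumption: anything other than an explicit 0 or 1 counts as unset.
        has_timepoint = (row.get("timepoint") or "").strip() in ("0", "1")
        if has_time and not has_timepoint:
            flagged.append(row_number)
    return flagged
```

With sample 1, the records for stop_sequence 0 and 3 are flagged; with sample 2, nothing is flagged because the empty timepoint values belong to records without times.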

@Transnnovation-GTFSMgr

+1 from National RTAP and our rural/tribal partners. Empty timepoints are replaced with zero (as interpolated or estimated times)

@isabelle-dr added the Status: Voting label Jul 17, 2024
@westontrillium
Contributor

+1 from Trillium

@bdferris-v2
Collaborator

+1 from Google

@skinkie
Contributor

skinkie commented Jul 22, 2024

What is the reason for reverting from recommended to optional?

@isabelle-dr
Collaborator Author

@skinkie timepoint goes from recommended at all times via the Recommended Presence condition to recommended if departure/arrival_time is provided via a statement in the description.

There will still be the validator WARNING, but we will adapt its logic so it depends on the presence of times.

@isabelle-dr
Collaborator Author

The voting period ended on 2024-07-31 at 23:59:59 UTC.

With 3 votes in favor and no votes against, the vote passes.
The votes came from:
@Transnnovation-GTFSMgr (National RTAP)
@westontrillium (Trillium)
@bdferris-v2 (Google)

Thank you to everyone who participated!

@isabelle-dr isabelle-dr merged commit b2ee3c8 into google:master Aug 1, 2024
2 checks passed
@tzujenchanmbd
Collaborator

This clarification has been incorporated into the Canonical GTFS Validator V6.0.0. Please refer to the release page: https://github.com/MobilityData/gtfs-validator/releases/tag/v6.0.0.
