Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-11559: [C++] Use smarter Flatbuffers verification parameters #9447

Closed

Conversation

pitrou
Copy link
Member

@pitrou pitrou commented Feb 8, 2021

Flatbuffers is able to encode a virtually unbounded of schema fields in a small buffer size.
Verifying that many fields with the Flatbuffers verifier seems to result in potentially unbounded verification times, which is a denial of service risk.

To mitigate the risk, impose that a Flatbuffers buffer cannot represent one more than one Flatbuffers table per buffer bit, which should always be true for well-formed Arrow IPC metadata. Indeed, the only recursive table, the Field table in Schema.fbs, mandates the presence of its type member (though it's not marked as required in the Flatbuffers definition, it's validated by the IPC read routines).

TODO:

  • Add OSS-Fuzz regression file

@github-actions
Copy link

github-actions bot commented Feb 8, 2021

@pitrou
Copy link
Member Author

pitrou commented Feb 8, 2021

Note that we could adopt a similar strategy when validating the Parquet Thrift structures (related: https://issues.apache.org/jira/browse/PARQUET-1877).

@pitrou
Copy link
Member Author

pitrou commented Feb 8, 2021

cc @emkornfield

@pitrou pitrou force-pushed the ARROW-11559-fbb-verification-params branch 2 times, most recently from bad3f4c to b485e01 Compare February 8, 2021 19:35
Flatbuffers is able to encode a virtually unbounded of schema fields in a small buffer size.
Verifying that many fields with the Flatbuffers verifier seems to result in potentially
unbounded verification times, which is a denial of service risk.

To mitigate the risk, impose that a Flatbuffers buffer cannot represent one more than
one Flatbuffers table per buffer bit, which should always be true for well-formed
Arrow IPC metadata.  Indeed, the only recursive table, the `Field` table in Schema.fbs,
mandates the presence of its `type` member (though it's not marked as required in
the Flatbuffers definition, it's validated by the IPC read routines).
@pitrou pitrou force-pushed the ARROW-11559-fbb-verification-params branch from b485e01 to b3f1e6f Compare February 8, 2021 19:36
@pitrou
Copy link
Member Author

pitrou commented Feb 9, 2021

To quote an answer that I got on the Flatbuffers mailing-list: a buffer of a given size can basically encode a nearly unlimited number of tables, since Flatbuffers can deduplicate tables (e.g. you can have a Field with N times the same child, itself with M times the same child, etc., which can produce a combinatory explosion).

@pitrou pitrou deleted the ARROW-11559-fbb-verification-params branch February 10, 2021 08:42
@pitrou
Copy link
Member Author

pitrou commented Feb 10, 2021

Hmm, I'll add the regression file as a separate PR then.

nevi-me pushed a commit to nevi-me/arrow that referenced this pull request Feb 13, 2021
Flatbuffers is able to encode a virtually unbounded of schema fields in a small buffer size.
Verifying that many fields with the Flatbuffers verifier seems to result in potentially unbounded verification times, which is a denial of service risk.

To mitigate the risk, impose that a Flatbuffers buffer cannot represent one more than one Flatbuffers table per buffer bit, which should always be true for well-formed Arrow IPC metadata.  Indeed, the only recursive table, the `Field` table in Schema.fbs, mandates the presence of its `type` member (though it's not marked as required in the Flatbuffers definition, it's validated by the IPC read routines).

TODO:
* [ ] Add OSS-Fuzz regression file

Closes apache#9447 from pitrou/ARROW-11559-fbb-verification-params

Authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Micah Kornfield <[email protected]>
sgnkc pushed a commit to sgnkc/arrow that referenced this pull request Feb 17, 2021
Flatbuffers is able to encode a virtually unbounded of schema fields in a small buffer size.
Verifying that many fields with the Flatbuffers verifier seems to result in potentially unbounded verification times, which is a denial of service risk.

To mitigate the risk, impose that a Flatbuffers buffer cannot represent one more than one Flatbuffers table per buffer bit, which should always be true for well-formed Arrow IPC metadata.  Indeed, the only recursive table, the `Field` table in Schema.fbs, mandates the presence of its `type` member (though it's not marked as required in the Flatbuffers definition, it's validated by the IPC read routines).

TODO:
* [ ] Add OSS-Fuzz regression file

Closes apache#9447 from pitrou/ARROW-11559-fbb-verification-params

Authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Micah Kornfield <[email protected]>
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
Flatbuffers is able to encode a virtually unbounded of schema fields in a small buffer size.
Verifying that many fields with the Flatbuffers verifier seems to result in potentially unbounded verification times, which is a denial of service risk.

To mitigate the risk, impose that a Flatbuffers buffer cannot represent one more than one Flatbuffers table per buffer bit, which should always be true for well-formed Arrow IPC metadata.  Indeed, the only recursive table, the `Field` table in Schema.fbs, mandates the presence of its `type` member (though it's not marked as required in the Flatbuffers definition, it's validated by the IPC read routines).

TODO:
* [ ] Add OSS-Fuzz regression file

Closes apache#9447 from pitrou/ARROW-11559-fbb-verification-params

Authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Micah Kornfield <[email protected]>
michalursa pushed a commit to michalursa/arrow that referenced this pull request Jun 13, 2021
Flatbuffers is able to encode a virtually unbounded of schema fields in a small buffer size.
Verifying that many fields with the Flatbuffers verifier seems to result in potentially unbounded verification times, which is a denial of service risk.

To mitigate the risk, impose that a Flatbuffers buffer cannot represent one more than one Flatbuffers table per buffer bit, which should always be true for well-formed Arrow IPC metadata.  Indeed, the only recursive table, the `Field` table in Schema.fbs, mandates the presence of its `type` member (though it's not marked as required in the Flatbuffers definition, it's validated by the IPC read routines).

TODO:
* [ ] Add OSS-Fuzz regression file

Closes apache#9447 from pitrou/ARROW-11559-fbb-verification-params

Authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Micah Kornfield <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant