Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enum refactoring #6158

Closed
wants to merge 7 commits into from
Closed

enum refactoring #6158

wants to merge 7 commits into from

Conversation

krojew
Copy link
Contributor

@krojew krojew commented Oct 5, 2020

This PR is an alternative solution to unsoundess in #5581. The benefits over #6098 are:

  1. Enums are kept as proper enum types.
  2. Untrusted buffers now have fallible enum accessors for safe and future-compatible access.
  3. Trusted buffers have unsafe accessor which doesn't incur any performance penalty due to transmutation. The user is responsible for ensuring the buffer is trusted.

This solution is also compatible with potential future verifier returning trusted buffer types.

@rw @aardappel r?

@ImmemorConsultrixContrarie
Copy link
Contributor

Currently thiserror does not support no_std environment, and using this as a dependency would make transition to no_std harder.

And generated code for the checks could have some problems. For example, big matches are awful for debug builds, see this issue. This problem could be fixed by range matching, i.e.

match x {
    x if -1 <= x && x <= 2 => (),
    _ => return Err(()),
}
Ok(unsafe { transmute(x) })

@CasperN
Copy link
Collaborator

CasperN commented Oct 5, 2020

I think you need a way to write unrecognized enums into a flatbuffer.

An example use case is a proxy service that talks to embedded devices that report an integer that's really an enum. Maybe after a firmware update a new int is reported and a monitoring service behind the proxy recognizes it but the proxy hasn't been updated yet. In this scenario, being unable to write an invalid enum value will lead to either an outage or a coupling of releases between the proxy service and the embedded devices's firmware.

Imo Result<NativeEnum, i8> would be the better representation.

Also, I think instead of transmuting in unsafe, you should safely return the base type. Maybe _raw is a good suffix for that. Users can do the unsafe transmute themselves.

@krojew
Copy link
Contributor Author

krojew commented Oct 5, 2020

Rather than returning an int as an error, it's better to put it in an error. Will wait for more comments and make changes tomorrow.

@aardappel
Copy link
Collaborator

I will restate that I think it is a really bad idea for an enum to have an accessor to return Result. This is really unergonomic, and is going to cause future schema changes to unexpectedly break old code.

FlatBuffer enum values are a set of integers of which you may not know all the possible values. This concept apparently does not match cleanly onto Rust enums, so I'd suggest using an implementation technique that represents them better, such as #6098

FlatBuffers has its own semantics, and it is important that all participating languages are able to access that semantics as faithfully as possible, by whatever means they have available. For example, in Java, we don't use the built-in enum feature either, because it is a heap allocated object :) Heck, Java (and most languages) can't even represent FlatBuffer strings directly, and its rather painful.

@krojew
Copy link
Contributor Author

krojew commented Oct 5, 2020

@aardappel this PR contains direct int to enum casts exactly for the case you describe - when someone does not want to deal with a Result and trusts a buffer. It's exactly what the c++ version does. For people wanting to be extra safe, the Result version is also provided. This way we're satisfying both use cases while keeping the code idiomatic.

@krojew
Copy link
Contributor Author

krojew commented Oct 5, 2020

As for compatibility - this change also solves this problem with both apis present. There is no way to distinguish programmatically if an incoming invalid value comes from an updated schema, or from an invalid buffer, so using the Result for people wanting to protect against such situations, gives them the possibility to do so in a Rust way without sacrificing strong enum type.

@aardappel
Copy link
Collaborator

@krojew this seems to leak unsafe into the caller which the other PR doesn't seem to do.

Multiple accessors is not great. At the very least I'd switch the naming, such the ergonomic method has the field name, and the other one appends _checked or whatever. Or maybe we can emit a generic range-check function for people that want to check their enums manually.

@aardappel
Copy link
Collaborator

In the end, it's for you rustaceans to decide. I am just giving the FlatBuffers perspective :)

@krojew
Copy link
Contributor Author

krojew commented Oct 5, 2020

We can tweak the naming, although in Rust, the conversion is to use _unchecked where checks are omitted, not the other way around. Given people use IDEs with code completion, I suggest going the "traditional" route, although I'm fine either way.
The unsafe leak is intentional, since such c++ style casts are inherently unsafe. Rust wants this to be explicit, while c++ doesn't care, as usual 🙂 Such accessor is for people knowing that, hence unsafe.

@CasperN
Copy link
Collaborator

CasperN commented Oct 5, 2020

this PR contains direct int to enum casts exactly to the case you describe - when someone does not want to deal with a Result and trusts a buffer. It's exactly what the c++ version does. For people wanting to be extra safe, the Result version is also provided. This way we're satisfying both use cases while keeping the code idiomatic.

If you're referring to the unsafe transmute thing, that is not forwards compatible. Its UB. Its actually Rust-only UB too since C++ is not that opinionated about enums.

As for compatibility - this change also solves this problem with both apis present. There is no way to distinguish programmatically if an incoming invalid value comes from an updated schema, or from an invalid buffer, so using the Result for people wanting to protect against such situations, gives them the possibility to do so in a Rust way without sacrificing strong enum type.

I agree, the main distinction between this and #6098 is confining the handling of unrecognized variants to the Err of the Result. This allows exhaustive matching in the Ok branch, in which case code breaking is desirable. I would never do this in a project involving more than like 3 teams, but I think its a justifiable design choice.

I still think going the zero-copy + language neutral route in #6098 is the right way to go, but my (perhaps terrible) compromise is that we can offer the result-y thing behind the object API which is meant to be more native.

@rw pls decide for us

@krojew
Copy link
Contributor Author

krojew commented Oct 5, 2020

@CasperN for forwards compatibility the Result version is provided. Or rather will be because I forgot to put the actual invalid value in the error, which I'll fix tomorrow. The transmutation is for people trusting the buffer is valid at this point in time. The PR, in fact, combines the Rust and c++ approach into one. The user can decide which is appropriate for a given situation.

@rw
Copy link
Collaborator

rw commented Oct 5, 2020

Thanks @krojew for making this PR!

Here is my perspective:

  1. Transmute probably won't work for enum types that are wider than one byte.

  2. While we could add _unchecked methods to all accessors, I think that I'd rather solve the bigger and more fundamental problem of us writing a Verifier in Rust. Once a Verifier is written, we can use that to figure out a way to safely elide bounds checks (if the user desires to use unsafe). This would allow us to talk about the entire buffer as being "checked" or "unchecked", not just particular fields.

Idea: Allow users to specify a feature flag in the crate that prevents the use of unsafe code.

@CasperN CasperN mentioned this pull request Oct 5, 2020
@CasperN
Copy link
Collaborator

CasperN commented Oct 5, 2020

I think that I'd rather solve the bigger and more fundamental problem of us writing a Verifier in Rust

Let's consider that out of scope for now. I filed #6161 to track that effort.

@krojew
Copy link
Contributor Author

krojew commented Oct 6, 2020

@rw I absolutely agree buffer validation is the answer, but I think we should have something in place before we get one. Having c++-like casts solves the enum unsoundness issue we have for now. Also - why do you think transmute won't work for wider types? Their bit representation is the same, after all.

@krojew
Copy link
Contributor Author

krojew commented Oct 6, 2020

I brought back thiserror dependency since it makes life easier and we're not no_std anyway. We'll cross that bridge when we get there.

@rw
Copy link
Collaborator

rw commented Oct 6, 2020

@krojew transmute isn't endian-safe and we support big-endian hosts.

@krojew
Copy link
Contributor Author

krojew commented Oct 6, 2020

@krojew transmute isn't endian-safe and we support big-endian hosts.

get() already takes care of endianess, so we're transmuting proper scalars.

@krojew
Copy link
Contributor Author

krojew commented Oct 6, 2020

I think you need a way to write unrecognized enums into a flatbuffer.

An example use case is a proxy service that talks to embedded devices that report an integer that's really an enum. Maybe after a firmware update a new int is reported and a monitoring service behind the proxy recognizes it but the proxy hasn't been updated yet. In this scenario, being unable to write an invalid enum value will lead to either an outage or a coupling of releases between the proxy service and the embedded devices's firmware.

I'm on the fence about this. On one hand it's a legitimate use case, but on the other, it's putting invalid (at given point in time) values in the buffer in hope the receiving end will know how to interpret them. I personally see this as abusing the schema. Such fields should rather be integers than enums. @aardappel what do you think?

#[derive(Error, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash, Debug)]
pub enum ConvertError<T: Debug + Display> {
#[error("invalid value in buffer: {0}")]
InvalidValue(T),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be called UnknownVariant or UnrecognizedVariant and that it is incorrect to say a variant is "invalid" since it could be created by a future version of the schema.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point.

}
}
}

#[allow(non_camel_case_types)]
pub const ENUM_VALUES_RACE: [Race; 4] = [
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please deprecate ENUM_VALUES_RACE, ENUM_NAMES_RACE, and enum_name_race, and make them associated constants of Race; or leave a TODO to do so in a future pull

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These might be useful. Are there reasons to deprecate them?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its more idiomatic for these to be associated constants with namespaced names rather than old-c-style prefixy names.
e.g. Race::VALUES instead of ENUM_VALUES_RACE.

While we're renaming things we should give people some opportunity to refactor their code, i.e. deprecate and give a deletion timeline.

self._tab.get::<i8>(Monster::VT_SIGNED_ENUM, Some(-1)).map(|value| value.try_into()).unwrap()
}
#[inline]
pub unsafe fn signed_enum_unchecked(&self) -> Race {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be an method for getting the raw i8 value from signed_enum directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add reverse TryFrom.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or rather From, since it's infallible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or rather From, since it's infallible.

No, not a Enum as i8, but instead a safe infallible getter that gets i8 from the struct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

@aardappel
Copy link
Collaborator

@krojew

I'm on the fence about this. On one hand it's a legitimate use case, but on the other, it's putting invalid (at given point in time) values in the buffer in hope the receiving end will know how to interpret them. I personally see this as abusing the schema. Such fields should rather be integers than enums. @aardappel what do you think?

FlatBuffers semantics say that such unknown values are possible, and implementations should just ignore them if they don't know what they are. If you must assume that an unknown integer is a valid enum value "from the future", then being able to copy these values is legit as well.

Or put differently, given any particular schema, there are known integer values, and every other integer value is fair game for a future enum value. Thus, no integer value (that fits the underlying type) is ever illegal.

@krojew
Copy link
Contributor Author

krojew commented Oct 7, 2020

FlatBuffers semantics say that such unknown values are possible, and implementations should just ignore them if they don't know what they are. If you must assume that an unknown integer is a valid enum value "from the future", then being able to copy these values is legit as well.

From my understanding, we are not interpreting, nor copying an unknown value, but producing one.

@aardappel
Copy link
Collaborator

@krojew

From my understanding, we are not interpreting, nor copying an unknown value, but producing one.

while that is certainly more odd, it is not something that would count as illegal from the FlatBuffers side.

@krojew krojew closed this Oct 21, 2020
@krojew krojew deleted the rust-enums branch October 21, 2020 09:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants