
Reimplement progressive type checking in Remap #6507

Closed · binarylogic opened this issue Feb 21, 2021 · 5 comments · Fixed by #6535
Labels: domain: vrl, needs: approval, type: enhancement


binarylogic (Contributor) commented Feb 21, 2021

#6148 lifted type checking to a compile-time error since we felt it provided a better user experience. Specifically, we thought it provided the following benefits:

  1. Typically infallible functions, like round, would no longer be fallible. It was awkward to document otherwise infallible functions as fallible just to handle the argument-type runtime error.
  2. It decoupled type-mismatch errors from actual runtime errors. This forced the user to handle them separately and think about how they want to handle each.

But after updating a few VRL examples I'm questioning whether this was a mistake, since it adds considerable friction to the language. And while I understand we're at the finish line with the language, this change is worth a final discussion given how much it impacts the UX.

Examples

To facilitate this discussion let's run through two opposing examples:

  1. Typically infallible operation (rounding a number)
  2. Typically fallible operation (parsing a string)

Example 1: Typically infallible operation

Typically infallible operations, like rounding a number, heavily influenced our decision to raise type checking to compile-time errors. For example:

.duration = round(.duration)

With compile-time type checking

The user will be met with a compile-time error like:

error[E110]: invalid argument type
  ┌─ :1:1
  │
1 │ .duration = round(.duration)
  │                   ^^^^^^^^^
  │                   │
  │                   this expression resolves to an unknown type
  │                   but the parameter "value" expects the exact type "float"
  │
  = try: guard against invalid type at runtime
  =
  =     .duration = float!(.duration)
  =     .duration = round(.duration)
  =
  = try: coerce with default value
  =
  =     .duration = to_float(.duration) ?? 0.0
  =     .duration = round(.duration)
  =
  = see language documentation at: https://vector.dev/docs/reference/vrl/
  = learn more at: https://errors.vrl.dev/110

And the correct program would look something like:

.duration = float!(.duration)
.duration = round(.duration)

Without compile-time type checking

The user will be met with a compile-time error like:

error[E103]: unhandled assignment runtime error
  ┌─ :1:1
  │
1 │ .duration = round(.duration)
  │ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  │ │
  │ This assignment does not handle errors since .duration is not
  │ guaranteed to be a float.
  │
  = try: abort on invalid types
  =
  =     .duration = round!(.duration)
  =
  = try: coerce with default value
  =
  =     .duration = to_float(.duration) ?? 0.0
  =     .duration |= round(.duration)
  =
  = see language documentation at: https://vector.dev/docs/reference/vrl/
  = learn more at: https://errors.vrl.dev/103

Example 2: Typically fallible operation

Let's try a fallible operation, like parsing a Syslog log. This is an operation that can fail due to a malformed string; handling errors is expected:

. = parse_syslog(.message)

With compile-time type checking

The user will be met with a compile-time error like:

error[E110]: invalid argument type
  ┌─ :1:1
  │
1 │ . = parse_syslog(.message)
  │                  ^^^^^^^^^
  │                  │
  │                  this expression resolves to an unknown type
  │                  but the parameter "value" expects the exact type "string"
  │
  = try: guard against invalid type at runtime
  =
  =     .message = string!(.message)
  =     . = parse_syslog(.message)
  =
  = try: coerce with default value
  =
  =     .message = to_string(.message) ?? "Default string"
  =     . = parse_syslog(.message)
  =
  = see language documentation at: https://vector.dev/docs/reference/vrl/
  = learn more at: https://errors.vrl.dev/110

And the correct program would look something like:

.message = string!(.message)
. = parse_syslog(.message)

Without compile-time type checking

The user will be met with a compile-time error like:

error[E103]: unhandled assignment runtime error
  ┌─ :1:1
  │
1 │ . = parse_syslog(.message)
  │ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  │ │
  │ This assignment does not handle errors. .message is not guaranteed
  │ to be a string and its value could cause a runtime error.
  │
  = try: coerce with default value
  =
  =     .message = to_string(.message) ?? "default"
  =     . |= parse_syslog(.message)
  =
  = try: abort on errors
  =
  =     . = parse_syslog!(.message)
  =
  = see language documentation at: https://vector.dev/docs/reference/vrl/
  = learn more at: https://errors.vrl.dev/103

Proposal

As you can see, compile-time type checking is less awkward for typically infallible operations (example 1) but more awkward for typically fallible operations (example 2). I'd argue that typically fallible operations (example 2) are what we should optimize for. For example, if I've written a few VRL programs and I write the following:

. = parse_syslog!(.message)

I should not be greeted with a compile-time type checking error. As far as I'm concerned, a malformed string and non-string values are the same in this context -- they can't parse.

I like this approach since type-safety is opt-in. The user can choose to add friction or not.
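
To make the opt-in nature concrete, here is a small sketch, reusing the parse_syslog example from above, of how the two styles could read under this proposal:

# Opt out of strict type checking: a non-string .message is handled the same
# way as a malformed Syslog string -- the ! aborts the program (or, with
# `ok, err =`, the error is captured).
. = parse_syslog!(.message)

# Opt in to strict type checking: guard the type explicitly before parsing.
.message = string!(.message)
. = parse_syslog!(.message)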

Let's discuss and note our decision for posterity. If we decide to relax compile-time type checking, we should do this before 0.12 is released.

@binarylogic binarylogic added type: enhancement A value-adding code change that enhances its existing functionality. needs: approval Needs review & approval before work can begin. domain: vrl Anything related to the Vector Remap Language labels Feb 21, 2021
@binarylogic binarylogic added this to the 2021-02-22 - Scottish Deerhound milestone Feb 22, 2021
JeanMertz (Contributor) commented Feb 22, 2021

So, as mentioned on Discord, in general I agree with the gist of this issue, which is also why I brought this to your attention.

If we do this (and again, I'm in favour), then the question becomes: how do we get there technically?

I think I have a way forward that gives us the best of both worlds:

  1. We don't want to entirely go back to the old situation because it basically meant all functions accepted "any" type at compile-time, and then the function itself had to correctly implement the type_def() method to let the compiler know if providing the specific type would result in a fallible function call.
  2. We don't want you to be able to pass in argument types that we know will always fail.
  3. We do want to allow functions that are marked as fallible to take in argument types that might fail if the type turns out to be incorrect.

Let's take an example to demonstrate this:

Maybe Fallible

Given this program:

ok, err = to_string(.foo)
upcase(ok)

This would never compile because ok can either be null or string, depending on whether to_string succeeds (it would fail if you passed in an array, for example), and upcase expects a string and nothing else.

If we were to use upcase!(ok), it would still fail because the upcase function is never fallible, and thus ! is rejected.

The solution to this is, as @binarylogic already demonstrated, to use to_string!(.foo), which aborts at runtime if .foo can't be converted to a string.

However, in this example, we can't actually be sure that upcase will fail, and so this is what we want to allow.

Always Fallible

Let's take a look at another example:

ok = []
upcase(ok)

In this example, we know upcase can never succeed because we know the accepted parameter type and the provided argument type don't intersect. That is, (string & array) == ().

The Solution

Given these two examples, the solution is as follows:

  1. We keep type checking in the compiler. This keeps the function definitions simple and avoids implementation errors (which we've had quite a few of in the past).
  2. We loosen the type checking restriction from exact match to intersection. That is to say, previously this would do the following check: [string, null] == [string], where the lhs is the potential types of the argument and the rhs is the expected types of the function parameter. We now change this to: [string, null] & [string], which would pass because the provided types contain at least one type that is accepted by the function parameter.
  3. In the compiler, we do one of three things (see the sketch after this list):
    • If you provide a type that is always accepted, you keep the existing way of working (e.g. upcase("foo")).
    • If you provide a type that might be accepted, the function accepts the argument, but the compiler marks the function as potentially fallible, which in turn means you are allowed to use !, ??, etc. to handle the error result of the function call (e.g. upcase!(foo) where foo is one of string or null).
    • If you provide a type that is never accepted, the compiler returns the same type error as it currently does (e.g. upcase([]) would always fail to compile, and using upcase!([]) does not change that fact). This also means you can use upcase!(.foo), because .foo (or any event field) starts as any and thus will always be accepted by any function, since it falls into the "maybe fallible" category above.
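
As a rough sketch of those three cases, assuming the intersection-based checking described above (upcase, to_string, and the !/?? handling are taken from the examples in this issue):

# Always accepted: the argument is known to be a string.
upcase("foo")

# Maybe accepted: ok is string or null, which intersects with string, so this
# compiles, but the call is marked potentially fallible and must be handled
# with !, ??, or `ok, err =`.
ok, err = to_string(.foo)
upcase!(ok)

# Never accepted: array and string don't intersect, so this remains a
# compile-time error, and upcase!([]) doesn't change that.
upcase([])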

The Work

  1. Update the compiler to support the above use-case
  2. Update all functions to replace unwrap_<type> with try_<type>? (can get 90% of the way there with a simple search-and-replace)

jszwedko (Member) commented Feb 23, 2021

Thanks for this write-up @JeanMertz !

I think I agree with your general approach here.

One suggestion I had was to allow null to be passed through more places. For example, upcase(null) could simply return null. This would avoid any need for error handling for the:

ok, err = to_string(.foo)
upcase(ok)

case. I think it will be common to have types that are (<some type> | null) with the way we have implemented error handling, so allowing null to flow through more places could be advantageous.

Alternatively, and this is my preference, we lean even more towards the Go approach and have ok not be null if there was an error, but instead be the "zero value" string: "". This would also guarantee that ok is a string.
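
To make the two alternatives concrete, here is a small sketch of how each could read; the trailing comments describe the proposed behaviour, not current semantics:

# Alternative 1: let null flow through.
ok, err = to_string(.foo)   # on failure, ok is null
upcase(ok)                  # upcase(null) would simply return null

# Alternative 2 (Go-style zero values, preferred above):
ok, err = to_string(.foo)   # on failure, ok is the zero value ""
upcase(ok)                  # ok is guaranteed to be a string, no handling needed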

With either of those improvements, I still think your general idea of allowing compilation of programs where the types intersect, and then allowing failing at runtime, is a good approach.

@binarylogic binarylogic removed this from the 2021-02-22 - Scottish Deerhound milestone Feb 23, 2021
JeanMertz (Contributor) commented:

> Alternatively, and this is my preference, we lean even more towards the Go approach and have ok not be null if there was an error, but instead be the "zero value" string: "". This would also guarantee that ok is a string.

I agree 👍 I'll see if I can sneak this in.

JeanMertz (Contributor) commented:

While thinking this through, I realized what you propose (and what Go does) is not possible in our case.

(some of) our functions are allowed to return different types based on their input. For example, the slice function takes a string or an array, and returns a string or array based on the input type.

So, there is no sane "default" value that the function can return in an error-situation, unless we (arbitrarily) pick a default for the functions that can return more than one type.
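
For example, a small sketch of why slice has no obvious zero value (assuming its usual start-inclusive, end-exclusive slicing):

slice("hello", 0, 2)     # returns a string: "he"
slice([0, 1, 2], 0, 2)   # returns an array: [0, 1]

# If slice(.field, 0, 2) fails, should the "zero value" be "" or []?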

There are still options to consider, but even though I agree a solution to this would be another boost to language ergonomics, I'd probably lean towards tackling it after 0.12 ships, as there are trade-offs to all of them.

jszwedko (Member) commented:

🤔

> (some of) our functions are allowed to return different types based on their input. For example, the slice function takes a string or an array, and returns a string or array based on the input type.

Do you have another example? Because, with that one, it seems like you could just return an empty array, which would have no inner type (or possibly just give it the inner type of the array it was sliced from). I could be misunderstanding.
