Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify Constrain Errors #4239

Closed
vezenovm opened this issue Feb 2, 2024 · 3 comments
Closed

Unify Constrain Errors #4239

vezenovm opened this issue Feb 2, 2024 · 3 comments
Assignees
Labels
enhancement New feature or request

Comments

@vezenovm
Copy link
Contributor

vezenovm commented Feb 2, 2024

Problem

This TODO was added in #4101. This is due to having two different strategies for resolving constrain instruction error messages which can be seen in this enum (https://github.com/noir-lang/noir/blob/afcb385daa572f990178eda51faf10dc20acd2d0/compiler/noirc_evaluator/src/ssa/ir/instruction.rs#L5730). We have two different strategies as sometimes when doing SSA codegen we want to include a constrain with a message, but we can't easily codegen a call to resolve_assert_message in the same way we do for user provided assertion messages. Thus, we have this assert_messages map for resolving messages specified by the compiler and separate calls to resolve_assert_message for handling assert messages specified by the user.

Happy Case

We should unify the strategy for specifying errors and error types. Ideally we would remove the assert_messages map from the Circuit type. We most likely could instead store errors and their respective types on the ABI rather than on the circuit. Upon circuit failure we can then use the ABI for fetching the appropriate error.

Some example pseudocode written by @TomAFrench from when we were originally planning #4101.

// This defines a class of errors, e.g. overflows
// This avoids us duplicating the definition of how to format this error message many times.
struct ErrorType {
    format_string: String
    types: Vec<AbiType>
}

// This defines a specific instance.
struct Error {
    type_id: u32
    inputs: Vec<(string, Range<Witness>)>
}

struct ABI {
    ...
    error_types: HashMap<u32, ErrorType>,
    errors: HashMap<u32, Error>
    ...
}

Unifying these constrain errors would essentially require full structured errors as part of the ABI. We could then have followup work to enable custom errors inside Noir similar to https://soliditylang.org/blog/2021/04/21/custom-errors/.

Alternatives Considered

No response

Additional Context

No response

Would you like to submit a PR for this Issue?

No

Support Needs

No response

@sirasistant
Copy link
Contributor

sirasistant commented Apr 18, 2024

Dynamic assert payloads

Current state

Consider the following code containing a "dynamic" assert payload:

fn option_expect<T, Err>(opt: Option<T>, err: Err) -> T {
    assert(opt.is_some(), err);
    opt.unwrap_unchecked()
}

fn main(option: Option<u8>) -> pub u8 {
    option_expect(option, "option is none")
}

vs the static assert payload version of the code:

fn main(option: Option<u8>) -> pub u8 {
    assert(option.is_some(), "option is none");
    option.unwrap_unchecked()
}

This is the generated SSA for the static version:

Initial SSA:
acir(inline) fn main f0 {
  b0(v0: u1, v1: u8):
    v3 = call f1(v0, v1)
    constrain v3 == u1 1 "option is none"
    v6 = call f2(v0, v1)
    return v6
}

And this one for the "dynamic" version:

Initial SSA:
acir(inline) fn main f0 {
  b0(v0: u1, v1: u8):
    v12 = call f1(v0, v1, [u8 111, u8 2⁴×7, u8 116, u8 105, u8 111, u8 110, u8 2⁵, u8 105, u8 115, u8 2⁵, u8 110, u8 111, u8 110, u8 101])
    return v12
}
acir(inline) fn option_expect f1 {
  b0(v0: u1, v1: u8, v2: [u8; 14]):
    inc_rc v2
    v4 = call f2(v0, v1)
    v8 = call f2(v0, v1)
    inc_rc v2
    constrain v4 == u1 1 call f3(v2, v8)
    v10 = call f4(v0, v1)
    dec_rc v2
    return v10
}
acir(inline) fn is_some f2 {
  b0(v0: u1, v1: u8):
    return v0
}
brillig fn resolve_assert_message f3 {
  b0(v0: [u8; 14], v1: u1):
    inc_rc v0
    v2 = not v1
    jmpif v2 then: b1, else: b2
  b1():
    call v3(v0, [u8 123, u8 34, u8 107, u8 105, u8 110, u8 100, u8 34, u8 58, u8 34, u8 115, u8 116, u8 114, u8 105, u8 110, u8 103, u8 34, u8 44, u8 34, u8 108, u8 101, u8 110, u8 103, u8 116, u8 104, u8 34, u8 58, u8 49, u8 52, u8 125], u1 0)
    jmp b2()
  b2():
    dec_rc v0
    return 
}
acir(inline) fn unwrap_unchecked f4 {
  b0(v0: u1, v1: u8):
    return v1
}

The dynamic version has undergone some modifications before conversion to SSA. Let's explain what are those:

Modifications at HIR

        if matches!(
            assert_message_expr,
            Expression { kind: ExpressionKind::Literal(Literal::Str(..)), .. }
        ) {
            return Some(self.resolve_expression(assert_message_expr));
        }

        let is_in_stdlib = self.path_resolver.module_id().krate.is_stdlib();
        let assert_msg_call_path = if is_in_stdlib {
            ExpressionKind::Variable(Path {
                segments: vec![Ident::from("internal"), Ident::from("resolve_assert_message")],
                kind: PathKind::Crate,
                span,
            })
        } else {
            ExpressionKind::Variable(Path {
                segments: vec![
                    Ident::from("std"),
                    Ident::from("internal"),
                    Ident::from("resolve_assert_message"),
                ],
                kind: PathKind::Dep,
                span,
            })
        };
        let assert_msg_call_args = vec![assert_message_expr.clone(), condition];
        let assert_msg_call_expr = Expression::call(
            Expression { kind: assert_msg_call_path, span },
            assert_msg_call_args,
            span,
        );
        Some(self.resolve_expression(assert_msg_call_expr))

At the HIR level, if the assert message is not a literal string, a call to resolve_assert_message is introduced. This is the implementation of resolve_assert_message:

#[oracle(assert_message)]
unconstrained fn assert_message_oracle<T>(_input: T) {}
unconstrained pub fn resolve_assert_message<T>(input: T, condition: bool) {
    if !condition {
        assert_message_oracle(input);
    }
}

This explains this added function in the SSA:

...
    constrain v4 == u1 1 call f3(v2, v8)
...

brillig fn resolve_assert_message f3 {
  b0(v0: [u8; 14], v1: u1):
    inc_rc v0
    v2 = not v1
    jmpif v2 then: b1, else: b2
  b1():
    call v3(v0, [u8 123, u8 34, u8 107, u8 105, u8 110, u8 100, u8 34, u8 58, u8 34, u8 115, u8 116, u8 114, u8 105, u8 110, u8 103, u8 34, u8 44, u8 34, u8 108, u8 101, u8 110, u8 103, u8 116, u8 104, u8 34, u8 58, u8 49, u8 52, u8 125], u1 0)
    jmp b2()
  b2():
    dec_rc v0
    return 
}

Modifications at monomorphization

After monomorphization, the assert_message oracle call is modified with metadata about the type it's being called with, in a similar way as the print opcode.

        if let ast::Expression::Ident(ident) = original_func.as_ref() {
            if let Definition::Oracle(name) = &ident.definition {
                if name.as_str() == "print" {
                    // Oracle calls are required to be wrapped in an unconstrained function
                    // The first argument to the `print` oracle is a bool, indicating a newline to be inserted at the end of the input
                    // The second argument is expected to always be an ident
                    self.append_printable_type_info(&hir_arguments[1], &mut arguments);
                } else if name.as_str() == "assert_message" {
                    // The first argument to the `assert_message` oracle is the expression passed as a message to an `assert` or `assert_eq` statement
                    self.append_printable_type_info(&hir_arguments[0], &mut arguments);
                }
            }
        }

The metadata is the serialized ABI of the type:

    fn append_printable_type_info_inner(typ: &Type, arguments: &mut Vec<ast::Expression>) {
        // Disallow printing slices and mutable references for consistency,
        // since they cannot be passed from ACIR into Brillig
        if matches!(typ, HirType::MutableReference(_)) {
            unreachable!("println and format strings do not support mutable references.");
        }

        let printable_type: PrintableType = typ.into();
        let abi_as_string = serde_json::to_string(&printable_type)
            .expect("ICE: expected PrintableType to serialize");

        arguments.push(ast::Expression::Literal(ast::Literal::Str(abi_as_string)));
    }

This explains the call to v3 (assert_message) in SSA:

brillig fn resolve_assert_message f3 {
  b0(v0: [u8; 14], v1: u1):
    inc_rc v0
    v2 = not v1
    jmpif v2 then: b1, else: b2
  b1():
    call v3(v0, [u8 123, u8 34, u8 107, u8 105, u8 110, u8 100, u8 34, u8 58, u8 34, u8 115, u8 116, u8 114, u8 105, u8 110, u8 103, u8 34, u8 44, u8 34, u8 108, u8 101, u8 110, u8 103, u8 116, u8 104, u8 34, u8 58, u8 49, u8 52, u8 125], u1 0)
    jmp b2()
  b2():
    dec_rc v0
    return 
}

As can be seen here, the oracle call to assert_message includes not only the string v0 but also a literal byte array, that being the abi_as_string.

Compilation and runtime

This strategy will make it so when evaluating this instruction constrain v4 == u1 1 call f3(v2, v8) (the assertion on is_some()) a brillig call will be issued to a resolve_assert_message with the string and the condition. Then the brillig function will conditionally call the assert_message oracle with the string + extra metadata if the condition is false.

But how is this assert_message oracle related with the original assertion?
The runtime (acvm_js and nargo) recognizes this special oracle, parses the type using the metadata, stores the parsed result, and when the circuit fails, it has in cache the last call to assert_message.

Comparison with static assert messages

With static assert messages, no transformations happen on the frontend. The generated SSA

Initial SSA:
acir(inline) fn main f0 {
  b0(v0: u1, v1: u8):
    v3 = call f1(v0, v1)
    constrain v3 == u1 1 "option is none"
    v6 = call f2(v0, v1)
    return v6
}

gets compiled to ACIR and in compilation to ACIR, the literal string is stored in a Map<OpcodeLocation, String> inside the circuit. So when the circuit fails, the runtime just picks what String is stored there for the failing opcode.

If the function is brillig, instead of being added to the Map<OpcodeLocation, String> the error string is directly returned as revert data.

Challenges with current state

This approach, altough a bit fragile (since modifications in SSA assertions need to account for dynamic assertions, or bugs like this one can happen) works fine for circuits. But it poses some challenges for public functions in aztec:

  • The runtime is required to know about dynamic assertions, storing the error payload for a later error. This is quite problematic for the implementation of the VM, since it would need to emulate what acvm_js is doing right now.
  • Metadata about the type is embedded in the bytecode. This would increase bytecode size and runtime cost for public functions, with this data only being relevant for tooling.
  • There is a small extra cost of adding an extra function call for constrains, but it should be easily removed by inlining.

Alternative approach

Instead of using an entirely new approach for dynamic assertion messages, we can follow the same approach that we have for static assertion messages.
The Map<OpcodeLocation, String> that the circuit struct holds could be extended to be a Map<OpcodeLocation, AssertionData>
The type AssertionData would be:

struct AssertionData {
    typing_metadata: PrintableType, // This is what is currently serialized into the bytecode.
    payload: AssertionPayload
};

enum AssertionPayload {
    Witness(Vec<Expression>),
    MemoryArray(BlockId),
    BrilligOutput,  // The brillig VM returns the data directly in revert data
}

So the example code:

fn option_expect<T, Err>(opt: Option<T>, err: Err) -> T {
    assert(opt.is_some(), err);
    opt.unwrap_unchecked()
}

fn main(option: Option<u8>) -> pub u8 {
    option_expect(option, "option is none")
}

would generate the following SSA:

Initial SSA:
acir(inline) fn main f0 {
  b0(v0: u1, v1: u8):
    v12 = call f1(v0, v1, [u8 111, u8 2⁴×7, u8 116, u8 105, u8 111, u8 110, u8 2⁵, u8 105, u8 115, u8 2⁵, u8 110, u8 111, u8 110, u8 101])
    return v12
}
acir(inline) fn option_expect f1 {
  b0(v0: u1, v1: u8, v2: [u8; 14]):
    inc_rc v2
    v4 = call f2(v0, v1)
    inc_rc v2
    constrain v4 == u1 1 v2 // Notice here constrain directly holds an SSA value
    v10 = call f4(v0, v1)
    dec_rc v2
    return v10
}
acir(inline) fn is_some f2 {
  b0(v0: u1, v1: u8):
    return v0
}
acir(inline) fn unwrap_unchecked f4 {
  b0(v0: u1, v1: u8):
    return v1
}

ACIR generation and Brillig generation need access to the PrintableType metadata that is generated from the HIR for all non-string assert payloads. It could be maintained in parallel or directly embedded in the constrain instruction in SSA, altough the latter sounds worse. This is necessary since we need to build the AssertionData defined above.

When an opcode fails, the ACVM immediately has enough data to:

  • Extract the payload of the failing opcode
  • Know the types of the failing opcode payload and how they could be printed for an user.

Advantages of this approach

  • Supporting static non-string assertion payloads
  • Removes any notion of PrintableType from the bytecode, maintaining metadata outside.
  • No more need to cache oracle calls to figure out the payload for a failing assertion.
  • The constrain instruction just holds a regular SSA value. No need to special case it for everything like we have to do with the call to resolve_assert_message that has a parallel version of the constrain condition, etc.
  • The distinction of static constrain payload or dynamic constrain payload happens after SSA optimization. So any payload that can be determined static via optimization wouldn't end up in the bytecode, unlike now (in fact my example above should be determined constant at inlining!)
  • AVM support would now be easy, since no extraneous oracles have been added. Brillig gen just has to properly serialize the passed in type (possibly using the same serialization strategy as oracle calls have).

Updates to the proposal:

No metadata in the circuit struct

We can avoid storing metadata about assertion payload types in the circuit struct if we put it in the ABI.
The circuit struct will have a map of Map<OpcodeLocation, AssertionPayload> instead of Map<OpcodeLocation, AssertionData>. The metadata about the type of the error payload will live in the ABI instead. Nargo and other runners are then responsible to properly format the error data.

@TomAFrench
Copy link
Member

No more need to cache oracle calls to figure out the payload for a failing assertion.

I think that's unnecessary now anyway now we have revert data in brillig so this is a constant between all proposals. This was just a hack in the short term.

@sirasistant sirasistant self-assigned this Apr 22, 2024
TomAFrench pushed a commit to AztecProtocol/aztec-packages that referenced this issue Apr 26, 2024
This PR implements this proposal:
noir-lang/noir#4239 (comment)
 - Removes legacy dynamic assertions via calling brillig + oracle
 - Removes legacy oracle handling for assertions
- Frontend adds the HirType in the ConstrainError instead of embedding
metadata about it in the program logic.
 - Constrain instruction in SSA now has `Values` for payload
 - SSA gen generates an error selector from the payload type
- ACIR gen and Brillig gen handle the payload ValueIds and error
selector
 - ABI has now error_types for non-string errors
 - ACVM now resolves the payload for non-constant AssertionPayloads
- Nargo decodes the error for non-string errors using the ABI, allowing
use in nargo tests
 
 Things to do in a followup PR:
- Add an entry point in noirc_abi_wasm that allows decoding a given
`Raw` (non-string) using the ABI in the same way that nargo does.
@TomAFrench
Copy link
Member

Closing this after AztecProtocol/aztec-packages#5949, we can reopen an issue for the next iteration.

@github-project-automation github-project-automation bot moved this from 📋 Backlog to ✅ Done in Noir Apr 26, 2024
AztecBot pushed a commit to AztecProtocol/barretenberg that referenced this issue Apr 27, 2024
This PR implements this proposal:
noir-lang/noir#4239 (comment)
 - Removes legacy dynamic assertions via calling brillig + oracle
 - Removes legacy oracle handling for assertions
- Frontend adds the HirType in the ConstrainError instead of embedding
metadata about it in the program logic.
 - Constrain instruction in SSA now has `Values` for payload
 - SSA gen generates an error selector from the payload type
- ACIR gen and Brillig gen handle the payload ValueIds and error
selector
 - ABI has now error_types for non-string errors
 - ACVM now resolves the payload for non-constant AssertionPayloads
- Nargo decodes the error for non-string errors using the ABI, allowing
use in nargo tests
 
 Things to do in a followup PR:
- Add an entry point in noirc_abi_wasm that allows decoding a given
`Raw` (non-string) using the ABI in the same way that nargo does.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Archived in project
Development

No branches or pull requests

3 participants