Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-trivial sum type exhaustiveness checking #330

Merged
merged 26 commits into from
Jun 1, 2021

Conversation

erszcz
Copy link
Collaborator

@erszcz erszcz commented May 23, 2021

I tried exercising sum type exhaustiveness checking on some practical examples and realised that these checks only worked on simple sum types whose variants were builtin types - e.g. an alternative of atoms:

-type allergen() :: eggs
                  | chocolate
                  | pollen
                  | cats.

I hoped I could rely on these checks when defining functions over a type with many variants, each carrying extra data. For example:

-type term_() :: {true, info()}
               | {false, info()}
               | {'if', info(), term_(), term_(), term_()}
               | {zero, info()}
               | {succ, info(), term_()}
               | {pred, info(), term_()}
               | {is_zero, info(), term_()}.

The first approach was to enable typechecker:refinable() and gradualizer_lib:pick_value() to handle {user_type, _, _, _} types. However, this first implementation was prone to falling into an infinite loop on recursive variants. This lead me to add the Trace parameter to refinable - in other words, I optimistically assume that a type T is refinable as soon as recursion on T's variants reaches T again. This assumes that the other non-recursive variants of T will ultimately determine its refinability. Thanks to storing a trace of already seen types instead of a binary seen | not-seen flag mutually recursive types are also handled gracefully.

Then it turned out that if the sum type is defined in another module, the check doesn't work. Obviously, data types like this might be handled in multiple modules of a project, so this was not acceptable. The fix I applied is too look up user types in gradualizer_db first and only if not found there try the available TEnv. I tried the reverse order (TEnv first, then gradualizer_db), but it seems to lead to more convoluted code since #tenv{} is not available in a header, therefore not shareable with gradualizer_lib - nothing impossible to handle, but the former fix was already good enough.

Ultimately, this allows to rely on exhaustiveness warnings when working with (mutually recursive) non-trivial variant types from external modules, which IMO is a significant ergonomics improvement. For a larger example, please see erszcz/tapl-erlang@300d1ed and try:

$ rebar3 compile
$ gradualizer -pa _build/default/lib/arith/ebin -- src/arith_core.erl
src/arith_core.erl: Nonexhaustive patterns on line 34 at column 9
Example values which are not covered:
	{false, 0}

Gradualizer correctly detects the missing clauses:

-type term_() :: arith_syntax:term_().

-spec eval1(term_()) -> term_().
eval1(T) ->
    case T of
        %{false, _} ->
        %    erlang:throw(constant);
        %{true, _} ->
        %    erlang:throw(constant);
        %{zero, _} ->
        %    erlang:throw(constant);
        %{'if', _, {true, _}, T2, _} ->
        %    T2;
        %{'if', _, {false, _}, _, T3} ->
        %    T3;
        %{'if', Info, T1, T2, T3} ->
        %    T1_ = eval1(T1),
        %    {'if', Info, T1_, T2, T3};
        {succ, Info, T1} ->
            T1_ = eval1(T1),
            {succ, Info, T1_};
        {pred, _, {zero, _}} ->
            {zero, 0};
        {pred, _, {succ, _, T1}} ->
            T1;
        {pred, Info, T1} ->
            T1_ = eval1(T1),
            {pred, Info, T1_};
        {is_zero, _, {zero, _}} ->
            {true, 0};
        {is_zero, _, {succ, _, _}} ->
            {false, 0};
        {is_zero, Info, T1} ->
            T1_ = eval1(T1),
            {is_zero, Info, T1_}
    end.

@codecov-commenter
Copy link

codecov-commenter commented May 23, 2021

Codecov Report

Merging #330 (5e85d5b) into master (3ad6dc4) will increase coverage by 0.07%.
The diff coverage is 93.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #330      +/-   ##
==========================================
+ Coverage   87.39%   87.46%   +0.07%     
==========================================
  Files          18       18              
  Lines        2744     2760      +16     
==========================================
+ Hits         2398     2414      +16     
  Misses        346      346              
Impacted Files Coverage Δ
src/gradualizer_lib.erl 88.17% <92.00%> (+1.68%) ⬆️
src/typechecker.erl 90.29% <95.00%> (+0.03%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3ad6dc4...5e85d5b. Read the comment docs.

Copy link
Collaborator

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice feature!

I have only some suggestions on style.

I have the feeling we now have multiple ways to lookup user types. Is it worth refactoring the other accesses to get less ways of doing it?

src/typechecker.erl Outdated Show resolved Hide resolved
src/typechecker.erl Outdated Show resolved Hide resolved
src/gradualizer_lib.erl Outdated Show resolved Hide resolved
src/gradualizer_lib.erl Outdated Show resolved Hide resolved
%% Let's check if the type is a known remote type.
case typelib:get_module_from_annotation(Anno) of
{ok, Module} ->
case gradualizer_db:get_opaque_type(Module, Name, Args) of
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just saw you're using get_opaque_type here. I don't think want to do exhaustiveness-checking on opaque remote types, since they are opaque...

Copy link
Collaborator Author

@erszcz erszcz May 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, this was just an easy to way to avoid the opaque return value from gradualizer_db:get_type, which can be handled easily in refinable (opaque is not refinable), but will require an error to be thrown if it ever shows up being returned in pick_value. I wanted to avoid introducing a new kind of error as I don't see anything relevant to opaque in gradualizer_fmt, but maybe it's the best way to go? Do you have any suggestions?

Copy link
Collaborator

@zuiderkwast zuiderkwast May 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. If we want to pick_value of type {tag, opaque_type()}, we'd need a value of the opaque type. But OTOH opaque values shouldn't be shown.

Since pick_value is only used for producing error messages, perhaps we can put an underscore, a variable name or three dots in there instead? {tag, _}, {tag, OpaqueValue1} or {tag, ...}, whatever is best. We could also change the exhaustiveness error messages slightly so that it doesn't explicitly say that it's a value of the type. Just some ideas. WDYT?

@erszcz
Copy link
Collaborator Author

erszcz commented May 25, 2021

I've pushed some changes - all the style / parameter order / spec points are addressed (unless you have more comments on these). The open issues I still have to think of are the throw(opaque, ...) from pick_value and the Args <-> Params substitution - any more feedback is appreciated :)

{ok, Ty} ->
pick_value(Ty, Types);
opaque ->
throw({opaque, user_type, Anno, {Name, length(Args)}});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idea: return {var, Anno, '_'} here, for now. Or generate a unique OpaqueValueNNN variable name using unique_integer or something.


-type g(T) :: ok | {generic, T}.

-spec generic_with_remote_opaque(g(exhaustive_user_type:opaque_t())) -> ok.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice that you managed to test this stuff!

src/typechecker.erl Outdated Show resolved Hide resolved
erszcz added 17 commits May 28, 2021 13:52
The unfortunate name clash lead to errors when using `-pa test_data`,
which is handy when tinkering.

Example errors:

  $ ./bin/gradualizer -pa test_data -- test/should_fail/exhaustive_remote_user_type.erl
  test/should_fail/exhaustive_remote_user_type.erl: escript: exception error: undefined function string:length/1
    in function  erl_pp:leaf/1 (erl_pp.erl, line 957)
    in call from erl_pp:'-lexprs/3-lc$^0/1-0-'/3 (erl_pp.erl, line 949)
    in call from erl_pp:'-lexprs/3-lc$^0/1-0-'/3 (erl_pp.erl, line 949)
    in call from erl_pp:tuple/3 (erl_pp.erl, line 937)
    in call from erl_pp:expr/1 (erl_pp.erl, line 173)
    in call from gradualizer_fmt:format_type_error/2 (src/gradualizer_fmt.erl, line 55)
    in call from gradualizer_fmt:handle_type_error/2 (src/gradualizer_fmt.erl, line 519)

  $ erl -pa ebin/ -pa test_data
  Erlang/OTP 24 [RELEASE CANDIDATE 3] [erts-12.0] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [jit]

  Eshell V12.0  (abort with ^G)
  1> a*** ERROR: Shell process terminated! ***
  2021-05-23T12:24:51.599301+02:00 error: FORMATTER CRASH: {"Error in process ~p with exit value:~n~p~n",[<0.65.0>,{undef,[{string,to_graphemes,["ap"],[]},{edlin,do_op,4,[{file,"edlin.erl"},{line,286}]},{edlin,edit,5,[{file,"edlin.erl"},{line,146}]},{group,more_data,6,[{file,"group.erl"},{line,668}]},{group,get_chars_loop,9,[{file,"group.erl"},{line,471}]},{group,io_request,6,[{file,"group.erl"},{line,181}]},{group,server_loop,3,[{file,"group.erl"},{line,117}]}]}]}
This enables to check exhaustiveness of recursive remote types,
see (test/should_fail) exhaustive_remote_user_type:local_alias_to_recursive_type/1.
While this is not 100% accurate for remote type exhaustiveness checking
it fixes the immediate problem.
Without this, we get these in tests:

  test:27: gen_should_pass (map_pattern.erl)...*failed*
  in function test:'-gen_should_pass/0-fun-0-'/1 (test/test.erl, line 27)
  in call from eunit_test:run_testfun/1 (eunit_test.erl, line 71)
  in call from eunit_proc:run_test/1 (eunit_proc.erl, line 522)
  in call from eunit_proc:with_timeout/3 (eunit_proc.erl, line 347)
  in call from eunit_proc:handle_test/2 (eunit_proc.erl, line 505)
  in call from eunit_proc:tests_inorder/3 (eunit_proc.erl, line 447)
  in call from eunit_proc:with_timeout/3 (eunit_proc.erl, line 337)
  in call from eunit_proc:run_group/2 (eunit_proc.erl, line 561)
  **error:{assertEqual,[{module,test},
                {line,27},
                {expression,"gradualizer : type_check_file ( File )"},
                {expected,ok},
                {value,nok}]}
    output:<<"test/should_pass/map_pattern.erl: Nonexhaustive patterns on line 23 at column 1
  Example values which are not covered:
      _Opaque
  ">>

  test:27: gen_should_pass (stuff_as_top.erl)...*failed*
  in function test:'-gen_should_pass/0-fun-0-'/1 (test/test.erl, line 27)
  in call from eunit_test:run_testfun/1 (eunit_test.erl, line 71)
  in call from eunit_proc:run_test/1 (eunit_proc.erl, line 522)
  in call from eunit_proc:with_timeout/3 (eunit_proc.erl, line 347)
  in call from eunit_proc:handle_test/2 (eunit_proc.erl, line 505)
  in call from eunit_proc:tests_inorder/3 (eunit_proc.erl, line 447)
  in call from eunit_proc:with_timeout/3 (eunit_proc.erl, line 337)
  in call from eunit_proc:run_group/2 (eunit_proc.erl, line 561)
  **error:{assertEqual,[{module,test},
                {line,27},
                {expression,"gradualizer : type_check_file ( File )"},
                {expected,ok},
                {value,nok}]}
    output:<<"test/should_pass/stuff_as_top.erl: Nonexhaustive patterns on line 12 at column 1
  Example values which are not covered:
      _Opaque
  test/should_pass/stuff_as_top.erl: Nonexhaustive patterns on line 19 at column 1
  Example values which are not covered:
      _Opaque
  test/should_pass/stuff_as_top.erl: Nonexhaustive patterns on line 28 at column 1
  Example values which are not covered:
      _Opaque
  ">>
This comes in handy for inspecting specific warnings for should_fail tests from the command line
with:

  gradualizer -pa ./test_data -- test/should_fail/<test_module>.erl
@erszcz
Copy link
Collaborator Author

erszcz commented Jun 1, 2021

I think this one is ready to merge, unless you have further comments.

Maybe the last commit introducing behaviour described by test/known_problems/should_fail/exhaustive_map_variants.erl is debatable, since AFAICT map typing is still not completely sound, and I'm not sure if you think map variants are a reasonable goal (I personally think they are). WDYT?

@zuiderkwast
Copy link
Collaborator

Yes, I will look through it again. I might have a few minor comment.

I think we should aim for handling map variants/unions.

Copy link
Collaborator

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may accept it as is, if you reply to the comments. :-)

Comment on lines +9 to +16
-type map_sum_t() :: #{field_one := _}
| #{field_two := _}.

-spec map_variants(map_sum_t()) -> ok.
map_variants(T) ->
case T of
#{field_one := _} -> ok
end.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably doesn't even work with the map union directly, i.e. without the user type. A minimal example of the known problem would be using the union type directly, without a user type.

Copy link
Collaborator Author

@erszcz erszcz Jun 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't really looked for the reason of this not being detected - I'm aware that typing maps in general is not 100% correct yet, so it might be because of this PR or because of the existing code. I just wanted to add map variants for completeness' sake, since ordinary tuples and records are already taken care of.

Makefile Outdated Show resolved Hide resolved
{ok, Ty} ->
pick_value(Ty, TEnv);
opaque ->
{var, Anno, '_Opaque'};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are multiple opaque types in a picked value, using the same variable name would imply all opaque values are equal, as in {picked_tuple, _Opaque, _Opaque}. That's why I suggested adding a unique integer to the name or use just a _. Not a big deal though since this only for an error message. There is a point in not flooding the atom table as well.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, this might be slightly misleading if more than one opaque type is used... I wanted to avoid passing extra mappings from types to names, but it shouldn't be hard to add - I'll do it in this PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a need for passing around extra mappings? I was thinking just

list_to_atom("Opaque" ++ integer_to_list(erlang:unique_integer([positive])))

Numbers might look a bit random in this way but at least they would be unique.

Copy link
Collaborator

@zuiderkwast zuiderkwast Jun 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

passing extra mappings from types to names

The same type used twice doesn't mean it's the same value, so I think it would actually be wrong to use the same variable twice in this case. [Edit] Oh, well, since we're generating an example, using the same variable for the same type would be OK...

Copy link
Collaborator

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy now.

Do you wan to summarize a commit message for a squash merge?

@erszcz
Copy link
Collaborator Author

erszcz commented Jun 1, 2021

Maybe:

Add non-trivial sum type exhaustiveness checking

Enable exhaustiveness checking on sum types whose variants are tuples and records. Local and remote types, as well as (mutually) recursive types are supported. Map variant tests are added to known problems, but support is not present yet.

Please adjust it as you see appropriate.

@zuiderkwast zuiderkwast merged commit e226803 into josefs:master Jun 1, 2021
@zuiderkwast
Copy link
Collaborator

Awesome!

@erszcz erszcz deleted the exhaustive-user-type branch June 1, 2021 13:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants