Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement the loop_break_value feature. #37487

Merged
merged 1 commit into from
Nov 23, 2016
Merged

Conversation

goffrie
Copy link
Contributor

@goffrie goffrie commented Oct 30, 2016

This implements RFC 1624, tracking issue #37339.

  • FnCtxt (in typeck) gets a stack of LoopCtxts, which store the
    currently deduced type of that loop, the desired type, and a list of
    break expressions currently seen. loop loops get a fresh type
    variable as their initial type (this logic is stolen from that for
    arrays). while loops get ().
  • break {expr} looks up the broken loop, and unifies the type of
    expr with the type of the loop.
  • break with no expr unifies the loop's type with ().
  • When building MIR, loops no longer construct a () value at
    termination of the loop; rather, the break expression assigns the
    result of the loop.
  • I have also changed the loop scoping in MIR-building so that the test
    of a while loop is not considered to be part of that loop. This makes
    the rules consistent with resolve: fix label scopes #37360. The new loop scopes in typeck also
    follow this rule. That means that loop { while (break) {} } now
    terminates instead of looping forever. This is technically a breaking
    change.
  • On that note, expressions like while break {} and if break {} no
    longer parse because {} is interpreted as an expression argument to
    break. But no code except compiler test cases should do that anyway
    because it makes no sense.
  • The RFC did not make it clear, but I chose to make break () inside
    of a while loop illegal, just in case we wanted to do anything with
    that design space in the future.

This is my first time dealing with this part of rustc so I'm sure
there's plenty of problems to pick on here ^_^

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @Aatch (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

body_block = loop_block;
}

let might_break = this.in_loop_scope(
Copy link
Member

@nagisa nagisa Oct 30, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: formatting is extra-weird here. extracting destination into variable here might be better.

I also feel that you might not need the destination to be conditional (i.e. Option<T>) here either. Instead, just assign unit to the destination if there’s break without EXPR.

}

// The “return” value of the loop body must always be an unit, but we cannot
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember this well-formed-ness business being quite non-trivial, so I feel like this comment should stay. That being said its entirely possible that introducing break EXPR made this well-formedness story obsolete, in which case this whole code might be ready to be cleaned up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT the comment is about reusing the value of the loop's block (which is always ()) as the value of the loop itself. When you consider that break EXPR exists, though, it becomes apparent (IMO) that this is nonsensical because the value of the loop is not always (). That's why I removed most of the comment.

let (exit_block, extent) = {
let loop_scope = self.find_loop_scope(span, label);
(exit_selector(loop_scope), loop_scope.extent)
};
self.exit_scope(span, extent, block, exit_block);
self.cfg.start_new_block().unit()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please just inline this? Having a helper method with just 2 lines with 2 callers doesn’t seem to justify it anymore.

(break_block, extent, destination.clone())
};
if let Some(dest) = destination {
if let Some(value) = value.take() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: value.take() is superfluous and this is ignoring the continuation block returned by either branch. This whole function could instead look like this:

let scope = this.find_loop_scope(expr_span, label);
scope.might_break = true;
if let Some(ref dst) = scope.destination {
    block /* this is important! */ = match value {
        Some(value) => unpack!(block = this.into(dst, block, value)),
        None => this.cfg.push_assign_unit(block, source_info, dst),
    };
}
this.exit_scope(expr_span, scope.extent, scope.break_block, block);
this.cfg.start_new_block().unit()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why this code is so awkward is that we can't keep this borrowed when we get to further calls like this.into. (find_loop_scope borrows &mut self and that's tied to the returned &mut LoopScope).
Also, the first branch already takes care of the returned block (it uses unpack!; into returns BlockAnd<()>) while the second branch doesn't return a block (push_assign_unit returns ()).

.take() is a remnant though, my bad.

Copy link
Member

@nagisa nagisa Oct 30, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the first branch already takes care of the returned block (it uses unpack!

Ah, I missed the block = inside the unpack, my bad.

The reason why this code is so awkward is that we can't keep this borrowed

You can manually relinquish the borrow by doing something along the lines of:

let (extent, break_block) = {
let scope = this.find_loop_scope(expr_span, label);
scope.might_break = true;
if let Some(ref dst) = scope.destination {
    block /* this is important! */ = match value {
        Some(value) => unpack!(block = this.into(dst, block, value)),
        None => this.cfg.push_assign_unit(block, source_info, dst),
    };
}
(scope.extent, scope.break_block)
};
this.exit_scope(expr_span, extent, break_block, block);
this.cfg.start_new_block().unit()

This should work because both Extent and BasicBlock are Copy AFAIR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error[E0499]: cannot borrow `*this` as mutable more than once at a time
  --> src/librustc_mir/build/expr/stmt.rs:91:45
   |
87 |                     let scope = this.find_loop_scope(expr_span, label);
   |                                 ---- first mutable borrow occurs here
...
91 |                             unpack!(block = this.into(dest, block, value))
   |                             ----------------^^^^--------------------------
   |                             |               |
   |                             |               second mutable borrow occurs here
   |                             in this macro invocation
...
97 |                 };
   |                 - first borrow ends here

I don't think we can avoid cloning destination without unsafe code because the borrow checker doesn't know that this.into will never pop the loop scope (among other reasons).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can avoid cloning destination without unsafe code because the borrow checker doesn't know that this.into will never pop the loop scope (among other reasons).

Right, for this reason the clone is not unnecessary either, so you ought to remove the comment.


#![feature(loop_break_value)]

enum Void {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use the ! type instead of defining Void.

@@ -228,4 +228,5 @@ pub impl Foo for Bar {
register_diagnostics! {
E0472, // asm! is unsupported on this target
E0561, // patterns aren't allowed in function pointer types
E0571, // `break` with an argument in a `while` loop
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about break in for loops? Can we investigate desugaring while loops into regular loop instead of adding this special case?

@nagisa
Copy link
Member

nagisa commented Oct 30, 2016

This is technically a breaking change.

A silent and very breaking change. Might need discussion and crater runs.

@goffrie
Copy link
Contributor Author

goffrie commented Oct 30, 2016

While you are right, I just want to mention that the rules were inconsistent before and MIR building was in the minority, since both the "check loops" pass and label resolution (as of #37360) believed that a break in the test of a while could not refer to the while itself. Though, the construct itself is quite confusing and perhaps we should just ban it (users can write an explicit label if they want).

@goffrie
Copy link
Contributor Author

goffrie commented Oct 31, 2016

Hm, I just found out that while let and for are desugared into ExprLoop loops, not ExprWhile (which makes perfect sense in retrospect). I guess that checking pass needs to change regardless. Also my typeck code panics on break () in for loops --- but not while let, go figure.

@goffrie
Copy link
Contributor Author

goffrie commented Oct 31, 2016

Alright, I removed the conditional loop destination - break now always writes its argument (or ()). However, in the case of a while loop, a break will lead to two writes of () since we need to do the write in case the test evaluates to false. Do you think that's too ugly?

@eddyb
Copy link
Member

eddyb commented Nov 1, 2016

On that note, expressions like while break {} and if break {} no longer parse because {} is interpreted as an expression argument to
break.
But no code except compiler test cases should do that anyway because it makes no sense.

We try hard to avoid syntax breaking changes, this needs to be put under discussion.

cc @rust-lang/lang @rust-lang/compiler

r? @nikomatsakis

@rust-highfive rust-highfive assigned nikomatsakis and unassigned Aatch Nov 1, 2016
@eddyb eddyb added I-nominated T-lang Relevant to the language team, which will review and decide on the PR/issue. labels Nov 1, 2016
@nikomatsakis
Copy link
Contributor

I would have assumed that while break { would parse as while (break) {, pursuant to our usual policy of excluding expressions that end in blocks from within a while. That said, I think that while return { () } has a similar problem, and we appear not to accept that. But I consider this a bug -- and I think there is even an open bug about it?

@nikomatsakis
Copy link
Contributor

@nagisa

A silent and very breaking change.

In what sense "silent"? I interpret that to mean "silently changes the semantics of your code, without causing a compilation error" -- is that the case here, or did you have another meaning in mind?

@nagisa
Copy link
Member

nagisa commented Nov 1, 2016

@nikomatsakis the exact example was provided (I guess I ought’ve’d quoted more context):

means that loop { while (break) {} } now terminates instead of looping forever.

This changes semantics of a certain code snippet from something to something else.

Though I wonder what the behaviour was before MIR, with the old trans.

@nikomatsakis
Copy link
Contributor

@nagisa Ah, I see! You were not referring to the parsing change, but rather this, which I overlooked:

I have also changed the loop scoping in MIR-building so that the test
of a while loop is not considered to be part of that loop. This makes
the rules consistent with #37360. The new loop scopes in typeck also
follow this rule. That means that loop { while (break) {} } now
terminates instead of looping forever. This is technically a breaking
change.

Interesting. This doesn't actually match my intuition -- but my intuition is weak here, and I suspect that it may change from day to day. But it does seem like the code is (or was) inconsistent in this respect. I think we should make a firm decision here and document it; we could also make some effort to detect code that may change semantics.

cc @jseyfried -- in preparing #37360, did you survey different parts of the compiler to get a feeling for what the "majority" of them tried to do with while break { }?

@bors
Copy link
Contributor

bors commented Nov 3, 2016

☔ The latest upstream changes (presumably #37540) made this pull request unmergeable. Please resolve the merge conflicts.

@nikomatsakis
Copy link
Contributor

We discussed some of the corner cases in the @rust-lang/lang meeting and came to these conclusions:

  • while break { } should parse as while (break) { }.
  • As for whether while break { } should break out of the while loop (as it does today) or out of the containing loop (as it does under this PR), it seems like neither is particularly intuitive. Our preference would be to simply make it an error to put break in the condition of a while loop.
    • We can make it a hard error and do a test, but it may be that we want to go through a warning period here.

@jseyfried
Copy link
Contributor

@nikomatsakis
I am aware of three sources of semantics for while break {}.

  • After tcx creation, I believe the semantics of while break {} are determined entirely by middle::liveness, which always considers the break to apply to the while loop as opposed to its enclosing loop.
  • The loops pass checks for breaks outside of loops and considers the break to apply to the while's enclosing loop.
  • resolve (well, its authors) used to consider the break to apply to the while loop (i.e. the break's label scope includes the while's label). However, due to an apparently unintended consequence of the old hygiene algorithm, the break's label scope happened to not include the while's label. When I removed the old hygiene algorithm in Simplify the macro hygiene algorithm #34570, the while's label got included. However, using the added label caused middle::liveness to mysteriously ICE, so I reverted my unintended change to the already unintended scope. I don't understand why using the label caused an ICE -- I'll investigate.

My intuition is that the break should apply to the while loop, not its enclosing loop. Since a while loop's condition is executed in each iteration of the loop, I think it should be treated like the loop body w.r.t. breaks. That is, I think of while cond { expr } as sugar for loop { if !cond { break }; expr }.

@eddyb
Copy link
Member

eddyb commented Nov 4, 2016

Interesting desugaring, I suppose I didn't consider inverting the condition before. Should we desugar while loops like that in HIR?

@jseyfried
Copy link
Contributor

@eddyb Yeah, I think that would be a good idea.

@nikomatsakis
Copy link
Contributor

@eddyb heh, for a while we were going to remove while altogether in favor of loop, but decided not to weird people out too much.

@jseyfried I agree that the logical place is that break should target the while loop. However, I think that this is natural to compiler authors -- and maybe people who stop to think very carefully -- but is not obvious at first glance when you just look over the code.

@nikomatsakis
Copy link
Contributor

@goffrie did you see my previous comment? how do you feel about trying to update the implementation as described?

@goffrie
Copy link
Contributor Author

goffrie commented Nov 8, 2016

Will do. I'll have some time to work on it tomorrow. As far as this PR goes

  • should we keep the existing inconsistent behaviour for break in the
    condition, and fix it later?

On Nov 8, 2016 1:57 PM, "Niko Matsakis" [email protected] wrote:

@goffrie https://github.com/goffrie did you see my previous comment?
how do you feel about trying to update the implementation as described?


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#37487 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/ABLtTuiPZOXi6BWx35xPHUziuxA1n3Dwks5q8PBdgaJpZM4KkeCn
.

@nikomatsakis
Copy link
Contributor

@goffrie

should we keep the existing inconsistent behaviour for break in the condition, and fix it later?

seems good, maybe open an issue on this topic?

@goffrie
Copy link
Contributor Author

goffrie commented Nov 10, 2016

Alright, I'm now treating the break in while break {} to refer to the while. Also, while break {} parses correctly.

seems good, maybe open an issue on this topic?

There's #37576.

Copy link
Contributor

@nikomatsakis nikomatsakis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This basically looks good -- happy to r+ if you address nits plus answer that one question? (Maybe I'm missing something there...). Nice job!

Some(match self.hir_map.expect_expr(loop_id).node {
hir::ExprWhile(..) => LoopKind::WhileLoop,
hir::ExprLoop(_, _, source) => LoopKind::Loop(source),
_ => span_bug!(e.span, "break label resolved to a non-loop"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I usually find it helpful to include some more debugging info in cases like these. e.g.,

ref r => span_bug!(e.span, "break label resolved to a non-loop: {:?}", r),

that said, having the span is probably enough here

} else if let Loop(kind) = self.cx {
Some(kind)
} else {
None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this case an error that will be reported elsewhere? if so, can you leave a comment saying so (and maybe hinting at where?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm adding a comment. This case is a break outside of a loop, which will be caught just below.

} else {
// `break` without argument acts like `break ()`.
e_ty = tcx.mk_nil();
origin = TypeOrigin::Misc(expr.span);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could probably give better errors here if we used a more specific origin, sort of like we do for match arms.

// No way to know whether it's diverging because
// of a `break` or an outer `break` or `return.
self.diverges.set(Diverges::Maybe);

tcx.mk_nil()
ctxt.unified
} else {
tcx.types.never
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, do we have any tests around this case? I'm a bit surprised we don't get an "unconstrained type variable" for unified, at least in some cases. What happens if you do something like this, for example?

fn foo() {
    loop { };
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ctxt.unified is only used if ctxt.may_break is true, i.e. there was at least one break, which indicates that the type variable has been unified with something (either the type of a break argument, or ()).

@bors
Copy link
Contributor

bors commented Nov 16, 2016

☔ The latest upstream changes (presumably #37545) made this pull request unmergeable. Please resolve the merge conflicts.

@bors
Copy link
Contributor

bors commented Nov 17, 2016

☔ The latest upstream changes (presumably #37732) made this pull request unmergeable. Please resolve the merge conflicts.

@nikomatsakis
Copy link
Contributor

Good enough. @goffrie after you rebase, ping me and I will r+.

@KalitaAlexey
Copy link
Contributor

Dear @nikomatsakis,
I can't find in the changes any changes in the Rust Book.
Is information about this feature going to be added in an another pull request?

@goffrie
Copy link
Contributor Author

goffrie commented Nov 18, 2016

@nikomatsakis alright, rebased!

@nikomatsakis
Copy link
Contributor

@bors r+

@bors
Copy link
Contributor

bors commented Nov 22, 2016

📌 Commit 712082e has been approved by nikomatsakis

@nikomatsakis
Copy link
Contributor

@KalitaAlexey

Is information about this feature going to be added in an another pull request?

before it is stabilized, yes -- but in the meantime, you can checkout RFC 1624

@bors
Copy link
Contributor

bors commented Nov 22, 2016

🔒 Merge conflict

@bors
Copy link
Contributor

bors commented Nov 22, 2016

☔ The latest upstream changes (presumably #37642) made this pull request unmergeable. Please resolve the merge conflicts.

This implements RFC 1624, tracking issue rust-lang#37339.

- `FnCtxt` (in typeck) gets a stack of `LoopCtxt`s, which store the
  currently deduced type of that loop, the desired type, and a list of
  break expressions currently seen. `loop` loops get a fresh type
  variable as their initial type (this logic is stolen from that for
  arrays). `while` loops get `()`.
- `break {expr}` looks up the broken loop, and unifies the type of
  `expr` with the type of the loop.
- `break` with no expr unifies the loop's type with `()`.
- When building MIR, `loop` loops no longer construct a `()` value at
  termination of the loop; rather, the `break` expression assigns the
  result of the loop. `while` loops are unchanged.
- `break` respects contexts in which expressions may not end with braced
  blocks. That is, `while break { break-value } { while-body }` is
  illegal; this preserves backwards compatibility.
- The RFC did not make it clear, but I chose to make `break ()` inside
  of a `while` loop illegal, just in case we wanted to do anything with
  that design space in the future.

This is my first time dealing with this part of rustc so I'm sure
there's plenty of problems to pick on here ^_^
@goffrie
Copy link
Contributor Author

goffrie commented Nov 22, 2016

rebased :/

@aturon
Copy link
Member

aturon commented Nov 22, 2016

@bors r=nikomatsakis

@aturon
Copy link
Member

aturon commented Nov 22, 2016

Since the r= notation doesn't seem to be working...

@bors r+

@nikomatsakis
Copy link
Contributor

@bors r-

@nikomatsakis
Copy link
Contributor

@bors r+

@bors
Copy link
Contributor

bors commented Nov 22, 2016

📌 Commit 9d42549 has been approved by nikomatsakis

@bors
Copy link
Contributor

bors commented Nov 22, 2016

⌛ Testing commit 9d42549 with merge 1cabe21...

bors added a commit that referenced this pull request Nov 22, 2016
Implement the `loop_break_value` feature.

This implements RFC 1624, tracking issue #37339.
- `FnCtxt` (in typeck) gets a stack of `LoopCtxt`s, which store the
  currently deduced type of that loop, the desired type, and a list of
  break expressions currently seen. `loop` loops get a fresh type
  variable as their initial type (this logic is stolen from that for
  arrays). `while` loops get `()`.
- `break {expr}` looks up the broken loop, and unifies the type of
  `expr` with the type of the loop.
- `break` with no expr unifies the loop's type with `()`.
- When building MIR, loops no longer construct a `()` value at
  termination of the loop; rather, the `break` expression assigns the
  result of the loop.
- ~~I have also changed the loop scoping in MIR-building so that the test
  of a while loop is not considered to be part of that loop. This makes
  the rules consistent with #37360. The new loop scopes in typeck also
  follow this rule. That means that `loop { while (break) {} }` now
  terminates instead of looping forever. This is technically a breaking
  change.~~
- ~~On that note, expressions like `while break {}` and `if break {}` no
  longer parse because `{}` is interpreted as an expression argument to
  `break`. But no code except compiler test cases should do that anyway
  because it makes no sense.~~
- The RFC did not make it clear, but I chose to make `break ()` inside
  of a `while` loop illegal, just in case we wanted to do anything with
  that design space in the future.

This is my first time dealing with this part of rustc so I'm sure
there's plenty of problems to pick on here ^_^
@bors bors merged commit 9d42549 into rust-lang:master Nov 23, 2016
@bors bors mentioned this pull request Nov 23, 2016
2 tasks
@goffrie goffrie deleted the break branch November 23, 2016 06:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants