
RFC: bring back v0.6 scope rules in the REPL #33864

Merged (3 commits) on Jan 28, 2020

JeffBezanson
Member

After thinking about it off and on for quite a while, @StefanKarpinski and I more-or-less decided that the best way to change the REPL scope situation (if at all) is just to bring back what v0.6 did. That's what SoftGlobalScope and IJulia already do (albeit somewhat approximately; implemented internally, it's much easier to get it 100% right), and it seems less disruptive than introducing a third behavior.

This needs to be finished up but is ready to try. Please give it a whirl.

I'm not sure what the best interface to it is; it seemed easiest just to drop a special expression in the AST itself.

fixes #28789

@JeffBezanson added the labels needs decision, REPL, compiler:lowering, needs tests, needs docs, needs news, minor change, and existential crisis on Nov 15, 2019
@andyferris
Member

I couldn’t tell from the OP - does this affect the top level scoping rules in general, or just code typed into the REPL?

@ronisbr
Member

ronisbr commented Nov 16, 2019

Wouldn't it break code that was assuming that a variable on the global scope was not being modified in local scope when running in REPL?

Like, today we have:

julia> aux = 1
1

julia> for i = 1:10
       aux = i
       end

julia> aux
1

The behavior in v0.6 would lead to aux = 10, right? Is it possible to at least print a warning? Anyway, this should be minor, since AFAICT this PR does not change the behavior inside functions.

Just to make it clear, I always preferred the solution in v0.6.
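For reference, code that must update the global from inside a top-level loop can say so explicitly with `global`, which gives the same result in the REPL and in an `include`d file under either scoping rule. A minimal sketch reusing the `aux` variable from the example above:

```julia
aux = 1
for i = 1:10
    global aux = i   # explicitly assign the global binding, not a new loop-local
end
println(aux)         # 10
```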

@JeffBezanson
Member Author

does this affect the top level scoping rules in general, or just code typed into the REPL

Just the repl. Any top-level expression can opt-in to it by including a :softscope expr, but this change only does that for the repl.
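A rough sketch of what "opting in" might look like from user code, assuming the annotation is a `(softscope true)` node placed in a block ahead of the top-level expression. This mirrors how the REPL front end appears to apply it, but the exact AST form is an internal detail and may differ between versions; `with_softscope` is a hypothetical helper name, and the effect only exists on Julia versions that include this change (1.5 and later).

```julia
# Hypothetical helper: prefix a top-level expression with the soft-scope marker.
with_softscope(ex) = Expr(:block, Expr(:softscope, true), ex)

mod = Module()                       # fresh sandbox module (imports Base)
Core.eval(mod, :(aux = 1))
loop = :(for i = 1:10
             aux = i
         end)

Core.eval(mod, loop)                 # default rules: `aux` in the loop is a new local
println(Core.eval(mod, :aux))        # still 1 (a warning about the ambiguous assignment may be printed)

Core.eval(mod, with_softscope(loop)) # soft scope: the loop assigns the existing global
println(Core.eval(mod, :aux))        # 10
```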

Wouldn't it break code that was assuming that a variable on the global scope was not being modified in local scope when running in REPL?

I'm not sure how code could assume that. It's always been possible to modify globals from the REPL; I don't see how code could be OK with modifying a global in the REPL, but not if that modification happens inside a for loop. It does break possible workflows though, which is maybe what you mean; e.g. if you're used to pasting loops into the repl and them not having global effects.

@andyferris
Member

and seems less disruptive than introducing a third behavior.

Just the repl. Any top-level expression can opt-in to it by including a :softscope expr, but this change only does that for the repl.

So... what you're saying is that there will be three behaviours in Julia 1.x? One for scripts, one for the REPL and one for inside functions? (And how will people learn to write scripts if they are not exposed to the same rules at the REPL?)

(On the other hand, changing scoping rules at toplevel in general could potentially break what I'd call "production scripts" and the breakage wouldn't be detectable from open-source packages, so I'm not sure what else you could do in Julia 1.x?)

@StefanKarpinski
Member

Inside functions and in scripts aren’t different, so that’s two.

@jlperla
Contributor

jlperla commented Nov 18, 2019

Spectacular! Any hope for a top-level script solution that is easy for beginners? To be honest, I think the majority of issues occur when beginners move from jupyter to .jl files with Juno, or when people copy code from inside a function to a top-level script while testing. I haven't seen people writing for loops in a REPL.

The other worry is that I thought a lot of the UIs (both Juno and vscode?) work by shift-entering code from the .jl file and executing it within the REPL. Wouldn't that mean that executing a script line by line would behave differently from running it all at once?

(On the other hand, changing scoping rules at toplevel in general could potentially break what I'd call "production scripts" and the breakage wouldn't be detectable from open-source packages, so I'm not sure what else you could do in Julia 1.x?)

I suspect this is true... But what if there were a function we could call at the top of the script to switch its top-level code to :softscope? Or perhaps a command-line option? Later you could decide whether it made sense to make it the default in a 2.0?

@JeffBezanson
Member Author

I haven't seen people writing for loops in a REPL.

What?? I thought that's what this was largely about, and for example why IJulia started using SoftGlobalScope.jl.

@JeffBezanson
Member Author

It should be easy to add an option (with whatever default) to Juno to use this for shift-enter. It would also be possible to add @softscope in a script to change the behavior for that file. But that starts to worry me. I find this scope behavior convenient for running excerpts/snippets of code, which is what a REPL and shift-enter are for. But as soon as you have a whole program, you should really use functions. Juno also has an actual debugger now.

@JeffBezanson
Member Author

Also I know @stevengj and @aviks at least have often mentioned wanting this in the REPL.

@jlperla
Contributor

jlperla commented Nov 18, 2019

@JeffBezanson First of all, thanks again for considering this. I know what a pain this (and me!) has been. To me at least, it has always been about "scripts" rather than the REPL. But considering the connection between the two in the tooling, it might be hard to separate them and have different behavior.

To explain why the soft-scoping of scripts is intuitive, take a look at https://discourse.julialang.org/t/tips-to-cope-with-scoping-rules-of-top-level-for-loops-etc/24902/4 and https://discourse.julialang.org/t/tips-to-cope-with-scoping-rules-of-top-level-for-loops-etc/24902/12

But the basic issue is that people writing scripts don't intuitively think of the "global variables" as globals unless they intend them to be accessed from somewhere outside the top-level script. Since that isn't possible in Julia (and I am not sure it should be), soft scope makes things comparable to matlab/R/python and all the other scripting languages that seem to have that behavior. (Note that in matlab, a "true" global (i.e. not just local to the script) needs to be declared; see https://www.mathworks.com/help/matlab/ref/global.html.)

What?? I thought that's what this was largely about, and for example why IJulia started using SoftGlobalScope.jl.

@stevengj can correct me, but my guess is that the goal was to make scripting (i.e. top-level code where the intuitive scoping level is the "script") intuitive, more so than making jupyter match the old REPL.

It would also be possible to add @softscope in a script to change the behavior for that file. But that starts to worry me. I find this scope behavior convenient for running excerpts/snippets of code, which is what a REPL and shift-enter are for. But as soon as you have a whole program, you should really use functions. Juno also has an actual debugger now.

Given your use of the word "program" rather than "script", I would agree with you.

I think that a lot of serious developers think of them as synonymous, but for many beginning (often permanently) programmers and scientists, a script patching together other people's serious packages is where they would stop. There is enough evidence that people like scripts (e.g. Jupyter is effectively a literate environment for scripting) that I don't think it should be discouraged through language features. We could educate them on the downsides separately, but scripts have their place.

With all of that said, I realize that "script" has no direct meaning in Julia and a .jl file is just an approximation, but it is the best we have outside of jupyter.

@JeffBezanson
Member Author

JeffBezanson commented Nov 18, 2019

my guess is that the goal was to make scripting (i.e. top-level code where the intuitive scoping level is the "script") intuitive, more so than making jupyter match the old REPL.

Why does IJulia use it then? I see IJulia as primarily interactive.

But to be clear, the main problem here is that changing how scope works in e.g. ./julia x.jl (or include, or loading packages) would be breaking and we can't do it.

One thing that comes to mind is that we have a "stream REPL" that gets used when you redirect a file to julia (as opposed to passing it on the command line like julia x.jl). Soft scope is not yet hooked up to that in this PR, but it could be. Then you'd at least get soft scope for julia < x.jl. We might be able to get away with that since it's an unusual way to run code.

@jlperla
Contributor

jlperla commented Nov 18, 2019

Why does IJulia use it then? I see IJulia as primarily interactive.

Because Juno and vscode are awesome, and jupyter is frequently infuriating!

I think of scripting as a superset of interactive use. Many people (e.g. me!) would love to use Juno, Weave, etc. more for scripts where we can actually track git changes and use a proper IDE. That is largely how people write R, python, matlab, etc. scripts.

You may not have intended for the .jl without a module to be a top-level script, but people have come to that conclusion (and in the absence of any alternative "script" outside of jupyter, the conclusion is understandable).

But to be clear, the main problem here is that changing how scope works in e.g. ./julia x.jl (or include, or loading packages) would be breaking and we can't do it.

Yeah... Would it be breaking if there were a global setting users could tweak via an environment variable? Then in a v2.0 you could decide whether making that behavior the default (or only) scoping makes sense?

I think that it would be easy to teach intro users to set up an environment variable before using Juno, and then serious programmers wouldn't need to worry about it.

One thing that comes to mind is that we have a "stream REPL"

I fear that would end up more confusing for beginners. My feeling is that the changes in global scope should be either (optionally) consistent across all "top level" scripting/interactive environments, or that we are better off waiting for a v2.0 to give the opportunity to break things.

@JeffBezanson
Member Author

You may not have intended for the .jl without a module to be a top-level script, but people have come to that conclusion

During development and debugging, many of us (me included!) find soft scope more convenient. I agree it's what you want for ad-hoc loops in the REPL (which I certainly use), and for pasting code from functions (which is still useful sometimes in spite of a better debugger existing).

But my view is that as soon as you have any sort of coherent program that you might want to run repeatedly e.g. on new data, you should put code in functions. If all your code is loops at the top level, it will be slow anyway and there is much less reason to use julia at all. I just don't think you can design how scope works in a language based on what people seem to want in the first week. I also fail to understand the extreme resistance to writing functions. Not only is it not onerous, everybody agrees it is how you should do things anyway, so why not tell people about it? Setting environment variables and whatnot (giving disparate behavior in different people's environments) is preferable to just writing a function?

I don't like environment variables for this, since it makes it difficult to share code. Though it may be ugly, putting an annotation in the file is far better since it makes the code self-contained.

@jlperla
Contributor

jlperla commented Nov 18, 2019

I don't like environment variables for this, since it makes it difficult to share code. Though it may be ugly, putting an annotation in the file is far better since it makes the code self-contained.

I agree with both. Then if they shift-enter in Juno, it uses the REPL scoping (even if they forget to shift-enter the ugly annotation), while if they run the whole script in CI etc., it always runs the ugly thing at the top. Reproducible, and in v2 you could decide whether to make the annotation the default.

But my view is that as soon as you have any sort of coherent program that you might want to run repeatedly e.g. on new data, you should put code in functions. If all your code is loops at the top level, it will be slow anyway and there is much less reason to use julia at all.

It depends on when things move from being a simple script to being a program. But with a rich package ecosystem, you can get a lot done in Julia with shockingly little code. Regardless, my feeling is that the top-level should be as convenient as possible and that teaching people to organize as functions is worthwhile but orthogonal.

Also, I am not sure that speed is always a worry since many useful top-level scripts are calling very elaborate calculations in real packages with function barriers. Depends on the circumstances, I guess.

@antoine-levitt
Contributor

I fear that would end up more confusing for beginners. My feeling is that the changes in global scope should be either (optionally) consistent across all "top level" scripting/interactive environments, or that we are better off waiting for a v2.0 to give the opportunity to break things.

I second that. Whatever the decision is regarding scope at toplevel, having code typed up at the REPL and code included behave differently would be massively confusing to me. That is true in general, and is even more true for scoping issues, which routinely confuse semi-experienced programmers like me, let alone beginners. As far as I can see, this PR would be the only instance where include and the REPL behave differently; it would be a shame to break that.

I also fail to understand the extreme resistance to writing functions. Not only is it not onerous, everybody agrees it is how you should do things anyway, so why not tell people about it?

That might be going a bit off-topic, but I just want to point out that this is a huge cultural divide. There are a lot of things you can do without having to write a single function, and I would imagine much of data science happens in scripts. It's not necessarily bad performance if all you're doing is arrays. The usual workflow for a lot of people is: edit script.jl, alt tab to terminal, press up key and enter to run include("script.jl"), maybe print a few variables, alt tab back, repeat. The transition from this style to functions is annoying because it makes it much harder to examine the things you want to examine. This is somewhat helped by the debugger and the awesome Infiltrator.jl, but it's still not as convenient as just having everything in a script. This goes way beyond the first week, and the lack of support in julia for this workflow (eg slow global variables, with const as the only workaround) is actually one of the things I (and several people I know) find most annoying with julia. I don't see any reason why Julia couldn't support this, while gently nudging users towards proper functions, modules and packages.

@JeffBezanson
Member Author

There are a lot of things you can do without having to write a single function, and I would imagine much of data science happens in scripts. It's not necessarily bad performance if all you're doing is arrays.

I agree with that, but these things aren't mutually exclusive. Nobody is saying you should never write scripts. Rather, my position is that a mixture of (1) maybe put some code in functions, (2) avoid top-level loops since we're mostly calling library functions anyway, and (3) maybe write global a couple times if you insist on having a top-level loop, provides an acceptable trade-off. IMO it's not worth upending the language just to delete a couple occurrences of global inside loops in scripts --- in fact I thought the nagging presence of those global keywords was not a bad way to gently nudge people to functions, as you say. I guess it was not gentle enough.

eg slow global variables

Let me drill down on this a bit. Would you want global variables exactly as they are today, just faster? Or would "script-local" variables do; e.g. implicitly wrapping the script in function main()? If using global variables for debugging purposes, the issue is that it's not generally possible to have full performance and a good debugging experience. Even in C++ etc. you have to disable optimizations to get really good debugging.

@antoine-levitt
Contributor

I agree with that, but these things aren't mutually exclusive. Nobody is saying you should never write scripts. Rather, my position is that a mixture of (1) maybe put some code in functions, (2) avoid top-level loops since we're mostly calling library functions anyway, and (3) maybe write global a couple times if you insist on having a top-level loop, provides an acceptable trade-off. IMO it's not worth upending the language just to delete a couple occurrences of global inside loops in scripts --- in fact I thought the nagging presence of those global keywords was not a bad way to gently nudge people to functions, as you say. I guess it was not gentle enough.

I'm completely fine with having globals. I think what people object to most is the surprise element, and to me a very acceptable way to close this issue could be just adding an explicit warning (something like Warning: the binding for x in this loop shadows the global binding for x; use global to access the global variable). That feels like the right balance between nudging and being helpful (but it's a personal opinion of course, and I'm sure many feel differently).

Let me drill down on this a bit. Would you want global variables exactly as they are today, just faster? Or would "script-local" variables do

What I really want is to be able to define ten numerical parameters at the beginning of my file and refer to them in functions without a prohibitive speed penalty. Right now I do const on all of them, which 1) is ugly, 2) gives me a warning each time I replace eg const x = [1.0; 2.0] by const x = [2.0; 3.0], 3) makes me restart my session because I've written const T = 10 and then want to look at what happens at time 10.5, and 4) makes me restart my session because I have one script where I do const t = 3.4 because t is a hopping parameter and then I switch to another project where I do const t = range(0, 1, length=N) because t is time. I know the "proper" way to do that is to pass the parameters around to functions, but it's just too annoying when you have a lot of them. Eg the code on which I've been working for a paper, and which is typical of this type of work for me, is one file with 250 lines, 15 parameters and 15 functions, some of which have tight loops, and every function uses a random subset of the parameters. For this kind of task, I just don't see any alternative that would be as convenient. Wrapping things in a function main() means I can't leisurely look at the outputs of my simulation in the REPL, which is bad. If it's really not possible for globals to be fast and debuggable, that's just life and it definitely won't stop me from using julia, but I was under the impression from #8870 (comment) that it could be possible with some wizardry.

@JeffBezanson
Member Author

If it's really not possible for globals to be fast and debuggable, that's just life and it definitely won't stop me from using julia, but I was under the impression from #8870 (comment) that it could be possible with some wizardry.

They can definitely be faster, just not 100%. In the interim, by far the best thing to do is to annotate the type of the global at the point of use when it occurs in performance-critical code. That will give you nearly full performance, and you can still reassign the variable with no warning (you just might get a type error when running that particular function).
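A minimal sketch of that suggestion, with hypothetical parameter and function names: the global stays non-`const`, so it can be reassigned freely, and the `::Int` assertion at the point of use lets the compiler specialize the code after it. Reassigning the global to a value of another type then shows up as a `TypeError` when the function runs.

```julia
T = 10   # ordinary (non-const) global parameter

function nsteps(dt)
    t_end = T::Int               # type-assert the global where it is used
    return round(Int, t_end / dt)
end

println(nsteps(0.5))             # 20

T = 10.5                         # reassigning is allowed: no const-redefinition warning
try
    nsteps(0.5)
catch err
    println(err isa TypeError)   # true: T is no longer an Int
end
```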

I can also imagine a utility here that runs a script by wrapping it in function main(), but also exports its variables to global at the end, so you can see all the values in the REPL after it runs.
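One possible shape for such a utility, sketched under assumptions: `main` and `export_locals!` are hypothetical names, the script body is written by hand inside `main`, and `Base.@locals` (available since Julia 1.1) collects the locals defined at that point into a `Dict{Symbol,Any}`, which is then copied into global bindings. It does nothing about interruption with Ctrl-C.

```julia
# Hypothetical script body, written inside a function so its variables are locals.
function main()
    total = 0
    for i in 1:10
        total += i            # updates the function-local `total`; no `global` needed
    end
    scaled = 2 * total
    return Base.@locals()     # Dict{Symbol,Any} of the locals defined here
end

# Hypothetical helper: copy each local into a global binding of module `mod`.
function export_locals!(mod::Module, vars::AbstractDict{Symbol})
    for (name, value) in vars
        # QuoteNode keeps `value` from being re-evaluated as code (e.g. a Symbol).
        Core.eval(mod, :($name = $(QuoteNode(value))))
    end
    return mod
end

export_locals!(@__MODULE__, main())
println(total, " ", scaled)   # 55 110
```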

something like Warning: the binding for x in this loop shadows the global binding for x; use global to access the global variable

This is a possible transition strategy that I think has been discussed. It could work either way --- e.g. we could switch to global-by-default, and warn people to add local declarations before then.
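For example, code that really does want a fresh loop-local variable with the same name as a global can state that with `local`, which means the same thing whichever default applies:

```julia
aux = 1
for i = 1:10
    local aux = i    # explicitly a new binding, local to the loop body
end
println(aux)         # 1: the global was never touched
```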

@antoine-levitt
Contributor

They can definitely be faster, just not 100%. In the interim, by far the best thing to do is to annotate the type of the global at the point of use when it occurs in performance-critical code. That will give you nearly full performance, and you can still reassign the variable with no warning (you just might get a type error when running that particular function).

It's still pretty annoying for complicated mathematical expressions, but yeah that's another workaround that at least doesn't abuse the const mechanism (it's mentioned in the performance tips, looks like I skipped that one...)

I can also imagine a utility here that runs a script by wrapping it in function main(), but also exports its variables to global at the end, so you can see all the values in the REPL after it runs.

That could maybe work but it'd have to be pretty careful to be robust against ctrl-c (interrupting computations that run too long and looking at intermediate results is a common pattern) and not to hit the closure performance gotcha.

@jlperla
Contributor

jlperla commented Nov 18, 2019

Nobody is saying you should never write scripts. Rather, my position is that a mixture of (1) maybe put some code in functions, (2) avoid top-level loops since we're mostly calling library functions anyway, and (3) maybe write global a couple times if you insist on having a top-level loop, provides an acceptable trade-off. IMO it's not worth upending the language just to delete a couple occurrences of global inside loops in scripts

It isn't the typing, it is the confusion for non-programmers - especially when it doesn't fit their intuition of what constitutes a script, and previous experiences with languages.

My feeling is that a language doesn't really support scripting if you have to annotate something as a global and it isn't "really" a global (i.e. it is "script-local" and is never accessed outside of the script, or from within a non-closure function). This is the reason people find it so confusing, and matlab or IJulia scoping so intuitive. Users don't think of these as global variables, and soft-scoping was letting them pretend they weren't.

Or would "script-local" variables do; e.g. implicitly wrapping the script in function main()

I think that is exactly the right mental model (and the exact scoping rule in some scripting languages such as matlab and python, as far as I can tell). The code is wrapped in a big let before it is run. Then, if someone declares something as global they really mean it, and it should smell.

In fact, a benefit of having a softscope that is equivalent to "wrapping a script in a main()" is that you can manually do exactly that. Get your code working as a script, and then literally just put a function around the whole block of code and call it, without any modifications, when you need more performance.
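A small sketch of that workflow, with a hypothetical loop body: the accumulation that would need `global` in a top-level loop under the 1.0 rules runs unchanged once the whole block is wrapped in a function, because `acc` becomes a local of `main`.

```julia
# Script-style body wrapped in a function; `acc` is local to `main`,
# so the loop updates it without any `global` annotation.
function main()
    acc = 0.0
    for x in (0.5, 1.5, 2.0)
        acc += x^2
    end
    return acc
end

println(main())   # 6.5
```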

They can definitely be faster, just not 100%.

I think the speed of using global variables in Julia is a separate issue. Slow global variables are not a big issue, since you can easily tell people that they need to put code in functions when they need it to be fast. It is a teachable heuristic, even to the many scientists/economists/etc. who would be scared if you started talking about "scope" (which someone coming from matlab would probably never have thought about, since using global variables (as opposed to script-level variables) is heavily discouraged in matlab).

In a hypothetical world of "script-local" variables and true globals (which is my dream!) then people would start expecting the script-local to be faster in certain circumstances, but that could wait.

@JeffBezanson
Member Author

then people would start expecting the script-local to be faster in certain circumstances, but that could wait.

Perhaps stating the obvious, if we did something like wrapping the script in let or function main(), it would be much faster immediately.

Maybe we should pursue this file-local scope thing seriously.

@jlperla
Contributor

jlperla commented Nov 18, 2019

Maybe we should pursue this file-local scope thing seriously.

If it is feasible, then I think that is the ideal solution. It fits the mental model people have. Then the ugly annotation in v1.x could be something basically declaring a .jl as a script with a file-local scope... and decisions could be made later of whether that should be default in some circumstances.

I think you could even have the annotation around the whole code, i.e. myscript.jl becomes

@scriptlocal begin
    # ... CODE
end

where the tooling could be taught to ignore the macro with shift-enter?
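A minimal sketch of how such an annotation could be defined in user code today, assuming it simply wraps the body in a `let` block. The macro name comes from the proposal above and does not exist in Base. Variables assigned inside become locals of the `let`, so loops update them without `global`, and nothing leaks into the global namespace.

```julia
# Hypothetical macro: evaluate `body` inside a fresh `let` scope.
macro scriptlocal(body)
    # `let; body; end` has an empty binding block as its first argument.
    return esc(Expr(:let, Expr(:block), body))
end

result = @scriptlocal begin
    aux = 1
    for i = 1:10
        aux = i      # updates the let-local `aux`
    end
    aux              # value of the whole block
end

println(result)           # 10
println(@isdefined(aux))  # false: `aux` never became a global
```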

My guess is that the REPL should have the current scoping fudge as well to make integration with Juno/vscode intuitive (intuitively, tell people that they should think of the REPL as having something equivalent to a file-local scope, even if the performance is sometimes different).

After that, anything which is annotated as a global REALLY means a global.

@rapus95
Contributor

rapus95 commented Jan 28, 2020

@JeffBezanson So if I understand it correctly, there are some signs in the AST (according to 1.) but the only way to use them for 3. would be to create a hook which the AST will be passed to when it already holds the corresponding nodes. Is that possible/Is there a step in the compilation pipeline in which those will always be visible if set?

@vtjnash
Member

vtjnash commented Jan 28, 2020

Is it in a shape that current tooling (especially juno & vscode) can help exploring which scoping is used in which position?

This seems like it'd be the main reason to further alter this code and lift the warning outside of flisp entirely. It's already nearly there now; there's just not an API to separately access the list of warnings (either they get immediately printed or they get discarded). Probably doesn't really affect this PR right now, since that's largely just an internal question. But some REPLs in Base currently do explicitly separate the calls to Core.eval(Meta.lower(Meta.parse(code))), I'm guessing as a way to separate the source of warnings and errors between stages (parse / macros / lowering / eval / show), though it doesn't end up reporting them any differently right now.
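For readers unfamiliar with that split, a minimal sketch of running the stages separately on a string of code; the sandbox module and code string are illustrative. `Meta.lower` is where scope resolution happens, so a scope-related warning would come from that step rather than from parsing or evaluation.

```julia
mod = Module()                     # sandbox module for the evaluated code
Core.eval(mod, :(s = 0))

code = "for i = 1:3\n    global s += i\nend"
ex = Meta.parse(code)              # stage 1: string -> surface AST (Expr)
lowered = Meta.lower(mod, ex)      # stage 2: macro expansion + lowering (scope resolution)
println(lowered.head)              # thunk
Core.eval(mod, lowered)            # stage 3: evaluate the lowered code
println(Core.eval(mod, :s))        # 6
```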

@JeffBezanson
Member Author

Is that possible/Is there a step in the compilation pipeline in which those will always be visible if set?

I don't think this is the right question --- the interactive front-end is what decides whether to add the :softscope annotation, so it already knows whether it has been added.

@StefanKarpinski
Member

OMG, finally.

@AzamatB
Contributor

AzamatB commented Jan 28, 2020

This messed up the sections of the documentation that it modified. E.g.:
[Screenshot of the affected documentation section, Jan 29, 2020]

@StefanKarpinski
Member

There's a space in front of one of the closing triple backticks which most markdown implementations seem to be fine with but Julia's built-in markdown parser doesn't like. I've added a fix to #34558.

@rapus95
Contributor

rapus95 commented Jan 29, 2020

I don't think this is the right question --- the interactive front-end is what decides whether to add the :softscope annotation, so it already knows whether it has been added.

Well sure, if the frontend is what does the analysis that works well. But what I was referring to is that then, there is no passive way to observe the scoping without actually influencing it. Feels like quantum behaviour. 😁 But as it is escalating into nonsense now, I might rather wait for solutions/ideas to come up in the tooling of Julia before asking such questions. Thanks!

@JeffBezanson
Member Author

What analysis? Given an expression, you either add softscope to it or you don't, before handing it to the compiler. If you don't add it, you get the default 1.0 behavior, so if you don't touch anything it's safe to assume that.

@StefanKarpinski
Member

I think the analysis he means is which assignments may be ambiguous.


@JuliaLang locked this conversation as resolved and limited it to collaborators on Feb 16, 2022
@vtjnash
Member

vtjnash commented Feb 16, 2022

Since when did for loops start to have separate scopes?

They always did. Though this was a common misunderstanding prior to this change, because of some special toplevel rules that sometimes could make it appear as if they did not.
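A quick way to see that a `for` loop has its own scope even at top level, in any Julia 1.x session or script: a variable first assigned inside the body is not visible after the loop.

```julia
for j in 1:3
    y = j^2               # `y` is new here, so it is local to the loop body
end
println(@isdefined(y))    # false
```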

@StefanKarpinski
Member

If anyone wants to discuss this further, they can do so on discourse.

Labels: compiler:lowering, existential crisis, minor change, REPL

Linked issue: Global variable scope rules lead to unintuitive behavior at the REPL/notebook