-
Notifications
You must be signed in to change notification settings - Fork 423
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change the behavior for multiple modules in a file? #13977
Comments
There's also a side concern with error handling and how implicit-modules-are-prototype-modules today. |
Also, what to do about privacy specifiers related to #13524 for the files-are-implicit-modules approach.. |
Note that one common situation with Example 1 (at least in the test system) is with e.g. some-test.chpl. Here One way to handle it would be to allow I'm not sure what the right direction here is. |
Another way would be to replace hyphens with underscores: It also fits more in-line with common GitHub project names since people tend to prefer hyphens. |
I think it's fair to find the behavior change confusing when a single module scope statement is added. I don't know that it means the current behavior is wrong, though. There's always going to be a jump - would it be better to have: M.chpl module M {
var x: int;
...
} mean you have to M.chpl module M { ... }
module N { ... } suddenly cause an outer module M to appear? Or for M.chpl module P {
var x: int;
} to mean there is no module M at all, just a module P? And sure, having the declared module at the top level of the file differ from the file name means you have to look in the file to determine what to use, but that seems better than having the behavior of this situation differ depending on if you're on a filesystem that is case insensitive: M.chpl // lower case module name, upper case filename
// Requires `use M.m;`? Or `use m.m;`? Or sometimes `use M.m;` and sometimes `use m;`?
// Or just always `use m;`?
module m {
var x: int;
} |
Michael rightly pointed out that the first single module case I added was a bit confusing. To add some additional context - I'm following a bit of a slippery slope argument to show that there isn't a single answer that will make things consistent. If the presence of two sibling modules at the top level necessitates the inclusion of a file-level wrapper module, why shouldn't the same be true of a single module? I don't think this is something we want for the reasons I've stated, but it's a case where the behavior changes and the motivation for it to do so seems reasonable. Is this a case of "1 module -> 2 modules, and therefore N modules -> N + 1 modules"? Not necessarily, but the rule is more strongly connected than "1 module -> 1 module and a different statement entirely" |
I think we should just make the tricky cases errors: M.chpl storing The only way for it not to be an error is if:
|
I'm not at all on-board with the direction the proposals in this issue are taking. I don't think that the status quo is as bad or broken as suggested here, and I think there are things we could do (some simple, some only slightly less so) to make it better. Jumping directly to Michael's proposal just above, I find it problematic for a few reasons (many of which apply to the more general thread of conversation taking place here):
I would be open to adding an opt-in flag to the compiler that would warn or error when it encountered code that didn't follow a convention like this to ease the ability of a user or community who liked this rule to enforce it in their code bases. But I don't think it should be the default behavior of the compiler, nor how the language is defined. |
@bradcray - I'm somewhat convinced that just making all of the challenging cases errors here isn't the right move. I copied a Chapel program to file-1.20.chpl recently... (but I think such things occur more with implicit file-level modules than with libraries proper). But I think there are other proposals in line with this issue. To take one example, @BryantLam was proposing above that
creates M.L and M.N. This would match better what some of us expected (e.g. me, vass ). Right now, the compiler adds an implicit module if there are any top-level statements other than module declarations and comments. The change would be to make it add an implicit module if there are any top-level statements other than comments and a single module declaration. How do you feel about that change? (I am intentionally leaving out the issue of what happens if the top level module name does not match; let's discuss the more first adjustment first). |
Negatively. I think having the compiler add an implicit module declaration should be the exceptional case that is supported only for convenience when the user doesn't bother adding one, not the rule. Top-level modules vs. sub-modules can result in distinct behaviors in certain cases. For example, if the two modules in your example were in different files and each declared a I don't particularly like this proposal (mostly because of the work it would involve), but just to throw it out there as part of the exploration: I'd much rather say that the implicit modules introduced by the compiler have no name that the user can refer to if it seems too confusing that sometimes M.chpl introduces a module named M and sometimes it doesn't. That would prevent anyone from ever relying on it having a certain name (and would also handle cases that we currently can't name such as filenames that don't have legal identifiers).
I'm unclear what the concern about parsing files is: (a) it doesn't have to be a full parse, it could just run the scanner looking for module statements outside of comments and strings. This would be quite fast (not that parsing is currently particularly slow, especially when you squash the building of the AST); (b) if that amount of time still gave people concerns, the result of the parse could be dropped into an index file in the directory that would only be incrementally updated when its timestamp lagged that of the source. For reasonably static library directories this would only require a scan when it was first installed (or maybe the installation could even carry the index files with it). |
I'm intrigued by this idea. How would you / could you call a function that was defined at module-level? Are the implicit modules still considered modules, or are their submodules inlined so that parent module can still access that submodule? I.e., // M.chpl
module M {
import L;
L.fn(); // [[1]]
}
// Submodule.chpl
module L { // Is L "inlined" into the anonymous implicit module?
config var someOtherConfigVar;
proc fn() { ... } // ... so that L.<symbol> is called at [[1]]?
}
config var someConfigVar; // Is this settable?
proc someFunc() { ... } // Is this callable?
writeln(...);
"Parse" was an incorrect choice of word. Python has bad filesystem behavior at scale because it searches everywhere for code on an (b) Regarding the index file,
(I'm not saying the index file is a bad idea since a variant may be needed for incremental compilation.) |
@bradcray - Another issue here is that I find the change in whether or not the implicit module is added, between
and
confusing. What if we made the implicit module only appear if the file did not contain any (top-level) module statements at all? Then the first case still behaves as you are hoping, and the 2nd case would just become an error (which it probably is anyway).
I'm not saying I'm for or against that but one of the sources of confusion is the difference in behavior between top-level modules and sub-modules (e.g. the example here that might still be confusing even if you couldn't refer to the implicit top-level module. |
In my mental model, I don't think you would be able to call such a function from outside of that module. Essentially, the module's contents wouldn't be accessible outside of the module because the module could not be named.
I'd still consider them modules, simply ones whose names can't be spoken.
I wonder if the situation might be worse for Python than for Chapel since, as an interpreted language, this searching goes on during execution time rather than at compile-time?
I'm currently considering this issue independently of any submodule directory feature, so would say that it would consider all of the
If a user didn't know what module
I'd expect the compiler to complain about the ambiguity and to require the user to disambiguate either by specifying one of the files directly on the command-line, or by removing the others from the search path. Again, though, I'm not a big fan of this proposal, just throwing it out as food for thought. |
My preference would be to extend the current warning we have so that it fires whenever there is a mix of module declaration statements and any other statements at the top-level, to make the user aware of the introduction of the implicit module (and possible mistake on their part). To me that feels like the behavior that is the most consistent, the easiest to implement, and the smallest change from the status quo. (e.g., could we generalize and reword the existing warning in such a way that someone who typed your example would not be confused? For example, "All Chapel code must be contained within a module, so an implicit module named |
I don't see why that's a bad thing. If modules are the same as files, then the module name is apparent in the filename. It makes it easier for the user to find the source of a line of code if they know the filename it comes from.
This is true, though not to be understated even if
I don't want Fortunately,
Isn't forcing the user to remove the conflicting files from the search path considered bad behavior? I certainly wouldn't care for it if I had to tinker on a module variant. Having a user create a symlink seems much more tenable for this corner case.
I don't personally like this approach, but it'd be fine with me. Wouldn't this also affect TIO code though? (TIO would always emit a warning.) I assert that the most reasonable approach is to create an implicit file-level module with an identifier equal to the filename, unless the file has one and only one module declaration of the same name. This covers the TIO case and is unambiguous how the compiler can find it. |
Note that the proposal I was replying to in that comment was not your original proposal (files are always modules), but Michael's (files are only modules when there are top-level statements other than comments and a single module declaration).
I hear you, but just to make sure it's not lost: it wouldn't parse every file every time, just when the file had been modified since the last run. And it seems like it could also remove many files from needing to be parsed by grepping for the module name in the file. I'm also assuming in this proposal that module paths are typically short (specifically, I almost never use them, and am assuming that mason packages won't need to either).
It would if the TIO code contained both one or more explicit
Sorry, I'm just not a fan. |
Sorry; again, "parse" was the wrong word. I'm looking at filesystem behavior. From the truncated strace output of Running a The behavior you're suggesting is O(n) with the number of modules instead of the compiler's current searching method of O(1). |
For the modules directories that are packaged with Chapel, I'm imagining that we would either (a) pre-populate the directories with index information saying what is contained in each file or (more likely) (b) require that filenames and module names match.
For mason packages, I'm imagining a world similar to the above: that either the package would itself contain any directory information required to find where things live or that mason packages would require filenames and module names to match. E.g., that the creation and establishment of the mason package would require one of these things to happen in order to be accepted as a legal package. In the former model, I could imagine that the directory information could be rolled up into a top-level cross-package mason index such that only one file would need to be checked for all mason packages in a hierarchy rather than one per package. Again, though, I'm not expert in existing package managers, so if those ideas don't make sense for some reason, let me know. (I almost posted something to this effect last night: that it's hard for me to link this issue back to the mason world very strongly because it seems to me that we can adopt much more strict or controlled requirements for a mason package without pushing that requirement onto every piece of code a user might write or sketch out quickly. It's also why I said that I tend to assume that search paths are fairly short—where I'm thinking of the user's manually search path, not directories that might be used for mason packages). |
I obviously prefer (b) approach over option (a). From a user-created project perspective, in 99% of the cases where people split a module into a separate file, they will name it the same. This issue discusses how to resolve the 1% of cases where it isn't true. I don't think Chapel should take option (a) because it's designing around a corner case. For (a),
Suffice to say, I'm skeptical of any approach that isn't a one-to-one correspondence with files as a module. Is there a language that was invented after 2000 not do option (b)? — Python, Rust, and Swift all use option (b) to find a module. It isn't a coincidence that all three of those languages also rely heavily on their package managers for success. |
@BryantLam - I feel that your last post didn't really answer @bradcray's main thought above, which I would rephrase as this question: "Is it sufficient for mason packages to enforce that top-level modules have file names that match even if the language itself does not mandate it?" |
My response is "no" strictly on the notion of not having the compiler become more complicated with how it searches and finds modules. There's also more complexity to a related concern with how Chapel today doesn't have a notion of packages (modules between packages can have conflicting identifiers). #12923 |
After re-reading @bradcray's #13977 (comment), here's another interpretation:
The merits of a mythical bug are impossible to disprove. With that said, if the bug is reproducible with two submodules, that would be sufficient. If the bug only manifests with two top-level modules, it is possible that it would not be reproducible in TIO anyway because of externals like With #13524, many bugs that would appear with two top-level module files would also appear with two submodule files. Using TIO to experiment with submodules is still possible. Ideally, there isn't a difference with experimenting between top-level modules and submodules in TIO.
I assume when you say "recompile", you mean something like An alternative is to use a symlink to
See previous about using a symlink to pick the canonical version that you are building your source with. Bonus: if you ever send that source to other users or if someone is working with you in the same working directory (pair programming), your colleague doesn't have to guess which
This is already not true today with module-level code creating an implicit module. |
Sorry for the hiccup in the conversation.
For mason packages, I was imagining that this index would be generated when the package was installed; or alternatively (but less ideally, I think), when it was published. Outside of mason packages, I'd imagine it to be generated by the compiler only when necessary (it can't find a given module; the index file for a given directory is out of date w.r.t. the source files).
At least some of the module / namespace proposals under consideration distinguish between top-level modules and sub-modules. For example, one proposal is that users have to always specify a full module path from top-level module to sub-module. This would make a program with top-level modules A and B behave differently than one in which they were sub-modules Filename.A and Filename.B (specifically,
It's true that this is an exception to my philosophy, although it's generally been a popular one due to its convenience and has been true since Chapel's origins. This exception is partially what led me to propose not having the file-scope module have a well-defined name (though I still find it more useful for it to do so to preserve the current behavior, and it also has the advantage of not requiring all existing tests that rely on it to be rewritten). Another possibility that I keep waiting to see if someone will propose is that when looking for a module named |
Catching a breath, here are the major approaches that I think have been proposed or discussed on this issue. If you think I've missed one, or want to propose another, or have another favorite example in mind, let me know and I'll add it:
By example: F.chpl
Every filename a module: Creates F F.chpl
Every filename a module: Creates F.F F.chpl
Every filename a module: Creates F.M F-1-1.chpl
Every filename a module: Creates F-1-1 F.chpl
Every filename a module: Creates F.M and F.N F.chpl
Every filename a module: Creates F.F and F.H F.chpl
Every filename a module: Creates F and F.M F-1-1.chpl
Every filename a module: Creates F-1-1 and F-1-1.M |
Since my taxonomy hasn't resulted in any comments yet, I thought I'd give my opinion about the approaches:
It's probably obvious by now, but w.r.t. the OP, I don't think the current behavior is particularly broken nor as complicated to understand as the issue suggests, where the guiding philosophy is "all Chapel code must be in some module, so the compiler will introduce a module for you in cases that you needed one but didn't do so." If the case of mixing file-scope code with explicit modules is considered to be too confusing (or a sign of a likely mistake on the user's part?), I think the proposed warning will help keep people on track while also serving the purpose of reinforcing the guiding philosophy. Michael has taken a stab at implementing such a warning in #14292. The other topic that came up in this issue which I think warrants more attention was the question of "How does the compiler find a module that it doesn't already know about?" and I think that's an interesting and important question, particularly given Bryant's assertion that we should never rely on O(size of all files in path) approaches (which I was originally proposing, but am now being more mindful of). I think we've kicked around some useful seedling ideas for that question here, and think we should fork off a separate thread to summarize and continue that part of the discussion. But in the meantime, I'd like to resolve the OP here if possible. |
What to do about this filename is arguably for another issue. One interpretation is to transform the minus (and only the minus) character into an underscore. This is the approach that Rust takes because (1) GitHub projects like using minuses a lot and (2) if you need to get to a submodule inside this file, there's no way to do it with this commonly used illegal identifier, so transform it into a legal identifier. Other illegal identifiers are still illegal and are not transformed.
The current behavior with multiple modules in a file is arguably okay despite some confusion it causes even to core Chapel developers, but it isn't okay when talking about the premise that this issue is derived from: submodules in different files #13524. Given that the two (or more) pieces are linked, I'm personally not comfortable with any resolution until there's a cohesive full-picture story for how the current behavior can work with that assertion and the assertion that an index file is unacceptabie. I really dislike cache and index files when I know they aren't simply necessary. Edit: An index file is equivalent to a Chapel-specific implementation of a filesystem. The index is the metadata that holds the "inodes" (files) to where code lives. Fundamentally, trying to separate a language from the real-world environment of filesystems is simply not attractive to me. It's an unnecessary level of mental indirection for users and compilers to find where code lives and it divorces a programming language from its effective use (a packaging+distribution system), further hampering a user's on-ramp. |
I think there are some philosophical discussions here that might be worth discussing on their own. I'm not sure what the most effective way to do that is, but I feel that we keep running into Philosophy A vs Philosophy B in these discussions.
I disagree. I think that we can make submodules in different files #13524 work reasonably well with something along the lines of the current behavior. I would simply expect that submodules-in-different-files would have some restrictions about naming. (E.g. for the F.chpl containing
Just to be clear, the compiler doesn't use such an approach today, right? (It might search all path components for Something.chpl, but it doesn't open all files in all path directories). |
I'm willing to be flexible if there's a design that can prove my assertion wrong. That said, the current behavior isn't acceptable to me. What do you think these two programs do: // In file: F.chpl
module F {}
module M {
compilerError("M in F.chpl");
}
// In file: M.chpl
module M {
compilerError("M in M.chpl");
}
// In file: main1.chpl
// compiled with: chpl main1.chpl
use F;
use M;
// In file: main2.chpl
// compiled with: chpl main2.chpl
use M;
use F; AnswersAnswer for main1.chpl:
Answer for main2.chpl
I'd like to think I know enough about the Chapel compiler looks for modules, but I couldn't intuit what was the behavior of these programs without trying it. Another example. Same as above, except M.chpl now only defines `P`.// In file: F.chpl
module F {}
module M {
compilerError("M in F.chpl");
}
// In file: M.chpl
module P {
compilerError("P in M.chpl");
}
// In file: main1.chpl
// compiled with: chpl main1.chpl
use F;
use M;
// In file: main2.chpl
// compiled with: chpl main2.chpl
use M;
use F; Answer for main1.chpl and main2.chpl are the same:
This is not a surprising result, but it is surprising to a developer. Who would have thought that M.chpl wouldn't have module M and they now need to grep for it in the entire source tree. Maybe this flexibility is warranted for some use case or could be considered bad practice. If Chapel isn't going to make this bad practice illegal, then there better be a really good design to motivate allowing it. |
@BryantLam - Couldn't the above examples be made to function in an obvious manner with relatively modest adjustments? In particular, if you have a
I think that these would be consistent with restrictions that exist now and also with the proposed restriction for submodules-in-different-files that I mentioned earlier:
In particular one could still create M-1.chpl containing |
From #13977 (comment), are you proposing the following behavior?
I think you're proposing that top-level modules and submodules-in-files will have different behaviors. I'm okay with this, though @bradcray may have a differing opinion. In general, I worry about the confusion, but if more allowed cases turn into errors, the compiler can provide fix-it hints. (Also, there's a slight concern with how searching works with (And maybe another concern regarding "more binaries". E.g., if |
No, not right now. Right now I'm just proposing that It's subtle but the key difference between this current proposal and previous ones is that it doesn't rule out any of the various scenarios for files and modules (e.g. |
I'm not sure I follow the distinction. Is your modest proposal of specifying the
If so, that's fine to me. I don't know how useful it actually is since the other proposed designs are still moving. |
I don't think so. But at this moment I'm not sure what else I could explain to clarify. Maybe an example would help? // In file: F.chpl
module F {}
module M {
compilerError("M in F.chpl");
}
// In file: main1.chpl
// compiled with: chpl main1.chpl
use F; // Error: `use F` needed to open `F.chpl` but `F.chpl` contains more than just `module M` But |
Wow that is subtle. So you're proposing that ((placeholder: most of the behaviors I want)) will be true / the default, but the user can override by specifying the filename directly on the Regarding #13524, naming an explicit submodule file via |
I don't think we have to allow command-line overriding of submodules at all. Sure, we could consider such a feature, but I don't think it's in any way essential for the proposal. For one thing, AFAIK, there is no expectation/use case for overriding submodules within a single file. I'm saying that we should consider that as a separate add-on feature and that we don't need to constrain our design thinking around submodules-in-different-files with it. (I am aware there is some desire to be able to replace standard modules with other versions - but even with submodules-in-different-files I think that could work with special additions to the module search path). |
I agree. I bring it up as a consideration because I don't see the desire to replace some |
Add warning for more implicit module cases This PR adds the warning suggested in #13977 (comment) (Note that we might choose to go further in adjusting implicit modules as #13977 discusses, but it seems that this addition helps the current situation and can be more easily agreed upon). For example ``` chapel // my-test.chpl writeln("X"); module M { writeln("hello"); } ``` my-test.chpl:2: warning: This file-scope code is outside of any explicit module declarations (e.g., module M), so an implicit module named 'code-outside-of-module-error' is being introduced to contain the file's contents. This PR updates ~150 tests for the new behavior. * primers and chpldoc tests include the warning in the .good * most tests gained a module OuterModule { } * some tests used a particular module name Reviewed by @lydia-duncan and discussed with @bradcray - thanks! - [x] full local testing
I'd like to propose that we continue to stick with the status quo here and retire this issue. @BryantLam, any chance I can convince you of that? |
Just to be clear - IIRC we have added some compiler warnings beyond what we had when this issue was filed, so the current state is probably better than it was. |
Impacts #13948
Forked from #13524 (comment) — description in first list, 1i; or the text below copied from end of that post:
There is one assumption in some of my counter arguments: files are modules. The overall rationale is sound because it is true in most situations, but there are some behaviors today that make it not true.
These current behaviors would have to be re-evaluated or changed. (This is for this issue.)
I copied the examples from the linked posts for further discussion.
Example 1
M.chpl:
Example 2
From @bradcray:
Example 3a
M.chpl:
(Omitting 3b because it's the same as 1, and 3c because it's the same as 3a.)
I believe the current behaviors for these examples should change. How these examples should change is an open question. With the addition of #13524, it will get more confusing when the compiler starts enforcing a module hierarchy; however, independent of that issue, the current behavior is already confusing to me.
Notably, Example 2 stands out to me because the module hierarchy is allowed to subtly and significantly change with inclusion of module-level statements anywhere in that file. I personally don't like behaviors like that because my hierarchy changes if I add/remove what feels like code that is independent of module declarations. Also, the explanation provided in Example 2 is just non-intuitive if the scope increases beyond this toy example into an application that has multiple dependencies.
Because such code in Example 2 already creates implicit modules, a satisfactory resolution would be to always create implicit modules (i.e., files-as-modules), which immediately answers what Example 1 and Example 3a should do.
It comes back around to whether that behavior could be seen as intuitive or not for Examples 1 and 3a. It would be reasonable for these to be errors if we take my suggestion (from #13524 (comment)) of simply enforcing the one, true module declaration in a file.
The one, true module declaration isn't ideal because:
sed
'd to fix their module identifier or the compiler won't allow it (if Example 1 is turned into a compile-time error)But there is one benefit: it makes it obvious to the reader that there is a module declaration that is this file without knowing the language. It does not benefit users that know this rule, but it would be a helpful guide for users unfamiliar with Chapel.
Of course, the con is that it's another special case, and I dislike special cases. I would prefer files-are-implicit-modules as the solution, possibly with some helpful style warnings for Example 3a because that's the case that is most likely to be unintended among the three examples. (3a would be an implicit module M that defines submodules named M and Q.)
The text was updated successfully, but these errors were encountered: