Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

duplicate module names in mason packages #8470

Closed
mppf opened this issue Feb 14, 2018 · 14 comments
Closed

duplicate module names in mason packages #8470

mppf opened this issue Feb 14, 2018 · 14 comments

Comments

@mppf
Copy link
Member

mppf commented Feb 14, 2018

This issue is a spin-off of issue #7847.

The problem is that 2 mason packages can both use a module name (say, Core for example). This causes problems if both of these mason packages are used as libraries by a single application, or if one of the mason packages depends upon the other.

For a complete mason example, consider:

https://github.com/mppf/ProjA
https://github.com/mppf/ProjB

I didn't actually publish projB to the Mason registry - ProjA README has instructions for how to pretend. ProjA depends on ProjB, and both ProjA and ProjB have a module named Core. You should be able to reproduce the issue by pasting the commands from ProjA README.

I know of two potentially reasonable solutions to this problem (and will add more here if people come up with them):

  1. Do not allow mason packages to contain multiple top-level modules. Instead, a mason package must contain a single top-level module that matches the package name. Nested modules would be allowed. Use a file include capability to allow submodules to be stored in different files.

  2. Adjust the compiler to consider modules within a mason package more similarly to nested modules, where each module in a directory would be considered "nested" under a module with the directory name. Such a strategy might draw inspiration from how Python packages work.

These solutions rely on the fact that we are expecting mason packages to have globally-unique names (since there is just 1 official mason registry and it keys off of the package names).

@mppf
Copy link
Member Author

mppf commented Feb 14, 2018

Here is a "story" for why this issue prevents Mason packages from being reasonable "libraries":

Suppose we have two libraries, in the form of mason packages, called SuperLib and NanoLib. Further suppose that these are developed by different groups that are unaware of each other.

Suppose there is an application using both of these.

... use SuperLib, NanoLib ...

Further, suppose that each of these packages consists of a main module and a helper module:

  package SuperLib
     module SuperLib
     module Helper
  package NanoLib
     module NanoLib
     module Impl

OK, application can use both of these modules, everything can work.

Now suppose that SuperLib developer adds a module called "Impl" to add some new component. Then, the application developer upgrades. Now we have

  package SuperLib
     module SuperLib
     module Helper
     module Impl
  package NanoLib
     module NanoLib
     module Impl

Now the application does code does not compile, because there are two Impl modules. But the SuperLib developer can't possibly know that some other library possibly used with SuperLib used that module name.

@mppf
Copy link
Member Author

mppf commented Feb 14, 2018

Here is a sketch / straw-man of how a Python-package-like nested module strategy might look:

A variation on that might be to have Mason rename all modules in sources that it deals with to munge their names with their path to guarantee / improve uniqueness.

Right, I think it'd be reasonable for Mason to convince the compiler to include the package name and version in the "module" the compiler actually works with. But another interesting idea would be to allow one to treat a directory as a module, so that all files inside of that directory would be considered nested modules. But we still have a problem in that the module searching has to respect a particular order that will make sense to Mason - in particular, to look in the package containing the current module first, and then the dependencies (possibly in a particular order).

E.g. in in the ProjA / ProjB example, what if we could "use" a directory?
It might look something like this:

  ProjA/Core.chpl
    ...
  ProjA/Main.chpl
    use Core, ProjB; // Note 2
    ...
  ProjB/Core.chpl
    ...
  ProjB/Main.chpl
    use Core; // Note 1
    ...

Note 1: Inside of ProjB/Main.chpl, the compiler needs to know to look first for Core (or any other module) in ProjB/.
Note 2: Inside of ProjA.chpl, the 'use' of ProjB is actually 'use'ing the directory ProjB/. The compiler would have some rule, like always load up 'Main.chpl' in the directory if present; all modules if not.

It seems to me that this proposal adds flexibility (you can now do something - use a directory - you couldn't before) while leaving the old functionality alone.

@BryantLam
Copy link

This is important to fix, whether through Mason or through the language.

Does it make sense to "wrap all files" in a package with a package-level module?

@bradcray
Copy link
Member

Further, suppose that each of these packages consists of a main module and a helper module:

@mppf: In your example, is there a compelling reason that Helper and Impl are top-level modules rather than being sub-modules of SuperLib and/or NanoLib? Put another way, do you think it's important that these were top-level modules in some way, or are we just positing protecting against a programmer who didn't do a good job of organizing their code?

@BryantLam
Copy link

BryantLam commented Aug 29, 2018

are we just positing protecting against a programmer who didn't do a good job of organizing their code?

Yep. Unless the language can enforce it, someone will do it. :)

File-level modules also cause this behavior by simply creating a file called "Helper.chpl" without an explicit file-level module in it.

@bradcray
Copy link
Member

someone will do it

But do you really want to be using that someone's packages? :D

This feels like more of a mason issue than a language issue to me, so tagging @ben-albrecht and @Spartee to make sure they're aware of this (I'm pretty sure they are).

@ben-albrecht
Copy link
Member

There has been a lot of discussion in this issue and #7847, so I think I am mostly reiterating certain points / opinions that have previously been made.

This feels like more of a mason issue than a language issue to me,

I consider this a language issue that mason happens to be exposing. It seems plausible that any sufficiently large Chapel code base could run into this problem. The original issue #7847 is an example of a user running into this problem in their code outside of mason.

In your example, is there a compelling reason that Helper and Impl are top-level modules rather than being sub-modules of SuperLib and/or NanoLib? Put another way, do you think it's important that these were top-level modules in some way, or are we just positing protecting against a programmer who didn't do a good job of organizing their code?

Breaking code up across files is generally considered a good practice for keeping a code base more readable and maintainable. Currently, all submodules must be contained within a single file, which means we'd be encouraging/enforcing bad coding practices for mason package developers if we required them to keep all code within submodules in their current form.

I think allowing submodules to live outside of the top-level module file is the most appealing solution that has been discussed so far. Submodules already provide a clear solution to this problem, i.e. conflicting names can be disambiguated with fully qualified identifiers like A.Utils.foo() and B.Utils.foo().

For mason, we could enforce packages to only contain a single top-level module. Everything else would need to be a submodule, but we could still allow for users to break up their code across multiple files/directories.

@bradcray
Copy link
Member

For mason, we could enforce packages to only contain a single top-level module.

That's essentially what Michael's solution 1 in the issue proposes (and it's my preference too). It's also why I consider this issue a mason issue and tagged you on it. It may be that the compiler has a role in helping mason answer this question, but I don't consider this issue to be a language issue itself. (If it said "I'd like to be able to declare two variables or modules at the same scope and have that work", I'd agree it's a language issue and say "We're not going to do that" and close it).

I consider this a language issue that mason happens to be exposing. It seems plausible that any sufficiently large Chapel code base could run into this problem.

I disagree. The intent of this issue seems to be "how do we prevent people from writing bad mason packages that will conflict with one another," which is why I think the issue is mason-specific (note that both of Michael's solutions refer to mason as well). The appropriate thing for the language to do when a user tries to define two conflicting symbols at the same scope is to complain about it, which we're already doing. For example, if I declare two variables at the same scope:

var x: int;

var x: real;

we appropriately say:

testit2.chpl:1: error: 'x' has multiple definitions, redefined at:
  testit2.chpl:3

And when compiling a program that has two modules of the same name (which is what this issue is concerned about):

module M {
  writeln("In module M");
}

module M {
  writeln("In module M again");
}

we say:

testit.chpl:1: error: 'M' has multiple definitions, redefined at:
  testit.chpl:5

And again, I think this is appropriate—I tried to compile a program with two modules with the same name at the same scope and that doesn't make sense. It's hard for me to imagine that the language should do any more than this.

Since the issue here seems to be asking how to make composition of two independent mason packages completely bullet-proof, I think the solutions are mason-centric and come down to:

(a) trust that authors of mason packages will not to do this (they'll only define a single top-level module), or
(b) have mason validate that there is only one top-level module and warn/error if that's not the case (essentially Michael's option 1), or potentially
(c) have mason push all top-level modules in a package into a new package-level module (essentially, Michael's option 2).

My opinions: (a) is the status quo and this issue implies that it isn't acceptable (though I'm personally OK with it); (b) seems like it should be pretty easy to implement to me, and corresponds to Michael's option 1 (and you seem open to it as well); (c) feels crufty to me and not as obvious how much work it would be, but I'm not categorically opposed to it.

Breaking code up across files is generally considered a good practice for keeping a code base more readable and maintainable.

I agree that this is desirable, but I think it's a separate concern (and one that's being discussed in issues #10909 and #10796).

@ben-albrecht
Copy link
Member

(b) have mason validate that there is only one top-level module and warn/error if that's not the case (essentially Michael's option 1), or potentially

This is my preference at the moment, but I think this is unreasonable to enforce without being able to separate submodules into separate files.

I agree that this is desirable, but I think it's a separate concern

Constraining mason packages to contain all code into a single source file is not a feasible solution IMO. I wouldn't want to commit to enforcing single top-level modules until being able to split submodules into separate files was a release cycle away.

and one that's being discussed in issues #10909 and #10796).

Thanks for the reference - I'll continue discussion of those strategies in those issues.

@bradcray
Copy link
Member

bradcray commented Aug 30, 2018

I think this is unreasonable to enforce without being able to separate submodules into separate files.

Since I'm OK with the status quo, I'm also OK with dragging our feet on this issue. But as a thought experiment, I'm curious how many of our packages in packages/*.chpl would trigger this warning if they were converted to mason packages and it were implemented today. With a quick grep, I'd guess not many.

I'm also curious whether you would consider having mason generate such a warning to constitute "enforcing", or whether you're thinking of something stricter when you use that term... (like "we'd preclude such packages from being added to the mason registry.") Thanks.

@ben-albrecht
Copy link
Member

I'm curious how many of our packages in packages/*.chpl would trigger this warning if they were converted to mason packages and it were implemented today. With a quick grep, I'd guess not many.

Yeah, I wouldn't expect many package modules to be impacted by this. There has been somewhat of an unwritten rule to keep package/standard modules as a stand-alone monolithic file. Taking a look at a few packages written by users, but not yet published on the registry (mostly due to non-Chapel dependencies), this constraint would impact all of them:

I'm also curious whether you would consider having mason generate such a warning to constitute "enforcing", or whether you're thinking of something stricter when you use that term...

I think we'd have to enforce it via some sort of tooling. Otherwise, this would be another source of package incompatibility, which will contribute to user confusion and frustration.

@mppf
Copy link
Member Author

mppf commented Sep 4, 2018

Further, suppose that each of these packages consists of a main module and a helper module:

@mppf: In your example, is there a compelling reason that Helper and Impl are top-level modules rather than being sub-modules of SuperLib and/or NanoLib? Put another way, do you think it's important that these were top-level modules in some way, or are we just positing protecting against a programmer who didn't do a good job of organizing their code?

I think it's reasonable for the user to want to store them in different files and to make them conceptually (sub)modules. I think we agree at this point that at end of the day, they need to be sub-modules somehow. The question is just - especially in the context of Mason packages - should there be an easy way to create submodules? If different files for submodules are desired, how will that work?

I don't think #10909 is the best/only place for that discussion since it is about one approach ("include statements") where I still prefer another approach ("modules can be directories"). And #10796 seems to have gotten even more side-tracked than this issue... So I created #10946 to specifically propose a "modules can be directories" approach.

Regarding the choice between approaches 1 and 2 in this issue description, it seems to me at this point we could more or less separately choose between include statement (#10909) and modules-can-be-directories (#10946). I think @bradcray is right that we could decide upon these relatively independently from mason (although I think it's important to keep in mind the main use case). Then, the mason-specific questions become:

  1. Is it possible to create a file storing a top-level module within a mason package's directory? Or does mason automatically put all code within some sort of nested module?
  2. Can a mason package have more than one top-level module?
  3. Can a mason package have a top-level module with a name other than the package name?
  4. Are mason packages required to have a particular structure for code to store in the top-level file (e.g. MasonPackage/src/MasonPackage.chpl or MasonPackage/src/Main.chpl or or MasonPackage/Main.chpl ...) ?

Note that if #10946 is to solve the problem, it doesn't obviously handle the src subdirectory, so something would need adjustment for that to work.

@mppf
Copy link
Member Author

mppf commented Mar 18, 2019

@bradcray - I'm just noting that I think we should attend to this issue somehow soon - in other words I think we should move it out of Icebox.

@mppf
Copy link
Member Author

mppf commented Feb 18, 2021

I think that #15279 resolves this issue - the top-level packages need to have unique names but a mason package can include submodules in different files. Closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants