Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: unified mkdir api with options #14408

Closed

Conversation

IronCore864
Copy link

@IronCore864 IronCore864 commented Aug 22, 2024

In Pebble, there was an osutil.MkdirAllChown function ported from snapd I believe, but there was an issue with umask settings which might affect the permissions of the directories created. See this issue here for more information.

As a solution, we refactored Pebble's osutil.MkdirAllChown to a unified osutil. Mkdir function with options, see this PR here. We also added a Chmod flag in the options so that umask won't affect the permissions, see this PR here.

Now we would like to port these refactors and updates back to snapd.

Copy link

github-actions bot commented Aug 22, 2024

Everyone contributing to this PR have now signed the CLA. Thanks!

@IronCore864 IronCore864 force-pushed the unified-mkdir-with-options branch from fada8db to 18cc245 Compare August 22, 2024 12:48
Copy link

codecov bot commented Aug 22, 2024

Codecov Report

Attention: Patch coverage is 73.04965% with 38 lines in your changes missing coverage. Please review.

Project coverage is 78.77%. Comparing base (6c8d657) to head (18cc245).
Report is 7 commits behind head on master.

Files Patch % Lines
osutil/mkdir.go 56.52% 23 Missing and 7 partials ⚠️
overlord/snapstate/backend/copydata.go 90.62% 0 Missing and 3 partials ⚠️
asserts/gpgkeypairmgr.go 87.50% 0 Missing and 1 partial ⚠️
cmd/snap/cmd_warnings.go 87.50% 0 Missing and 1 partial ⚠️
osutil/user.go 87.50% 0 Missing and 1 partial ⚠️
overlord/devicestate/users.go 87.50% 0 Missing and 1 partial ⚠️
overlord/snapshotstate/backend/reader.go 87.50% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #14408      +/-   ##
==========================================
+ Coverage   78.76%   78.77%   +0.01%     
==========================================
  Files        1073     1073              
  Lines      143946   144123     +177     
==========================================
+ Hits       113385   113539     +154     
- Misses      23448    23467      +19     
- Partials     7113     7117       +4     
Flag Coverage Δ
unittests 78.77% <73.04%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@IronCore864
Copy link
Author

Test coverage improved in the latest commit.

@IronCore864
Copy link
Author

There are 7 failed checks, after verification, I don't think they are related to the code changed in this PR.

Running the UT locally succeeded.

For your information, these are the failed checks:

@IronCore864 IronCore864 marked this pull request as ready for review August 23, 2024 07:54
@pedronis pedronis requested review from zyga and bboozzoo August 23, 2024 09:04
// Have a lock so that if one goroutine tries to Mkdir /foo/bar, and
// another tries to Mkdir /foo/baz, they can't both decide they need
// to make /foo and then have one fail.
var mu sync.Mutex
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of forcing synchronization, maybe simply document that for the caller? although as a consumer of the API I would not expect anything different than the racy behavior, which I think is consistent with how things generally work across different languages/libraries.

And in fact you cannot do anything about racing with other processes, so I'm not sure it's worth to even worry about this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thanks for pointing it out, but this logic is in the current code base, see here.

This PR only refactors the API interface and does not add new or change existing logic. The API is renamed, and so is the file name, making it hard to show a diff for reviewing.

func mkdir(path string, perm os.FileMode, options *MkdirOptions) error {
cand := path + ".mkdir-new"

if err := os.Mkdir(cand, perm); err != nil && !os.IsExist(err) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

beyond this point the candidate is left behind on errors, isn't it?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I did not quite understand your question, what do you mean by "the candidate is left behind on errors"? Could you elaborate? (FYI, this logic too is from the existing code.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is that a temporary directory <name>.mkdir-new gets created here, but there is no defererred cleanup on errors, so should any error occur later in Chmod/Chown/Rename, that directory will be left behind.

And yes, it' looks like the problem exists in the code we have in master, so even more reason to fix when introducing a new API.

Copy link
Contributor

@zyga zyga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a quick look and I'm not very happy about the API.

I have two main complaints, and one minor:

  • It's not really done right IMO. We should use openat/mkdirat (we have existing code in Go that does it, it's just used in snap-update-ns) as this is properly safe against concurrent mutation from the rest of the OS and is not susceptible to symlink attacks.
  • The trickery used to change to specific permissions and ownership might need more polish as it's not really atomic and I'm not sure there's value in extra effort - if we want to do this properly we should switch to the right uid/gid, set umask and mkdir then, perhaps in a locked OS thread. As much as annoying that is probably better.

My minor point is about the locking behavior. It is unclear what kind of semantics we are after. Normally we want "mkdir -p" and the fact that other directories already exist is a non-issue. This can be done with file system primitives and needs no locking at all. In the case where you absolutely want to make the directory or fail, it should only apply to the leaf element IMO. In that case you again do not need any locking as the FS tells you if you have won the race and this works across processes as well.

I'd be interersted to know what the use case for this is in pebble.

Lastly, for leaf non-directories there's openat with O_TMPFILE, followed by linkat. This shows the proper model for making non-trivial files with precise permissions and content. There's no such analgoy for directories, sadly.

As such I'm making a comment but please feel this is -0.1 given my remarks about synchronization and complex semantics.

In snapd's environment, we may also need to be mindful of permissions that this now requires (.mkdir-new suffix).

@@ -35,7 +35,7 @@ require (
gopkg.in/yaml.v3 v3.0.1
)

require go.etcd.io/bbolt v1.3.9
require go.etcd.io/bbolt v1.3.6
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note-to-self: check if this is packaged

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this is a downgrade? What is this about? It looks entirely unrelated.

Chown bool
UserID sys.UserID
GroupID sys.GroupID
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add the mutex here, so that any goroutines that want to synchronize, can do so explicitly. Alternatively ExistOK could be a function that grabs a mutex and returns false, alternatively ExistOK could be default, since we have other tools for making temporary directories.

Having said that, I'm not sure what's the right approach


path = filepath.Clean(path)

if s, err := os.Stat(path); err == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be a descending openat + mkdirat tree. This is just racy with the outside world and is not correct.


// Create a single directory and perform chown/chmod operations according to options.
func mkdir(path string, perm os.FileMode, options *MkdirOptions) error {
cand := path + ".mkdir-new"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is safe. If we don't want someone to see the new directory with old permissions or old ownership, we don't really prevent that. Perhaps mkdir zero permissions and set to non-zero after doing chown?

@IronCore864
Copy link
Author

IronCore864 commented Aug 26, 2024

I had a quick look and I'm not very happy about the API.

I have two main complaints, and one minor:

* It's not really done right IMO. We should use openat/mkdirat (we have existing code in Go that does it, it's just used in snap-update-ns) as this is properly safe against concurrent mutation from the rest of the OS and is not susceptible to symlink attacks.

* The trickery used to change to specific permissions and ownership might need more polish as it's not really atomic and I'm not sure there's value in extra effort - if we want to do this properly we should switch to the right uid/gid, set umask and mkdir then, perhaps in a locked OS thread. As much as annoying that is probably better.

My minor point is about the locking behavior. It is unclear what kind of semantics we are after. Normally we want "mkdir -p" and the fact that other directories already exist is a non-issue. This can be done with file system primitives and needs no locking at all. In the case where you absolutely want to make the directory or fail, it should only apply to the leaf element IMO. In that case you again do not need any locking as the FS tells you if you have won the race and this works across processes as well.

I'd be interersted to know what the use case for this is in pebble.

Lastly, for leaf non-directories there's openat with O_TMPFILE, followed by linkat. This shows the proper model for making non-trivial files with precise permissions and content. There's no such analgoy for directories, sadly.

As such I'm making a comment but please feel this is -0.1 given my remarks about synchronization and complex semantics.

In snapd's environment, we may also need to be mindful of permissions that this now requires (.mkdir-new suffix).

As mentioned in another comment, this PR only refactors the API interface and does not add new or change existing logic. The API is renamed, and so is the file name, making it hard to show a diff for review, but the detailed you mentioned like not using mkdirat and moving "cand" around, is already in the file snapd/osutil/mkdirallchown.go.

The only added new feature is the Chmod flag, it's required in Pebble because Pebble has a mkdir subcommand/API so that users can ask Pebble to create directories, whose permissions are susceptible to umask settings; see this issue here and this one here. If I understand correctly, snapd will run as root in most of the cases, and might not need this feature. I can remove it if it's not wanted.

@IronCore864 IronCore864 changed the title feat: unified mkdir api with options refactor: unified mkdir api with options Aug 26, 2024
@zyga zyga self-assigned this Sep 20, 2024
@zyga zyga self-requested a review September 20, 2024 10:23
Copy link
Contributor

@zyga zyga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a rename-refactor alone it's probably okay/harmless. Can you push again without the bolt dependency changes? Tip: git add -p is your friend when you commit. git gui is also useful here.

Separately I think we should drop the osutil mkdir code and generalize the one from snap-update-ns as it's both race-free and lock-free. What we have here is really another incarnation of a hack, just packaged with nicer outer function surface.

I guess I don't need to oppose it much but I think the effort would have been better spent by adapting the implementation under the same API and then porting that back to pebble.

If you don't have cycles on it and @ernestl agrees I can pick it up next week.

@@ -35,7 +35,7 @@ require (
gopkg.in/yaml.v3 v3.0.1
)

require go.etcd.io/bbolt v1.3.9
require go.etcd.io/bbolt v1.3.6
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this is a downgrade? What is this about? It looks entirely unrelated.

@IronCore864
Copy link
Author

After discussing on Mattermost, we decided to close this PR for now. @zyga will redesign and rewrite the existing code base when he has time. Thanks!

@IronCore864 IronCore864 deleted the unified-mkdir-with-options branch September 23, 2024 07:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants