Nsncd: 23.05 regression compared to 22.11 #218813
Comments
cc @flokli |
@sir4ur0n hmmh, we do have a What is the user you ssh in as? Is oslogin enabled, or how does it work? |
cc @NinjaTrappeur |
I can't reproduce locally or in a VM test. Sounds like we have a weird interaction with the google NSS module happening here. @sir4ur0n: Could you dump the nsncd logs around the time you call sudo? If you can, dumping all the journald logs around the time you call sudo could also help. Are you using os-login? Edit: I suspect this is triggered by the google OS-login NSS module crashing. |
cc @ConnorBaker , who inherited the workaround :D |
We're also being bitten by this switch due to twosigma/nsncd#37. Our errors aren't related to |
As written by @NinjaTrappeur, we still need some logs / socket dumps to understand what's going on. This is not visible outside Google Cloud. |
Hi, I also got bitten by this. It works for 22.11 but not 23.05, and I can reproduce it consistently:
I ran If you were to switch the first step to point to
If anyone could advise me what logs to post here, and exactly how to get them, I'd be happy to. 😄 |
Could you check if #263634 fixes this issue as well? |
i was following this guide using master branch and had this issue. checking out from |
Thanks for testing this! Closing the issue. (edit: read this the other way around) |
NixOS instance built from Image generated from current master(version is inferred to be 24.05) also causes the same error when running on GCP with |
The steps to reproduce:
git clone git@github.com:NixOS/nixpkgs.git && cd nixpkgs
git checkout 4a6c1d765bd660f8379f961e641f5c9fccd312dc
Then I run the steps described in https://nixos.wiki/wiki/Install_NixOS_on_GCE inside Docker, because I need a linux amd64 image but my machine is a macOS M1.
docker run --rm -it --platform=linux/amd64 -v `pwd`:/nixpkgs --workdir /nixpkgs nixpkgs/nix:latest bash
Now, set the Nix path to unstable to use google-cloud-sdk.
NIX_PATH=nixpkgs=channel:nixos-unstable nix-shell -p 'google-cloud-sdk'
export NIX_CONFIG=$'system-features = benchmark big-parallel nixos-test uid-range kvm\nfilter-syscalls = false\nexperimental-features = nix-command flakes'
These options come from
Now, make sure you unset the Nix path pointing to unstable, so that the OS is built with the expected version. If you forget to do so, Nix complains that system.stateVersion is not found and falls back to the runtime version.
Then, I create a Google Storage bucket.
BUCKET_NAME=<BUCKET_NAME> nixpkgs/nixos/maintainers/scripts/gce/create-gce.sh
NOTE: If uploading the artifacts fails for some reason, copy the artifacts from the container to your local machine, upload the tar.gz from the artifacts to the GCS bucket, and manually create the disk image according to the instructions at https://cloud.google.com/compute/docs/import/import-existing-image.
Then, create a Google Compute Engine instance from the image. After the instance starts running, connect to it via SSH.
gcloud compute ssh --zone "<ZONE>" "<INSTANCE_NAME>" --project "<PROJECT>"
Now, I'm inside the Compute Engine instance.
sudo -i
It results in the following error.
sudo: PAM account management error: User not known to the underlying authentication module
sudo: a password is required |
Thanks for the details! Could this be the Google PAM module segfaulting and crashing the Nsncd daemon? Is Nsncd generating any useful logs? Is it crashing when you see this error? (I do not use gcloud, I cannot diagnose this by myself :( ) |
The NSS module segfaulting and crashing ns(n)cd indeed did happen more than once - see #214811 and the linked upstream bug. |
@picnoir error still happens, can't check logs :(
$ journalctl -u nscd
Hint: You are currently not seeing messages from other users and the system.
Users in groups 'adm', 'systemd-journal', 'wheel' can see all messages.
Pass -q to turn off this notice.
No journal files were opened due to insufficient permissions.
|
here are the logs:
systemd[1]: nscd.service: Main process exited, code=killed, status=11/SEGV
systemd[1]: nscd.service: Failed with result 'signal'.
nscd.service: Consumed 32ms CPU time, received 7.9K IP traffic, sent 1.3K IP traffic.
nscd.service: Scheduled restart job, restart counter is at 3.
Starting Name Service Cache Daemon (nsncd)...
nsncd[213941]: Aug 06 07:16:58.391 INFO started, config: Config { ignored_request_types: {}, worker_count: 8, handoff_timeout: 3s }, path: "/var/run/nscd/socket"
systemd[1]: Started Name Service Cache Daemon (nsncd). |
Right, the google NSS module is segfaulting. The segfault seems to bring Nsncd down. |
yeah i tried an overlay with the older nsncd and co versions
but i still got the error (like you said something else is bringing nsncd down), for now i just use this (seems working so far):
|
Yeah, seems like Nsncd needs to better handle the NSS segfaults. That's not on my todo list though. PR welcome, I'll review it. Alternatively, google could also work on fixing their NSS module to prevent it from continuously segfaulting. |
Describe the bug
In 23.05 nsncd is enabled by default. However, we have found that this seems to have changed, in a breaking manner, a behavior on one of our NixOS machines (CI runner).
Steps To Reproduce
Steps to reproduce the behavior:
nixos-22.11 version
sudo commands, e.g. sudo ls or sudo su
nixos-unstable version
sudo:
services.nscd.enableNsncd = false; to your NixOS configuration
sudo commands work
Expected behavior
Either the same configuration should work in nixos-unstable/nixos-23.05 (once it exists), or this breaking change / a migration guide should be added in https://nixos.org/manual/nixos/unstable/release-notes.html#sec-release-23.05-incompatibilities
Additional context
I don't know if any of this is relevant, but just in case:
/etc/passwd nor in users.users in the NixOS configuration
Notify maintainers
As I don't really know what is the problem (documentation, code, other?) I am unsure who to ping 😐
Metadata
Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.
nixos-22.11 version: 6a0d270
nixos-unstable version: 7f5639f
CC @yorickvP
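The workaround named in the steps to reproduce can be sketched as a NixOS configuration fragment. The option name comes from the report itself. The surrounding module skeleton is only an assumption about where the line would live (for example in `/etc/nixos/configuration.nix`). It presumably switches back to the glibc nscd that 22.11 used, rather than fixing the crashing NSS module.

```nix
# Minimal sketch, assuming a standard NixOS module layout.
# Only the option below is taken from the report; the rest is scaffolding.
{ ... }:
{
  # Opt out of nsncd, which became the default in 23.05.
  services.nscd.enableNsncd = false;
}
```

After adding the line, the configuration still has to be rebuilt and activated before `sudo` is retried.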