2535 - Crash Loop Back offs Error #2577
@@ -5,28 +5,6 @@ kind: Template
metadata:
name: ${NAME}
objects:
- apiVersion: image.openshift.io/v1
For my understanding, what was the effect of having it before and not having it now?
To answer your question directly: the configuration was not set up properly and was not used anywhere in our project.
More on the tags:
The ImageStream usually represents the container's base image, and the ImageStreamTag associates a specific tagged version of that image.
These configurations (specifically the ImageStreamTag) in turn let us override the base image declared at the top of the Dockerfile in its "FROM" instruction.
In our case, all our deployments (API, web, workers, queue-consumers, load-test-gateway and db-migration) use the same base image, so there is no need to override the one in the Dockerfile, because we run all our builds using GitHub Actions. Even if we later want to change the base image for a particular container, we can change the base image in that container's Dockerfile and run the GitHub Action, and it should work.
@@ -1,5 +1,5 @@
 # Base Image
-FROM artifacts.developer.gov.bc.ca/redhat-access-docker-remote/ubi8/nodejs-18:1-71.1697652955
+FROM artifacts.developer.gov.bc.ca/redhat-access-docker-remote/ubi8/nodejs-18:1-81
While I support the node upgrade either way and the research looks great, we are still missing the root cause.
Can you please evaluate if the below assumptions make sense?
1 - Even though the OpenShift Docker image defines a non-root user (1001), the container will be executed with an arbitrarily assigned user ID, as the error in the ticket also points out.
This is also supported by the documentation below.
https://docs.openshift.com/container-platform/4.11/openshift_images/create-images.html#use-uid_create-images
2 - Searching the BC Gov GitHub, I found at least one repository applying the recommended solution because of an npm 9 issue.
Image source: https://github.com/bcgov/common-hosted-form-service/blob/master/Dockerfile
The above would also explain why the error started when we moved from Node 16 to 18 (npm 8 to 10).
Should we consider also applying the same?
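For reference, the OpenShift guideline linked above recommends making every directory the process writes to owned by the root group (GID 0), with the group given the same permissions as the owner. The reason is that the arbitrarily assigned UID is always a member of group 0. Below is a minimal sketch of that pattern. It assumes the application lives in /opt/app-root/src, which is believed to be the default WORKDIR/HOME of the UBI Node.js images; verify this against the image metadata and the actual Dockerfile before relying on it.

```dockerfile
FROM artifacts.developer.gov.bc.ca/redhat-access-docker-remote/ubi8/nodejs-18:1-81

# Switch to root only for the permission fix; the UBI Node.js images run as UID 1001 by default.
USER 0

# OpenShift runs the container with an arbitrary UID that belongs to the root group (GID 0).
# Give group 0 the same permissions as the owner so that UID can write here
# (assumed path: the default application directory of the UBI Node.js images).
RUN chgrp -R 0 /opt/app-root/src && \
    chmod -R g=u /opt/app-root/src

# End with a numeric non-root user, as the OpenShift image guidelines recommend.
USER 1001
```

Using g=u rather than a fixed mode copies whatever the owner can do to the group. Directories stay traversable and only files that were already owner-writable become group-writable.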
I agree with @andrewsignori-aot about the OpenShift container running with an arbitrarily assigned user.
I am 100% on the same page that we need to run a command to fix the permissions, to have the highest level of certainty that the issue is addressed at its root cause.
Looking at the container in OpenShift, I see that the directory has no write permission for anyone other than its owner.
There is one more thing I want to share here, @guru-aot @andrewsignori-aot @cditcher. It may or may not be a solution, but I would recommend trying it.
Please go through this thread: sclorg/s2i-nodejs-container#396
It mentions the same error (we are using npm ci).
The following screenshots highlight the parts related to our issue.
Docker s2i example (maybe this way we can make the container run as user 1001 instead of the default arbitrarily assigned one):
Thanks for the suggestions, @andrewsignori-aot and @dheepak-aot. As suggested, I have updated the group permissions so that users in group 0 have write access to the ./.npm folder, and tested the build and deployments.
Thanks for the further verifications @dheepak-aot 😉
@@ -242,47 +242,47 @@ build-db-migrations:
test -n "$(BUILD_REF)"
test -n "$(DB_MIGRATIONS_BUILD_REF)"
@echo "+\n++ BUILDING DB migrations with tag: $(BUILD_REF)\n+"
-@oc -n $(BUILD_NAMESPACE) process -f $(BUILD_TEMPLATE_PATH) -p TAG=$(BUILD_REF) -p SOURCE_REPOSITORY_REF=$(BUILD_REF) -p BASE_IMAGE_NAME="nodejs-18" -p BASE_IMAGE_TAG="1-71.1697652955" -p BASE_IMAGE_REPO="artifacts.developer.gov.bc.ca/redhat-access-docker-remote/ubi8/" -p SOURCE_CONTEXT_DIR=$(SOURCE_CONTEXT_DIR)backend -p DOCKER_FILE_PATH=apps/db-migrations/Dockerfile -p NAME=$(DB_MIGRATIONS_BUILD_REF) | oc -n $(BUILD_NAMESPACE) apply -f -
+@oc -n $(BUILD_NAMESPACE) process -f $(BUILD_TEMPLATE_PATH) -p TAG=$(BUILD_REF) -p SOURCE_REPOSITORY_REF=$(BUILD_REF) -p SOURCE_CONTEXT_DIR=$(SOURCE_CONTEXT_DIR)backend -p DOCKER_FILE_PATH=apps/db-migrations/Dockerfile -p NAME=$(DB_MIGRATIONS_BUILD_REF) | oc -n $(BUILD_NAMESPACE) apply -f -
👍
Kudos, SonarCloud Quality Gate passed! 0 Bugs, No Coverage information
Thanks for making the changes. 👍
Looks good.
Great research and thanks for the walk-through @guru-aot 👍
Thanks for doing the changes, looks good 👍
Looks good. Thanks for the explanations.
Note: The branch name was created without a leading #, so that I could test my build and deployments locally.
What the EACCES error means in Node: https://betterstack.com/community/guides/scaling-nodejs/nodejs-errors/#13-eacces
On analyzing the issue, I found that it happens with certain versions of Node and npm; some related bugs reported over the years are given below for reference.
The possible solutions people suggest are always these two:
https://catalog.redhat.com/software/containers/ubi8/nodejs-18/6278e5c078709f5277f26998?architecture=amd64&image=65302e01ec5935b621691d22&container-tabs=packages
https://catalog.redhat.com/software/containers/ubi8/nodejs-18/6278e5c078709f5277f26998?architecture=amd64&image=6543c3d67371c4bd3014291a&container-tabs=packages
Analyzing the npm changelogs, several cache-related bugs have been fixed:
https://docs.npmjs.com/cli/v9/using-npm/changelog#981-2023-07-18
https://docs.npmjs.com/cli/v9/using-npm/changelog#967-2023-05-17
npm/cli#6464
The reported bugs may not describe exactly the issue we are facing, but they relate to the cache error happening in the version we were using.
So, to solve this issue, we updated the Red Hat image to the latest version.
Note: https://app.zenhub.com/workspaces/student-information-management-system-5fce9df5aa1b45000e937014/issues/gh/bcgov/sims/2453 is also addressed as part of this PR.
As suggested by @andrewsignori-aot in #2577 (comment), changed the permissions of the ./.npm folder so that users in group 0 have write access.
Container before assigning the write permission
Container after assigning the write permission
Added the permissions only for the ./.npm folder, because over the past month the failure logs show this error happening only in the ./.npm folder.
https://kibana-openshift-logging.apps.silver.devops.gov.bc.ca/app/kibana#/discover?_g=(refreshInterval:(pause:!t,value:0),time:(from:now-30d,mode:quick,to:now))&_a=(columns:!(message),filters:!(('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'8841c680-a15b-11eb-a4dc-e5bf19f04239',key:kubernetes.namespace_name,negate:!f,params:(query:'0c27fb-test',type:phrase),type:phrase,value:'0c27fb-test'),query:(match:(kubernetes.namespace_name:(query:'0c27fb-test',type:phrase))))),index:'8841c680-a15b-11eb-a4dc-e5bf19f04239',interval:auto,query:(language:lucene,query:'%22sudo%20chown%20-R%20%22'),sort:!('@timestamp',desc))
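The scoped fix described above (group 0 write access on ./.npm only) could look roughly like the sketch below. This is not the PR's actual Dockerfile. The working directory is an assumption: /opt/app-root/src is believed to be the default HOME of the UBI Node.js images, which would make ./.npm the npm cache location. The placement after the dependency install step is also hypothetical.

```dockerfile
FROM artifacts.developer.gov.bc.ca/redhat-access-docker-remote/ubi8/nodejs-18:1-81

# Assumption: the app builds in the image's default working directory, so
# ./.npm resolves to the npm cache under HOME (/opt/app-root/src/.npm).
WORKDIR /opt/app-root/src

# ... COPY sources and RUN npm ci here (omitted; hypothetical placement) ...

# Temporarily use root so the group/mode change succeeds regardless of who created the files.
USER 0
# Ensure the cache folder exists, assign it to the root group (GID 0) and grant
# that group write access; OpenShift's arbitrary UID is always a member of GID 0.
RUN mkdir -p ./.npm && \
    chgrp -R 0 ./.npm && \
    chmod -R g+w ./.npm
# Return to the image's numeric non-root user.
USER 1001
```

Limiting the change to ./.npm keeps the rest of the application directory read-only for the runtime UID. The trade-off is that any other path npm writes to at run time would still fail with EACCES.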