Python fails to load after loading nighres module #874

Open
m13slash9 opened this issue Nov 28, 2024 · 27 comments

@m13slash9

m13slash9 commented Nov 28, 2024

Running the Neurodesk App and entering the command python right after startup runs Python correctly:
Python 3.11.6 | packaged by conda-forge | (main, Oct 3 2023, 10:40:35)

Running module load nighres first and then python results in:
FATAL: exec /usr/bin/python failed: input/output error

Unloading the module with module unload nighres and running python again restores a working Python:
Python 3.11.6 | packaged by conda-forge | (main, Oct 3 2023, 10:40:35)

The Neurodesk Docker image is neurodesktop:2024-10-22.
(As far as I remember, this was not an issue with the previous image, but I'm not sure about that.)

@stebo85
Contributor

stebo85 commented Nov 29, 2024

Dear @m13slash9,

Unfortunately, I cannot reproduce the problem :(


It works exactly as it should. Can you share more details about the system you are running this on? Is this macOS on Apple Silicon? What Docker version are you running?

@stebo85
Contributor

stebo85 commented Nov 29, 2024

Also working for me on macOS.

@stebo85
Contributor

stebo85 commented Nov 29, 2024

I have the suspicion that this could be a firewall/deep packet inspection issue in your institution. To confirm this, could you run the following and let me know whether you see an IO error there:

cvmfs_config stat -v neurodesk.ardc.edu.au

Would you have a chance to run this on another network (e.g. a hotspot from your phone)?

@stebo85
Contributor

stebo85 commented Nov 29, 2024

Additionally, can you run the following tests:

curl --head cvmfs.neurodesk.org/cvmfs/neurodesk.ardc.edu.au/data/32/ed9653fe3342a4a5045d8adc58895bfff86036
curl --head cvmfs.neurodesk.org/cvmfs/neurodesk.ardc.edu.au/data/12/b8af64775fed6d311eac67510455e6e7c1599aX

and post the output here?

@m13slash9
Author

@stebo85 Thank you for the suggestions.

> Can you share more details about the system you are running this on? Is this macOS on Apple Silicon? What Docker version are you running?

This is running under Windows 10 with a WSL image of Ubuntu 18.04.4 LTS (not sure this matters, although Docker might interact with it) and Docker Desktop 4.34.2 (167172).

> I have the suspicion that this could be a firewall/deep packet inspection issue in your institution.

I doubt that; the output of cvmfs_config stat -v neurodesk.ardc.edu.au is:

(base) jovyan@neurodesktop-2024-10-22:~$ cvmfs_config stat -v neurodesk.ardc.edu.au
Version: 2.11.2.0
PID: 92
Uptime: 1 minutes
Memory Usage: 29980k
...
No. Active File Catalogs: 2
Cache Usage: 6994k / 5120001k
File Descriptor Usage: 0 / 130560
No. Open Directories: 0
No. IO Errors: 0
Connection: http://cvmfs.neurodesk.org/cvmfs/neurodesk.ardc.edu.au through proxy DIRECT (online)
Usage: 0 open() calls (hitrate 0.000%), 213 opendir() calls
Transfer Statistics: 3398k read, avg. speed: 18773k/s

Both curl requests return an HTTP 200 response with headers like:
HTTP/1.1 200 OK
Content-Type: application/octet-stream
Content-Length: 414
Connection: keep-alive
Date: Thu, 28 Nov 2024 13:33:54 GMT
Server: Apache/2.4.37 (CentOS Stream) mod_wsgi/4.6.4 Python/3.6
Accept-Ranges: bytes
Cache-Control: max-age=3600, stale-while-revalidate=1800, stale-if-error=86400
Expires: Sun, 01 Dec 2024 13:33:54 GMT

@m13slash9
Author

> Would you have a chance to run this on another network (e.g. a hotspot from your phone)?

I'm not sure I'll be able to do this with the desktop, but it looks like it's not a networking issue. Or is it still a suspect?

@m13slash9
Author

Also, I've just realized that the machine had a memory upgrade between the Neurodesk App installation and the appearance of this bug.
Could this cause the issue?
Maybe a reinstall would help.

@stebo85
Contributor

stebo85 commented Nov 29, 2024

A memory upgrade couldn't explain what you see, and everything else you report looks OK, so a network bug in a deep packet inspection filter at your institution is the most likely explanation. To confirm the network problem, run:

cvmfs_config stat -v neurodesk.ardc.edu.au
# this gives you 0 IO errors (as confirmed by your test above)
ml nighres
python

cvmfs_config stat -v neurodesk.ardc.edu.au
# now you should see an IO error:
# No. IO Errors: 1

Can you confirm that this is happening?
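To make the before/after comparison less error-prone, the counter can be pulled out of the stat output with a small shell function. This is only a sketch, assuming Bash and a standard awk; the function name io_errors is hypothetical, and it simply matches the "No. IO Errors:" line seen in the outputs in this thread.

```shell
#!/usr/bin/env bash
# Sketch: read `cvmfs_config stat -v` output on stdin and print the
# "No. IO Errors" counter (last field of that line).
# Exits non-zero if no such line is present.
io_errors() {
  awk '/No\. IO Errors:/ { print $NF; found = 1 } END { exit !found }'
}

# Example usage (interactive Neurodesktop session; not run here):
#   repo=neurodesk.ardc.edu.au
#   before=$(cvmfs_config stat -v "$repo" | io_errors)
#   ml nighres
#   python -c 'print("python started")'
#   after=$(cvmfs_config stat -v "$repo" | io_errors)
#   [ "$after" -gt "$before" ] && echo "IO errors increased: $before -> $after"
```

Comparing two numbers is easier to report back than eyeballing two full stat dumps.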

@stebo85
Contributor

stebo85 commented Nov 29, 2024

If you see the IO error, can you reach out to your IT department and ask which deep packet inspection filter they use and whether they can check which file triggers the blocking? They would then need to contact their deep packet inspection vendor and ask for the false alarm to be fixed.

If you could share these details with me, that would be great: [email protected]

@m13slash9
Author

> Can you confirm that this is happening?

Indeed, I got No. IO Errors: 1

Let's see if I can sort this out.

@stebo85
Contributor

stebo85 commented Dec 2, 2024

Could you run the following commands and post the outputs here?

sudo touch /var/log/cvmfs_debug.log.cachemgr
sudo chown cvmfs /var/log/cvmfs_debug.log.cachemgr
sudo touch /var/log/cvmfs_debug.log
sudo chown cvmfs /var/log/cvmfs_debug.log
echo -e "CVMFS_DEBUGLOG=/var/log/cvmfs_debug.log" | sudo tee -a /etc/cvmfs/default.local
sudo cvmfs_config umount
sudo mount -t cvmfs neurodesk.ardc.edu.au /cvmfs/neurodesk.ardc.edu.au

ml nighres
python

cat /var/log/cvmfs_debug.log
cat /var/log/cvmfs_debug.log.cachemgr

# the files might be too big to cat, so just attach as files directly

@stebo85
Contributor

stebo85 commented Dec 2, 2024

We think it's this file:
http://cvmfs.neurodesk.org/cvmfs/neurodesk.ardc.edu.au/data/ce/80a5473acd3770d67c5a296ecf750f21b94541

Can you try to open this in your Browser? Is it blocked? Do you get an error?

@stebo85
Contributor

stebo85 commented Dec 2, 2024

What happens with curl on this file?

curl --head http://cvmfs.neurodesk.org/cvmfs/neurodesk.ardc.edu.au/data/ce/80a5473acd3770d67c5a296ecf750f21b94541

This should work and give you an OK

@stebo85
Contributor

stebo85 commented Dec 2, 2024

This file is /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 inside the container, and our scans show it is OK.
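The object URLs in this thread look content-addressed: data/ce/80a5473acd3770d67c5a296ecf750f21b94541 is a 40-character hex digest split after its first two characters. Assuming (as we understand the CernVM-FS storage layout) that this digest is the SHA-1 of the stored object exactly as served over HTTP, a complete download should hash to the same value while a truncated one will not. The sketch below assumes Bash, sed and sha1sum; object_hash and verify_object are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: rebuild the 40-hex-digit object hash from a CVMFS data URL
# of the form .../data/XX/<38 more hex digits>[optional suffix].
object_hash() {
  printf '%s\n' "$1" |
    sed -n 's|.*/data/\([0-9a-f][0-9a-f]\)/\([0-9a-f]\{38\}\).*|\1\2|p'
}

# Succeeds only if FILE's SHA-1 equals the hash encoded in URL.
# Assumption: the object name is the SHA-1 of the bytes served over HTTP.
verify_object() {
  local file=$1 url=$2 expected actual
  expected=$(object_hash "$url")
  actual=$(sha1sum "$file" | cut -d ' ' -f 1)
  [ -n "$expected" ] && [ "$actual" = "$expected" ]
}

# Example usage (needs network access; not run here):
#   url=http://cvmfs.neurodesk.org/cvmfs/neurodesk.ardc.edu.au/data/ce/80a5473acd3770d67c5a296ecf750f21b94541
#   curl -s "$url" --output test
#   verify_object test "$url" && echo complete || echo "truncated or altered"
```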

@stebo85
Contributor

stebo85 commented Dec 2, 2024

and can you run:

curl http://cvmfs.neurodesk.org/cvmfs/neurodesk.ardc.edu.au/data/ce/80a5473acd3770d67c5a296ecf750f21b94541 --output test

You should see that it starts downloading and then gets interrupted by the deep packet inspection, which confirms our suspicions.

@stebo85 stebo85 moved this from New to Active in NeuroDesk Dec 3, 2024
@m13slash9
Author

m13slash9 commented Dec 3, 2024

> Could you run the following commands and post the outputs here?

(base) jovyan@neurodesktop-2024-10-22:~$ sudo touch /var/log/cvmfs_debug.log.cachemgr
(base) jovyan@neurodesktop-2024-10-22:~$ sudo chown cvmfs /var/log/cvmfs_debug.log.cachemgr
(base) jovyan@neurodesktop-2024-10-22:~$ sudo touch /var/log/cvmfs_debug.log
(base) jovyan@neurodesktop-2024-10-22:~$ sudo chown cvmfs /var/log/cvmfs_debug.log
(base) jovyan@neurodesktop-2024-10-22:~$ echo -e "CVMFS_DEBUGLOG=/var/log/cvmfs_debug.log" | sudo tee -a /etc/cvmfs/default.local
CVMFS_DEBUGLOG=/var/log/cvmfs_debug.log
(base) jovyan@neurodesktop-2024-10-22:~$ sudo cvmfs_config umount
Unmounting /cvmfs/neurodesk.ardc.edu.au: OK
(base) jovyan@neurodesktop-2024-10-22:~$ sudo mount -t cvmfs neurodesk.ardc.edu.au /cvmfs/neurodesk.ardc.edu.au
CernVM-FS: running with credentials 107:115
CernVM-FS: loading Fuse module... done
CernVM-FS: mounted cvmfs on /cvmfs/neurodesk.ardc.edu.au
(base) jovyan@neurodesktop-2024-10-22:~$ ml nighres
(base) jovyan@neurodesktop-2024-10-22:~$ python
FATAL:   exec /usr/bin/python failed: input/output error
(base) jovyan@neurodesktop-2024-10-22:~$ cat /var/log/cvmfs_debug.log
(base) jovyan@neurodesktop-2024-10-22:~$ cat /var/log/cvmfs_debug.log.cachemgr

So there is pretty much no output from cat; both log files appear to be empty.

@m13slash9
Author

> We think it's this file: http://cvmfs.neurodesk.org/cvmfs/neurodesk.ardc.edu.au/data/ce/80a5473acd3770d67c5a296ecf750f21b94541
>
> Can you try to open this in your Browser? Is it blocked? Do you get an error?

It is blocked by the browser, but something (around 114 kB) downloaded after I allowed it.

@m13slash9
Author

> What happens with curl on this file?
>
> This should work and give you an OK

(base) jovyan@neurodesktop-2024-10-22:~$ curl --head http://cvmfs.neurodesk.org/cvmfs/neurodesk.ardc.edu.au/data/ce/80a5473acd3770d67c5a296ecf750f21b94541
HTTP/1.1 200 OK
Content-Type: application/octet-stream
Content-Length: 116247
Connection: keep-alive
Date: Tue, 03 Dec 2024 09:34:32 GMT
Server: Apache/2.4.37 (CentOS Stream) mod_wsgi/4.6.4 Python/3.6
Accept-Ranges: bytes
Cache-Control: max-age=3600, stale-while-revalidate=1800, stale-if-error=86400
Expires: Fri, 06 Dec 2024 09:34:32 GMT
X-Cache-Lookup: MISS from ip-172-31-13-172.eu-central-1.compute.internal:80
Via: 1.1 ip-172-31-13-172.eu-central-1.compute.internal (squid/4.15), 1.1 b10069b378f22e10f0382c21d0a9578e.cloudfront.net (CloudFront)
X-Cache: Hit from cloudfront
X-Amz-Cf-Pop: AMS58-P1
X-Amz-Cf-Id: f-77v-7LYww77Wq1Ibu0wDZr4lkU75a75Utg-OMjcseymhXo-fmu2Q==
Age: 197

@stebo85
Contributor

stebo85 commented Dec 3, 2024

Ahh, it probably didn't add a line break in /etc/cvmfs/default.local.

Can you check whether this file is correct and, if not, fix the line break and repeat the remaining steps from there?
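If /etc/cvmfs/default.local did not end with a newline, tee -a glues the new setting onto the end of the previous line. A minimal sketch of a helper that adds the missing newline first, assuming Bash and coreutils; append_line is a hypothetical name, and the SUDO variable is just one assumed way of writing to a root-owned file.

```shell
#!/usr/bin/env bash
# Sketch: append LINE to FILE, first adding a newline if FILE is non-empty
# and does not already end with one, so the new setting gets its own line.
# Set SUDO=sudo to write to root-owned files such as /etc/cvmfs/default.local.
append_line() {
  local file=$1 line=$2
  # $(...) strips a trailing newline, so this is empty iff FILE ends in "\n".
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' | ${SUDO:-} tee -a "$file" > /dev/null
  fi
  printf '%s\n' "$line" | ${SUDO:-} tee -a "$file" > /dev/null
}

# Example usage (not run here):
#   SUDO=sudo append_line /etc/cvmfs/default.local "CVMFS_DEBUGLOG=/var/log/cvmfs_debug.log"
#   cat -A /etc/cvmfs/default.local   # every complete line should end in "$"
```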

@m13slash9
Author

> You should see that it starts downloading and then gets interrupted by the deep packet inspection, which confirms our suspicions.

Seems like it

(base) jovyan@neurodesktop-2024-10-22:~$ curl http://cvmfs.neurodesk.org/cvmfs/neurodesk.ardc.edu.au/data/ce/80a5473acd3770d67c5a296ecf750f21b94541 --output test
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 99  113k   99  112k    0     0  1503k      0 --:--:-- --:--:-- --:--:-- 1520k
curl: (18) transfer closed with 1047 bytes remaining to read
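These numbers are consistent with an interrupted transfer: the HEAD request above reported Content-Length: 116247, and curl stopped 1047 bytes short, i.e. after 115200 bytes, exiting with error 18. A sketch of a check that compares the advertised length with what actually arrived, assuming Bash and curl; content_length and size_matches are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: print the Content-Length value from HTTP headers read on stdin
# (e.g. the output of `curl --head URL`). Header names are case-insensitive
# and header lines end in CRLF, so carriage returns are stripped first.
content_length() {
  tr -d '\r' |
    awk 'tolower($1) == "content-length:" { print $2; found = 1 } END { exit !found }'
}

# Succeeds only if FILE is exactly EXPECTED bytes long.
size_matches() {
  local file=$1 expected=$2 actual
  actual=$(wc -c < "$file" | tr -d ' ')
  [ "$actual" -eq "$expected" ]
}

# Example usage (needs network access; not run here):
#   url=http://cvmfs.neurodesk.org/cvmfs/neurodesk.ardc.edu.au/data/ce/80a5473acd3770d67c5a296ecf750f21b94541
#   expected=$(curl -s --head "$url" | content_length)
#   curl -s "$url" --output test; echo "curl exit status: $?"   # 18 = partial file
#   size_matches test "$expected" || echo "short read: $(wc -c < test) of $expected bytes"
```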

@stebo85
Contributor

stebo85 commented Dec 3, 2024

Cool! Thank you for running these checks. If you could find out which company your institution uses for deep packet inspection and which rule this file triggers, we should be able to come up with a workaround.

@stebo85
Contributor

stebo85 commented Dec 4, 2024

Dear @m13slash9 ,

We suspect that the deep packet inspection flags this glibc vulnerability: https://security.snyk.io/vuln/SNYK-UBUNTU2204-GLIBC-6674187

We are now trying to update the container to a newer glibc version in the hope that this fixes the problem.

But please let us know if you get more information.

@stebo85 stebo85 self-assigned this Dec 4, 2024
@stebo85
Contributor

stebo85 commented Dec 5, 2024

Have you got more information from your IT team? To confirm our suspicion, can you try some of the following containers in Neurodesk and see if you get the same error?

vesselvio_1.1.2
topaz_0.2.5
slicer_5.0.3
rstudio_2023.12.1
rstudio_2023.09.1
rstudio_2022.07.2
relion_4.0.1.sm61
quickshear_1.1.0
nipype_1.8.3
mriqc_24.0.2
julia_1.9.4
julia_1.10.1
itksnap_4.0.2
itksnap_4.0.1
ilastik_1.4.0
hnncore_0.3
fmriprep_24.1.0
fmriprep_23.2.1
fmriprep_23.0.0
dsistudio_2024.06.12
conn_22a
brkraw_0.3.11
brainnetviewer_1.7.20191031
bidstools_1.0.4
bart_0.9.00
aslprep_0.7.2
aslprep_0.7.0
ants_2.5.3
afni_23.0.00
afni_22.3.07

If some of them work, then it’s a false positive in the deep packet inspection rule. If none of them work, then they detect the signature of this specific glibc version.

Can you also try these containers?

afni_21.2.00
afni_23.0.04
aidamri_1.1

They are using older versions of glibc which also contain this vulnerability.

This container for example contains a version of glibc where the vulnerability is fixed:
afni_24.3.00_20241003

@m13slash9
Author

> Have you got more information from your IT team? To confirm our suspicion, can you try some of the following containers in Neurodesk and see if you get the same error?

This gets a bit more confusing (at least for me): I ran these both via Neurodesktop (where a GUI was available) and via the terminal in the NeurodeskApp, and in the end I was able to run all of them. Either the GUI started up, or I could run a tool_name --version-like command.

Finally, I tried running the nighres container in Neurodesktop, and it worked (I could run Python and import nighres). I then retried nighres in the NeurodeskApp terminal, and it now works:

(base) jovyan@neurodesktop-2024-10-22:~$ module load nighres
(base) jovyan@neurodesktop-2024-10-22:~$ python
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import nighres
>>> 

My local IT support ticket has been redirected a few times and I am still left without a definitive answer, so I am happy that it works (for now), but confused about the whole situation.

@stebo85
Contributor

stebo85 commented Dec 5, 2024

Thank you so much for testing all the containers! Given that it now works for you, I conclude that this was a wrong signature added to the deep packet inspection software in the last few days, and that they realised the mistake and removed it again.

Our lessons from this are:

  1. We will implement more security scanning on our side so we can quickly understand which libraries are present in which container and see patterns faster.
  2. We will fix the "offline" mode in Neurodesk as soon as possible, in which users can download the whole container onto their system; this will be a good workaround for such issues in the future.

Thank you for bringing it to our attention! I hope we will eventually find out what this was!

@stebo85 stebo85 closed this as completed Dec 5, 2024
@github-project-automation github-project-automation bot moved this from Active to Completed in NeuroDesk Dec 5, 2024
@m13slash9
Author

> Given that it now works for you, I conclude that this was a wrong signature added to the deep packet inspection software in the last few days, and that they realised the mistake and removed it again.

I'm not sure it's a reason to reopen the issue, but I found a weird, reproducible behavior with the same error:

Whenever I launch a new Neurodesk session and try

(base) jovyan@neurodesktop-2024-10-22:~$ module load nighres
(base) jovyan@neurodesktop-2024-10-22:~$ python

I get

FATAL:   exec /usr/bin/python failed: input/output error

But if I try another module before that, e.g.,

(base) jovyan@neurodesktop-2024-10-22:~$ module load afni
(base) jovyan@neurodesktop-2024-10-22:~$ module load neurodesk

And then actually interact with that module, as in this case:

(base) jovyan@neurodesktop-2024-10-22:~$ afni
Precompiled binary linux_ubuntu_24_64: Oct  1 2024 (Version AFNI_24.3.00 'Elagabalus')
** Version check disabled: AFNI_VERSION_CHECK forbids
Thanks go to J Bodurka for useful feedback
Initializing: X11Error: Can't open display: 
++ AFNI is detached from terminal.

Right after that, python runs fine and is able to run nighres

(base) jovyan@neurodesktop-2024-10-22:~$ python
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux

@stebo85 stebo85 reopened this Dec 8, 2024
@github-project-automation github-project-automation bot moved this from Completed to New in NeuroDesk Dec 8, 2024
@stebo85
Contributor

stebo85 commented Dec 8, 2024

That's interesting. Let's see what your IT support finds out. I still think it's a deep packet inspection issue. Once we have a new version of nighres, we should be able to work around this.

@stebo85 stebo85 moved this from New to Active in NeuroDesk Dec 10, 2024
Labels
None yet
Projects
Status: Active
Development

No branches or pull requests

2 participants