Bump bioformats2raw version from 0.2 to 0.4 #13

Closed
BioinfoTongLI opened this issue May 23, 2022 · 11 comments

@BioinfoTongLI (Contributor)

Bump the tif-to-zarr conversion from 0.2 to the latest stable version (0.4.0).
Currently using this image (https://hub.docker.com/layers/bioformats2raw/openmicroscopy/bioformats2raw/0.4.0/images/sha256-29e650dca4610898d2c5d7639c350f172d3f4d0d0aea7078454b76e10245b0c7?context=explore).

Vitessce works with this version as well.

However, the conversion is currently only done locally. Use this option to write directly to s3:
glencoesoftware/bioformats2raw#89
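
For reference, a minimal sketch of a local conversion with the 0.4.0 image above (paths are hypothetical, and this assumes the image's entrypoint is the bioformats2raw launcher; otherwise call /opt/bioformats2raw/bin/bioformats2raw explicitly, as in the commands further down):

# mount the working directory and convert a local TIFF to OME-Zarr
docker run --rm -v "$(pwd)":/data \
    openmicroscopy/bioformats2raw:0.4.0 \
    /data/input.tif /data/output.zarr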

@BioinfoTongLI BioinfoTongLI added this to the 0.0.1 milestone May 23, 2022
@BioinfoTongLI BioinfoTongLI self-assigned this May 23, 2022
@BioinfoTongLI (Contributor, Author)

Writing directly to s3 leads to a very long error log, which does not happen when saving locally. Here's the end of the error log:


2022-05-26 07:40:08,940 [pool-1-thread-1] ERROR c.g.bioformats2raw.Converter - Failure processing chunk; resolution=0 plane=1 xx=16384 yy=16384 zz=0 width=304 height=736 depth=1
java.lang.NullPointerException: null
	at com.upplication.s3fs.S3AccessControlList.hasPermission(S3AccessControlList.java:39)
	at com.upplication.s3fs.S3AccessControlList.checkAccess(S3AccessControlList.java:50)
	at com.upplication.s3fs.S3FileSystemProvider.checkAccess(S3FileSystemProvider.java:470)
	at java.nio.file.Files.isAccessible(Files.java:2455)
	at java.nio.file.Files.isReadable(Files.java:2490)
	at com.bc.zarr.storage.FileSystemStore.getInputStream(FileSystemStore.java:61)
	at com.bc.zarr.ZarrArray.open(ZarrArray.java:103)
	at com.bc.zarr.ZarrArray.open(ZarrArray.java:96)
	at com.bc.zarr.ZarrArray.open(ZarrArray.java:92)
	at com.glencoesoftware.bioformats2raw.Converter.processChunk(Converter.java:1039)
	at com.glencoesoftware.bioformats2raw.Converter.lambda$saveResolutions$4(Converter.java:1286)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
2022-05-26 07:40:09,207 [pool-1-thread-1] ERROR c.g.bioformats2raw.Converter - Failure processing chunk; resolution=0 plane=2 xx=16384 yy=16384 zz=0 width=304 height=736 depth=1
java.lang.NullPointerException: null
	at com.upplication.s3fs.S3AccessControlList.hasPermission(S3AccessControlList.java:39)
	at com.upplication.s3fs.S3AccessControlList.checkAccess(S3AccessControlList.java:50)
	at com.upplication.s3fs.S3FileSystemProvider.checkAccess(S3FileSystemProvider.java:470)
	at java.nio.file.Files.isAccessible(Files.java:2455)
	at java.nio.file.Files.isReadable(Files.java:2490)
	at com.bc.zarr.storage.FileSystemStore.getInputStream(FileSystemStore.java:61)
	at com.bc.zarr.ZarrArray.open(ZarrArray.java:103)
	at com.bc.zarr.ZarrArray.open(ZarrArray.java:96)
	at com.bc.zarr.ZarrArray.open(ZarrArray.java:92)
	at com.glencoesoftware.bioformats2raw.Converter.processChunk(Converter.java:1039)
	at com.glencoesoftware.bioformats2raw.Converter.lambda$saveResolutions$4(Converter.java:1286)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
2022-05-26 07:40:09,207 [main] ERROR c.g.bioformats2raw.Converter - Error while writing series 0
java.util.concurrent.CompletionException: java.lang.NullPointerException
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
	at java.util.concurrent.CompletableFuture.biRelay(CompletableFuture.java:1298)
	at java.util.concurrent.CompletableFuture.andTree(CompletableFuture.java:1321)
	at java.util.concurrent.CompletableFuture.andTree(CompletableFuture.java:1317)
	at java.util.concurrent.CompletableFuture.andTree(CompletableFuture.java:1317)
	at java.util.concurrent.CompletableFuture.andTree(CompletableFuture.java:1317)
	at java.util.concurrent.CompletableFuture.andTree(CompletableFuture.java:1317)
	at java.util.concurrent.CompletableFuture.andTree(CompletableFuture.java:1317)
	at java.util.concurrent.CompletableFuture.andTree(CompletableFuture.java:1317)
	at java.util.concurrent.CompletableFuture.andTree(CompletableFuture.java:1317)
	at java.util.concurrent.CompletableFuture.andTree(CompletableFuture.java:1317)
	at java.util.concurrent.CompletableFuture.andTree(CompletableFuture.java:1317)
	at java.util.concurrent.CompletableFuture.allOf(CompletableFuture.java:2238)
	at com.glencoesoftware.bioformats2raw.Converter.saveResolutions(Converter.java:1314)
	at com.glencoesoftware.bioformats2raw.Converter.write(Converter.java:691)
	at com.glencoesoftware.bioformats2raw.Converter.convert(Converter.java:646)
	at com.glencoesoftware.bioformats2raw.Converter.call(Converter.java:477)
	at com.glencoesoftware.bioformats2raw.Converter.call(Converter.java:92)
	at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
	at picocli.CommandLine.access$1300(CommandLine.java:145)
	at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
	at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:2172)
	at picocli.CommandLine.parseWithHandlers(CommandLine.java:2550)
	at picocli.CommandLine.parseWithHandler(CommandLine.java:2485)
	at picocli.CommandLine.call(CommandLine.java:2761)
	at com.glencoesoftware.bioformats2raw.Converter.main(Converter.java:1808)
Caused by: java.lang.NullPointerException: null
	at com.upplication.s3fs.S3AccessControlList.hasPermission(S3AccessControlList.java:39)
	at com.upplication.s3fs.S3AccessControlList.checkAccess(S3AccessControlList.java:50)
	at com.upplication.s3fs.S3FileSystemProvider.checkAccess(S3FileSystemProvider.java:470)
	at java.nio.file.Files.isAccessible(Files.java:2455)
	at java.nio.file.Files.isReadable(Files.java:2490)
	at com.bc.zarr.storage.FileSystemStore.getInputStream(FileSystemStore.java:61)
	at com.bc.zarr.ZarrArray.open(ZarrArray.java:103)
	at com.bc.zarr.ZarrArray.open(ZarrArray.java:96)
	at com.bc.zarr.ZarrArray.open(ZarrArray.java:92)
	at com.glencoesoftware.bioformats2raw.Converter.processChunk(Converter.java:1039)
	at com.glencoesoftware.bioformats2raw.Converter.lambda$saveResolutions$4(Converter.java:1286)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Exception in thread "main" picocli.CommandLine$ExecutionException: Error while calling command (com.glencoesoftware.bioformats2raw.Converter@6b67034): java.lang.NullPointerException
	at picocli.CommandLine.executeUserObject(CommandLine.java:1962)
	at picocli.CommandLine.access$1300(CommandLine.java:145)
	at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
	at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:2172)
	at picocli.CommandLine.parseWithHandlers(CommandLine.java:2550)
	at picocli.CommandLine.parseWithHandler(CommandLine.java:2485)
	at picocli.CommandLine.call(CommandLine.java:2761)
	at com.glencoesoftware.bioformats2raw.Converter.main(Converter.java:1808)
Caused by: java.lang.NullPointerException
	at com.upplication.s3fs.S3AccessControlList.hasPermission(S3AccessControlList.java:39)
	at com.upplication.s3fs.S3AccessControlList.checkAccess(S3AccessControlList.java:50)
	at com.upplication.s3fs.S3FileSystemProvider.checkAccess(S3FileSystemProvider.java:470)
	at java.nio.file.Files.isAccessible(Files.java:2455)
	at java.nio.file.Files.isReadable(Files.java:2490)
	at com.bc.zarr.storage.FileSystemStore.getInputStream(FileSystemStore.java:61)
	at com.bc.zarr.ZarrArray.open(ZarrArray.java:103)
	at com.bc.zarr.ZarrArray.open(ZarrArray.java:96)
	at com.bc.zarr.ZarrArray.open(ZarrArray.java:92)
	at com.glencoesoftware.bioformats2raw.Converter.processChunk(Converter.java:1039)
	at com.glencoesoftware.bioformats2raw.Converter.lambda$saveResolutions$4(Converter.java:1286)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
	

@joshmoore

@BioinfoTongLI: can you include the command you used? (i.e. is --endpoint-url involved?)

@BioinfoTongLI (Contributor, Author)

I'm not using an endpoint URL. The ${image} is a local file, and the conversion works correctly when not writing to s3:
/opt/bioformats2raw/bin/bioformats2raw --output-options s3fs_path_style_access=true ${image} s3://${accessKey}:${secretKey}@webatlas.cog.sanger.ac.uk/deleteme/

@joshmoore

If you were accessing this via aws, I think this would be:

aws --endpoint-url https://cog.sanger.ac.uk s3://webatlas/...

with webatlas being the bucket. Adding the bucket to the front of the endpoint ("webatlas.cog.sanger.ac.uk") is virtual-hosted-style access, as opposed to path-style:

https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html#path-style-access

So perhaps try setting s3fs_path_style_access=false (or just omitting it).
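
For reference, the two addressing styles side by side for a hypothetical object in the webatlas bucket (names taken from this thread):

# virtual-hosted-style: the bucket is part of the hostname
https://webatlas.cog.sanger.ac.uk/deleteme/.zattrs
# path-style: the bucket is the first path segment
https://cog.sanger.ac.uk/webatlas/deleteme/.zattrs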

@BioinfoTongLI (Contributor, Author)

Still seeing the same null pointer error.


  2022-05-27 08:36:54,083 [main] ERROR c.g.bioformats2raw.Converter - Error while writing series 0
  java.util.concurrent.CompletionException: java.lang.NullPointerException
  	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
  	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
  	at java.util.concurrent.CompletableFuture.biRelay(CompletableFuture.java:1298)
  	at java.util.concurrent.CompletableFuture.andTree(CompletableFuture.java:1321)
  	at java.util.concurrent.CompletableFuture.andTree(CompletableFuture.java:1317)
  	at java.util.concurrent.CompletableFuture.andTree(CompletableFuture.java:1317)
  	at java.util.concurrent.CompletableFuture.andTree(CompletableFuture.java:1317)
  	at java.util.concurrent.CompletableFuture.andTree(CompletableFuture.java:1317)
  	at java.util.concurrent.CompletableFuture.andTree(CompletableFuture.java:1317)
  	at java.util.concurrent.CompletableFuture.andTree(CompletableFuture.java:1317)
  	at java.util.concurrent.CompletableFuture.andTree(CompletableFuture.java:1317)
  	at java.util.concurrent.CompletableFuture.andTree(CompletableFuture.java:1317)
  	at java.util.concurrent.CompletableFuture.andTree(CompletableFuture.java:1317)
  	at java.util.concurrent.CompletableFuture.allOf(CompletableFuture.java:2238)
  	at com.glencoesoftware.bioformats2raw.Converter.saveResolutions(Converter.java:1314)
  	at com.glencoesoftware.bioformats2raw.Converter.write(Converter.java:691)
  	at com.glencoesoftware.bioformats2raw.Converter.convert(Converter.java:646)
  	at com.glencoesoftware.bioformats2raw.Converter.call(Converter.java:477)
  	at com.glencoesoftware.bioformats2raw.Converter.call(Converter.java:92)
  	at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
  	at picocli.CommandLine.access$1300(CommandLine.java:145)
  	at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
  	at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
  	at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
  	at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:2172)
  	at picocli.CommandLine.parseWithHandlers(CommandLine.java:2550)
  	at picocli.CommandLine.parseWithHandler(CommandLine.java:2485)
  	at picocli.CommandLine.call(CommandLine.java:2761)
  	at com.glencoesoftware.bioformats2raw.Converter.main(Converter.java:1808)
  Caused by: java.lang.NullPointerException: null
  	at com.upplication.s3fs.S3AccessControlList.hasPermission(S3AccessControlList.java:39)
  	at com.upplication.s3fs.S3AccessControlList.checkAccess(S3AccessControlList.java:50)
  	at com.upplication.s3fs.S3FileSystemProvider.checkAccess(S3FileSystemProvider.java:470)
  	at java.nio.file.Files.isAccessible(Files.java:2455)
  	at java.nio.file.Files.isReadable(Files.java:2490)
  	at com.bc.zarr.storage.FileSystemStore.getInputStream(FileSystemStore.java:61)
  	at com.bc.zarr.ZarrArray.open(ZarrArray.java:103)
  	at com.bc.zarr.ZarrArray.open(ZarrArray.java:96)
  	at com.bc.zarr.ZarrArray.open(ZarrArray.java:92)
  	at com.glencoesoftware.bioformats2raw.Converter.processChunk(Converter.java:1039)
  	at com.glencoesoftware.bioformats2raw.Converter.lambda$saveResolutions$4(Converter.java:1286)
  	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  	at java.lang.Thread.run(Thread.java:748)

Command error:
  OpenJDK 64-Bit Server VM warning: You have loaded library /tmp/opencv_openpnp921420814146520776/nu/pattern/opencv/linux/x86_64/libopencv_java342.so which might have disabled stack guard. The VM will try to fix the stack guard now.
  It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
  Exception in thread "main" picocli.CommandLine$ExecutionException: Error while calling command (com.glencoesoftware.bioformats2raw.Converter@6b67034): java.lang.NullPointerException
  	at picocli.CommandLine.executeUserObject(CommandLine.java:1962)
  	at picocli.CommandLine.access$1300(CommandLine.java:145)
  	at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
  	at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
  	at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
  	at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:2172)
  	at picocli.CommandLine.parseWithHandlers(CommandLine.java:2550)
  	at picocli.CommandLine.parseWithHandler(CommandLine.java:2485)
  	at picocli.CommandLine.call(CommandLine.java:2761)
  	at com.glencoesoftware.bioformats2raw.Converter.main(Converter.java:1808)
  Caused by: java.lang.NullPointerException
  	at com.upplication.s3fs.S3AccessControlList.hasPermission(S3AccessControlList.java:39)
  	at com.upplication.s3fs.S3AccessControlList.checkAccess(S3AccessControlList.java:50)
  	at com.upplication.s3fs.S3FileSystemProvider.checkAccess(S3FileSystemProvider.java:470)
  	at java.nio.file.Files.isAccessible(Files.java:2455)
  	at java.nio.file.Files.isReadable(Files.java:2490)
  	at com.bc.zarr.storage.FileSystemStore.getInputStream(FileSystemStore.java:61)
  	at com.bc.zarr.ZarrArray.open(ZarrArray.java:103)
  	at com.bc.zarr.ZarrArray.open(ZarrArray.java:96)
  	at com.bc.zarr.ZarrArray.open(ZarrArray.java:92)
  	at com.glencoesoftware.bioformats2raw.Converter.processChunk(Converter.java:1039)
  	at com.glencoesoftware.bioformats2raw.Converter.lambda$saveResolutions$4(Converter.java:1286)
  	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  	at java.lang.Thread.run(Thread.java:748)
  

Afaik, this s3 is not on AWS at all. Instead, we use Ceph (https://www.redhat.com/en/technologies/storage/ceph). But it should be similar to what EBI is using: s3.embassy.ebi.ac.uk/idr-upload...
@prete any ideas?

@joshmoore

I assume then that we will need to start using our own S3 filesystem. See https://imagesc.zulipchat.com/#narrow/stream/212929-general/topic/ome-zarr.20basics.3A.20writing.20.20to.20s3/near/281819192 for a related conversation. It would be good to know how much time/space uploading directly would save you, so we know how important it is to prioritize this.

@BioinfoTongLI (Contributor, Author) commented May 27, 2022

I see - it is currently not the most urgent task. Nextflow can do the push and works fine with our Ceph storage. The cost is that we need to duplicate the data before the push. Though this might be an issue when it comes to the real atlas dataset (100+ whole-embryo images). Let's prioritize this in the next milestone.

@prete (Collaborator) commented May 27, 2022

Afaik, this s3 is not on AWS at all. Instead, we use Ceph (https://www.redhat.com/en/technologies/storage/ceph). But it should be similar to what EBI is using: s3.embassy.ebi.ac.uk/idr-upload...

Indeed, it's Ceph's RADOS Gateway. Note: the aws that Josh used is the awscli tool, which can also talk to S3-compatible storage (like our "Sanger S3"). Think of it as an s3cmd alternative.

Uploading from bioformats2raw should work like this for you:

bioformats2raw \
    --output-options "s3fs_access_key=${accessKey}|s3fs_secret_key=${secretKey}|s3fs_path_style_access=true" \
    ${image} \
    s3://cog.sanger.ac.uk/webatlas/deleteme/

Keep in mind that uploading straight to S3 will slow down the process, because uploading is slower than disk I/O. But, like you said, it won't duplicate the data and you won't have to copy it afterwards... so it's up to you!
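
If you want to sanity-check what landed in the bucket afterwards, the awscli mentioned above can list it against the same endpoint (hypothetical prefix; assumes awscli is configured with the same credentials):

# list everything uploaded under the deleteme/ prefix
aws --endpoint-url https://cog.sanger.ac.uk s3 ls --recursive s3://webatlas/deleteme/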

@BioinfoTongLI (Contributor, Author)

Thanks @prete! Interestingly, your syntax works.
I am pretty sure the original authentication (credentials embedded in the URL) works as well, since I do have files created by bioformats2raw. But it seems that passing the credentials through --output-options is the right way.

@joshmoore worth opening an issue with Glencoe?

@joshmoore

My best guess is that the difference is s3://cog.sanger.ac.uk/webatlas vs. s3://webatlas.cog.sanger.ac.uk/. All of this comes down to the fact that the S3 "standard" is a far cry from POSIX. You can open an issue on bioformats2raw, but this is more a question of the underlying FileSystem implementation -- https://github.com/lasersonlab/Amazon-S3-FileSystem-NIO2 -- and if you look at the upstream repo's issues (https://github.com/Upplication/Amazon-S3-FileSystem-NIO2/issues), you'll see that the latest one is "is this dead?". I've brought this up a few times on image.sc. Ultimately, we will likely need to work on a single implementation as a community.

@BioinfoTongLI (Contributor, Author)

All seems to be working fine. Closing this for now. Reopen if needed.
