This repository has been archived by the owner on Dec 29, 2022. It is now read-only.

Using apache ignite with non AWS S3 bucket as working directory #24

Open
pbelmann opened this issue May 14, 2021 · 0 comments
pbelmann commented May 14, 2021

Hello,

my goal is to use S3 as the working directory instead of a shared file system. Ideally, remote files would be staged into a scratch directory on the worker node where the process executes, and the results would then be uploaded back to S3. Which executor is used ('slurm', 'ignite', etc.) does not matter to me.
My first attempt was Apache Ignite in combination with the -w parameter.
However, I'm using the S3 API of Ceph, which is part of our OpenStack installation: https://docs.ceph.com/en/latest/radosgw/s3/.
I created an example repository, https://github.com/pbelmann/ignite-s3, that shows my approach.
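For context, a non-AWS S3 endpoint is typically pointed at via `nextflow.config`. The sketch below is a minimal assumption of such a setup, not the configuration from the linked repository; the endpoint URL and credentials are placeholders, and whether `s3PathStyleAccess` is needed depends on the Ceph RadosGW deployment:

```groovy
// nextflow.config -- sketch of a custom (non-AWS) S3 endpoint configuration.
// All values below are placeholders, not taken from the linked repository.
workDir = 's3://staging/staging'

aws {
    accessKey = '<ceph-access-key>'
    secretKey = '<ceph-secret-key>'
    client {
        endpoint = 'https://s3.example.openstack.local' // Ceph RadosGW endpoint (placeholder)
        s3PathStyleAccess = true                        // RadosGW setups often require path-style requests
    }
}
```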

Nextflow Version

      N E X T F L O W
      version 21.04.0 build 5552
      created 02-05-2021 16:22 UTC 
      cite doi:10.1038/nbt.3820
      http://nextflow.io

Nextflow Error reported

While the file is correctly staged in S3 by the master node, the worker node fails with this message:


Error executing process > 'runBBMapDeinterleave (test1)'

Caused by:
  java.io.IOException: No space left on device

Command executed:

  reformat.sh in=interleaved.fq.gz out1=read1.fq.gz out2=read2.fq.gz

Command exit status:
  -

Command output:
  (empty)

Work dir:
  s3://staging/staging/9d/38a8cf157159b7df900b867731c4ea

Looking at the node-nextflow.log, the actual error is the following:

May-14 07:12:44.708 [pool-2-thread-1] DEBUG nextflow.file.FileHelper - Creating a file system instance for provider: S3FileSystemProvider
May-14 07:12:44.721 [pool-2-thread-1] DEBUG nextflow.file.FileHelper - AWS S3 config details: {}
May-14 07:12:47.444 [pool-2-thread-1] ERROR nextflow.executor.IgBaseTask - Cannot execute task > runBBMapDeinterleave (test2)
com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: *********************; S3 Extended Request ID: ********************)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4914)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4860)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4854)
        at com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:880)
        at com.upplication.s3fs.AmazonS3Client.listObjects(AmazonS3Client.java:105)
        at com.upplication.s3fs.util.S3ObjectSummaryLookup.lookup(S3ObjectSummaryLookup.java:113)
        at com.upplication.s3fs.S3FileSystemProvider.readAttributes(S3FileSystemProvider.java:669)
        at java.base/java.nio.file.Files.readAttributes(Files.java:1764)
        at nextflow.util.CacheHelper.hashFile(CacheHelper.java:239)
        at nextflow.util.CacheHelper.hasher(CacheHelper.java:186)
        at nextflow.util.CacheHelper.hasher(CacheHelper.java:178)
        at nextflow.util.CacheHelper.hasher(CacheHelper.java:111)
        at nextflow.util.CacheHelper.hasher(CacheHelper.java:107)
        at nextflow.util.CacheHelper.hasher(CacheHelper.java:103)
        at nextflow.file.FileHelper.getLocalCachePath(FileHelper.groovy:645)
        at nextflow.executor.IgFileStagingStrategy.stage(IgFileStagingStrategy.groovy:81)
        at nextflow.executor.IgScriptStagingStrategy.super$2$stage(IgScriptStagingStrategy.groovy)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
        at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1268)
        at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:144)
        at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:164)
        at nextflow.executor.IgScriptStagingStrategy.stage(IgScriptStagingStrategy.groovy:55)
        at nextflow.executor.IgScriptTask.beforeExecute(IgScriptTask.groovy:56)
        at nextflow.executor.IgBaseTask.call(IgBaseTask.groovy:120)
        at nextflow.scheduler.SchedulerAgent$AgentProcessor.runTask0(SchedulerAgent.groovy:350)
        at nextflow.scheduler.SchedulerAgent$AgentProcessor$1.run(SchedulerAgent.groovy:339)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)

I believe the reason for this error is an incompatibility between the Amazon S3 API and the S3 API offered by Ceph.
Is there any way to see the actual S3 call that fails?
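One avenue that might surface the failing request is Nextflow's top-level `-trace` option, which enables trace-level logging for the given package names. A sketch, assuming the relevant client logging lives in the packages shown in the stack trace above:

```shell
# Enable trace-level logging for the S3 client layers and write it to a log file.
# The package names are assumptions based on the stack trace; adjust as needed.
nextflow -log trace.log \
         -trace com.upplication.s3fs,com.amazonaws \
         run main.nf -w 's3://staging/staging'
```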

@pbelmann pbelmann changed the title Using apache ignite with non AWS S3 bucket as work directory Using apache ignite with non AWS S3 bucket as working directory May 14, 2021