Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NextSV No Output #17

Open
543090lee opened this issue Oct 12, 2022 · 33 comments
Open

NextSV No Output #17

543090lee opened this issue Oct 12, 2022 · 33 comments

Comments

@543090lee
Copy link

After successfully running the python command to get the output directory and work.sh, I ran the work.sh file. However, I got this output.
running script
1 input file(s):
output prefix: /u/home/m/mdistler/project-zarlab/consensus/nextsv_output/clean_reads/nextsv_output
processing input file: /u/home/m/mdistler/project-zarlab/mouseBAM/A_J.chr19.fastq
number of clean reads: 29888081
number of filtered reads: 0
1 input file(s):
/u/home/m/mdistler/project-zarlab/consensus/nextsv_output/clean_reads/nextsv_output.clean.fastq
output directory: /u/home/m/mdistler/project-zarlab/consensus/nextsv_output/longread_qc
/u/home/m/mdistler/project-zarlab/consensus/nextsv_output/clean_reads/nextsv_output.clean.fastq.gz exists -- overwrite (y/n)? y

real 3m11.933s
user 9m48.323s
sys 0m18.873s
first
[M::mm_idx_gen::3.4670.88] collected minimizers
[M::mm_idx_gen::4.068
1.14] sorted minimizers
[M::main::4.0681.14] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::4.294
1.13] mid_occ = 133
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::4.444*1.13] distinct minimizers: 8335668 (88.18% are singletons); average occurrences: 1.322; average spacing: 5.573

real 0m35.122s
user 1m39.821s
sys 0m13.164s

real 0m0.011s
user 0m0.004s
sys 0m0.005s

real 0m0.006s
user 0m0.001s
sys 0m0.002s
Traceback (most recent call last):
File "/u/scratch/m/mdistler/nextsv/bin/check_bam_and_remove_file.py", line 32, in
main()
File "/u/scratch/m/mdistler/nextsv/bin/check_bam_and_remove_file.py", line 23, in main
ret = subprocess.check_output([samtools, "quickcheck", "-vvv", out_bam_file], stderr=subprocess.STDOUT).decode()
File "/u/home/m/mdistler/project-jflint/anaconda3/lib/python3.8/subprocess.py", line 411, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/u/home/m/mdistler/project-jflint/anaconda3/lib/python3.8/subprocess.py", line 512, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/u/home/m/mdistler/hap.py-install/bin/samtools', 'quickcheck', '-vvv', '/u/home/m/mdistler/project-zarlab/consensus/nextsv_output/minimap2_bam/nextsv_output.minimap2.sorted.bam']' returned non-zero exit status 8.

real 0m35.400s
user 1m39.859s
sys 0m13.215s
second
/u/home/m/mdistler/project-zarlab/consensus/nextsv_output/sniffles_calls/sniffles.minimap2.nextsv_output.sh: line 3: /u/project/jflint/mdistler/anaconda3/lib/python3.8/site-packages/sniffles: Is a directory

real 0m0.003s
user 0m0.000s
sys 0m0.001s

real 0m0.041s
user 0m0.001s
sys 0m0.005s
third

real 0m0.074s
user 0m0.001s
sys 0m0.020s
ngmlr 0.2.7 (build: Jun 25 2018 10:17:59, start: 2022-10-11.16:21:35)
Contact: [email protected]
Writing output (SAM) to stdout
Reading encoded reference from /u/home/m/mdistler/project-zarlab/mouseBAM/chr19_new.fa-enc.2.ngm
Reading 61 Mbp from disk took 0.39s
Reading reference index from /u/home/m/mdistler/project-zarlab/mouseBAM/chr19_new.fa-ht-13-2.2.ngm
Reading from disk took 4.20s
Opening query file /u/home/m/mdistler/project-zarlab/consensus/nextsv_output/clean_reads/nextsv_output.clean.fastq.gz
Mapping reads...
Exception bad_alloc occured in thread 5. This usually means you ran out of physical or virtual memory (try ulimit -v)
Terminating

real 0m4.695s
user 0m0.119s
sys 0m0.594s

real 0m0.019s
user 0m0.001s
sys 0m0.008s

real 0m0.010s
user 0m0.000s
sys 0m0.004s

real 0m4.992s
user 0m0.163s
sys 0m0.636s
/u/home/m/mdistler/project-zarlab/consensus/nextsv_output/sniffles_calls/sniffles.ngmlr.nextsv_output.sh: line 3: /u/project/jflint/mdistler/anaconda3/lib/python3.8/site-packages/sniffles: Is a directory

real 0m0.004s
user 0m0.001s
sys 0m0.000s

real 0m0.047s
user 0m0.002s
sys 0m0.003s

real 0m0.956s
user 0m0.793s
sys 0m0.084s
end

I don't see any VCF files in sniffles_call, neither any output in nextsv_results folder.

Thank you for your help

@543090lee
Copy link
Author

I have looked through each line of work.sh and came to a conclusion that /u/home/m/mdistler/project-zarlab/consensus/nextsv_output2/minimap2_bam/minimap2_bam2cram.sh is causing a problem, and this error just descends down. I have converted our BAM file to CRAM format. I used this command: /u/home/m/mdistler/scratch/samtools-1.3/samtools view -T /u/home/m/mdistler/project-zarlab/mouseBAM/chr19_new.fa -C -o test.cram nextsv_output2.minimap2.bam.
So this means that there are no errors in bam file

@fangli80
Copy link
Collaborator

Hello @543090lee,
Sorry for the inconvenience. Could you please share your command for running NextSV?

Best,
Li

@kaichop
Copy link
Collaborator

kaichop commented Oct 18, 2022 via email

@543090lee
Copy link
Author

I used the command:
python nextsv2.py nextsv.cfg nextsv_output /u/home/m/mdistler/project-zarlab/mouseBAM

@543090lee
Copy link
Author

Upgrading to samtools1.16.1 creates a new error message

Here are the commands I used:

python nextsv2.py nextsv.cfg nextsv_output /u/home/m/mdistler/project-zarlab/mouseBAM
./work.sh

Here is the error message. There seems to be an issue with the header of the bam, so samtools is unable to index it. (in bold below)

(base) bash-4.2$ ./work.sh
1 input file(s):
output prefix: /u/home/m/mdistler/scratch/output1/clean_reads/output1
processing input file: /u/home/m/mdistler/project-zarlab/mouseBAM/A_J.chr19.fastq
number of clean reads: 29888081
number of filtered reads: 0
1 input file(s):
/u/home/m/mdistler/scratch/output1/clean_reads/output1.clean.fastq
output directory: /u/home/m/mdistler/scratch/output1/longread_qc

real 3m33.274s
user 9m27.823s
sys 0m31.683s
[M::mm_idx_gen::3.5940.76] collected minimizers
[M::mm_idx_gen::4.151
1.06] sorted minimizers
[M::main::4.1511.06] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::4.361
1.06] mid_occ = 133
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::4.494*1.06] distinct minimizers: 8335668 (88.18% are singletons); average occurrences: 1.322; average spacing: 5.573
[main_samview] fail to read the header from "-".

real 0m24.696s
user 1m16.162s
sys 0m7.773s
samtools sort: failed to read header from "/u/home/m/mdistler/scratch/output1/minimap2_bam/output1.minimap2.bam"

real 0m0.091s
user 0m0.005s
sys 0m0.011s
[E::hts_open_format] Failed to open file "/u/home/m/mdistler/scratch/output1/minimap2_bam/output1.minimap2.sorted.bam" : No such file or directory
samtools index: failed to open "/u/home/m/mdistler/scratch/output1/minimap2_bam/output1.minimap2.sorted.bam": No such file or directory

real 0m0.009s
user 0m0.003s
sys 0m0.005s
Traceback (most recent call last):
File "/u/scratch/m/mdistler/nextsv/bin/check_bam_and_remove_file.py", line 32, in
main()
File "/u/scratch/m/mdistler/nextsv/bin/check_bam_and_remove_file.py", line 23, in main
ret = subprocess.check_output([samtools, "quickcheck", "-vvv", out_bam_file], stderr=subprocess.STDOUT).decode()
File "/u/home/m/mdistler/project-jflint/anaconda3/lib/python3.8/subprocess.py", line 411, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/u/home/m/mdistler/project-jflint/anaconda3/lib/python3.8/subprocess.py", line 512, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/u/home/m/mdistler/scratch/samtools-1.16.1/samtools', 'quickcheck', '-vvv', '/u/home/m/mdistler/scratch/output1/minimap2_bam/output1.minimap2.sorted.bam']' returned non-zero exit status 2.

real 0m25.027s
user 1m16.211s
sys 0m7.822s
/u/home/m/mdistler/scratch/output1/sniffles_calls/sniffles.minimap2.output1.sh: line 3: /u/project/jflint/mdistler/anaconda3/lib/python3.8/site-packages/sniffles: Is a directory

real 0m0.002s
user 0m0.001s
sys 0m0.000s

real 0m0.009s
user 0m0.001s
sys 0m0.004s
[E::hts_open_format] Failed to open file "/u/home/m/mdistler/scratch/output1/minimap2_bam/output1.minimap2.sorted.bam" : No such file or directory
samtools view: failed to open "/u/home/m/mdistler/scratch/output1/minimap2_bam/output1.minimap2.sorted.bam" for reading: No such file or directory
samtools index: "/u/home/m/mdistler/scratch/output1/minimap2_bam/output1.minimap2.sorted.cram" is in a format that cannot be usefully indexed
rm: cannot remove '/u/home/m/mdistler/scratch/output1/minimap2_bam/output1.minimap2.sorted.bam': No such file or directory
rm: cannot remove '/u/home/m/mdistler/scratch/output1/minimap2_bam/output1.minimap2.sorted.bam.bai': No such file or directory

real 0m0.040s
user 0m0.009s
sys 0m0.018s
ngmlr 0.2.7 (build: Jun 25 2018 10:17:59, start: 2022-10-18.16:53:36)
Contact: [email protected]
Writing output (SAM) to stdout
Reading encoded reference from /u/home/m/mdistler/project-zarlab/mouseBAM/chr19_new.fa-enc.2.ngm
Reading 61 Mbp from disk took 0.35s
Reading reference index from /u/home/m/mdistler/project-zarlab/mouseBAM/chr19_new.fa-ht-13-2.2.ngm
Reading from disk took 3.73s
Opening query file /u/home/m/mdistler/scratch/output1/clean_reads/output1.clean.fastq.gz
Mapping reads...
Exception bad_alloc occured in thread 5. This usually means you ran out of physical or virtual memory (try ulimit -v)
Terminating

real 0m4.173s
user 0m0.112s
sys 0m0.527s
samtools sort: couldn't allocate memory for bam_mem

real 0m0.013s
user 0m0.003s
sys 0m0.008s
[E::hts_open_format] Failed to open file "/u/home/m/mdistler/scratch/output1/ngmlr_bam/output1.ngmlr.sorted.bam" : No such file or directory
samtools index: failed to open "/u/home/m/mdistler/scratch/output1/ngmlr_bam/output1.ngmlr.sorted.bam": No such file or directory

real 0m0.009s
user 0m0.002s
sys 0m0.006s
Traceback (most recent call last):
File "/u/scratch/m/mdistler/nextsv/bin/check_bam_and_remove_file.py", line 32, in
main()
File "/u/scratch/m/mdistler/nextsv/bin/check_bam_and_remove_file.py", line 23, in main
ret = subprocess.check_output([samtools, "quickcheck", "-vvv", out_bam_file], stderr=subprocess.STDOUT).decode()
File "/u/home/m/mdistler/project-jflint/anaconda3/lib/python3.8/subprocess.py", line 411, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/u/home/m/mdistler/project-jflint/anaconda3/lib/python3.8/subprocess.py", line 512, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/u/home/m/mdistler/scratch/samtools-1.16.1/samtools', 'quickcheck', '-vvv', '/u/home/m/mdistler/scratch/output1/ngmlr_bam/output1.ngmlr.sorted.bam']' returned non-zero exit status 2.

real 0m4.326s
user 0m0.157s
sys 0m0.570s
/u/home/m/mdistler/scratch/output1/sniffles_calls/sniffles.ngmlr.output1.sh: line 3: /u/project/jflint/mdistler/anaconda3/lib/python3.8/site-packages/sniffles: Is a directory

real 0m0.001s
user 0m0.000s
sys 0m0.001s

real 0m0.007s
user 0m0.001s
sys 0m0.004s
[E::hts_open_format] Failed to open file "/u/home/m/mdistler/scratch/output1/ngmlr_bam/output1.ngmlr.sorted.bam" : No such file or directory
samtools view: failed to open "/u/home/m/mdistler/scratch/output1/ngmlr_bam/output1.ngmlr.sorted.bam" for reading: No such file or directory
samtools index: "/u/home/m/mdistler/scratch/output1/ngmlr_bam/output1.ngmlr.sorted.cram" is in a format that cannot be usefully indexed
rm: cannot remove '/u/home/m/mdistler/scratch/output1/ngmlr_bam/output1.ngmlr.sorted.bam': No such file or directory
rm: cannot remove '/u/home/m/mdistler/scratch/output1/ngmlr_bam/output1.ngmlr.sorted.bam.bai': No such file or directory

real 0m0.030s
user 0m0.005s
sys 0m0.022s

Thanks!

@fangli80
Copy link
Collaborator

The reason might be that some depending tools updated their commands.
I am wring a new version of NextSV and will update soon.

@fangli80
Copy link
Collaborator

Hello @543090lee

I updated NextSV to be compatible with the latest version of Sniffles v2 and also added a new SV caller (cuteSV) to the pipeline.
Please follow the instructions in the README.md file to install NextSV3.

If you still have issues, please feel free to let me know.

Thanks !
Li

@fangli80
Copy link
Collaborator

@543090lee
By the way, I'm assuming your data is ONT. Is that correct?

@543090lee
Copy link
Author

Hello,
Thank you for updating it to the new version nextsv3.
I downloaded and ran the work.sh and got following error.
image
image

@fangli80
Copy link
Collaborator

It seems that there are issues with your reference genome, because minimap2 reported the following:

loaded/built the index for 0 target sequence(s)
distinct minimizers: 0 (-nan% are singletons); average occurrences: -nan; average spacing: -nan; total length: 0

Therefore, the alignment process failed.
Could you please check if the reference file (-r) is correct? It should be a FASTA file (not a fasta.fai file for example).

@543090lee
Copy link
Author

Hello,
I used a .fa file for the reference file, which uses a fasta formatting

Sincerely,
Seungmo Lee

@fangli80
Copy link
Collaborator

OK. There will be a run_minimap2.sample_name.sh script in the 2_aligned_bam folder.
Could you please run it line-by-line and see what's happening.

@543090lee
Copy link
Author

This is what I got.
image

@fangli80
Copy link
Collaborator

minimap2 is not found. Please run it in the conda environment of nextsv.

@543090lee
Copy link
Author

image

Here is with error running it on environment

@fangli80
Copy link
Collaborator

fangli80 commented Oct 23, 2022

There are two issues:

  1. the memory is not enough as samtools reported couldn't allocate memory for bam_mem.
  2. there is no valid sequence in your reference fasta file as minimap2 still reported loaded/built the index for 0 target sequence(s). Could you please use head -20 path/to/reference.fasta to show the first 20 lines of your file and let's see if we can figure it out.

@fangli80
Copy link
Collaborator

You can reduce the --threads to 2 (the current value is 8) which might save some memory. It's best you can run it in a machine with at least 16G memory. You can use free -g to check the memory of your machine.

@543090lee
Copy link
Author

When print first 20 lines, I get this
(base) [seichang@login3 nextsv_input]$ head -20 chr19_new.fa

19
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

@fangli80
Copy link
Collaborator

The first 20 lines are correct. I know this might be a modified reference genome. How about using the original reference genome to test the pipeline?
If possible, you may also gzip the chr19_new.fa file and email it to me ([email protected]).
Thanks.
Li

@543090lee
Copy link
Author

Hello,
I successfully installed and ran nextsv, and got the vcf files. However, sniffles and cuteSV vcf files look empty.
##source=Sniffles2_2.0.7
##command="/u/home/s/seichang/project-zarlab/anaconda3/envs/nextsv3/bin/sniffles --output-rnames --allow-overwrite --input /u/project/zarlab/seichang/nextsv/nextsv_output/2_aligned_bam/sample_name.minimap2.sorted.bam --vcf /u/project/zarlab/seichang/nextsv/nextsv_output/3_SV_calls/sample_name.minimap2.sniffles.vcf --reference /u/home/s/seichang/nextsv_input/chr19_new.fa --threads 2"
##fileDate="2022/11/01 13:16:08"
##contig=<ID=19,length=61431566>
##ALT=<ID=INS,Description="Insertion">
##ALT=<ID=DEL,Description="Deletion">
##ALT=<ID=DUP,Description="Duplication">
##ALT=<ID=INV,Description="Inversion">
##ALT=<ID=BND,Description="Breakend; Translocation">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype quality">
##FORMAT=<ID=DR,Number=1,Type=Integer,Description="Number of reference reads">
##FORMAT=<ID=DV,Number=1,Type=Integer,Description="Number of variant reads">
##FORMAT=<ID=ID,Number=1,Type=String,Description="Individual sample SV ID for multi-sample output">
##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=GT,Description="Genotype filter">
##FILTER=<ID=SUPPORT_MIN,Description="Minimum read support filter">
##FILTER=<ID=STDEV_POS,Description="SV Breakpoint standard deviation filter">
##FILTER=<ID=STDEV_LEN,Description="SV length standard deviation filter">
##FILTER=<ID=COV_MIN,Description="Minimum coverage filter">
##FILTER=<ID=COV_CHANGE,Description="Coverage change filter">
##FILTER=<ID=STRAND,Description="Strand support filter">
##FILTER=<ID=SVLEN_MIN,Description="SV length filter">
##FILTER=<ID=NM,Description="Alignment noise level filter">
##INFO=<ID=PRECISE,Number=0,Type=Flag,Description="Structural variation with precise breakpoints">
##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Structural variation with imprecise breakpoints">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Length of structural variation">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variation">
##INFO=<ID=CHR2,Number=1,Type=String,Description="Mate chromsome for BND SVs">
##INFO=<ID=SUPPORT,Number=1,Type=Integer,Description="Number of reads supporting the structural variation">
##INFO=<ID=SUPPORT_INLINE,Number=1,Type=Integer,Description="Number of reads supporting an INS/DEL SV (non-split events only)">
##INFO=<ID=SUPPORT_LONG,Number=1,Type=Integer,Description="Number of soft-clipped reads putatively supporting the long insertion SV">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of structural variation">
##INFO=<ID=STDEV_POS,Number=1,Type=Float,Description="Standard deviation of structural variation start position">
##INFO=<ID=STDEV_LEN,Number=1,Type=Float,Description="Standard deviation of structural variation length">
##INFO=<ID=COVERAGE,Number=.,Type=Float,Description="Coverages near upstream, start, center, end, downstream of structural variation">
##INFO=<ID=STRAND,Number=1,Type=String,Description="Strands of supporting reads for structural variant">
##INFO=<ID=AC,Number=.,Type=Integer,Description="Allele count, summed up over all samples">
##INFO=<ID=SUPP_VEC,Number=1,Type=String,Description="List of read support for all samples">
##INFO=<ID=CONSENSUS_SUPPORT,Number=1,Type=Integer,Description="Number of reads that support the generated insertion (INS) consensus sequence">
##INFO=<ID=RNAMES,Number=.,Type=String,Description="Names of supporting reads (if enabled with --output-rnames)">
##INFO=<ID=AF,Number=1,Type=Float,Description="Allele Frequency">
##INFO=<ID=NM,Number=.,Type=Float,Description="Mean number of query alignment length adjusted mismatches of supporting reads">
##INFO=<ID=PHASE,Number=.,Type=String,Description="Phasing information derived from supporting reads, represented as list of: HAPLOTYPE,PHASESET,HAPLOTYPE_SUPPORT,PHASESET_SUPPORT,HAPLOTYPE_FILTER,PHASESET_FILTER">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE

This is the sample vcf file I got.
Thank you

@fangli80
Copy link
Collaborator

fangli80 commented Nov 3, 2022

Hello,
Could you please show me:

  1. the size of bam file in the 2_aligned_bam folder
  2. the shell scripts in 2_aligned_bam and 3_SV_calls

Thanks!
Li

@543090lee
Copy link
Author

Hello,
1.
total 5.4G
-rw-r--r--. 1 seichang eeskin 966 Nov 1 12:42 run_minimap2.sample_name.sh
-rw-r--r--. 1 seichang eeskin 2.7G Nov 1 13:12 sample_name.minimap2.bam
-rw-r--r--. 1 seichang eeskin 2.7G Nov 1 13:15 sample_name.minimap2.sorted.bam
-rw-r--r--. 1 seichang eeskin 172K Nov 1 13:16 sample_name.minimap2.sorted.bam.bai

.sh file in 2_aligned_bam:
#!/bin/bash

minimap2 --MD -t 2 -ax map-ont -N 10 /u/home/s/seichang/nextsv_input/chr19_new.fa /u/project/zarlab/seichang/nextsv/nextsv_output/1_clean_reads/sample_name.clean.fastq.gz /u/project/zarlab/seichang/nextsv/nextsv_output/1_clean_reads/sample_name.clean.fasta.gz | samtools view -@ 2 -bS - > /u/project/zarlab/seichang/nextsv/nextsv_output/2_aligned_bam/sample_name.minimap2.bam
samtools sort -@ 2 -o /u/project/zarlab/seichang/nextsv/nextsv_output/2_aligned_bam/sample_name.minimap2.sorted.bam /u/project/zarlab/seichang/nextsv/nextsv_output/2_aligned_bam/sample_name.minimap2.bam
samtools index -@ 2 /u/project/zarlab/seichang/nextsv/nextsv_output/2_aligned_bam/sample_name.minimap2.sorted.bam
/u/project/zarlab/seichang/nextsv/bin/check_bam_and_remove_file.py /u/project/zarlab/seichang/nextsv/nextsv_output/2_aligned_bam/sample_name.minimap2.sorted.bam /u/project/zarlab/seichang/nextsv/nextsv_output/2_aligned_bam/sample_name.minimap2.bam samtools

.sh file in 3_SV_calls:
cuteSV:
#!/bin/bash

mkdir -p /u/project/zarlab/seichang/nextsv/nextsv_output/3_SV_calls/cuteSV_temp
cuteSV --max_cluster_bias_INS 100 --diff_ratio_merging_INS 0.3 --max_cluster_bias_DEL 100 --diff_ratio_merging_DEL 0.3 --report_readid --genotype --min_support 5 --threads 2 --sample sample_name /u/project/zarlab/seichang/nextsv/nextsv_output/2_aligned_bam/sample_name.minimap2.sorted.bam /u/home/s/seichang/nextsv_input/chr19_new.fa /u/project/zarlab/seichang/nextsv/nextsv_output/3_SV_calls/sample_name.minimap2.cuteSV.vcf /u/project/zarlab/seichang/nextsv/nextsv_output/3_SV_calls/cuteSV_temp
rm -r /u/project/zarlab/seichang/nextsv/nextsv_output/3_SV_calls/cuteSV_temp

sniffles:
#!/bin/bash

sniffles --output-rnames --allow-overwrite --input /u/project/zarlab/seichang/nextsv/nextsv_output/2_aligned_bam/sample_name.minimap2.sorted.bam --vcf /u/project/zarlab/seichang/nextsv/nextsv_output/3_SV_calls/sample_name.minimap2.sniffles.vcf --reference /u/home/s/seichang/nextsv_input/chr19_new.fa --threads 2

Thank you for your help

@fangli80
Copy link
Collaborator

fangli80 commented Nov 4, 2022

The file sizes look correct. Could you please run the two shell scripts manually and see if any errors are reported?
Thanks,
Li

@543090lee
Copy link
Author

543090lee commented Nov 4, 2022

For the sh file in 2_aligned_bam:

(nextsv3) [seichang@login1 2_aligned_bam]$ ./run_minimap2.sample_name.sh
[M::mm_idx_gen::2.6790.96] collected minimizers
[M::mm_idx_gen::3.242
1.13] sorted minimizers
[M::main::3.2421.13] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::3.432
1.12] mid_occ = 133
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::3.566*1.12] distinct minimizers: 8335668 (88.18% are singletons); average occurrences: 1.322; average spacing: 5.573; total length: 61431566
[main_samview] fail to read the header from "-".
samtools sort: failed to read header from "/u/project/zarlab/seichang/nextsv/nextsv_output/2_aligned_bam/sample_name.minimap2.bam"
Traceback (most recent call last):
File "/u/project/zarlab/seichang/nextsv/bin/check_bam_and_remove_file.py", line 32, in
main()
File "/u/project/zarlab/seichang/nextsv/bin/check_bam_and_remove_file.py", line 23, in main
ret = subprocess.check_output([samtools, "quickcheck", "-vvv", out_bam_file], stderr=subprocess.STDOUT).decode()
File "/u/home/s/seichang/project-zarlab/anaconda3/envs/nextsv3/lib/python3.8/subprocess.py", line 415, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/u/home/s/seichang/project-zarlab/anaconda3/envs/nextsv3/lib/python3.8/subprocess.py", line 493, in run
with Popen(*popenargs, **kwargs) as process:
File "/u/home/s/seichang/project-zarlab/anaconda3/envs/nextsv3/lib/python3.8/subprocess.py", line 858, in init
self._execute_child(args, executable, preexec_fn, close_fds,
File "/u/home/s/seichang/project-zarlab/anaconda3/envs/nextsv3/lib/python3.8/subprocess.py", line 1704, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/u/project/zarlab/seichang/nextsv/nextsv_output/2_aligned_bam/samtools'

sh file in the 3_SV_calls folder:

(nextsv3) [seichang@login1 3_SV_calls]$ ./run_cuteSV_forminimap2.sample_name.sh
2022-11-03 21:42:34,379 [INFO] Running /u/home/s/seichang/project-zarlab/anaconda3/envs/nextsv3/bin/cuteSV --max_cluster_bias_INS 100 --diff_ratio_merging_INS 0.3 --max_cluster_bias_DEL 100 --diff_ratio_merging_DEL 0.3 --report_readid --genotype --min_support 5 --threads 2 --sample sample_name /u/project/zarlab/seichang/nextsv/nextsv_output/2_aligned_bam/sample_name.minimap2.sorted.bam /u/home/s/seichang/nextsv_input/chr19_new.fa /u/project/zarlab/seichang/nextsv/nextsv_output/3_SV_calls/sample_name.minimap2.cuteSV.vcf /u/project/zarlab/seichang/nextsv/nextsv_output/3_SV_calls/cuteSV_temp
Traceback (most recent call last):
File "/u/home/s/seichang/project-zarlab/anaconda3/envs/nextsv3/bin/cuteSV", line 876, in
run(sys.argv[1:])
File "/u/home/s/seichang/project-zarlab/anaconda3/envs/nextsv3/bin/cuteSV", line 872, in run
main_ctrl(args, argv)
File "/u/home/s/seichang/project-zarlab/anaconda3/envs/nextsv3/bin/cuteSV", line 673, in main_ctrl
samfile = pysam.AlignmentFile(args.input)
File "pysam/libcalignmentfile.pyx", line 751, in pysam.libcalignmentfile.AlignmentFile.cinit
File "pysam/libcalignmentfile.pyx", line 1000, in pysam.libcalignmentfile.AlignmentFile._open
ValueError: file has no sequences defined (mode='r') - is it SAM/BAM format? Consider opening with check_sq=False

For Sniffles:
(nextsv3) [seichang@login1 3_SV_calls]$ ./run_sniffles_forminimap2.sample_name.sh
Running Sniffles2, build 2.0.7
Run Mode: call_sample
Start on: 2022/11/03 21:43:18
Working dir: /u/project/zarlab/seichang/nextsv/nextsv_output/3_SV_calls
Used command: /u/home/s/seichang/project-zarlab/anaconda3/envs/nextsv3/bin/sniffles --output-rnames --allow-overwrite --input /u/project/zarlab/seichang/nextsv/nextsv_output/2_aligned_bam/sample_name.minimap2.sorted.bam --vcf /u/project/zarlab/seichang/nextsv/nextsv_output/3_SV_calls/sample_name.minimap2.sniffles.vcf --reference /u/home/s/seichang/nextsv_input/chr19_new.fa --threads 2

Opening for reading: /u/project/zarlab/seichang/nextsv/nextsv_output/2_aligned_bam/sample_name.minimap2.sorted.bam
Traceback (most recent call last):
File "/u/home/s/seichang/project-zarlab/anaconda3/envs/nextsv3/bin/sniffles", line 616, in
Sniffles2_Main(config.from_cmdline(),processes)
File "/u/home/s/seichang/project-zarlab/anaconda3/envs/nextsv3/bin/sniffles", line 118, in Sniffles2_Main
bam_in=pysam.AlignmentFile(config.input, config.input_mode)
File "pysam/libcalignmentfile.pyx", line 751, in pysam.libcalignmentfile.AlignmentFile.cinit
File "pysam/libcalignmentfile.pyx", line 1000, in pysam.libcalignmentfile.AlignmentFile._open
ValueError: file has no sequences defined (mode='rb') - is it SAM/BAM format? Consider opening with check_sq=False

@fangli80
Copy link
Collaborator

fangli80 commented Nov 7, 2022

It looks like minimap2 ran out of memory. Could you please run the following command and see what happened?
minimap2 --MD -t 2 -ax map-ont -N 10 /u/home/s/seichang/nextsv_input/chr19_new.fa /u/project/zarlab/seichang/nextsv/nextsv_output/1_clean_reads/sample_name.clean.fastq.gz /u/project/zarlab/seichang/nextsv/nextsv_output/1_clean_reads/sample_name.clean.fasta.gz > /u/project/zarlab/seichang/nextsv/nextsv_output/2_aligned_bam/sample_name.minimap2.sam

@543090lee
Copy link
Author

Hello,
(base) [seichang@login4 2_aligned_bam]$ minimap2 --MD -t 2 -ax map-ont -N 10 /u/home/s/seichang/nextsv_input/chr19_new.fa /u/project/zarlab/seichang/nextsv/nextsv_output/1_clean_reads/sample_name.clean.fastq.gz /u/project/zarlab/seichang/nextsv/nextsv_output/1_clean_reads/sample_name.clean.fasta.gz > /u/project/zarlab/seichang/nextsv/nextsv_output/2_aligned_bam/sample_name.minimap2.sam
-bash: minimap2: command not found

I got this error, and minimap2 wasn't found?

Thank you

@fangli80
Copy link
Collaborator

fangli80 commented Nov 7, 2022

Please run it in the conda environment of nextsv.

@543090lee
Copy link
Author

Hello,
I got this result:

(nextsv3) [seichang@login3 2_aligned_bam]$ minimap2 --MD -t 2 -ax map-ont -N 10 /u/home/s/seichang/nextsv_input/chr19_new.fa /u/project/zarlab/seichang/nextsv/nextsv_output/1_clean_reads/sample_name.clean.fastq.gz /u/project/zarlab/seichang/nextsv/nextsv_output/1_clean_reads/sample_name.clean.fasta.gz > /u/project/zarlab/seichang/nextsv/nextsv_output/2_aligned_bam/sample_name.minimap2.sam
[M::mm_idx_gen::2.7280.93] collected minimizers
[M::mm_idx_gen::3.279
1.11] sorted minimizers
[M::main::3.2801.11] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::3.464
1.10] mid_occ = 133
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::3.592*1.10] distinct minimizers: 8335668 (88.18% are singletons); average occurrences: 1.322; average spacing: 5.573; total length: 61431566
Segmentation fault

Thank you for your help

@fangli80
Copy link
Collaborator

fangli80 commented Nov 8, 2022

ok. Please try the following command in the conda environment of nextsv

  1. minimap2 --MD -t 2 -ax map-ont -N 10 /u/home/s/seichang/nextsv_input/chr19_new.fa /u/project/zarlab/seichang/nextsv/nextsv_output/1_clean_reads/sample_name.clean.fastq.gz > /u/project/zarlab/seichang/nextsv/nextsv_output/2_aligned_bam/sample_name.minimap2.sam

  2. minimap2 --MD -t 2 -ax map-ont -N 10 /u/home/s/seichang/nextsv_input/chr19_new.fa /u/project/zarlab/seichang/nextsv/nextsv_output/1_clean_reads/sample_name.clean.fasta.gz > /u/project/zarlab/seichang/nextsv/nextsv_output/2_aligned_bam/sample_name.minimap2.sam

@543090lee
Copy link
Author

Hello, I got these results:

1.(nextsv3) [seichang@login1 2_aligned_bam]$ minimap2 --MD -t 2 -ax map-ont -N 10 /u/home/s/seichang/nextsv_input/chr19_new.fa /u/project/zarlab/seichang/nextsv/nextsv_output/1_clean_reads/sample_name.clean.fastq.gz > /u/project/zarlab/seichang/nextsv/nextsv_output/2_aligned_bam/sample_name.minimap2.sam

[M::mm_idx_gen::2.6600.98] collected minimizers
[M::mm_idx_gen::3.194
1.15] sorted minimizers
[M::main::3.1941.15] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::3.377
1.14] mid_occ = 133
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::3.504*1.14] distinct minimizers: 8335668 (88.18% are singletons); average occurrences: 1.322; average spacing: 5.573; total length: 61431566
Segmentation fault

2.(nextsv3) [seichang@login1 2_aligned_bam]$ minimap2 --MD -t 2 -ax map-ont -N 10 /u/home/s/seichang/nextsv_input/chr19_new.fa /u/project/zarlab/seichang/nextsv/nextsv_output/1_clean_reads/sample_name.clean.fasta.gz > /u/project/zarlab/seichang/nextsv/nextsv_output/2_aligned_bam/sample_name.minimap2.sam
[M::mm_idx_gen::2.4610.99] collected minimizers
[M::mm_idx_gen::3.004
1.17] sorted minimizers
[M::main::3.0051.17] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::3.192
1.16] mid_occ = 133
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::3.3211.15] distinct minimizers: 8335668 (88.18% are singletons); average occurrences: 1.322; average spacing: 5.573; total length: 61431566
[M::worker_pipeline::504.162
1.00] mapped 1 sequences
[M::main] Version: 2.24-r1122
[M::main] CMD: minimap2 --MD -t 2 -ax map-ont -N 10 /u/home/s/seichang/nextsv_input/chr19_new.fa /u/project/zarlab/seichang/nextsv/nextsv_output/1_clean_reads/sample_name.clean.fasta.gz
[M::main] Real time: 504.299 sec; CPU: 502.375 sec; Peak RSS: 3.100 GB

@fangli80
Copy link
Collaborator

fangli80 commented Nov 8, 2022

  1. There are both fastq and fasta files in your input folder. Please make sure your reference fasta file is NOT in the input folder. Otherwise it will be regarded as input reads as well.
  2. the alignment of the fastq file failed. Please run the following command and see what happened.
$FASTQ=/path/to/your/input/fastq
minimap2 --MD -t 2 -ax map-ont -N 10 /u/home/s/seichang/nextsv_input/chr19_new.fa $FASTQ > /u/project/zarlab/seichang/nextsv/nextsv_output/2_aligned_bam/sample_name.minimap2.sam

@543090lee
Copy link
Author

Hello,
This command uses a .fa file in the nextsv input folder, so I ran the above command, and removed .fa file.
Here is the result:
(nextsv3) [seichang@login4 2_aligned_bam]$ minimap2 --MD -t 2 -ax map-ont -N 10 /u/home/s/seichang/nextsv_input/chr19_new.fa $FASTQ > /u/project/zarlab/seichang/nextsv/nextsv_output/2_aligned_bam/sample_name.minimap2.sam
[M::mm_idx_gen::2.5570.94] collected minimizers
[M::mm_idx_gen::3.105
1.13] sorted minimizers
[M::main::3.1061.12] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::3.312
1.12] mid_occ = 133
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::3.454*1.11] distinct minimizers: 8335668 (88.18% are singletons); average occurrences: 1.322; average spacing: 5.573; total length: 61431566
[M::main] Version: 2.24-r1122
[M::main] CMD: minimap2 --MD -t 2 -ax map-ont -N 10 /u/home/s/seichang/nextsv_input/chr19_new.fa /u/home/s/seichang/nextsv_input
[M::main] Real time: 3.484 sec; CPU: 3.866 sec; Peak RSS: 0.519 GB

Thank you

@543090lee
Copy link
Author

Also do you want me to remove the reference fasta file in the input folder when running the nextsv3.py or for running individual commands in the sh scripts?

Thank you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants