GitHub - 1KFG/2019_dataset: Try again creating frozen dataset

2019 frozen 1KFG genome dataset

Set up your own copy with your JGI username and password

$ git clone https://github.com/1KFG/2019_dataset.git
$ cd 2019_dataset
$ mv scripts/init_jgi_download.sh.template scripts/init_jgi_download.sh

Edit scripts/init_jgi_download.sh and add your JGI username and password.

Then download the fungi.xml file. For faster service, you can first edit init_jgi_download.sh to point at the clade you care about, e.g. 'pezizomycotina' instead of 'fungi'.

$ bash scripts/init_jgi_download.sh
# or
$ chmod +x scripts/init_jgi_download.sh
$ ./scripts/init_jgi_download.sh

Then create the download folder and generate the list of download commands:

$ mkdir -p source/JGI
$ python scripts/jgi_download.py

This creates lib/jgi_download.sh, a series of 'curl' commands that download the GFF, DNA, and CDS files. The XML parsing does its best, but JGI is not entirely consistent in how it encodes multiple annotation versions for a species. There is no 'best and latest' category, so it is hard to be sure the correct version was picked. You should therefore check that many of the selections are what you expected; lib/jgi_fungi.csv lists what was selected.
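
Before downloading, one quick sanity check is to count the generated commands and the rows in the selection table. This is a hypothetical sketch, not part of the repository: the function names count_curl_lines and count_csv_rows are made up for illustration, and it assumes each download is a single line containing 'curl' and that lib/jgi_fungi.csv has exactly one header row.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for sanity-checking the generated download plan.
# Assumes one download per line containing "curl" and a single CSV header row.

# count_curl_lines SCRIPT: print how many lines of SCRIPT mention curl
count_curl_lines() {
  grep -c 'curl' "$1"
}

# count_csv_rows CSV: print the number of data rows, skipping the header.
# awk counts a final line even if it lacks a trailing newline.
# Quoted fields containing newlines would be miscounted.
count_csv_rows() {
  awk 'END { print (NR > 0 ? NR - 1 : 0) }' "$1"
}
```

For example, after sourcing the functions, run count_curl_lines lib/jgi_download.sh and count_csv_rows lib/jgi_fungi.csv. How the two numbers should relate depends on how many files are fetched per species, so treat them as a starting point for the manual check rather than a pass/fail test.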

If your machine has multiple processors, you can parallelize the downloads with the GNU parallel tool:

$ CPU=4  # e.g. 4 CPUs on this machine
$ cat lib/jgi_download.sh | parallel -j "$CPU"
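
If GNU parallel is not installed, xargs can do a similar job. This is a minimal sketch, not part of the repository: the run_jobs function name is hypothetical, it assumes every line of lib/jgi_download.sh is one complete command (the same assumption parallel makes here), and it relies on the -P option, which GNU and BSD xargs both support but POSIX does not require. When no job count is given, it falls back to the number of online processors reported by getconf.

```shell
#!/usr/bin/env bash
# Hypothetical alternative to GNU parallel using xargs -P.
# Assumes one complete command per line of the input file.

# run_jobs FILE [JOBS]: run every line of FILE via sh -c, JOBS at a time
run_jobs() {
  local file=$1
  local jobs=${2:-$(getconf _NPROCESSORS_ONLN)}
  # Turn newlines into NUL bytes so each whole line (spaces, quotes and all)
  # is passed as exactly one argument to `sh -c`.
  tr '\n' '\0' < "$file" | xargs -0 -n 1 -P "$jobs" sh -c
}
```

For example, source the function and then run run_jobs lib/jgi_download.sh 4. GNU xargs exits with status 123 if any of the commands returned a non-zero status, which makes it easier to notice that some downloads need to be rerun.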

Alternatively, run the downloads serially:

$ bash lib/jgi_download.sh
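
Whichever way you run the downloads, it is worth scanning the results for transfers that failed. This is a hypothetical helper (check_downloads is not part of the repository). It assumes the files end up under source/JGI and that compressed files use a .gz extension, neither of which this README confirms. It reports empty files and .gz files that fail gzip's integrity test.

```shell
#!/usr/bin/env bash
# Hypothetical post-download check. Assumes downloads live under one
# directory and that compressed files end in .gz.

# check_downloads DIR: list empty files and .gz files failing gzip -t;
# returns 1 if anything suspicious was found, 0 otherwise
check_downloads() {
  local dir=$1 bad=0 f
  # Zero-byte files usually point to a failed or interrupted transfer
  while IFS= read -r f; do
    echo "EMPTY: $f"
    bad=1
  done < <(find "$dir" -type f -size 0)
  # gzip -t tests archive integrity without writing any output;
  # empty .gz files were already reported above, so skip them here
  while IFS= read -r f; do
    if ! gzip -t "$f" 2>/dev/null; then
      echo "CORRUPT: $f"
      bad=1
    fi
  done < <(find "$dir" -type f -name '*.gz' ! -size 0)
  return "$bad"
}
```

For example, run check_downloads source/JGI. A clean result does not prove that the right annotation version was picked; that still needs the comparison against lib/jgi_fungi.csv.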
