
Download modzs.gctx #3

Closed
stuppie opened this issue Aug 25, 2016 · 5 comments

Comments

stuppie commented Aug 25, 2016

I can't figure out where to get modzs.gctx, which is needed to construct the signature dataframe sig_expr_df in consensi.ipynb. From here, you say:

The z-score signature vectors are retrieved from the /xchip/cogs/data/build/a2y13q1/modzs.gctx file on the C3 cloud.

But that was two years ago, and the link no longer works. Also, I'm not sure exactly what this file is or how it was generated.

I appreciate your help in advance!

dhimmel (Owner) commented Aug 25, 2016

@stuppie, it looks like the lincscloud website is no longer functional. I'll upload this file to figshare.

dhimmel (Owner) commented Aug 25, 2016

The file is 42.5 GB, which exceeds my figshare quota. I sent figshare an email to see if they can make an exception. In the meantime, I'm running an aggressive compression:

xz --extreme -9 --threads=0 --verbose --keep modzs.gctx

I expect the size reduction to be small (~15%), however, since the file is already compressed, hence the x in the .gctx extension.
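
As a rough illustration of why a second compression pass tends to gain little, here is a minimal Python sketch. It is a toy under stated assumptions, not a model of modzs.gctx: the input is synthetic, `zlib` stands in for whatever compression the file already has, and the function name `recompression_saving` is made up for this example.

```python
import lzma
import random
import zlib


def recompression_saving(raw: bytes) -> tuple[float, float]:
    """Return the fractional size saving of a first and a second compression pass."""
    # First pass: stand-in for compression that is already baked into a file.
    once = zlib.compress(raw, 6)
    # Second pass: LZMA in the xz container, loosely analogous to `xz --extreme`.
    # Preset 6 (not 9) is used here only to keep memory use modest for the toy.
    twice = lzma.compress(once, preset=6 | lzma.PRESET_EXTREME)
    first = 1 - len(once) / len(raw)
    second = 1 - len(twice) / len(once)
    return first, second


# Synthetic input (an assumption): random symbols drawn from a 16-character
# alphabet, so roughly half of every byte is redundant.
rng = random.Random(0)
raw = bytes(rng.choices(b"0123456789abcdef", k=200_000))

first, second = recompression_saving(raw)
print(f"first pass saves {first:.1%}, second pass saves {second:.1%}")
```

The first pass removes most of the redundancy, so the second pass has very little left to exploit. How much a real file shrinks depends on how it was stored in the first place.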

stuppie (Author) commented Aug 25, 2016

Great. Thanks. Can you tell me what this file is exactly? Does it contain the CD (characteristic directions), z-scores, etc. for all perturbations across all 22k probes?

Is this the same data as in the GEO series GSE70138 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE70138)?

dhimmel (Owner) commented Aug 26, 2016

My understanding is that modzs.gctx is the LINCS L1000 data at the SIG stage in the following pipeline:

[Image: l1000_data_flow]

In other words, modzs.gctx is a legacy LINCS L1000 dataset of differential expression signatures. It contains a matrix indexed by signatures and probes, where each value is a differential expression z-score. This file belongs in the download directory of this repository but was not uploaded to GitHub because of its large size. See the "Differential Expression (Signature Generation)" section of this help page for more information on signatures.

modzs.gctx is a file that can be read into Python using cmap/l1ktools. The cmap directory of this repository is copied from cmap/l1ktools, with perhaps some small modifications (I forget / should use a submodule next time).
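
For anyone who only wants to peek at the file, here is a hedged sketch that does not use cmap/l1ktools at all. It relies on GCTX files being HDF5 containers and reads a slice directly with h5py, which helps given the 42.5 GB size. The node paths (`/0/DATA/0/matrix`, `/0/META/ROW/id`, `/0/META/COL/id`), the on-disk orientation of the matrix, and the helper name `read_gctx_slice` are assumptions for illustration, so verify them against your copy of the file before relying on the result.

```python
import h5py
import numpy as np

# Assumed GCTX node layout; check with h5py's visit() if these are missing.
MATRIX_NODE = "/0/DATA/0/matrix"
ROW_ID_NODE = "/0/META/ROW/id"  # probe identifiers
COL_ID_NODE = "/0/META/COL/id"  # signature identifiers


def _decode(ids):
    """Convert HDF5 byte strings to plain Python strings."""
    return [x.decode("utf-8") if isinstance(x, bytes) else str(x) for x in ids]


def read_gctx_slice(path, n_signatures=100):
    """Read the first n_signatures signatures without loading the whole file.

    Returns (probe_ids, signature_ids, matrix), where matrix is a NumPy array
    of shape (number of probes, number of signatures read).
    """
    with h5py.File(path, "r") as h5:
        dset = h5[MATRIX_NODE]
        # Assumption: the matrix is stored as signatures x probes on disk,
        # so slicing the first axis selects signatures.
        block = dset[:n_signatures, :]
        probe_ids = _decode(h5[ROW_ID_NODE][:])
        signature_ids = _decode(h5[COL_ID_NODE][:n_signatures])
    # Transpose so rows are probes and columns are signatures.
    return probe_ids, signature_ids, np.asarray(block).T
```

A call such as `read_gctx_slice("download/modzs.gctx", n_signatures=10)` would then return the z-scores for the first ten signatures, assuming the file sits in the download directory as described above.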

@stuppie does this make sense? If you are just looking for the consensus signatures we generated, you can download those on figshare.

The GEO SuperSeries seems to correspond to the Level 4 data, although I'm not sure what differences there are, if any. If you are starting fresh work, I assume the L1000 team would prefer that you use the official GEO datasets. But for reproducibility and extensibility of this repository, I'll work on making modzs.gctx available.

dhimmel (Owner) commented Aug 29, 2016

I posted modzs.gctx to figshare. Thanks figshare for temporarily raising their file size limit and allowing this upload!

@stuppie, let me know if you need anything else. In general, I advise working with the output datasets from this repository or the raw production LINCS L1000 data from GEO, since I'm not sure if the LINCS L1000 team is still providing support for using modzs.gctx.
