Processing "multi"-tag reads #22

Open
StevenWingett opened this issue Jun 8, 2020 · 1 comment
Labels
enhancement New feature or request

Comments

@StevenWingett
Owner

The idealised view of a di-tag is not always correct: a paired-end read may comprise components from not just two, but multiple, regions of a genome. We took steps to address this in a HelpDesk job, which involved writing a “pre-HiCUP” script that cuts FASTQ reads at DpnII Hi-C junctions. The script then takes the resulting segments and writes all the segment-segment permutations to 2 new FASTQ files. (See http://www.bioinformatics.babraham.ac.uk/cgi-bin/helpdeskuser.cgi?action=show_job&public_id=divefeet)
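To make the cut-and-pair idea concrete, here is a minimal Python sketch; it is not the actual pre-HiCUP script. It assumes the DpnII junction after fill-in and re-ligation reads GATCGATC, that each segment keeps its own GATC half-site, and that unordered pairs of segments are what is wanted (swap in `itertools.permutations` for ordered pairs). The function names and the `_pairN` header suffix are hypothetical.

```python
from itertools import combinations

# Assumption: DpnII cuts at ^GATC, so a filled-in, re-ligated junction reads GATCGATC.
JUNCTION = "GATCGATC"
HALF = len(JUNCTION) // 2


def split_at_junctions(seq, qual):
    """Cut a read at every junction; return a list of (sequence, quality) segments."""
    segments = []
    start = 0
    while True:
        hit = seq.find(JUNCTION, start)
        if hit == -1:
            break
        cut = hit + HALF  # cut in the middle so each side keeps one GATC
        segments.append((seq[start:cut], qual[start:cut]))
        start = cut
    segments.append((seq[start:], qual[start:]))
    return segments


def virtual_pairs(read_id, segments):
    """Build one (read 1, read 2) FASTQ record pair per pair of segments."""
    pairs = []
    for n, (seg_a, seg_b) in enumerate(combinations(segments, 2), start=1):
        header = f"@{read_id}_pair{n}"  # hypothetical naming scheme
        record_a = f"{header}\n{seg_a[0]}\n+\n{seg_a[1]}\n"
        record_b = f"{header}\n{seg_b[0]}\n+\n{seg_b[1]}\n"
        pairs.append((record_a, record_b))
    return pairs
```

A read containing two junctions yields three segments and therefore three virtual pairs; the first record of each pair would go to one output FASTQ file and the second record to the other.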

We shall now expand on this, as described in the attached image from Mikhail.

[Image: Whiteboard 1 -01]

@StevenWingett StevenWingett added the enhancement New feature or request label Jun 8, 2020
@StevenWingett
Owner Author

The pipeline has three additional scripts:

hicup_combiner – this splits reads at the truncation sequence and creates new virtual read pairs

hicup_allocater – this allocates reads to “their” restriction fragment and adds this information to the read header.

hicup_prefilter – this removes identical fragment-fragment interactions from the same “di-tag group” (generated from a conventional read-pair). It also removes novel intra-fragment interactions generated by hicup_combiner (but keeps those that may have been generated as part of the conventional HiCUP pipeline).
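The allocation and pre-filtering rules can be sketched roughly as follows. This is an illustration in Python, not the code of hicup_allocater or hicup_prefilter. It assumes a mapped position is available for each read, that fragments are described by a sorted list of start coordinates per chromosome, and that when a combiner-generated pair duplicates a conventional one, the conventional pair is the one kept. `VirtualPair` and both function names are hypothetical.

```python
from bisect import bisect_right
from typing import NamedTuple


class VirtualPair(NamedTuple):
    group: str          # id of the original read pair (the "di-tag group")
    frag_a: int         # restriction fragment index of the first read
    frag_b: int         # restriction fragment index of the second read
    conventional: bool  # True if standard HiCUP would also have produced this pair


def allocate_fragment(fragment_starts, position):
    """Index of the fragment containing position (fragment_starts must be sorted)."""
    return bisect_right(fragment_starts, position) - 1


def prefilter(pairs):
    """Drop duplicate interactions within a group and novel intra-fragment pairs."""
    seen = set()
    kept = []
    # Visit conventional pairs first so they win over combiner-generated duplicates.
    for pair in sorted(pairs, key=lambda p: not p.conventional):
        if pair.frag_a == pair.frag_b and not pair.conventional:
            continue  # novel intra-fragment interaction created by the combiner
        key = (pair.group, frozenset((pair.frag_a, pair.frag_b)))
        if key in seen:
            continue  # same fragment-fragment interaction already kept for this group
        seen.add(key)
        kept.append(pair)
    return kept
```

The interaction key is an unordered set of the two fragment indices, so A-B and B-A within the same di-tag group count as identical, while the same interaction in a different group is retained.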

Incorporating the scripts into the pipeline is quite fiddly (it can be just as difficult as writing the scripts). For this reason, when you run the pipeline, please put the input files in the current working directory. Also, don’t specify the --outdir option, and please perform separate runs in separate folders.

I’ve quickly generated an Excel template for producing graphs to visualise the results. Simply copy and paste the results from the final report file, “hicup_combinations_pipeline_summary_report.txt”, into the grey shaded box and the graphs will be generated (like HiCUP in the old days). All the results are in hicup_combinations_pipeline_summary_report.txt anyway. The final output file is *.hicup.bam, as before.

hicup_combinations_results_template.xlsx
