Processing "multi"-tag reads #22

StevenWingett · 2020-06-08T16:47:50Z

The idealised view of a di-tag is not always correct: a paired-end read may comprise sequence from not just two but several regions of a genome. We have taken steps to address this in a HelpDesk job, which involved writing a “pre-HiCUP” script that cuts FASTQ reads at DpnII Hi-C junctions. The script then takes the resulting segments and writes all the segment-segment permutations to 2 new FASTQ files. (See http://www.bioinformatics.babraham.ac.uk/cgi-bin/helpdeskuser.cgi?action=show_job&public_id=divefeet)
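To make the idea concrete, here is a minimal sketch of the split-and-pair step. It is not the HelpDesk script itself. It assumes the DpnII Hi-C junction is GATCGATC, that the cut falls in the middle of the junction so each segment keeps one GATC, and that unordered pairs are enough (swap in itertools.permutations if read order matters). The function names are hypothetical, and quality strings and read orientation are ignored.

```python
from itertools import combinations

# DpnII cuts at ^GATC; fill-in and re-ligation gives GATCGATC at a Hi-C junction.
JUNCTION = "GATCGATC"
# Assumption: cut in the middle of the junction so both sides keep one GATC.
CUT_OFFSET = 4


def split_at_junctions(seq, junction=JUNCTION, offset=CUT_OFFSET):
    """Split a read sequence into segments at every Hi-C junction."""
    segments = []
    start = 0
    pos = seq.find(junction, start)
    while pos != -1:
        cut = pos + offset
        segments.append(seq[start:cut])
        start = cut
        pos = seq.find(junction, start)
    segments.append(seq[start:])
    # Drop empty segments (e.g. a junction at the very end of the read).
    return [s for s in segments if s]


def virtual_pairs(read1_seq, read2_seq):
    """Return every unordered segment-segment pair from one read pair."""
    segments = split_at_junctions(read1_seq) + split_at_junctions(read2_seq)
    return list(combinations(segments, 2))


if __name__ == "__main__":
    for first, second in virtual_pairs("AAAAGATCGATCTTTT", "CCCC"):
        print(first, second)
```

In a real run, the first member of each pair would go to one FASTQ file and the second to the other, with a shared read ID so that the pairs can be traced back to the original read pair.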

We shall now expand on this, as described in the attached image from Mikhail.

StevenWingett · 2020-07-29T16:01:31Z

The pipeline has three additional scripts:

hicup_combiner – this splits reads at the truncation sequence and creates new virtual read pairs.

hicup_allocater – this allocates reads to “their” restriction fragment and adds this information to the read header.

hicup_prefilter – this removes identical fragment-fragment interactions from the same “di-tag group” (generated from a conventional read-pair). It also removes novel intra-fragment interactions generated by hicup_combiner (but keeps those that may have been generated as part of the conventional HiCUP pipeline).
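The filtering rules described for hicup_prefilter can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: the VirtualPair record, its field names, and the fragment labels are hypothetical stand-ins for whatever hicup_allocater writes into the read header.

```python
from typing import NamedTuple


class VirtualPair(NamedTuple):
    group: str    # ID of the original read pair (the "di-tag group")
    frag_a: str   # restriction fragment of the first segment (hypothetical label)
    frag_b: str   # restriction fragment of the second segment
    novel: bool   # True if created by splitting, False for the conventional pair


def prefilter(pairs):
    """Apply the two removal rules and return the surviving pairs in order."""
    seen = set()
    kept = []
    for pair in pairs:
        # Rule 1: drop intra-fragment interactions that only exist because of splitting.
        if pair.novel and pair.frag_a == pair.frag_b:
            continue
        # Rule 2: keep one copy of each fragment-fragment interaction per di-tag group.
        key = (pair.group, tuple(sorted((pair.frag_a, pair.frag_b))))
        if key in seen:
            continue
        seen.add(key)
        kept.append(pair)
    return kept


if __name__ == "__main__":
    example = [
        VirtualPair("read1", "fragA", "fragB", False),
        VirtualPair("read1", "fragB", "fragA", True),   # duplicate within the group
        VirtualPair("read1", "fragC", "fragC", True),   # novel intra-fragment pair
        VirtualPair("read1", "fragD", "fragD", False),  # conventional, so kept
        VirtualPair("read2", "fragA", "fragB", True),   # different group, so kept
    ]
    for pair in prefilter(example):
        print(pair)
```

The sketch treats A-B and B-A as the same interaction by sorting the fragment labels; whether the real script does this is an assumption.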

Incorporating the scripts into the pipeline is quite fiddly (it can be just as difficult as writing the scripts themselves). For this reason, when you run the pipeline, please put the input files in the current working directory. Also, don’t specify the --outdir option, and please carry out separate runs in separate folders.

I’ve quickly generated an Excel template for producing graphs to visualise the results. Simply copy and paste the contents of the final report file, “hicup_combinations_pipeline_summary_report.txt”, into the grey shaded box and the graphs will be generated (like HiCUP in the old days). All the results are in hicup_combinations_pipeline_summary_report.txt anyway. The final output file is *.hicup.bam, as before.

hicup_combinations_results_template.xlsx

StevenWingett added the enhancement (New feature or request) label on Jun 8, 2020
