Some missing details #1

sdwfrost · 2017-02-16T11:20:38Z

Hi @evogytis @rambaut @plemey @trvrb @msuchard

A few things in the repository that I couldn't find in the bioRxiv paper:

How was the ML tree reconstructed?
Which putative ADAR edited sites/sequences were masked? Are the data in Data/ the masked or unmasked data?
How were missing dates imputed?
Any update on the missing accession numbers?

Don't mean to be a pain, but I'd much rather use a common resource rather than try to reproduce with subtly different results.


rambaut · 2017-02-16T15:14:09Z

Hi Simon,

Now that the paper is out of the way, we are working on updating the GitHub READMEs with as much detail as possible. Gytis, when you have a minute could you put some details about the points below? I will describe the date imputing.

On another note, the full output of BEAST (2 independent runs with trees etc.) has a DOI: http://dx.doi.org/10.7488/ds/1711

Andrew


evogytis · 2017-02-16T20:01:15Z

Hey @sdwfrost,

Responses in order:

I didn't generate the ML tree, but I imagine PhyML + HKY+G was used. @rambaut can confirm if true. We don't mention the ML tree anywhere in the text as far as I know.
All the data we share is masked. Masking is easy to identify because we use ?s instead of Ns. Any ? used to be a C. I have a Jupyter notebook that takes in unmasked alignments and highlights problematic sequences/areas.
As far as I can tell we're missing 431 accessions. Didn't realise it was this bad. Most of the sequences missing accessions are EM, DML, IP, WHO or USAMRIID. We might have USAMRIID accessions but haven't updated the sequence names. Not sure where we're at with the other accessions.
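For anyone wanting to check the masking convention described above (`?` for masked sites, `N` for ordinary ambiguity, every `?` originally a `C`), here is a minimal sketch in plain Python. It is not the notebook mentioned in this thread; the function names (`read_fasta`, `masking_summary`, `unmask`) and the toy sequences are made up for illustration, and it assumes a standard FASTA alignment.

```python
import io
from typing import Dict, Iterator, Tuple


def read_fasta(handle) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def masking_summary(seq: str) -> Dict[str, int]:
    """Count masked sites ('?') separately from ordinary ambiguity ('N')."""
    upper = seq.upper()
    return {"masked": upper.count("?"), "ambiguous_N": upper.count("N")}


def unmask(seq: str) -> str:
    """Restore masked sites, relying on the statement that every '?' was a C."""
    return seq.replace("?", "C")


# Toy alignment (hypothetical data, not taken from Data/).
example = io.StringIO(">seqA\nACGT?T?NN\n>seqB\nACGTCTCAA\n")
records = dict(read_fasta(example))

summary = {name: masking_summary(seq) for name, seq in records.items()}
restored = {name: unmask(seq) for name, seq in records.items()}
```

Sequences with a non-zero `masked` count are the ones that were edited; `unmask` only makes sense if the "any ? used to be a C" rule holds for the whole file.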

sdwfrost · 2017-02-17T09:48:09Z

Thanks @evogytis @rambaut

Was the data partitioned for the analysis, as in the BEAST runs? If not, I can run one myself. Just putting in the PhyML log files would be sufficient to see what was done.
Thanks for discriminating masking versus Ns! That'll be an easy fix. If you could also share the notebook, that would be really handy.
Those are some hefty BEAST runs linked from the DOI... I'm curious as to how many iterations of the models you went through.

evogytis · 2017-02-17T19:36:51Z

Hey @sdwfrost

Depends if @rambaut used the Geneious or the command line version. I imagine the former is the case.
This is the notebook that I've been using: EBOV_scrutiny.ipynb.zip. This is the consensus sequence that I've used: EBOV_consensus.fasta.zip. The script identifies each gene based on how the consensus aligns to the dataset, highlights ADAR sites and can output an alignment in CDS+ig format. Apologies for lack of comments too, didn't think I'd have other eyes on it.

rambaut · 2017-02-17T19:39:18Z

I have a command-line script that bats back and forth between PhyML (to create an initial tree using NJ), RAxML to search topologies, and back to PhyML to improve branch lengths. Am re-running on the 1610 data here and will upload all in a couple of days.

rambaut · 2017-02-17T19:42:11Z

Missing accession numbers are from the Quick et al. MinION sequencing. This is because although the raw data were on ENA, the consensuses were simply on Nick's GitHub. These have recently been deposited in GenBank, so we will endeavour to match accession to sequence in the tables. Creating Issue...

BEAST-Community · 2018-06-04T03:34:31Z

Very nice work, @rambaut @evogytis @msuchard @plemey!
@rambaut, could you tell me how to go back and forth between PhyML and RAxML to get a better tree? Thank you.

evogytis · 2018-06-04T16:42:43Z

@BEAST-Community the script is now in the repo with 67a36db.
