scientific notation for large numbers of reads #371

map2085 · 2016-02-24T19:08:08Z

command: bedtools genomecov -d -g genome.size -ibam in.bam > out.cov

genome.size consists of a single RNA, "myRNA", 8 kb long.

in.bam contains > 15 million reads aligned to "myRNA".

Some positions on the RNA have coverage > 1 million, but in the coverage file out.cov, BEDTOOLS reports the coverage at those positions in scientific notation.

ex:

...
myRNA 1238 1.06464e+06
myRNA 1239 1.06471e+06
myRNA 1240 1.06482e+06
myRNA 1241 1.06494e+06
...

Can BEDTOOLS please be corrected to output the full, exact integer, NOT in exponent notation?

Thank you


arq5x · 2016-02-29T17:12:20Z

Yep, I will look into it.

JYLeeBioinfo · 2022-06-16T00:48:39Z

Any update on this issue?

I also think writing large numbers as exact integers is needed, at least as an option.

camelest · 2022-10-21T01:08:49Z

@map2085 @hd00ljy Hi, I'm having the same problem. Did you find any workaround?

JYLeeBioinfo · 2022-10-21T05:38:54Z

@camelest I tried bamCoverage in deepTools.
It generates binned bigwigs, which were sufficient for my purpose.
But if you need single-bp resolution, deepTools would not be an option.

camelest · 2022-10-21T08:33:31Z

@hd00ljy Thank you so much for the information. I will give it a try!

brentp · 2022-11-28T21:16:59Z

@arq5x, what precision do we want by default here? 4, so like 123.4343?

camelest · 2022-11-28T23:21:30Z

@brentp @arq5x I think we want something like 1,064,643 (instead of 1.06464e+06).
(I also had a similar problem, mentioned here.)

By the way, in a different situation, when I run
bedtools merge -c 5 -o sum -i input.bam
I don't see any scientific notation, even when the sum exceeds 1 million. I'm wondering where the difference comes from.

brentp · 2022-11-29T00:39:18Z

oh. right. I was thinking this was a float. I will make a PR. Thanks for following up.

camelest · 2022-11-29T00:58:05Z

Thank you so much for your response!


brentp · 2022-11-29T22:50:21Z

bedtools.static.gz

Hi, care to try out this binary to see if it resolves your issue? Just gunzip, chmod +x and then use as ./bedtools.static genomecov ...

camelest · 2022-11-30T16:49:25Z

@brentp Thank you so much for your prompt response. I checked it but somehow still found "e"... Is it because I'm using the -5 option?

./bedtools.static genomecov -ibam input.bam -5 -bg -strand + > output.bedGraph
grep "e" output.bedGraph
chr11 65499044 65499045 1.0946e+06
chr19 48965308 48965309 1.24953e+07

brentp · 2022-11-30T17:25:25Z

Thanks for following up. Indeed, I see why the fix doesn't work. (The ternary must have a single fixed type, and it still chooses float.)
I'll come up with a test case and see how to resolve this.

brentp · 2022-11-30T21:28:44Z

So we can adjust this by changing the fmtflags. Here is test code using the number above from @camelest;
the output is shown after each // comment.

#include <cstdint>
#include <iostream>

int main() {
    double f = 12495300.0;
    std::cout << f << std::endl; // 1.24953e+07

    std::cout.setf(std::ios_base::fixed);
    std::cout << f << std::endl; // 12495300.000000

    std::cout.precision(0);
    std::cout << f << std::endl; // 12495300
}

So we can safely set std::cout.setf(std::ios_base::fixed);, which avoids scientific notation.
When scale == 1.0, I don't think we can set precision(0) universally, but I can add it (and restore the default) in the functions that should write integers.

@arq5x you ok with this approach?

I would use std::cout.setf(std::ios_base::fixed); in genomeCoverageMain.cpp
and then apply precision(0) and restore the default in the appropriate functions.

Or, I could apply fixed and restore default in each function as well.

brentp · 2022-11-30T21:31:18Z

There are sections like this in ReportGenomeCoverage:

    // loop through the depths for the entire genome
    // and report the number and fraction of bases in
    // the entire genome that are at said depth.
    for (histMap::iterator genomeDepthIt = genomeHist.begin(); genomeDepthIt != genomeHist.end(); ++genomeDepthIt) {
        int depth = genomeDepthIt->first;
        CHRPOS numBasesAtDepth = genomeDepthIt->second;

        cout << "genome" << "\t" << depth << "\t" << numBasesAtDepth << "\t"
             << genomeSize << "\t" << (float) ((float)numBasesAtDepth / (float)genomeSize) << endl;
    }

where we want the floating-point (or scientific) format retained. With what I proposed above, we'd always get a fixed float (apparently with 6 decimal places), but depending on how we do it, we could get scientific format in some cases.

arq5x · 2022-11-30T22:38:52Z

I wonder if moving to printf, or to separate cout calls, would be better, so that we can have fine-grained control over the precision for integers (such as depth) and floats (such as the fraction of bases at depth).

brentp · 2022-11-30T22:49:30Z

Integers are printed as expected. The problem is the scale parameter, which turns ints into floats even when it is 1.0.

This program:

#include <cstdint>
#include <iostream>

int main() {
    double f = 12495300.0;
    int64_t i = 880000000000;
    std::cout << f << " i:" << i << std::endl; // 1.24953e+07 i:880000000000

    std::cout.setf(std::ios_base::fixed);
    std::cout << f << " i:" << i << std::endl; // 12495300.000000 i:880000000000

    std::cout.precision(0);
    std::cout << f << " i:" << i << std::endl; // 12495300 i:880000000000
}

produces this output:

1.24953e+07 i:880000000000
12495300.000000 i:880000000000
12495300 i:880000000000

camelest · 2022-12-03T01:34:33Z

@brentp @arq5x Thank you so much for following up. It would be really nice if we could choose to output integers, as with "precision(0)".

brentp · 2022-12-05T16:47:48Z

I will make a PR that sets precision(0) where appropriate and when scale == 1. It will reset precision to the default after each function call.


brentp · 2022-12-05T18:44:12Z

@camelest, I added a test this time for the latest fix, but if you could verify that this works on your data, it would be much appreciated.

bedtools.static.gz

camelest · 2022-12-06T11:40:21Z

@brentp I tested it and it worked perfectly! Thank you so much for your quick solution.

./bedtools.static genomecov -ibam input.bam -5 -bg -strand + > output.bedGraph
grep "65499044" output.bedGraph
chr11 65499044 65499045 1094599

camelest mentioned this issue Oct 20, 2022

How to avoid scientific notation #1015

Closed

brentp added a commit to brentp/bedtools2 that referenced this issue Nov 29, 2022

use integer when scale == 1.0

e470be7

see arq5x#371 arq5x#1015

brentp mentioned this issue Nov 29, 2022

genomecov: use integer when scale == 1.0 #1030

Merged

brentp added a commit to brentp/bedtools2 that referenced this issue Dec 5, 2022

fix coverage formatting when scale == 1

2f7f4b4

closes arq5x#371

brentp mentioned this issue Dec 5, 2022

fix coverage formatting when scale == 1 #1031

Merged

brentp added a commit to brentp/bedtools2 that referenced this issue Dec 5, 2022

fix coverage formatting when scale == 1

f66261f

closes arq5x#371

arq5x closed this as completed in #1031 Dec 6, 2022
