Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

incorrect output #9

Open
drmjc opened this issue Jul 31, 2013 · 11 comments
Open

incorrect output #9

drmjc opened this issue Jul 31, 2013 · 11 comments

Comments

@drmjc
Copy link

drmjc commented Jul 31, 2013

Hi Aaron,
Whilst on the way to getting up to speed on gemini, I've had a play with grabix. Unfortunately it's not playing nicely on our CentOS systems.
Following the examples in the README.md, 'grab' always returns the first row in BED file, N times & 'random' gives me a seg fault:

$ bgzip simrep.chr1.bed
$ zcat simrep.chr1.bed.gz | head -n 10
chr1 10000 10468 trf 789
chr1 10627 10800 trf 346
chr1 10757 10997 trf 434
chr1 11225 11447 trf 273
chr1 11271 11448 trf 187
chr1 11283 11448 trf 199
chr1 19305 19443 trf 242
chr1 20828 20863 trf 70
chr1 30862 30959 trf 79
chr1 44835 44876 trf 73

$ ./grabix index simrep.chr1.bed.gz
-- worked fine

-- this reports the 1st line, not the 100th
$ ./grabix grab simrep.chr1.bed.gz 100
chr1 10000 10468 trf 789

-- this reports the first line 10x
$ ./grabix grab simrep.chr1.bed.gz 100 110
chr1 10000 10468 trf 789
chr1 10000 10468 trf 789
chr1 10000 10468 trf 789
chr1 10000 10468 trf 789
chr1 10000 10468 trf 789
chr1 10000 10468 trf 789
chr1 10000 10468 trf 789
chr1 10000 10468 trf 789
chr1 10000 10468 trf 789
chr1 10000 10468 trf 789
chr1 10000 10468 trf 789

$ ./grabix random simrep.chr1.bed.gz 10
Segmentation fault

some debugging info

$ uname -a
Linux gamma01.local 2.6.32-220.13.1.el6.x86_64 #1 SMP Tue Apr 17 23:56:34 BST 2012 x86_64 x86_64 x86_64 GNU/Linux
$ ldd grabix
linux-vdso.so.1 => (0x00007fff57000000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00000032cac00000)
libz.so.1 => /lib64/libz.so.1 (0x00000032c8c00000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00000032ca000000)
libc.so.6 => /lib64/libc.so.6 (0x00000032c7c00000)
libm.so.6 => /lib64/libm.so.6 (0x00000032c8000000)
/lib64/ld-linux-x86-64.so.2 (0x00000032c7800000)
$ cat /etc/centos-release
CentOS release 6.2 (Final)

My C's a little rusty, so hopefully you'll know what's going on far faster than me
cheers,
Mark

@gr8ape
Copy link

gr8ape commented Sep 3, 2013

I am having precisely the same problem as Mark

$> uname -a
Linuxn0640.edu 2.6.32-358.6.2.el6.x86_64 #1 SMP Tue May 14 15:48:21 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux
$> ldd grabix
linux-vdso.so.1 => (0x00007fffd9bff000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x000000384ae00000)
libz.so.1 => /lib64/libz.so.1 (0x00000038fc600000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x000000384a600000)
libc.so.6 => /lib64/libc.so.6 (0x0000003844200000)
libm.so.6 => /lib64/libm.so.6 (0x0000003844600000)
/lib64/ld-linux-x86-64.so.2 (0x0000003843a00000)

@gr8ape
Copy link

gr8ape commented Sep 4, 2013

Mark,

A colleague of mine, Daniel Kinnamon, was gracious enough to walk through the gdb debgugger with me, and, after some iniitial frustration, we think we found the problem. A simple adjustment fixes the incorrect output issue under CentOSx64. Seems to be an issue with what C/C++ standard is used during compilation that impacts the grabix index building process. But instead of trying to figure out what compilation options would get it to work, we just adjusted the source to circumvent the issue. (presumably, there is a compiler option that would alter how default declarations work.)

The line immediately below in grabix.cpp intends to set both prev_offset and offset variables to 0. The first variable is set via defaulting to zero, and the second by a direct declaration. But this only happens as intended on some (apparently most) platforms, depending on compiler settings. When we stepped through, prev_offset just contained trash on our system. So changing the line below so that both variables are independently set to zero fixes the issue.

int64_t prev_offset, offset = 0;

while ((status = bgzf_getline(bgzf_fp, '\n', line)) >= 0)
{
    offset = bgzf_tell (bgzf_fp);
    if (line->s[0] != '#')
        break;
    prev_offset = offset;
}
index_file << prev_offset << endl;

Replacing that first line above with the following does the trick. Although note that I wasn't certain if offset was also intended to be of type int64_t as well. The adjustment below seems to work, in any case.

int64_t prev_offset = 0;
int64_t offset = 0;

Hopefully this will help others who experienced similar frustrations trying to get gemini up and running...

Cheers,
Dale

@arq5x
Copy link
Owner

arq5x commented Sep 5, 2013

Thanks all, I will try to incorporate fixes for this and another outstanding bug soon. I have been horribly bogged down with other deadlines. Thanks for the patience.

@drmjc drmjc closed this as completed Sep 5, 2013
@drmjc drmjc reopened this Sep 5, 2013
@drmjc
Copy link
Author

drmjc commented Sep 5, 2013

Thanks Dale & Aaron, great stuff.
that fix sounds like it would fix the random subset mode, but perhaps not the 'always selecting the same first rown N times bug'?

M

@gr8ape
Copy link

gr8ape commented Sep 5, 2013

Mark,
it actually fixes the "always selecting the same row" issue. it's an odd side-effect of the index being mucked up. I hadn't checked the random pull function b/c I wasn't making use of it. just checked, and this does not fix it. However, I suspect it's a similar declaration issue in a different part of the code. will try to take a look tomorrow

@drmjc
Copy link
Author

drmjc commented Sep 5, 2013

ah ok, sorry I should have looked at the context of your fix!

@arq5x
Copy link
Owner

arq5x commented Sep 6, 2013

I just pushed a change that should address these issues.

@gr8ape
Copy link

gr8ape commented Sep 6, 2013

Thanks Aaron

@drmjc
Copy link
Author

drmjc commented Sep 8, 2013

Hi Aaron,
thanks for pushing those changes, much appreciated! They've fixed the 'grab' command, but i'm still segfaulting on the random command:

$ grabix random simrep.chr1.bed.gz 10
Segmentation fault: 11

@gr8ape
Copy link

gr8ape commented Sep 8, 2013

hi mark, aaron,

I took a quick look through, but I have little experience debugging c code.
I tried the previous initialization trick, but no luck this time.

The segfault in random (run on the included sample file and requesting 10 lines) occurs immediately following the last line below within the bgzf.c library, although I'm assuming the actual problem is somewhere upstream and it already has bad memory locations to work with at this point.

beyond my know-how to fix. could be another default initialization issue somewhere else in the code,
but I haven't found it. otherwise not sure why this would be specific to Mark and I's systems.

In any event, hope this helps narrow down the problem.

Best,
Dale

int bgzf_getline(BGZF *fp, int delim, kstring_t *str)
{
    int l;
    l = 0;
    int state;
    state = 0;
    unsigned char *buf = (unsigned char*)fp->uncompressed_block;
    str->l = 0;
    do {
        if (fp->block_offset >= fp->block_length) {
            if (bgzf_read_block(fp) != 0) { state = -2; break; }
            if (fp->block_length == 0) { state = -1; break; }
        }
        for (l = fp->block_offset; l < fp->block_length && buf[l] != delim; ++l);
        if (l < fp->block_length) state = 1;
        l -= fp->block_offset;
        if (str->l + l + 1 >= str->m) {
            str->m = str->l + l + 2;
            kroundup32(str->m);
        str->s = (char*)realloc(str->s, str->m);

}
        memcpy(str->s + str->l, buf + fp->block_offset, l);

@janxkoci
Copy link

janxkoci commented Apr 1, 2022

FYI I installed grabix with conda install -c conda-forge -c bioconda grabix on CentOS7 and I cannot reproduce the issue. It's either been fixed or the bioconda folks use a good compiler. Either way, it can solve your problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants