
LMDB source doesn't work under valgrind #2404

Closed
flx42 opened this issue May 2, 2015 · 11 comments · Fixed by #3731

Comments

@flx42
Contributor

flx42 commented May 2, 2015

Originally reported by @thatguymike

Simple repro, after downloading and creating the MNIST model (see examples/mnist/readme.md):

$ valgrind ./build/tools/caffe train --solver=examples/mnist/lenet_solver.prototxt 
[...]
F0501 18:17:36.543970 20545 db.hpp:109] Check failed: mdb_status == 0 (22 vs. 0) Invalid argument
[...]

Note that valgrind reports no errors before this (excluding the proverbial CUDA driver errors).
The error was generated by the following line in db.cpp:
https://github.com/BVLC/caffe/blob/master/src/caffe/util/db.cpp#L33

Using strace, I found that lmdb tries to mmap(2) the existing database file (examples/mnist/mnist_train_lmdb/data.mdb) with a size of 1 TB (the value of LMDB_MAP_SIZE):

mmap(NULL, 1099511627776, PROT_READ, MAP_SHARED, 31, 0) = 0x7e2832583000

This might be the issue. I don't fully understand how valgrind works, but for heap allocations it has to shadow ("duplicate") the memory in order to track uninitialized values with bit-level accuracy.
This seems to also be true for mmapped memory: on my 16 GB system, the largest mmap that succeeded under valgrind was around ~7.2 GB (so ~14.4 GB including the shadow memory).

I don't think this is an LMDB bug, but rather a limitation of valgrind, so I only see two ways of tackling this:

  1. Mention this limitation in the documentation, to prevent other people from staring blankly at this puzzling issue.
  2. Try to find a workaround: do we really need to set the map size to 1 TB? Maybe in some cases we can avoid that; for instance, if we know the DB is opened read-only, we could set LMDB_MAP_SIZE to the size of the database file (sketched below). If the database can be modified by another writer simultaneously, I suppose this workaround is not possible.
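
A rough sketch of option 2, purely hypothetical (only the 1 TB default and the data.mdb file name come from Caffe's db.cpp; the helper names are made up for illustration):

    #include <lmdb.h>
    #include <sys/stat.h>
    #include <unistd.h>
    #include <string>

    // Size the map from the existing data.mdb instead of a fixed 1 TB.
    // This only makes sense when the environment is opened read-only.
    size_t ReadOnlyMapSize(const std::string& source) {
      struct stat st;
      if (stat((source + "/data.mdb").c_str(), &st) == 0 && st.st_size > 0) {
        size_t page = static_cast<size_t>(sysconf(_SC_PAGESIZE));
        // mdb_env_set_mapsize() expects a multiple of the OS page size.
        return ((static_cast<size_t>(st.st_size) + page - 1) / page) * page;
      }
      return 1099511627776;  // fall back to the 1 TB default for new DBs
    }

    void OpenReadOnly(MDB_env** env, const std::string& source) {
      mdb_env_create(env);
      mdb_env_set_mapsize(*env, ReadOnlyMapSize(source));
      // MDB_RDONLY: no writes, so the map never needs to grow past the file.
      mdb_env_open(*env, source.c_str(), MDB_RDONLY | MDB_NOTLS, 0664);
    }

Under valgrind, the shadow memory would then only have to cover the actual database size instead of 1 TB.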

Thoughts? Other ideas?

@hyc

hyc commented May 2, 2015

valgrind has a hardcoded limit on how much memory it will handle. The limit is arbitrary and can be changed by recompiling valgrind. I think 1TB may be a bit much for it, but we have certainly used valgrind with LMDB before.

http://stackoverflow.com/questions/8644234/why-is-valgrind-limited-to-32-gb-on-64-bit-architectures

@flx42
Contributor Author

flx42 commented May 2, 2015

valgrind has a hardcoded limit on how much memory it will handle. The limit is arbitrary and can be changed by recompiling valgrind.

Sure, but if valgrind still needs to shadow the memory, it won't solve this issue since I don't have 2TB of RAM.

I think 1TB may be a bit much for it, but we have certainly used valgrind with LMDB before.

I don't dispute that; as I said, it works if I modify the map size to 7 GB (~half my RAM). I'm trying to find a solution that accommodates all reasonable database sizes while still allowing valgrind to be used.
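
For reference, the local change I'm talking about is just the map-size constant in src/caffe/util/db.cpp; the exact value is whatever fits in roughly half your RAM (7 GB happens to work on my 16 GB machine), so this is a workaround, not a proposed fix:

    // src/caffe/util/db.cpp -- local workaround only
    const size_t LMDB_MAP_SIZE = 7ULL * 1024 * 1024 * 1024;  // ~7 GB instead of 1099511627776 (1 TB)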

rohrbach added the JL label May 4, 2015
longjon added the upstream issue label and removed the JL label May 8, 2015
@acgtyrant

@flx42 So what is the solution now? Thank you!

@flx42
Contributor Author

flx42 commented Nov 19, 2015

I don't think anything has changed on this side.

@awan-10

awan-10 commented Jan 16, 2016

Is there a fix for this problem? I am seeing the same issue with valgrind.

@mrgloom

mrgloom commented May 18, 2016

Same issue when I try to create a new lmdb:

    string db_type = "lmdb";
    pDatabase = db::GetDB(db_type);   // caffe::db::DB* for the "lmdb" backend
    string path = dbPath.string();    // dbPath is presumably a boost::filesystem::path
    cout << path << endl;
    pDatabase->Open(path, db::NEW);   // aborts here with "Invalid argument"

F0518 15:43:50.384389 21482 db_lmdb.hpp:15] Check failed: mdb_status == 0 (22 vs. 0) Invalid argument

But I'm not using Valgrind.

@cbare

cbare commented Jun 15, 2016

I see the same error as "mrgloom" above while trying to run convert_mnist_data on the dockerized caffe (hosted on OS X, if that matters):

$ docker run -ti --rm --volume=$(pwd):/workspace caffe:cpu bash mnist/create_mnist.sh
Creating lmdb...
libdc1394 error: Failed to initialize libdc1394
F0615 02:55:42.737716     7 db_lmdb.hpp:15] Check failed: mdb_status == 0 (22 vs. 0) Invalid argument
*** Check failure stack trace: ***
    @     0x7f8718fcbdaa  (unknown)
    @     0x7f8718fcbce4  (unknown)
    @     0x7f8718fcb6e6  (unknown)
    @     0x7f8718fce687  (unknown)
    @     0x7f8719342361  caffe::db::LMDB::Open()
    @           0x402b8f  convert_dataset()
    @           0x40261d  main
    @     0x7f87181dbf45  (unknown)
    @           0x402666  (unknown)
    @              (nil)  (unknown)
mnist/create_mnist.sh: line 17:     7 Aborted                 $BUILD/convert_mnist_data.bin $DATA/train-images-idx3-ubyte $DATA/train-labels-idx1-ubyte $EXAMPLE/mnist_train_${BACKEND} --backend=${BACKEND}

I end up with an 8k file called "lock.mdb" in each of the train and test folders.

@cbare

cbare commented Jun 15, 2016

...and a little more searching indicates my particular issue may be a lack of support for memory-mapped files when mounting host folders in docker / boot2docker / VirtualBox. Maybe similar to docker-library/mongo#30, and maybe only superficially similar to the other issues here.

Also, convert_mnist_data works when writing inside the container's own filesystem rather than to a shared folder.

@mrgloom

mrgloom commented Jun 15, 2016

I found that the problem was that, for some reason, lmdb can't create its database in a shared folder (I was running caffe on Ubuntu 14.04 in VirtualBox).
So I think the problem, as you say, is the lack of support for memory-mapped files.
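
If anyone wants to confirm that on their setup, a quick standalone check (nothing Caffe-specific, just a hypothetical diagnostic) is to try to create a writable memory mapping in the target directory, which is what LMDB needs:

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include <string>

    // Returns true if the filesystem at 'dir' accepts a writable MAP_SHARED
    // mapping; VirtualBox/boot2docker shared folders often reject this.
    bool SupportsMmap(const std::string& dir) {
      std::string probe = dir + "/.mmap_probe";
      int fd = open(probe.c_str(), O_RDWR | O_CREAT, 0644);
      if (fd < 0) return false;
      if (ftruncate(fd, 4096) != 0) { close(fd); unlink(probe.c_str()); return false; }
      void* p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
      bool ok = (p != MAP_FAILED);
      if (ok) munmap(p, 4096);
      close(fd);
      unlink(probe.c_str());
      return ok;
    }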

@guotong1988

guotong1988 commented Jan 22, 2017

@cbare Same problem here. How can it be solved? Thank you!

@cutd

cutd commented Jan 5, 2018

@guotong1988 Don't use a shared folder; using a local folder works for me.
