This repository has been archived by the owner on Nov 19, 2020. It is now read-only.

Issue using visual bag of words with large images #769

Closed
JakeTrans opened this issue Aug 12, 2017 · 7 comments

Comments

@JakeTrans

I have been using the Visual Bag of Words to identify different types of standard scanned documents (the goal being to sort the four different document types, with the possibility of using this classification to look for specific data within these documents).

I have found an issue when doing the classification on large pictures (about 4032 x 3024): the numbers involved overflow the GetRectangle function. I have looked at the source code and changed the integers to 64-bit integers; however, this increased the memory usage by an extreme amount. Downscaling the image to smaller dimensions also fixes the problem.

I will continue to test this; however, would there be a more efficient/accurate way of doing this? I'm conscious that I'm new to Machine Learning and may be using the wrong tool for the job.
To replicate the issue, take the Visual Bag of Words example, expand one of the pictures to 4032 x 3024, and attempt to compute the bag of words; this will cause the error in question.

Thanks,

@cesarsouza
Member

Hi @JakeTrans,

Thanks a lot for opening this issue. I would say that changing the GetRectangle method to use Int64 instead of Int32 would actually be the correct route here. The problem we might have to address then is how to deal with the increased memory requirements of your large images.

When using Bag-of-(Visual)-Words, it is not actually necessary to use all feature points to train the clustering algorithm, or even to create the final representation of the image. I would say that the code needs to be updated to consider only a subsample of the features instead of all of them, thus reducing the amount of data you would need to keep in memory.

As such, if you have already gone through the work of updating the code, you could try altering the line I am linking here, which I am also reproducing below:

     descriptors[i] = detector.ProcessImage(x[i]).Select(p => p.Descriptor).ToArray();

And update it to return not all descriptors, but only some of them, say by randomly sampling at most 1000 descriptors per image (you could also make this quantity configurable by adding it as a property of the BaseBagOfVisualWords class). The Sample method can help with the sampling.

This will cause the vocabularies to be created using only a subset of the descriptors and hopefully decrease the memory requirements of the problem you are trying to tackle.
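For illustration, the per-image subsampling suggested above could be sketched roughly as follows. This is a minimal sketch only: SampleAtMost is a hypothetical helper standing in for Accord's Sample method, and the cap of 1000 descriptors is just the example figure from the comment above.

```csharp
using System;

static class DescriptorSampling
{
    // Hypothetical helper: return at most 'max' items drawn uniformly
    // at random, without replacement. Stands in for Accord's Sample.
    public static T[] SampleAtMost<T>(T[] items, int max, Random rng)
    {
        if (items.Length <= max)
            return items; // nothing to do: keep all descriptors

        // Partial Fisher-Yates shuffle: after 'max' swaps, the first
        // 'max' elements form a uniform random sample without replacement.
        T[] copy = (T[])items.Clone();
        for (int i = 0; i < max; i++)
        {
            int j = i + rng.Next(copy.Length - i);
            T tmp = copy[i]; copy[i] = copy[j]; copy[j] = tmp;
        }

        T[] result = new T[max];
        Array.Copy(copy, result, max);
        return result;
    }
}
```

The line reproduced above would then become something along the lines of `descriptors[i] = DescriptorSampling.SampleAtMost(detector.ProcessImage(x[i]).Select(p => p.Descriptor).ToArray(), 1000, rng);` (note that sharing one Random instance across the parallel loop would need synchronization, or one instance per thread).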

Please let me know if it helps!

@JakeTrans
Author

Hi @cesarsouza

Thank you very much for your response; I will look into this and let you know the results.

@JakeTrans
Author

I have converted GetRectangle over to Int64, along with a number of related methods, and on a computer with more memory (the machine I had been working with only had 4 GB) I have been able to successfully train the Bag of Words without needing to sample the descriptors, and also to make a small improvement in the memory handling. I will look into sampling the descriptors separately, as this could be useful in my project anyway.

The small improvement I made was to add the overload below to the Learn method in BaseBagOfVisualWords:

    public TModel Learn(string[] x, double[] weights = null)
    {
        // Note: see note in the method below
        var descriptors = new TFeature[x.Length][];

        // For all images
        For(0, x.Length, (i, detector) =>
        {
            // Load the image on demand and dispose of it as soon as
            // its feature points have been extracted (the using block
            // also releases it if ProcessImage throws)
            using (var imageToLearn = (Bitmap)Image.FromFile(x[i]))
            {
                // Compute the feature points
                descriptors[i] = detector.ProcessImage(imageToLearn)
                    .Select(p => p.Descriptor).ToArray();
            }
        });

        return Learn(descriptors.Concatenate());
    }

As this overload accepts file paths rather than the images themselves, and loads each image on demand (so to speak), it cuts down the memory requirements: in my case, the Bitmap-list version used around 5-6 GB of RAM at peak, while this version took around 4-5 GB at peak.

I will look further into the sampling but please let me know if this would be of any use to you.

Thank you for your help

JakeTrans

@cesarsouza
Member

Hi @JakeTrans,

Thanks a lot; it is also nice to know this strategy worked well for you. I guess that since this method is doing lots of IO, it would also be a good candidate to be written using async, as mentioned in #635. I might have to think a little bit about what would be the best interface/API to offer here, but I think this could indeed be a nice addition to the framework! Thanks a lot!
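For illustration only, the on-demand loop could take an async shape roughly along these lines. This is a sketch under assumptions, not the framework's API: ProcessFilesAsync and ExtractFeatures are hypothetical stand-ins for the eventual interface and for detector.ProcessImage, and the placeholder "features" are just the raw bytes of each file.

```csharp
using System.IO;
using System.Linq;
using System.Threading.Tasks;

static class AsyncLearnSketch
{
    // Placeholder "feature extraction", standing in for the real
    // detector.ProcessImage(...) call on a decoded Bitmap.
    static int[] ExtractFeatures(byte[] bytes) =>
        bytes.Select(b => (int)b).ToArray();

    // Load each file on demand and run the extraction off the calling
    // thread; only one file's contents are held in memory at a time.
    public static async Task<int[][]> ProcessFilesAsync(string[] paths)
    {
        var descriptors = new int[paths.Length][];
        for (int i = 0; i < paths.Length; i++)
        {
            int index = i; // capture the loop index by value
            descriptors[index] = await Task.Run(() =>
                ExtractFeatures(File.ReadAllBytes(paths[index])));
        }
        return descriptors;
    }
}
```

A real implementation would also need to decide how to surface cancellation and progress, which is part of what #635 would have to settle.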

(By the way - if anyone reading this would like to work on this issue and implement it while using async also please let me know and feel free to submit a PR!)

Regards,
Cesar

@JakeTrans
Author

Hi @cesarsouza

Thank you for the response. Please forgive me, as this is the first project I've raised an issue with or modified, but I would like to be sure: would you like me to submit a pull request with my changes, at least as a stopgap until #635 has been worked on?

Thank you,

Jake

@cesarsouza
Member

Well, I was thinking about waiting a little bit until I could decide what would be the best way to expose this feature (there might be other places that might also benefit from having images represented by filenames rather than Bitmaps). However, if you are willing to submit a PR with your changes, please go ahead and I can figure it out later 😄

Thanks a lot!

Regards,
Cesar

@cesarsouza
Member

Support for computing bag of visual words from file names, as well as options for sampling feature descriptors and images from the training set, has been added in 3.8.0.
