Manipulated array throws “Index was outside the bounds of the array” exception #802
Hi TG,

Thanks for opening the issue! Yes, Accord does offer an implementation of cross-validation. You can find it in the Accord.MachineLearning.Performance namespace, and an example is available at the bottom of the documentation page linked above.

Regarding the exception, I think this might be happening because one of your folds did not contain all the possible class labels or symbol values in its training set, but did contain them in the testing set. In other words, the testing set contained classes or symbols the model did not expect, because it had never seen them during the training phase.

To overcome this situation, you can create the NaiveBayes model manually, specifying how many classes and symbols it should expect, and then assign it to the "Model" property of the NaiveBayesLearning object you have just created. Something like this:

```csharp
var partialLearner = new NaiveBayesLearning()
{
    // Specify the number of classes and the number of symbols
    // for each discrete input variable up front:
    Model = new NaiveBayes(numberOfClasses, numberOfSymbolsForVar1, numberOfSymbolsForVar2 /* , ... etc */)
};

// Learn a Naive Bayes model from the examples
NaiveBayes pnb = partialLearner.Learn(learnDataInputs.ToArray(), learnDataOutputs.ToArray());
```

Please let me know if this helps. The exception should have thrown a more descriptive message if this is the case.

Regards,
Just make sure that you retrieve the number of classes and the number of symbols from your entire dataset, and do not compute them from the fold. If you have used a codebook to codify your dataset, you can retrieve those values from there, for example (as long as you created this codebook outside of the cross-validation scheme).
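As a language-agnostic sketch of the point above (plain Python, not Accord.NET code; the data values are made up for illustration), here is why the counts must come from the entire dataset: sizing the model from a single fold's training split can under-count the symbols, and any testing row that carries a larger code then indexes past the end of the model's internal tables.

```python
# Illustration (made-up data): symbol counts derived from one fold's training
# split can be smaller than the counts derived from the whole dataset.

def symbol_counts(rows):
    """Number of symbols per column, assuming the codes are 0..max."""
    return [max(row[c] for row in rows) + 1 for c in range(len(rows[0]))]

full = [[0, 0], [1, 3], [0, 2], [1, 1], [0, 3]]
fold_train = [full[0], full[2], full[3]]     # this split never sees code 3

print(symbol_counts(full))        # [2, 4]  <- size the model with these
print(symbol_counts(fold_train))  # [2, 3]  <- too small: code 3 would overflow
```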
Hi @cesarsouza. I will use Accord's cross-validation and let you know whether it succeeds. But in my problem, the exception occurs while the Learn method is called for the first fold. Please check the image I added in the question; the problem should be somewhere else. As the screenshot shows, the lengths of the input and output arrays are the same. Any other ideas to check?
Hi @ConductedClever, it is possible to have more classes or variable symbols in the testing set than in the training set in any of the folds, including the first. If you still get the issue even after specifying the model parameters manually, do you think you could provide an example of your data so that I can reproduce the issue here? You can use the .ToCSharp() extension methods (provided by the Matrix class) to convert your double[][] and int[][] matrices and arrays into C#, such that you could paste them here (unless they contain sensitive information you would like to avoid posting online).
As I noted above, the exception occurs in the learning phase, not in the testing phase (please check this screenshot again), so I think your idea about class and variable counts does not pinpoint the problem. Yes, of course; this is the data (and the source code is the same as in the main question): The entire input data:
The entire output data:
The first fold learn input data:
The first fold learn output data:
Thanks for spending your time on this.
@cesarsouza, if the data before codification would also help, tell me and I will add it. Thanks.
Hi @ConductedClever, you are completely right; I misunderstood the situation yesterday. I believe I will be able to try out your data examples (many thanks for them) in the next few hours. However, my initial suggestion about how the problem could be solved may still be right. What may be happening is that NaiveBayes (or rather, the GeneralDiscreteDistribution used by NaiveBayes) is not able to recognize gaps between symbols in your discrete data set. Specifying the number of symbols directly, instead of letting the learning algorithm figure it out, might be a workaround. I will investigate and report back soon. Many thanks for including the example data in the issue. Regards,
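The "gaps between symbols" failure mode described above can be sketched in a few lines of plain Python (not the Accord implementation; the codes are made up): if a column's codes have a gap, a table sized by the number of distinct values is too short for the largest code.

```python
# Illustration: the fold's column uses codes {0, 2} but never 1, so sizing a
# frequency table by the number of distinct values (2) instead of
# max code + 1 (3) makes the largest code index past the end of the table.

column = [0, 2, 0, 2, 2]          # code 1 never appears in this fold

distinct = len(set(column))       # 2 -> a 2-slot table
needed = max(column) + 1          # 3 -> a 3-slot table that covers the gap

failed = False
small = [0] * distinct
try:
    for code in column:
        small[code] += 1          # code 2 overruns the 2-slot table
except IndexError:
    failed = True                 # the same symptom the issue reports

table = [0] * needed
for code in column:
    table[code] += 1              # safe: every code has a slot
print(failed, table)              # True [2, 0, 3]
```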
Hi @cesarsouza. You're very welcome. So, if your suggestion is right, moving the codification phase to after the partitioning phase will correct the problem. I will try this solution and let you know. Thanks.
Hi @ConductedClever, for example, using the following code might work around the problem you have encountered:

```csharp
var teacher = new NaiveBayesLearning()
{
    Model = new NaiveBayes(2, 2, 7, 8, 17)
};
```

This should make sure that any Naive Bayes model is created using the correct number of symbols for each of the variables being learned (the numbers of symbols in your 4 variables are, respectively, 2, 7, 8, and 17, and the first 2 in the constructor is the number of classes). Regards,
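To see why pre-sizing the model this way avoids the exception, here is a minimal counting sketch in plain Python (not the Accord implementation; the fold data is made up): with the tables shaped by the known number of classes and symbols, a fold that never saw some symbol still has a slot for it, so learning never indexes out of bounds.

```python
# Hypothetical sketch: frequency tables pre-sized by the known class and
# symbol counts, mirroring new NaiveBayes(2, 2, 7, 8, 17).

def make_tables(n_classes, symbols_per_var):
    # one frequency table per variable, indexed [variable][class][symbol]
    return [[[0] * s for _ in range(n_classes)] for s in symbols_per_var]

def learn(tables, inputs, outputs):
    for x, y in zip(inputs, outputs):
        for var, code in enumerate(x):
            tables[var][y][code] += 1  # always in range by construction

tables = make_tables(2, [2, 7, 8, 17])
fold_inputs = [[0, 1, 2, 3], [1, 6, 7, 16]]  # includes the largest codes
fold_outputs = [0, 1]
learn(tables, fold_inputs, fold_outputs)
print(tables[3][1][16])  # count for the last symbol of variable 4 in class 1
```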
Anyway, I guess this could have been handled better by the framework itself; I will include some updates in the next version to make this more automatic. Thanks again for opening the issue.
By the way, as a last question: can I include the data you have provided as a unit test in the framework under the LGPL license?
Hi @cesarsouza. I will try your solution tomorrow (about 7 hours from now) and notify you of the result. Yes, with that update developers will have a better life 😀. Of course you can; it would be my pride and pleasure. Thanks.
Hi @cesarsouza, I have tested your solution and everything went right. I just forgot to report the result here. Thanks a lot.
Available in 3.8.0. |
Issue description
Hi. I am developing a .NET Core 2.0 application using Accord.NET 3.7.0. When I give the entire array as learning input, everything goes right, but when I divide the array for the purpose of k-fold cross-validation, an “Index was outside the bounds of the array” exception occurs. The mentioned source code is the following:
Some additional data is shown in this screenshot:
exception screenshot
Questions:
1- Is there any k-fold cross-validation functionality implemented in Accord itself?
2- What makes this exception occur? (Is it a bug?)
3- I have previously encountered the same exception here. Was that solution the best one? (Although the current problem differs from that one.) (Is it a bug?)
Thanks in advance. TG.
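For reference, the manual fold-splitting step discussed in this thread can be sketched in a few lines of plain Python (not Accord's CrossValidation API; the labels are made up), showing how the training split of a fold is formed and how, with unshuffled data, it can fail to cover every class:

```python
# A hedged sketch of plain k-fold splitting with contiguous folds.

def kfold(n, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    size = n // k
    for i in range(k):
        test_idx = list(range(i * size, (i + 1) * size if i < k - 1 else n))
        held_out = set(test_idx)
        train_idx = [j for j in range(n) if j not in held_out]
        yield train_idx, test_idx

labels = [0, 0, 0, 1, 1, 1]                    # sorted labels, 2 folds
train_idx, test_idx = next(kfold(len(labels), 2))
print(sorted({labels[j] for j in train_idx}))  # [1]: training never sees class 0
print(sorted({labels[j] for j in test_idx}))   # [0]: class 0 only at test time
```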