The line count of the text file differs from the prediction result count #344
Comments
Hello @chuangys, Thank you for your post. This was a known error in the past; did you check out the latest version of fastText? Thank you,
@cpuhrsch In fact, I am running in a Windows environment and ran into many issues in the beginning. However, I found a fastText executable built by "xiami" on GitHub, and it worked fine. He built it in May 2017, so I'm not sure whether it reflects the latest version of the code, but it is the only way I can run fastText right now.
@cpuhrsch After downloading the latest source code and building the exe with MinGW, I can run it now, but it still shows the same problem: I input 1537584 lines for prediction but get back only 1537490 lines of results.
Recently I started seeing a similar issue. I'm seeing 667 results when only 600 lines are fed in. The results have a few portions that look like this:
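fastText's `predict-prob` output is one line per input line, each holding the predicted label and its probability, so the duplicated portions would have looked roughly like the following (illustrative lines, not the original snippet):

```
__label__books 1.00001
__label__books 1.00001
__label__books 1.00001
```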
It doesn't seem right that there should be a number greater than 1. This has been happening for a week or so, and the source is built fresh on every push via CI. The number of seemingly duplicated, erroneous "books" labels is 57, which doesn't quite account for the extra lines, but is close enough and silly enough to make me think this is part of the problem.
@cpuhrsch Sure. I can probably get you the trained model too if you can suggest a good way to privately send you a 1GB file. Maybe I can email you a link to a file on Dropbox or something?
Hello @jazoom, If you're comfortable with this, you could post the link here. I'll let you know once I've got the data, and you can then invalidate the link to prevent additional traffic. Thanks,
@cpuhrsch I was able to track down the problem to some sneaky carriage returns. Sorry for the confusion.
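For anyone hitting the same mismatch: Windows-style line endings leave a stray `\r` on every line, which can throw off tools that split on `\n`. Below is a minimal pre-processing sketch, with placeholder file names, that strips carriage returns and drops empty lines before running `fasttext predict`:

```python
# Sketch: normalize a test file before feeding it to fastText.
# "test.txt" and "test_clean.txt" are placeholder names.
with open("test.txt", "rb") as src, open("test_clean.txt", "wb") as dst:
    for raw in src:
        line = raw.replace(b"\r", b"").rstrip(b"\n")
        if line:                      # drop lines that end up empty after cleaning
            dst.write(line + b"\n")
```

Predicting on the cleaned file, e.g. `./fasttext predict model.bin test_clean.txt`, should then yield exactly one output line per input line.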
Hello @jazoom, Thank you for resolving this! I'm happy to hear that this wasn't on our end. I'm going to close this issue now as it appears to be resolved, but please feel encouraged to reopen it if this isn't the case. Thanks,
No problem @cpuhrsch. I should mention that I still see "1.00001" as a confidence value for some results. It doesn't make sense to me, but perhaps it's intentional? I built a fresh model with new data and it also gives some results above 1. |
Hello @jazoom, Thank you for noticing this. We'll need access to the data, or you'll need to reproduce this on one of your test datasets, in order for me to investigate it. Ideally you'd also be able to reproduce it within a Docker image so that we can be sure we're in the same environment. Having said that, if the value is limited to 1.00001 there should be no need to worry: we add 1e-5 to std_log (which is used for prediction) in order to deal with very small values. Otherwise, please open a separate issue so that we can keep the topics clearly separated. Thanks,
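To make the arithmetic concrete, assuming the reported value is simply the exponential of that adjusted log-probability (an assumption about the output path, not a statement of the exact code), a prediction the model is essentially certain about comes out just above 1:

```python
import math

p = 1.0                            # top-label probability from the softmax
log_p = math.log(p + 1e-5)         # 1e-5 added before the log, as described above
print(round(math.exp(log_p), 5))   # 1.00001 -- the value seen in the predictions
```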
It's not bothering me since I just take it to mean essentially 100% confident. I just wasn't sure if your team knew about it. |
My text file contains 1537584 lines, but after the prediction process there are only 1537490 results.
I've already checked that my file doesn't include any empty lines. Can someone explain why?
Also, is it possible to add a document number to the test file, so that even if the counts mismatch, I can still map the results back to the raw test file?
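fastText itself does not echo a document number, but a common workaround, sketched below with placeholder file names, is to clean the input first (so the counts match) and then pair the predictions back up with their original line numbers:

```python
# Sketch: attach original line numbers to fastText predictions.
# Assumes predictions.txt was produced by
#   ./fasttext predict-prob model.bin test_clean.txt > predictions.txt
with open("test_clean.txt", encoding="utf-8") as texts, \
     open("predictions.txt", encoding="utf-8") as preds, \
     open("mapped.tsv", "w", encoding="utf-8") as out:
    for line_no, (text, pred) in enumerate(zip(texts, preds), start=1):
        out.write(f"{line_no}\t{pred.strip()}\t{text.strip()}\n")
```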