Bug: Error when the input text is "Data" #3

Closed
karrtikiyer opened this issue Sep 28, 2022 · 13 comments
Labels
bug Something isn't working triage


@karrtikiyer

MRE:
fastText::language_identification("data", pre_trained_language_model_path = file.path(raw_data_dir, "lid.176.bin"), k=1, th=0, threads = 1, verbose = F)
Error received:
Error in setnames(x, value) : Can't assign 2 names to a 1 column data.table
In addition: Warning message:
In data.table::fread(pth_res_out, header = F, stringsAsFactors = F, : File '/var/folders/zt/z0d17y_n05qgmbgt2v0f3svh0000gn/T//RtmpQLdT6Y/file12ec86a1c5acc.txt' has size 0. Returning a NULL data.table.

When the input text is "data", the library fails with the error listed above.

R version: 4.2.1 on macOS Monterey 12.6

Note: I downloaded the pre-trained model from the link provided in your documentation.

@github-actions github-actions bot added bug Something isn't working triage labels Sep 28, 2022
@mlampros
Owner

@karrtikiyer
You have to include a reproducible example that shows which data and which code you use, not only the error message.

@karrtikiyer
Author

@mlampros: The example is included; see the text above. It is:
fastText::language_identification("data", pre_trained_language_model_path = file.path(raw_data_dir, "lid.176.bin"), k=1, th=0, threads = 1, verbose = F)

@mlampros
Owner

mlampros commented Sep 28, 2022

When I run the example from the documentation with 'data' as input, I receive:

require(fastText)
file_pretrained = system.file("language_identification/lid.176.ftz", package = "fastText")

dtbl_out = language_identification(input_obj = 'data',
                                   pre_trained_language_model_path = file_pretrained,
                                   k = 3,
                                   th = 0.0,
                                   verbose = TRUE)
The 'fasttext' algorithm starts ...
Conversion of the predicted labels and probabilities for k = 3 and threads = 1 ... 
The predicted labels will be loaded from the temporary file ...
The temporary files will be removed ...
Elapsed time: 0 hours and 0 minutes and 0 seconds. 

dtbl_out
   iso_lang_1    prob_1 iso_lang_2    prob_2 iso_lang_3    prob_3
1:         en 0.0781234         zh 0.0371557         ta 0.0339562

Then, if I use a pre-trained model downloaded as described on the official documentation webpage, I receive the following:

bin_file = tempfile(fileext = '.bin')
download.file(url = 'https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin', destfile = bin_file, method = 'curl')

file.exists(bin_file)
[1] TRUE

object.size(bin_file)
152 bytes

file.size(bin_file)
[1] 131266198

fastText::language_identification("data", pre_trained_language_model_path = bin_file, k=1, th=0, threads = 1, verbose = T)
The 'fasttext' algorithm starts ...
The predicted labels will be loaded from the temporary file ...
The temporary files will be removed ...
Elapsed time: 0 hours and 0 minutes and 0 seconds. 
   iso_lang_1    prob_1
1:         en 0.0539288

It seems to me that your downloaded pre-trained .bin model is corrupt.
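
One rough way to check for an empty or truncated download is to compare the local file's size with the 131266198 bytes that file.size(bin_file) reported above. The sketch below is only an illustration: the helper name check_model_file is hypothetical and not part of the fastText package, and a matching size does not prove the file is intact.

```r
# Size reported by file.size(bin_file) earlier in this thread for lid.176.bin.
expected_bytes <- 131266198

# Hypothetical helper (not part of the fastText package): classify a local
# model file as "missing", "empty", "size mismatch", or "ok".
check_model_file <- function(path, expected = expected_bytes) {
  if (!file.exists(path)) return("missing")
  size <- file.size(path)
  if (is.na(size) || size == 0) return("empty")
  if (size != expected) return("size mismatch")
  "ok"
}

# Usage: point this at wherever the .bin file was saved (path is an assumption).
# check_model_file(file.path(raw_data_dir, "lid.176.bin"))
```

Anything other than "ok" would suggest downloading the file again before looking for other causes.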

@karrtikiyer
Author

I just downloaded the file again and I still get the same error.
Could I be facing this issue because I am using a Mac with an M1 Max?

@karrtikiyer
Author

Also, it only fails when the input text is "data"; if I change the input to "data1" or anything else, it works fine.

@mlampros
Owner

I cannot tell you for sure. The 'fastText' R package is tested on the following flavors on CRAN

@karrtikiyer
Author

The M1 Max will basically use the ARM version of R & RStudio.

@mlampros
Owner

@karrtikiyer
I'm sorry, but I'm not in a position to test on any available macOS version and hardware type. The only thing I can do is add a test case to the 'fastText' R package with the example you mentioned (the one that raises an error on your computer) together with the expected output, and see whether this case passes the CRAN tests. It seems that GitHub Actions does not currently support Mac M1.

@mlampros
Owner

mlampros commented Oct 8, 2022

I added the test case that I mentioned and submitted the new version to CRAN. Over the next few days I'll check whether any error related to the macOS flavors shows up. The current version passes the CRAN tests for Windows and Linux.

@karrtikiyer
Author

karrtikiyer commented Oct 8, 2022 via email

@karrtikiyer
Author

karrtikiyer commented Oct 11, 2022 via email

@mlampros
Owner

mlampros commented Oct 11, 2022

From what I see, the updated version (1.0.3) did not show any errors on the macOS flavors of CRAN. I'm not sure whether your macOS version is among those tested. Moreover, you use the .bin file as input, whereas the fastText R package uses the .ftz file for all its tests and examples:

file_pretrained = system.file("language_identification/lid.176.ftz", package = "fastText")

The package uses the .ftz file because it is smaller than the .bin file. Do you also receive the error with the .ftz file?
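
A minimal sketch of that check, assuming the fastText package is installed: it runs the bundled .ftz model on both "data" and "data1" and prints any error message instead of stopping. The helper run_safely is hypothetical and not part of the package; the arguments mirror the calls already shown in this thread.

```r
# Hypothetical helper: run a function and report either its value or the
# error message instead of stopping the script.
run_safely <- function(f) {
  tryCatch(
    list(ok = TRUE, value = f(), message = NA_character_),
    error = function(e) list(ok = FALSE, value = NULL, message = conditionMessage(e))
  )
}

# Only attempt the comparison when the fastText package is available.
if (requireNamespace("fastText", quietly = TRUE)) {
  ftz_file <- system.file("language_identification/lid.176.ftz", package = "fastText")
  for (txt in c("data", "data1")) {
    res <- run_safely(function() {
      fastText::language_identification(input_obj = txt,
                                        pre_trained_language_model_path = ftz_file,
                                        k = 1, th = 0.0, threads = 1, verbose = FALSE)
    })
    cat(txt, "->", if (res$ok) "ok" else paste("error:", res$message), "\n")
  }
}
```

If "data" fails with the .ftz model as well, the .bin download is less likely to be the cause.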

@mlampros
Owner

I'll close the issue for now. Feel free to re-open it in case the code does not work as expected.
