Bug: Error when the input text is "Data" #3

karrtikiyer · 2022-09-28T06:35:14Z

MRE:
fastText::language_identification("data", pre_trained_language_model_path = file.path(raw_data_dir, "lid.176.bin"), k=1, th=0, threads = 1, verbose = F)
Error received:
Error in setnames(x, value) : Can't assign 2 names to a 1 column data.table
In addition: Warning message:
In data.table::fread(pth_res_out, header = F, stringsAsFactors = F, : File '/var/folders/zt/z0d17y_n05qgmbgt2v0f3svh0000gn/T//RtmpQLdT6Y/file12ec86a1c5acc.txt' has size 0. Returning a NULL data.table.

When the input text is "data", the library fails with the error listed above.

R version: 4.2.1 on macOS Monterey 12.6

Note: I downloaded the pre-trained model from the link provided in your documentation.

mlampros · 2022-09-28T07:02:04Z

@karrtikiyer
You have to include a reproducible example that states which data and which code you use (not only the error message).

karrtikiyer · 2022-09-28T07:21:52Z

@mlampros: The example is included; see the text above. It is:
fastText::language_identification("data", pre_trained_language_model_path = file.path(raw_data_dir, "lid.176.bin"), k=1, th=0, threads = 1, verbose = F)

mlampros · 2022-09-28T11:49:37Z

When I run the example from the documentation with 'data' as input, I receive:

require(fastText)
file_pretrained = system.file("language_identification/lid.176.ftz", package = "fastText")

dtbl_out = language_identification(input_obj = 'data',
                                   pre_trained_language_model_path = file_pretrained,
                                   k = 3,
                                   th = 0.0,
                                   verbose = TRUE)
The 'fasttext' algorithm starts ...
Conversion of the predicted labels and probabilities for k = 3 and threads = 1 ... 
The predicted labels will be loaded from the temporary file ...
The temporary files will be removed ...
Elapsed time: 0 hours and 0 minutes and 0 seconds. 

dtbl_out
   iso_lang_1    prob_1 iso_lang_2    prob_2 iso_lang_3    prob_3
1:         en 0.0781234         zh 0.0371557         ta 0.0339562

If I instead use a pre-trained model downloaded as described on the official documentation webpage, I receive the following:

bin_file = tempfile(fileext = '.bin')
download.file(url = 'https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin', destfile = bin_file, method = 'curl')

file.exists(bin_file)
[1] TRUE

object.size(bin_file)
152 bytes

file.size(bin_file)
[1] 131266198

fastText::language_identification("data", pre_trained_language_model_path = bin_file, k=1, th=0, threads = 1, verbose = T)
The 'fasttext' algorithm starts ...
The predicted labels will be loaded from the temporary file ...
The temporary files will be removed ...
Elapsed time: 0 hours and 0 minutes and 0 seconds. 
   iso_lang_1    prob_1
1:         en 0.0539288

It seems to me that your downloaded pre-trained .bin model is corrupt.
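
One quick way to test the corruption hypothesis is to compare the size of the local file with the size reported above. This is only a minimal sketch: `model_file_looks_complete` and `expected_size` are hypothetical names, not part of the fastText package, and the expected size is simply the `file.size()` value from this thread, which assumes the upstream file has not changed.

```r
# Size in bytes that file.size() reported for lid.176.bin in this thread.
# Assumption: the upstream file has not changed since then.
expected_size <- 131266198

# Hypothetical helper (not part of the fastText package): TRUE only when the
# file exists and its size on disk equals the expected number of bytes.
model_file_looks_complete <- function(path, expected = expected_size) {
  file.exists(path) && isTRUE(file.size(path) == expected)
}
```

For example, `model_file_looks_complete(file.path(raw_data_dir, "lid.176.bin"))` could be run on the affected machine. A matching size does not prove the file is intact, but a mismatch would point to a truncated download.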

karrtikiyer · 2022-09-28T12:20:52Z

I just downloaded the file again and I still get the same error.
Could I be facing this issue because I am using a Mac with an M1 Max?

karrtikiyer · 2022-09-28T12:26:24Z

Also, it only fails when the input text is "data"; if I modify the input to "data1" or anything else, it works fine.

mlampros · 2022-09-28T13:42:51Z

I cannot tell you for sure. The 'fastText' R package is tested in the following flavors on CRAN: https://cran.r-project.org/web/checks/check_flavors.html#r-release-macos-arm64

karrtikiyer · 2022-09-29T04:32:07Z

The M1 Max will basically use the ARM version of R & RStudio.

mlampros · 2022-09-29T06:13:14Z

@karrtikiyer
I'm sorry, but I'm not in a position to test on every available macOS version and hardware type. The only thing I can do is add a test case to the 'fastText' R package with your example (the one that raises an error on your computer) and the expected output, and see whether this case passes the CRAN tests. It seems that GitHub Actions does not currently support Mac M1.

mlampros · 2022-10-08T07:54:27Z

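I added the test case that I mentioned (https://github.com/mlampros/fastText/blob/master/tests/testthat/test-fasttext.R#L411-L420) and submitted the new version to CRAN (https://cran.r-project.org/web/checks/check_results_fastText.html). Over the next few days I'll check whether any error related to the macOS flavors shows up. The current version passes the CRAN tests for Windows and Linux.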

karrtikiyer · 2022-10-08T08:24:09Z

Thanks!


karrtikiyer · 2022-10-11T09:06:00Z

It is basically an ARM package. I am not sure what the problem is.


mlampros · 2022-10-11T16:05:48Z

From what I see, the updated version (1.0.3) did not show any errors on the macOS flavors of CRAN. I'm not sure whether your macOS version is among those tested. Moreover, you use the .bin file as input, whereas the fastText R package uses the .ftz file for all tests and examples:

file_pretrained = system.file("language_identification/lid.176.ftz", package = "fastText")

This is because the .ftz file is smaller than the .bin file. Do you also receive the error with the .ftz file?
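
To make that comparison easy to report, the failing call can be wrapped so that an error is captured rather than stopping the session. The following is a rough sketch, not package code: `try_identify` is a hypothetical helper, and `bin_file` is assumed to be the path to your own downloaded lid.176.bin.

```r
# Hypothetical helper (not part of the fastText package): run the call from
# this issue against one model file and return either the result or the
# error message as a character string.
try_identify <- function(model_path, text = "data") {
  tryCatch(
    fastText::language_identification(input_obj = text,
                                      pre_trained_language_model_path = model_path,
                                      k = 1,
                                      th = 0.0,
                                      threads = 1,
                                      verbose = FALSE),
    error = function(e) conditionMessage(e)
  )
}

# Only run the comparison when the package is installed.
if (requireNamespace("fastText", quietly = TRUE)) {
  # Model file shipped with the package, as used in its tests and examples.
  ftz_file <- system.file("language_identification/lid.176.ftz", package = "fastText")
  print(try_identify(ftz_file))

  # Assumption: bin_file points at your downloaded lid.176.bin.
  # print(try_identify(bin_file))
}
```

If the .ftz call returns a data.table while the .bin call returns an error string, the problem is more likely tied to the .bin model than to the input text alone.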

mlampros · 2022-10-13T07:03:07Z

I'll close the issue for now. Feel free to re-open it in case the code does not work as expected.

github-actions bot added the bug and triage labels on Sep 28, 2022

mlampros closed this as completed Oct 13, 2022
