New error (sadly) #981

Closed
holgafreak opened this issue Sep 27, 2016 · 37 comments

Comments

@holgafreak

Got further, but now I get this one:

In 1 module of nn.Sequential:
/home/xxx/torch/install/share/lua/5.1/nn/THNN.lua:110: bad argument #3 to 'v' (cannot convert 'struct THCudaTensor *' to 'struct THFloatTensor *')
stack traceback:
[C]: in function 'v'
/home/xxx/torch/install/share/lua/5.1/nn/THNN.lua:110: in function 'SpatialConvolutionMM_updateOutput'
...in/torch/install/share/lua/5.1/nn/SpatialConvolution.lua:96: in function <...in/torch/install/share/lua/5.1/nn/SpatialConvolution.lua:92>

I didn't have this error yesterday either.

-m

@soumith
Member

soumith commented Sep 27, 2016

have you made sure that your model is cast to CUDA (or that your input is Float)?

@holgafreak
Author

yes it is

@supakjk

supakjk commented Sep 27, 2016

Same problem for ClassNLLCriterion after the update. (It worked fine until yesterday.)
luajit: /xxxxx/torch/install/share/lua/5.1/nn/THNN.lua:110: bad argument #3 to 'v' (cannot convert 'struct THCudaTensor *' to 'struct THCudaLongTensor *')
stack traceback:
[C]: in function 'v'
/xxxxx/torch/install/share/lua/5.1/nn/THNN.lua:110: in function 'ClassNLLCriterion_updateOutput'
.../torch/install/share/lua/5.1/nn/ClassNLLCriterion.lua:41: in function 'updateOutput'
...torch/install/share/lua/5.1/nn/CrossEntropyCriterion.lua:13: in function 'forward'
xxxxx.lua:99: in main chunk
[C]: at 0x00405bb0

@soumith
Member

soumith commented Sep 27, 2016

@supakjk update nn and cunn both.

@soumith
Member

soumith commented Sep 27, 2016

@holgafreak can you give a small test case for this?

@holgafreak
Author

@soumith after updating nn and cunn everything goes ok with my code.
But for some reason the labels go wild, and I'm getting this assertion: cur_target > 0 and cur_target <= n_classes in the v function. (I'm quoting that from memory; the error was on the Linux machine.) The labels in my code should just be 1 and 2, and they are not changed anywhere. When I print them, the values are ints, but mostly >10000.
I also tried on OS X, and here's one from the train-on-cifar demo:
qlua: /Users/mjkoskin/torch-cl/install/share/lua/5.1/nn/THNN.lua:807: Assertion `cur_target >= 0 && cur_target < n_classes' failed. at /Users/mjkoskin/torch-cl/extra/nn/lib/THNN/generic/ClassNLLCriterion.c:31

@supakjk

supakjk commented Sep 27, 2016

Even after updating nn and cunn, mine still doesn't work.

require 'nn'
torch.setdefaulttensortype('torch.FloatTensor')
crit = nn.CrossEntropyCriterion()
input = torch.Tensor():rand(5)
label = 3
crit:forward(input, label)

The CPU version works without problem but the following code (CUDA version) produces the error message I mentioned above.

require 'cunn'
torch.setdefaulttensortype('torch.FloatTensor')
crit = nn.CrossEntropyCriterion():cuda()
input = torch.CudaTensor():randn(5)
label = 3
crit:forward(input, label)

@1byxero

1byxero commented Sep 30, 2016

I was following this Torch tutorial and trying to run the CIFAR-10 CNN given on the page when I encountered a similar error. Please help.
/home/cuda/torch/install/share/lua/5.1/nn/THNN.lua:110: bad argument #3 to 'v' (cannot convert 'struct THCudaTensor *' to 'struct THCudaLongTensor *')

stack traceback:
[C]: in function 'v'
/home/cuda/torch/install/share/lua/5.1/nn/THNN.lua:110: in function 'ClassNLLCriterion_updateOutput'
...uda/torch/install/share/lua/5.1/nn/ClassNLLCriterion.lua:41: in function 'forward'
...da/torch/install/share/lua/5.1/nn/StochasticGradient.lua:35: in function 'train'
[string "_RESULT={trainer:train(trainset)}"]:1: in main chunk
[C]: in function 'xpcall'
/home/cuda/torch/install/share/lua/5.1/trepl/init.lua:652: in function 'repl'
...cuda/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:199: in main chunk
[C]: at 0x00406670

@supakjk

supakjk commented Sep 30, 2016

The problem seems to be that the target variable is not correctly converted to a CudaLongTensor.
As a temporary workaround, I did something like the following. I hope there will be an update or some comments regarding this issue.
local theCrit = nn.CrossEntropyCriterion():cuda()
theCrit.nll.target = torch.CudaLongTensor{theCrit.nll.target[1]}

@GenTxt

GenTxt commented Sep 30, 2016

Hi supakjk:

I have the same error and torch/lua is new to me. Could you explain where I add the code for your temporary solution?

Is it in the module(s) listed in the error or the lua script I'm trying to run, or both?

An example would be appreciated.

Thanks

Error:

/home/gentxt/torch/install/share/lua/5.1/nn/THNN.lua:110: bad argument #3 to 'v' (cannot convert 'struct THCudaTensor *' to 'struct THCudaLongTensor *')
stack traceback:
[C]: in function 'v'
/home/gentxt/torch/install/share/lua/5.1/nn/THNN.lua:110: in function 'ClassNLLCriterion_updateOutput'
...txt/torch/install/share/lua/5.1/nn/ClassNLLCriterion.lua:41: in function 'forward'
sample.lua:182: in function 'sample'
sample.lua:236: in main chunk
[C]: in function 'dofile'
...usr/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00405e40

Torch/lua works fine with torch-rnn, https://github.com/karpathy/char-rnn etc. but doesn't work with the current code I'm trying to run. I've updated everything and the error continues.

@supakjk

supakjk commented Sep 30, 2016

I mean, when you create an instance of the CUDA version of ClassNLLCriterion, change the target field of that instance to be a CudaLongTensor (it is probably a CudaTensor initially).
I think this should be done automatically, but the current source code misses that part.
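
For readers asking where the workaround goes: a minimal sketch, assuming a CUDA-enabled Torch7 install with nn and cunn, is to apply it in your own script right after creating the criterion, not inside the installed nn files. The input size and label come from the repro earlier in this thread; the `has_cunn` guard and the placeholder value 1 are illustrative assumptions, and the workaround should be unnecessary once nn contains the fix.

```lua
-- Sketch only: needs a CUDA-enabled Torch7 install with nn and cunn.
local has_cunn = pcall(require, 'cunn')

local loss
if has_cunn then
  local crit = nn.CrossEntropyCriterion():cuda()

  -- Workaround from this thread: CrossEntropyCriterion wraps a
  -- ClassNLLCriterion in its `nll` field; make that criterion's internal
  -- target buffer a CudaLongTensor instead of a CudaTensor.
  if torch.CudaLongTensor then
    -- 1 is only a placeholder; forward() is expected to write the real
    -- label into this buffer when the target is a plain number.
    crit.nll.target = torch.CudaLongTensor{1}
  end

  local input = torch.randn(5):cuda()  -- scores for 5 classes
  local label = 3                      -- class index, 1-based
  loss = crit:forward(input, label)
  print(loss)
else
  print('cunn is not installed; nothing to run')
end
```

If the script builds several criteria, the same replacement would have to be repeated for each one, which is why updating nn is the better long-term answer.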

@soumith
Member

soumith commented Sep 30, 2016

it is automatically done in the source code: https://github.com/torch/nn/blob/master/ClassNLLCriterion.lua#L36

@soumith
Member

soumith commented Sep 30, 2016

actually, i realized that i missed the non-batch case. I'm fixing it now.

@soumith
Member

soumith commented Oct 1, 2016

this should be fixed now in master, and reinstalling the "nn" package will make it go away.

5ade793

luarocks install nn

@soumith closed this as completed Oct 1, 2016

@dkbemisIII

This still breaks for me, after updating. The code posted by @supakjk above gives the same error. torch.cudaLong is nil for me. Do you mean:

self.target = target.cudaLong and self.target:cudaLong() or self.target:cuda()

@supakjk

supakjk commented Oct 2, 2016

It should be torch.CudaLong (not torch.cudaLong.)

@dkbemisIII

Don't think so. target.cudaLong checks to make sure the conversion function is present for the tensor. Possibly 'torch.CudaLongTensor' could serve the same purpose, but indirectly. torch.CudaLong is still nil.
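
The feature check described here can be illustrated in plain Lua, without Torch. This is a sketch under stated assumptions: `old_tensor`, `new_tensor`, and `to_cuda_target` are hypothetical stand-ins invented for the example, not Torch objects or nn functions. It only shows why indexing a missing method yields nil and lets the `and ... or` expression fall back to `:cuda()`.

```lua
-- Hypothetical stand-ins for tensors; these are plain tables, not Torch objects.
local old_tensor = {
  cuda = function(self) return 'CudaTensor' end,
}
local new_tensor = {
  cuda = function(self) return 'CudaTensor' end,
  cudaLong = function(self) return 'CudaLongTensor' end,
}

-- `t.cudaLong` is nil when the conversion method does not exist, so the
-- expression falls through to t:cuda(). When the method exists, its result
-- is returned instead.
local function to_cuda_target(t)
  return t.cudaLong and t:cudaLong() or t:cuda()
end

print(to_cuda_target(old_tensor))
print(to_cuda_target(new_tensor))
```

A misspelled field name is indistinguishable from a missing method in this idiom, since both evaluate to nil, so a typo in the check silently selects the fallback branch.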

@supakjk

supakjk commented Oct 2, 2016

My mistake. I mean torch.CudaLongTensor.
After updating all the related ones (nn,cunn,torch,cutorch,etc), my test code above worked fine.

@dkbemisIII

That's a little surprising. There was a first fix that should have worked, but broke compatibility when there was no CudaLong. The next update seems to have rebroken the initial issue (at least for me), because of the typo.

If you update nn, it works for you?

@soumith
Member

soumith commented Oct 2, 2016

there was my patch which was the attempted fix. then @mys007 sent a fix for back-compat, but it was broken. i then pushed another fix on top of his fix that keeps back-compat and works for master too.

@soumith
Member

soumith commented Oct 2, 2016

if you now update nn, it should (fingers-crossed) work for you

@dkbemisIII

Seems good now. Thanks.

@uahsan3

uahsan3 commented Oct 4, 2016

When I rebuild cunn, I get the following:

[ 14%] Building NVCC (Device) object lib/THCUNN/CMakeFiles/THCUNN.dir/THCUNN_generated_SpatialDilatedConvolution.cu.o
Building NVCC (Device) object lib/THCUNN/CMakeFiles/THCUNN.dir/THCUNN_generated_AbsCriterion.cu.o
/tmp/luarocks_cunn-scm-1-258/cunn/lib/THCUNN/RReLU.cu(68): error: identifier "THCRandom_generatorStates" is undefined

1 error detected in the compilation of "/tmp/tmpxft_00003e0f_00000000-7_RReLU.cpp1.ii".
CMake Error at THCUNN_generated_RReLU.cu.o.cmake:267 (message):
Error generating file
/tmp/luarocks_cunn-scm-1-258/cunn/build/lib/THCUNN/CMakeFiles/THCUNN.dir//./THCUNN_generated_RReLU.cu.o

make[2]: *** [lib/THCUNN/CMakeFiles/THCUNN.dir/THCUNN_generated_RReLU.cu.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [lib/THCUNN/CMakeFiles/THCUNN.dir/all] Error 2
make: *** [all] Error 2

Error: Build error: Failed building.

Any suggestions?

@soumith
Member

soumith commented Oct 4, 2016

@uahsan3 this is caused by an outdated cutorch version. Reinstall cutorch first, then cunn:
luarocks install cutorch
luarocks install cunn

@yuvalpinter

@soumith thanks!

@eriche2016
Contributor

Hi, I use the nn.LookupTable module, but came across an error similar to the one above. Can anyone fix this?

@eriche2016
Contributor

eriche2016 commented Oct 20, 2016

My error message is below:
ome/xxx/torch/install/bin/luajit: /home/xxx/.luarocks/share/lua/5.1/nn/THNN.lua:109: bad argument #2 to 'v' (cannot convert 'struct THCudaTensor *' to 'struct THCudaLongTensor *')
stack traceback:
[C]: in function 'v'
/home/xxx/.luarocks/share/lua/5.1/nn/THNN.lua:109: in function 'LookupTable_accGradParameters'
/home/xxx/.luarocks/share/lua/5.1/nn/LookupTable.lua:85: in function 'accGradParameters'
/home/xxx/.luarocks/share/lua/5.1/nn/Module.lua:32: in function 'backward'
./misc_saver2_reg_atten_ws/LanguageModel.lua:738: in function 'updateGradInput'
/home/xxx/.luarocks/share/lua/5.1/nn/Module.lua:31: in function 'backward'
train_reg_on_att.lua:496: in function 'lossFun'
train_reg_on_att.lua:574: in main chunk
[C]: in function 'dofile'
.../hxw/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670

@eriche2016
Contributor

And I have updated my torch, nn, cutorch, and cunn to the latest versions. Any idea?

@fmassa
Contributor

fmassa commented Oct 20, 2016

@eriche2016 pass a CudaLongTensor instead of a CudaTensor, and you should be fine.

@eriche2016
Contributor

eriche2016 commented Oct 20, 2016

@fmassa I still get the same error in the backward pass. Below is the test code; please check it.

th> model = nn.LookupTable(4, 3)
                                                                      [0.0001s]
th> model:cuda()
nn.LookupTable
                                                                      [0.0031s]
th> model:forward(torch.CudaLongTensor({1}))
 0.3487  1.1548 -0.7722
[torch.CudaTensor of size 1x3]

                                                                      [0.0016s]
th> model:backward(torch.CudaLongTensor{1}, torch.CudaTensor(1, 3))
/home/xxx/.luarocks/share/lua/5.1/nn/THNN.lua:109: bad argument #2 to 'v' (cannot convert 'struct THCudaTensor *' to 'struct THCudaLongTensor *')
stack traceback:
        [C]: in function 'v'
        /home/xxx/.luarocks/share/lua/5.1/nn/THNN.lua:109: in function 'LookupTable_accGradParameters'
        /home/xxx/.luarocks/share/lua/5.1/nn/LookupTable.lua:85: in function 'accGradParameters'
        /home/xxx/.luarocks/share/lua/5.1/nn/Module.lua:32: in function 'backward'
        [string "_RESULT={model:backward(torch.CudaLongTensor{..."]:1: in main chunk
        [C]: in function 'xpcall'
        /home/xxx/torch/install/share/lua/5.1/trepl/init.lua:650: in function 'repl'
        .../xxx/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:199: in main chunk
        [C]: at 0x00406670

@fmassa
Contributor

fmassa commented Oct 20, 2016

@eriche2016 It seems that you don't have the latest nn. Line 85 in LookupTable.lua does not match yours.

@eriche2016
Contributor

eriche2016 commented Oct 20, 2016

@fmassa I updated nn with the command below:
luarocks install nn
I suppose that installs the latest version of nn.

@fmassa
Contributor

fmassa commented Oct 20, 2016

@eriche2016 Then I don't understand your error message. The distro package points latest nn to this line, which doesn't correspond to the error you are facing.
I'd check if the installation was successful, or if you had errors during compilation.

@eriche2016
Contributor

eriche2016 commented Oct 20, 2016

I opened my LookupTable.lua file, and it is the latest. See below.
[image: screenshot of LookupTable.lua]
Any idea how to solve this problem?

@fmassa
Contributor

fmassa commented Oct 20, 2016

@eriche2016 there seems to be something wrong with your setup. The error message that you show corresponds to a line of comment.

@eriche2016
Contributor

Oh, I got it. The error comes from the file at:
/home/xxx/.luarocks/share/lua/5.1/nn/LookupTable.lua:85
not from the /home/xxx/torch/ folder, which contains the latest nn. I will have to fix this.
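
A stale copy like this can be spotted by listing every file on `package.path` that could satisfy a `require`. The sketch below is plain Lua and assumes Unix-style paths; `find_module_copies` is a hypothetical helper name, not part of Torch or LuaRocks. It only inspects `package.path`, so compiled modules found through `package.cpath` are not covered.

```lua
-- Return every file on the search path that could satisfy require(name),
-- in search order. The first entry is the one Lua would load.
local function find_module_copies(name, path)
  path = path or package.path
  -- require() maps dots in the module name to directory separators.
  local fname = name:gsub('%.', '/')
  local found = {}
  for template in path:gmatch('[^;]+') do
    -- Each template contains '?' where the module name is substituted.
    local candidate = template:gsub('%?', fname)
    local f = io.open(candidate, 'r')
    if f then
      f:close()
      found[#found + 1] = candidate
    end
  end
  return found
end

-- More than one line of output means an older copy may shadow a newer one.
for i, file in ipairs(find_module_copies('nn.LookupTable')) do
  print(i, file)
end
```

Run it with the same interpreter that runs the failing script (for example inside `th`), since `package.path` can differ between interpreters and shells.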

@eriche2016
Contributor

@fmassa thank you very much for your patience. Problem solved.
