Same code works in VS, but not from command prompt?

Hey all,

I have been trying to debug this code for hours now. It doesn’t work when I compile it with nvcc from the command line, but it works fine when I copy it into Visual Studio and compile it there.

I would prefer the command line, because VS IntelliSense red-lines every CUDA function.

Maybe VS compiles it with a special flag?

I’ve found that VS 2010 adds some options that aren’t shown in the build logs. I use the command below for compiling from the command prompt; try making some changes to it and see if it works:

"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\bin\nvcc.exe" -gencode=arch=compute_35,code="sm_35,compute_35" --cl-version 2010 -ccbin "c:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin\x86_amd64" -link -I"./" -I"./common/inc" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include" -maxrregcount=0 --ptxas-options="-v" --machine 64 --compile -use_fast_math -DWIN32 -D_MBCS -Xcompiler "/EHsc /W3 /nologo /O2 /Zi /MT"

I’ve tried it; it still doesn’t work.

Here it is, if it helps: https://www.dropbox.com/s/4t8ieiz3i15j8nh/multiply.cu

Basically, if ARRAY_SIZE is bigger than 497, the results (the hOutInf and hOutSup arrays) are always 0.

This is the output I got using that command line. The first few numbers have blown up; not sure what to make of the rest:
[-0.117611, -59213151128224810000000000000000000000.000000] [0.006591, -39460142838842854000000000000000000000.000000]
[-0.286781, -159536870400163220000000000000000000000.000000] [-0.427330, -161305202422662390000000000000000000000.000000]
[-0.295433, -0.295433] [-0.710607, -0.240646]
[0.042503, -240600651830786840000000000000000000000.000000] [-0.689901, -175416415091873390000000000000000000000.000000]
[-0.198211, -0.090044] [-0.179842, -0.179842]
[-0.663990, -0.663990] [-0.344577, -113648882086357490000000000000000000000.000000]
[-0.098258, -0.098258] [0.005980, 0.023304]
[-0.348158, 0.136985] [-0.149098, -0.149098]
[-0.504779, -0.504779] [-0.552724, -0.552724]
[0.115875, 0.133297] [-0.105139, -0.078181]
[-0.001352, 0.118521] [-0.456914, -0.200854]
[-0.271929, -0.271929] [-0.292518, -0.292518]
[-0.175330, -0.006415] [-0.346012, -0.346012]
[-0.697323, -242631367245123660000000000000000000000.000000] [-0.638551, -55034635019511705000000000000000000000.000000]
[-0.239262, -0.239262] [-0.415171, -0.409911]
[-0.335525, -0.234598] [-0.692552, -0.692552]
[-0.223697, 0.117905] [-0.617604, -0.617604]
[0.021134, 0.021134] [-0.541145, -0.251920]
[0.078502, 0.078502] [-0.072214, 0.034341]
[-0.120020, -0.120020] [-0.243415, -0.243415]
[-0.034427, -0.021171] [-0.246313, -0.246313]

etc etc.

Try running the code with cuda-memcheck, and also wrap your kernel launches and CUDA runtime calls with error checkers to see if any of them are failing – like so: http://choorucode.com/2011/03/02/cuda-error-checking/
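A minimal sketch of that kind of error-checking wrapper (the macro name and the trivial kernel here are illustrative, not taken from the linked article or from multiply.cu):

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: checks the return code of a CUDA runtime call and
// aborts with file/line information if it failed.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",              \
                    __FILE__, __LINE__, cudaGetErrorString(err_));    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

__global__ void doubleIt(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main()
{
    const int n = 1024;
    float* d = nullptr;
    CUDA_CHECK(cudaMalloc(&d, n * sizeof(float)));
    CUDA_CHECK(cudaMemset(d, 0, n * sizeof(float)));

    doubleIt<<<(n + 255) / 256, 256>>>(d, n);
    // Kernel launches fail asynchronously, so check the launch itself...
    CUDA_CHECK(cudaGetLastError());
    // ...and then synchronize to surface errors raised while it ran.
    CUDA_CHECK(cudaDeviceSynchronize());

    CUDA_CHECK(cudaFree(d));
    return 0;
}
```

The key detail is the pair of checks after the launch: `cudaGetLastError()` catches bad launch configurations (e.g. too many threads per block), while `cudaDeviceSynchronize()` catches faults that happen during kernel execution, such as out-of-bounds accesses.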

Your earlier mention that it stops working when you increase the array size suggests a problem with how you’re defining your kernel’s grid-size and block-size arguments. The last run you posted suggests you’re indexing into something bogus, which gives you the wrong output.
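The usual pattern looks like the sketch below. The actual kernel in multiply.cu is behind the Dropbox link, so this uses a placeholder; the two things to verify in your code are the ceiling division when computing the grid size and the bounds check inside the kernel:

```cuda
#include <cuda_runtime.h>

// Placeholder kernel: one thread per element, with a bounds check so the
// (usually partial) last block doesn't read or write past the array.
__global__ void multiplyKernel(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i] * in[i];
}

void launch(const float* dIn, float* dOut, int n)
{
    int threads = 256;
    // Round UP, so every element gets a thread even when n is not a
    // multiple of the block size. Plain n / threads would silently skip
    // the tail of the array once n grows past a multiple of 256 -- the
    // kind of size-dependent failure you're describing.
    int blocks = (n + threads - 1) / threads;
    multiplyKernel<<<blocks, threads>>>(dIn, dOut, n);
}
```

If your code instead uses a fixed grid size, or omits the `i < n` check, you’d see exactly this behaviour: correct results up to some array size, then zeros or garbage beyond it.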