cuda.h error message

I am porting a C++ project to the GPU to get some speed-up in a few bottlenecks. I have followed the paradigm set out by NVIDIA of compiling all the .cpp files with the host compiler and then calling the kernel from one of these .cpp files through a wrapper written in a .cu file, which cudaMallocs, cudaMemcpys, and then launches a kernel. The first problem: when run in emulation mode, the kernel and all the software run fine. When I run in device mode, the program gets to my first cudaMalloc call and just hangs. I read that the .cpp file which calls the CUDA wrapper needs the cuda.h header. When I include this I get:

In file included from /home/lamberh/NVIDIA_GPU_Computing_SDK/C/src/exciton09/source/ATOM_SCF1.cpp:26:
/usr/local/cuda/include/cuda.h:547: error: expected ‘,’ or ‘…’ before ‘(’ token
In file included from /home/lamberh/NVIDIA_GPU_Computing_SDK/C/src/exciton09/source/ATOM_SCF1.cpp:26:
/usr/local/cuda/include/cuda.h:697: error: expected ‘,’ or ‘…’ before ‘(’ token

which, in cuda.h, corresponds to:
CUresult CUDAAPI cuDeviceGetAttribute(int *pi, CUdevice_attribute attrib, CUdevice dev);

I’m running CUDA 2.3 on Fedora 11 with gcc 4.4 and the FindCUDA.cmake script.
I’ve attached the .cu file which defines the CUDA wrapper and is called by my .cpp file.

Thanks.

To save a bit of time trying to grok that: the kernel is meant to be very basic. A number is defined on the host, passed to the GPU, assigned to a different variable, and passed back to the host.

There are at least two glaring syntax errors in that file which will prevent it from compiling. When I fix them, it builds fine, which makes me think that the error is somewhere in one of your own include files. Certainly there are no errors in cuda.h, if that is what you are implying.

avid@cuda:~$ /opt/cuda/bin/nvcc -arch=sm_13 -c -I/opt/cuda/include -I$HOME/NVIDIA_GPU_Computing_SDK/C/common/inc -o

avid@cuda:~$ cat

#include <stdio.h>
#include <cutil.h>
#include "cuda_runtime_api.h"
#include "cuda.h"

__global__ void integrals_2e_kernel(double* d_number, double* new_number){
	*new_number = *d_number;
	//printf("new number %f ", *new_number);
}

extern "C" void kernel_call(){
	int deviceCount;
	int dev;
	cudaGetDeviceCount(&deviceCount);
	printf("There are %d devices supporting CUDA", deviceCount);
	for(dev = 0; dev < deviceCount; dev++){
		cudaDeviceProp deviceProp;
		cudaGetDeviceProperties(&deviceProp, dev);
		printf("\nDevice %d: \"%s\"\n", dev, deviceProp.name);
	}

	double number = 4.0;
	size_t size = sizeof(double);
	double* h_number = &number;
	double* d_number;
	double* new_number;
	dim3 dimBlock(1);
	dim3 dimGrid(1);

	CUDA_SAFE_CALL(cudaMalloc((void **)&d_number, size));
	CUDA_SAFE_CALL(cudaMalloc((void **)&new_number, size));
	CUDA_SAFE_CALL(cudaMemcpy(d_number, h_number, size, cudaMemcpyHostToDevice));

	printf("Executing GPU kernel...\n");
	integrals_2e_kernel<<<dimGrid, dimBlock>>>(d_number, new_number);
	CUDA_SAFE_CALL( cudaThreadSynchronize() );
	CUDA_SAFE_CALL(cudaMemcpy(h_number, new_number, size, cudaMemcpyDeviceToHost));

	printf("new number %f\n", *h_number);

	CUDA_SAFE_CALL(cudaFree(d_number));
	CUDA_SAFE_CALL(cudaFree(new_number));
}


Sorry, I didn’t mean to imply that there was an error in the cuda.h file.

I have attached a version of the .cu file which I compiled and successfully ran as a standalone file in device mode using the CUDA SDK makefile. It is when I try to integrate this .cu file with the rest of my C++ source that I run into trouble. If I run in device emulation mode, calling from a separate .cpp file, everything goes fine. But when I run in device mode, the program executes until it reaches the first cudaMalloc call and then hangs indefinitely with the CPU spinning at 100%. Also, when I try to include cuda.h in the .cpp file, in either mode I get the aforementioned error. I’m trying to understand why it runs in emulation mode but hangs at the first cudaMalloc call in device mode.

Just as a little more information, I wondered if the problem could stem from the linking stage of my build:

/usr/lib64/ccache/g++ CMakeFiles/exciton09.dir/source/SCF1.cpp.o CMakeFiles/exciton09.dir/source/INTEGRALS1.cpp.o CMakeFiles/exciton09.dir/source/KPOINTS1.cpp.o CMakeFiles/exciton09.dir/source/INPUT_JOB_CTRL.cpp.o CMakeFiles/exciton09.dir/source/MATRIX_UTIL.cpp.o CMakeFiles/exciton09.dir/source/PAIRS_QUADS.cpp.o CMakeFiles/exciton09.dir/source/TOOLS.cpp.o CMakeFiles/exciton09.dir/source/HEADER.cpp.o CMakeFiles/exciton09.dir/source/MAIN1.cpp.o CMakeFiles/exciton09.dir/source/SYMMETRY.cpp.o CMakeFiles/exciton09.dir/source/MEMORY.cpp.o CMakeFiles/exciton09.dir/source/ATOM_SCF1.cpp.o ./ -o bin/exciton09 -rdynamic /usr/local/cuda/lib64/ -lcuda lib/ /usr/lib64/ /usr/lib64/ -lpthread /usr/local/cuda/lib64/ -lcuda -Wl,-rpath,/usr/local/cuda/lib64:/home/lamberh/NVIDIA_GPU_Computing_SDK/C/src/exciton09/build/lib:

Does the fact that g++ isn’t getting passed -fPIC seem very wrong and possibly responsible for the cudaMalloc hanging?

I really appreciate the help.

You are describing two problems - under some circumstances your code doesn’t compile, and under others the resulting program hangs. With CUDA 2.3 on Ubuntu 9.04 I can reproduce neither. As best I can tell, my “fixed” version of the code you originally posted compiles and executes perfectly when linked with a C++ main in another file:

avid@cuda:~$ /opt/cuda/bin/nvcc -arch=sm_13 -c -I/opt/cuda/include -I$HOME/NVIDIA_GPU_Computing_SDK/C/common/inc -o 

avid@cuda:~$ g++ -L/opt/cuda/lib64 -L$HOME/NVIDIA_GPU_Computing_SDK/C/lib -o atoms2.exe -lcutil -lcudart

avid@cuda:~$ LD_LIBRARY_PATH=/opt/cuda/lib64 ./atoms2.exe

There are 2 devices supporting CUDA

Device 0: "GeForce GTX 275"

Device 1: "GeForce GTX 275"

Executing GPU kernel...

new number 4.000000

avid@cuda:~$ cat 

extern "C" {

	void kernel_call();


int main()



	return 0;


And your second posted code also builds and runs perfectly as you noted:

avid@cuda:~$ /opt/cuda/bin/nvcc -arch=sm_13 -I/opt/cuda/include -I$HOME/NVIDIA_GPU_Computing_SDK/C/common/inc -L/opt/cuda/lib64 -L$HOME/NVIDIA_GPU_Computing_SDK/C/lib -o atoms.exe -lcutil -lcudart

avid@cuda:~$ LD_LIBRARY_PATH=/opt/cuda/lib64 ./atoms.exe

There are 2 devices supporting CUDA

Device 0: "GeForce GTX 275"

Device 1: "GeForce GTX 275"

Executing GPU kernel...

new number 5.000000

avid@cuda:~$ uname -a

Linux cuda 2.6.28-15-generic #52-Ubuntu SMP Wed Sep 9 10:48:52 UTC 2009 x86_64 GNU/Linux

Right. My original posts were a little ambiguous. I should identify my major problem as this:

My .cu code runs fine as a standalone, and when integrated with my .cpp files it runs fine in device emulation mode, as we’ve noted. It is when I try to run in device mode that it hangs at the first cudaMalloc call. I am unsure of where to begin debugging this hanging call to cudaMalloc. My suspicion is that there is either a problem with the code itself, with the way I’m linking my files at compile time, or that this is related to the unsupported gcc 4.4.2/Fedora 11 combination, in which case I’ll need either a workaround or to step back to F10.

Are those fair conclusions, or is it asking a bit much to expect people without access to my setup to speculate?


Try building an executable using the two source files and commands I posted and see whether it works as a first step. If it does, then it is either your code, or your build procedure, and you can eliminate your tool chain as a variable.

Great. I followed your advice to eliminate the tool chain as a variable by successfully compiling and linking that small kernel and main program. Then I went through and double-checked my include files so that the CUDA wrapper function was declared extern "C". Finally, I brought the kernel_call out of the .cpp subroutine I had it in and put it front and center in the main file. This allowed me to compile and run successfully. Then I moved this CUDA wrapper further down in the main file and got the same hanging cudaMalloc problem at device run time. After some tinkering I noticed that if I called my kernel before this statement, which opens a file defined on the command line for data output: file.out = fopen(strcat(argv[2], yy), "w"); everything ran fine. But if I called the kernel after that fopen, that's when cudaMalloc hung.

I stripped out the
file.out = fopen(argv[2], "w");

and now I can call that wrapper/kernel invocation from anywhere in my code. I was wondering what might cause that behaviour?

Anyways, thanks very much for the help I was stuck there.

No surprises there. strcat will blindly append onto the end of the argv string. Who knows how much space has been reserved for argv and what lies after it in the process memory map (even if there is theoretically space, if there are no trailing \0 characters it can easily run away). That code snippet is the very definition of a buffer overflow. No doubt something critical to the CUDA context is getting hosed by the strcat.

Thanks, you guys are on the ball.