undefined reference to `cudaSetupArgument', `cudaLaunch'

phillippang1994 · November 7, 2019, 6:19am

Hello,

I am compiling some code that runs saxpy on the GPU. The code is from a university assignment that I am trying out. Unfortunately, compilation fails and I don’t know how to debug it.

Here’s the link to the code and Makefile: https://github.com/stanford-cs149/asst3/tree/master/saxpy

I have edited the saxpy.cu file:

void saxpyCuda(int N, float alpha, float* xarray, float* yarray, float* resultarray) {

    // must read both input arrays (xarray and yarray) and write to
    // output array (resultarray)
    int totalBytes = sizeof(float) * 3 * N;

    // compute number of blocks and threads per block.  In this
    // application we've hardcoded thread blocks to contain 512 CUDA
    // threads.
    const int threadsPerBlock = 512;

    // Notice the round up here.  The code needs to compute the number
    // of thread blocks needed such that there is one thread per
    // element of the arrays.  This code is written to work for values
    // of N that are not multiples of threadsPerBlock.
    const int blocks = (N + threadsPerBlock - 1) / threadsPerBlock;

    // These are pointers that will be pointers to memory allocated
    // *on the GPU*.  You should allocate these pointers via
    // cudaMalloc.  You can access the resulting buffers from CUDA
    // device kernel code (see the kernel function saxpy_kernel()
    // above) but you cannot access the contents of these buffers from
    // this thread. CPU threads cannot issue loads and stores from GPU
    // memory!
    float* device_x;
    float* device_y;
    float* device_result;

    //
    // CS149 TODO: allocate device memory buffers on the GPU using cudaMalloc.
    //
    // We highly recommend taking a look at NVIDIA's
    // tutorial, which clearly walks you through the few lines of code
    // you need to write for this part of the assignment:
    //
    // https://devblogs.nvidia.com/easy-introduction-cuda-c-and-c/
    //
    cudaMalloc(&device_x, N*sizeof(float));
    cudaMalloc(&device_y, N*sizeof(float));
    cudaMalloc(&device_result, N*sizeof(float));
        
    // start timing after allocation of device memory
    double startTime = CycleTimer::currentSeconds();

    //
    // CS149 TODO: copy input arrays to the GPU using cudaMemcpy
    //
    cudaMemcpy(device_x, xarray, N*sizeof(float), cudaMemcpyHostToDevice); 
    cudaMemcpy(device_y, yarray, N*sizeof(float), cudaMemcpyHostToDevice);

    // run CUDA kernel. (notice the <<< >>> brackets indicating a CUDA
    // kernel launch) Execution on the GPU occurs here.
    double startRunTime = CycleTimer::currentSeconds();
    saxpy_kernel<<<blocks, threadsPerBlock>>>(N, alpha, device_x, device_y, device_result);
    cudaDeviceSynchronize();
    double endRunTime = CycleTimer::currentSeconds();
    printf("RunTime: %.3f ms\n", 1000.0f*(endRunTime - startRunTime));
    //
    // CS149 TODO: copy result from GPU back to CPU using cudaMemcpy
    //
    cudaMemcpy(resultarray, device_result, N*sizeof(float), cudaMemcpyDeviceToHost); 
    
    // end timing after result has been copied back into host memory
    double endTime = CycleTimer::currentSeconds();

    cudaError_t errCode = cudaPeekAtLastError();
    if (errCode != cudaSuccess) {
        fprintf(stderr, "WARNING: A CUDA error occurred: code=%d, %s\n",
                errCode, cudaGetErrorString(errCode));
    }

    double overallDuration = endTime - startTime;
    printf("Effective BW by CUDA saxpy: %.3f ms\t\t[%.3f GB/s]\n", 1000.f * overallDuration, GBPerSec(totalBytes, overallDuration));

    //
    // CS149 TODO: free memory buffers on the GPU using cudaFree
    //
    cudaFree(device_x);
    cudaFree(device_y);
    cudaFree(device_result);
}

The compilation error output:

mkdir -p objs/
g++ -m64 -O3 -Wall -o cudaSaxpy objs/main.o  objs/saxpy.o -L/usr/local/cuda/lib64/ -lcudart
objs/saxpy.o: In function `saxpy_kernel(int, float, float*, float*, float*)':
tmpxft_0000320c_00000000-4_saxpy.cudafe1.cpp:(.text+0x4a): undefined reference to `cudaSetupArgument'
tmpxft_0000320c_00000000-4_saxpy.cudafe1.cpp:(.text+0x80): undefined reference to `cudaSetupArgument'
tmpxft_0000320c_00000000-4_saxpy.cudafe1.cpp:(.text+0x98): undefined reference to `cudaSetupArgument'
tmpxft_0000320c_00000000-4_saxpy.cudafe1.cpp:(.text+0xb0): undefined reference to `cudaSetupArgument'
tmpxft_0000320c_00000000-4_saxpy.cudafe1.cpp:(.text+0xc8): undefined reference to `cudaSetupArgument'
tmpxft_0000320c_00000000-4_saxpy.cudafe1.cpp:(.text+0xd6): undefined reference to `cudaLaunch'
objs/saxpy.o: In function `saxpyCuda(int, float, float*, float*, float*)':
tmpxft_0000320c_00000000-4_saxpy.cudafe1.cpp:(.text+0x276): undefined reference to `cudaConfigureCall'
tmpxft_0000320c_00000000-4_saxpy.cudafe1.cpp:(.text+0x2ba): undefined reference to `cudaSetupArgument'
tmpxft_0000320c_00000000-4_saxpy.cudafe1.cpp:(.text+0x4c0): undefined reference to `cudaSetupArgument'
tmpxft_0000320c_00000000-4_saxpy.cudafe1.cpp:(.text+0x4dc): undefined reference to `cudaSetupArgument'
tmpxft_0000320c_00000000-4_saxpy.cudafe1.cpp:(.text+0x4f8): undefined reference to `cudaSetupArgument'
tmpxft_0000320c_00000000-4_saxpy.cudafe1.cpp:(.text+0x514): undefined reference to `cudaSetupArgument'
tmpxft_0000320c_00000000-4_saxpy.cudafe1.cpp:(.text+0x526): undefined reference to `cudaLaunch'
objs/saxpy.o: In function `__device_stub__Z12saxpy_kernelifPfS_S_(int, float, float*, float*, float*)':
tmpxft_0000320c_00000000-4_saxpy.cudafe1.cpp:(.text+0x6b9): undefined reference to `cudaSetupArgument'
tmpxft_0000320c_00000000-4_saxpy.cudafe1.cpp:(.text+0x6e0): undefined reference to `cudaSetupArgument'
tmpxft_0000320c_00000000-4_saxpy.cudafe1.cpp:(.text+0x6f8): undefined reference to `cudaSetupArgument'
tmpxft_0000320c_00000000-4_saxpy.cudafe1.cpp:(.text+0x710): undefined reference to `cudaSetupArgument'
tmpxft_0000320c_00000000-4_saxpy.cudafe1.cpp:(.text+0x726): undefined reference to `cudaSetupArgument'
tmpxft_0000320c_00000000-4_saxpy.cudafe1.cpp:(.text+0x734): undefined reference to `cudaLaunch'
collect2: error: ld returned 1 exit status
Makefile:42: recipe for target 'cudaSaxpy' failed
make: *** [cudaSaxpy] Error 1

I have also tried the simple saxpy example from this blog: https://devblogs.nvidia.com/easy-introduction-cuda-c-and-c/ and it works fine.

My machine specifications:
OS: Ubuntu 16.04
CUDA: 10.1
Driver: 418.56
GPU: GTX 1080 TI
NVCC: 7.5.17
gcc: 5.4

tera · November 7, 2019, 10:58am

You are not actually using CUDA 10.1 as you think.
If this is a university machine, you may need to load a module file (see your local documentation or try “module avail”). If this is your own machine, install CUDA 10.1.
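
One way to check which toolkit the build actually picks up is to list every nvcc on the search path together with the release it reports. The following is a minimal shell sketch, not something posted in the thread; the function name list_nvcc is hypothetical, and it assumes a GNU userland (readlink -f), as on Ubuntu.

```shell
#!/bin/sh
# list_nvcc is a hypothetical helper name, not part of the CUDA toolkit.
# It walks a colon-separated search path (defaulting to $PATH) and prints
# every nvcc it finds, where it really lives, and the release it reports.
list_nvcc() {
    search_path=${1:-$PATH}
    found=0
    old_ifs=$IFS
    IFS=:
    for dir in $search_path; do
        if [ -f "$dir/nvcc" ] && [ -x "$dir/nvcc" ]; then
            found=1
            # Resolve symlinks so a link in /usr/bin shows its real target.
            echo "$dir/nvcc -> $(readlink -f "$dir/nvcc")"
            # The "release" line carries the toolkit version.
            "$dir/nvcc" --version | grep release
        fi
    done
    IFS=$old_ifs
    [ "$found" -eq 1 ] || echo "no nvcc found in: $search_path"
}

list_nvcc
```

The first entry printed is the nvcc that a plain `nvcc` in the Makefile would invoke. If its release differs from the toolkit whose lib64 directory appears on the link line, the environment is mixed.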

HannesF99 · November 7, 2019, 3:31pm

You might additionally have to link against the CUDA driver API (not only the runtime API), as these functions (cudaLaunch etc.) are implemented there.

phillippang1994 · November 11, 2019, 6:15am

Why not? That was the output of nvidia-smi.

phillippang1994 · November 11, 2019, 6:16am

How do I do that?

Robert_Crovella · November 11, 2019, 6:24am

nvidia-smi doesn’t tell you which version of the CUDA compiler (nvcc) you are using. It reports the CUDA version that the installed driver is compatible with.

https://stackoverflow.com/questions/53422407/different-cuda-versions-shown-by-nvcc-and-nvidia-smi/53504578#53504578

Function names that begin with cuda are part of the CUDA runtime API, not the CUDA driver API. Functions that begin with cu (but not cuda) are part of the driver API. There is no need to explicitly link against the driver API/library to use cuda****** functions.

cudaLaunch was part of the runtime API, was deprecated:
https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__EXECUTION__DEPRECATED.html

and is now removed from the latest CUDA versions.

As indicated by tera, the issue here is a misconfigured/mixed compilation environment.
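
To see where the mismatch sits, one could compare the symbols the object file still asks for with what the runtime library on the link line exports. This is a minimal shell sketch using nm from GNU binutils, not something posted in the thread. The helper name check_launch_symbols is hypothetical; the object path objs/saxpy.o comes from the linker output above, and the libcudart.so location is an assumption based on the -L/usr/local/cuda/lib64/ flag in the link command.

```shell
#!/bin/sh
# check_launch_symbols is a hypothetical helper name.
# $1: object file produced by nvcc; $2: CUDA runtime shared library.
check_launch_symbols() {
    obj=$1
    lib=$2
    # The three legacy launch entry points named in the linker errors.
    # "(@|$)" keeps cudaLaunchKernel from matching and tolerates the
    # "@@version" suffix that newer nm versions print for library symbols.
    legacy=' cuda(Launch|SetupArgument|ConfigureCall)(@|$)'
    if [ ! -f "$obj" ] || [ ! -f "$lib" ]; then
        echo "missing input: need both $obj and $lib"
        return 1
    fi
    # nm -u lists symbols the object references but does not define.
    refs=$(nm -u "$obj" | grep -E "$legacy" | sort -u)
    # nm -D --defined-only lists symbols the shared library exports.
    defs=$(nm -D --defined-only "$lib" | grep -E "$legacy")
    echo "legacy launch symbols referenced by $obj:"
    echo "${refs:-(none)}"
    echo "legacy launch symbols exported by $lib:"
    echo "${defs:-(none)}"
}
```

It could be invoked as `check_launch_symbols objs/saxpy.o /usr/local/cuda/lib64/libcudart.so`. If the first list is non-empty and the second prints "(none)", the object was compiled by an nvcc older than the runtime library it is being linked against, which is exactly the situation the linker errors describe.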

phillippang1994 · November 11, 2019, 6:55am

Why not? That was the output of nvidia-smi.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56       Driver Version: 418.56       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|

OK, I see the problem you meant: nvcc should be 10.1 as well. It turns out I had an older nvcc hiding in /usr/bin. I removed the old executable and symlinked (ln -s) the correct one in its place. nvcc -V now reports: Cuda compilation tools, release 10.1, V10.1.243

However, I am still unable to compile and still see the same errors.

phillippang1994 · November 11, 2019, 8:06am

So you are saying if I am compiling with CUDA 10, there should not be any calls to cudaLaunch()?

tera · November 11, 2019, 10:34am

So now you may be compiling with the 10.1 version of nvcc, but it is apparent that you are still not using the 10.1 versions of the CUDA libraries.
I’d recommend fully removing your old partial/broken installations and doing a clean install of CUDA 10.1.

phillippang1994 · November 12, 2019, 7:33am

OK, it’s good now. I reinstalled CUDA, deleted the object files, and recompiled successfully. Thanks.
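
Deleting the object files likely mattered because make rebuilds a target only when it is older than its sources, so an objs/saxpy.o compiled by the old nvcc would be relinked unchanged, with its references to cudaLaunch intact. The following is a minimal shell sketch of that cleanup step, not something posted in the thread; the helper name remove_build_outputs is hypothetical, and the names objs/ and cudaSaxpy are taken from the build log above.

```shell
#!/bin/sh
# remove_build_outputs is a hypothetical helper name. It deletes only the
# build outputs named in this thread's log and leaves the sources alone.
remove_build_outputs() {
    build_dir=${1:-.}
    rm -rf "$build_dir/objs" "$build_dir/cudaSaxpy"
    echo "removed build outputs in $build_dir; run make to rebuild"
}
```

Run from the saxpy directory, `remove_build_outputs . && make` would force every object to be recompiled with whichever nvcc is now first on PATH.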
