Can't compile simple test CUDA kernel

Good afternoon all,

I have been looking at converting some C++ code that I've had good success with into a CUDA kernel, to see if I can squeeze even more speed out of it. I decided to try my hand at a simple kernel first: a slight tweak on the example from https://developer.nvidia.com/blog/even-easier-introduction-cuda/.

However, when I try to compile this, nvcc just hangs forever.
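
For reference, I'm invoking nvcc along these lines (the libtorch paths and file name below are placeholders, so this is just the shape of the command rather than the exact one):

nvcc -std=c++17 \
    -I/path/to/libtorch/include \
    -I/path/to/libtorch/include/torch/csrc/api/include \
    -L/path/to/libtorch/lib -ltorch -ltorch_cpu -lc10 \
    add.cu -o add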

I don't think the issue is with the code itself, but just in case, here it is:

#include <torch/script.h>

using namespace torch;

__global__
void add_kernel(int64_t N, const float* x, const float* y, float* z) {

    // Grid-stride loop: start at this thread's global index and step by the
    // total number of threads in the grid, so every element is covered
    // regardless of how many blocks are launched.
    int64_t i = blockIdx.x * blockDim.x + threadIdx.x;
    int64_t stride = blockDim.x * gridDim.x;

    for (; i < N; i += stride) {
        z[i] = x[i] + y[i];
    }
}

Tensor add_gpu(Tensor a, Tensor b) {

    int64_t N = a.size(0);
    size_t bytes = N * sizeof(float);

    Tensor result = torch::empty(N);

    float* d_a;
    float* d_b;    
    float* d_z;
    float* h_z = result.data_ptr<float>();
    cudaMallocManaged(&d_a, bytes);
    cudaMallocManaged(&d_b, bytes);
    cudaMallocManaged(&d_z, bytes);

    cudaMemcpy(d_a, a.data_ptr<float>(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b.data_ptr<float>(), bytes, cudaMemcpyHostToDevice);

    int64_t block_size = 256;
    // Round up so all N elements are covered even when N is not a multiple
    // of block_size (plain integer division would truncate before ceil ran).
    int64_t grid_size = (N + block_size - 1) / block_size;

    add_kernel<<<grid_size, block_size>>>(N, d_a, d_b, d_z);
    cudaMemcpy(h_z, d_z, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_z);

    return result;
}

int main(void) {
    int N = 100000;
    // arange with an integer end yields an int64 tensor by default, so ask
    // for float32 explicitly to match the data_ptr<float>() calls in add_gpu.
    torch::Tensor x = torch::arange(N, torch::kFloat32);
    torch::Tensor y = x + 1;
    Tensor z = add_gpu(x, y);

    return 0;
}
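
Has anyone run into nvcc hanging like this, or is there something in my setup that would explain it? Any pointers would be appreciated.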