Hi,
I have just started learning CUDA, and I use CuPy and PyCUDA to call my kernels from Python.
I am working on a simple example: multiplying a vector by a scalar.
The .cu code:
    // Global thread index along x
    #define _I ( threadIdx.x + blockIdx.x * blockDim.x )

    extern "C" __global__
    void scalar_multiply_kernel(float *vec, float scalar)
    {
        int i = _I;
        vec[i] = scalar * vec[i];
    }
In PyCUDA this works fine with:
    scalar_multiply_gpu(testvec_gpu, np.float32(2), block=(1024, 1, 1), grid=(int(N/1024)+1, 1, 1))
where N is the length of the vector.
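A side note on the launch configuration: this grid launches more threads than there are elements whenever N is not a multiple of 1024, and the kernel above has no bounds check, so the extra threads index past the end of vec. The sketch below only does the arithmetic on the host. The helper names grid_size and excess_threads are hypothetical and are not part of PyCUDA or CuPy.

```python
def grid_size(n, threads_per_block=1024):
    """Number of blocks needed to cover n elements (ceiling division)."""
    return (n + threads_per_block - 1) // threads_per_block


def excess_threads(n, threads_per_block=1024):
    """Number of launched threads whose index i falls outside [0, n)."""
    return grid_size(n, threads_per_block) * threads_per_block - n


for n in (1000, 1024, 2500):
    original = int(n / 1024) + 1  # formula used in the post
    print(n, original, grid_size(n), excess_threads(n))
```

When N is an exact multiple of 1024, int(N/1024)+1 launches one whole block more than needed, while the ceiling division does not. In either case a guard such as `if (i < n)` in the kernel, with n passed as an extra argument, would keep the surplus threads from touching memory.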
In CuPy the same approach does not seem to work, and I would like to understand why.
The scalar does not seem to reach the kernel when I pass it by value. As a workaround, I have to wrap the scalar in a length-1 array and take a pointer in the CUDA code.
In Python:
    grid = (int(N/1024)+1, 1, 1)
    block = (1024, 1, 1)
    # The scalar is wrapped in a one-element float32 array on the GPU
    args = (testvec_gpu, cp.asarray([2.0], dtype=cp.float32))
    scalar_multiply_gpu(grid, block, args=args)
In the .cu file:
    // Global thread index along x
    #define _I ( threadIdx.x + blockIdx.x * blockDim.x )

    extern "C" __global__
    void scalar_multiply_kernel(float *vec, float *scalar)
    {
        int i = _I;
        // The scalar is read through a pointer to a one-element array
        vec[i] = scalar[0] * vec[i];
    }
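One possible explanation, offered as a sketch and not as a confirmed diagnosis: CuPy's raw-kernel interface passes Python and NumPy scalars by value without checking them against the kernel signature, and a plain Python float is passed as a C double, which would not match a `float scalar` parameter. Wrapping the value in np.float32, as in the PyCUDA call, should let the by-value kernel work. In the code below, build_launch and scalar_multiply are hypothetical helper names, the extra n parameter and the bounds check are additions that are not in the original kernel, and the launch itself assumes cupy and a CUDA-capable GPU are available.

```python
import numpy as np

KERNEL_SOURCE = r'''
extern "C" __global__
void scalar_multiply_kernel(float *vec, float scalar, int n)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n) {              // assumption: bounds check added for safety
        vec[i] = scalar * vec[i];
    }
}
'''


def build_launch(n, scalar, threads_per_block=1024):
    """Return (grid, block, scalar_args) for the kernel above."""
    blocks = (n + threads_per_block - 1) // threads_per_block
    grid = (blocks, 1, 1)
    block = (threads_per_block, 1, 1)
    # np.float32 matches the C 'float' parameter, np.int32 matches 'int'.
    scalar_args = (np.float32(scalar), np.int32(n))
    return grid, block, scalar_args


def scalar_multiply(vec_gpu, scalar):
    """Scale a float32 CuPy array in place on the GPU."""
    import cupy as cp  # imported here so the helpers above work without a GPU

    kernel = cp.RawKernel(KERNEL_SOURCE, "scalar_multiply_kernel")
    grid, block, scalar_args = build_launch(vec_gpu.size, scalar)
    kernel(grid, block, (vec_gpu,) + scalar_args)
    return vec_gpu
```

With a float32 CuPy array, a call such as `scalar_multiply(testvec_gpu, 2.0)` would then scale it in place, with no need for the one-element array.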
I'm sorry if this question seems trivial, but a good explanation would help me a lot.
Thanks