Hi,

I have just started learning CUDA, and I use CuPy and PyCUDA as the Python interface.

I am working on a simple example, scalar multiplication of a vector.

.cu code:

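```
// Global thread index along x.
#define _I ( threadIdx.x + blockIdx.x * blockDim.x )

extern "C" __global__
void scalar_multiply_kernel(float *vec, float scalar)
{
    int i = _I;
    // Note: there is no bounds check, so threads with i >= N access memory past the end of vec.
    vec[i] = scalar * vec[i];
}
```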

In PyCUDA this works fine with:

`scalar_multiply_gpu(testvec_gpu, np.float32(2), block=(1024, 1, 1), grid=(int(N/1024)+1, 1, 1))`

Here N is the length of the vector.
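As a side note on the grid size, here is a minimal sketch of the arithmetic in plain Python (no GPU needed). The helper names `grid_size_original` and `grid_size_ceil` are hypothetical and only there for illustration. As far as I can tell, `int(N/1024)+1` always covers the vector but launches one extra block when N is an exact multiple of 1024, while ceiling division launches only as many blocks as needed. In both cases the last block can contain threads whose index is past the end of the vector.

```python
BLOCK = 1024  # threads per block, as in the launch above


def grid_size_original(n, block=BLOCK):
    # The expression used in the launch above.
    return int(n / block) + 1


def grid_size_ceil(n, block=BLOCK):
    # Ceiling division: smallest number of blocks with blocks * block >= n.
    return (n + block - 1) // block


for n in (1000, 1024, 1025, 2048):
    blocks = grid_size_original(n)
    # Number of launched threads whose index is past the end of the vector.
    extra = blocks * BLOCK - n
    print(n, blocks, grid_size_ceil(n), extra)
```

For N = 1024, for example, the original expression gives 2 blocks, so a full block of 1024 threads runs past the end of the vector.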

In CuPy this approach does not seem to work, and I would like to understand why.

Somehow the scalar does not get passed to the kernel in CuPy. As a workaround, I have to convert the scalar to a vector of length 1 and take a pointer argument in the CUDA code:

In Python:

```
grid = (int(N/1024)+1, 1, 1)
block = (1024, 1, 1)
# The scalar is wrapped in a one-element float32 array on the GPU.
args = (testvec_gpu, cp.asarray([2.0], dtype=cp.float32))
scalar_multiply_gpu(grid, block, args=args)
```

In the .cu file:

```
// Global thread index along x.
#define _I ( threadIdx.x + blockIdx.x * blockDim.x )

extern "C" __global__
void scalar_multiply_kernel(float *vec, float *scalar)
{
    int i = _I;
    // The scalar is now read from a one-element device array.
    vec[i] = scalar[0] * vec[i];
}
```
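One possible explanation, which I have not confirmed, is a type mismatch. As far as I understand the CuPy documentation, a plain Python `float` argument is passed to a raw kernel as a C `double`, while my kernel declares `float scalar`. The sketch below uses only the standard `struct` module (no GPU) and assumes a little-endian layout. It shows what would be read if the first four bytes of a 64-bit `2.0` were interpreted as a 32-bit float. Whether this is what actually happens in my case is an assumption.

```python
import struct

# 2.0 stored as a 64-bit double, little-endian (8 bytes).
as_double = struct.pack("<d", 2.0)

# What a `float` parameter would see if it read only the first 4 bytes.
misread = struct.unpack("<f", as_double[:4])[0]

# 2.0 stored as a 32-bit float, which is what the kernel signature expects.
as_float = struct.pack("<f", 2.0)
correct = struct.unpack("<f", as_float)[0]

print(misread, correct)
```

If this is the cause, passing an explicit 32-bit value such as `np.float32(2)` instead of a Python float might be enough, but I have not verified that.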

I’m sorry if this question seems trivial, but a good explanation would help me a lot.

Thanks