Typecast CUdeviceptr to uchar** access throws illegal memory access error 700

Hi,

We allocated memory through a CUdeviceptr named v1 and cast it to uchar**:

CUdeviceptr v1;
checkCudaResult(cuMemAlloc((CUdeviceptr*)&v1, 16));
checkCudaErrors(cudaMemset((void*)v1,1, 16));
uchar** v1char = (uchar**)v1;
foo_double<<<4,4>>>(v1char);
checkCudaResult(cuMemFree(v1));

The code above fails with an illegal memory access error when v1char is accessed inside the kernel, as follows:

__global__ void foo_double(uchar** val) {
  unsigned int tid = threadIdx.x;
  unsigned int bid = blockIdx.x * blockDim.x + tid;
  uchar* val_ = val[bid];
  printf("%d:%d\n", bid,*val_);
}

How do I cast a CUdeviceptr to uchar** so that this access works inside the kernel?

You allocate 16 bytes, which is the size of two pointers, but try to access memory of 16 pointers. That won’t work.

So how do I allocate 16 pointers and access the 16 bytes in this example? Please help me with this.

Allocate space for 16 pointers:

checkCudaResult(cuMemAlloc((CUdeviceptr*)&v1, sizeof(uchar*) * 16));
checkCudaErrors(cudaMemset((void*)v1,1, sizeof(uchar*) * 16));

The following will probably fail as well:

uchar* val_ = val[bid];
printf("%d:%d\n", bid,*val_);

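Even if you allocate 16 pointers and uchar* val_ = val[bid]; succeeds, the pointer value in val_ is probably invalid, so it must not be dereferenced in the printf. The cudaMemset only fills the array with the byte value 1; it does not make the entries point to allocated memory.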
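One way to make the kernel's access valid is to give every entry of the pointer array a real device address. The sketch below is only an illustration: it uses the runtime API (cudaMalloc) instead of cuMemAlloc for brevity, and the names d_data, d_ptrs, h_ptrs and run_pointer_table_demo are hypothetical, not from the original code. With cuMemAlloc the layout would be the same, with the CUdeviceptr values cast to the pointer types.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

typedef unsigned char uchar;

// Each thread follows its own pointer and prints the byte it points to.
__global__ void foo_double(uchar** val) {
  unsigned int tid = threadIdx.x;
  unsigned int bid = blockIdx.x * blockDim.x + tid;
  uchar* val_ = val[bid];          // valid here: points into d_data
  printf("%u:%d\n", bid, *val_);
}

// Hypothetical helper: returns 0 on success, 1 on any CUDA error.
// Early returns skip cleanup to keep the sketch short.
int run_pointer_table_demo() {
  const int n = 16;
  uchar* d_data = nullptr;   // 16 bytes of payload
  uchar** d_ptrs = nullptr;  // 16 pointers, one per thread

  if (cudaMalloc((void**)&d_data, n * sizeof(uchar)) != cudaSuccess) return 1;
  if (cudaMalloc((void**)&d_ptrs, n * sizeof(uchar*)) != cudaSuccess) return 1;
  if (cudaMemset(d_data, 1, n * sizeof(uchar)) != cudaSuccess) return 1;

  // Build the pointer table on the host: entry i holds the device address of byte i.
  uchar* h_ptrs[n];
  for (int i = 0; i < n; ++i) h_ptrs[i] = d_data + i;
  if (cudaMemcpy(d_ptrs, h_ptrs, sizeof(h_ptrs), cudaMemcpyHostToDevice) != cudaSuccess) return 1;

  foo_double<<<4, 4>>>(d_ptrs);
  cudaError_t err = cudaDeviceSynchronize();  // also flushes device printf output

  cudaFree(d_ptrs);
  cudaFree(d_data);
  return err == cudaSuccess ? 0 : 1;
}

int main() { return run_pointer_table_demo(); }
```

Each of the 16 threads should print its index followed by 1, in no particular order, because the payload bytes (not the pointers) were set to 1.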

If I don’t dereference it:

printf("%d:%d\n", bid,val_);

it outputs the following:

8:0
9:0
10:0
11:0
0:0
1:0
2:0
3:0
4:0
5:0
6:0
7:0
12:0
13:0
14:0
15:0

My question is: shouldn't I get 1 instead of 0 here?

Use %p to print pointer values.
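A minimal sketch of what that could look like, assuming the array was allocated with room for 16 pointers and filled with cudaMemset as above. The names print_pointers and run_print_pointers_demo are hypothetical, and the runtime API is used for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

typedef unsigned char uchar;

// Prints each stored pointer value without dereferencing it.
__global__ void print_pointers(uchar** val) {
  unsigned int bid = blockIdx.x * blockDim.x + threadIdx.x;
  uchar* val_ = val[bid];
  printf("%u:%p\n", bid, (void*)val_);  // %p expects a pointer argument
}

// Hypothetical helper: returns 0 on success, 1 on any CUDA error.
int run_print_pointers_demo() {
  const int n = 16;
  uchar** d_ptrs = nullptr;
  if (cudaMalloc((void**)&d_ptrs, n * sizeof(uchar*)) != cudaSuccess) return 1;

  // Fills every byte with 0x01, so each entry is a bit pattern, not a real address.
  if (cudaMemset(d_ptrs, 1, n * sizeof(uchar*)) != cudaSuccess) {
    cudaFree(d_ptrs);
    return 1;
  }

  print_pointers<<<4, 4>>>(d_ptrs);
  cudaError_t err = cudaDeviceSynchronize();  // also flushes device printf output
  cudaFree(d_ptrs);
  return err == cudaSuccess ? 0 : 1;
}

int main() { return run_print_pointers_demo(); }
```

Each line should then show a pointer whose bytes are all 01 rather than the number 1, since cudaMemset writes the value into every byte of the array. Passing a 64-bit pointer to %d is a format mismatch, so the values printed that way are not meaningful.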

Thanks for the support. We thought the allocation only needed to be 16 bytes. So a multiple of sizeof(uchar*) is definitely required for uchar** to work?

N objects of type T require N * sizeof(T) bytes. In your case, T = uchar*.
This is not specific to CUDA.
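The arithmetic can be checked on the host alone. This is a small sketch; bytes_for is a hypothetical helper, not a CUDA or standard library function.

```cuda
#include <cstddef>
#include <cstdio>

typedef unsigned char uchar;

// Hypothetical helper: bytes needed for n objects of type T.
template <typename T>
constexpr std::size_t bytes_for(std::size_t n) { return n * sizeof(T); }

int main() {
  // 16 bytes of payload versus 16 pointers to payload.
  printf("16 x uchar  : %zu bytes\n", bytes_for<uchar>(16));
  printf("16 x uchar* : %zu bytes\n", bytes_for<uchar*>(16));
  // How many pointers actually fit in a 16-byte allocation.
  printf("pointers in 16 bytes: %zu\n", 16 / sizeof(uchar*));
  return 0;
}
```

On a typical 64-bit build, where a pointer is 8 bytes, this should report 16 bytes, 128 bytes, and 2 pointers, which matches the earlier remark that 16 bytes is the size of two pointers.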
