PTX code interoperation between CUDA and OpenCL

Hi everyone, I am looking into interoperation between CUDA PTX code and OpenCL PTX code. I think it should work in theory, but I ran into some problems when experimenting in practice.

I use a “transpose” kernel in both CUDA and OpenCL. The signature of the native kernel is as follows:

void transpose_native(const __global datatype * imatrix,
                      const int w,
                      const int h,
                      __global datatype * omatrix)
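To confirm that a run really produced a correct result, the GPU output can be compared against a CPU reference. Below is a minimal sketch in C. Because the kernel body is not shown, it assumes that imatrix holds h rows of w elements in row-major order, that omatrix receives the w-by-h transpose, and that datatype is float, which matches the <float> instantiation encoded in the CUDA mangled name. transpose_reference and count_mismatches are hypothetical helper names.

```c
#include <stddef.h>

/* CPU reference transpose (sketch). Assumed layout: imatrix is h rows of
 * w floats, row-major; omatrix becomes w rows of h floats. */
void transpose_reference(const float *imatrix, int w, int h, float *omatrix)
{
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            omatrix[(size_t)x * h + y] = imatrix[(size_t)y * w + x];
}

/* Counts elements where the GPU output differs from the reference.
 * Exact comparison is fine: a transpose only moves values. */
size_t count_mismatches(const float *gpu, const float *ref, size_t n)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; ++i)
        if (gpu[i] != ref[i])
            ++bad;
    return bad;
}
```

After reading omatrix back with clEnqueueReadBuffer, count_mismatches(gpu_out, ref_out, (size_t)w * h) should return 0 for a correct run.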

After front-end compilation, we get the following PTX entries, first from nvcc (CUDA) and then from the OpenCL compiler:

.entry _Z16transpose_nativeIfEvPKT_iiPS0_ (
    .param .u64 __cudaparm__Z16transpose_nativeIfEvPKT_iiPS0__imatrix,
    .param .s32 __cudaparm__Z16transpose_nativeIfEvPKT_iiPS0__w,
    .param .s32 __cudaparm__Z16transpose_nativeIfEvPKT_iiPS0__h,
    .param .u64 __cudaparm__Z16transpose_nativeIfEvPKT_iiPS0__omatrix)


.entry transpose_native(
    .param .u32 .ptr .global .align 4 transpose_native_param_0,
    .param .u32 transpose_native_param_1,
    .param .u32 transpose_native_param_2,
    .param .u32 .ptr .global .align 4 transpose_native_param_3)

When I pass the OpenCL PTX code to “clCreateProgramWithBinary”, the program builds, the kernel runs, and the result is correct.

When I pass the CUDA PTX code instead, it also builds and I can create a kernel object. But when it reaches “clEnqueueNDRangeKernel” (I wait on its event with “clWaitForEvents”), I get “CL_OUT_OF_RESOURCES”. Could anyone tell me why?
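When the failure shows up around the launch and the wait, it helps to check the return value of each call on its own and print the error by name. An error that only appears when waiting on the event usually points at the kernel's execution rather than at the enqueue itself. Below is a small sketch. The numeric values are copied from the Khronos cl.h header (OpenCL 1.x) so the snippet compiles without OpenCL installed. cl_error_name and report_cl are hypothetical helpers, and only a subset of codes is listed.

```c
#include <stdio.h>

/* Maps a few OpenCL error codes (values as defined in cl.h, OpenCL 1.x)
 * to their names; anything else is reported as unknown. */
const char *cl_error_name(int err)
{
    switch (err) {
    case 0:   return "CL_SUCCESS";
    case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -14: return "CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST";
    case -48: return "CL_INVALID_KERNEL";
    case -51: return "CL_INVALID_ARG_SIZE";
    case -52: return "CL_INVALID_KERNEL_ARGS";
    case -54: return "CL_INVALID_WORK_GROUP_SIZE";
    default:  return "unknown OpenCL error";
    }
}

/* Prints which call failed and passes the error code through unchanged. */
int report_cl(const char *call, int err)
{
    if (err != 0)
        fprintf(stderr, "%s failed: %s (%d)\n", call, cl_error_name(err), err);
    return err;
}
```

In the host program, call report_cl once with the result of clEnqueueNDRangeKernel and once with the result of clWaitForEvents(1, &event), so it is clear which of the two returned CL_OUT_OF_RESOURCES.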

Thanks in advance.

I’m a beginner in CUDA and OpenCL.
Can you tell me how to obtain the PTX code for OpenCL and CUDA kernels?
Thanks in advance.

Use the nvcc compiler with the -ptx option to get the PTX code for CUDA.
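For OpenCL, NVIDIA's implementation returns PTX text as the program binary. After clBuildProgram, query its size with clGetProgramInfo and CL_PROGRAM_BINARY_SIZES, then fetch the text with CL_PROGRAM_BINARIES. Once you have PTX from either compiler, it is worth checking which .entry names it contains, because that exact string is what clCreateKernel needs. Below is a rough sketch in C. ptx_entry_names and PTX_NAME_MAX are hypothetical, and the scan is purely textual, so it would also match .entry inside a comment.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define PTX_NAME_MAX 256 /* hypothetical limit on stored name length */

/* Copies the name following each ".entry" directive in ptx into names[],
 * up to max_names entries; returns how many names were found. */
size_t ptx_entry_names(const char *ptx, char names[][PTX_NAME_MAX], size_t max_names)
{
    size_t count = 0;
    const char *p = ptx;
    while (count < max_names && (p = strstr(p, ".entry")) != NULL) {
        p += strlen(".entry");
        if (!isspace((unsigned char)*p))
            continue; /* not a standalone ".entry" token */
        while (isspace((unsigned char)*p))
            ++p;
        size_t n = 0;
        /* PTX identifiers use letters, digits, '_', '$' and '%' */
        while ((isalnum((unsigned char)*p) || *p == '_' || *p == '$' || *p == '%')
               && n < PTX_NAME_MAX - 1)
            names[count][n++] = *p++;
        names[count][n] = '\0';
        if (n > 0)
            ++count;
    }
    return count;
}
```

Run against the two listings in the question, this would report _Z16transpose_nativeIfEvPKT_iiPS0_ for the CUDA PTX and transpose_native for the OpenCL PTX.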

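From the looks of it, C++ name mangling is happening in the code above. This can be avoided with “extern "C" { …kernel code… }”. Note, though, that the mangled name carries template arguments (the kernel was instantiated as transpose_native<float>), and a template cannot have C linkage, so the kernel inside the extern "C" block has to be a plain, non-template function.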
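The CUDA entry name is an Itanium C++ ABI mangled name: c++filt turns _Z16transpose_nativeIfEvPKT_iiPS0_ into void transpose_native<float>(float const*, int, int, float*). The 16 is the length of the identifier that follows, and the I right after it opens a template argument list. Below is a minimal sketch that extracts those two facts. It handles only the simple _Z<length><name> form seen here (no namespaces or nested names), and itanium_simple_name is a hypothetical helper, not a full demangler.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Parses the simple form _Z<length><name>... . On success, writes the
 * identifier to out, sets *is_template when template arguments ("I...E")
 * follow the name, and returns 1; returns 0 for anything else. */
int itanium_simple_name(const char *mangled, char *out, size_t out_size,
                        int *is_template)
{
    if (strncmp(mangled, "_Z", 2) != 0)
        return 0; /* not a mangled C++ name */
    const char *p = mangled + 2;
    if (!isdigit((unsigned char)*p))
        return 0; /* nested names (N...E) are not handled here */
    size_t len = 0;
    while (isdigit((unsigned char)*p))
        len = len * 10 + (size_t)(*p++ - '0');
    if (len == 0 || len >= out_size || strlen(p) < len)
        return 0;
    memcpy(out, p, len);
    out[len] = '\0';
    *is_template = (p[len] == 'I');
    return 1;
}
```

For the OpenCL entry, transpose_native is already unmangled, so the helper returns 0 and the name can be used as-is.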

The pointer parameters in the CUDA PTX above are 64-bit (.u64), while the OpenCL PTX uses 32-bit (.u32) pointers. This is probably because nvcc compiled for a 64-bit target (--machine 64); try passing --machine 32 (-m32) explicitly.
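To see why the pointer width matters, one can compute where each parameter would sit in the kernel's parameter block under natural alignment. This is an assumption, but it fits the .u32, .s32 and .u64 parameters listed in the question. The CUDA entry expects 24 bytes with omatrix at offset 16, while the OpenCL entry expects 16 bytes with its last pointer at offset 12. If the arguments were packed for one layout while the code reads the other, the kernel would dereference garbage pointers, which could plausibly surface as an execution error. param_block_size is a hypothetical helper.

```c
#include <stddef.h>

/* Lays out parameters in order, aligning each to a multiple of its own
 * size (natural alignment; sizes must be nonzero). Stores each start
 * offset in offsets[] and returns the total block size in bytes. */
size_t param_block_size(const size_t *sizes, size_t count, size_t *offsets)
{
    size_t off = 0;
    for (size_t i = 0; i < count; ++i) {
        off = (off + sizes[i] - 1) / sizes[i] * sizes[i]; /* round up */
        offsets[i] = off;
        off += sizes[i];
    }
    return off;
}
```

With sizes {8, 4, 4, 8} (the CUDA entry) this gives offsets 0, 8, 12, 16 and 24 bytes in total. With {4, 4, 4, 4} (the OpenCL entry) it gives 0, 4, 8, 12 and 16 bytes.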

Maybe that will solve the CL_OUT_OF_RESOURCES problem.

Perhaps post some more code; it is also possible that the kernel uses too many registers.
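Register usage can be checked at compile time: nvcc -Xptxas -v prints the registers used per thread for each kernel. Below is a rough sketch of the resulting budget check. It assumes a per-block register limit taken from the device's documentation (for example 16384 on compute capability 1.2 and 1.3 devices) and ignores allocation granularity, so treat the result as an estimate. The helper names are hypothetical. On the OpenCL side, clGetKernelWorkGroupInfo with CL_KERNEL_WORK_GROUP_SIZE reports the largest work-group the runtime will accept for the kernel.

```c
/* Returns 1 if items_per_group work-items, each using regs_per_item
 * registers, fit within regs_per_block registers (granularity ignored). */
int workgroup_fits_registers(unsigned regs_per_item, unsigned items_per_group,
                             unsigned regs_per_block)
{
    return (unsigned long long)regs_per_item * items_per_group <= regs_per_block;
}

/* Largest work-group size, rounded down to a multiple of warp_size, that
 * fits the register budget; 0 for invalid input or if no warp fits. */
unsigned max_group_for_registers(unsigned regs_per_item, unsigned regs_per_block,
                                 unsigned warp_size)
{
    if (regs_per_item == 0 || warp_size == 0)
        return 0;
    unsigned items = regs_per_block / regs_per_item;
    return items / warp_size * warp_size;
}
```

For example, with 40 registers per work-item and a 16384-register limit, at most 384 work-items (a multiple of the 32-wide warp) fit in a group, so a local size of 512 would exceed the register file.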