Hi everyone, I am looking into interoperation between CUDA PTX code and OpenCL PTX code. I think it should work in theory, but I ran into some problems when trying it in practice.
I use a “transpose” kernel in both CUDA and OpenCL. The signature of the native kernel is as follows:
void transpose_native(const __global datatype * imatrix,
                      const int w,
                      const int h,
                      __global datatype * omatrix)
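For reference, the kernel body is just an element-wise transpose. Below is a minimal sketch with that signature, instantiated for float (as in the mangled CUDA name) and with the __kernel qualifier added for the OpenCL build; the body is reconstructed for illustration and may differ slightly from my actual code:

// element-wise transpose: work-item (x, y) reads imatrix at row y, col x
// and writes it to omatrix at row x, col y; the bounds check guards a
// rounded-up NDRange
__kernel void transpose_native(const __global float * imatrix,
                               const int w,
                               const int h,
                               __global float * omatrix)
{
    int x = get_global_id(0);
    int y = get_global_id(1);
    if (x < w && y < h)
        omatrix[x * h + y] = imatrix[y * w + x];
}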
After front-end compilation, we get the following PTX entry signatures:
CUDA:
.entry _Z16transpose_nativeIfEvPKT_iiPS0_ (
.param .u64 __cudaparm__Z16transpose_nativeIfEvPKT_iiPS0__imatrix,
.param .s32 __cudaparm__Z16transpose_nativeIfEvPKT_iiPS0__w,
.param .s32 __cudaparm__Z16transpose_nativeIfEvPKT_iiPS0__h,
.param .u64 __cudaparm__Z16transpose_nativeIfEvPKT_iiPS0__omatrix)
OpenCL:
.entry transpose_native(
.param .u32 .ptr .global .align 4 transpose_native_param_0,
.param .u32 transpose_native_param_1,
.param .u32 transpose_native_param_2,
.param .u32 .ptr .global .align 4 transpose_native_param_3)
When I use the OpenCL PTX code as the input to “clCreateProgramWithBinary”, it compiles, runs correctly, and produces the correct result.
When I use the CUDA PTX code as its input, it also compiles correctly and creates a kernel object at run time. But when it reaches “clEnqueueNDRangeKernel” (I wait on its event with “clWaitForEvents”), it fails with “CL_OUT_OF_RESOURCES”. Could anyone tell me the reason?
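For clarity, here is roughly what my host side does (a minimal sketch in C; the helper name run_transpose_from_ptx and variables like d_imatrix/d_omatrix are only for illustration, and error handling is trimmed):

#include <CL/cl.h>

/* Sketch of the load path: build a program from PTX bytes, create the
   kernel, set the four arguments, enqueue, and wait for completion. */
static void run_transpose_from_ptx(cl_context ctx, cl_device_id dev,
                                   cl_command_queue queue,
                                   const unsigned char *ptx, size_t ptx_len,
                                   cl_mem d_imatrix, cl_mem d_omatrix,
                                   cl_int w, cl_int h)
{
    cl_int binstat, err;
    cl_program prog = clCreateProgramWithBinary(ctx, 1, &dev, &ptx_len,
                                                &ptx, &binstat, &err);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);

    /* with the CUDA-generated PTX the mangled entry name
       _Z16transpose_nativeIfEvPKT_iiPS0_ is used here instead */
    cl_kernel k = clCreateKernel(prog, "transpose_native", &err);

    clSetKernelArg(k, 0, sizeof(cl_mem), &d_imatrix);
    clSetKernelArg(k, 1, sizeof(cl_int), &w);
    clSetKernelArg(k, 2, sizeof(cl_int), &h);
    clSetKernelArg(k, 3, sizeof(cl_mem), &d_omatrix);

    size_t gws[2] = { (size_t)w, (size_t)h };
    cl_event ev;
    clEnqueueNDRangeKernel(queue, k, 2, NULL, gws, NULL, 0, NULL, &ev);
    clWaitForEvents(1, &ev);  /* CL_OUT_OF_RESOURCES is reported around here
                                 when the CUDA-generated PTX is loaded */
}

The OpenCL-generated PTX runs end to end through this path; only the CUDA-generated PTX fails at the last two calls.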
thanks in advance…