These are the things I would most like to see fixed:
improvements:
- Support kernels containing a loop with a lot of MADs for testing peak FLOPS: these currently get long compile times, while the same kernel in CUDA compiles fast…
- Ship an up-to-date ICD that is compatible with AMD's, i.e. fix the ICD so it also detects the AMD backend… (or have AMD ship a fixed OpenCL ICD DLL)…
- Expose clGetGLContextInfoKHR:
cl_int clGetGLContextInfoKHR(const cl_context_properties *properties,
cl_gl_context_info param_name,
size_t param_value_size,
void *param_value,
size_t *param_value_size_ret)
It is not in the headers or the .lib, and it is not exported by the Khronos .dlls either.
- Add the OpenCL port of the DirectCompute ocean demo (shown at GTC09): i.e. are there plans to publish the OpenCL port of the DirectCompute ocean demo shown in the GTC OpenCL course…
- Ship a driver compatible with the new Nvidia DirectX interop extensions
- Support for the cl_khr_fp16 and cl_khr_3d_image_writes extensions?
OpenCL compiler bugs:
- A bug that shows up in the ATI AES sample… see this quoted post:
Thanks. Also, I’ve found a way to fix the AESEncryptDecrypt sample so that it passes its test on Nvidia: just replace
unsigned char hiBitSet = (a & 0x80);
with
unsigned char hiBitSet = ((a>127)?128:0);
in AESEncryptDecrypt_Kernels.cl
It looks weird, but it works
- Apple FFT library, see: http://forums.nvidia.com/index.php?showtopic=153544
Take a look at fft_base_kernels.h, see line 4 of “baseKernels”, the complexMul line.
The define seems to be too complicated for the Nvidia OpenCL compiler; I replaced the define with a function and it’s now working:
float2 complexMul(float2 a, float2 b) { return (float2)(mad(-(a).y, (b).y, (a).x * (b).x), mad((a).y, (b).x, (a).x * (b).y)); }
- kernels without parameters don’t compile
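For the AESEncryptDecrypt workaround above, here is a quick host-side check in plain C (not OpenCL C, so it only shows what the device compiler ought to produce). It suggests the two expressions agree for every unsigned char value, which would make the failure of the original line look like a compiler problem rather than a bug in the sample. The function names are made up for illustration.

```c
/* Original expression from AESEncryptDecrypt_Kernels.cl:
   keep only the top bit of the byte. */
unsigned char hi_bit_mask(unsigned char a)
{
    return (unsigned char)(a & 0x80);
}

/* Workaround expression reported to pass the test on Nvidia. */
unsigned char hi_bit_compare(unsigned char a)
{
    return (unsigned char)((a > 127) ? 128 : 0);
}

/* Count the byte values (0..255) for which the two forms disagree. */
int count_mismatches(void)
{
    int mismatches = 0;
    for (int v = 0; v < 256; ++v) {
        unsigned char a = (unsigned char)v;
        if (hi_bit_mask(a) != hi_bit_compare(a))
            ++mismatches;
    }
    return mismatches;
}
```

Calling count_mismatches() from a small main() and printing the result is enough to compare the two forms on the host.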
bugs in SDK:
- Samples get a platform ID, but the parameter has to be set to NULL for them to work on non-Nvidia implementations (e.g. AMD's)…
or fix the function so that it sets it to NULL first…
- oclUtils: getdevice(i) checks the number of devices but returns wrong data if i == numdevices, due to an incorrect check, if(i>numdevices) error… (it should be i>=numdevices)
- shrUtils: findfilepath fails if you give it an absolute path (“c:..”), due to adding "." to it; you have to add “” to the paths…
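The getdevice(i) item above is an off-by-one in the bounds check: valid indices run from 0 to numdevices-1, so i == numdevices has to be rejected too. A minimal sketch in plain C, with a hypothetical array of ints standing in for the device list (this is not the actual oclUtils code; get_device, device_handle and INVALID_DEVICE are names invented for the example):

```c
#include <stddef.h>

/* Hypothetical stand-in for a device handle; the real SDK uses cl_device_id. */
typedef int device_handle;

#define INVALID_DEVICE (-1)

/* Returns devices[i], or INVALID_DEVICE when i is out of range.
   Note the ">=": with "i > num_devices" a call with i == num_devices
   would read one element past the end of the array. */
device_handle get_device(const device_handle *devices,
                         size_t num_devices, size_t i)
{
    if (devices == NULL || i >= num_devices)
        return INVALID_DEVICE;
    return devices[i];
}
```

With three devices, indices 0 to 2 return a handle and index 3 returns INVALID_DEVICE instead of whatever happens to sit after the array.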