I tried to run PTX code with the CUDA Driver API, but the execution could not complete. The PTX was generated from a .cl kernel.
I think the problem is with some special registers (%envreg), and more specifically with %envreg6. As far as I can tell, %envreg6 has no value, and that is why the execution cannot finish. When I manually edited the PTX assembly, replacing %envreg6 with a literal value, the program ran. I also worked out that %envreg6 normally holds the block size.
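For reference, the manual substitution I describe above could be automated roughly as follows. This is only a sketch of the workaround, not a proper way to set the register: `patch_ptx_register` is a hypothetical helper name, the value 256 in the usage note is an arbitrary example block size, and the idea that %envreg6 holds the block size is just my own observation.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of any CUDA API): returns a newly allocated
 * copy of `ptx` in which every occurrence of the register name `reg`
 * (e.g. "%envreg6") is replaced by the decimal text of `value`.
 * A match followed by a digit (e.g. "%envreg60") is a different name and is
 * left untouched. The caller frees the result. Returns NULL if malloc fails. */
char *patch_ptx_register(const char *ptx, const char *reg, unsigned value)
{
    assert(ptx != NULL && reg != NULL && reg[0] != '\0');

    char literal[16];
    size_t lit_len = (size_t)snprintf(literal, sizeof literal, "%u", value);
    size_t reg_len = strlen(reg);

    /* First pass: count the occurrences that will be replaced. */
    size_t count = 0;
    for (const char *p = ptx; (p = strstr(p, reg)) != NULL; p += reg_len) {
        if (!isdigit((unsigned char)p[reg_len]))
            count++;
    }

    size_t out_len = strlen(ptx) - count * reg_len + count * lit_len;
    char *out = malloc(out_len + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy the text, substituting the literal for each match. */
    char *dst = out;
    const char *src = ptx;
    const char *hit;
    while ((hit = strstr(src, reg)) != NULL) {
        size_t prefix = (size_t)(hit - src);
        memcpy(dst, src, prefix);
        dst += prefix;
        if (isdigit((unsigned char)hit[reg_len])) {
            memcpy(dst, reg, reg_len);      /* longer register name: keep it */
            dst += reg_len;
        } else {
            memcpy(dst, literal, lit_len);  /* substitute the literal value */
            dst += lit_len;
        }
        src = hit + reg_len;
    }
    strcpy(dst, src);                       /* copy the remaining tail */
    return out;
}
```

Calling `patch_ptx_register(ptx_source, "%envreg6", 256)` before handing the text to the driver would reproduce what I did by hand. It hard-codes the block size into the module, which is exactly why I am looking for a real way to set the register.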
Is there any way to set values for these special registers (the PTX ISA document doesn't say much)? Am I missing something in the driver calls, maybe a flag on cuLaunchKernel?
I also started a similar thread last year: https://devtalk.nvidia.com/default/topic/509271/cuda-programming-and-performance/what-is-envreg-60-32-62-special-register-/ .
My PC configuration is pretty much the same.
So, bottom line: can I set values for those registers, and if so, how?