Anyone optimize by modifying PTX or cubin code?

terps128 · October 4, 2010, 3:48pm

I'm curious whether anybody does this. It should be possible with both CUDA and OpenCL, right? Specifically, I'm looking for a way to tell whether my global memory loads/stores are coalesced just by reading the PTX, and then change my high-level code to fix the problem. Would I be wasting my time with PTX for a task like this? Does anybody really know how to work with PTX or cubin code effectively, or is it still incredibly difficult? Let me know, I would appreciate any elaboration :)

jan.heckman · October 4, 2010, 4:18pm

Look at http://forums.nvidia.com/index.php?showtopic=159033