I am looking for the presenter who gave the recent lectures on Wednesday and Thursday: “Fundamental Performance Optimizations for GPUs” and “Analysis-Driven Performance Optimization for GPUs”.
He presented a really neat feature in the compute profiler for CUDA and OpenCL that measures the ratio of latency to memory operations to instruction operations. Nobody in the CUDA Programming and Development forum, or anyone on Stack Overflow, seems to know about it, though. I am not sure whether this is a new feature or one that hasn’t been released yet. I have been digging around compute profiler 3.2 and I cannot find it :(