Hello everyone,
I have tried to install the OpenCL Visual Profiler with OpenCL version 1.0 on a Tesla C870 card.
I am getting very poor performance from my compute kernels. Is there any reason the Visual Profiler should segfault? Will it work with this setup at all?
Also, my kernels are executed inside a for loop of 30*50 iterations. Should that affect the performance?
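To make the loop structure concrete, here is a minimal sketch of the launch pattern, assuming one kernel launch per iteration of a nested 30 x 50 loop. This is not the actual code: `enqueue_kernel` and `count_launches` are hypothetical names, and `enqueue_kernel` only stands in for the real enqueue call (for example `clEnqueueNDRangeKernel`).

```c
#include <stddef.h>

/* Counter for how many times the stand-in enqueue function is called. */
static size_t launches = 0;

/* Hypothetical stand-in for the real enqueue call; it only counts
   invocations and does not touch any OpenCL state. */
static void enqueue_kernel(int outer, int inner)
{
    (void)outer;
    (void)inner;
    ++launches;
}

/* Assumed structure: one kernel launch per iteration of a nested loop.
   Returns the total number of launches issued. */
size_t count_launches(int outer_iters, int inner_iters)
{
    launches = 0;
    for (int i = 0; i < outer_iters; ++i) {
        for (int j = 0; j < inner_iters; ++j) {
            enqueue_kernel(i, j);
        }
    }
    return launches;
}
```

Under that assumption, `count_launches(30, 50)` gives 1500 separate launches, so any fixed cost per launch or per synchronization is paid 1500 times.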
With the equivalent CUDA code I am getting a 60X speedup without optimizations and a 160X speedup using texture memory.
My OpenCL code is getting badly serialized somewhere, which is why I need the profiler. Is there any other way of profiling without using the Visual Profiler?
Any help would be welcome.
Thanks