Performance analysis tool

Hi, since my OpenCL code runs slowly, I want to find out what is happening inside the GPU (instructions executed, waits between instructions, read/write delays, etc.). Is there any tool available for this?