I modified the postProcessGL example from the SDK to measure bandwidth from CUDA to OpenGL. It works by generating an image in a CUDA kernel and then transferring it to the OpenGL context through a PBO. Nothing is transferred from the CPU to CUDA or from OpenGL to CUDA. The modified source can be downloaded from here.
I get really bad results: only 390 MB/s. Am I doing something wrong, or is this path really that slow?
That is about the same bandwidth I get by transferring the data from CUDA to the CPU and then to OpenGL. So is this what the CUDA drivers currently do?
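
For reference, a figure like this is just the bytes moved per frame, times the number of frames, divided by the elapsed time. Here is a minimal sketch of that arithmetic. The helper name `bandwidth_mb_per_s` and its parameters are hypothetical and not taken from the modified sample, and I am assuming 1 MB = 10^6 bytes.

```cpp
#include <cstddef>

// Hypothetical helper (not part of the SDK sample): effective transfer
// bandwidth in MB/s, assuming 1 MB = 1e6 bytes.
// width, height    : image size in pixels
// bytes_per_pixel  : e.g. 4 for an RGBA8 image
// frames           : number of frames transferred during the measurement
// elapsed_seconds  : wall-clock time for those frames
double bandwidth_mb_per_s(std::size_t width, std::size_t height,
                          std::size_t bytes_per_pixel,
                          std::size_t frames, double elapsed_seconds)
{
    // Convert to double first so large frame counts cannot overflow.
    const double total_bytes = static_cast<double>(width) * height *
                               bytes_per_pixel * frames;
    return total_bytes / elapsed_seconds / 1.0e6;
}
```

With made-up numbers, `bandwidth_mb_per_s(512, 512, 4, 1000, 2.0)` comes out at about 524.3 MB/s. The result only reflects the transfer if the timed interval covers the whole kernel, map/unmap and draw sequence and the GPU has actually finished the work before the timer stops.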