cbl9.5
April 27, 2019, 8:52am
Hi,
I found that the memcpy from the CUDA buffer to the screen buffer takes too much time. How can I avoid the memcpy step, or make memcpy run at the full DRAM bandwidth?
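```c
/* Map the screen (framebuffer) device behind fd into this process. */
unsigned char *screen_buf = (unsigned char *)mmap(NULL, screenlen, PROT_WRITE | PROT_READ, MAP_SHARED, fd, 0);
/* Managed (unified) memory, accessible from both host and device. */
cudaMallocManaged(&cuda_buf, size, cudaMemAttachGlobal);
…
/* Copy the image row by row; each row is width * 4 bytes (4 bytes per pixel). */
for (i = 0; i < height; i++) {
    memcpy(screen_buf + i * width * 4, cuda_buf + i * width * 4, width * 4);
}
```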
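As a minimal sketch of the copy step itself (not a fix for the bandwidth problem): when both buffers are tightly packed, as in the loop above where source and destination rows are both `width * 4` bytes apart, the per-row loop is equivalent to a single `memcpy` of `height * width * 4` bytes. The helper `copy_rows` below is hypothetical; `dst_pitch` and `src_pitch` are the byte distances between the starts of consecutive rows. It falls back to row-by-row copies when either side is padded, which can happen with a Linux fbdev framebuffer whose `line_length` (reported by the `FBIOGET_FSCREENINFO` ioctl) is larger than `width * 4`. Treating `fd` as an fbdev device such as `/dev/fb0` is an assumption; the question does not say what `fd` is.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `height` rows of `row_bytes` bytes each.
 * dst_pitch / src_pitch are the byte offsets between consecutive rows
 * (for an fbdev framebuffer, dst_pitch would be its line_length). */
static void copy_rows(unsigned char *dst, size_t dst_pitch,
                      const unsigned char *src, size_t src_pitch,
                      size_t row_bytes, size_t height)
{
    if (dst_pitch == row_bytes && src_pitch == row_bytes) {
        /* Both buffers are tightly packed: one contiguous copy. */
        memcpy(dst, src, row_bytes * height);
        return;
    }
    /* At least one side has padding after each row: copy row by row. */
    for (size_t y = 0; y < height; y++) {
        memcpy(dst + y * dst_pitch, src + y * src_pitch, row_bytes);
    }
}
```

With the buffers from the question, and assuming `cuda_buf` is an `unsigned char *`, the call would be `copy_rows(screen_buf, width * 4, cuda_buf, width * 4, width * 4, height);`, which takes the single-`memcpy` path. It moves exactly the same bytes as the original loop, so it only saves per-call overhead; it does not change how fast the host can read managed memory or write to the framebuffer mapping, which may be where most of the time goes.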
tera
April 28, 2019, 4:44pm
Use the CUDA/OpenGL interop.
You will still need a final device-side write into the “interop buffer” that is mapped to an OpenGL texture (depending on the method).
Mapping the texture to a surface and writing to it directly from your CUDA kernel is likely one of the more efficient methods AFAIK.
Does this mean writing directly from CUDA to GL_BACK, the color attachment of OpenGL’s default framebuffer?