Host code writing into GPU memory: getting rid of memcopies

Dear all,

Is it possible for host code to write its results directly into cuda-malloc’ed GPU-memory?

At the moment I'm writing the results into CPU memory and doing a cudaMemcpy afterwards…

I wanted to try whether it's any faster to write directly into video memory instead of copying a matrix that mostly contains data I don't need…

    unsigned char* m_videoYGPU;
    CUDA_SAFE_CALL( cudaMalloc( (void**) &m_videoYGPU, mem_size));

    for ( int i = 0; i < g_height; i++){
        m_videoYGPU[g_width*i] = m_videoY[g_width*i];
    }

As you can see, I only need to write to certain positions in the array, not the whole array…

The code presented above gives an "Access violation writing location …" error.

Kind regards

Use cudaMallocHost to allocate pinned (page-locked) memory; transfers from it are faster…
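A minimal sketch of that approach (function and variable names are made up for illustration; error checking trimmed). The point is that the host still writes into host memory, but a pinned staging buffer lets cudaMemcpy use DMA:

```cuda
#include <cuda_runtime.h>
#include <cstring>

// Hypothetical sketch: stage results in pinned (page-locked) host
// memory, then do one bulk transfer to the device.
void uploadFrame(const unsigned char* results, size_t mem_size,
                 unsigned char** d_out)
{
    unsigned char* h_pinned = 0;
    cudaMallocHost((void**)&h_pinned, mem_size);  // pinned, not malloc()
    cudaMalloc((void**)d_out, mem_size);

    // Host code writes its results into the pinned buffer...
    memcpy(h_pinned, results, mem_size);

    // ...then a single DMA-backed transfer to the GPU.
    cudaMemcpy(*d_out, h_pinned, mem_size, cudaMemcpyHostToDevice);

    cudaFreeHost(h_pinned);  // pair with cudaMallocHost, not free()
}
```

Note that pinned memory is freed with cudaFreeHost, and allocating too much of it can degrade overall system performance, so it is best reserved for transfer staging buffers.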

I think there is no way to write directly to GPU memory without a driver-assisted memcpy…

What do you mean by "write results without copy"? Where does the source data reside? Isn't it in host memory already?

No, there's no way to assign values directly from host code. You have to call cudaMemcpy() for each element (or for the whole array).
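A sketch of the per-element variant, using the names from the original post (slow, since each transfer is a separate API call, but it shows the idiom and why the original snippet crashes):

```cuda
#include <cuda_runtime.h>

// Copy only the first byte of each row, one cudaMemcpy per element.
// Host code must never dereference m_videoYGPU directly -- that is
// what causes the access violation in the original snippet.
void copyRowStarts(unsigned char* m_videoYGPU,      // device pointer
                   const unsigned char* m_videoY,   // host pointer
                   int g_width, int g_height)
{
    for (int i = 0; i < g_height; ++i) {
        cudaMemcpy(m_videoYGPU + (size_t)g_width * i,  // dst (device)
                   m_videoY    + (size_t)g_width * i,  // src (host)
                   1, cudaMemcpyHostToDevice);
    }
}
```

Per-element copies carry a large per-call overhead, so for anything beyond a handful of elements the bulk-copy-plus-unpack approach below is much faster.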

Then cudaMemcpy just the values you need into a "packed" array that doesn't contain the unneeded elements, and run a quick little kernel to unpack it on the GPU. If your unpacking kernel has all reads/writes coalesced (read through a tex1Dfetch if you can't coalesce them), you should see around 70 GiB/s in the unpacking.
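A sketch of that pack-and-unpack idea for this specific case (kernel name and launch configuration are my own assumptions). Only g_height bytes cross the bus; the kernel scatters them to their strided destinations. Here the packed reads are coalesced, though the strided writes are not:

```cuda
#include <cuda_runtime.h>

// Unpack a dense array of row-start bytes into the full-width device
// buffer: packed[i] goes to dst[i * g_width].
__global__ void unpackRowStarts(unsigned char* dst,
                                const unsigned char* packed,
                                int g_width, int g_height)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < g_height)
        dst[(size_t)g_width * i] = packed[i];
}

// Host side: copy only the g_height needed bytes, then unpack on GPU.
void uploadPacked(unsigned char* d_videoY, const unsigned char* h_packed,
                  int g_width, int g_height)
{
    unsigned char* d_packed = 0;
    cudaMalloc((void**)&d_packed, g_height);
    cudaMemcpy(d_packed, h_packed, g_height, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (g_height + threads - 1) / threads;
    unpackRowStarts<<<blocks, threads>>>(d_videoY, d_packed,
                                         g_width, g_height);
    cudaFree(d_packed);
}
```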

Thank you very much for your replies. I'll have to redesign some things so that some extra code runs on the GPU side, which will make these memcopies obsolete.