Is there any equivalent to the cudaMemset function in OpenCL? So far I’ve found nothing. I would just like to set all values in a buffer to 0. I suppose I can write a kernel for that, but I somehow hope that it can be done more efficiently.
I haven’t read about any memset function in OpenCL. However, instead of writing a kernel that does it, I would use clEnqueueWriteBuffer() to copy a zero-filled array into the buffer. It would be faster than using a kernel.
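To make the suggestion above concrete, here is a minimal sketch in C of zeroing a buffer by uploading a zero-filled host array. The helper names make_zero_staging and zero_buffer, and the USE_OPENCL macro that guards the part needing the OpenCL headers and library, are assumptions made for this illustration; only clEnqueueWriteBuffer itself comes from the OpenCL API.

```c
#include <assert.h>
#include <stdlib.h>
#ifdef USE_OPENCL            /* hypothetical opt-in macro: define it when building against OpenCL */
#include <CL/cl.h>
#endif

/* Allocate a zero-filled host staging array of `bytes` bytes.
   The caller frees it. (Hypothetical helper name.) */
void *make_zero_staging(size_t bytes)
{
    assert(bytes > 0);
    return calloc(1, bytes);
}

#ifdef USE_OPENCL
/* Overwrite the first `bytes` bytes of `buf` with zeros using a blocking write.
   Returns the OpenCL error code. (Hypothetical helper name.) */
cl_int zero_buffer(cl_command_queue queue, cl_mem buf, size_t bytes)
{
    void *zeros = make_zero_staging(bytes);
    if (zeros == NULL)
        return CL_OUT_OF_HOST_MEMORY;

    /* blocking_write = CL_TRUE, so the staging array can be freed right after. */
    cl_int err = clEnqueueWriteBuffer(queue, buf, CL_TRUE, 0, bytes,
                                      zeros, 0, NULL, NULL);
    free(zeros);
    return err;
}
#endif
```

Note that this approach moves the whole array across the bus from host to device, so its speed depends on the transfer bandwidth. On OpenCL 1.2 and later, clEnqueueFillBuffer can fill a buffer with a repeated pattern without a host staging array; earlier versions do not have it.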
Here’s a nice little kernel I wrote:
__kernel void memset_uint4(__global uint4 *mem, __private uint4 val)
{
    mem[get_global_id(0)] = val;
}
This is about 0.5 GB/s faster on my 9600GT than clEnqueueWriteBuffer().
What I’m curious to know is whether it could be improved any further. On the CPU, the difference between memset and memcpy is pretty significant; is the GPU different in this regard, or is my code sub-optimal?
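For anyone who wants to try the kernel, here is a rough host-side sketch in C of how it might be launched. It assumes the kernel has already been built into a cl_kernel and that the buffer size is a multiple of 16 bytes, since each work-item stores one uint4. The names memset_uint4_work_items and launch_memset_uint4 and the USE_OPENCL guard macro are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#ifdef USE_OPENCL            /* hypothetical opt-in macro: define it when building against OpenCL */
#include <CL/cl.h>
#endif

/* Number of work-items needed: one per uint4 (16 bytes).
   Assumes `bytes` is a multiple of 16; any tail would need separate handling. */
size_t memset_uint4_work_items(size_t bytes)
{
    assert(bytes % 16 == 0);
    return bytes / 16;
}

#ifdef USE_OPENCL
/* Set every 32-bit word in the first `bytes` bytes of `buf` to `value`
   by launching the memset_uint4 kernel. Returns the OpenCL error code. */
cl_int launch_memset_uint4(cl_command_queue queue, cl_kernel kernel,
                           cl_mem buf, size_t bytes, cl_uint value)
{
    cl_uint4 val;
    for (int i = 0; i < 4; i++)
        val.s[i] = value;

    size_t global = memset_uint4_work_items(bytes);

    cl_int err = clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);
    if (err != CL_SUCCESS)
        return err;
    err = clSetKernelArg(kernel, 1, sizeof(cl_uint4), &val);
    if (err != CL_SUCCESS)
        return err;

    /* Local work size is left NULL so the implementation picks one. */
    return clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, NULL,
                                  0, NULL, NULL);
}
#endif
```

The launch is asynchronous; call clFinish on the queue, or wait on an event, before timing it or reading the buffer back.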
A few things I tested:
- If I use uint2 instead of uint4, performance remains exactly the same.
- uint decreases performance slightly but measurably.
- uint8 and uint16 perform the same as each other, at a little less than half the throughput of uint2/uint4 (the optimum).
Has anyone benchmarked cudaMemset against cudaMemcpy? Do they have approximately the same bandwidth?
If they do, then my OpenCL kernel is probably optimal.
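For comparing numbers like the ones above, one possible way to turn a timing into a bandwidth figure is sketched below in C. It assumes the start and end timestamps are in nanoseconds, which is what clGetEventProfilingInfo reports for CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END when the queue was created with CL_QUEUE_PROFILING_ENABLE. The function name bandwidth_gb_per_s is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Bandwidth in GB/s (1 GB = 10^9 bytes) for `bytes` bytes moved between two
   timestamps given in nanoseconds. Bytes per nanosecond equals GB per second.
   unsigned long long stands in for the 64-bit cl_ulong used by OpenCL. */
double bandwidth_gb_per_s(size_t bytes,
                          unsigned long long start_ns,
                          unsigned long long end_ns)
{
    assert(end_ns > start_ns);
    return (double)bytes / (double)(end_ns - start_ns);
}
```

Timing the kernel and the buffer write the same way, over the same buffer size, keeps the comparison fair.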
Hi, I just wanted to know how to do that with clEnqueueWriteBuffer(). I am very new to OpenCL, so please help me. My code looks like this:
for (j = 0; j < frame_size; j++)
{
    source_view->z_world_depth_frame[LU][j] = MAX_DEPTH_WORLD;
}
Please help; I am waiting for your reply.
Regards,
megharaj
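One way the loop above could be moved to the write-buffer approach is sketched below in C: fill an ordinary host array with the value once, then copy it into the device buffer. The element type of z_world_depth_frame is not shown in the post, so float is an assumption here, as are the helper names make_filled_frame and upload_frame and the USE_OPENCL guard macro. The cl_mem buffer is assumed to have been created already with clCreateBuffer, with room for at least frame_size floats.

```c
#include <assert.h>
#include <stdlib.h>
#ifdef USE_OPENCL            /* hypothetical opt-in macro: define it when building against OpenCL */
#include <CL/cl.h>
#endif

/* Build a host array of `frame_size` floats, all set to `value`
   (for example MAX_DEPTH_WORLD). The caller frees it.
   float is an assumed element type. */
float *make_filled_frame(size_t frame_size, float value)
{
    assert(frame_size > 0);
    float *frame = malloc(frame_size * sizeof *frame);
    if (frame == NULL)
        return NULL;
    for (size_t j = 0; j < frame_size; j++)
        frame[j] = value;
    return frame;
}

#ifdef USE_OPENCL
/* Copy the filled host array into the device buffer with a blocking write.
   Returns the OpenCL error code. */
cl_int upload_frame(cl_command_queue queue, cl_mem buf,
                    const float *frame, size_t frame_size)
{
    return clEnqueueWriteBuffer(queue, buf, CL_TRUE, 0,
                                frame_size * sizeof *frame, frame,
                                0, NULL, NULL);
}
#endif
```

If the same value is written every frame, the host array can be built once and reused for each upload.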