Is there any equivalent to the cudaMemset function in OpenCL? So far I’ve found nothing. I would just like to set all values in a buffer to 0. I suppose I can write a kernel for that, but I hope it can be done more efficiently.

I haven’t come across any memset function in OpenCL. However, instead of writing a kernel that does it, I would use clEnqueueWriteBuffer() to copy a zero-filled array into the buffer. It should be faster than using a kernel.

Here’s a nice little kernel I wrote:

__kernel void memset_uint4(__global uint4 *mem, __private uint4 val) { mem[get_global_id(0)] = val; }

This is about 0.5 GB/s faster on my 9600GT than clEnqueueWriteBuffer().
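For anyone launching this kernel: each work-item writes one uint4 (16 bytes), so the global work size is the buffer size divided by 16. Below is a rough host-side sketch of that arithmetic in plain C. The helper names item_count and padded_global_size are made up for illustration and are not part of OpenCL.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes written by one work-item of memset_uint4: one uint4 = 4 x 32 bits. */
#define BYTES_PER_ITEM 16u

/* Number of uint4 elements needed to cover `bytes`, rounding up.
   (Hypothetical helper, not an OpenCL function.) */
static size_t item_count(size_t bytes)
{
    return (bytes + BYTES_PER_ITEM - 1) / BYTES_PER_ITEM;
}

/* Round `items` up to a multiple of `local_size` (the work-group size).
   OpenCL 1.x requires this when an explicit local size is passed to
   clEnqueueNDRangeKernel. (Hypothetical helper.) */
static size_t padded_global_size(size_t items, size_t local_size)
{
    assert(local_size > 0);
    return ((items + local_size - 1) / local_size) * local_size;
}
```

Note that if the global size gets padded like this, the kernel as posted would write past the end of the buffer. In that case either allocate the buffer with the padded size or pass the element count as an extra argument and skip work-items beyond it.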

What I’m curious to know is whether it could be improved any further. On the CPU, the difference between memset and memcpy is pretty significant; is the GPU different in this regard, or is my code sub-optimal?

A few things I tested:

  • if I use uint2 instead of uint4, performance remains exactly the same

  • uint decreases performance slightly but measurably

  • uint8 and uint16 have the same performance as each other, which is a little less than half of the uint2/uint4 (optimum) one

Has anyone benchmarked cudaMemset against cudaMemcpy? Do they have approximately the same bandwidth?

If they do, then my OpenCL kernel is probably optimal.
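To compare numbers like these, it helps to compute the bandwidth the same way for every variant. OpenCL event profiling (clGetEventProfilingInfo on a queue created with CL_QUEUE_PROFILING_ENABLE) reports timestamps in nanoseconds, and one byte per nanosecond is exactly 1 GB/s if 1 GB is taken as 10^9 bytes. A small sketch in plain C follows; the function name bandwidth_gb_per_s is just an illustrative choice.

```c
#include <stddef.h>

/* Effective bandwidth in GB/s (1 GB = 1e9 bytes) for `bytes` moved between
   two timestamps in nanoseconds. Bytes per nanosecond equals GB/s, so no
   further scaling is needed. (Hypothetical helper, not an OpenCL function.) */
static double bandwidth_gb_per_s(size_t bytes,
                                 unsigned long long start_ns,
                                 unsigned long long end_ns)
{
    if (end_ns <= start_ns)
        return 0.0; /* no measurable interval */
    return (double)bytes / (double)(end_ns - start_ns);
}
```

Feeding it the start and end times of the kernel event and of the write event gives directly comparable figures, provided both move the same number of bytes.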

Hi, I just wanted to know how to do that with clEnqueueWriteBuffer(). I am very new to OpenCL, so please help me. My code looks like this:

for (j = 0; j < frame_size; j++)
    (*source_view).z_world_depth_frame[LU][j] = MAX_DEPTH_WORLD;

Please help; I am waiting for your reply.
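One possible way to do this with the write approach from earlier in the thread: fill an ordinary host array with the constant once, then upload it in a single clEnqueueWriteBuffer call. The sketch below only covers the host-side fill so that it compiles without the OpenCL headers. It assumes the depth values are float (the post does not say), and make_filled, queue and dev_buf are placeholder names.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate a host staging array of n floats, each set to `value`.
   Returns NULL on allocation failure; the caller frees the result.
   ASSUMPTION: the element type is float; adjust to the real type. */
static float *make_filled(size_t n, float value)
{
    float *host = malloc(n * sizeof *host);
    if (host == NULL)
        return NULL;
    for (size_t j = 0; j < n; j++)
        host[j] = value;
    return host;
}

/* The upload would then look roughly like this (needs <CL/cl.h>, a valid
 * command queue `queue` and a cl_mem `dev_buf` of at least n floats):
 *
 *   clEnqueueWriteBuffer(queue, dev_buf, CL_TRUE, 0,
 *                        n * sizeof(float), host, 0, NULL, NULL);
 */
```

With CL_TRUE the write is blocking, so the host array can be freed right after the call returns. If your platform supports OpenCL 1.2 or later, clEnqueueFillBuffer can fill the buffer with a repeated pattern on the device and avoids the staging array altogether.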