memset32_aligned1D

When I run the Compute Visual Profiler, it shows a “memset32_aligned1D” method instead of a call to one of my kernels.
All I could find is that it is a memory alignment function used internally by CUDA, but why is it being called instead of my kernel?

Hi,

Are you calling cudaMemset(…) in your code?

Actually, when I use cudaMemset inside the kernel I get this error: “calling a host function("cudaMemset") from a device/global function("function") is not allowed”, but when I use the C memset it works just fine.

And even if I comment out the memset, memset32_aligned1D still shows up in the Compute Visual Profiler.

Hi,

cudaMemset is a host function. You can’t call it from a kernel.

It looks like you are running the wrong binary in your profiler. If you comment out all GPU calls in the application, do you still have a memset32_aligned1D call in CVP?
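To illustrate the host/device distinction, here is a minimal sketch rather than code from the original application. The kernel name `zero_scratch` and the helper `run_memset_demo` are hypothetical. It assumes a CUDA-capable device and a toolkit that supports `memset` in device code. The host calls `cudaMemset` on device memory, while the kernel uses the plain C `memset` on a per-thread buffer.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread zeroes a small local buffer with the
// device-side memset and writes the sum of its elements to out[i].
__global__ void zero_scratch(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    int scratch[4];
    memset(scratch, 0, sizeof(scratch));        // plain memset: allowed in device code
    // cudaMemset(scratch, 0, sizeof(scratch)); // host function: rejected by nvcc inside a kernel

    out[i] = scratch[0] + scratch[1] + scratch[2] + scratch[3];
}

// Hypothetical helper: returns the number of non-zero results,
// or -1 if any CUDA call fails.
int run_memset_demo()
{
    const int n = 256;
    int h_out[n];
    int *d_out = NULL;

    if (cudaMalloc((void **)&d_out, n * sizeof(int)) != cudaSuccess) return -1;

    // Host-side cudaMemset: fill the output with 0xFF bytes so that the
    // kernel's writes are visible afterwards.
    cudaError_t err = cudaMemset(d_out, 0xFF, n * sizeof(int));

    if (err == cudaSuccess) {
        zero_scratch<<<(n + 127) / 128, 128>>>(d_out, n);
        err = cudaGetLastError();               // catches launch errors
    }
    if (err == cudaSuccess) {
        // cudaMemcpy waits for the kernel to finish before copying.
        err = cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    }
    cudaFree(d_out);

    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return -1;
    }

    int nonzero = 0;
    for (int i = 0; i < n; ++i) {
        if (h_out[i] != 0) ++nonzero;
    }
    return nonzero;
}
```

Calling `run_memset_demo()` from `main` and profiling the result would show whether the host-side `cudaMemset` call is what produces an extra entry next to `zero_scratch` in the profiler output.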

Perhaps the application was compiled in CUDA 3.2 emulation mode?

Inside the kernel I'm using just memset, not cudaMemset. I have tested it in other kernels and it seems to work.

I checked the binary I was running and it's the right one.

If I uncomment all GPU calls I still have a memset32_aligned1D call.

I tried using a for loop in place of the memset, but nothing changed in CVP.

But now I tried commenting out the kernel calls. In CVP the kernel calls are gone, so I know it's the right binary I'm running.

There are only cudaMalloc, cudaMemcpy, and cudaFree calls left in the code, and memset32_aligned1D is still there.

The Profiler Output tab shows that memset32_aligned1D is the first method called; its GPU Timestamp is 0.

I'm compiling it with CUDA 4.0 in debug mode.
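For anyone who wants to reproduce this in isolation, the stripped-down case described above can be sketched as follows. This is a hypothetical stand-in, not the actual application. The helper name `run_copy_only` and the buffer size are assumptions. It contains only cudaMalloc, cudaMemcpy, and cudaFree, with no kernels and no cudaMemset.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: copies a buffer to the device and back using only
// cudaMalloc, cudaMemcpy, and cudaFree. Returns the number of mismatched
// elements, or -1 if any CUDA call fails.
int run_copy_only()
{
    const int n = 1024;
    float h_in[n];
    float h_out[n];
    for (int i = 0; i < n; ++i) {
        h_in[i] = (float)i;
        h_out[i] = -1.0f;   // sentinel, overwritten by the copy back
    }

    float *d_buf = NULL;
    cudaError_t err = cudaMalloc((void **)&d_buf, n * sizeof(float));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return -1;
    }

    // Host -> device, then device -> host; no kernel launches in between.
    err = cudaMemcpy(d_buf, h_in, n * sizeof(float), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        err = cudaMemcpy(h_out, d_buf, n * sizeof(float), cudaMemcpyDeviceToHost);
    }
    cudaFree(d_buf);

    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemcpy failed: %s\n", cudaGetErrorString(err));
        return -1;
    }

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (h_out[i] != h_in[i]) ++mismatches;
    }
    return mismatches;
}
```

If a program that only calls `run_copy_only()` from `main` still shows memset32_aligned1D in the profiler, that would suggest the entry does not come from anything in the application code itself.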
