My code looks like this:

cudaMalloc((void**)&device_a, SIZE);

cudaMemset(device_a, FLT_MAX, SIZE);

I found that cudaMemset doesn't work at all here.

In the CUDA programming guide appendix I found:
CUresult cuMemsetD32(CUdeviceptr dstDevice,unsigned int value, unsigned int count);

However, if I try cuMemsetD32(device_a, FLT_MAX, SIZE);
the compiler says it can't convert float* to CUdeviceptr.

Does anyone know how to do this?

memset() writes one byte at a time. Although the second argument is an int, it gets truncated to a byte, so your FLT_MAX never reaches memory as a float. This is the behavior of the C Standard Library, and cudaMemset() behaves the same way.

Additionally, the last time I checked cudaMemset() was unreasonably slow. I recommend just writing a very simple kernel that fills your memory with the values you need.
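A minimal sketch of such a fill kernel, assuming you want each float element set to the same value (the launch configuration below is just an example):

```cuda
#include <cfloat>

// Grid-stride loop: each thread writes `value` into every n-th element,
// so any grid size covers the whole buffer.
__global__ void fillKernel(float *data, size_t n, float value)
{
    for (size_t i = blockIdx.x * blockDim.x + threadIdx.x;
         i < n;
         i += (size_t)gridDim.x * blockDim.x)
        data[i] = value;
}

// Host side, for a buffer of SIZE bytes:
// fillKernel<<<256, 256>>>(device_a, SIZE / sizeof(float), FLT_MAX);
```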

cuMemsetD32() might do what you need, but you cannot mix Driver API functions (those that start with cu) with Runtime API functions (those that start with cuda).

For cuMemsetD32() you need additional type casts, and you must also link against cuda.lib (the Driver API library), not just cudart.lib. For example:

size_t N = ...
float *pX, *pY;

checkCudaErrors(cudaMallocManaged(&pX, N * sizeof(float)));
checkCudaErrors(cudaMallocManaged(&pY, N * sizeof(float)));

float s1 = 1.0f;
float s2 = 2.0f;

// Cast the pointer to CUdeviceptr, and pass the float's raw bit pattern
// as the unsigned int value, since cuMemsetD32 writes 32-bit words.
cuMemsetD32(reinterpret_cast<CUdeviceptr>(pX), *reinterpret_cast<unsigned int*>(&s1), N);
cuMemsetD32(reinterpret_cast<CUdeviceptr>(pY), *reinterpret_cast<unsigned int*>(&s2), N);

Please note the reinterpret_cast for the value passed in as well.

(I just realized this thread is from 2008, but since I had already typed this up, here it goes :-) )