Not with ‘malloc’ as such; use ‘cudaMalloc’ instead, which allocates memory on the device (GPU).
e.g.
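float* device_buffer; // will hold a device address, not a host address
// CUDA_SAFE_CALL is the error-checking macro from the old CUDA SDK (cutil.h)
CUDA_SAFE_CALL(cudaMalloc((void**)&device_buffer, number_of_elements*sizeof(float)));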
Remember, you cannot access this memory directly from the host; copy it into host memory (e.g. with cudaMemcpy) first.
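For a fuller picture, here is a minimal sketch of the whole round trip: allocate with cudaMalloc, copy host to device with cudaMemcpy, run a kernel on the device buffer, copy the result back, then release it with cudaFree. The kernel `double_elements` and the helpers `double_on_device` and `report` are hypothetical names for illustration. Errors are checked explicitly with `cudaGetErrorString` rather than CUDA_SAFE_CALL, since that macro came from the old SDK and isn't part of the toolkit headers. It assumes you compile with nvcc and have a CUDA-capable GPU.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints a CUDA error and returns -1.
static int report(cudaError_t err, const char* what) {
    std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
    return -1;
}

// Hypothetical kernel: doubles every element in place, on the device.
__global__ void double_elements(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Hypothetical helper: copies host_in to the GPU, doubles it there and
// copies the result into host_out. Returns 0 on success, -1 on a CUDA error.
int double_on_device(const float* host_in, float* host_out, int number_of_elements) {
    if (number_of_elements <= 0) return 0;  // nothing to do; avoids a 0-block launch
    const size_t bytes = (size_t)number_of_elements * sizeof(float);

    // Allocate global memory on the device; device_buffer must not be
    // dereferenced on the host.
    float* device_buffer = nullptr;
    cudaError_t err = cudaMalloc((void**)&device_buffer, bytes);
    if (err != cudaSuccess) return report(err, "cudaMalloc");

    // Host -> device copy
    err = cudaMemcpy(device_buffer, host_in, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        const int threads = 256;
        const int blocks = (number_of_elements + threads - 1) / threads;
        double_elements<<<blocks, threads>>>(device_buffer, number_of_elements);
        err = cudaGetLastError();  // catches kernel launch errors
    }
    if (err == cudaSuccess) {
        // Device -> host copy; cudaMemcpy on the default stream waits
        // for the kernel to finish before copying.
        err = cudaMemcpy(host_out, device_buffer, bytes, cudaMemcpyDeviceToHost);
    }

    cudaFree(device_buffer);  // always release the device allocation
    return err == cudaSuccess ? 0 : report(err, "CUDA call");
}
```

Only `host_out` is read on the host afterwards; `device_buffer` is touched solely by CUDA calls and the kernel, which is the point of the note above. Freeing on every path after a successful cudaMalloc keeps error returns from leaking device memory.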