Memory allocation on Host / Host-to-Device Transfer

In several examples in the CUDA Programming Guide 2.3.1 they call

cudaMalloc((void**)&d_A, size);

and then later they call cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice);

In other words, they allocate the memory on the host and then copy it to the device, where it is used, in this case by a matrix multiplication kernel. Why bother with allocating on the host system? Why not just allocate on the device and skip the host-to-device transfer? It seems a lot easier. There may not be a command that allows cudaMalloc on the device, and that could be a valid point. However, the question is still valid. Maybe there should be such a command if there currently is not one.
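For reference, the pattern the guide's examples follow can be sketched as below. This is not the guide's matrix multiplication example; the `scale` kernel and sizes here are made up for illustration. The point is that the input data originates on the CPU (read from a file, typed in, computed), so it has to sit in host memory first, and the results have to come back to host memory before they can be printed or written out.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Trivial stand-in kernel: double each element in place.
__global__ void scale(float *d_A, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d_A[i] *= 2.0f;
}

int main() {
    const int n = 256;
    const size_t size = n * sizeof(float);

    // Host allocation: the CPU fills in the input here.
    float *h_A = (float *)malloc(size);
    for (int i = 0; i < n; ++i) h_A[i] = (float)i;

    // Device allocation: a separate buffer in GPU memory.
    // d_A itself lives on the host; its value is a device address.
    float *d_A;
    cudaMalloc((void **)&d_A, size);

    // Host -> device transfer, kernel launch, device -> host transfer.
    cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice);
    scale<<<(n + 127) / 128, 128>>>(d_A, n);
    cudaMemcpy(h_A, d_A, size, cudaMemcpyDeviceToHost);

    // The CPU can only print what is in host memory.
    printf("h_A[1] = %f\n", h_A[1]);

    cudaFree(d_A);
    free(h_A);
    return 0;
}
```

So allocating only on the device would not remove any step: the host buffer is where the data exists before the GPU is involved at all.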



cudaMalloc((void**)&d_A, size);

allocates device memory of "size" bytes and puts its address into d_A, where d_A is a host variable whose content is a device address. In most cases you need a host-to-device transfer (or vice versa), since you need to report results either to a file or to the screen.

Err, no. cudaMalloc allocates memory on the device, not the host. The cudaMemcpy is copying from another piece of host memory into the memory allocated by cudaMalloc. This is all covered in Ch. 3 of the programming guide…