If you go to page 20 of CudaProgramming_guide_2.3.pdf, you will see the matrix multiplication example. I understand the general idea of that code:
they allocate device memory with cudaMalloc and copy the matrix elements from the host into it, so the kernel function can read that data from video memory. That part is fine and understandable, but
what about the width and height values of the Matrix struct? There is no cudaMalloc for them; they just go inside the struct as an argument to the kernel. How does CUDA know that these two variables, width and height, can be read when no cudaMalloc was ever done for them? How can the kernel read them?
They don't even call cudaFree for them, so it looks like they stayed in host memory the whole time. That is a little confusing, since kernels do not read host memory, do they?
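To make the question concrete, here is a minimal sketch of the pattern I mean. It is not the guide's exact code: the kernel doubleElements and the helper runDemo are made-up names for illustration, and the Matrix struct layout is only how I remember it from the example.

```cuda
#include <cuda_runtime.h>

// Assumed layout of the guide's Matrix struct: two plain ints plus a pointer.
struct Matrix {
    int width;
    int height;
    float* elements;
};

// Hypothetical kernel (not from the guide): doubles every element.
// It reads m.width and m.height directly, although neither was cudaMalloc'ed.
__global__ void doubleElements(Matrix m)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < m.height && col < m.width)
        m.elements[row * m.width + col] *= 2.0f;
}

// Hypothetical helper: returns 0 if every element came back doubled, 1 otherwise.
int runDemo()
{
    const int W = 4, H = 3;
    float host[W * H];
    for (int i = 0; i < W * H; ++i) host[i] = (float)i;

    Matrix d;
    d.width = W;    // plain host-side assignment, no cudaMalloc
    d.height = H;   // same here
    size_t bytes = W * H * sizeof(float);
    if (cudaMalloc((void**)&d.elements, bytes) != cudaSuccess) return 1;
    cudaMemcpy(d.elements, host, bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((W + block.x - 1) / block.x, (H + block.y - 1) / block.y);
    doubleElements<<<grid, block>>>(d);  // the whole struct is handed over as one argument

    // Blocking copy back; it also reports an error if the kernel failed.
    cudaError_t err = cudaMemcpy(host, d.elements, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d.elements);  // only the pointer member is ever freed
    if (err != cudaSuccess) return 1;

    for (int i = 0; i < W * H; ++i)
        if (host[i] != 2.0f * i) return 1;
    return 0;
}
```

Calling runDemo() from main and compiling with nvcc should be enough to try it. The only cudaMalloc and cudaFree calls are for d.elements, yet the kernel still uses m.width and m.height for its bounds check, which is exactly the part I am asking about.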
Any help clearing up these questions would be appreciated. It's really confusing me, lol.