Hi, I am rather new to CUDA, so sorry if this question has been asked before.
I would like to know more about why we should copy data from the host to the device and back. I have found some tutorials that show how to copy an array from host to device, modify it there, copy it back to the host, and display the result. I’ve modified the source code to read the device array directly from host code, without copying it back to the host first, and it still seems to work.
Here is the original code:
[codebox]int block_size = 4;
int n_blocks = N / block_size + ( N % block_size == 0 ? 0 : 1 );
square_array <<< n_blocks, block_size >>> ( a_d, N );
// Retrieve result from device and store it in host array
cudaMemcpy( a_h, a_d, sizeof( float ) * N, cudaMemcpyDeviceToHost );
// Print results
for ( int i = 0; i < N; i++ )
    printf( "%d %f\n", i, a_h[i] );
// Cleanup
free( a_h );
cudaFree( a_d );[/codebox]
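For context, the part of the tutorial that comes before this excerpt is roughly as follows. This is only my sketch of it, not the tutorial's exact code: the kernel body is an assumption based on its name, and the helper `square_on_device` is a hypothetical wrapper I made up to keep the allocate, copy in, launch, copy out steps in one place.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumed kernel body: squares each element of the device array in place.
__global__ void square_array( float *a, int N )
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if ( idx < N )              // guard threads in the last, partially filled block
        a[idx] = a[idx] * a[idx];
}

// Hypothetical helper: squares the N floats in host array a_h using the GPU.
// Returns the first CUDA error encountered, or cudaSuccess.
cudaError_t square_on_device( float *a_h, int N )
{
    float *a_d = NULL;          // pointer into device memory
    size_t size = sizeof( float ) * N;

    cudaError_t err = cudaMalloc( (void **) &a_d, size );
    if ( err != cudaSuccess )
        return err;

    // Send the input from host memory to device memory
    err = cudaMemcpy( a_d, a_h, size, cudaMemcpyHostToDevice );
    if ( err == cudaSuccess ) {
        int block_size = 4;
        int n_blocks = N / block_size + ( N % block_size == 0 ? 0 : 1 );
        square_array <<< n_blocks, block_size >>> ( a_d, N );
        err = cudaGetLastError();   // reports launch failures
    }

    // Retrieve result from device and store it in host array
    if ( err == cudaSuccess )
        err = cudaMemcpy( a_h, a_d, size, cudaMemcpyDeviceToHost );

    cudaFree( a_d );
    return err;
}
```

Checking the returned `cudaError_t` (for example with `cudaGetErrorString`) should at least show whether each step succeeded.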
Here is my modification:
[codebox]int block_size = 4;
int n_blocks = N / block_size + ( N % block_size == 0 ? 0 : 1 );
square_array <<< n_blocks, block_size >>> ( a_d, N );
// Print results
for ( int i = 0; i < N; i++ )
    printf( "%d %f\n", i, a_d[i] );
// Cleanup
cudaFree( a_d );[/codebox]
Can anyone tell me what the difference is between the two versions? Thanks.
EDIT: I think I posted this topic in the wrong forum; it is supposed to be in CUDA Programming and Development. Sorry.