copy object from host to device

Hi, I am rather new to CUDA, so I'm sorry if this question has been asked before.

I would like to understand why we need to copy data from host to device. I have found some tutorials that show how to copy an array from host to device, modify it in a kernel, copy it back to the host, and display the result. I modified the source code to read the device variable directly, without copying it back to the host, and it still works.

Here is the original code:

[codebox]int block_size = 4;

int n_blocks   = N / block_size + ( N % block_size == 0 ? 0 : 1 );

square_array <<< n_blocks, block_size >>> ( a_d, N );

// Retrieve result from device and store it in host array
cudaMemcpy( a_h, a_d, sizeof( float ) * N, cudaMemcpyDeviceToHost );

// Print results
for ( int i = 0; i < N; i++ )
    printf( "%d %f\n", i, a_h[i] );

// Cleanup
free( a_h );
cudaFree( a_d );[/codebox]

Here is my modification:

[codebox]int block_size = 4;

int n_blocks   = N / block_size + ( N % block_size == 0 ? 0 : 1 );

square_array <<< n_blocks, block_size >>> ( a_d, N );

// Print results
for ( int i = 0; i < N; i++ )
    printf( "%d %f\n", i, a_d[i] );

// Cleanup
cudaFree( a_d );[/codebox]

Can anyone tell me what the difference is? Thanks.

EDITED: I think I posted this topic in the wrong forum; it is supposed to be in CUDA Programming and Development. Sorry.

The device (GPU) and host (CPU) memories are two separate entities. To compute something on the GPU you need to copy
the data from the host's RAM to the GPU's memory, run the calculation in a kernel, and copy the result back to the host's RAM.
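A minimal sketch of that round trip, assuming the square_array kernel and the a_h / a_d / N names from the code above (the allocation lines are my own addition for illustration):

[codebox]float *a_h = (float *) malloc( sizeof( float ) * N );  // host array
float *a_d = NULL;
cudaMalloc( (void **) &a_d, sizeof( float ) * N );      // device array

// ... fill a_h on the host ...

// Copy input from host RAM to device memory
cudaMemcpy( a_d, a_h, sizeof( float ) * N, cudaMemcpyHostToDevice );

// Run the calculation on the GPU
square_array <<< n_blocks, block_size >>> ( a_d, N );

// Copy the result back to host RAM; only then can the CPU read it
cudaMemcpy( a_h, a_d, sizeof( float ) * N, cudaMemcpyDeviceToHost );[/codebox]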

Are you running your modified code in DEBUG mode or on an early CUDA version? DEBUG mode actually runs on the host, so
you are not really running anything on the GPU, and therefore the memory access appears to be fine…

eyal

You're confusing RELEASE / DEBUG with EMUDEBUG / EMURELEASE now.
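For context, EMUDEBUG / EMURELEASE were the device-emulation build configurations in the old CUDA SDK projects, i.e. builds made with nvcc's emulation flag (removed in later toolkits). In emulation mode the kernel actually runs on the CPU, which is why dereferencing a_d from host code can appear to work. A rough sketch of the two builds (the file name here is just an example):

[codebox]# Device-emulation build (old CUDA toolkits only; the flag was later removed).
# The "kernel" runs on the CPU, so host code can read a_d without crashing.
nvcc -deviceemu square.cu -o square_emu

# Normal build: a_d points to real device memory, so the host must
# cudaMemcpy the result back before reading it.
nvcc square.cu -o square[/codebox]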

Christian
