Streams and Texture Memory

I am trying to split some computations over two GPUs, so I want to use two streams (one per GPU).

I grasp the basic idea of streams; my problem is when I add texture memory into the mix. Is there a stream-aware, asynchronous version of cudaBindTextureToArray? Or will it work regardless?

I do know that the best way is always to “play with it” and see what happens, but when the code is long enough and you start to get errors, it is nice to know that they are not all coming from here…

Thank you,

Streams are intended for asynchronous operations within a single context (i.e., on a single GPU). They don’t really have anything to do with multi-GPU programming, because the CUDA model uses a different context per GPU.
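To illustrate the point above, here is a minimal sketch (names and dimensions are made up): a stream is tied to the context it was created in, so a stream created while device 0 is current cannot be used from device 1’s context in the classic one-context-per-GPU model.

```cuda
cudaStream_t s;

cudaSetDevice(0);          // subsequent calls target device 0's context
cudaStreamCreate(&s);      // s belongs to device 0's context

cudaSetDevice(1);          // now in device 1's context
// Launching into s from here is invalid, because s does not exist
// in this context:
// kernel<<<grid, block, 0, s>>>(...);   // would fail
```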

Thank you for your answer, I see your point.

However, in the SDK example Simple Multi-GPU you can see how to do something like:

for ( int gpu = 0; gpu < number_of_GPUs; ++gpu )
{
		cudaSetDevice ( gpu );
		cudaStreamCreate ( &stream[gpu] );
}

and then

for ( int gpu = 0; gpu < number_of_GPUs; ++gpu )
{
		cudaSetDevice ( gpu );

		cudaMemcpyAsync( d_Input[gpu], h_Input[gpu], size, cudaMemcpyHostToDevice, stream[gpu] );

		kernel<<<BLOCK_N, THREAD_N, 0, stream[gpu]>>>(d_Output[gpu], d_Input[gpu], N);

		cudaMemcpyAsync( h_Output[gpu], d_Output[gpu], size, cudaMemcpyDeviceToHost, stream[gpu] );
}


This achieves something like what I want to do. Now I want to add texture memory into the mess… Any hints?

Right, but the only part of the code you have posted that is specifically related to multi-GPU is the cudaSetDevice call. The rest is only about asynchronous operations within a single GPU context. If you have a texture, it will be defined in each context you initialize on each GPU. Think of anything after a cudaSetDevice call as occurring in the scope of a given context. This includes global memory symbols, textures, kernels, and anything else with context-level scope.
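Concretely, that might look like the sketch below (not from the SDK sample; the texture reference and the per-GPU array names are assumptions). Because a texture reference has context scope, you bind it once per device, after the corresponding cudaSetDevice call. Note that cudaBindTextureToArray is a cheap host-side call with no stream parameter; it does not need an asynchronous variant.

```cuda
// Texture reference (pre-CUDA-5 texture reference API); a separate
// instance of this symbol exists in each GPU's context.
texture<float, cudaTextureType2D, cudaReadModeElementType> tex;

for ( int gpu = 0; gpu < number_of_GPUs; ++gpu )
{
		cudaSetDevice ( gpu );                        // enter this GPU's context

		// Binds the copy of 'tex' that lives in the current context to a
		// cudaArray previously allocated on this device (d_array[gpu] is
		// a hypothetical per-GPU array).
		cudaBindTextureToArray ( tex, d_array[gpu] );
}
```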

Thank you avidday, that really solved my doubts!