texture binding and stream

Hi,

I’m wondering whether texture binding works correctly when used in the following way:

cudaBindTextureToArray // bind to an array
someKernel<<<,,,stream>>> // launch asynchronous kernel
cudaBindTextureToArray // bind the same texture reference to another array
anotherKernel<<<,,,stream>>> // launch another asynchronous kernel in the same stream

Will the first kernel execution still use the first array that was bound to the texture reference?

Thanks!

I know that this question was posted a long time ago, but I have been wondering the same thing. Can anyone comment?

Can we do:

cudaBindTextureToArray //binding to an array

someKernel<<<,,,stream>>> //launch asynchronous kernel 

cudaBindTextureToArray //binding to another array;

anotherKernel<<<,,,stream>>>//launch another asynchronous kernel in same stream.

Or should we do:

cudaBindTextureToArray //binding to an array

someKernel<<<,,,stream>>> //launch asynchronous kernel 

cudaStreamSynchronize(stream)   // synchronize first!  (don't rebind until someKernel<<<>>> has completed)

cudaBindTextureToArray //binding to another array;

anotherKernel<<<,,,stream>>>//launch another asynchronous kernel in same stream.

Thanks!

I’d wager the synchronize would be pretty essential in getting the 2nd binding to play well with the kernel launches… But I’m likely heading down this road as well and will find out what to do when I get there. I’m using the driver API on Windows and linking to a PTX string resource, so to rebind I’d likely be using some combination of…

CF_CHECK_CALL( rcCUDA, cuModuleGetTexRef( &hDeviceTextureUnit, hModule, szName ) );
CF_CHECK_CALL( rcCUDA, cuTexRefGetFormat( &aChannelFormat, &iChannels, hDeviceTextureUnit ) );
CF_CHECK_CALL( rcCUDA, cuTexRefSetFlags( hDeviceTextureUnit, CU_TRSF_NORMALIZED_COORDINATES ) );
CF_CHECK_CALL( rcCUDA, cuTexRefSetFilterMode( hDeviceTextureUnit, eFilterMode ) );
CF_CHECK_CALL( rcCUDA, cuArrayCreate( &hDeviceArrayMem, &ad ) );
CF_CHECK_CALL( rcCUDA, cuTexRefSetArray( hDeviceTextureUnit, hDeviceArrayMem, CU_TRSA_OVERRIDE_FORMAT ) );

… with a different codepath for linear memory instead of an array.

Best of luck!

I got some information from one of NVIDIA’s developers of CUDA, and I can finally put this very old thread to rest:

“No synchronization necessary. The bindings are versioned under the hood, so you’ll just keep going asynchronously and the prior launch will still be churning with the old binding. Eventually the HW may run out of binding versions, but that’ll take a while, and when it does the driver will take care of the necessary synchronization internally.”
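
For anyone who wants to check this on their own setup, here is a minimal sketch of the rebind-without-synchronize pattern using the runtime API. It assumes a toolkit that still ships the legacy texture reference API (it was deprecated and then removed in CUDA 12). The names texRef, readRow0, CHECK, and the array sizes are made up for illustration and are not from the posts above.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Legacy file-scope texture reference (assumes a pre-CUDA-12 toolkit).
texture<float, 2, cudaReadModeElementType> texRef;

// Hypothetical kernel: each thread reads one texel from row 0 of whatever
// array is bound to texRef for this launch.
__global__ void readRow0(float *out, int width)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    if (x < width)
        out[x] = tex2D(texRef, x + 0.5f, 0.5f);
}

// Illustrative error-check helper; only valid inside main() because it returns.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            fprintf(stderr, "%s failed: %s\n", #call,                   \
                    cudaGetErrorString(err_));                          \
            return 1;                                                   \
        }                                                               \
    } while (0)

int main()
{
    const int width = 256, height = 4;
    const size_t rowBytes = width * sizeof(float);

    // Two source images that are easy to tell apart: all 1s and all 2s.
    std::vector<float> hostA(width * height, 1.0f);
    std::vector<float> hostB(width * height, 2.0f);

    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray *arrayA = nullptr, *arrayB = nullptr;
    CHECK(cudaMallocArray(&arrayA, &desc, width, height));
    CHECK(cudaMallocArray(&arrayB, &desc, width, height));
    CHECK(cudaMemcpy2DToArray(arrayA, 0, 0, hostA.data(), rowBytes,
                              rowBytes, height, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy2DToArray(arrayB, 0, 0, hostB.data(), rowBytes,
                              rowBytes, height, cudaMemcpyHostToDevice));

    float *outA = nullptr, *outB = nullptr;
    CHECK(cudaMalloc(&outA, rowBytes));
    CHECK(cudaMalloc(&outB, rowBytes));

    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    // Bind, launch, rebind, launch: no synchronization in between.
    CHECK(cudaBindTextureToArray(texRef, arrayA, desc));
    readRow0<<<1, width, 0, stream>>>(outA, width);
    CHECK(cudaBindTextureToArray(texRef, arrayB, desc));
    readRow0<<<1, width, 0, stream>>>(outB, width);
    CHECK(cudaGetLastError());

    // Synchronize only once, before reading the results back.
    CHECK(cudaStreamSynchronize(stream));
    std::vector<float> resA(width), resB(width);
    CHECK(cudaMemcpy(resA.data(), outA, rowBytes, cudaMemcpyDeviceToHost));
    CHECK(cudaMemcpy(resB.data(), outB, rowBytes, cudaMemcpyDeviceToHost));

    // If the first launch kept its original binding, resA holds values from
    // arrayA and resB holds values from arrayB.
    printf("first launch read %f, second launch read %f\n", resA[0], resB[0]);

    CHECK(cudaUnbindTexture(texRef));
    CHECK(cudaFreeArray(arrayA));
    CHECK(cudaFreeArray(arrayB));
    CHECK(cudaFree(outA));
    CHECK(cudaFree(outB));
    CHECK(cudaStreamDestroy(stream));
    return 0;
}
```

A kernel this small may well finish before the second bind is even issued, so a matching result here does not prove much on its own. A longer-running first kernel would be a more convincing test of the behaviour described in the quote.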