In this Discussion
- franzdaubner February 8
- fwende February 8
Interactive updating the texture volume
Hi,
I am trying to update a texture (cudaArray) with CUDA while rendering it with ray casting.
Because the device-to-device copy takes several seconds on my Quadro 4000,
the ray casting cannot stay interactive when there are many copy events.
Could you suggest a way to update the texture (cudaArray) directly, or any better approach?
Here is my CUDA code:
// 1. Variable declarations
cudaArray *d_cudaArray = 0;
texture<unsigned char, 3, cudaReadModeNormalizedFloat> tex_volume;
unsigned char *d_Edited_Volume_uc;
// 2. copy data to 3D array (Device to Device)
// There are multiple copies during the ray-casting in my implementation
cudaMemcpy3DParms copyParams = {0};
cudaExtent volumeSize = make_cudaExtent(width, height, depth);
copyParams.srcPtr = make_cudaPitchedPtr(d_Edited_Volume_uc, width*sizeof(unsigned char), width, height);
copyParams.dstArray = d_cudaArray;
copyParams.extent = volumeSize;
copyParams.kind = cudaMemcpyDeviceToDevice;
cutilSafeCall( cudaMemcpy3D(&copyParams) ); // --> Takes 3-4 seconds (the main bottleneck)
// 3. 3D texture binding
cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<unsigned char>();
cutilSafeCall(cudaBindTextureToArray(tex_volume, d_cudaArray, channelDesc));
// 4. Texture fetching in the ray-casting algorithm
float sample = tex3D(tex_volume, texCoord.x, texCoord.y, texCoord.z);
After the volume (d_Edited_Volume_uc, width*height*depth = 256^3) is updated with CUDA, it is copied to "d_cudaArray" for the texture binding. That copy takes most of the time and is the main bottleneck for interactive ray casting. Without the volume copy, rendering runs at around 10 FPS.
I am wondering whether I can update the texture memory directly with CUDA during the ray casting,
or whether there is a better way to keep the rendering interactive with multiple texture copies.
Thank you in advance.
2 Comments
Hmm, in my opinion device-to-device copying should not be that slow. Did you measure the throughput of your cudaArray copy operation compared to a plain device-to-device memcpy (preferably of the same size)?
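One way to make that comparison is to time both copies with CUDA events. The following is only a sketch: the helper names `timeCopyToArray` and `timeLinearCopy` are made up for illustration, and it assumes an unsigned char volume stored without row padding, as in the question.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: time one device-to-device copy from linear memory
// into a 3D cudaArray. Returns milliseconds, or -1 if the copy failed.
float timeCopyToArray(const unsigned char* d_src, cudaArray_t dstArray,
                      size_t w, size_t h, size_t d)
{
    cudaMemcpy3DParms p = {0};
    p.srcPtr   = make_cudaPitchedPtr((void*)d_src, w * sizeof(unsigned char), w, h);
    p.dstArray = dstArray;
    p.extent   = make_cudaExtent(w, h, d);   // in elements, since an array is involved
    p.kind     = cudaMemcpyDeviceToDevice;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    cudaError_t err = cudaMemcpy3D(&p);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = -1.0f;
    if (err == cudaSuccess) cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

// Hypothetical helper: time a plain linear device-to-device copy of the same size.
float timeLinearCopy(unsigned char* d_dst, const unsigned char* d_src, size_t bytes)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    cudaError_t err = cudaMemcpy(d_dst, d_src, bytes, cudaMemcpyDeviceToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = -1.0f;
    if (err == cudaSuccess) cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}
```

Calling both helpers a few times with the same 256^3 volume and ignoring the first (warm-up) run should show whether the array copy is really orders of magnitude slower than the linear one.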
CUDA arrays have some restrictions concerning alignment, so if the alignment of your source data does not match the requirements of the cudaArray, the copy operation might actually be performed line by line, which would slow things down considerably.
If I remember correctly, as of toolkit 4.1 you can query the alignment restrictions for the texture units.
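The query presumably meant here is the `textureAlignment` and `texturePitchAlignment` fields of `cudaDeviceProp`. A small sketch follows; the helper names `printTextureAlignment` and `pitchIsAligned` are invented for this example, and whether these values actually explain the slow cudaMemcpy3D is only a guess, since they are documented for textures bound to linear or pitched memory.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical helper: print the texture alignment requirements of a device.
// Returns false if the device properties could not be read.
bool printTextureAlignment(int device)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return false;
    printf("%s: textureAlignment = %zu bytes, texturePitchAlignment = %zu bytes\n",
           prop.name, prop.textureAlignment, prop.texturePitchAlignment);
    return true;
}

// Hypothetical helper: true if a row pitch (in bytes) is a multiple of the
// given alignment.
bool pitchIsAligned(size_t pitchBytes, size_t alignment)
{
    return alignment != 0 && pitchBytes % alignment == 0;
}
```

For the 256^3 unsigned char volume in the question the row pitch is 256 bytes, so `pitchIsAligned(256, prop.texturePitchAlignment)` would be the check to run on the Quadro 4000.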
Unfortunately, there is no equivalent to cudaBindTexture2D for 3-dimensional textures. Such a function would let you skip the copy operation entirely, because the texture could be bound directly to the linear memory.
Maybe you can write the data directly to 'd_cudaArray' from within the kernel that fetches the texture, instead of copying it to 'd_cudaArray' after the kernel execution. After the kernel has finished, you can bind the texture to 'd_cudaArray' without a memcpy.
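Writing into a cudaArray from a kernel goes through the surface API. Below is a rough sketch using surface objects, which need compute capability 3.0 or newer; on a Fermi card such as the Quadro 4000 the older surface reference API would be the counterpart. The names `makeWritableVolume` and `fillBox` are hypothetical, and the "edit" is just a constant fill standing in for whatever the real editing kernel computes.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: allocate a 3D unsigned char array that kernels may
// write to, and create a surface object for it.
cudaError_t makeWritableVolume(cudaArray_t* arr, cudaSurfaceObject_t* surf,
                               int w, int h, int d)
{
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<unsigned char>();
    // cudaArraySurfaceLoadStore is required for surface writes.
    cudaError_t err = cudaMalloc3DArray(arr, &desc, make_cudaExtent(w, h, d),
                                        cudaArraySurfaceLoadStore);
    if (err != cudaSuccess) return err;

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = *arr;
    return cudaCreateSurfaceObject(surf, &res);
}

// Hypothetical edit kernel: set every voxel of a box to `value`.
// The box must lie completely inside the volume.
__global__ void fillBox(cudaSurfaceObject_t surf,
                        int x0, int y0, int z0,
                        int bw, int bh, int bd,
                        unsigned char value)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;
    if (x >= bw || y >= bh || z >= bd) return;

    // The x coordinate of a surface write is given in bytes.
    surf3Dwrite(value, surf, (x0 + x) * (int)sizeof(unsigned char), y0 + y, z0 + z);
}
```

One caveat: texture fetches in the same kernel launch are not guaranteed to see values written through the surface in that launch, so the edit and the ray casting are safer as separate launches.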
If you want to update the volume repeatedly, you can introduce a double buffer with each of the buffers bound to its own texture. This way, you need to set up and bind the textures just once.
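A possible shape for such a double buffer, again sketched with texture and surface objects (compute capability 3.0+) rather than the texture references used in the question. All struct and function names here are made up, and the texture settings (linear filtering, clamped normalized coordinates) are assumptions about what the ray caster wants.

```cuda
#include <cuda_runtime.h>

// One volume: the array plus a surface (for writing) and a texture (for reading).
struct VolumeBuffer {
    cudaArray_t         array;
    cudaSurfaceObject_t surf;
    cudaTextureObject_t tex;
};

// Hypothetical helper: allocate one buffer and create both objects once.
cudaError_t createVolumeBuffer(VolumeBuffer* b, int w, int h, int d)
{
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<unsigned char>();
    cudaError_t err = cudaMalloc3DArray(&b->array, &desc, make_cudaExtent(w, h, d),
                                        cudaArraySurfaceLoadStore);
    if (err != cudaSuccess) return err;

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = b->array;
    err = cudaCreateSurfaceObject(&b->surf, &res);
    if (err != cudaSuccess) return err;

    cudaTextureDesc td = {};
    td.addressMode[0]   = cudaAddressModeClamp;
    td.addressMode[1]   = cudaAddressModeClamp;
    td.addressMode[2]   = cudaAddressModeClamp;
    td.filterMode       = cudaFilterModeLinear;         // assumed setting
    td.readMode         = cudaReadModeNormalizedFloat;  // tex3D<float> returns values in [0, 1]
    td.normalizedCoords = 1;                            // assumed setting
    return cudaCreateTextureObject(&b->tex, &res, &td, NULL);
}

// Two buffers: the ray caster reads `front`, the edit kernel writes the other.
struct DoubleBuffer {
    VolumeBuffer buf[2];
    int          front;
};

inline VolumeBuffer& readBuffer(DoubleBuffer& db)  { return db.buf[db.front]; }
inline VolumeBuffer& writeBuffer(DoubleBuffer& db) { return db.buf[1 - db.front]; }
inline void swapBuffers(DoubleBuffer& db)          { db.front = 1 - db.front; }
```

Per frame, the edit kernel would write through `writeBuffer(db).surf`, the ray caster would sample `readBuffer(db).tex`, and `swapBuffers(db)` would run once both have finished. Note that the write buffer still holds the volume from two frames earlier, so the edit kernel has to write every voxel or re-apply the edits it missed.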