Question about Volume Rendering & cudaArray (s) taking the SDK example, modifying to accept 16bi


I am still a novice as far as it comes to CUDA (and for that matter, I’m relatively new to C++). My task the last few weeks has been to convert the SDK volumeRender example to read 16bit .RAW files. Actually, the datasets I am trying to read are of the 10/12 bit varety (…tasets/new.html).

I’ve poked around the forum a little bit and found the most pertinent information a little bit unfinished: Despite what has been said, the opening 16bit files is not as straightforward as one should believe. When replacing this codeblock from uchar to ushort (1byte/8bits to 16bits), there is an error at cudaMemCpy3D.


void initCuda(ushort *h_volume, cudaExtent volumeSize)


// create 3D array

cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<uchar>();

cutilSafeCall( cudaMalloc3DArray(&d_volumeArray, &channelDesc, volumeSize) );

// copy data to 3D arraycudaMemcpy3DParms copyParams = {0};

copyParams.srcPtr   = make_cudaPitchedPtr((void*)h_volume, volumeSize.width*sizeof(uchar), volumeSize.width, volumeSize.height);

copyParams.dstArray = d_volumeArray;

copyParams.extent   = volumeSize;

copyParams.kind     = cudaMemcpyHostToDevice;

cutilSafeCall( cudaMemcpy3D(&copyParams) );


I have used the correct TypeDef for ushort and seemed to have approriately modified the code to deal with 2bytes of .RAW data instead of the 1byte uchars. The fopen/fread have appropriately been passed parameters which should allow for opening the correct data.

After frustration, I’ve decided to play with different datatypes… Since I noticed that the dstArray in the cudaMalloc3DArray used the cudaArray datastructure, I tried using this as the source array for h_volume. I have attached the modified program called “volumeArray_cudaArray”.

I have some “newbie” like questions such as,

1.) Can cudaArrays be instantiated & used on the host?

the error with the cudaArray-source program seems to not allow memory copies from the host to device.

2.) What type of easier to understand remedies can I use to solve this problem? I haven’t found much documentation about some of the features used in SDK examples besides (…line/index.html)

Thanks in advance, also, let me know if I can clarify anything. I am using CUDA SDK 2.3 with Visual Studio 2008. If you would like to get the sourcecode working, download the headMRT scan (… and place it within the “/data” “bucky.raw” folder under the attached VS solution file.

I don’t want to give up, in fact I can’t, so thanks again! (2.01 MB) (1.2 MB)

Okay, since so many people seem to be having trouble with this, I’ve modified the volume rendering code to support 16-bit data.

The modified files are attached, just drop these into the exisiting SDK directory. I added a typedef “VolumeType” in the .cpp and .cu files which you can change to whatever type you want to use.

Note that the 16-bit data files you linked to only contain 10-bit data, so I’ve added a scale times 64 in the rendering code. This is a bit inefficient, you should do the scale at load time really.

Also, the code assumes a cubical volume so the images might come out a bit distorted with non-square volumes. I leave this as an exercise for the reader! (8.75 KB)

Does this new version already include the performance enhancements for rendering?


Quick question to those in the know- what exactly is the shrUtils.h header file? I see it noted in the OpenCL Getting Started Guide on page 17. I’m still using the 2.3SDK and was just curious about the discrepancy. It appears that a few user’s have made a stab at this question:

Thanks again Simon, I’m working with your code. I may have to update to the 3.0 kit for shrUtils (I don’t have the best feeling about copying that file in a mixed-version include files).

Backported this to CUDA SDK 2.3 for the good of humanity (no actually for purely selfish reasons) - see attachment. Place the mrt16_angio.raw file in the data folder for this sample to run.

I will update my fun VoxelPlancton app with the improved rendering code soon.

Christian (13.7 KB)

Thanks Simon & Cbuchner! I’m not sure where I was getting stuck with displayed the 16bit datasets, but the volumeType solution makes sense. General question,

What would be an effective way of boosting the performance of this volume rendering code? I was thinking of working with memory locality to ensure that all points on the ray (being casted) are sequentially loaded into memory.

I have read about automatic use of registers (shared memory)… my goal would be to increase memory coalescing. MY hardware isn’t Nexus compatible(8800GTS & FX 570M Quadro), but I’d like to try to take advantage of basic compute capability 1.0 specifications.

Please let us know about improvements you were able to make. It would helpful.

I’d start with empty-space skipping. The course notes here are very good:

I have two questions regarding your code …

Ref: //updated to use texture for display instead of glDrawPixels.

Q1: Is this done just for the speed ? Is there any other implication ?

Ref: //changed to render from front-to-back rather than back-to-front

Q2: Is there a difference in speed of execution, final result or quality of the result ? Could both the ways used interchangeably?


Cool, I propose two answers

A1: It uses a faster and more hardware accelerated code path in the driver -> less CPU utilization for rendering.

A2: Fully saturated rays can do an early abort with this method (i.e. the voxels behind them don’t change the visual result anymore, so skip them alltogether). -> less wasted GPU cycles.

Hi all,

I have also 2 questions :shifty: :

-Concerning volume distorsion:

=>How could we correct the images distorsions with non-square volumes (with the scale parameters of the modelview matrix)?

-Concerning volume visualisation:

=> When we increase the ray step (a phenomen occurs :pirate: ) the volume seems to be crop. Does this fact is dued to the blending calculation?

Thanks for your help :teehee:

Thanks. Awesome.

Somebody can help me please :turned: ?

In the method of Compositing in Ray Casting, we accumulate the densities at a distance of ‘Ray Step Size’ along the ray casted into the volume. When we increase the Ray Step Size, we are missing on the intermediate densities thereby loosing details.

I think, for correcting the image distortions for non square volumes, the step where we copy memory to CUDA’s pitch size has to be changed, as what is being done presently would store improper data in the 3D texture. I am not sure though. Suggestions would be appreciated.

I think, for correcting the image distortions for non square volumes, the step where we copy memory to CUDA’s pitch size has to be changed, as what is being done presently would store improper data in the 3D texture. I am not sure though. Suggestions would be appreciated.

Hum I thought it was due to trilinear interpolation and normalization made in the 3D texture not the pitch size :turned: . Somebody have an explanation on how to represent correctly non square volumes?



What prohibits to keep an adaptive Step Size to keep sampling density along ray according to each step contribution to integral? If it is done right the number of samples may depends logarithmically on size of volume dimension. The regular sampling is such a waste.


In fact, in the one hand we’ve a volume with non cubic voxel and in the other we use a normalize texture which is supposed to use cubic texel. So, if we want to correct the distorsion one solution could be to “modify” the modelview matrix with a scale factor with the same projection mode (orthographic). An other solution could be to resample the volume to be square (the use of anisotropic filter). Somebody see an other technic?