Writing to 3D texture

inthecrossfire · November 9, 2008, 4:45pm

As I understand it, you can’t write directly to a texture (cudaArray). The way it is done is to write to another part of memory and copy it over to the texture. I can get this to work easily in 2D but am having some issues getting it do work in 3D. My code is below:

texture<uchar, 3, cudaReadModeNormalizedFloat> tex;

uchar *d_writeable = 0;

cudaArray *d_texArray = 0;

size_t pitch = 0;

int fieldSize = 64;

cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<uchar>();

CUDA_SAFE_CALL( cudaMalloc3DArray(&d_texArray, &channelDesc, volumeSize) );

CUDA_SAFE_CALL( cudaMallocPitch((void**)&d_writeable, &pitch, sizeof(uchar)*fieldSize, fieldSize*fieldSize) );

CUDA_SAFE_CALL( cudaMemset2D(d_writeable, pitch, 0, fieldSize, fieldSize*fieldSize) );

CUDA_SAFE_CALL( cudaMemcpy2DToArray(d_texArray, 0, 0, d_writeable, pitch, sizeof(uchar)*fieldSize, fieldSize*fieldSize, cudaMemcpyDeviceToDevice) );

CUDA_SAFE_CALL(cudaBindTextureToArray(tex, d_texArray, channelDesc));

I’m trying to set up a 3D cuda array and bind it to a texture. Plus I’m trying to set up another part of memory that I can write to and then copy it across.

I have problems with the cudaMemcpy2DToArray (gives an invalid argument error at runtime). If I change the height argument from fieldSize*fieldSize to just fieldSize, it works fine (but obviously only copies part of the array). What am I missing? Thanks in advance.

Ailleur · November 9, 2008, 6:00pm

As the name implied, you shouldnt be using cudaMemcpy2DToArray to write to a 3d array.

Heres a quick cut and paste of the code i use

texture<short, 3, cudaReadModeNormalizedFloat> tdataTex;

cudaArray *tArray;

cudaChannelFormatDesc tfloatTex = cudaCreateChannelDesc<short>();;

	const cudaExtent tvolumeSize = make_cudaExtent(pdose->terma->x_dim, pdose->terma->y_dim, pdose->terma->z_dim);

	CUDA_SAFE_CALL( cudaMalloc3DArray(&tArray, &tfloatTex, tvolumeSize) );

	cudaMemcpy3DParms tcopyParams = {0};

	tcopyParams.srcPtr   = make_cudaPitchedPtr((void*)pdose->terma->data, tvolumeSize.width*sizeof(short), tvolumeSize.width, tvolumeSize.height);

	tcopyParams.dstArray = tArray;

	tcopyParams.extent   = tvolumeSize;

	tcopyParams.kind	 = cudaMemcpyHostToDevice;

	CUDA_SAFE_CALL( cudaMemcpy3D(&tcopyParams) );

	CUDA_SAFE_CALL(cudaBindTextureToArray(tdataTex, tArray, tfloatTex));

Note that this is for a texture of shorts, so dont forget to adapt it.

inthecrossfire · November 9, 2008, 6:39pm

Thanks for your reply. But you appear to be doing a HostToDevice copy. I’m trying to use a DeviceToDevice copy so that I can copy from a writeable part of memory onto the cudaArray (bound to a texture). The CUDA reference recommends using cudaMallocPitch for 3D arrays and I’ve been trying cudaMemcpyToArray and cudaMemcpy2DToArray (using cudaMemcpyDeviceToDevice) to copy the data to the cudaArray. Thanks.

inthecrossfire · November 9, 2008, 7:06pm

I realised that cudaMemcpy3D as you pointed out is probably what I’m looking for. So I adapted your code to:

cudaExtent volumeSize = make_cudaExtent(fieldSize, fieldSize, fieldSize);

cudaMemcpy3DParms tcopyParams = {0};

tcopyParams.srcPtr   = make_cudaPitchedPtr((void*)d_writeable, volumeSize.width*sizeof(uchar), volumeSize.width, volumeSize.height);

tcopyParams.dstArray = d_texArray;

tcopyParams.extent   = volumeSize;

tcopyParams.kind	 = cudaMemcpyDeviceToDevice;

CUDA_SAFE_CALL( cudaMemcpy3D(&tcopyParams) );

However the cudaMemcpy3D call fails again with an invalid argument error. Any ideas? Thanks.

Ailleur · November 9, 2008, 7:25pm

Again, copied straight from some code im using so it needs adapting.

static void prepareCudaTexture(float* d_source, cudaArray *a_Data,cudaChannelFormatDesc floatTex, const cudaExtent volumeSize)

{

	cudaMemcpy3DParms copyParams = {0};

	copyParams.srcPtr   = make_cudaPitchedPtr((void*)d_source, volumeSize.width*sizeof(float), volumeSize.width, volumeSize.height);

	copyParams.dstArray = a_Data;

	copyParams.extent   = volumeSize;

	copyParams.kind	 = cudaMemcpyDeviceToDevice;

	CUDA_SAFE_CALL( cudaMemcpy3D(&copyParams) );

	densityData.normalized = false;					  

	densityData.filterMode = cudaFilterModePoint;	  

	densityData.addressMode[0] = cudaAddressModeWrap;   // wrap texture coordinates

	densityData.addressMode[1] = cudaAddressModeWrap;

	densityData.addressMode[2] = cudaAddressModeWrap;

	// bind array to 3D texture

	CUDA_SAFE_CALL(cudaBindTextureToArray(densityData, a_Data, floatTex));

}

cudaChannelFormatDesc floatTex = cudaCreateChannelDesc<float>();;

	const cudaExtent volumeSize = make_cudaExtent(map->x_dim, map->y_dim, map->z_dim);

	CUDA_SAFE_CALL( cudaMalloc3DArray(&density_arr, &floatTex, volumeSize) );

	prepareCudaTexture(d_density,density_arr,floatTex,volumeSize);

where d_density is a pointer to global memory. It has been populated by another kernel that executed before this call.

So basicaly the kernel wrote its output to a 1d float array and im morphing it to a 3d texture.

Hope this helps

inthecrossfire · November 10, 2008, 12:51am

Thanks for all your help. I literally copied your code in and added in the necessary variables as needed but still get an invalid argument when cudaMemcpy3D is called. What size of 3D array are you using? I’m attempting a 64x64x64 grid. I also tried running it in emulation but that gave the same error as well. Any ideas? Thanks.

inthecrossfire · November 11, 2008, 1:19am

I managed to get it to work. Thought I’d post my code in case someone in the future is looking for the same thing. So here is a way of setting up writeable device memory and then copying it to a cudaArray. The key component is using the poorly documented cudaPitchedPtr. So code:

// Size of the field (which is a 3D cube)

int fieldSize = 8;

// Global variables for the two memory locations

cudaPitchedPtr d_writeable;

cudaArray *d_texArray = 0;

// Create the texture array (which will be read only)

cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<uchar>();

cudaExtent volumeSize = make_cudaExtent(fieldSize, fieldSize, fieldSize);

CUDA_SAFE_CALL( cudaMalloc3DArray(&d_texArray, &channelDesc, volumeSize) );

// Create the writeable part of memory using a cudaPitchedPtr

cudaExtent pitchedVolSize = make_cudaExtent(fieldSize*sizeof(uchar), fieldSize, fieldSize);

CUDA_SAFE_CALL(cudaMalloc3D(&d_writeable, pitchedVolSize));

// Setup the copy

cudaMemcpy3DParms copyParams = {0};

copyParams.srcPtr = d_writeable;

copyParams.dstArray = d_texArray;

copyParams.kind = cudaMemcpyDeviceToDevice;

copyParams.extent = volumeSize;

// Execute the copy

CUDA_SAFE_CALL( cudaMemcpy3D(&copyParams) );

As a side note, the cudaPitchedPtr is defined as:

struct cudaPitchedPtr

{

  void   *ptr;

  size_t  pitch;

  size_t  xsize;

  size_t  ysize;

};

So inside a kernel it could be accessed like:

__global__

void someFunction(cudaPitchedPtr cpp)

{

	int y = threadIdx.x;

	int x = blockIdx.x;

	int index = cpp.pitch*y + x;

	((uchar *)cpp.ptr)[index] = 10;

}

Obviously the above code would only access the first layer of the cube (where z=0) but it gives the idea.

Anyway, thanks for your help again.

carlos_mex · May 18, 2010, 2:38am

Hi

i’m having exactly the same problem as you, and found your method very useful. However, i can not get an index to access all of the members of the 3d structure, not necessarily a cube. can you help me out on the index?
thank you in advence,
cheers

Qwazyteb · November 18, 2010, 4:51pm

I managed to get it to work. Thought I’d post my code in case someone in the future is looking for the same thing. So here is a way of setting up writeable device memory and then copying it to a cudaArray. The key component is using the poorly documented cudaPitchedPtr. So code:

// Size of the field (which is a 3D cube)

int fieldSize = 8;

// Global variables for the two memory locations

cudaPitchedPtr d_writeable;

cudaArray *d_texArray = 0;

// Create the texture array (which will be read only)

cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<uchar>();

cudaExtent volumeSize = make_cudaExtent(fieldSize, fieldSize, fieldSize);

CUDA_SAFE_CALL( cudaMalloc3DArray(&d_texArray, &channelDesc, volumeSize) );

// Create the writeable part of memory using a cudaPitchedPtr

cudaExtent pitchedVolSize = make_cudaExtent(fieldSize*sizeof(uchar), fieldSize, fieldSize);

CUDA_SAFE_CALL(cudaMalloc3D(&d_writeable, pitchedVolSize));

// Setup the copy

cudaMemcpy3DParms copyParams = {0};

copyParams.srcPtr = d_writeable;

copyParams.dstArray = d_texArray;

copyParams.kind = cudaMemcpyDeviceToDevice;

copyParams.extent = volumeSize;

// Execute the copy

CUDA_SAFE_CALL( cudaMemcpy3D(&copyParams) );

As a side note, the cudaPitchedPtr is defined as:

struct cudaPitchedPtr

{

  void   *ptr;

  size_t  pitch;

  size_t  xsize;

  size_t  ysize;

};

So inside a kernel it could be accessed like:

__global__

void someFunction(cudaPitchedPtr cpp)

{

	int y = threadIdx.x;

	int x = blockIdx.x;

	int index = cpp.pitch*y + x;

	((uchar *)cpp.ptr)[index] = 10;

}

Obviously the above code would only access the first layer of the cube (where z=0) but it gives the idea.

Anyway, thanks for your help again.

Thank you, I had the exact same problem and your code helped me make it work.

The important detail I was missing is that you need two different extents.

The extent

[] for cudaMalloc3D is (wsizeof(datatype), h, d)

[*] for cudaMalloc3DArray and cudaMemcpy3DParms in cudaMemcpy3d is (w, h, d) - no sizeof(datatype)

Qwazyteb · November 18, 2010, 4:51pm

I managed to get it to work. Thought I’d post my code in case someone in the future is looking for the same thing. So here is a way of setting up writeable device memory and then copying it to a cudaArray. The key component is using the poorly documented cudaPitchedPtr. So code:

// Size of the field (which is a 3D cube)

int fieldSize = 8;

// Global variables for the two memory locations

cudaPitchedPtr d_writeable;

cudaArray *d_texArray = 0;

// Create the texture array (which will be read only)

cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<uchar>();

cudaExtent volumeSize = make_cudaExtent(fieldSize, fieldSize, fieldSize);

CUDA_SAFE_CALL( cudaMalloc3DArray(&d_texArray, &channelDesc, volumeSize) );

// Create the writeable part of memory using a cudaPitchedPtr

cudaExtent pitchedVolSize = make_cudaExtent(fieldSize*sizeof(uchar), fieldSize, fieldSize);

CUDA_SAFE_CALL(cudaMalloc3D(&d_writeable, pitchedVolSize));

// Setup the copy

cudaMemcpy3DParms copyParams = {0};

copyParams.srcPtr = d_writeable;

copyParams.dstArray = d_texArray;

copyParams.kind = cudaMemcpyDeviceToDevice;

copyParams.extent = volumeSize;

// Execute the copy

CUDA_SAFE_CALL( cudaMemcpy3D(&copyParams) );

As a side note, the cudaPitchedPtr is defined as:

struct cudaPitchedPtr

{

  void   *ptr;

  size_t  pitch;

  size_t  xsize;

  size_t  ysize;

};

So inside a kernel it could be accessed like:

__global__

void someFunction(cudaPitchedPtr cpp)

{

	int y = threadIdx.x;

	int x = blockIdx.x;

	int index = cpp.pitch*y + x;

	((uchar *)cpp.ptr)[index] = 10;

}

Obviously the above code would only access the first layer of the cube (where z=0) but it gives the idea.

Anyway, thanks for your help again.

Thank you, I had the exact same problem and your code helped me make it work.

The important detail I was missing is that you need two different extents.

The extent

[] for cudaMalloc3D is (wsizeof(datatype), h, d)

[*] for cudaMalloc3DArray and cudaMemcpy3DParms in cudaMemcpy3d is (w, h, d) - no sizeof(datatype)

physimrookie · December 28, 2010, 11:37am

I managed to get it to work. Thought I’d post my code in case someone in the future is looking for the same thing. So here is a way of setting up writeable device memory and then copying it to a cudaArray. The key component is using the poorly documented cudaPitchedPtr. So code:

// Size of the field (which is a 3D cube)

int fieldSize = 8;

// Global variables for the two memory locations

cudaPitchedPtr d_writeable;

cudaArray *d_texArray = 0;

// Create the texture array (which will be read only)

cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<uchar>();

cudaExtent volumeSize = make_cudaExtent(fieldSize, fieldSize, fieldSize);

CUDA_SAFE_CALL( cudaMalloc3DArray(&d_texArray, &channelDesc, volumeSize) );

// Create the writeable part of memory using a cudaPitchedPtr

cudaExtent pitchedVolSize = make_cudaExtent(fieldSize*sizeof(uchar), fieldSize, fieldSize);

CUDA_SAFE_CALL(cudaMalloc3D(&d_writeable, pitchedVolSize));

// Setup the copy

cudaMemcpy3DParms copyParams = {0};

copyParams.srcPtr = d_writeable;

copyParams.dstArray = d_texArray;

copyParams.kind = cudaMemcpyDeviceToDevice;

copyParams.extent = volumeSize;

// Execute the copy

CUDA_SAFE_CALL( cudaMemcpy3D(&copyParams) );

As a side note, the cudaPitchedPtr is defined as:

struct cudaPitchedPtr

{

  void   *ptr;

  size_t  pitch;

  size_t  xsize;

  size_t  ysize;

};

So inside a kernel it could be accessed like:

__global__

void someFunction(cudaPitchedPtr cpp)

{

	int y = threadIdx.x;

	int x = blockIdx.x;

	int index = cpp.pitch*y + x;

	((uchar *)cpp.ptr)[index] = 10;

}

Obviously the above code would only access the first layer of the cube (where z=0) but it gives the idea.

Anyway, thanks for your help again.

could you please help me out how to write to different layers?

thanks :)