OK, I am removing distortion from images and then rectifying a stereo pair. I have two lookup maps stored in textures, one for the distortion map and one for the rectification map.
I first run an undistort kernel that uses the distortion map texture to create an undistorted image. I then want to bind that result to a new texture for use in a rectification kernel, which will use the rectification map texture to fetch data from the undistorted result texture. (I hope this makes sense.)
I have it working, but after the undistort kernel I am doing the following on the host:
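(Roughly this; the buffer and texture names below are placeholders for my actual code, and I'm using float pixels just for illustration.)

// 2D texture reference the rectification kernel reads through (file scope)
texture<float, 2, cudaReadModeElementType> undistortTex;

// undistortResult: linear device memory written by the undistort kernel
cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
cudaArray *undistortArray;
cudaMallocArray(&undistortArray, &desc, width, height);

// device-to-device copy of the undistort output into the CUDA array...
cudaMemcpyToArray(undistortArray, 0, 0, undistortResult,
                  width * height * sizeof(float),
                  cudaMemcpyDeviceToDevice);

// ...then bind the array so the rectification kernel can tex2D() from it
cudaBindTextureToArray(undistortTex, undistortArray);
rectifyKernel<<<grid, block>>>(rectifiedResult, width, height);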
My question is whether I have to do the cudaMemcpyToArray(), or whether I can just bind the undistort result memory directly to a texture. I tried just doing a cudaBindTexture(), but I get black images back; I assume this is because I am doing 2D texture fetches on linear memory.
I want to avoid the device-to-device memcpy, as it is a bottleneck (it costs me roughly 10 Hz of frame rate).
I don’t think there is a way to do 2D tex cache lookups using just device memory.
As a side note, cudaMemcpyToArray seems to have a performance bug in CUDA 1.1. Search the forums for details: IIRC, it only manages to sustain ~8 GiB/s when it should be capable of 70.
You're right, the programming guide states that you can only do 1D tex fetching from device memory.
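So if I wanted to skip the array and stay in plain device memory, I would have to fall back to a 1D binding and do the 2D addressing myself, something along these lines (just a sketch, not my actual kernel):

// 1D texture reference bound straight to linear device memory
texture<float, 1, cudaReadModeElementType> undistortTex1D;

__global__ void rectifyKernel(float *out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    // manual 2D -> 1D addressing; in the real kernel the source
    // coordinates would come from the rectification map lookup
    out[y * width + x] = tex1Dfetch(undistortTex1D, y * width + x);
}

// host side: bind the cudaMalloc'd undistort output directly, no copy
size_t offset = 0;
cudaBindTexture(&offset, undistortTex1D, undistortResult,
                width * height * sizeof(float));

But then I lose the 2D locality of the texture cache, which is why I'd rather keep the 2D fetches.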
I am seeing ~1.5 GB/s, but I am copying from cudaMalloc() allocated memory to a 2D array. After some reading I understand that I should be allocating with cudaMallocPitch(), since I will always copy to a 2D array. If I use cudaMallocPitch() to allocate a 1024x768 image, will padding be added to my allocation for coalescing even though the 1024-pixel row width is already a multiple of 32? And if no padding is added, will this result in faster device-to-device transfers to a 2D CUDA array?
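i.e. something like this (a sketch of what I mean, checking whether the returned pitch actually includes any padding):

// pitched allocation: the runtime may pad each row, and the actual row
// stride in bytes comes back in 'pitch'
float *undistortResult;
size_t pitch;
cudaMallocPitch((void **)&undistortResult, &pitch,
                1024 * sizeof(float), 768);
// for a 1024-wide float image the requested row is 4096 bytes, so I'd
// compare pitch against 1024 * sizeof(float) to see if padding was added

// pitch-aware device-to-device copy into the 2D CUDA array
cudaMemcpy2DToArray(undistortArray, 0, 0, undistortResult, pitch,
                    1024 * sizeof(float), 768, cudaMemcpyDeviceToDevice);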
Padding should only affect memory coalescing when reading directly from device memory. I’ve never noticed a difference when using padded memory and cudaMemcpyToArray.
I know that back in CUDA 0.8 cudaMemcpyToArray reached 70 GiB/s, so the latest version's problem must be a software regression of some kind. I'm not sure whether 1.0 has the problem or not.
Hmm, you have a point. I didn't see any issues in the bandwidth test I did in that post. But I was already allocating a buffer whose pitch was a multiple of 16, so using cudaMallocPitch would not have made a difference.
I haven't used any arrays in my application since CUDA 0.8 (I switched to tex1Dfetch and device memory), so I don't have any recent experience with them other than that benchmark. You could modify the benchmark to use a width of 513 or something and see whether that changes the performance for the worse, as the manual section you quoted suggests; something like the sketch below would do.
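(A quick-and-dirty version, with my own names rather than the original benchmark's; run it once with W = 512 and once with W = 513 and compare.)

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    // crude bandwidth check for cudaMemcpyToArray: copy a W x H float
    // image into a 2D array many times and report GB/s
    const int W = 512, H = 768, ITER = 100;

    float *src;
    cudaMalloc((void **)&src, W * H * sizeof(float));

    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray *dst;
    cudaMallocArray(&dst, &desc, W, H);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    for (int i = 0; i < ITER; ++i)
        cudaMemcpyToArray(dst, 0, 0, src, W * H * sizeof(float),
                          cudaMemcpyDeviceToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms;
    cudaEventElapsedTime(&ms, start, stop);
    double bytes = (double)ITER * W * H * sizeof(float);
    printf("width %d: %.2f GB/s\n", W, bytes / (ms * 1.0e6));

    cudaFreeArray(dst);
    cudaFree(src);
    return 0;
}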