char texture misread PyCuda

blak_ndecker · July 12, 2018, 2:15pm

Dear all,

I have a collection of 3D array filters that work fine on 4 byte data like ints and floats but fail on char data. The arrays are accessed on the GPU through a texture<unsigned char, cudaTextureType3D, cudaReadModeElementType> with tex3D. An array of constant values e.g. 1 will be read as another value, 128 in this case. The data are sent to the gpu with pycuda.driver.Memcpy3D.

some corrrespondencies are [input, read]
[0, 0],
[1, 128], (+128)
[2, 192],
[3, 161],
[4, 225], (+64)
[5, 145],
[6, 177], (+32)
[7, 209], (+32)
[8, 241], (+32)
…
[153, 153],
[154, 154],
[155, 155],
[156, 156],

you can notice some pattern, only for the highest values the read texture value is equal to the input value.
What could cause such behaviour?

blak_ndecker · July 24, 2018, 1:20pm

an example of the problem:

import pycuda.autoinit
from pycuda.compiler import SourceModule
import pycuda.driver as drv
import numpy as np

include_dirs = ["c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0",
                "c:\ProgramData\NVIDIA Corporation\CUDA Samples\v8.0\common"]

source_module = r'''
#include <stdio.h>
 
texture<unsigned char, cudaTextureType3D, cudaReadModeElementType> text_1;
texture<unsigned char, cudaTextureType3D, cudaReadModeNormalizedFloat> text_2;

extern "C"
{
    __global__ void charTest(){
        printf("%f\n", (float) tex3D(text_1, 0, 0, 0));  // expect 3
        printf("%f\n", (float) tex3D(text_2, 0, 0, 0));  // expect 3/255 == 0.011764...
    }
}

'''

mod = SourceModule(source_module, include_dirs=include_dirs, no_extern_c=1)

charTest = mod.get_function('charTest')
text_1 = mod.get_texref('text_1')
text_2 = mod.get_texref('text_2')

block = (1,)*3
grid = (1,)*3
d,h,w = shape = (1,)*3
datatype = np.uint8

data = drv.pagelocked_empty(shape, datatype)

data[0,0,0] = 3

descr = drv.ArrayDescriptor3D()
descr.width, descr.height, descr.depth = shape
descr.format = drv.dtype_to_array_format(datatype)
descr.num_channels = 1
descr.flags = drv.array3d_flags.SURFACE_LDST
array = drv.Array(descr)
descr = array.get_descriptor_3d()

stream = drv.Stream()

copy = drv.Memcpy3D()
copy.set_src_host(data)
copy.set_dst_array(array)
copy.width_in_bytes = copy.src_pitch = data.strides[1]
copy.src_height = copy.height = h
copy.depth = d
copy(stream)

text_1.set_array(array)
text_2.set_array(array)
charTest.prepare('')
charTest.prepared_async_call(grid, block, stream)

at line 18 I expect outcome 3.0 but get 193.0.
cudaReadNormalizedFloat gives the correct outcome, but it’s inconvenient as I need to multiply with 255 and convert to char explicitely to be able to reuse the uint32 code

Robert_Crovella · July 24, 2018, 4:11pm

Add this before line 58:

text_1.set_flags(drv.TRSF_READ_AS_INTEGER)

blak_ndecker · August 7, 2018, 12:27pm

Thank you! Where could I have found this information?

Robert_Crovella · August 8, 2018, 12:32am

If you are asking “where is this information located?” just google TRSF_READ_AS_INTEGER Then read the top 5 or 10 hits.

If you are asking “how would I have found this information had I not known what to look for?” I don’t know if I have a good answer to that. I didn’t find it by simply reading pycuda documentation or studying pycuda codes.

For starters, if you want to solve problems like this, it probably helps to have a rock-solid understanding of how textures work in CUDA C++, and what are all the knobs for. That is documented to a considerable degree in the CUDA C++ programming guide.

[url]https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html[/url]

In addition, there are various (CUDA C++) sample codes provided to demonstrate texturing. None of these are in python of course. So being an expert in CUDA C++, as well as having worked with some texture codes in detail would probably help but certainly doesn’t carry you over the finish line.

pycuda is an adapter/layer on top of the CUDA driver API, so as to provide effectively a python host wrapper for GPU activity (where such activity is still launched in the form of CUDA C++ kernels). pycuda was not written by NVIDIA. It’s not officially part of the CUDA toolkit, nor is it officially supported or maintained by anyone at NVIDIA. So google is your friend, and you have to have a thought process that leads to a solution, of course.

I’m pretty knowledgeable in CUDA C++. Less so with textures. Less so with pycuda.

My thought process was this:

I know this is possible in CUDA C++
I’ll bet dollars to donuts it’s doable in pycuda. (pycuda is a high-quality implementation, in my opinion)
google google google google, huh. Nothing.
OK, lets start playing with every imaginable documented knob in pycuda that pertains to texturing.
after trying 5 or 6 things, “voila!”

Step 4 is where the CUDA C++ texturing knowledge helps. Otherwise you’re stabbing in the dark.

It’s not pretty. It requires a functional knowledge of pycuda, textures, and CUDA. A working test setup. And persistence. And perhaps luck.

But that’s programming, in my view.

PS, had you not decided to come back and post a complete example, I wouldn’t have bothered with this.