How to create cudaTextureObject for ktx2 data in Block Compressed (BC7) format?

I’m trying to read a KTX2 file compressed with the NVIDIA Texture Tools Exporter (BC7), using CUDA Toolkit 11.5 and KTX-Software, but cudaCreateTextureObject returns an error. Does anyone know how to create it properly? Thanks!

I may be running into the same issue. I’ve tried with various block compression formats, but all seem to fail the same way.

I’m creating the underlying resource using a CUDA array:

cudaChannelFormatDesc uint4_channel_desc = cudaCreateChannelDesc(32, 32, 32, 32, cudaChannelFormatKindUnsigned);
cudaMallocArray(&tex_array, &uint4_channel_desc, bw, bh);
cudaMemcpy2DToArray(tex_array, 0, 0, src, bw * sizeof(uint4), bw * sizeof(uint4), bh, cudaMemcpyHostToDevice); // copy width is given in bytes

And invoke cudaCreateTextureObject as follows:

cudaResourceDesc tex_res = {};
tex_res.resType = cudaResourceTypeArray;
tex_res.res.array.array = tex_array;

cudaTextureDesc tex_desc = {};

cudaResourceViewDesc tex_view = {};
tex_view.format = cudaResViewFormatUnsignedBlockCompressed7;
tex_view.width = w; // in texels: w == 4 * bw
tex_view.height = h; // in texels: h == 4 * bh
tex_view.depth = 1;

cudaCreateTextureObject(&tex_obj, &tex_res, &tex_desc, &tex_view);

After cudaCreateTextureObject I get a generic “invalid argument” error, with no indication of which argument is invalid.
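For anyone following along, here is a small host-side sketch of how the sizes in the snippets above relate to each other. It assumes the usual BC7 layout (4x4 texel blocks, 16 bytes per block, so one uint4 per block) and the documented rule that a block-compressed resource view is 4 times the width and height of the underlying array. The names Bc7Extents and bc7_extents are made up for illustration and are not part of CUDA.

```cpp
#include <cstddef>

// Hypothetical helper type (not a CUDA type): the same BC7 image described
// in blocks (for the uint4 array) and in texels (for the resource view).
struct Bc7Extents {
    std::size_t blocks_x;  // array width in uint4 elements   (bw above)
    std::size_t blocks_y;  // array height in uint4 elements  (bh above)
    std::size_t view_w;    // resource view width in texels   (w above)
    std::size_t view_h;    // resource view height in texels  (h above)
    std::size_t bytes;     // size of the tightly packed block data (src)
};

// Assumption: BC7 uses 4x4 texel blocks stored in 16 bytes each.
inline Bc7Extents bc7_extents(std::size_t pixel_w, std::size_t pixel_h) {
    const std::size_t block_dim = 4;
    const std::size_t block_bytes = 16;

    Bc7Extents e;
    // Round up so partially covered blocks at the edges are counted.
    e.blocks_x = (pixel_w + block_dim - 1) / block_dim;
    e.blocks_y = (pixel_h + block_dim - 1) / block_dim;
    // The view extent is always a whole number of blocks.
    e.view_w = e.blocks_x * block_dim;
    e.view_h = e.blocks_y * block_dim;
    e.bytes = e.blocks_x * e.blocks_y * block_bytes;
    return e;
}
```

With this, bc7_extents(w, h) gives the blocks_x and blocks_y to pass to cudaMallocArray, and view_w and view_h for the cudaResourceViewDesc.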

I’ve also tried to create the texture object from a cudaResourceTypePitch2D cudaResourceDesc as follows:

void* tex_ptr = nullptr;
size_t tex_pitch = 0;
cudaMallocPitch(&tex_ptr, &tex_pitch, bw * sizeof(uint4), bh);
cudaMemcpy2D(tex_ptr, tex_pitch, src, bw * sizeof(uint4), bw * sizeof(uint4), bh, cudaMemcpyHostToDevice); // last size argument is the row count (bh)

cudaResourceDesc tex_res = {};
tex_res.resType = cudaResourceTypePitch2D;
tex_res.res.pitch2D.desc = uint4_channel_desc;
tex_res.res.pitch2D.devPtr = tex_ptr;
tex_res.res.pitch2D.height = bh;
tex_res.res.pitch2D.width = bw;
tex_res.res.pitch2D.pitchInBytes = tex_pitch; // pitch returned by cudaMallocPitch, which may exceed bw * sizeof(uint4)

This succeeds, but then kernels sampling from the resulting texture seem to only read back garbage. Any ideas?

Actually, it’s not garbage, but the result of sampling the uint4 data directly. It appears as if the cudaResourceViewDesc argument was ignored. In fact, passing null to cudaCreateTextureObject for that argument produces the same results.

Hi, I’ve tried the following code, and read out the mipmap data.

	cudaChannelFormatDesc channel_desc = cudaCreateChannelDesc<cudaChannelFormatKindUnsignedBlockCompressed7>();
	cudaArray_t mCudaArray{ nullptr };

	cudaError_t res = cudaMallocArray(&mCudaArray, &channel_desc, baseWidth, baseHeight);
	res = cudaMemcpyToArray(mCudaArray, 0, 0, image, imagesize, cudaMemcpyHostToDevice);

	cudaResourceDesc res_desc = {};
	res_desc.resType = cudaResourceTypeArray;
	res_desc.res.array.array = mCudaArray;

	cudaTextureDesc tex_desc = {};
	tex_desc.addressMode[0] = cudaAddressModeWrap;
	tex_desc.addressMode[1] = cudaAddressModeWrap;
	tex_desc.filterMode = cudaFilterModePoint;
	tex_desc.readMode = cudaReadModeNormalizedFloat;
	tex_desc.borderColor[0] = 1.0f;
	tex_desc.sRGB = 0;
	tex_desc.normalizedCoords = 1;

	cudaTextureObject_t mTextureObject;
	res = cudaCreateTextureObject(&mTextureObject, &res_desc, &tex_desc, nullptr);

Ah, interesting. Looks like cudaChannelFormatKindUnsignedBlockCompressed7 is a new addition to the CUDA 11 API. That seems to work, but it’s not what I wanted, which is to reinterpret uint4 data generated by a previous kernel as a BC7 texture.

Luckily, I stumbled upon the solution. The only problem was that the resource view depth was set to 1 when it should have been 0 for a 2D array:

tex_view.depth = 0;

Better debug/error messages would make these dumb issues much easier to figure out!
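Until the error messages improve, a host-side sanity check before calling cudaCreateTextureObject can at least name the offending field. The sketch below encodes the constraints as I understand them from the CUDA documentation for block-compressed resource views: the underlying array needs four 32-bit unsigned channels for BC7, the view width and height must be 4 times the array's, and the view depth must equal the array depth (0 for a 2D array). Bc7ViewCheck and check_bc7_view are hypothetical names operating on plain integers. They do not call CUDA, and the list of checks may not be exhaustive.

```cpp
#include <cstddef>
#include <string>

// Hypothetical plain-data mirror of the values handed to cudaMallocArray
// and cudaResourceViewDesc. Not a CUDA type.
struct Bc7ViewCheck {
    std::size_t array_width;   // array width in uint4 elements
    std::size_t array_height;  // array height in uint4 elements
    std::size_t array_depth;   // 0 for a 2D array from cudaMallocArray
    int channels;              // number of 32-bit unsigned channels in the array
    std::size_t view_width;    // cudaResourceViewDesc::width
    std::size_t view_height;   // cudaResourceViewDesc::height
    std::size_t view_depth;    // cudaResourceViewDesc::depth
};

// Returns an empty string if every check passes, otherwise a description
// of the first violated rule.
inline std::string check_bc7_view(const Bc7ViewCheck& c) {
    if (c.channels != 4)
        return "array format must have 4 channels of 32-bit unsigned data";
    if (c.view_width != 4 * c.array_width)
        return "view width must be 4x the array width";
    if (c.view_height != 4 * c.array_height)
        return "view height must be 4x the array height";
    if (c.view_depth != c.array_depth)
        return "view depth must equal the array depth (0 for a 2D array)";
    return std::string();
}
```

Filled in with the values from my original attempt (view depth 1 on a 2D array), this reports the depth mismatch instead of a bare “invalid argument”.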