How to make GPU kernels that use global texture references thread-safe (for multiple CPU threads using the same GPU)?

We are having trouble when running certain GPU functions (e.g. a point tracker that uses texture references in multiple places) in parallel from multiple CPU threads (all CPU threads use the same GPU).

It seems that the ‘texture references’, which afaik are a sort of ‘global device variable’, are the problem, as they are the only ‘global’ variables we have (note that ‘constant memory’ is probably also an issue, but we will focus on the texture references problem for now). We mainly use texture references to 2D images (pitched-linear memory), as we work in the image processing field.
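To make the setup concrete, here is a stripped-down sketch of the pattern we currently use (the names ‘tex_img’, ‘trackPointsKernel’ and ‘runTracker’ are just placeholders for illustration): a file-scope texture reference that each call binds to its own pitched-linear 2D image.

```
// Sketch of our current (problematic) pattern. The texture reference is a
// single, module-wide global, so concurrent CPU threads that bind different
// images to it can interfere with each other.
texture<float, cudaTextureType2D, cudaReadModeElementType> tex_img;

__global__ void trackPointsKernel(float* out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        out[y * width + x] = tex2D(tex_img, x + 0.5f, y + 0.5f);
}

void runTracker(const float* d_img, size_t pitch, float* d_out,
                int width, int height)
{
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    // Bind the (global) texture reference to this thread's image ...
    cudaBindTexture2D(NULL, tex_img, d_img, desc, width, height, pitch);
    dim3 block(16, 16);
    dim3 grid((width + 15) / 16, (height + 15) / 16);
    trackPointsKernel<<<grid, block>>>(d_out, width, height);
    // ... but another CPU thread may rebind tex_img before this launch runs.
    cudaUnbindTexture(tex_img);
}
```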

How can we rewrite the kernels that use the texture references so that they are CPU-thread-safe? Is it possible at all? Note that in our framework we plan to have exactly 4 CPU threads per GPU (each CPU thread is a GPU worker thread which gets a ‘GPU job’ issued to it, which it then executes).
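The only workaround we could come up with so far (just a sketch, reusing the ‘tex_img’ and ‘trackPointsKernel’ placeholders from the sketch above and assuming C++11 std::mutex is available on the host) is to serialize every bind → launch → unbind sequence with a host-side lock, which of course throws away exactly the concurrency between the 4 worker threads that we want:

```
#include <mutex>

static std::mutex g_texRefMutex;  // one host-side lock per global texture reference

void runTrackerThreadSafe(const float* d_img, size_t pitch, float* d_out,
                          int width, int height)
{
    // At most one of the 4 GPU worker threads touches tex_img at a time.
    std::lock_guard<std::mutex> lock(g_texRefMutex);

    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaBindTexture2D(NULL, tex_img, d_img, desc, width, height, pitch);

    dim3 block(16, 16);
    dim3 grid((width + 15) / 16, (height + 15) / 16);
    trackPointsKernel<<<grid, block>>>(d_out, width, height);

    // Wait for the kernel to finish before another thread may rebind.
    cudaDeviceSynchronize();
    cudaUnbindTexture(tex_img);
}
```

Is there a better strategy than this global serialization?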

This question seems to be related to the problem of ‘arrays of texture references’; I don’t know whether an array of texture references is possible now with newer CUDA Toolkit versions / newer GPU architectures.
See the forum postings at

Or just search the CUDA forum for ‘texture references array’ and notice that this really seems to be a hot topic :-)

In one of these postings a function ‘cuTexRefCreate’ was mentioned; is that the way to go? I suppose it could also be used from code that otherwise uses the CUDA runtime API.

Any help on this (I would say very important) question, especially from the guys at NVIDIA, would be appreciated. Note that any possible strategy should also work on Fermi-architecture GPUs.

A related question is whether this multi-threading issue also affects the latest Kepler architecture, where loads through pointers of type ‘const __restrict__’ may automatically be mapped to the texture / read-only data cache.
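If that automatic mapping works the way we hope, a kernel like the following sketch (again, placeholder names) would avoid global texture state entirely, because the image pointer is an ordinary kernel argument; the downside is that we would lose the texture unit’s addressing modes and filtering:

```
// Kepler-style alternative: pass the image as a 'const float* __restrict__'
// kernel argument so the compiler can route loads through the read-only
// data cache (or use __ldg() explicitly on sm_35). No global texture
// reference is involved, so launches from different CPU threads should
// not interfere with each other.
__global__ void trackPointsReadOnly(const float* __restrict__ img, size_t pitchElems,
                                    float* __restrict__ out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        out[y * width + x] = img[y * pitchElems + x];  // read-only cached load
}
```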