I have had a problem using 3D textures and surfaces during the last 2 weeks and I thought it would be interesting to share it with the community in case someone else makes the same mistake as I did.
I knew that texture and surface references had to be global variables, but I did not know that their scope is limited to the file in which they are declared. In my code, the CUDA data structures (textures and surfaces) were initialized in one file and the kernels were in a different file. Both files included a header file (with an include guard) in which the texture and surface references were declared. The code compiled without any warning, but I only got zero values when reading from the textures and surfaces, although I had correctly created the CUDA arrays, copied data into them and bound them to the corresponding textures and surfaces.
It was only today that I noticed the only way to make it work was to have all the CUDA code and the texture references in the same file. With no other changes, everything worked as expected. Strictly speaking, the code is not all in one file; it is spread over several files that are pulled into a single file with #include.
I checked the programming guide again and found a note about the file scope of these references, but I think it should be clearly stated that trying to use a texture reference from a different file results in a non-working program without any warning. I would also have expected an error from the compiler (undefined reference). It was really difficult to understand why my code did not work when the compiler did not complain about any undefined references.
I also wanted to post this to make sure that I now understand what the problem was and that I am not mistaken about the root cause of the non-functioning code.
Did you check for errors at runtime after the kernel call? (cudaDeviceSynchronize() and then cudaGetLastError()). You should have gotten a runtime error message.
It is frustrating, I know, but the compiler really can’t tell that you are trying to use the same tex reference in more than one compilation unit. Think about it - put a texture reference in a header file and include it in two .cu files. In each compilation unit (a separate run of nvcc) the preprocessed file looks like you just defined a texture in each.
AFAIK, since the header file had an include guard, the textures were not declared several times. Moreover, if the declaration had been used twice, it should have produced an error about the same variable (texture) being declared multiple times.
My point is that I expected either a “multiple definition” error if I had put the texture definition in the wrong place, or a working program if the compiler did not complain.
I also tried placing the texture definition in a globals.cu file and using “extern” in the two files where the texture was used and got the same result. The compiler did not complain but it did not work.
I had no errors at runtime. I checked with the debugger and the memory allocation and binding operation returned cudaSuccess.
But what about checking for errors after the kernel call (after calling cudaDeviceSynchronize to ensure the kernel has been executed)? That is where you will get an unbound texture error as I said before.
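For reference, the check being suggested here can be sketched roughly as follows. This is only a minimal sketch: `cudaOk`, `fillKernel` and `launchAndCheck` are made-up names for illustration, and the kernel is a placeholder rather than the code from this thread. It looks at `cudaGetLastError()` right after the launch and then at the return value of `cudaDeviceSynchronize()`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints the CUDA error string and reports success/failure.
static bool cudaOk(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Placeholder kernel (an assumption, not the kernel discussed in the thread).
__global__ void fillKernel(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 1.0f;
}

// Launches the kernel and checks for errors in two places:
// right after the launch, and after the kernel has finished.
bool launchAndCheck(float *d_out, int n) {
    fillKernel<<<(n + 255) / 256, 256>>>(d_out, n);
    // Errors detected at launch time (e.g. an invalid configuration).
    bool launched = cudaOk(cudaGetLastError(), "kernel launch");
    // Errors that only show up once the kernel has actually run.
    bool executed = cudaOk(cudaDeviceSynchronize(), "kernel execution");
    return launched && executed;
}
```

If both calls return cudaSuccess and the results are still wrong, the runtime itself has not detected a problem, so the cause has to be looked for elsewhere.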
And the inclusion guard is to prevent the same header from being included twice in a single compilation unit. In multiple compilation units, the header must be included once in each.
Yes, it is annoying that the compiler doesn’t give an error when you make the texture reference extern. I complained about that for CUDA version 0.8!
I have checked again and cudaDeviceSynchronize() returns cudaSuccess. cuda-memcheck does not complain either.
I was unsure whether the one definition rule applied in this case, and that is why I also tried defining the texture in a header file. My first attempt, however, was to define the texture in a .cu file and use extern references from the other files. (It has been ages since I last used global variables.)
I fully agree and that was the main reason for my post:
if extern references are not allowed the compiler should give an error
if you try, like I did, to include a texture definition in a header file, the compiler should give an error about multiple definitions (g++ does).
So, if I am not mistaken, the compiler has to receive all the CUDA code that uses a given texture in one file. You may split the code into several .cu files, but they have to be pulled (with #include) into one single .cu file for compilation.
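To illustrate the layout that works, here is a minimal sketch of a single compilation unit in which the texture reference, the kernel that reads it and the host code that binds it all live together. The names `volumeTex`, `sampleKernel` and `uploadAndBind` are invented for this example, and it assumes a toolkit that still provides the legacy texture reference API (texture references were removed in CUDA 12).

```cuda
#include <cuda_runtime.h>

// File-scope texture reference. The kernel and the host code below see the
// same instance only because they are compiled in the same translation unit.
texture<float, 3, cudaReadModeElementType> volumeTex;

// Reads every texel of the bound 3D array and writes it to linear memory.
__global__ void sampleKernel(float *out, int nx, int ny, int nz) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;
    if (x < nx && y < ny && z < nz) {
        // +0.5f addresses the texel centre (unnormalized coordinates).
        out[(z * ny + y) * nx + x] = tex3D(volumeTex, x + 0.5f, y + 0.5f, z + 0.5f);
    }
}

// Creates a 3D CUDA array, copies host data into it and binds it to volumeTex.
// Returns false if any step fails.
bool uploadAndBind(const float *h_data, int nx, int ny, int nz, cudaArray_t *arrayOut) {
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaExtent extent = make_cudaExtent(nx, ny, nz);
    if (cudaMalloc3DArray(arrayOut, &desc, extent) != cudaSuccess) return false;

    cudaMemcpy3DParms p = {0};
    p.srcPtr = make_cudaPitchedPtr(const_cast<float *>(h_data),
                                   nx * sizeof(float), nx, ny);
    p.dstArray = *arrayOut;
    p.extent = extent;
    p.kind = cudaMemcpyHostToDevice;
    if (cudaMemcpy3D(&p) != cudaSuccess) return false;

    // Bind in the same file that declares volumeTex and defines the kernel.
    return cudaBindTextureToArray(volumeTex, *arrayOut, desc) == cudaSuccess;
}
```

Moving `uploadAndBind` into a second .cu file with its own copy of the `volumeTex` declaration would reproduce the situation described above, where the binding is applied to a different reference than the one the kernel reads.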
I am surprised that there are not many people complaining about this.
Weird. Create a minimal test case .cu file and post it here, I can submit it as a bug (or you can do so yourself). This should produce a runtime error.
There was a wave of complaints about 2 years ago :) Fermi’s L1 cache handles 90% of the cases where you need a cache, so my guess is that far fewer people are using textures now than before.