weird behaviour when including the wrong way


I encountered strange errors today when I tried to use textures in my CUDA application. Basically, the application hung at certain CUDA calls while using 100% CPU.

The reason for this is most likely that I modified the Makefile of the SDK's template project, because I was annoyed that the kernel was only recompiled when I changed the main .cu file. I added the kernel file to CUFILES. As a result, I had to declare my_kernel and a texture, tex, as extern (see the attached code).

The problem seems to come from the extern texture: it works fine if I comment out the cudaBindTextureToArray line.

But if I don’t, CUDA gives me an “unspecified driver error”, the first call to cublasInit() returns 1 (CUBLAS_STATUS_NOT_INITIALIZED), and the second one causes the program to hang as described.

Other CUDA calls (cudaMalloc, etc.) produce similar effects.

The solution for me was to remove the kernel from the Makefile and to #include it in the .cu file - as in the template project.
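A minimal sketch of that single-file layout, for illustration only. It assumes a toolkit that still ships the legacy texture reference API used in this thread (it was removed in CUDA 12) and that has cudaDeviceSynchronize (CUDA 4.0 or later). The kernel signature and the output buffer d_out are made up for the example; only tex, my_kernel and the 128 x 20 array size come from the post.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// --- In the template layout, this part would live in the #included kernel file ---
// File-scope texture reference (legacy API, as used in the thread).
texture<float2, 2, cudaReadModeElementType> tex;

// Hypothetical kernel: copies each texel into a linear output buffer.
__global__ void my_kernel(float2* out, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        // +0.5f addresses the texel centre with unnormalized coordinates.
        out[y * width + x] = tex2D(tex, x + 0.5f, y + 0.5f);
    }
}
// --- End of the part that would be #included ---

int main() {
    const int width = 128, height = 20;  // same array size as in the post

    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float2>();
    cudaArray* arr = 0;
    cudaMallocArray(&arr, &desc, width, height);

    // The bind and the kernel refer to the same 'tex' object because both
    // are compiled in one translation unit.
    cudaBindTextureToArray(tex, arr);

    float2* d_out = 0;
    cudaMalloc((void**)&d_out, width * height * sizeof(float2));

    // 8 x 2 blocks of 16 x 10 threads cover the 128 x 20 array exactly.
    my_kernel<<<dim3(8, 2), dim3(16, 10)>>>(d_out, width, height);
    cudaError_t err = cudaDeviceSynchronize();
    printf("kernel status: %s\n", cudaGetErrorString(err));

    cudaUnbindTexture(tex);
    cudaFree(d_out);
    cudaFreeArray(arr);
    return err == cudaSuccess ? 0 : 1;
}
```

The array contents are left uninitialized here, so only the reported status is meaningful, not the fetched values.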

I use CUDA 1.0 on SUSE Linux with a GeForce 8800 GTS.

Does anyone have an idea why this happens?

// Host file (main .cu file)
#include <stdio.h>
#include "cublas.h"

// Declared extern because the kernel file is compiled separately via CUFILES.
extern texture<float2,2> tex;
extern __global__ void my_kernel(char*);

int main(int argc, char** argv){
        cublasStatus s = cublasInit();
        printf("error %d\n",s);         // first call: prints 1
        s = cublasInit();               // second call: hangs
        printf("error %d\n",s);

        cudaChannelFormatDesc dXdesc = cudaCreateChannelDesc<float2>();
        cudaArray* dXcolarr;
        cudaMallocArray(&dXcolarr, &dXdesc, 128, 20);
        // The cudaBindTextureToArray line mentioned above.
        cudaBindTextureToArray(tex, dXcolarr);

        my_kernel<<< dim3(2,2), dim3(2,2) >>>( "hello world");
        return 0;
}

// Kernel file (separate .cu file added to CUFILES)
texture<float2,2> tex;

__global__ void my_kernel(char* arg) {
        // do something
}


Textures don’t work as externs. I can’t explain your particular error, but when I tried to use textures as externs, I got an error saying that the texture reference was not bound when calling the kernel. Somehow, the extern texture in the driver .cu file is “different” from the texture in the kernel .cu file.

If you want proper dependency checking, CMake combined with FindCUDA.cmake works fine.

How else NOT to do it:

  • The same thing happens if you declare the texture in the kernel file as static and do everything else correctly, but only if you DON’T do any texture fetches. If you do texture fetches, the error goes away, but the fetches only return zero.

  • If you add the kernel to CUFILES in the Makefile instead of using #include “”, you get “multiple definitions of …” errors when compiling.
    You can get rid of the compiler’s complaints by writing a “my_kernel.h” file and #including that one.
    It then compiles fine, but leads to the same error.

It works now. What could I do wrong next? :/

The safest way is to use only one .cu file per project (or per independent job), and to #include any code you want to reuse.
Even if interaction between .cu files is absolutely necessary, do not make anything on the device extern. Copy and paste kernels and texture/constant declarations instead of sharing them across files. Share device pointers through host externs. Wrap each kernel in a host function and make that function extern.
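A rough sketch of the wrapper idea, assuming a simple scaling kernel. All names here (scale_kernel, launch_scale, d_data) are hypothetical and not from the thread. The kernel is only referenced inside its own .cu file; other files see nothing but the host function launch_scale and an ordinary device pointer. To keep the block self-contained, main sits in the same file and stands in for the caller that would normally live elsewhere.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel used only inside this .cu file; no other file declares it.
__global__ void scale_kernel(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

// Host wrapper: the only symbol another file needs. That file would declare
//   extern "C" cudaError_t launch_scale(float* d_data, int n, float factor);
// and pass in a device pointer it obtained from cudaMalloc.
extern "C" cudaError_t launch_scale(float* d_data, int n, float factor) {
    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;  // round up
    scale_kernel<<<blocks, threads>>>(d_data, n, factor);
    return cudaGetLastError();  // reports launch errors
}

// Stand-in for the caller, which in a real project would be in another file.
int main() {
    const int n = 256;
    float h_data[n];
    for (int i = 0; i < n; ++i) {
        h_data[i] = 1.0f;
    }

    float* d_data = 0;
    cudaMalloc((void**)&d_data, n * sizeof(float));
    cudaMemcpy(d_data, h_data, n * sizeof(float), cudaMemcpyHostToDevice);

    cudaError_t err = launch_scale(d_data, n, 2.0f);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(h_data, d_data, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("launch: %s, h_data[0] = %f\n", cudaGetErrorString(err), h_data[0]);

    cudaFree(d_data);
    return err == cudaSuccess ? 0 : 1;
}
```

Only host symbols and a plain pointer cross the file boundary this way, so no device-side extern is needed.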