[CUDA 4.0] : __CUDA_ARCH__ undefined in device code

It’s all in the title.

I don’t know why, but this code

__global__ void produitMatricielKernel(struct cudaPitchedPtr A, struct cudaPitchedPtr B, struct cudaPitchedPtr C)

{

const int& i = threadIdx.x;

const int& j = threadIdx.y;

const int& k = threadIdx.z;

#ifdef __CUDA_ARCH__

   #if (__CUDA_ARCH__ >= 200)

   printf("threadIdx.x = %d\nthreadIdx.y = %d\nthreadIdx.z %d\n", threadIdx.x, threadIdx.y, threadIdx.z);

   #else

   #error "CUDA compute capability < 2.0"

   #endif

   #error "__CUDA_ARCH__ undefined"

#endif

generates:

src\produitMatrice.cu|17|fatal error C1189: #error :  "__CUDA_ARCH__ undefined"|

Have you experienced the same behaviour?

Thanks in advance.

The code is missing an [font=“Courier New”]#else[/font]. You probably intended to write

#ifdef __CUDA_ARCH__

   ...

#else

   #error "__CUDA_ARCH__ undefined"

#endif

Good catch. Thank you. That was careless of me: I should have read my code more carefully…

Nevertheless, I have a new problem:

Error: External calls are not supported

reported by the MS C++ compiler (VS Express 2010).

I’ve read that printf should be inlined … in the same compilation unit … but I have no control over that…

I’m still investigating.

You need to [font=“Courier New”]#include <stdio.h>[/font] at the top of the code, just like on the CPU. Since CUDA doesn’t usually require any includes, it’s easy to forget that one.
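For reference, a minimal self-contained sketch with the include in place (hypothetical kernel name; device-side printf requires compute capability ≥ 2.0, e.g. compile with nvcc -arch=sm_20):

```cuda
#include <stdio.h>

__global__ void helloKernel()
{
    // Device-side printf needs <stdio.h> and compute capability >= 2.0.
    printf("hello from thread %d\n", threadIdx.x);
}

int main()
{
    helloKernel<<<1, 4>>>();
    cudaDeviceSynchronize();  // wait for the kernel and flush its printf output
    return 0;
}
```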

I’ve solved this problem.
Now, I don’t know why the output of printf in my kernel is not printed.
I will post again if I give up.

std::cout<<"Call to kernel produitMatricielKernel<<<1,dim3(m,n,p)>>>(A_on_GPU, B_on_GPU, C_on_GPU)"<<std::endl;

produitMatricielKernel<<<1,dim3(m,n,p)>>>(A_on_GPU, B_on_GPU, C_on_GPU);

std::cout<<"Return from kernel produitMatricielKernel<<<1,dim3(m,n,p)>>>(A_on_GPU, B_on_GPU, C_on_GPU)"<<std::endl;

Outputs :

[i]

Call to kernel produitMatricielKernel<<<1,dim3(m,n,p)>>>(A_on_GPU, B_on_GPU, C_on_GPU)

Return from kernel produitMatricielKernel<<<1,dim3(m,n,p)>>>(A_on_GPU, B_on_GPU, C_on_GPU)

[/i]

as expected.

__global__ void produitMatricielKernel(struct cudaPitchedPtr A, struct cudaPitchedPtr B, struct cudaPitchedPtr C)

{

const int& i = threadIdx.x;

const int& j = threadIdx.y;

const int& k = threadIdx.z;

printf("threadIdx.x = %d\nthreadIdx.y = %d\nthreadIdx.z %d\n", threadIdx.x, threadIdx.y, threadIdx.z);

...

outputs nothing.

I have added #include <stdio.h> at the top of the .cu file as you suggested, but it doesn’t help.

Output from kernels is only printed when one of the actions listed in appendix B.14.2 of the Programming Guide is performed:

    Kernel launch via <<<>>> or cuLaunchKernel() (at the start of the launch, and if the CUDA_LAUNCH_BLOCKING environment variable is set to 1, at the end of the launch as well),

    Synchronization via cudaDeviceSynchronize(), cuCtxSynchronize(), cudaStreamSynchronize(), cuStreamSynchronize(), cudaEventSynchronize(), or cuEventSynchronize(),

    Memory copies via any blocking version of cudaMemcpy() or cuMemcpy*(),

    Module loading/unloading via cuModuleLoad() or cuModuleUnload(),

    Context destruction via cudaDeviceReset() or cuCtxDestroy().

Yes, I’ve seen that, and I have added cudaDeviceSynchronize() after the kernel call.

But the behaviour seems to remain the same.

I’ve commented out the rest of the kernel code, and the kernel’s output works now.

I knew I had a bug. It seems the output depends on the correct completion of the kernel.

I don’t know if exceptions exist in CUDA. It seems not, but the code seems to abort and the output buffer is not flushed.
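If that is the case, the runtime’s error reporting can confirm it; a minimal sketch with a hypothetical, deliberately faulting kernel (whether the printf output issued before the fault survives may depend on the toolkit version):

```cuda
#include <stdio.h>

// Hypothetical kernel that writes through an invalid pointer and aborts.
__global__ void faultyKernel(int *p)
{
    printf("before the fault\n");
    p[threadIdx.x] = 0;  // invalid write -> the kernel aborts
}

int main()
{
    faultyKernel<<<1, 1>>>((int *)0);  // deliberately pass a null pointer
    // cudaDeviceSynchronize() returns the first asynchronous error; checking
    // its return value is the usual way to detect that a kernel aborted.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        printf("kernel failed: %s\n", cudaGetErrorString(err));
    return 0;
}
```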

It’s by making mistakes that I learn. :rolleyes:

Thanks a lot, tera !
:thanks: :thumbup: