[CUDA 4.0] : __CUDA_ARCH__ undefined in device code

It’s all in the title.

I don’t know why, but this code

__global__ void produitMatricielKernel(struct cudaPitchedPtr A, struct cudaPitchedPtr B, struct cudaPitchedPtr C)

{

const int& i = threadIdx.x;

const int& j = threadIdx.y;

const int& k = threadIdx.z;

#ifdef __CUDA_ARCH__

   #if (__CUDA_ARCH__ >= 200)

   printf("threadIdx.x = %d\nthreadIdx.y = %d\nthreadIdx.z %d\n", threadIdx.x, threadIdx.y, threadIdx.z);

   #else

   #error "CUDA compute capability < 2.0"

   #endif

   #error "__CUDA_ARCH__ undefined"

#endif

generates:

src\produitMatrice.cu|17|fatal error C1189: #error :  "__CUDA_ARCH__ undefined"|

Have you experienced the same behaviour?

Thanks in advance.

The code is missing an [font=“Courier New”]#else[/font]. You probably intended to write

#ifdef __CUDA_ARCH__

   ...

#else

   #error "__CUDA_ARCH__ undefined"

#endif

Good catch. Thank you. That was careless of me: I should have read my code more carefully…

Nevertheless, I have a new problem:

Error: External calls are not supported

reported by the MS C++ compiler (VS Express 2010).

I’ve read that printf should be inlined … in the same compilation unit … but I have no control over that…

I’m still investigating.

You need to [font=“Courier New”]#include <stdio.h>[/font] at the top of the code, just like on the CPU. Since CUDA doesn’t usually require any includes, it’s easy to forget that one.
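For reference, a minimal self-contained sketch with the include in place (hypothetical kernel name; device-side printf requires compute capability ≥ 2.0, e.g. compile with nvcc -arch=sm_20):

```cuda
#include <stdio.h>

__global__ void helloKernel()
{
    // Device-side printf needs <stdio.h> and compute capability >= 2.0.
    printf("hello from thread %d\n", threadIdx.x);
}

int main()
{
    helloKernel<<<1, 4>>>();
    cudaDeviceSynchronize();  // wait for the kernel and flush its printf output
    return 0;
}
```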

I’ve solved this problem.
Now, I don’t know why the output of printf in my kernel is not printed.
I will post again if I give up.

std::cout<<"Call to kernel produitMatricielKernel<<<1,dim3(m,n,p)>>>(A_on_GPU, B_on_GPU, C_on_GPU)"<<std::endl;

produitMatricielKernel<<<1,dim3(m,n,p)>>>(A_on_GPU, B_on_GPU, C_on_GPU);

std::cout<<"Return from kernel produitMatricielKernel<<<1,dim3(m,n,p)>>>(A_on_GPU, B_on_GPU, C_on_GPU)"<<std::endl;

Outputs :

[i]

Call to kernel produitMatricielKernel<<<1,dim3(m,n,p)>>>(A_on_GPU, B_on_GPU, C_on_GPU)

Return from kernel produitMatricielKernel<<<1,dim3(m,n,p)>>>(A_on_GPU, B_on_GPU, C_on_GPU)

[/i]

as expected.

__global__ void produitMatricielKernel(struct cudaPitchedPtr A, struct cudaPitchedPtr B, struct cudaPitchedPtr C)

{

const int& i = threadIdx.x;

const int& j = threadIdx.y;

const int& k = threadIdx.z;

printf("threadIdx.x = %d\nthreadIdx.y = %d\nthreadIdx.z %d\n", threadIdx.x, threadIdx.y, threadIdx.z);

...

outputs nothing.

I have added #include <stdio.h> at the top of the .cu file as you suggested, but it doesn’t help.

Output from kernels is only printed when one of the actions listed in appendix B.14.2 of the Programming Guide is performed:

    Kernel launch via <<<>>> or cuLaunchKernel() (at the start of the launch, and if the CUDA_LAUNCH_BLOCKING environment variable is set to 1, at the end of the launch as well),

    Synchronization via cudaDeviceSynchronize(), cuCtxSynchronize(), cudaStreamSynchronize(), cuStreamSynchronize(), cudaEventSynchronize(), or cuEventSynchronize(),

    Memory copies via any blocking version of cudaMemcpy() or cuMemcpy*(),

    Module loading/unloading via cuModuleLoad() or cuModuleUnload(),

    Context destruction via cudaDeviceReset() or cuCtxDestroy().

Yes, I’ve seen that, and I have added cudaDeviceSynchronize() after the kernel call.

But the behaviour seems to remain the same.

I’ve commented out the rest of the kernel code, and the kernel’s output works now.

I knew I had a bug. It seems the output depends on the correct completion of the kernel.

I don’t know if exceptions exist in CUDA. It seems not, but the code seems to abort and the output buffer is not flushed.
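If that is the case, the runtime’s error reporting can confirm it; a minimal sketch with a hypothetical, deliberately faulting kernel (whether the printf output issued before the fault survives may depend on the toolkit version):

```cuda
#include <stdio.h>

// Hypothetical kernel that writes through an invalid pointer and aborts.
__global__ void faultyKernel(int *p)
{
    printf("before the fault\n");
    p[threadIdx.x] = 0;  // invalid write -> the kernel aborts
}

int main()
{
    faultyKernel<<<1, 1>>>((int *)0);  // deliberately pass a null pointer
    // cudaDeviceSynchronize() returns the first asynchronous error; checking
    // its return value is the usual way to detect that a kernel aborted.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        printf("kernel failed: %s\n", cudaGetErrorString(err));
    return 0;
}
```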

It’s by making mistakes that I learn. :rolleyes:

Thanks a lot, tera !
:thanks: :thumbup: