Necessary includes for atomicAdd?

Hey there,

Turns out I am too stupid to use atomicAdd. My kernel fails to compile as soon as I add a line with a call to "atomicAdd".

"simple_kernel.cu", line 44: error: identifier "atomicAdd" is undefined

This is what I get. I've got an 8600 GT. The kernel has the following line at the very top:

#include "device_functions.h"

Is there anything else I need to include or take care of? Any compiler flags? I also tried using the absolute path "C:\CUDA\include\device_functions.h" for the include, but it still wouldn't work.

Once again I should've been thinking a bit harder before posting. I got this problem resolved by looking at the histogram64 sample in the SDK. It's just another command-line argument that needs to be added.

Next time you solve something, please actually post the answer: the nvcc flag is "--gpu-name compute_11", as described in man nvcc.

On CUDA 2.3 this changed to "-arch compute_11" to enable global memory atomics, and "-arch compute_12" for both global and shared memory atomics.
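To illustrate the distinction, here is a minimal sketch (kernel names are my own, not from the thread): the first kernel only needs "-arch compute_11", while the second also uses a shared-memory atomic and therefore needs "-arch compute_12".

```cuda
#include <cstdio>

// atomicAdd on global memory: requires compute capability 1.1+
// (compile with: nvcc -arch compute_11 file.cu)
__global__ void count_global(int *counter) {
    atomicAdd(counter, 1);
}

// atomicAdd on shared memory: requires compute capability 1.2+
// (compile with: nvcc -arch compute_12 file.cu)
__global__ void count_shared(int *out) {
    __shared__ int local;
    if (threadIdx.x == 0) local = 0;
    __syncthreads();
    atomicAdd(&local, 1);   // every thread in the block bumps the shared counter
    __syncthreads();
    if (threadIdx.x == 0) *out = local;   // publish the per-block total
}
```

Without the matching -arch flag, nvcc targets the default compute_10, where atomicAdd does not exist, which produces exactly the "identifier atomicAdd is undefined" error above.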

Does anyone know where I should put the "-arch compute_11" option? Thanks.

Below is my CUDA C code, and I get the error: identifier "atomicAdd" is undefined.

Can anyone advise? Many thanks.

#include "device_functions.h"
#include <cuda_runtime.h>
#include "cuda.h"
#include "C:\Users\a0034508\Desktop\Research\CUDA in VC\common\book.h"
#include "C:\Users\a0034508\Desktop\Research\CUDA in VC\common\cpu_anim.h"

#define GPU_ARCH 10
#define GPU_arch_sm_10 10
#define GPU_arch_sm_11 11
#define GPU_arch_sm_12 12
#define GPU_arch_sm_13 13
#define architecture(s) GPU_arch_sm##s##_

#define SIZE (100*1024*1024)

__global__ void histo_kernel( unsigned char *buffer, long size, unsigned int *histo )
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    int stride = blockDim.x * gridDim.x;
    while (i < size)
    {
        atomicAdd( &(histo[buffer[i]]), 1 );
        i += stride;
    }
}

int main( void )
{
    unsigned char *buffer = (unsigned char*)big_random_block( SIZE );

    cudaEvent_t start, stop;
    HANDLE_ERROR( cudaEventCreate( &start ) );
    HANDLE_ERROR( cudaEventCreate( &stop ) );
    HANDLE_ERROR( cudaEventRecord( start, 0 ) );

    // allocate memory on the GPU for the file's data
    unsigned char *dev_buffer;
    unsigned int *dev_histo;
    HANDLE_ERROR( cudaMalloc( (void**)&dev_buffer, SIZE ) );
    HANDLE_ERROR( cudaMemcpy( dev_buffer, buffer, SIZE, cudaMemcpyHostToDevice ) );
    HANDLE_ERROR( cudaMalloc( (void**)&dev_histo, 256 * sizeof( long ) ) );
    HANDLE_ERROR( cudaMemset( dev_histo, 0, 256 * sizeof( int ) ) );

    cudaDeviceProp prop;
    HANDLE_ERROR( cudaGetDeviceProperties( &prop, 0 ) );
    int blocks = prop.multiProcessorCount;
    histo_kernel<<<blocks*2,256>>>( dev_buffer, SIZE, dev_histo );

    unsigned int histo[256];
    HANDLE_ERROR( cudaMemcpy( histo, dev_histo, 256 * sizeof( int ), cudaMemcpyDeviceToHost ) );

    // get stop time, and display the timing results
    HANDLE_ERROR( cudaEventRecord( stop, 0 ) );
    HANDLE_ERROR( cudaEventSynchronize( stop ) );
    float elapsedTime;
    HANDLE_ERROR( cudaEventElapsedTime( &elapsedTime, start, stop ) );
    printf( "Time to generate: %3.1f ms\n", elapsedTime );

    long histoCount = 0;
    for (int i=0; i<256; i++) {
        histoCount += histo[i];
    }
    printf( "Histogram Sum: %ld\n", histoCount );

    // verify that we have the same counts via CPU
    for (int i=0; i<SIZE; i++)
        histo[buffer[i]]--;
    for (int i=0; i<256; i++) {
        if (histo[i] != 0)
            printf( "Failure at %d!\n", i );
    }

    HANDLE_ERROR( cudaEventDestroy( start ) );
    HANDLE_ERROR( cudaEventDestroy( stop ) );
    cudaFree( dev_histo );
    cudaFree( dev_buffer );
    free( buffer );
    return 0;
}

I also changed the GPU architecture to sm_12 in CUDA Build Rule v3.0.0.
However, my CUDA version is v3.2; does that matter?

thanks.

If you're using Visual Studio, right-click the solution and select Properties; when you get the CUDA Runtime API - GPU dialog box (see attached JPG), change GPU Architecture (1) to sm_13.
sm_13.JPG

You can add "-arch compute_11" when you compile your code:

nvcc -arch compute_11 myProgram.cu

How should I change the GPU architecture in Code::Blocks?

To silence errors from static analyzers (which run before compilation) in Visual Studio or ReSharper:

// Dummy declaration so IntelliSense/ReSharper can resolve atomicAdd;
// nvcc never sees it, since it defines neither macro.
#if defined (__INTELLISENSE__) || defined (__RESHARPER__)
template<class T1, class T2>
__device__ void atomicAdd(T1 x, T2 y);
#endif