Compiling with CUDA 3.0: __builtin_bswap32/64 undefined

Hi everyone!
I am trying to build a project that uses CUDA 3.0. I installed the right version of CUDA, but when I try to compile it I get these errors:

/usr/include/x86_64-linux-gnu/bits/byteswap.h(47): error: identifier "__builtin_bswap32" is undefined

/usr/include/x86_64-linux-gnu/bits/byteswap.h(111): error: identifier "__builtin_bswap64" is undefined

error: identifier "clock64" is undefined

error: identifier "__ballot" is undefined

I read that this happens because my gcc version is incompatible, so I downgraded to version 4.4.7, but I still get these errors. Now I have no idea what to do.

My system version: 2.6.32-73-server #141-Ubuntu SMP Tue Mar 10 17:29:01 UTC 2015

My compilation command:

/usr/local/cuda/bin/nvcc -O3 -use_fast_math -DUSING_CUDA -DKERNEL_BAL -D_NOT_USING_MPI -Iusr/local/cuda/include/ -Iusr/local/cuda/samples/common/inc/ -Iusr/local/cuda/C/common/inc/ -Lusr/local/cuda/lib/ -Lusr/local/cuda/lib64/  -lcutil_x86_64 -lcudart -lm --ptxas-options=-v -gencode=arch=compute_30,code=sm_30  -gencode=arch=compute_20,code=sm_20 -gencode=arch=compute_20,code=compute_20 -o mcgpu.x

Does anyone have any idea?

Downgrade to gcc 4.1.2

Thanks a lot @txbob! The first two errors disappeared; now I have only these:

error: identifier "clock64" is undefined

error: identifier "__ballot" is undefined

Why are these functions undefined?

The problem may be that these functions are not supported on one of the target architectures you are specifying for the compilation. Best I can determine, both of these require compute capability >= 2.0.
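One way to make that requirement explicit is a guard on `__CUDA_ARCH__`, so that a build for a too-old target stops with a readable message. The following is only a minimal sketch, not code from the project: the kernel `count_positive`, the wrapper `warp_vote`, and the host helper `count_first_warp` are hypothetical names, and the `CUDART_VERSION` switch assumes the file may also be built with CUDA 9 or later, where `__ballot_sync` takes the place of `__ballot`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Stop the device compilation pass for any target below compute capability 2.0
// with a readable message instead of "identifier is undefined".
// __CUDA_ARCH__ is only defined while nvcc compiles device code.
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 200)
#error "__ballot and clock64 need compute capability 2.0 or higher"
#endif

// Hypothetical wrapper: CUDA 9 and later provide __ballot_sync in place of __ballot.
__device__ unsigned int warp_vote(int pred)
{
#if CUDART_VERSION >= 9000
    return __ballot_sync(0xffffffffu, pred);
#else
    return __ballot(pred);
#endif
}

// Each warp counts how many of its 32 lanes hold a positive input value.
// Assumes a 1-D launch whose block size is a multiple of 32.
__global__ void count_positive(const int *in, int *warp_counts, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int pred = (i < n) && (in[i] > 0);    // short-circuit avoids an out-of-range read
    unsigned int mask = warp_vote(pred);  // one bit per lane with a non-zero predicate
    if ((threadIdx.x & 31) == 0)
        warp_counts[i / 32] = __popc(mask);
}

// Host helper: builds a 32-element input whose first `positives` entries are 1,
// runs one warp, and returns that warp's count. Error checks omitted for brevity.
int count_first_warp(int positives)
{
    int h_in[32], h_out = -1;
    for (int k = 0; k < 32; ++k) h_in[k] = (k < positives) ? 1 : 0;

    int *d_in = NULL, *d_out = NULL;
    cudaMalloc((void **)&d_in, sizeof(h_in));
    cudaMalloc((void **)&d_out, sizeof(int));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
    count_positive<<<1, 32>>>(d_in, d_out, 32);
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}

int main()
{
    printf("positive lanes in warp 0: %d\n", count_first_warp(5));
    return 0;
}
```

The guard only fires during the device pass for an old target, so the host pass (where `__CUDA_ARCH__` is undefined) is unaffected.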

Yep, I just read about it. My card supports compute capability 1.1. Is there a similar instruction that I could use in place of __ballot?

Not that I am aware of. GPUs with sm_11 are ancient (and haven’t been supported by CUDA for a couple of years), and CUDA 3.0 is likewise extremely antiquated. I would strongly suggest upgrading to modern hardware and software. An entry-level Maxwell class GPU is likely faster than an old GPU with compute capability 1.1, even if that was high-end when it was bought many years ago.

Ok, thanks a lot for your help! I’ll ask my teacher for better/newer GPU hardware and then try again.

Ok, now I have a GeForce GTX TITAN Z and I’m still getting the same errors, even though the graphics card is now quite good. I tried compiling with CUDA 3 and CUDA 4, and with gcc 3.4. Is it possible that these versions have no function like __ballot?

The currently shipping version of CUDA is 7.5, I would suggest trying that. A mix of new hardware with ancient software is not a configuration that many people would have experience with, and it is quite possible that the GPU architecture of the Titan Z is not supported by CUDA 3 and 4 as it did not exist yet when those shipped (I think, this is all ancient history and my memory isn’t that good).

Note: Make sure the Linux version you use is on the list of supported distros for the CUDA version you install. This information can be found in the “Installation Guide Linux” for CUDA 7.5.

I believe CUDA 6.5 was the last toolkit that supported sm_1x devices.

But since you have a TITAN Z then get the latest toolkit! So many improvements…

But the newest CUDA doesn’t have these “cutil” functions, and this program uses them a lot.

“cutil” never was a part of CUDA. NVIDIA was always very clear about that, and specifically stated that it should not be used in customer projects. As I recall, it was used as a utility library for the SDK examples in order to tighten the code so the examples could focus on whatever particular CUDA feature they wanted to demonstrate, without any code clutter obscuring that.

So what should I do to make this compilation done?

s218176@supermicro:~$ /usr/local/cuda-7.5/bin/nvcc -DUSING_CUDA -DKERNEL_ORG mcgpu/
In file included from mcgpu/
mcgpu/mcgpu.h:100:28: fatal error: cutil_inline.h: No such file or directory
   #include <cutil_inline.h>
compilation terminated.

Here is my attempt with cuda-7.5, without all the -gencode=arch=compute_xx,code=sm_xx flags. As I said, cutil is needed, but now you say that it is not a part of CUDA, so I’m confused.

Not every file that ships with the CUDA package is part of CUDA proper. In particular, the files for the example programs (which include “cutil”) are not part of CUDA and can change or go away without prior warning between CUDA versions. So how can you deal with the improper inclusion of “cutil” in a 3rd party project?

Short term: Try copying over the cutil_inline.h from an older CUDA SDK that included this file. I have not tried this; there may be some incompatibilities between the “cutil” code and modern CUDA.

Long term: Purge all uses of “cutil” from the code base. Using it was a bad idea to begin with, and you might as well clean it up at the earliest opportunity.
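For the long-term route, most uses of cutilSafeCall can probably be replaced by a small project-local macro. This is a sketch under assumptions, not a drop-in for everything “cutil” provides: the names `CHECK_CUDA` and `alloc_and_free` are made up for illustration, and only standard runtime API calls (`cudaGetErrorString`, `cudaMalloc`, `cudaMemset`, `cudaFree`) are used.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical project-local replacement for cutilSafeCall: evaluates a CUDA
// runtime call once and aborts with file, line, and error text if it failed.
#define CHECK_CUDA(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s(%d): CUDA runtime error: %s\n",       \
                    __FILE__, __LINE__, cudaGetErrorString(err_));    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Allocates, clears, and frees a device buffer; returns true if every call
// succeeded (on failure CHECK_CUDA exits before the return is reached).
bool alloc_and_free(size_t bytes)
{
    void *d_buf = NULL;
    CHECK_CUDA(cudaMalloc(&d_buf, bytes));
    CHECK_CUDA(cudaMemset(d_buf, 0, bytes));
    CHECK_CUDA(cudaFree(d_buf));
    return true;
}

int main()
{
    if (alloc_and_free(256 * sizeof(float)))
        printf("all CUDA calls succeeded\n");
    return 0;
}
```

With something like this in a project header, a search-and-replace of cutilSafeCall with CHECK_CUDA covers the error-checking part; other “cutil” helpers would need their own replacements.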

I’m really surprised: the documentation says that this program works only with CUDA 3, but thanks to @njuffa I compiled it with CUDA 7.5 (I included all the cutil libs and .h files).

But now when I’m trying to run it I am getting this error: : cudaSafeCall() Runtime API error : invalid device symbol.

In this line I have something like this:

cutilSafeCall(cudaMemcpyToSymbol("voxel_data_CONST",     voxel_data,     sizeof(struct voxel_struct)));

Does anyone know what’s going on?

Problem solved! <3 I changed this:

cutilSafeCall(cudaMemcpyToSymbol("voxel_data_CONST",     voxel_data,     sizeof(struct voxel_struct)));

To this:

cutilSafeCall(cudaMemcpyToSymbol(voxel_data_CONST,     voxel_data,     sizeof(struct voxel_struct)));

Thanks a lot for all your help!

At one point, string arguments were removed from CUDA APIs and replaced with symbol arguments. String arguments are generally considered poor practice in software engineering. I don’t recall when exactly (at which CUDA version) this change took place, but it seems you already figured it out and adjusted the code accordingly.
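For anyone hitting the same “invalid device symbol” error, here is a minimal self-contained sketch of the symbol form. The layout of `voxel_struct` is invented (the real one is not shown in this thread), and the kernel `read_num_voxels` and helper `copy_and_read_back` are hypothetical. Note also that the host variable here is a struct, so its address is passed, whereas in the code above `voxel_data` is passed directly.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the project's struct; the real layout is unknown.
struct voxel_struct {
    int num_voxels;
};

// Constant-memory variable, defined in the same file that copies to it.
__constant__ struct voxel_struct voxel_data_CONST;

// Reads the constant back so the host can confirm that the copy arrived.
__global__ void read_num_voxels(int *out)
{
    *out = voxel_data_CONST.num_voxels;
}

// Copies `value` into constant memory via the symbol and returns what the
// kernel sees, or -1 if the copy to the symbol failed.
int copy_and_read_back(int value)
{
    struct voxel_struct voxel_data;
    voxel_data.num_voxels = value;

    // Symbol form: pass the variable itself, not the string "voxel_data_CONST".
    cudaError_t err = cudaMemcpyToSymbol(voxel_data_CONST, &voxel_data,
                                         sizeof(struct voxel_struct));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemcpyToSymbol: %s\n", cudaGetErrorString(err));
        return -1;
    }

    int *d_out = NULL;
    int h_out = -1;
    cudaMalloc((void **)&d_out, sizeof(int));
    read_num_voxels<<<1, 1>>>(d_out);
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}

int main()
{
    printf("kernel read num_voxels = %d\n", copy_and_read_back(42));
    return 0;
}
```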