Identifier __hiloint2double is undefined

I am trying to use this function in my kernel, and I am getting a compiler error that it is undefined.

double A;
int2 i;
...
A = __hiloint2double(i.y, i.x);

Do I need to include a special header or something to use this function?

The problem is resolved… well sort of.

I set up my machine to dual boot Windows 7 and CentOS.

This problem affects my Windows 7 setup but not CentOS. Since I didn’t do anything fancy during the setup process, my conclusion is that this is some sort of an issue with the CUDA toolkit running on Windows 7.

Edit:
The trick seems to be telling nvcc to compile for the correct architecture. When I remove the -arch sm_20 from my compilation command, the problem returns, even on CentOS. I do not recall if I was using this flag on Windows 7, so it may not be an OS issue after all.

Hello

I have the same problem when using CUDA 3.0 on Red Hat Enterprise Linux:
error: identifier "__hiloint2double" is undefined

How did you solve it? Is the function in the CUDA Toolkit library?

Did you read my edit? If you have a device with compute capability 2.0, add “-arch sm_20” to your nvcc command or makefile. For me, removing this flag brings the problem back, and adding it makes the problem go away.

If your device has a lower compute capability, then I don’t think you can use this function. You’ll have to search for it and copy the source code into your application.

(begin speculation)

My suspicion is that this function was added at the same time the instruction set was changed to support native execution of double precision arithmetic. To facilitate backward compatibility (new code running on old devices), nvcc automatically compiles for one of the 1.X architectures, unless the arch flag is set. If you want the benefits of native double precision, or access to this function, you have to give up backward compatibility and compile for architecture 2.0.

(/end speculation)

Please post back if this fix also works for you.

[/end speculation]
__hiloint2double was indeed introduced together with double precision support in compute capability 1.3. Thus you’ll have to pass nvcc a compute capability or architecture >= 1.3 (for example -arch=sm_13 or -arch=sm_20), as the default is 1.0.
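To make it concrete what the intrinsic computes, here is a minimal sketch of a wrapper that builds a double from its high and low 32-bit words. The name hilo_to_double is made up for illustration and is not part of CUDA. The host branch is a hand-written equivalent, assuming the usual IEEE 754 double layout, so the same file can also be built with a plain C++ compiler to check values without a GPU.

```cpp
#include <cstring>   // std::memcpy
#include <stdint.h>  // uint32_t, uint64_t

// Let the same file build with a plain C++ compiler (host-only check)
// as well as with nvcc, where these qualifiers are built in.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical wrapper (not a CUDA API): assemble a double from its
// high 32 bits (hi) and low 32 bits (lo).
__host__ __device__ inline double hilo_to_double(int hi, int lo)
{
#ifdef __CUDA_ARCH__
    // Device pass: the real intrinsic. Needs -arch=sm_13 or higher.
    return __hiloint2double(hi, lo);
#else
    // Host pass: put the two words together by hand and reinterpret
    // the resulting 64 bits as a double.
    uint64_t bits = ((uint64_t)(uint32_t)hi << 32) | (uint64_t)(uint32_t)lo;
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
#endif
}
```

With an int2 v that holds the low word in v.x and the high word in v.y, as in the first post, the call would be hilo_to_double(v.y, v.x). Going the other way on the device, CUDA provides __double2hiint and __double2loint.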

I modified the common.mk from the SDK and used it as my makefile. It seems to include the “sm_20” flag by default. Here is the relevant part:

# Compiler-specific flags (by default, we always use sm_10 and sm_20), unless we use the SMVERSION template

GENCODE_SM10 := -gencode=arch=compute_10,code="sm_10,compute_10"
GENCODE_SM20 := -gencode=arch=compute_20,code="sm_20,compute_20"

But it does not work.

BTW, there is a flag “SMVERSIONFLAGS” in common.mk, so I added this to my makefile:
SMVERSIONFLAGS := -arch=sm_20

That does not work either.

Thank you. I was hoping to get some feedback on those comments.

Looks like your makefile is generating the flags for two architectures - 1.0 and 2.0. Which one is it compiling for? There should be some conditional logic somewhere in the makefile that selects between GENCODE_SM10 and GENCODE_SM20. Later on you specify sm_20 in the SMVERSIONFLAGS, but if the GENCODE_SM10 flags are getting paired with -arch=sm_20, it seems reasonable that nvcc would still produce the error.

I would suggest temporarily ditching the makefile and calling nvcc directly from the command line until you get this working. It will take a lot of time to figure out all the necessary nvcc args, but it will be worth it in the end because you’ll come away with a clear understanding of the compilation step. (Refer to your_cuda_install_directory/doc/nvcc_2.0.pdf for more help with nvcc.)
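One way to see which architecture a failing pass is targeting is a guard like the sketch below. It rests on my understanding that nvcc runs a separate device compilation pass for each -gencode entry and defines __CUDA_ARCH__ in that pass (100 for compute_10, 200 for compute_20), so treat it as an assumption to verify against your toolkit. The helper arch_has_hiloint2double is a made-up name for illustration, and the threshold of 130 comes from the compute capability 1.3 requirement mentioned earlier in the thread.

```cpp
// Hypothetical helper (not a CUDA API): does a given __CUDA_ARCH__ value
// (100, 130, 200, ...) provide __hiloint2double?
inline bool arch_has_hiloint2double(int cuda_arch)
{
    return cuda_arch >= 130;  // compute capability 1.3 and up
}

// __CUDA_ARCH__ is only defined during nvcc's device passes, so a host-only
// build skips this check. A device pass for an older architecture stops
// here with a readable message instead of "identifier is undefined".
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 130)
#error "__hiloint2double needs compute capability >= 1.3: remove the compute_10 -gencode entry or use -arch=sm_13 or higher"
#endif
```

Placed at the top of the file that calls __hiloint2double, the guard should fire if any compute_10 entry is still being passed to nvcc alongside the sm_20 one.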

Thank you for your advice.

Compiling with only GENCODE_SM20 := -gencode=arch=compute_20,code="sm_20,compute_20", without the SM10 entry, works.

But now there is another problem at runtime. The error is:

Cuda error in file ‘texture.h’ in line 20 : invalid texture reference.

I use int2 to fetch doubles through a texture. This works in CUDA 2.3 but fails in CUDA 3.1.

The code in my texture.h is:

texture<int2, 1> tex_x_double;

void bind_vec(double *x)
{
    CUDA_SAFE_CALL(cudaBindTexture(NULL, tex_x_double, x));
}

__inline__ __device__ double fetch_vec(const int &i, double *x)
{
    int2 v = tex1Dfetch(tex_x_double, i);
    return __hiloint2double(v.y, v.x); // avoid bank conflicts
}

I’m very glad to hear you got past the compilation problem. It’s also good that you got the architecture flag set correctly, since the wrong setting could have caused suboptimal performance.

I suggest you create a new thread about the texture problem. I’m not exactly an expert on using textures, and I’m also extremely busy today trying to meet a deadline, so I won’t get to look at this until late tonight. If you create a new thread, more people will see it.

Thanks again for posting back that you got it working.
