I switched to CUDA 2.0 today and got a similar error:
1>Compiling...
1>CudaGlBufferObject.cu
1>d:/Jesper/Med/VolumeRendering/trunk/VolumeRenderingAPI/src/CudaGl/CudaGlBufferObject.cu(27): warning: variable "argv" is used before its value is set
1>d:/Jesper/Med/VolumeRendering/trunk/VolumeRenderingAPI/src/CudaGl/CudaGlBufferObject.cu(27): warning: variable "argc" is used before its value is set
1>tmpxft_00000a7c_00000000-3_CudaGlBufferObject.cudafe1.gpu
1>tmpxft_00000a7c_00000000-8_CudaGlBufferObject.cudafe2.gpu
1>d:/Jesper/Med/VolumeRendering/trunk/VolumeRenderingAPI/src/CudaGl/CudaGlBufferObject.cu(27): warning: variable "argv" is used before its value is set
1>d:/Jesper/Med/VolumeRendering/trunk/VolumeRenderingAPI/src/CudaGl/CudaGlBufferObject.cu(27): warning: variable "argc" is used before its value is set
1>tmpxft_00000a7c_00000000-3_CudaGlBufferObject.cudafe1.cpp
1>tmpxft_00000a7c_00000000-12_CudaGlBufferObject.ii
1>Compiling...
1>VolumeRendering.cu
1>d:/Jesper/Med/VolumeRendering/trunk/VolumeRenderingAPI/src/VolumeRendering.cu(153): warning: variable "argv" is used before its value is set
1>d:/Jesper/Med/VolumeRendering/trunk/VolumeRenderingAPI/src/VolumeRendering.cu(153): warning: variable "argc" is used before its value is set
1>tmpxft_00000fd0_00000000-3_VolumeRendering.cudafe1.gpu
1>tmpxft_00000fd0_00000000-8_VolumeRendering.cudafe2.gpu
1>### Assertion failure at line 1261 of ../../common/com/symtab.cxx:
1>### Compiler Error in file C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp/tmpxft_00000fd0_00000000-9_VolumeRendering.cpp3.i during PU_adjust_addr_flags phase:
1>### couldn't find matching field
1>nvopencc ERROR: D:\CUDA\bin/../open64/lib//be.exe returned non-zero status 1
The additional warnings are caused by the changed definition of CUT_DEVICE_INIT, which now requires two parameters. I worked around this with:
int argc;
char **argv;
CUT_DEVICE_INIT(argc,argv);
Attached are the NVCC command options, which are identical to those used by the CUDA samples.
Thanks. I have VS2005 SP1 installed, but it doesn't help.
I've also narrowed the error down to several pieces of code. Here is one of them:
#define FIND_MINMAX_N_THREADS 128 // must be a power of 2 for the algorithm below to work
#define VALUES_PER_THREAD (sizeof(uint4)/sizeof(VolumeSource_t))
__global__ void
FindMinMax_kernel (
VolumeSource_t* dp_voxels,
VolumeSource_t* dp_minmax
)
{
unsigned int i = ThreadId() * VALUES_PER_THREAD;
uint4 data = *(uint4 *)(dp_voxels + i); // memory fetch
VolumeSource_t* val = (VolumeSource_t*)&data; // address of register
//__shared__ VolumeSource_t minValue[ FIND_MINMAX_N_THREADS ]; //error code
//__shared__ VolumeSource_t maxValue[ FIND_MINMAX_N_THREADS ]; //error code
VolumeSource_t minValue[ FIND_MINMAX_N_THREADS ]; //substitution
VolumeSource_t maxValue[ FIND_MINMAX_N_THREADS ]; //substitution
minValue[ threadIdx.x ] = maxValue[ threadIdx.x ] = val[0];
for(int i = 1; i<VALUES_PER_THREAD; i++)
{
if( val[i] < minValue[ threadIdx.x ] ) minValue[ threadIdx.x ] = val[i];
if( val[i] > maxValue[ threadIdx.x ] ) maxValue[ threadIdx.x ] = val[i];
}
__syncthreads();
// sum half-half-half
// I can't bother about wraps... yet...
// I hope there's no bank conflicts
for(int i = FIND_MINMAX_N_THREADS/2; i>0; i>>=1 )
{
__syncthreads();
if( threadIdx.x<i)
{
int j = threadIdx.x + i;
if( minValue[ j ]<minValue[ threadIdx.x ] ) minValue[ threadIdx.x ] = minValue[ j ];
if( maxValue[ j ]>maxValue[ threadIdx.x ] ) maxValue[ threadIdx.x ] = maxValue[ j ];
} else
return;
}
if(threadIdx.x == 0)
{
#if ATOMICS
// this is slower though...
atomicMin( &((unsigned int*)dp_minmax)[0], minValue[0] );
atomicMax( &((unsigned int*)dp_minmax)[1], maxValue[0] );
#else
unsigned int bId = 2*BlockId();
//dp_minmax[ bId + 0 ] = minValue[threadIdx.x]; //error code
//dp_minmax[ bId + 1 ] = maxValue[threadIdx.x]; //error code
#endif
}
}
As you can see, the code builds once I comment out all four lines marked "error code" and use the two substitution lines instead. This is not a real solution, since I need the multi-threaded functionality.
So it seems to be related to the __shared__ variables. Are there any workarounds for this?
Please tell me if you need more detailed information.
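To make clear what the shared arrays are needed for, here is a host-side sketch of the same halving reduction in plain C++, so it can be checked without a GPU. The name reduceMinMax and the use of std::vector are only for illustration and are not part of my project. Each index t plays the role of one thread, and the two vectors stand in for the __shared__ buffers. With per-thread local arrays, a read of minValue[j] never sees the value written by thread j, which is why the substitution builds but cannot give the right result.

```cpp
#include <cstddef>
#include <vector>

// Host-side simulation of the halving min/max reduction (illustrative only).
// mn and mx must have the same size, and that size must be a power of two.
// Afterwards mn[0] holds the minimum and mx[0] the maximum of the inputs.
inline void reduceMinMax(std::vector<unsigned short> &mn,
                         std::vector<unsigned short> &mx)
{
    const std::size_t n = mn.size();
    for (std::size_t stride = n / 2; stride > 0; stride >>= 1)
    {
        // On the GPU, a __syncthreads() separates one step from the next.
        for (std::size_t t = 0; t < stride; ++t)  // t stands for threadIdx.x
        {
            const std::size_t j = t + stride;     // partner element
            if (mn[j] < mn[t]) mn[t] = mn[j];
            if (mx[j] > mx[t]) mx[t] = mx[j];
        }
    }
}
```

Within one step the reads come from the upper half and the writes go to the lower half of the active range, so running the "threads" one after another on the host gives the same result as running them in parallel.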
After further investigation, I found the error is due to the following 2 lines:
uint4 data = *(uint4 *)(dp_voxels + i); // memory fetch
VolumeSource_t* val = (VolumeSource_t*)&data; // address of register
BTW, VolumeSource_t is unsigned short here. The error goes away after changing the code to:
VolumeSource_t* val = dp_voxels + i;
Actually, I'm migrating the project from CUDA 1.0 to 2.0 to support newer graphics cards, and it looks like the type uint4 behaves differently in these two versions.
Is there a known issue with this? Or a better solution?
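One alternative I am considering keeps the single uint4 fetch but never takes the address of the register variable: the 16-bit values are unpacked from each 32-bit word with a mask and a shift. This is only a sketch, assuming VolumeSource_t is unsigned short as stated above. Word4 is a hypothetical host-side stand-in for the built-in uint4 so that the code compiles with a plain C++ compiler, and the helper names are made up. I have not verified that this avoids the nvopencc assertion.

```cpp
typedef unsigned short VolumeSource_t;  // as in my project

// Hypothetical stand-in for CUDA's built-in uint4 (host-side testing only).
struct Word4 { unsigned int x, y, z, w; };

// Fold the two 16-bit values packed in one 32-bit word into a running
// min/max. In a .cu file this helper would be marked __device__.
inline void minMaxOfWord(unsigned int word,
                         VolumeSource_t &mn, VolumeSource_t &mx)
{
    const VolumeSource_t lo = (VolumeSource_t)(word & 0xFFFFu);
    const VolumeSource_t hi = (VolumeSource_t)(word >> 16);
    if (lo < mn) mn = lo;
    if (lo > mx) mx = lo;
    if (hi < mn) mn = hi;
    if (hi > mx) mx = hi;
}

// Min and max of the eight 16-bit values held in one 128-bit fetch.
inline void minMaxOfWord4(const Word4 &d,
                          VolumeSource_t &mn, VolumeSource_t &mx)
{
    mn = (VolumeSource_t)0xFFFFu;  // largest unsigned short
    mx = (VolumeSource_t)0u;       // smallest unsigned short
    minMaxOfWord(d.x, mn, mx);
    minMaxOfWord(d.y, mn, mx);
    minMaxOfWord(d.z, mn, mx);
    minMaxOfWord(d.w, mn, mx);
}
```

Since only the minimum and maximum are needed, the order in which the two halves of each word are visited does not matter, so the result does not depend on byte order.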