Hi,
The kernel below is taken from the document "Scan using CUDA", pages 10 to 12.
When I compile the code I get the following errors:
[root@kiran Download]# nvcc Scan13.cu -deviceemu -o Scan
Scan13.cu(59): error: unrecognized token
Scan13.cu(59): error: expected a "]"
Scan13.cu(59): error: unrecognized token
Scan13.cu(59): error: unrecognized token
4 errors detected in the compilation of "/tmp/tmpxft_00001307_00000000-4_Scan13.cpp1.ii".
I have the following questions:
1- In the line: #define CONFLICT_FREE_OFFSET(n) \ ((n) >> NUM_BANKS + (n) >> (2 * LOG_NUM_BANKS))
what is the meaning of "\"? Should it not be "/"?
2- All four errors are reported on the following line of the code:
if (thid==0) { temp[n - 1 + CONFLICT_FREE_OFFSET(n - 1)] = 0; } //// clear the last element
Three of these four errors say "unrecognized token". What does "token" mean here with respect to line 59,
where I am clearing the last element? And why is the same error reported
three times, when I think the line is processed only once during compilation?
3- I do not understand "/tmp/tmpxft_00001307_00000000-4_Scan13.cpp1.ii". What is tmp, and what is tmpxft_00001307_00000000-4_Scan13.cpp1.ii?
What is a .ii file?
4- Why is the error expected a "]" reported on line 59, i.e.:
if (thid==0) { temp[n - 1 + CONFLICT_FREE_OFFSET(n - 1)] = 0; } //// clear the last element
Thanks in advance!!
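For comparison with question 1, here is a minimal host-only sketch of the macro laid out with a line continuation. In C and C++ a backslash continues a line only when it is the very last character on that line; anywhere else in a macro body it is a stray character that does not form a valid token. The extra parentheses and the use of LOG_NUM_BANKS in the first shift are assumptions about the intended grouping (+ binds tighter than >>), and padded_index is a hypothetical helper that is not part of the original document.

```cpp
#define NUM_BANKS 16
#define LOG_NUM_BANKS 4

// The backslash below is the last character on its line, so the
// preprocessor joins the next line onto the #define.
// Assumption: the parenthesised grouping is a guess at the intent,
// because + has higher precedence than >> in C and C++.
#define CONFLICT_FREE_OFFSET(n) \
    (((n) >> LOG_NUM_BANKS) + ((n) >> (2 * LOG_NUM_BANKS)))

// Hypothetical helper: padded shared-memory index for logical index i.
inline int padded_index(int i) {
    return i + CONFLICT_FREE_OFFSET(i);
}
```

With these definitions padded_index(16) evaluates to 17, i.e. one padding slot is skipped after the first 16 elements.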
==============
The kernel, along with the macro, is given below (please notice the bold lines):
#define NUM_BANKS 16
#define LOG_NUM_BANKS 4
#ifdef ZERO_BANK_CONFLICTS
#define CONFLICT_FREE_OFFSET(n) \ ((n) >> NUM_BANKS + (n) >> (2 * LOG_NUM_BANKS))
#else
#define CONFLICT_FREE_OFFSET(n) ((n) >> LOG_NUM_BANKS)
#endif
__global__ void prescan(float *g_odata, float *g_idata, int n)
{
    extern __shared__ float temp[]; // allocated on invocation
    int thid = threadIdx.x;
    int offset = 1;
    int ai = thid;
    int bi = thid + (n/2);
    int bankOffsetA = CONFLICT_FREE_OFFSET(ai);
    int bankOffsetB = CONFLICT_FREE_OFFSET(bi);
    temp[ai + bankOffsetA] = g_idata[ai];
    temp[bi + bankOffsetB] = g_idata[bi];
    for (int d = n>>1; d > 0; d >>= 1) // build sum in place up the tree
    {
        __syncthreads();
        if (thid < d)
        {
            int ai = offset*(2*thid+1)-1;
            int bi = offset*(2*thid+2)-1;
            ai += CONFLICT_FREE_OFFSET(ai);
            bi += CONFLICT_FREE_OFFSET(bi);
            temp[bi] += temp[ai];
        }
        offset *= 2;
    }
    if (thid==0) { temp[n - 1 + CONFLICT_FREE_OFFSET(n - 1)] = 0; } // clear the last element
    for (int d = 1; d < n; d *= 2) // traverse down tree & build scan
    {
        offset >>= 1;
        __syncthreads();
        if (thid < d)
        {
            int ai = offset*(2*thid+1)-1;
            int bi = offset*(2*thid+2)-1;
            ai += CONFLICT_FREE_OFFSET(ai);
            bi += CONFLICT_FREE_OFFSET(bi);
            float t = temp[ai];
            temp[ai] = temp[bi];
            temp[bi] += t;
        }
    }
    __syncthreads();
    g_odata[ai] = temp[ai + bankOffsetA];
    g_odata[bi] = temp[bi + bankOffsetB];
}
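
To check the kernel's output once it compiles, the two tree phases above can be mirrored sequentially on the host. The following is only a sketch under stated assumptions: exclusive_scan_reference is a hypothetical name, the bank-conflict padding is left out, and the input length is assumed to be a power of two. Within one level of the tree the per-thread index pairs do not overlap, so a plain loop over thid stands in for the parallel threads.

```cpp
#include <vector>

// Hypothetical host-side reference for the prescan kernel (no padding).
// Assumes temp.size() is a power of two.
inline std::vector<float> exclusive_scan_reference(std::vector<float> temp) {
    const int n = static_cast<int>(temp.size());
    if (n == 0) return temp;

    int offset = 1;
    // Up-sweep: build partial sums in place up the tree.
    for (int d = n >> 1; d > 0; d >>= 1) {
        for (int thid = 0; thid < d; ++thid) {
            int ai = offset * (2 * thid + 1) - 1;
            int bi = offset * (2 * thid + 2) - 1;
            temp[bi] += temp[ai];
        }
        offset *= 2;
    }

    temp[n - 1] = 0.0f; // clear the last element

    // Down-sweep: traverse down the tree and build the exclusive scan.
    for (int d = 1; d < n; d *= 2) {
        offset >>= 1;
        for (int thid = 0; thid < d; ++thid) {
            int ai = offset * (2 * thid + 1) - 1;
            int bi = offset * (2 * thid + 2) - 1;
            float t = temp[ai];
            temp[ai] = temp[bi];
            temp[bi] += t;
        }
    }
    return temp;
}
```

For the input 1, 2, 3, 4, 5, 6, 7, 8 this returns 0, 1, 3, 6, 10, 15, 21, 28, which is what g_odata should hold for the same input.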