problems with cutBankChecker

Hi,

I’m confused with the output of cutBankChecker tool from cuda_sdk.

Here is a sample code:

#define CHECK_BANK_CONFLICTS

#ifdef CHECK_BANK_CONFLICTS

#define MEM(base, index)   CUT_BANK_CHECKER(base, index)

#else

#define MEM(base, index)   base[index]

#endif

__global__ void test_back_conflicts() {

   uint thid = threadIdx.x, half_dim = blockDim.x / 2;

    __shared__ uint data[512];

    if(thid < half_dim && (thid & 2)) {

          uint mem_ai = thid;

          uint x = MEM(data, mem_ai);

            x--; // blah blah

      }

}

...

dim3 threads(256, 1, 1);

test_back_conflicts<<<1, threads>>>();

Here the bank checker reports 112 bank conflicts, in the following form:

threadIdx.x = 14 threadIdx.y = 0 threadIdx.z = 0 :: index = 14

threadIdx.x = 30 threadIdx.y = 0 threadIdx.z = 0 :: index = 30

threadIdx.x = 46 threadIdx.y = 0 threadIdx.z = 0 :: index = 46

threadIdx.x = 62 threadIdx.y = 0 threadIdx.z = 0 :: index = 62

threadIdx.x = 78 threadIdx.y = 0 threadIdx.z = 0 :: index = 78

threadIdx.x = 94 threadIdx.y = 0 threadIdx.z = 0 :: index = 94

threadIdx.x = 110 threadIdx.y = 0 threadIdx.z = 0 :: index = 110

threadIdx.x = 126 threadIdx.y = 0 threadIdx.z = 0 :: index = 126

although it’s clear from the code above that the memory is accessed sequencially by the threads

with the condition (thid & 2) that causes some threads to diverge.

-------- i.e., if you remove (thid & 2) from the code you’ll get 0 bank conflicts --------------

Didn’t I understand it correctly that the threads thid and thid+16 belong to different half-warps, and therefore there cannot be bank conflicts between them ?

or threads are grouped to warps in some “non-trivial” order ?

thanks

ok, guys, maybe I didn’t state it clearly, my question is

“whether threads thid and thid+16 always belong to different half-warps
such that there can’t be any bank conflicts between them ?”
it that true ? I don’t believe there is no one on this forum knowing this issue,
help me plz !!

ok just continue talking with myself ;)


found out that: cutBankChecker is UNRELIABLE !!!

PLEASE DON’T USE IT!!!


it works correctly only when none of your threads of a half-warp diverge during memory access,

just have a look at bank checker source code:

void BankChecker::

access( unsigned int tidx, unsigned int tidy, unsigned int tidz,

        unsigned int bdimx, unsigned int bdimy, unsigned int bdimz,

        const char* file, const int line, const std::string& aname,

        const int index)

{ 

    is_active = true;

   // linearized thread id

    unsigned int ltid = getLtid( tidx, tidy, tidz, bdimx, bdimy, bdimz);

   // reset state if new warp

    // ltid = 0, 16, 32, ...

    if( 0 == (ltid & (warp_size - 1))) 

    {

        // double check to handle multiple shared mem accesses in one line

        if( last_ltid != 0) 

        {

            reset();

        }

    }

   AccessLocation loc( file, line, aname, ltid);

    AccessInfo info( ltid, tidx, tidy, tidz, index);

    access_data[loc][(index % warp_size)].push_back( info);

   // ltid == 15, 31, 47, ...

    if( 15 == (ltid & (warp_size - 1))) 

    {

        analyse( access_data. find( loc));

    }

   last_ltid = (ltid % warp_size); // 0..15

}  

"if( 0 == (ltid & (warp_size - 1))) {

reset();

}

i.e., accumulated access information is reset only when the thread with id divisible by 16 accesses the memory, however if this thread diverges, cutBankChecker would count false bank conflicts !!!

Still no comments on this post??
I think it’s a basic question regarding bank conflicts, isn’t it? which I don’t know the answer myself …

CUT_BACK_CHECKER is just a tool to aid I believe. Regarding bank-conflicts, there is a lot of information in the Programming Guide.