About CUDA SDK 4.0 "simpleZeroCopy"

This is about the source file “simpleZeroCopy.cu”, which defines:

#define MEMORY_ALIGNMENT  4096

#define ALIGN_UP(x,size) ( ((size_t)x+(size-1))&(~(size-1)) )

The code at line 122 is:

    a_UA = (float *) malloc( bytes + MEMORY_ALIGNMENT );
    b_UA = (float *) malloc( bytes + MEMORY_ALIGNMENT );
    c_UA = (float *) malloc( bytes + MEMORY_ALIGNMENT );

It does not matter much for this sample, but ALIGN_UP can move a pointer forward by at most MEMORY_ALIGNMENT - 1 bytes, so I think the logically correct allocation size is:

    a_UA = (float *) malloc( bytes + MEMORY_ALIGNMENT - 1);
    b_UA = (float *) malloc( bytes + MEMORY_ALIGNMENT - 1);
    c_UA = (float *) malloc( bytes + MEMORY_ALIGNMENT - 1);
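
To make the reasoning concrete, here is a minimal sketch in plain C (no CUDA calls) that measures how far ALIGN_UP can move an address. The helper names align_padding, worst_case_padding and aligned_region_fits are my own and are not part of the SDK sample. The macro is the sample's ALIGN_UP with its arguments parenthesized, and I am assuming the alignment is a power of two, as 4096 is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MEMORY_ALIGNMENT 4096

/* Same rounding as the sample's ALIGN_UP, with the arguments parenthesized. */
#define ALIGN_UP(x, size) \
    ((((size_t)(x)) + ((size_t)(size) - 1)) & ~((size_t)(size) - 1))

/* Padding that ALIGN_UP adds to an address: always in [0, size - 1]. */
static size_t align_padding(size_t addr, size_t size)
{
    /* The mask trick only works for a non-zero power-of-two size. */
    assert(size != 0 && (size & (size - 1)) == 0);
    return ALIGN_UP(addr, size) - addr;
}

/* Largest padding over every offset inside one alignment block. */
static size_t worst_case_padding(size_t size)
{
    size_t worst = 0;
    for (size_t off = 0; off < size; ++off) {
        size_t pad = align_padding(off, size);
        if (pad > worst)
            worst = pad;
    }
    return worst;
}

/* Hypothetical check: allocate bytes + MEMORY_ALIGNMENT - 1 and verify that
 * the aligned region of `bytes` bytes still lies inside the allocation.
 * Returns 1 if it fits, 0 if it does not, -1 if malloc fails. */
static int aligned_region_fits(size_t bytes)
{
    size_t total = bytes + MEMORY_ALIGNMENT - 1;
    char *raw = malloc(total);
    if (raw == NULL)
        return -1;

    uintptr_t start = (uintptr_t)raw;
    uintptr_t aligned = (uintptr_t)ALIGN_UP(start, MEMORY_ALIGNMENT);
    int ok = (aligned % MEMORY_ALIGNMENT == 0) &&
             (aligned + bytes <= start + total);

    free(raw);
    return ok;
}
```

Calling worst_case_padding(MEMORY_ALIGNMENT) from a small main() gives the largest number of bytes that alignment can skip, and aligned_region_fits(bytes) checks the smaller allocation for a given buffer size. The sample's extra byte only wastes memory; it does not cause a bug.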

Is my logic correct?

P.S. My English is very poor…