Global memory coalescing for CHAR type

Hello!

My GPU is a GeForce 425M (compute capability 2.1).

I have a 1D array of unsigned char in global memory:

unsigned char* cdata;
cudaMalloc(&cdata,csize);

I launch a kernel with as many threads as cdata has elements.
Every thread writes one char value to cdata[i],
where i is the global index of the thread (in my case: blockIdx.x*blockDim.x+threadIdx.x).
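
A minimal sketch of the setup described above, for reference. The kernel name write_chars, the host helper fill_on_device, the block size of 256 and the value written (the low byte of the index) are all placeholders made up for illustration; only cdata, csize and the index expression come from the post.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread writes one byte to the element matching its global index.
__global__ void write_chars(unsigned char* cdata, size_t csize)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < csize)                              // guard the last, partially filled block
        cdata[i] = (unsigned char)(i & 0xFF);   // placeholder value, not from the post
}

// Hypothetical host helper: allocate, launch one thread per element, copy back.
// Returns an empty vector if a CUDA call fails.
std::vector<unsigned char> fill_on_device(size_t csize)
{
    std::vector<unsigned char> host(csize);
    if (csize == 0) return host;

    unsigned char* cdata = nullptr;
    if (cudaMalloc(&cdata, csize) != cudaSuccess) return {};

    const unsigned int threads = 256;           // assumed block size, a multiple of 32
    const unsigned int blocks = (unsigned int)((csize + threads - 1) / threads);
    write_chars<<<blocks, threads>>>(cdata, csize);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(host.data(), cdata, csize, cudaMemcpyDeviceToHost);
    cudaFree(cdata);
    if (err != cudaSuccess) host.clear();
    return host;
}
```

With a block size that is a multiple of 32, the threads of each warp write 32 consecutive bytes. Note that on compute capability 2.x the grid is limited to 65535 blocks per dimension, so this one-dimensional launch only covers arrays up to 65535*256 elements.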

Is the global memory access coalesced for char type?

For every warp there will be 32 * 1 byte = 32 bytes of data to write to global memory.
Therefore I configured global memory access to use only the L2 cache, with its 32-byte transaction size.
Am I right in thinking that this should be faster than using both the L1 and L2 caches?
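
The 32-bytes-per-warp arithmetic can be checked with a small helper. This is only a sketch: segments_touched is a made-up name, and it assumes that thread k of a warp accesses bytes_per_thread consecutive bytes starting at base + k * bytes_per_thread, and that memory is served in aligned 32-byte segments as described above.

```cuda
#include <cstddef>
#include <cstdint>

// Hypothetical helper: number of aligned segments covered by one warp's
// contiguous accesses. 'base' is the byte address accessed by the warp's
// first thread; the defaults mirror the 32-thread warp and 32-byte segment
// assumed in the post.
__host__ __device__ inline
unsigned segments_touched(std::uint64_t base, unsigned bytes_per_thread,
                          unsigned warp_size = 32, unsigned segment_bytes = 32)
{
    std::uint64_t first = base / segment_bytes;
    std::uint64_t last =
        (base + (std::uint64_t)warp_size * bytes_per_thread - 1) / segment_bytes;
    return (unsigned)(last - first + 1);
}
```

For example, segments_touched(0, 1) gives 1 (one warp of chars starting on a 32-byte boundary), segments_touched(16, 1) gives 2 (the same warp shifted by 16 bytes straddles two segments), and segments_touched(0, 4) gives 4 (a warp of 4-byte values). Since cudaMalloc returns pointers aligned to at least 256 bytes, a warp whose first index is a multiple of 32 falls into the first case.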

Thanks,

Gaszton