Profiler, GPU/CPU time

As I understand it, GPU time should be smaller than CPU time in the profiler output unless the call is asynchronous. I got the following result from Visual Profiler 1.1.

           GPU time   CPU time
memcopy    265.472    226.844

This memcpy is NOT an asynchronous call, so the result seems odd. Even if I call cudaThreadSynchronize() before and after this memcpy, the result is the same. My guess is that, because this is a host-to-device copy, something is still going on inside the GPU even after the memcpy call has returned. Or could this just be a bug?
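
One way to cross-check the profiler numbers would be to time the same copy by hand, using CUDA events for the GPU side and a host clock for the CPU side. The sketch below is only an illustration, not my actual code: the 1 MiB buffer size and all variable names are made up, and it assumes a newer toolkit than CUDA 2.1, since it uses std::chrono and cudaDeviceSynchronize() (on CUDA 2.1, cudaThreadSynchronize() plays the same role).

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 1 << 20;  // hypothetical 1 MiB buffer (assumption)

    // Pageable host buffer and device buffer for a host-to-device copy.
    char *h_buf = (char *)malloc(bytes);
    char *d_buf = NULL;
    if (h_buf == NULL || cudaMalloc((void **)&d_buf, bytes) != cudaSuccess) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }
    memset(h_buf, 0, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Make sure no earlier GPU work is still pending before timing.
    cudaDeviceSynchronize();

    // Host-side wall clock around the whole sequence.
    auto t0 = std::chrono::steady_clock::now();
    cudaEventRecord(start, 0);
    cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);  // wait until the stop event has been reached
    auto t1 = std::chrono::steady_clock::now();

    // Time between the two events as seen on the GPU timeline (milliseconds).
    float gpu_ms = 0.0f;
    cudaEventElapsedTime(&gpu_ms, start, stop);
    double cpu_us = std::chrono::duration<double, std::micro>(t1 - t0).count();

    printf("GPU (events): %.3f us, CPU (host clock): %.3f us\n",
           gpu_ms * 1000.0f, cpu_us);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_buf);
    free(h_buf);
    return 0;
}
```

Note that the host interval here also includes the event record and synchronize calls, so it is not exactly the quantity the profiler reports as CPU time; it only gives an independent pair of numbers to compare against.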

I am using Vista 64-bit, Visual Profiler 1.1, and CUDA 2.1.

Any opinions?