How to get a timeline trace of memcpy and kernel execution on Linux? Which tool/script/trick to use?

Hi, I’m trying to capture a timeline trace that shows how memory copies and kernel execution overlap on a Tesla C2050.
How can I get the timeline trace on a Linux machine? Is there any tool/script/trick?

P.S. Machine: Ubuntu 10.10, and I have only remote access at the moment.
We managed to install a virtual machine with Windows 7 in order to run Parallel Nsight, but the CUDA driver refuses to install (no hardware found).

On Linux, I think this is done with the Compute Visual Profiler, found in the /usr/local/cuda/computeprof/bin directory. (I haven’t used it in a long time, so I don’t remember the details.)
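If running the profiler GUI over a remote connection is awkward, a rough check is also possible from inside the program with CUDA events. What follows is only a minimal sketch under stated assumptions, not a replacement for a real timeline: the kernel `busy_kernel`, the helper `intervals_overlap`, the buffer size and the iteration count are all made up for illustration, and timestamps taken from events in different streams should be treated as approximate.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error at %s:%d: %s\n", __FILE__,     \
                    __LINE__, cudaGetErrorString(err_));               \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Hypothetical kernel: an arithmetic loop that just keeps the GPU busy.
__global__ void busy_kernel(float *data, int n, int iters) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < iters; ++k) v = v * 1.0001f + 0.5f;
        data[i] = v;
    }
}

// True if the time intervals [s1, e1] and [s2, e2] share some span.
static bool intervals_overlap(float s1, float e1, float s2, float e2) {
    return s1 < e2 && s2 < e1;
}

int main() {
    const int n = 1 << 22;                   // illustrative size only
    const size_t bytes = n * sizeof(float);

    float *h_buf, *d_work, *d_copy;
    CHECK(cudaMallocHost((void **)&h_buf, bytes));  // pinned memory, needed for async copies
    CHECK(cudaMalloc((void **)&d_work, bytes));
    CHECK(cudaMalloc((void **)&d_copy, bytes));
    for (int i = 0; i < n; ++i) h_buf[i] = 1.0f;
    CHECK(cudaMemset(d_work, 0, bytes));

    cudaStream_t s_kernel, s_copy;
    CHECK(cudaStreamCreate(&s_kernel));
    CHECK(cudaStreamCreate(&s_copy));

    // ev[0] = common origin, ev[1..2] = kernel start/end, ev[3..4] = copy start/end
    cudaEvent_t ev[5];
    for (int i = 0; i < 5; ++i) CHECK(cudaEventCreate(&ev[i]));

    // Origin goes into the default stream so that it precedes the work below.
    CHECK(cudaEventRecord(ev[0], 0));

    // Kernel in one stream, bracketed by events.
    CHECK(cudaEventRecord(ev[1], s_kernel));
    busy_kernel<<<(n + 255) / 256, 256, 0, s_kernel>>>(d_work, n, 2000);
    CHECK(cudaGetLastError());
    CHECK(cudaEventRecord(ev[2], s_kernel));

    // Host-to-device copy in the other stream, bracketed by events.
    CHECK(cudaEventRecord(ev[3], s_copy));
    CHECK(cudaMemcpyAsync(d_copy, h_buf, bytes, cudaMemcpyHostToDevice, s_copy));
    CHECK(cudaEventRecord(ev[4], s_copy));

    CHECK(cudaDeviceSynchronize());

    // Convert every event into milliseconds since the origin.
    float t[5] = {0.0f, 0.0f, 0.0f, 0.0f, 0.0f};
    for (int i = 1; i < 5; ++i) CHECK(cudaEventElapsedTime(&t[i], ev[0], ev[i]));

    printf("kernel: %8.3f ms -> %8.3f ms\n", t[1], t[2]);
    printf("memcpy: %8.3f ms -> %8.3f ms\n", t[3], t[4]);
    printf("overlap: %s\n", intervals_overlap(t[1], t[2], t[3], t[4]) ? "yes" : "no");

    for (int i = 0; i < 5; ++i) CHECK(cudaEventDestroy(ev[i]));
    CHECK(cudaStreamDestroy(s_kernel));
    CHECK(cudaStreamDestroy(s_copy));
    CHECK(cudaFreeHost(h_buf));
    CHECK(cudaFree(d_work));
    CHECK(cudaFree(d_copy));
    return 0;
}
```

This should build with something like `nvcc timeline_sketch.cu -o timeline_sketch` (the file name is arbitrary). On toolkits older than CUDA 4.0, `cudaDeviceSynchronize` is not available and `cudaThreadSynchronize` plays that role. The order in which work and events are issued can itself affect whether the copy and the kernel overlap, so treat a "no" from this sketch as a hint to look closer rather than as proof.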