I just moved to CUDA 2.1, and now when I profile my app the profiler no longer logs the memcpy calls in the output file. I’ve tried running from the command line and with the visual profiler (v1.1.08) under Windows. In neither case does the .csv log file contain any entries for the memcopies. Everything worked fine under CUDA 2.0 (visual profiler version).
Is anyone getting profile data that includes the memcopies? I’m trying to figure out the best way to schedule the copies around kernel launches, and I really want to know whether my async memcopies are actually asynchronous.
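
In the meantime, one way to check the async behavior without the profiler is to time the cudaMemcpyAsync call on the host and query the stream right after it returns. This is only a minimal sketch, not what my app does: it assumes pinned host memory from cudaMallocHost and a non-default stream, the helper name measure_async_copy and the 64 MiB buffer size are made up for illustration, and it uses C++11 std::chrono for the host timer, so it needs a newer toolkit than 2.1.

```cuda
#include <chrono>
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Bail out of the enclosing function on any CUDA error.
// (Sketch only: resources allocated so far are not released on failure.)
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t e_ = (call);                                      \
        if (e_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(e_));                     \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Hypothetical helper: issues one host-to-device cudaMemcpyAsync and reports
//   host_call_ms : how long the call itself blocked the host thread
//   gpu_copy_ms  : time between two events recorded around the copy
//   in_flight    : 1 if the stream still had pending work right after the call
// Returns 0 on success, 1 on a CUDA error.
int measure_async_copy(size_t bytes, double *host_call_ms,
                       float *gpu_copy_ms, int *in_flight) {
    void *h = NULL;
    void *d = NULL;
    CHECK(cudaMallocHost(&h, bytes));   // pinned memory, needed for a truly async copy
    CHECK(cudaMalloc(&d, bytes));
    std::memset(h, 0, bytes);

    cudaStream_t stream;
    cudaEvent_t start, stop;
    CHECK(cudaStreamCreate(&stream));
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    CHECK(cudaEventRecord(start, stream));
    std::chrono::steady_clock::time_point t0 = std::chrono::steady_clock::now();
    CHECK(cudaMemcpyAsync(d, h, bytes, cudaMemcpyHostToDevice, stream));
    std::chrono::steady_clock::time_point t1 = std::chrono::steady_clock::now();

    // cudaErrorNotReady means the work queued in the stream has not finished yet.
    cudaError_t q = cudaStreamQuery(stream);
    if (q != cudaSuccess && q != cudaErrorNotReady) {
        std::fprintf(stderr, "cudaStreamQuery failed: %s\n", cudaGetErrorString(q));
        return 1;
    }
    *in_flight = (q == cudaErrorNotReady) ? 1 : 0;

    CHECK(cudaEventRecord(stop, stream));
    CHECK(cudaEventSynchronize(stop));  // wait for the copy to finish
    CHECK(cudaEventElapsedTime(gpu_copy_ms, start, stop));
    *host_call_ms = std::chrono::duration<double, std::milli>(t1 - t0).count();

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(d));
    CHECK(cudaFreeHost(h));
    return 0;
}

int main() {
    double host_ms = 0.0;
    float gpu_ms = 0.0f;
    int in_flight = 0;
    if (measure_async_copy((size_t)64 << 20, &host_ms, &gpu_ms, &in_flight) != 0)
        return 1;
    std::printf("host blocked in cudaMemcpyAsync: %.3f ms\n", host_ms);
    std::printf("copy time between events:        %.3f ms\n", gpu_ms);
    std::printf("stream still busy after call:    %s\n", in_flight ? "yes" : "no");
    return 0;
}
```

If the host-side time is much smaller than the event time and the stream is still busy right after the call, the copy is not blocking the host. This only shows that the call returns early; it says nothing about whether the copy overlaps with a kernel, which is the part I still need the profiler output for.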