cudaMemcpy(dataDev, dataHost, mem_size, cudaMemcpyHostToDevice) execution time too long


This code: http://openpaste.org/en/18964/

produces this:

mem_size = 2098304

Processing time dev : 0.026260 (ms)

Processing time dev copy : 1023.973694 (ms)

Processing time host: 1487.460327 (ms)

Total Errors = 0

Press ENTER to exit...

In this output, “Processing time dev copy” does not depend on “mem_size”,

but it depends strongly on “Processing time dev” and “Processing time host”.

WTF???

Your timing is wrong. The kernel launch is asynchronous, so you need to do two things to fix this: use cudaThreadSynchronize() before stopping the timer…

…and wash your mouth out with soap.
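For anyone reading later, here is a minimal sketch of what that fix looks like. The linked paste is not reproduced here, so busyKernel, elapsedMs, the iteration count and the launch configuration are all made-up stand-ins, and I am assuming a blocking copy follows the kernel launch. Note that cudaThreadSynchronize() has since been deprecated; cudaDeviceSynchronize() does the same job and is what the sketch uses.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical stand-in for the kernel in the linked paste: burns time per element.
__global__ void busyKernel(float *data, int n, int iters)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < iters; ++k)
            v = v * 1.0001f + 0.5f;
        data[i] = v;
    }
}

// Hypothetical helper: host wall-clock milliseconds elapsed since t0.
static double elapsedMs(std::chrono::steady_clock::time_point t0)
{
    return std::chrono::duration<double, std::milli>(
               std::chrono::steady_clock::now() - t0).count();
}

int main()
{
    const size_t mem_size = 2098304;               // size quoted in the thread
    const int n = (int)(mem_size / sizeof(float));

    float *dataHost = (float *)malloc(mem_size);
    if (!dataHost) return 1;
    for (int i = 0; i < n; ++i) dataHost[i] = 1.0f;

    float *dataDev = NULL;
    if (cudaMalloc((void **)&dataDev, mem_size) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        free(dataHost);
        return 1;
    }
    cudaMemcpy(dataDev, dataHost, mem_size, cudaMemcpyHostToDevice);

    // Time the kernel. The launch returns to the host immediately, so the
    // timer must not be read until the device has actually finished.
    std::chrono::steady_clock::time_point t0 = std::chrono::steady_clock::now();
    busyKernel<<<(n + 255) / 256, 256>>>(dataDev, n, 10000);
    cudaDeviceSynchronize();                       // wait for the kernel to finish
    printf("Processing time dev : %f (ms)\n", elapsedMs(t0));

    // Time the copy on its own. Without the synchronize above, this blocking
    // copy would also absorb the kernel's run time.
    t0 = std::chrono::steady_clock::now();
    cudaMemcpy(dataHost, dataDev, mem_size, cudaMemcpyDeviceToHost);
    printf("Processing time dev copy : %f (ms)\n", elapsedMs(t0));

    cudaFree(dataDev);
    free(dataHost);
    return 0;
}
```

If you comment out the cudaDeviceSynchronize() call, the kernel's run time should move from the first number into the second, which is the pattern in the original output.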

Working well now:

mem_size = 2098304

Processing time dev : 1024.689697 (ms)

Processing time dev copy : 3.146769 (ms)

Processing time host: 1490.070679 (ms)

Total Errors = 0

Press ENTER to exit...

Does anyone have a CUDA-optimized version of the Linux crypt(3) function?