mem_size = 2098304
Processing time dev : 0.026260 (ms)
Processing time dev copy : 1023.973694 (ms)
Processing time host: 1487.460327 (ms)
Total Errors = 0
Press ENTER to exit...
output
“Processing time dev copy” does not depend on “mem_size”,
but it depends heavily on “Processing time dev” and “Processing time host”.
Your timing is wrong. The kernel launch is asynchronous, so you need to do two things to fix this: use cudaThreadSynchronize() before stopping the timer…
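To illustrate the point about asynchronous launches, here is a minimal sketch, not the original poster's code. The kernel `busy_kernel` and the helper `time_launch_ms` are hypothetical names made up for this example, and it uses `cudaDeviceSynchronize()`, which replaces the deprecated `cudaThreadSynchronize()` in current CUDA toolkits. Without the synchronize, the host timer stops right after the launch is queued, and the following blocking `cudaMemcpy` ends up absorbing the kernel's run time.

```cuda
#include <chrono>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel: some arithmetic per element.
__global__ void busy_kernel(float *data, int n, int iters)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < iters; ++k)
            v = v * 1.0001f + 0.5f;
        data[i] = v;
    }
}

// Host wall-clock time (ms) around one launch. With sync == true the timer
// stops only after the kernel has finished; with sync == false it stops as
// soon as the launch call returns.
double time_launch_ms(float *d_data, int n, bool sync)
{
    using clock = std::chrono::steady_clock;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    auto t0 = clock::now();
    busy_kernel<<<blocks, threads>>>(d_data, n, 10000);
    if (sync)
        cudaDeviceSynchronize();   // wait for the kernel before stopping
    auto t1 = clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Host wall-clock time (ms) for a blocking device-to-host copy.
double time_copy_ms(float *h_dst, const float *d_src, int n)
{
    using clock = std::chrono::steady_clock;
    auto t0 = clock::now();
    cudaMemcpy(h_dst, d_src, n * sizeof(float), cudaMemcpyDeviceToHost);
    auto t1 = clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main()
{
    const int n = 1 << 20;   // arbitrary size chosen for the sketch
    std::vector<float> h_data(n);
    float *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_data, 0, n * sizeof(float));

    time_launch_ms(d_data, n, true);   // warm-up launch, result ignored

    // Wrong: timer stops before the kernel is done, the copy waits for it.
    double dev_wrong  = time_launch_ms(d_data, n, false);
    double copy_wrong = time_copy_ms(h_data.data(), d_data, n);

    // Right: synchronize first, so each timer measures only its own step.
    double dev_right  = time_launch_ms(d_data, n, true);
    double copy_right = time_copy_ms(h_data.data(), d_data, n);

    printf("no sync : dev %f (ms), dev copy %f (ms)\n", dev_wrong, copy_wrong);
    printf("synced  : dev %f (ms), dev copy %f (ms)\n", dev_right, copy_right);

    cudaFree(d_data);
    return 0;
}
```

In the unsynchronized case the kernel time should show up in the copy figure instead of the kernel figure, which is the pattern in the first set of numbers above. CUDA events (`cudaEventRecord` / `cudaEventElapsedTime`) are another common way to time the kernel on the device side.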
mem_size = 2098304
Processing time dev : 1024.689697 (ms)
Processing time dev copy : 3.146769 (ms)
Processing time host: 1490.070679 (ms)
Total Errors = 0
Press ENTER to exit...
Does anyone have a CUDA-optimized version of the Linux crypt(3) function?