nvcc and the Visual Studio compiler

I wrote the following code:

#include <stdio.h>
#include <cuda.h>
#include <time.h>
#include <iostream>  // needed for cout and endl

using namespace std;

int main()
{
    clock_t start_time, stop_time;
    long int n = 0;
    start_time = clock();
    for (int i = 0; i <= 200000; i++)
        for (int j = 0; j <= 10000; j++)
            n++;  // assumed: the loop body increments n
    stop_time = clock();
    cout << "the output is " << n << " and the consumed time is " << (double)(stop_time - start_time) / CLOCKS_PER_SEC << endl;
    return 0;
}

First, I compiled the code with the Visual Studio compiler (the file name is xxx.cpp) and ran the binary. After a while, the result was printed to the console, showing that the program took about 16 seconds to finish.

However, when I compiled the code with nvcc (the file name is xxx.cu) and ran the binary, the output showed an elapsed time of 0! The value of n was CORRECT and was printed out IMMEDIATELY!

Does nvcc somehow optimize the code implicitly? How can these two compilers show such a huge performance difference?

nvcc won't have compiled that code. It isn't a compiler, but rather a compiler driver. It works out which parts of an input file are host code and which parts are device code, and then pre-processes both before sending the code to the respective host and device compilers (in this case the MS C++ compiler for the host code, and nvopencc for the device code). So the same compiler has compiled the host code in both cases. It is likely that nvcc is turning on optimizations in the host compiler that your "naïve" compilation didn't. In the fast case the compiler has presumably just precomputed the loop result and replaced the loop with an assignment. Disassembly of the compiler output would verify this.
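To make the "precomputed loop" idea concrete, here is a minimal sketch. It assumes the loop body in the question simply increments n. The function names count_with_loop and count_closed_form are made up for illustration. The volatile counter stops the optimizer from removing the loop, while the closed-form version is roughly what an optimizing compiler could reduce the loop to. Whether a given compiler actually does this depends on the compiler and the flags it is given.

```cpp
#include <cstdio>
#include <ctime>

// Hypothetical helper: counts iterations the same way the nested loop in the
// question does (assuming the body is a plain increment). The counter is
// volatile, so the compiler must perform every single increment and cannot
// replace the loop with a constant.
long long count_with_loop(int outer, int inner)
{
    volatile long long n = 0;
    for (int i = 0; i <= outer; i++)
        for (int j = 0; j <= inner; j++)
            n = n + 1;
    return n;
}

// Hypothetical helper: the value the loop always ends up with. Both loops
// run from 0 to their bound inclusive, so the count is (outer+1)*(inner+1).
// An optimizer that recognises this can skip the loop entirely.
long long count_closed_form(int outer, int inner)
{
    return static_cast<long long>(outer + 1) * (inner + 1);
}

// Hypothetical helper: runs both versions and prints the CPU time of each,
// measured with clock() as in the question.
void compare(int outer, int inner)
{
    clock_t t0 = clock();
    long long looped = count_with_loop(outer, inner);
    clock_t t1 = clock();
    long long folded = count_closed_form(outer, inner);
    clock_t t2 = clock();

    printf("loop:        n = %lld, %f s\n", looped,
           (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("closed form: n = %lld, %f s\n", folded,
           (double)(t2 - t1) / CLOCKS_PER_SEC);
}
```

Calling compare(200000, 10000) from main reproduces the bounds in the question. Both versions should report the same n, but the closed-form call returns almost instantly, which matches the behaviour seen with the nvcc build.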

Thank you very much! Your explanation is very clear.