Long time to call and execute a simple CUDA function in a loop

Fairly new to CUDA and have a question:

CUDA Toolkit 10.1

When using the CUDA template in Visual Studio 2017, I'm puzzled by how long a single call takes.

I modified the sample code and put the CUDA call (addWithCuda) inside a loop:
```cuda
int main()
{
    const int arraySize = 5;
    const int a[arraySize] = { 1, 2, 3, 4, 5 };
    const int b[arraySize] = { 10, 20, 30, 40, 50 };
    int c[arraySize] = { 0 };

    // Add vectors in parallel.
    cudaError_t cudaStatus;
    for (int i = 0; i < 5000; i++)
    {
        cudaStatus = addWithCuda(c, a, b, arraySize);
        if (cudaStatus != cudaSuccess) {
            fprintf(stderr, "addWithCuda failed!");
            return 1;
        }
    }

    printf("{1,2,3,4,5} + {10,20,30,40,50} = {%d,%d,%d,%d,%d}\n",
        c[0], c[1], c[2], c[3], c[4]);

    // … and so on.
```
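For context, each `addWithCuda` call in the stock Visual Studio CUDA template wraps a tiny kernel in a full setup and teardown sequence. The sketch below is a condensed reconstruction of that function, written from memory of the template, so details may differ from the generated `kernel.cu`; the chained error checks and the `assert` are simplifications of mine, not template code. Per call it selects the device, does three `cudaMalloc`s, two host-to-device copies, one kernel launch, a device-wide synchronize, one device-to-host copy, and three `cudaFree`s.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Element-wise add, one thread per element (single block, as in the template).
__global__ void addKernel(int *c, const int *a, const int *b)
{
    int i = threadIdx.x;
    c[i] = a[i] + b[i];
}

// Condensed sketch of the template's addWithCuda: every call repeats the
// whole setup / transfer / teardown sequence around one tiny launch.
cudaError_t addWithCuda(int *c, const int *a, const int *b, unsigned int size)
{
    // Sketch-only precondition: one block holds at most 1024 threads.
    assert(size > 0 && size <= 1024);
    int *dev_a = nullptr, *dev_b = nullptr, *dev_c = nullptr;
    const size_t bytes = size * sizeof(int);

    cudaError_t st = cudaSetDevice(0);                              // select GPU 0
    if (st == cudaSuccess) st = cudaMalloc((void **)&dev_c, bytes); // 3 device allocations
    if (st == cudaSuccess) st = cudaMalloc((void **)&dev_a, bytes);
    if (st == cudaSuccess) st = cudaMalloc((void **)&dev_b, bytes);
    if (st == cudaSuccess) st = cudaMemcpy(dev_a, a, bytes, cudaMemcpyHostToDevice); // 2 host-to-device copies
    if (st == cudaSuccess) st = cudaMemcpy(dev_b, b, bytes, cudaMemcpyHostToDevice);
    if (st == cudaSuccess) {
        addKernel<<<1, size>>>(dev_c, dev_a, dev_b);                // the actual work
        st = cudaGetLastError();                                    // launch errors
    }
    if (st == cudaSuccess) st = cudaDeviceSynchronize();            // block until the GPU is done
    if (st == cudaSuccess) st = cudaMemcpy(c, dev_c, bytes, cudaMemcpyDeviceToHost); // 1 device-to-host copy

    // 3 device frees; cudaFree(nullptr) is a no-op, so this is safe after an early failure.
    cudaFree(dev_c);
    cudaFree(dev_a);
    cudaFree(dev_b);
    return st;
}
```

In the 5000-iteration loop this whole sequence runs 5000 times, so the measured time is most likely dominated by the allocation, copy, and synchronization calls rather than by the five additions.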

The 5000 iterations took about a second, i.e. roughly 200 µs per addWithCuda call. My guess is that this comes from transferring the CUDA code and data to the GPU. Is there something I'm missing here?
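One way to test that guess is to move the setup out of the loop. A minimal sketch, assuming the goal is only to repeat the addition: `addRepeatedly` is a hypothetical helper (not part of the template) that allocates and copies the inputs once, launches the kernel `iterations` times, copies the result back once, and times the launch loop with CUDA events. The 256-thread block size and the bounds-checked kernel are my choices, not the template's.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Bounds-checked element-wise add, so any size works with a multi-block grid.
__global__ void addKernel(int *c, const int *a, const int *b, unsigned int size)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < size) c[i] = a[i] + b[i];
}

// Hypothetical helper: one-time setup, `iterations` launches, one copy back.
// Writes the GPU time (ms) of the launch loop to *elapsedMs.
cudaError_t addRepeatedly(int *c, const int *a, const int *b, unsigned int size,
                          int iterations, float *elapsedMs)
{
    assert(size > 0 && iterations > 0);
    const size_t bytes = size * sizeof(int);
    const unsigned int threads = 256;
    const unsigned int blocks = (size + threads - 1) / threads;
    int *dev_a = nullptr, *dev_b = nullptr, *dev_c = nullptr;
    cudaEvent_t start = nullptr, stop = nullptr;

    // Allocations and host-to-device copies happen once, not per iteration.
    cudaError_t st = cudaMalloc((void **)&dev_a, bytes);
    if (st == cudaSuccess) st = cudaMalloc((void **)&dev_b, bytes);
    if (st == cudaSuccess) st = cudaMalloc((void **)&dev_c, bytes);
    if (st == cudaSuccess) st = cudaMemcpy(dev_a, a, bytes, cudaMemcpyHostToDevice);
    if (st == cudaSuccess) st = cudaMemcpy(dev_b, b, bytes, cudaMemcpyHostToDevice);
    if (st == cudaSuccess) st = cudaEventCreate(&start);
    if (st == cudaSuccess) st = cudaEventCreate(&stop);

    if (st == cudaSuccess) {
        cudaEventRecord(start);
        // Launches are asynchronous: no host/device sync inside the loop.
        for (int i = 0; i < iterations; ++i)
            addKernel<<<blocks, threads>>>(dev_c, dev_a, dev_b, size);
        st = cudaGetLastError();
        if (st == cudaSuccess) cudaEventRecord(stop);
    }
    if (st == cudaSuccess) st = cudaEventSynchronize(stop);   // wait for all launches
    if (st == cudaSuccess) st = cudaEventElapsedTime(elapsedMs, start, stop);
    // Single device-to-host copy of the final result.
    if (st == cudaSuccess) st = cudaMemcpy(c, dev_c, bytes, cudaMemcpyDeviceToHost);

    if (start) cudaEventDestroy(start);
    if (stop) cudaEventDestroy(stop);
    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
    return st;
}
```

Comparing `*elapsedMs` for 5000 iterations with the original one-second figure separates the cost of the launches themselves from the per-call allocation, copy, and synchronization overhead. The events time only GPU work between `start` and `stop`, so the one-time setup (including CUDA context creation on the first runtime call) is not included.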