Under a certain condition, I need to set a global variable that every thread checks from time to time; each thread exits when it sees the global variable set, and when all threads have exited, the kernel will “return”.
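A minimal sketch of the flag-based exit as I picture it (the names, the polling interval, and the search condition here are hypothetical; I'm assuming the flag is raised by whichever thread satisfies the condition, while the other threads poll it occasionally):

#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
#include <stdint.h>

// Hypothetical global flag; volatile so every poll re-reads it from memory.
__device__ volatile int found = 0;

__global__ void search(uint64_t target) {
    const uint64_t stride = (uint64_t)gridDim.x * blockDim.x;
    uint64_t polls = 0;
    for (uint64_t i = (uint64_t)blockIdx.x * blockDim.x + threadIdx.x;
         i < 218340105584896ull; i += stride) {
        if (i == target) {                   // stand-in for the real search condition
            found = 1;                       // raise the flag for every thread
            __threadfence();                 // push the write out so other blocks see it sooner
            printf("found at %llu\n", (unsigned long long)i);
        }
        if ((++polls & 0xFFFFull) == 0 && found) {
            return;                          // this thread exits; the kernel returns once all threads have
        }
    }
}

int main() {
    search<<<2, 1024>>>(123456789ull);       // same launch shape as below, hypothetical target value
    cudaDeviceSynchronize();
    return 0;
}

The current code, without the flag: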
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
#include <iostream>
#include <time.h>
// NVIDIA GeForce GTX 960M (compute/sm 50, 5x SM)
# define blocks 2
# define threads 1024
cudaError_t cudaStatus;
__global__ void blackcat(void) {
    //uint64_t n = 218340105584896ull / threads; // Number of search cycles per thread: 1,073,741,824 for 1,024 threads
    uint64_t n = 1000000000ull;
    uint64_t a = 0;
    while (n-- > 0) {
        a++;
        if (threadIdx.x == 0 && blockIdx.x == 0 && a == 968000000ull) {
            printf("%llu\n", a); // a is unsigned 64-bit, so %llu rather than %lld
        }
    }
}
int main() {
    cudaEvent_t start, stop; // CUDA event timing
    float time;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaSetDevice(0);
    cudaStatus = cudaGetLastError();
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaSetDevice failed! Do you have a CUDA-capable GPU installed?\n");
    }

    cudaEventRecord(start, 0);
    blackcat<<<blocks, threads>>>();
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&time, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    printf("CUDA time is %f ms\n", time); // cudaEventElapsedTime reports milliseconds; pass the value, not its address

    cudaStatus = cudaGetLastError();
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "Kernel launch failed: %s\n", cudaGetErrorString(cudaStatus));
    }
    cudaDeviceSynchronize();
    cudaStatus = cudaGetLastError();
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaDeviceSynchronize returned error code %d after launching blackcat!\n", cudaStatus);
    }
    return 0;
}
Also, your kernel launch error checking is no longer trustworthy; I suggest following the recommendations carefully. You can’t just put that “Kernel launch failed” error check anywhere you like and expect it to give you useful, reliable information. However, that is not the cause of the issue.
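For reference, the usual recommendation is to read cudaGetLastError() immediately after the launch statement (that reports launch/configuration errors, such as an illegal block size) and then to check the value returned by cudaDeviceSynchronize() (that reports errors raised while the kernel was actually running). A sketch, as a fragment of your main() keeping your kernel and launch parameters:

blackcat<<<blocks, threads>>>();

cudaStatus = cudaGetLastError();              // launch / configuration errors
if (cudaStatus != cudaSuccess) {
    fprintf(stderr, "Kernel launch failed: %s\n", cudaGetErrorString(cudaStatus));
}

cudaStatus = cudaDeviceSynchronize();         // errors that occur while the kernel executes
if (cudaStatus != cudaSuccess) {
    fprintf(stderr, "Kernel execution failed: %s\n", cudaGetErrorString(cudaStatus));
}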
Since the GPU is doing work in parallel, that particular calculation scheme provides very little insight for me. I don’t know what value a calculation like that provides unless you use it in some comparative way (and even then, carefully). The GPU is not calculating anything in such a way that any useful, measurable, or noticeable work is done in 5 picoseconds.
The actual clock cycles of GPUs are much longer than that, on the order of 500 to 1,000 picoseconds. Even then, it’s nearly impossible to identify exactly what has transpired on a GPU in any single clock cycle.
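If my guess about where the 5 picoseconds came from is right (kernel time divided by the iteration count summed over all threads), the arithmetic looks like the sketch below. The elapsed time in it is a made-up placeholder, not your measurement; the point is that only the per-thread figure relates to clock periods, because thousands of iterations are in flight at once.

#include <stdio.h>

int main() {
    /* Illustrative placeholders only, not real measurements. */
    double elapsed_s     = 10.0;             /* assumed kernel time */
    double iters_per_thr = 1000000000.0;     /* n in the posted kernel */
    double total_threads = 2.0 * 1024.0;     /* blocks * threads */

    double aggregate  = elapsed_s / (iters_per_thr * total_threads); /* ~5e-12 s: not a latency */
    double per_thread = elapsed_s / iters_per_thr;                   /* ~1e-8 s: many clock cycles */

    printf("aggregate  : %g s per iteration (parallel work folded together, not a latency)\n", aggregate);
    printf("per thread : %g s per iteration (~%g cycles at a 1 ns clock period)\n",
           per_thread, per_thread / 1e-9);
    return 0;
}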