Handling large integer calculations inside a kernel without overflow?

What is the best practice for handling large integer calculations inside a kernel without overflow? Please suggest suitable data types and the header files they require.

Example: say I have to multiply a number around 1 billion by a number around 7 billion:
1,000,330,000 x 7,000,567,989

What do you mean by large integers? The result of your example fits in a signed 64-bit integer: the product is about 7.0 x 10^18, and the int64_t maximum is about 9.22 x 10^18.

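#include <stdio.h>
#include <stdint.h>  // int64_t

// Multiply two 64-bit integers on the device and store the product in *c.
__global__
void foo(int64_t* c, int64_t a, int64_t b){
	*c = a * b;
}

int main(){
	int64_t* c;
	cudaMallocManaged(&c, sizeof(int64_t));  // managed memory, visible to host and device
	foo<<<1,1>>>(c, 1000330000LL, 7000567989LL);
	cudaDeviceSynchronize();  // wait for the kernel before reading *c on the host
	printf("%lld\n", (long long)*c);  // long long is at least 64 bits on every platform
	cudaFree(c);
}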

Prints: 7002878176436370000

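There are also integer intrinsic functions that return the upper half of a product: __mulhi gives the upper 32 bits of the 64-bit product of two 32-bit integers, and __mul64hi gives the upper 64 bits of the 128-bit product of two 64-bit integers. Unsigned variants (__umulhi, __umul64hi) exist as well.
long long int __mul64hi ( long long int x, long long int y )
int __mulhi ( int x, int y )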

See the integer intrinsics section of the CUDA Math API in the CUDA Toolkit Documentation.
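
If a product can exceed 64 bits, one way to keep all of it is to pair the ordinary multiplication (low 64 bits) with the unsigned high-half intrinsic __umul64hi (high 64 bits). The following is only a minimal sketch of that idea; the names mul_64x64_to_128, wide_mul and wide_mul_host are hypothetical helpers made up for illustration, and error checking of the CUDA calls is omitted.

```cuda
#include <cstdio>
#include <cstdint>

// Hypothetical helper (not from this thread): full 128-bit product of two
// unsigned 64-bit values, returned as separate high and low 64-bit halves.
__device__ void mul_64x64_to_128(uint64_t a, uint64_t b, uint64_t* hi, uint64_t* lo) {
	*lo = a * b;             // low 64 bits (unsigned multiply wraps modulo 2^64)
	*hi = __umul64hi(a, b);  // high 64 bits of the 128-bit product
}

// out[0] receives the high half, out[1] the low half.
__global__ void wide_mul(uint64_t a, uint64_t b, uint64_t* out) {
	mul_64x64_to_128(a, b, &out[0], &out[1]);
}

// Hypothetical host wrapper: runs the kernel once and copies the halves back.
void wide_mul_host(uint64_t a, uint64_t b, uint64_t* hi, uint64_t* lo) {
	uint64_t* out;
	cudaMallocManaged(&out, 2 * sizeof(uint64_t));
	wide_mul<<<1, 1>>>(a, b, out);
	cudaDeviceSynchronize();  // wait for the kernel before reading on the host
	*hi = out[0];
	*lo = out[1];
	cudaFree(out);
}

int main() {
	uint64_t hi, lo;

	// The example from the question: the product fits in 64 bits, so hi is 0.
	wide_mul_host(1000330000ULL, 7000567989ULL, &hi, &lo);
	printf("hi = %llu, lo = %llu\n", (unsigned long long)hi, (unsigned long long)lo);

	// 2^63 * 4 = 2^65 does not fit in 64 bits: hi is 2 and lo is 0.
	wide_mul_host(0x8000000000000000ULL, 4ULL, &hi, &lo);
	printf("hi = %llu, lo = %llu\n", (unsigned long long)hi, (unsigned long long)lo);
	return 0;
}
```

A nonzero high half is also a simple overflow test for unsigned 64-bit multiplication: if hi is 0, the low half alone is the exact result.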

Regarding the printf format: "%lld" with a long long argument is a safer choice than "%ld", since 'long' is a 32-bit type on some platforms (notably Windows).
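
As a small illustration of the format-specifier point, here are two portable ways to print an int64_t from host code: the PRId64 macro from <cinttypes>, or a cast to long long with "%lld". This is a sketch only; format_i64 is a hypothetical helper name, and the code is plain host-side C++ that also compiles as part of a .cu file.

```cuda
#include <cstdio>
#include <cstdint>
#include <cinttypes>  // PRId64

// Hypothetical helper: format an int64_t into buf using the PRId64 macro,
// which expands to the correct conversion specifier for the platform.
// Returns the number of characters written (excluding the terminator).
int format_i64(char* buf, size_t n, int64_t v) {
	return snprintf(buf, n, "%" PRId64, v);
}

int main() {
	int64_t v = 7002878176436370000LL;

	// Option 1: PRId64 picks the right specifier for int64_t.
	char buf[32];
	format_i64(buf, sizeof buf, v);
	printf("%s\n", buf);

	// Option 2: cast to long long and use "%lld".
	printf("%lld\n", (long long)v);
	return 0;
}
```

Both lines print 7002878176436370000; the choice between them is mostly a matter of taste.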