cublas handle reuse

realbas89 · June 14, 2016, 10:45pm

Reading [1], I see that reusing a cublasHandle_t is good practice. Is it still good practice if I need to make multiple cuBLAS calls per thread?

If I create a handle outside of the kernel, how can I reference it inside the kernel?

//~ nvcc -rdc=true -arch=sm_35 -o t123 t123.cu -lcublas -lcublas_device -lcudadevrt
//~ sudo optirun --no-xorg ./t123
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <string.h>
#include <cublas_v2.h>

// _size is not defined in the original post; 10 is a hypothetical value
#define _size 10

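// Device-side cuBLAS calls need the cublas_device library (CUDA 5.0 to 9.2, sm_35 or newer)
__global__ void kernel2(cublasHandle_t handle, double *x){
	double alpha = 2.0;
	double *ptrAlpha = &alpha;
	
	// Does not work: handle was created on the host and points to host memory
	cublasDscal(handle, _size, ptrAlpha, x, 1);
	cudaDeviceSynchronize();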

	cublasHandle_t handle2;
	cublasStatus_t stat;
	stat = cublasCreate(&handle2);
	if(stat != CUBLAS_STATUS_SUCCESS){
		printf("CUBLAS initialization failed\n");
		return;
	}
	
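	// Works: handle2 was created in device code
	cublasDscal(handle2, _size, ptrAlpha, x, 1);
	cudaDeviceSynchronize(); // wait for the cuBLAS work before destroying the handle
	cublasDestroy(handle2);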
}

int main(int argc, char **argv){

	cublasHandle_t handle;
	cublasStatus_t stat;
	stat = cublasCreate(&handle);
	if(stat != CUBLAS_STATUS_SUCCESS){
		printf("CUBLAS initialization failed\n");
		return EXIT_FAILURE;
	}

	double *vetor = new double[_size], *vetorOut = new double[_size];
	assert(vetor);
	assert(vetorOut);
	for(int i=0; i<_size; i++){
		vetor[i] = (double)i;
	}
	double *ptrvetor;
	cudaMalloc((void**) &ptrvetor, _size*sizeof(double));
	cudaMemcpy(ptrvetor, vetor, _size*sizeof(double), cudaMemcpyHostToDevice);
	kernel2<<<1, 1>>>(handle, ptrvetor);
	cudaDeviceSynchronize();
	cudaMemcpy(vetorOut, ptrvetor, _size*sizeof(double), cudaMemcpyDeviceToHost);
	printf("%f\n", vetorOut[_size-1]);
	cudaFree(ptrvetor);
	delete [] vetor;
	vetor = NULL;
	delete [] vetorOut;
	vetorOut = NULL;
	
	cublasDestroy(handle);
	
	return 0;
}

[1]: http://stackoverflow.com/questions/20999382/should-we-reuse-the-cublashandle-t-across-different-calls

Robert_Crovella · June 14, 2016, 11:43pm

Yes, any handles that you need to use on the device should be created in device code.

Note the description of the cublasHandle_t:

http://docs.nvidia.com/cuda/cublas/index.html#cublashandle_t

“The cublasHandle_t type is a pointer type to an opaque structure holding the cuBLAS library context.”

Since it is a pointer type, when you create it on the host you have effectively created a host pointer (i.e. a pointer that points to an opaque structure in host memory).

When you pass such a pointer by-value to the device via kernel parameters, only the pointer value gets copied, not what it points to. So it is invalid to attempt to use that pointer in device code.
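For contrast, here is a minimal sketch of the host-side pattern that [1] recommends: the handle is created in host code, used only from host code, and reused for several calls. The helper name `scale_on_host` and its parameters are hypothetical, chosen for illustration. Unlike the device API in the question, this needs only the regular host cuBLAS library (`-lcublas`).

```cuda
// Host-side pattern: regular cuBLAS host API (link with -lcublas); no device API needed.
#include <cublas_v2.h>
#include <cuda_runtime.h>

// Hypothetical helper: multiplies the n-element device array d_x by alpha, `calls` times,
// creating the handle once and reusing it for every call.
bool scale_on_host(double *d_x, int n, double alpha, int calls){
    cublasHandle_t handle; // host pointer: valid only in host code
    if(cublasCreate(&handle) != CUBLAS_STATUS_SUCCESS) return false;
    bool ok = true;
    for(int i = 0; i < calls && ok; i++){
        // Default pointer mode is CUBLAS_POINTER_MODE_HOST, so &alpha may point to host memory
        ok = (cublasDscal(handle, n, &alpha, d_x, 1) == CUBLAS_STATUS_SUCCESS);
    }
    ok = (cudaDeviceSynchronize() == cudaSuccess) && ok;
    cublasDestroy(handle);
    return ok;
}
```

Because the handle is created once outside the loop, the setup cost of cublasCreate is paid once rather than once per call.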

If you want to preserve a handle from one kernel call to the next, you could use:

__device__ cublasHandle_t my_cublas_handle;
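A minimal sketch of that idea, assuming the legacy cuBLAS device API used in the question (CUDA 5.0 to 9.2, sm_35 or newer, built with `-rdc=true` and linked against `cublas_device` and `cudadevrt`; the device API was removed in CUDA 10.0). One launch creates the handle into the `__device__` variable, later launches reuse it, and a final launch destroys it. Every launch uses a single thread, so the handle is never shared between threads. The helper `scale_repeatedly` and the status variable `my_create_status` are hypothetical names, and `alpha` is passed the same way as in the question's kernel2.

```cuda
// Sketch only: legacy cuBLAS device API (CUDA 5.0 to 9.2, sm_35+), built with
// nvcc -rdc=true -arch=sm_35 ... -lcublas -lcublas_device -lcudadevrt
#include <cublas_v2.h>
#include <cuda_runtime.h>

// Lives in device global memory, so it survives from one kernel launch to the next.
__device__ cublasHandle_t my_cublas_handle;
// Hypothetical: result of the device-side cublasCreate, read back by the host.
__device__ cublasStatus_t my_create_status;

__global__ void create_handle_kernel(){
    my_create_status = cublasCreate(&my_cublas_handle);
}

__global__ void scale_kernel(int n, double *x){
    double alpha = 2.0;
    // Reuses the handle created by an earlier launch
    cublasDscal(my_cublas_handle, n, &alpha, x, 1);
    cudaDeviceSynchronize(); // wait for the work cuBLAS launched
}

__global__ void destroy_handle_kernel(){
    cublasDestroy(my_cublas_handle);
}

// Hypothetical helper: multiplies the n-element device array d_x by 2.0, `calls` times.
bool scale_repeatedly(double *d_x, int n, int calls){
    create_handle_kernel<<<1, 1>>>();
    if(cudaDeviceSynchronize() != cudaSuccess) return false;
    cublasStatus_t status = CUBLAS_STATUS_NOT_INITIALIZED;
    cudaMemcpyFromSymbol(&status, my_create_status, sizeof(status));
    if(status != CUBLAS_STATUS_SUCCESS) return false;
    for(int i = 0; i < calls; i++){
        scale_kernel<<<1, 1>>>(n, d_x); // one thread uses the handle: no sharing
    }
    destroy_handle_kernel<<<1, 1>>>();
    return cudaDeviceSynchronize() == cudaSuccess;
}
```

The handle is created and destroyed once, however many times scale_kernel is launched in between.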

realbas89 · June 15, 2016, 5:49am

If my_cublas_handle is declared outside of kernel1 and created in kernel1, is the handle the same for all threads? (That is, are its resources shared by all threads, or does each thread get its own?)

Robert_Crovella · June 15, 2016, 6:21pm

Multiple threads should not share the same cuBLAS handle:

http://docs.nvidia.com/cuda/cublas/index.html#thread-safety2
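A minimal sketch of the per-thread alternative, under the same assumptions as the question's code (legacy cuBLAS device API, CUDA 9.2 or earlier, sm_35 or newer). Each thread creates its own handle once, reuses it for all of its calls on its own slice of the array, and destroys it at the end. The names `per_thread_scale_kernel`, `g_create_failures` and `scale_per_thread` are hypothetical. Creating one handle per thread may itself be costly, so treat this as an illustration of the no-sharing rule rather than a performance recommendation.

```cuda
// Sketch only: legacy cuBLAS device API (CUDA 5.0 to 9.2, sm_35+), built with
// nvcc -rdc=true -arch=sm_35 ... -lcublas -lcublas_device -lcudadevrt
#include <cublas_v2.h>
#include <cuda_runtime.h>

// Hypothetical counter of threads whose device-side cublasCreate failed.
__device__ int g_create_failures;

// Each thread owns one handle and one slice of x (`slice` elements long).
__global__ void per_thread_scale_kernel(double *x, int slice, int calls){
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    cublasHandle_t handle; // private to this thread, never shared
    if(cublasCreate(&handle) != CUBLAS_STATUS_SUCCESS){
        atomicAdd(&g_create_failures, 1);
        return;
    }
    double alpha = 2.0;
    double *my_slice = x + tid * slice;
    for(int c = 0; c < calls; c++){
        // The same handle is reused for every call this thread makes
        cublasDscal(handle, slice, &alpha, my_slice, 1);
    }
    cudaDeviceSynchronize(); // wait for the work cuBLAS launched from this block
    cublasDestroy(handle);
}

// Hypothetical helper: one block of `threads` threads; d_x must hold threads*slice doubles.
bool scale_per_thread(double *d_x, int threads, int slice, int calls){
    int zero = 0, failures = 0;
    cudaMemcpyToSymbol(g_create_failures, &zero, sizeof(int));
    per_thread_scale_kernel<<<1, threads>>>(d_x, slice, calls);
    if(cudaDeviceSynchronize() != cudaSuccess) return false;
    cudaMemcpyFromSymbol(&failures, g_create_failures, sizeof(int));
    return failures == 0;
}
```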
