Multi-threading and cuSOLVER execution time

liuyang · July 5, 2017, 9:50am

I’m using a Tesla K40m and cuSOLVER to compute the eigenvalues and eigenvectors of a 16x16 matrix. The result is correct, and a single thread takes 3.5 ms. But when I create 32 threads, each of which creates its own CUDA stream for synchronization and calls the cuSOLVER function, the total time is 116 ms, while GPU utilization is only 37%. Why does this happen? Can cuSOLVER not be used from multiple threads? How can I solve this?

BulatZiganshin · July 5, 2017, 12:24pm

The task is so small that most of the 3.5 ms is probably time spent in the driver (pushing the job to the GPU and receiving the answer). I don’t know cuSOLVER, but the only way to improve performance is to use a batched API, if one is available, or to call cuSOLVER routines directly from your GPU code.
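To illustrate the batched-API suggestion, here is a minimal sketch that solves all the small eigenproblems in one call instead of one call per thread. It rests on assumptions that the thread does not confirm: that the matrices are symmetric and single precision, and that the installed toolkit ships `cusolverDnSsyevjBatched` (the Jacobi-based batched solver, which as far as the cuSOLVER documentation indicates is available from CUDA 9.0 onward). The helper name `batched_symmetric_eig` is hypothetical and is not the original poster's code.

```cuda
#include <cuda_runtime.h>
#include <cusolverDn.h>
#include <vector>

// Hypothetical helper (not from the thread). Solves `batch` symmetric n x n
// eigenproblems with a single cuSOLVER call.
//   a: matrices stored back to back, column-major; overwritten with eigenvectors
//   w: receives n eigenvalues per matrix
// Returns true only if every call succeeded and every matrix converged.
bool batched_symmetric_eig(std::vector<float>& a, std::vector<float>& w,
                           int n, int batch) {
    if (n <= 0 || batch <= 0 ||
        a.size() != static_cast<size_t>(n) * n * batch) return false;
    w.assign(static_cast<size_t>(n) * batch, 0.0f);
    std::vector<int> info(batch, -1);

    cusolverDnHandle_t handle = nullptr;
    syevjInfo_t params = nullptr;  // Jacobi settings; library defaults are kept
    float *d_a = nullptr, *d_w = nullptr, *d_work = nullptr;
    int *d_info = nullptr;
    int lwork = 0;
    const size_t a_bytes = a.size() * sizeof(float);
    const size_t w_bytes = w.size() * sizeof(float);
    const size_t i_bytes = info.size() * sizeof(int);
    const cusolverEigMode_t jobz = CUSOLVER_EIG_MODE_VECTOR;  // values + vectors
    const cublasFillMode_t uplo = CUBLAS_FILL_MODE_LOWER;     // read lower triangle

    // Short-circuit chain: the first failing call stops the sequence.
    bool ok =
        cusolverDnCreate(&handle) == CUSOLVER_STATUS_SUCCESS &&
        cusolverDnCreateSyevjInfo(&params) == CUSOLVER_STATUS_SUCCESS &&
        cudaMalloc((void**)&d_a, a_bytes) == cudaSuccess &&
        cudaMalloc((void**)&d_w, w_bytes) == cudaSuccess &&
        cudaMalloc((void**)&d_info, i_bytes) == cudaSuccess &&
        cudaMemcpy(d_a, a.data(), a_bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
        // Query the workspace size for the whole batch.
        cusolverDnSsyevjBatched_bufferSize(handle, jobz, uplo, n, d_a, n, d_w,
                                           &lwork, params, batch)
            == CUSOLVER_STATUS_SUCCESS &&
        cudaMalloc((void**)&d_work, sizeof(float) * lwork) == cudaSuccess &&
        // One submission covers all `batch` matrices.
        cusolverDnSsyevjBatched(handle, jobz, uplo, n, d_a, n, d_w, d_work,
                                lwork, d_info, params, batch)
            == CUSOLVER_STATUS_SUCCESS &&
        cudaMemcpy(a.data(), d_a, a_bytes, cudaMemcpyDeviceToHost) == cudaSuccess &&
        cudaMemcpy(w.data(), d_w, w_bytes, cudaMemcpyDeviceToHost) == cudaSuccess &&
        cudaMemcpy(info.data(), d_info, i_bytes, cudaMemcpyDeviceToHost) == cudaSuccess;

    // info[i] == 0 means matrix i converged.
    for (int i = 0; ok && i < batch; ++i) ok = (info[i] == 0);

    cudaFree(d_work);
    cudaFree(d_info);
    cudaFree(d_w);
    cudaFree(d_a);
    if (params) cusolverDnDestroySyevjInfo(params);
    if (handle) cusolverDnDestroy(handle);
    return ok;
}
```

For the case in this thread, the 32 matrices would be packed into one `std::vector<float>` of 32 * 16 * 16 elements and passed with `n = 16`, `batch = 32`, so the per-call driver overhead is paid once rather than 32 times. In a real application the handle, the `syevjInfo_t` object, and the device buffers would normally be created once and reused across calls; they are created inside the function here only to keep the sketch self-contained. It should build with something like `nvcc example.cu -lcusolver`.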
