Problem: CUDA code in Qt is slower than naive code


I am writing CUDA code in Qt on a Jetson TX2. The results are the same, but the CUDA code is 10 times slower than the naive code. I don’t know whether I need to activate (enable) the GPU on the Jetson or not.

__global__ void accum(float *a, float *b, float *c, int sizeN)
{
    __shared__ float cache[1024];
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    int cacheIndex = threadIdx.x;

    cache[threadIdx.x] = a[tid] * b[tid];

    if (cacheIndex == 0) {
        float temp = 0;
        for (int i = 0; i < sizeN; i++)
            temp += cache[i];
    }
}

//Call Kernel
accum<<<(sizeNN+1023) / 1024,1024>>>(d_A, d_B, d_C,sizeNN);

Could anyone give me some advice? Thank you.
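
On the kernel itself, one likely contributor to the slowdown is that thread 0 of every block sums the products serially while the other 1023 threads sit idle. The loop also reads cache[i] for i up to sizeN, which runs past the 1024-element shared array when sizeN is larger, and there is no __syncthreads() before the reads. Below is a minimal sketch of a block-level tree reduction, assuming the goal is a dot product of a and b. The names dotKernel, dotProductGpu, and kThreads are made up for illustration and are not from the original code, and error checking is omitted for brevity. The helper times the kernel with CUDA events after a warm-up launch, since the first launch typically includes one-time initialization cost.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical constant: matches the 1024-thread block size of the original
// launch. The reduction below assumes the block size is a power of two.
constexpr int kThreads = 1024;

// Each block multiplies its slice of a and b element-wise, reduces the
// products in shared memory, and adds its partial sum to *result.
__global__ void dotKernel(const float *a, const float *b, float *result, int n)
{
    __shared__ float cache[kThreads];
    int tid = blockDim.x * blockIdx.x + threadIdx.x;

    // Guard the tail: threads past the end of the arrays contribute zero.
    cache[threadIdx.x] = (tid < n) ? a[tid] * b[tid] : 0.0f;
    __syncthreads();  // every product must be stored before any thread reads it

    // Tree reduction: the number of active threads halves at each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            cache[threadIdx.x] += cache[threadIdx.x + stride];
        __syncthreads();
    }

    if (threadIdx.x == 0)
        atomicAdd(result, cache[0]);  // one atomic add per block
}

// Hypothetical host helper: returns the dot product and writes the kernel
// time in milliseconds to *kernelMs. Return codes are not checked here.
float dotProductGpu(const std::vector<float> &a, const std::vector<float> &b,
                    float *kernelMs)
{
    int n = static_cast<int>(a.size());
    *kernelMs = 0.0f;
    if (n == 0) return 0.0f;  // a launch with zero blocks is invalid

    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dResult = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dResult, sizeof(float));
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);

    int blocks = (n + kThreads - 1) / kThreads;

    // Warm-up launch so one-time initialization is not part of the timing.
    cudaMemset(dResult, 0, sizeof(float));
    dotKernel<<<blocks, kThreads>>>(dA, dB, dResult, n);
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaMemset(dResult, 0, sizeof(float));  // reset the accumulator
    cudaEventRecord(start);
    dotKernel<<<blocks, kThreads>>>(dA, dB, dResult, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until the kernel has finished
    cudaEventElapsedTime(kernelMs, start, stop);

    float result = 0.0f;
    cudaMemcpy(&result, dResult, sizeof(float), cudaMemcpyDeviceToHost);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dResult);
    return result;
}
```

For small arrays the copies to and from the device can outweigh the kernel time, so comparing kernelMs with the total wall-clock time of the call helps show where the time actually goes.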


I also noticed that only one CPU core seems to be in use: when I run something, only CPU1 is active (100%) and the others are off (0%). When I run sudo ./tegrastats, it shows:

RAM 1641/7846MB (lfb 1260x4MB) CPU [22%@2036,off,off,32%@2036,15%@2036,28%@2034]

Could you please give me some advice? Thank you.


You can try to maximize the system performance first:

sudo nvpmodel -m 0
sudo ./

By the way, you should check the GR3D field, which indicates GPU utilization.


Hi AastaLLL,

Thank you for your reply. I ran your two commands, but they had no effect.

In fact, I am a newbie with CUDA, Qt, and GPUs, so I don’t know how to check GR3D. The tegrastats command doesn’t show it.

Also, could you tell me how I can monitor GPU performance on the Jetson TX2 while the code is running?

Thank you.


Please run tegrastats with root privileges.
You should then be able to find the GR3D field in the tegrastats output.

For example:

RAM 1743/15819MB (lfb 3395x4MB) CPU [0%@1190,1%@1190,0%@1190,0%@1190,off,off,off,off] EMC_FREQ 1%@1331 GR3D_FREQ 0%@675 APE 150 MTS fg 0% bg 0% AO@25C GPU@25.5C Tboard@25C Tdiode@28.75C AUX@24.5C CPU@26.5C thermal@25.4C PMIC@100C GPU 309/309 CPU 154/154 SOC 1239/1239 CV 0/0 VDDRQ 309/309 SYS5V 2089/208