Strange performance with TK1 on CUBLAS


I just received a Jetson TK1 and couldn’t wait to benchmark CGEMM on it. Here is what I got:

Something is clearly not working. For comparison, this is what I got with the NVIDIA CARMA board:

Is anybody else experiencing the same issue?


This is just speculation, but have you monitored CPU use during this time? See this thread:

You could compile and install htop to watch as things progress; perhaps idle hardware is kicking in.

Thanks! I'm going to check it this evening and will edit my post with the results.

EDIT: I did what ebrower suggested, but it didn’t change anything with cuBLAS:

    # check state of cpuquiet auto-hotplug
    root@tegra-ubuntu:~# cat /sys/devices/system/cpu/cpuquiet/tegra_cpuquiet/enable

    # disable cpuquiet and verify
    root@tegra-ubuntu:~# echo 0 > /sys/devices/system/cpu/cpuquiet/tegra_cpuquiet/enable
    root@tegra-ubuntu:~# cat /sys/devices/system/cpu/cpuquiet/tegra_cpuquiet/enable

    # online CPU cores manually
    root@tegra-ubuntu:~# echo 1 > /sys/devices/system/cpu/cpu0/online
    bash: echo: write error: Invalid argument
    root@tegra-ubuntu:~# echo 1 > /sys/devices/system/cpu/cpu1/online
    root@tegra-ubuntu:~# echo 1 > /sys/devices/system/cpu/cpu2/online
    root@tegra-ubuntu:~# echo 1 > /sys/devices/system/cpu/cpu3/online

and I have:

    cat `find /sys/devices/system/cpu -name 'online'`
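The `cat` above prints the values without saying which core each one belongs to. A minimal sketch that prints every core's state next to its name could look like the following. The function name `show_cpu_online` and its optional directory argument are made up for illustration; the only thing taken from the thread is the `/sys/devices/system/cpu/cpuN/online` layout.

```shell
#!/bin/sh
# Print "cpuN: <state>" for every core that exposes an 'online' file.
# The optional first argument overrides the sysfs root (an assumption
# added here so the function can be tried on a scratch directory).
show_cpu_online() {
    root="${1:-/sys/devices/system/cpu}"
    for f in "$root"/cpu[0-9]*/online; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$f" ] || continue
        cpu=$(basename "$(dirname "$f")")
        printf '%s: %s\n' "$cpu" "$(cat "$f")"
    done
}
```

Calling `show_cpu_online` with no argument reads the real sysfs tree; a value of 1 means the core is online.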

If anyone has results on cuBLAS, please share ;))

Just remember: there are triggers that bring CPUs beyond the first one online. Once something has triggered this, those CPUs will keep running for a while. You will have to watch for a moment when the CPUs other than the first one are not running, and only then start the test program. The question remains whether the performance jumps at the instant the other CPUs kick in.
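Watching for that moment can be scripted rather than done by eye. The sketch below is one possible way, assuming the same `cpuN/online` sysfs files shown earlier in the thread; the function names, the timeout, and the optional directory argument are all hypothetical and not from any Tegra tool.

```shell
#!/bin/sh
# Count the secondary cores (cpu1 and up) that are currently online.
# The optional first argument overrides the sysfs root (illustrative only).
secondary_cores_online() {
    root="${1:-/sys/devices/system/cpu}"
    count=0
    for f in "$root"/cpu[1-9]*/online; do
        [ -e "$f" ] || continue
        if [ "$(cat "$f")" = "1" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Poll once per second until only cpu0 is online.
# $1: timeout in seconds (default 60), $2: optional sysfs root.
# Returns 0 once the secondary cores are offline, 1 on timeout.
wait_for_single_core() {
    timeout="${1:-60}"
    root="${2:-/sys/devices/system/cpu}"
    waited=0
    while [ "$(secondary_cores_online "$root")" -gt 0 ]; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

Something like `wait_for_single_core 120 && ./cgemm_bench` would then start the test only once the extra cores have gone offline, where `./cgemm_bench` is a placeholder for whatever benchmark binary is being run. This only makes sense while cpuquiet is still enabled; with hotplug disabled and the cores forced online, the wait would simply time out.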

An alternative is to turn off the idling, using the information in that other thread, and see whether the jump in performance goes away.

I thought I turned off idling by doing "# online CPU cores manually" in #4, didn’t I? (I guess I should post this in the other thread.)

That said, there is also an idling system on the CARMA board and I didn’t have any performance problem there. Is one core not enough to drive the GPU?

EDIT: this may be the same issue as here:

Gonna test it :)

I activated all cores and set the GPU and memory frequencies to maximum. I guess I was expecting a bit more from the TK1. I got more or less the same performance as the CARMA in single precision, but less in double precision: