tegrastats on TK1 always shows GPU usage is zero

Hi, I am running various CUDA samples (e.g. nbody and BlackScholes) on my TK1 to understand the performance statistics reported by tegrastats (and by cat /sys/devices/platform/host1x/gk20a.0/load). I modified BlackScholes to run on the GPU for many minutes (more than 10) to rule out any statistical sampling issue. Note that I have manually overclocked the GPU and turned on all 4 CPU cores. For the entire time the simulation is making CUDA calls, I get the following from tegrastats:
RAM 542/1746MB (lfb 170x4MB) cpu [0%,100%,0%,0%]@2320 EMC 71%@924 AVP 0%@40 VDE 120 GR3D 0%@804 EDP limit 2320
Wouldn't this indicate that the GPU is never used? Note that at the end of the simulation the BlackScholes executable prints the CPU and GPU times, so I assume it is calling CUDA routines.

Why is the GPU load always zero in this case? I have also seen this when running nbody, even though nbody gets performance similar to what is reported here for the TK1 GPU: https://www.pugetsystems.com/labs/hpc/NVIDIA-Jetson-TK1-CUDA-performance-569/ … Do I need to turn on the counters or set some environment variable? I checked many other posts that report this issue and there was no clear answer. Thanks.
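For anyone who wants to watch the sysfs counter directly rather than through tegrastats, here is a minimal sketch that polls the load file mentioned above. The function names are my own, and the meaning of the number is an assumption on my part: I believe the file holds a single integer in tenths of a percent, but the code only returns the raw value and does not rely on that.

```python
import time

# Path quoted in the question above; it only exists on a TK1 running L4T.
LOAD_PATH = "/sys/devices/platform/host1x/gk20a.0/load"

def read_gpu_load(path=LOAD_PATH):
    """Return the raw integer stored in the load file, or None if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        # File missing, permission denied, or unexpected content.
        return None

def sample_gpu_load(path=LOAD_PATH, samples=5, interval_s=0.5):
    """Poll the load file several times and return the list of raw readings."""
    readings = []
    for i in range(samples):
        readings.append(read_gpu_load(path))
        if i < samples - 1:
            time.sleep(interval_s)
    return readings
```

Calling sample_gpu_load() while a CUDA sample is running should show whether the counter ever leaves zero. A list of None values means the file could not be read at all, which is a different problem from a load of 0.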

I don't have a TK1 at hand to check right now, but are you running tegrastats with sudo?

Yes, running as sudo.

Do you have any special config, such as a custom board, or booting to the command line without an X server running?
Are there any errors related to the GPU (gk20a) in dmesg?

Hi, no special HW setup, only that I had to manually overclock the GPU and turn all 4 CPUs on. In fact I have three different boards, all with the same behavior. dmesg shows the following as the only failure/error from the GPU (the same error on all boards):

[ 1639.618724] gk20a gk20a.0: nvhost_gk20a_finalize_poweron: failed to init gk20a pmu_hw2
[ 1953.016161] gk20a gk20a.0: gk20a_init_pmu_setup_hw2: PG_INIT_ACK failed, remaining timeout : 0x0
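When comparing several boards it can help to pull these lines out of dmesg automatically. The sketch below is only an illustration: the function name and the "gk20a plus fail" filter are my own choices, not an official diagnostic, and the sample text is the two messages above with the terminal line wraps undone.

```python
import re

# Matches the kernel timestamp prefix, e.g. "[ 1639.618724] ".
TIMESTAMP_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*")

def find_gk20a_failures(dmesg_text):
    """Return (timestamp_seconds, message) for gk20a lines that mention a failure."""
    hits = []
    for line in dmesg_text.splitlines():
        # Keep only GPU driver lines whose text contains "fail" (any case).
        if "gk20a" not in line or "fail" not in line.lower():
            continue
        m = TIMESTAMP_RE.match(line)
        ts = float(m.group(1)) if m else None
        msg = line[m.end():] if m else line
        hits.append((ts, msg))
    return hits

# The two messages quoted above, rejoined into single lines.
sample = (
    "[ 1639.618724] gk20a gk20a.0: nvhost_gk20a_finalize_poweron: "
    "failed to init gk20a pmu_hw2\n"
    "[ 1953.016161] gk20a gk20a.0: gk20a_init_pmu_setup_hw2: "
    "PG_INIT_ACK failed, remaining timeout : 0x0\n"
)

for ts, msg in find_gk20a_failures(sample):
    print(ts, msg)
```

On a real board the input could come from the saved output of dmesg instead of the sample string.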

Here is the output from BlackScholes:
[./BlackScholes] - Starting…
GPU Device 0: “GK20A” with compute capability 3.2

Initializing data…
…allocating CPU memory for options.
…allocating GPU memory for options.
…generating input data in CPU mem.
…copying input data to GPU mem.
Data init done.

Executing Black-Scholes GPU kernel (153600 iterations)…
Options count : 8000000
BlackScholesGPU() time : 2257.502686 msec
Effective memory bandwidth: 0.035437 GB/s
Gigaoptions per second : 0.003544

BlackScholes, Throughput = 0.0035 GOptions/s, Time = 2.25750 s, Size = 8000000 options, NumDevsUsed = 1, Workgroup = 128

Reading back GPU results…
Checking the results…
…running CPU calculations.

Comparing the results…
L1 norm: 1.708455E-07
Max absolute error: 1.239777E-05

Shutting down…
…releasing GPU memory.
…releasing CPU memory.
Shutdown done.

[BlackScholes] - Test Summary
Test passed

Just for reference:

Which L4T release are you using ?

head -1 /etc/nv_tegra_release

Could you check your nvidia install with:

sha1sum -c /etc/nv_tegra_release

Hi CaitC,

Below is the tegrastats output on a TK1 while running a CUDA sample (simpleGL):
RAM 698/1892MB (lfb 8x4MB) cpu [91%,off,off,50%]@2065 EMC 15%@924 AVP 0%@204 VDE 120 GR3D 61%@180 EDP limit 0
RAM 707/1892MB (lfb 8x4MB) cpu [98%,off,off,30%]@2065 EMC 19%@924 AVP 0%@204 VDE 120 GR3D 63%@180 EDP limit 0
RAM 722/1892MB (lfb 8x4MB) cpu [99%,off,off,30%]@2065 EMC 19%@924 AVP 0%@204 VDE 120 GR3D 42%@252 EDP limit 0
RAM 696/1892MB (lfb 8x4MB) cpu [25%,off,off,100%]@2065 EMC 20%@924 AVP 0%@204 VDE 120 GR3D 46%@252 EDP limit 0
RAM 699/1892MB (lfb 8x4MB) cpu [25%,off,off,100%]@2065 EMC 21%@924 AVP 0%@204 VDE 120 GR3D 50%@252 EDP limit 0

Please install JetPack 3.0 on your TK1, then run the CUDA samples and check the GPU load with:

sudo ./tegrastats
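To compare runs without eyeballing every line, the GR3D field can be extracted from captured tegrastats output. This is a rough sketch based only on the line format shown in this thread; the helper names are mine, and reading the number after the @ as the GPU clock in MHz is an assumption.

```python
import re

# "GR3D <load>%@<freq>" as it appears in the tegrastats lines above.
GR3D_RE = re.compile(r"GR3D (\d+)%@(\d+)")

def parse_gr3d(line):
    """Return (load_percent, freq) from one tegrastats line, or None if absent."""
    m = GR3D_RE.search(line)
    return (int(m.group(1)), int(m.group(2))) if m else None

def mean_gr3d_load(lines):
    """Average the GR3D load over all lines that contain the field."""
    loads = [p[0] for p in map(parse_gr3d, lines) if p is not None]
    return sum(loads) / len(loads) if loads else None

# The five simpleGL lines quoted above.
lines = [
    "RAM 698/1892MB (lfb 8x4MB) cpu [91%,off,off,50%]@2065 EMC 15%@924 AVP 0%@204 VDE 120 GR3D 61%@180 EDP limit 0",
    "RAM 707/1892MB (lfb 8x4MB) cpu [98%,off,off,30%]@2065 EMC 19%@924 AVP 0%@204 VDE 120 GR3D 63%@180 EDP limit 0",
    "RAM 722/1892MB (lfb 8x4MB) cpu [99%,off,off,30%]@2065 EMC 19%@924 AVP 0%@204 VDE 120 GR3D 42%@252 EDP limit 0",
    "RAM 696/1892MB (lfb 8x4MB) cpu [25%,off,off,100%]@2065 EMC 20%@924 AVP 0%@204 VDE 120 GR3D 46%@252 EDP limit 0",
    "RAM 699/1892MB (lfb 8x4MB) cpu [25%,off,off,100%]@2065 EMC 21%@924 AVP 0%@204 VDE 120 GR3D 50%@252 EDP limit 0",
]

print(mean_gr3d_load(lines))  # 52.4
```

A result stuck at 0.0 across a long CUDA run would reproduce the problem reported at the top of the thread.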

Honey_Patouceul, here is the output per your request:

root@tegra2aa:~# head -1 /etc/nv_tegra_release

R21 (release), REVISION: 3.0, GCID: 5091063, BOARD: ardbeg, EABI: hard, DATE: Tue Feb 3 02:03:23 UTC 2015

root@tegra2aa:~# sha1sum -c /etc/nv_tegra_release
/usr/lib/xorg/modules/extensions/libglx.so: OK
/usr/lib/xorg/modules/drivers/nvidia_drv.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvmm.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvmm_utils.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvtnr.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libtegrav4l2.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libjpeg.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvparser.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvmmlite_audio.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvddk_2d_v2.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvomxilclient.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvsm.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvomx.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvmm_camera_v3.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvmmlite_utils.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvdc.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvmmlite_video.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvrm_graphics.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvapputil.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvtestio.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvmm_parser.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libglx.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvrm.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvtvmr.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvodm_query.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvos.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvmmlite.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvwinsys.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvmmlite_image.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvodm_imager.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvddk_vic.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvtestresults.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvavp.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvmm_contentpipe.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvfusebypass.so: OK

Your install looks clean, I cannot say much more.
However, many things have been improved between R21.3 and the latest release, so as advised by @carolyuu, you may reflash your TK1 with L4T R21.5 using the latest JetPack 3.0.
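If you have several boards to check, a small script can tell which ones are still on an older release. This is just a sketch assuming the header format shown in the output above, where "R21 (release), REVISION: 3.0" corresponds to R21.3; the function name is made up for illustration.

```python
import re

# Matches e.g. "R21 (release), REVISION: 3.0" in the first line of
# /etc/nv_tegra_release.
RELEASE_RE = re.compile(r"R(\d+) \(release\), REVISION: (\d+)\.(\d+)")

def parse_l4t_release(header):
    """Return (major, minor), e.g. (21, 3) for R21.3, or None if not matched.

    The digit after the dot in REVISION is ignored in this sketch.
    """
    m = RELEASE_RE.search(header)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))

# Header line as posted earlier in the thread.
header = ("R21 (release), REVISION: 3.0, GCID: 5091063, BOARD: ardbeg, "
          "EABI: hard, DATE: Tue Feb 3 02:03:23 UTC 2015")

release = parse_l4t_release(header)
print(release, release < (21, 5))  # (21, 3) True
```

Tuples compare element by element, so (21, 3) < (21, 5) is enough to flag a board that predates R21.5.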

Hi. After installing JetPack 3.0 I am able to read the GPU stats. Note, however, that on my TK1 the full JP 3.0 install actually failed because the GameWorks and VisionWorks installations failed. Now I am back to another issue I posted about: I am unable to get accurate MNIST results with Keras 1.0.4 on the TK1, although this works with TF 0.8 on the TK1. Thanks to all.