Hi, I am running various CUDA samples (e.g. nbody and BlackScholes) on my TK1 to try to understand the performance statistics reported by tegrastats (and also by cat /sys/devices/platform/host1x/gk20a.0/load). I have edited the BlackScholes sample to run for many minutes on the GPU (more than 10 min) to rule out any kind of statistical sampling issue. Note that I have manually overclocked the GPU and turned on all 4 CPU cores. For the entire time the simulation is making CUDA calls, I get the following from tegrastats:
RAM 542/1746MB (lfb 170x4MB) cpu [0%,100%,0%,0%]@2320 EMC 71%@924 AVP 0%@40 VDE 120 GR3D 0%@804 EDP limit 2320
Wouldn't this indicate that the GPU is never used? Note that at the end of the simulation the BlackScholes executable prints the CPU and GPU times, so I assume it is calling CUDA routines.
Why is the GPU load always zero in this case? I have also seen this when I run nbody, even though nbody gets performance similar to what is reported here for TK1 GPU usage: NVIDIA Jetson TK1 CUDA performance … Do I need to turn on the counters or set some environment variable? I checked many other posts that report this issue and found no clear answer. Thanks.
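For anyone who wants to log the GPU figure over time instead of watching the console, the GR3D field can be pulled out of a tegrastats line with a few lines of script. This is only a minimal sketch in Python, assuming the field always looks like the one in the line quoted above (`GR3D <load>%@<clock>`); the function name `parse_gr3d` is made up for illustration, and other L4T releases may print the field differently.

```python
import re

# Matches the GR3D field as it appears in the tegrastats line quoted above,
# e.g. "GR3D 0%@804". Assumption: other L4T releases may format it differently.
GR3D_PATTERN = re.compile(r"GR3D (\d+)%@(\d+)")

def parse_gr3d(line):
    """Return (load_percent, clock) from a tegrastats line, or None if absent."""
    match = GR3D_PATTERN.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

# The tegrastats line from the post, split only to keep the source readable.
sample = ("RAM 542/1746MB (lfb 170x4MB) cpu [0%,100%,0%,0%]@2320 "
          "EMC 71%@924 AVP 0%@40 VDE 120 GR3D 0%@804 EDP limit 2320")
print(parse_gr3d(sample))
```

Feeding it each line of tegrastats output while a CUDA job runs would show whether the load ever leaves zero.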
Do you have any special configuration, such as a custom board, or a system that boots to the command line without the X server running?
Are there any errors related to the GPU (gk20a) in dmesg?
Hi, no special HW setup, other than that I had to manually overclock the GPU and turn all 4 CPUs on. In fact, I have three different boards and all of them show the same behavior. dmesg shows the following as the only failure/error from the GPU (the same error appears on all boards):
Here is the output from BlackScholes:
[./BlackScholes] - Starting…
GPU Device 0: "GK20A" with compute capability 3.2
Initializing data…
…allocating CPU memory for options.
…allocating GPU memory for options.
…generating input data in CPU mem.
…copying input data to GPU mem.
Data init done.
Executing Black-Scholes GPU kernel (153600 iterations)…
Options count : 8000000
BlackScholesGPU() time : 2257.502686 msec
Effective memory bandwidth: 0.035437 GB/s
Gigaoptions per second : 0.003544
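As a sanity check on the numbers above, the two derived figures can be reproduced from the printed time and option count. The sketch below is in Python and assumes the stock sample's accounting, namely that it counts five float arrays of half the printed option count as memory traffic; that assumption comes from my reading of the sample and is not stated in this thread. Since the sample was edited to run longer, the time is simply whatever the modified program printed.

```python
# Values copied from the BlackScholes output above.
options_count = 8000000        # "Options count"
gpu_time_ms = 2257.502686      # "BlackScholesGPU() time"

# Assumption: the sample counts 5 float arrays of options_count / 2
# elements each as the memory traffic for one kernel run.
BYTES_PER_FLOAT = 4
bytes_moved = 5 * (options_count // 2) * BYTES_PER_FLOAT

seconds = gpu_time_ms * 1e-3
bandwidth_gbs = bytes_moved * 1e-9 / seconds
gigaoptions_per_s = options_count * 1e-9 / seconds

print(f"Effective memory bandwidth: {bandwidth_gbs:.6f} GB/s")
print(f"Gigaoptions per second    : {gigaoptions_per_s:.6f}")
```

Under that assumption the script reproduces the 0.035437 GB/s and 0.003544 figures, so both are derived from the printed time rather than measured independently.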
Honey_Patouceul, here is the output per your request:
root@tegra2aa:~# head -1 /etc/nv_tegra_release
R21 (release), REVISION: 3.0, GCID: 5091063, BOARD: ardbeg, EABI: hard, DATE: Tue Feb 3 02:03:23 UTC 2015
root@tegra2aa:~# sha1sum -c /etc/nv_tegra_release
/usr/lib/xorg/modules/extensions/libglx.so: OK
/usr/lib/xorg/modules/drivers/nvidia_drv.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvmm.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvmm_utils.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvtnr.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libtegrav4l2.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libjpeg.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvparser.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvmmlite_audio.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvddk_2d_v2.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvomxilclient.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvsm.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvomx.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvmm_camera_v3.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvmmlite_utils.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvdc.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvmmlite_video.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvrm_graphics.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvapputil.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvtestio.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvmm_parser.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libglx.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvrm.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvtvmr.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvodm_query.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvos.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvmmlite.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvwinsys.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvmmlite_image.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvodm_imager.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvddk_vic.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvtestresults.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvavp.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvmm_contentpipe.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvfusebypass.so: OK
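With this many lines it is easy to miss a single failure, so a small filter can list only the entries that are not OK. This is a rough Python sketch, assuming the `<path>: <status>` layout shown above; `failed_checks` is a hypothetical helper name, and it is meant for the standard output of sha1sum -c only, not for the warning summary that goes to stderr.

```python
def failed_checks(sha1sum_output):
    """Return the paths whose status is anything other than OK."""
    failures = []
    for line in sha1sum_output.splitlines():
        # Each result line is "<path>: <status>"; split on the last ": ".
        path, sep, status = line.rpartition(": ")
        if sep and status.strip() != "OK":
            failures.append(path)
    return failures

# Two lines taken from the output above, used here as sample input.
sample = """/usr/lib/xorg/modules/extensions/libglx.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvrm.so: OK"""
print(failed_checks(sample))
```

An empty list means every listed file matched its recorded checksum.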
Your install looks clean; I cannot say much more.
However, many things have been improved between R21.3 and the latest release, so, as advised by @carolyuu, you may want to reflash your TK1 with L4T R21.5 using the latest JetPack 3.0.
Hi. After installing JetPack 3.0 I am able to read the GPU stats. Note, however, that on my TK1 the full JetPack 3.0 install did not complete, because the GameWorks and VisionWorks installations failed. Now I am back to another issue I posted about: I am unable to get accurate MNIST results with Keras 1.0.4 on the TK1, although it works with TF 0.8 on the TK1. Thanks to all.