Incorrect CUDA deviceQuery Results?

Hi, I was trying to inspect the CUDA architecture of the Jetson Nano by running the deviceQuery sample that ships with CUDA 10.0, and got a strange result for the memory clock rate:

/usr/local/cuda/samples/1_Utilities/deviceQuery$ ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA Tegra X1"
  CUDA Driver Version / Runtime Version          10.0 / 10.0
  CUDA Capability Major/Minor version number:    5.3
  Total amount of global memory:                 3965 MBytes (4157145088 bytes)
  ( 1) Multiprocessors, (128) CUDA Cores/MP:     128 CUDA Cores
  GPU Max Clock rate:                            922 MHz (0.92 GHz)
  Memory Clock rate:                             13 Mhz
  Memory Bus Width:                              64-bit
  L2 Cache Size:                                 262144 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            Yes
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            No
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.0, CUDA Runtime Version = 10.0, NumDevs = 1
Result = PASS

The LPDDR4 memory is not actually clocked at 13 MHz, is it? That is two orders of magnitude lower than the expected frequency of around 1600 MHz. Is this the same issue as on the Jetson TX2, mentioned here: https://devtalk.nvidia.com/default/topic/1024058/jetson-tx2/cuda-devicequery-and-bandwidth-test-on-tx2-are-weird-/? If the reported value were actually correct, I would be surprised if the GPU and CPU were not starved of data under any appreciable workload.
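To put a number on the "starved of data" concern, here is a rough sketch of the usual peak-bandwidth estimate (clock rate times transfers per clock times bus width in bytes). The 64-bit bus width comes from the deviceQuery output above and the 1600 MHz figure is the expected clock mentioned in this post; the factor of 2 for double data rate is an assumption, and `peak_bandwidth_gb_s` is just a name made up for this example.

```python
def peak_bandwidth_gb_s(memory_clock_mhz, bus_width_bits, transfers_per_clock=2):
    """Estimate theoretical peak memory bandwidth in GB/s.

    transfers_per_clock=2 assumes double-data-rate memory (an assumption here).
    """
    bytes_per_transfer = bus_width_bits / 8
    bytes_per_second = memory_clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer
    return bytes_per_second / 1e9


# Value reported by deviceQuery (13 MHz) vs. the expected clock (~1600 MHz),
# both with the 64-bit bus width from the output above.
reported = peak_bandwidth_gb_s(13, 64)
expected = peak_bandwidth_gb_s(1600, 64)

print(f"reported clock: {reported:.3f} GB/s")
print(f"expected clock: {expected:.1f} GB/s")
```

Under these assumptions the 13 MHz reading would imply roughly 0.2 GB/s, against about 25.6 GB/s at 1600 MHz, which is why the reported value looks implausible.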

Hi kss223, yes, you are right: that is incorrect output from deviceQuery. You can monitor the true memory frequency using the tegrastats utility, which reports it as EMC. Launch it with sudo to get the EMC frequency.

Hi dusty, thanks for the prompt reply! However, after running tegrastats, I think it is incorrect as well:

tegrastats
RAM 2691/3965MB (lfb 107x4MB) CPU [3%@1428,2%@1428,2%@1428,2%@1428] EMC_FREQ 0% GR3D_FREQ 0% PLL@21.5C CPU@23C iwlwifi@28C PMIC@100C GPU@22.5C AO@30.5C thermal@22.75C POM_5V_IN 1831/1831 POM_5V_GPU 122/122 POM_5V_CPU 203/203

It doesn’t show a percentage followed by a frequency, as shown here for a TX2 board:
https://devtalk.nvidia.com/default/topic/1027315/jetson-tx2/command-to-check-if-gpus-are-enabled-on-nvidia-jetson-tx2/post/5225270/
But running the excellent jetson_stats by rbonghi (https://github.com/rbonghi/jetson_stats) at the same time shows the following: https://imgur.com/a/dmZ0Wkz
So there is indeed a load on the EMC (the memory controller, which the GPU shares with the CPU). Not a huge issue, but I figured other people will run into this sooner or later.

Also, for people who can’t figure out how to run jetson_stats: this seems to have changed in this release. It is no longer an executable in some folder or in the home directory; it is installed as a command.

Hi,

Please run tegrastats with root privileges:

sudo tegrastats

Thanks.

That fixed it, did not know that was required, thanks.
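For anyone who wants to log the EMC value from a script, here is a minimal sketch that pulls the `EMC_FREQ` field out of a tegrastats line. It assumes the root output uses the same `load%@MHz` form as the CPU fields shown earlier in this thread; the second sample line is illustrative, not captured from a real board, and `parse_emc` is a hypothetical helper name.

```python
import re

# Matches "EMC_FREQ 0%" (no root) and "EMC_FREQ 0%@1600" (assumed root format).
EMC_PATTERN = re.compile(r"EMC_FREQ (\d+)%(?:@(\d+))?")


def parse_emc(line):
    """Return (load_percent, freq_mhz or None), or None if the field is absent."""
    match = EMC_PATTERN.search(line)
    if match is None:
        return None
    load = int(match.group(1))
    freq = int(match.group(2)) if match.group(2) else None
    return load, freq


# Shortened line from the non-root run posted above.
without_root = "RAM 2691/3965MB (lfb 107x4MB) EMC_FREQ 0% GR3D_FREQ 0% PLL@21.5C"
# Illustrative line in the assumed root format (not real captured output).
with_root = "RAM 2691/3965MB (lfb 107x4MB) EMC_FREQ 0%@1600 GR3D_FREQ 0%@76 PLL@21.5C"

print(parse_emc(without_root))  # frequency is None when run without root
print(parse_emc(with_root))
```

A `None` frequency is then a quick hint that tegrastats was started without sudo.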

Will there be a fix for deviceQuery? This still seems to be an issue in JetPack 4.4-b144: the memory clock rate is still reported as 13 MHz.