Is there a max performance tuning script for TK1?

iamsyt · October 22, 2016, 5:03am

Hi guys,

I have a quick question:
Is there a max performance tuning script for the TK1, like the one for the TX1 linked below?

Cuda 7.0 Jetson TX1 performance and benchmarks
https://devtalk.nvidia.com/default/topic/901337/post/4747186/#4747186

I ask because my TK1 board does not seem to be at max performance yet, even after setting the GPU frequency to 852000 kHz. This is the output of the simpleMultiCopy sample:

sudo ./simpleMultiCopy
[simpleMultiCopy] - Starting...
modprobe: FATAL: Module nvidia not found.
> Using CUDA device [0]: GK20A
[GK20A] has 1 MP(s) x 192 (Cores/MP) = 192 (Cores)
> Device name: GK20A
> CUDA Capability 3.2 hardware with 1 multi-processors
> scale_factor = 1.00
> array_size   = 4194304


Relevant properties of this CUDA device
(X) Can overlap one CPU<>GPU data transfer with GPU kernel execution (device property "deviceOverlap")
( ) Can overlap two CPU<>GPU data transfers with GPU kernel execution
    (Compute Capability >= 2.0 AND (Tesla product OR Quadro 4000/5000/6000/K5000)

Measured timings (throughput):
 Memcpy host to device  : 2.529416 ms (6.632842 GB/s)
 Memcpy device to host  : 2.591583 ms (6.473733 GB/s)
 Kernel                 : 4.475915 ms (37.483322 GB/s)

Theoretical limits for speedup gained from overlapped data transfers:
No overlap at all (transfer-kernel-transfer): 9.596914 ms 
Compute can overlap with one transfer: 5.120999 ms
Compute can overlap with both data transfers: 4.475915 ms

Average measured timings over 10 repetitions:
 Avg. time when execution fully serialized      : 9.785031 ms
 Avg. time when overlapped using 4 streams      : 8.434423 ms
 Avg. speedup gained (serialized - overlapped)  : 1.350608 ms

Measured throughput:
 Fully serialized execution             : 3.429159 GB/s
 Overlapped using 4 streams             : 3.978272 GB/s

thanks in advance
-zhi

linuxdev · October 22, 2016, 1:37pm

Yes. See http://elinux.org/Jetson/Performance
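
When comparing runs before and after applying the settings from that page, it may help to cross-check how the simpleMultiCopy figures in the question relate to each other. The sketch below is only a rough sanity check, not part of the CUDA sample: it assumes 4-byte array elements, and the factor of 2 on the serialized/overlapped throughput is inferred from the posted numbers (one copy in plus one copy out), not taken from the sample's source.

```python
# Recompute the simpleMultiCopy figures quoted in the question.
# Assumption: each array element is 4 bytes (32-bit int).
ARRAY_SIZE = 4194304               # "array_size" from the posted output
BYTES_PER_ELEMENT = 4              # assumed, not stated in the output
memsize = ARRAY_SIZE * BYTES_PER_ELEMENT

# Measured timings from the posted output, in milliseconds.
h2d_ms = 2.529416
d2h_ms = 2.591583
kernel_ms = 4.475915
serialized_ms = 9.785031
overlapped_ms = 8.434423


def gb_per_s(num_bytes, ms):
    """Throughput in GB/s (1 GB = 1e9 bytes) for num_bytes moved in ms."""
    return num_bytes / (ms * 1e-3) / 1e9


h2d_gbs = gb_per_s(memsize, h2d_ms)
d2h_gbs = gb_per_s(memsize, d2h_ms)

# Theoretical limits, matching the three lines printed by the sample.
no_overlap_ms = h2d_ms + kernel_ms + d2h_ms
one_overlap_ms = max(h2d_ms + d2h_ms, kernel_ms)
both_overlap_ms = max(h2d_ms, d2h_ms, kernel_ms)

# Assumption inferred from the numbers: end-to-end throughput counts
# one transfer in and one transfer out, i.e. 2 * memsize bytes.
serialized_gbs = gb_per_s(2 * memsize, serialized_ms)
overlapped_gbs = gb_per_s(2 * memsize, overlapped_ms)

if __name__ == "__main__":
    print(f"H2D: {h2d_gbs:.6f} GB/s, D2H: {d2h_gbs:.6f} GB/s")
    print(f"Limits (ms): {no_overlap_ms:.6f}, {one_overlap_ms:.6f}, {both_overlap_ms:.6f}")
    print(f"Serialized: {serialized_gbs:.6f} GB/s, overlapped: {overlapped_gbs:.6f} GB/s")
```

With the posted timings, the measured overlapped time (8.43 ms) sits much closer to the no-overlap bound than to the one-transfer-overlap bound, so those two lines are probably the ones worth watching when re-running the sample after changing clock settings.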
