TK1 vs GeForce 680

GBP · May 2, 2014, 5:50pm

I just got the Tegra K1 dev kit. I'm a researcher and do a lot of DSP acceleration on GPUs. I have a GeForce 680 that I have been using for years, and compared to the 680, the TK1's runtime is a lot slower, which I find surprising. I'd like to know if any of you are seeing a similar performance drop. For example, I run cuFFT (about a 4000-point FFT): on the GeForce 680 it takes 0.2 ms, and on the TK1 it takes around 10 ms. I'm using identical code on both machines; the only difference is that I compiled for the sm_32 architecture on the TK1, as I should. Has anyone noticed a performance drop like this? In general, my kernels run at least 10 times slower than on the 680. Both are built on Kepler, and yes, the TK1 only has 1 SMX. I was hoping that, because it's an embedded device, it might run comparably to the 680, if not slightly faster. I'm also not running a large number of threads here (around 4000), and there is no register spilling, etc.

Let me know what you are getting. I'm under a time crunch due to a paper deadline and would really like to include TK1 results, but if it runs this slowly I'm not so sure…

GBP · May 2, 2014, 6:02pm

Also, the runtime seems to be inconsistent. For example, on the 680 it runs at 2 ms ± 0.1 ms, but on the TK1 it fluctuates a lot, between roughly 10 ms and 50 ms.
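A minimal host-side sketch of a common way to make fluctuating timings like these comparable across boards: a few untimed warm-up calls, then many timed runs, reporting min/median/max rather than a single sample. It assumes you can wrap one FFT or kernel call in a Python callable (the `fn` argument is hypothetical), and that the callable blocks until the GPU work has finished; otherwise a host timer only measures the launch.

```python
import statistics
import time


def time_repeated(fn, warmup=5, runs=50):
    """Call fn() `warmup` times untimed, then time `runs` calls; return ms per call.

    Assumption: fn() is a hypothetical wrapper that only returns once the GPU
    work it launched has completed (i.e. it synchronizes internally).
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1e3)
    return samples


def summarize(samples_ms):
    """Robust summary: the median is far less sensitive to outliers than the mean."""
    ordered = sorted(samples_ms)
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "max": ordered[-1],
        "stdev": statistics.stdev(ordered) if len(ordered) > 1 else 0.0,
    }
```

For example, `summarize([10.0, 12.0, 50.0, 11.0, 10.0])` reports a median of 11.0 ms even though one run took 50 ms, so a single slow outlier doesn't dominate the number you put in a paper.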

allanmac · May 2, 2014, 6:37pm

Doesn’t 10x sound about right?

GFLOPS: 300 vs. 3000
GB/sec: 14 vs. 192
cores: 192 vs. 1536
registers: 2^15 vs. 2^19 = 16x
shared mem (KB): 48 vs. 384
Watts: <10 vs. <225

Pretty sure that the K1’s sm_32 SMX has 32K registers and not the standard Kepler/Maxwell 64K registers.

Since it seems you're interested in seeing how your work scales across GPUs, you might also want to get a $45 GT630/635 with a GK208 chip. It has even less device memory bandwidth than the TK1 (!) at 14.4 GB/sec, but it has 2 full sm_35 SMXs for 384 cores. It's really cool to compare CUDA code on the 680 (which remains an utter beast of a card), the GK208, and a 750 Ti Maxwell. Can't wait to add the TK1 to that list.
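A quick way to check the "about 10x" estimate is to compute each ratio from the peak figures in the list above. This is only arithmetic on those quoted spec numbers (Watts left out, since both are upper bounds); the name `SPECS` is just for illustration.

```python
# Peak figures as quoted in the list above, as (TK1, GTX 680).
# These are approximate spec-sheet numbers, not measurements.
SPECS = {
    "GFLOPS": (300, 3000),
    "GB/sec": (14, 192),
    "cores": (192, 1536),
    "registers": (2**15, 2**19),
    "shared mem (KB)": (48, 384),
}


def ratios(specs):
    """Return how many times larger the GTX 680 figure is than the TK1 figure."""
    return {name: gtx680 / tk1 for name, (tk1, gtx680) in specs.items()}


for name, r in ratios(SPECS).items():
    print(f"{name:>16}: {r:5.1f}x")
```

This prints 10.0x for GFLOPS, 13.7x for bandwidth, 8.0x for cores and shared memory, and 16.0x for registers, so the peak compute and bandwidth gaps alone are in the 8-14x range.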

jonnycowboy · May 2, 2014, 6:44pm

Just comparing power consumption (200 W for the 680 vs. <10 W, estimated, for the TK1), that's a 20x power reduction. What you gain with the TK1 is portability, efficiency, and cost, not performance…
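Putting these power figures next to the peak GFLOPS quoted earlier in the thread gives a rough efficiency comparison. This is back-of-envelope arithmetic on spec-sheet peaks and on the estimated TK1 power, not a measurement:

```python
# Peak GFLOPS from the spec list earlier in the thread, and the power
# figures from this post (the TK1 number is an upper-bound estimate).
gtx680_gflops, gtx680_watts = 3000, 200
tk1_gflops, tk1_watts = 300, 10

gtx680_eff = gtx680_gflops / gtx680_watts  # peak GFLOPS per watt, GTX 680
tk1_eff = tk1_gflops / tk1_watts           # peak GFLOPS per watt, TK1

print(f"GTX 680: {gtx680_eff:.0f} GFLOPS/W, TK1: {tk1_eff:.0f} GFLOPS/W, "
      f"ratio {tk1_eff / gtx680_eff:.1f}x")
```

That works out to about 15 vs. 30 GFLOPS/W: roughly 10x less peak throughput for roughly 2x better peak efficiency on these numbers.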

I understand your disappointment, though. You have to ask yourself: is the application you're looking to use this for power-, noise-, or size-constrained? If not, then stick with the 680.

Jon

GBP · May 3, 2014, 4:02am

Yes, great info. I did more background reading on the TK1, and yeah, it's a different animal. I'll keep playing around with it and see what happens. CUDA 6.0 has a lot of new features, so I have to learn that stuff as well; it's a bit of a step up from 5.5. I have been using older 260, 460, and 680 cards and CUDA over the years. It runs well out of the box, but make sure to connect to Ethernet when installing the toolkit. It runs pretty slowly overall, too.

What I noticed now is that when I include memory transfers in my runtime measurement (with CUDA timers, that is), the runtime doesn't seem to change compared to measuring without the transfers. I assume that's because there's no PCI Express and it's all on chip, so the transfer latency may be negligible. That's encouraging news.
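Given the 10-50 ms run-to-run swings mentioned earlier, one way to check that the copy cost is really negligible (rather than hidden by jitter) is to compare the median difference against the spread of the kernel-only runs. A minimal sketch, assuming you already have two lists of timings in ms (both inputs are hypothetical, e.g. collected with CUDA event timers); using the interquartile range as the noise estimate is my own choice, not something from the thread:

```python
import statistics


def transfer_overhead(kernel_only_ms, with_copies_ms):
    """Return (median difference, noise estimate, whether difference exceeds noise).

    Inputs are hypothetical lists of per-run timings in milliseconds.
    Needs at least two samples per list (statistics.quantiles requirement).
    """
    k = statistics.median(kernel_only_ms)
    e = statistics.median(with_copies_ms)
    diff = e - k
    # Interquartile range of the kernel-only runs as a rough noise floor.
    q1, _, q3 = statistics.quantiles(kernel_only_ms, n=4)
    noise = q3 - q1
    return diff, noise, abs(diff) > noise
```

For example, `transfer_overhead([10, 10, 11, 12, 12], [11, 11, 12, 13, 13])` gives a 1 ms median difference against a 2 ms spread, so a difference that size could not be distinguished from noise.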

I'd like to hear more feedback from others regarding desktop GPU vs. mobile GPU performance: speedup/slowdown, runtime measurements, etc. I'll share more as I go along as well. Keep 'em coming. Thanks.
