Porting from TK1 to TX1

Milliarde · January 11, 2016, 9:27pm

I did a benchmarking project a year and a half ago using the Jetson TK1 and am hoping to perform the same task again on the TX1.

I know that the TX1 is an upgrade in several regards (more CUDA cores, texture units, etc.) but I don’t believe I should have to make any changes to my code.

I have some globals defining max_threads_per_block as well as total_shared_memory_per_block, but these values appear to be the same (1024 and 49152) for the TK1 and TX1.

Am I missing anything, or should porting code between the devices be straightforward assuming no use of peripherals or external devices?

kulve · January 12, 2016, 10:47am

Overall the APIs are much the same. There are some changes, such as a newer CUDA version, but porting from TK1 to TX1 should be straightforward. I’m not familiar with CUDA, so I can’t say how to fine-tune it for the TX1.

Jimmy_Pettersson · January 16, 2016, 1:09pm

Update the compute capability (SM) flags you pass to the compiler and you should be fine.
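For reference, a minimal sketch of what those flags look like. The TK1's Kepler GPU has compute capability 3.2 and the TX1's Maxwell GPU has 5.3, so the matching nvcc targets are sm_32 and sm_53. The helper names below (`gencode_flag`, `tk1_flag`, `tx1_flag`) are made up for illustration and are not part of any toolkit; only the `-gencode arch=...,code=...` option syntax belongs to nvcc.

```cpp
#include <cassert>
#include <string>

// Build the nvcc -gencode option for one compute capability,
// e.g. (5, 3) -> "-gencode arch=compute_53,code=sm_53".
// Hypothetical helper for a build script, not an nvcc or CUDA API.
inline std::string gencode_flag(int major, int minor) {
    assert(major > 0 && minor >= 0 && minor <= 9);
    const std::string cc = std::to_string(major) + std::to_string(minor);
    return "-gencode arch=compute_" + cc + ",code=sm_" + cc;
}

// Compute capabilities: TK1 (Kepler) is 3.2, TX1 (Maxwell) is 5.3.
inline std::string tk1_flag() { return gencode_flag(3, 2); }
inline std::string tx1_flag() { return gencode_flag(5, 3); }
```

Passing both options to nvcc in one invocation should produce a binary that carries code for both boards, provided the installed CUDA version still supports both targets.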

As for Kepler vs. Maxwell architecture differences, that’s not really a Jetson TX1-specific question.

You should generally see a fairly large performance increase, but certain kernels, for example ones with a lot of local memory spilling, might not scale as well.

phill-EuL · January 20, 2016, 8:45am

We do image processing on both platforms. We noticed some interesting aspects when migrating from the TK1 to the TX1. It’s probably a Kepler vs. Maxwell thing, but due to unified memory there might be TK1/TX1-specific effects too.

The same CUDA kernels run on both machines without problems. To benefit from the TX1 improvements, make sure to compile for the right compute capability and launch enough blocks and threads to use all CUDA cores.
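To make the "enough blocks and threads" point concrete, here is a rough host-side sketch, not taken from our code. It assumes one thread per work item; the helper names and the kernel name `process` are hypothetical. The core counts are the published figures for the two GPUs (192 on the TK1, 256 on the TX1).

```cpp
#include <cassert>
#include <cstddef>

// Smallest number of blocks whose threads cover n work items (ceiling division).
inline std::size_t blocks_for(std::size_t n, std::size_t threads_per_block) {
    assert(threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}

// True when the launch provides at least one thread per CUDA core.
// This is only a lower bound: hiding memory latency usually takes
// many more resident threads than there are cores.
inline bool fills_cores(std::size_t blocks, std::size_t threads_per_block,
                        std::size_t cuda_cores) {
    return blocks * threads_per_block >= cuda_cores;
}

constexpr std::size_t kTk1Cores = 192;  // Kepler GK20A
constexpr std::size_t kTx1Cores = 256;  // Maxwell GM20B

// In a .cu file the result would feed the launch, for example:
//   process<<<blocks_for(n, 256), 256>>>(...);  // `process` is hypothetical
```

A launch of a single 192-thread block passes `fills_cores` for the TK1 but not for the TX1, which is one way a configuration tuned on the older board can leave the newer one partly idle.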

The interesting effect is that some kernels are slower on the TX1 when compiled with debug information and without optimization. With maximum optimization and all debug options switched off, though, the TX1 can deliver almost double the kernel performance, i.e. half the processing time.

Another interesting effect: while optimizing one CUDA kernel, we found that a change that improved performance on the TX1 led to longer computing times on the TK1.

Make sure to optimize for each platform separately and always profile in release mode (maximum compiler optimization, no debug settings). If you profile a debug build, you might miss optimization potential, or an apparent improvement might even make the release build slower.
