Porting from TK1 to TX1

I did a benchmarking project a year and a half ago using the Jetson TK1 and am hoping to perform the same task again on the TX1.

I know that the TX1 is an upgrade in several regards (more CUDA cores, texture units, etc.) but I don’t believe I should have to make any changes to my code.

I have some globals defining max_threads_per_block as well as total_shared_memory_per_block, but these values appear to be the same (1024 and 49152) for the TK1 and TX1.
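Rather than keeping these limits as hard-coded globals, one option is to read them from the CUDA runtime at startup, so the same source works on either board. This is only a minimal sketch, assuming the Jetson GPU is device 0; `DeviceLimits` and `query_device_limits` are hypothetical names, not taken from the original project.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical holder for the limits the post currently keeps in globals.
struct DeviceLimits {
    int max_threads_per_block;
    size_t shared_mem_per_block;
    int sm_count;
    int cc_major;
    int cc_minor;
};

// Fills `out` from the CUDA runtime. Returns false if the query fails
// (for example, no CUDA device is present).
bool query_device_limits(int device, DeviceLimits* out) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        return false;
    }
    out->max_threads_per_block = prop.maxThreadsPerBlock;
    out->shared_mem_per_block = prop.sharedMemPerBlock;
    out->sm_count = prop.multiProcessorCount;
    out->cc_major = prop.major;  // compute capability, e.g. 3.2 or 5.3
    out->cc_minor = prop.minor;
    return true;
}
```

Calling `query_device_limits(0, &limits)` once at startup and sizing launches from `limits` also exposes the values that do differ between the boards, such as the SM count and the compute capability.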

Am I missing anything, or should porting code between the devices be straightforward assuming no use of peripherals or external devices?

Overall, the APIs are much the same. There are some changes, such as a newer CUDA version, but porting from the TK1 to the TX1 should be straightforward. I’m not familiar with CUDA, so I can’t say how to fine-tune it for the TX1, though.

Update the compute capability (SM) flags you pass to the compiler (sm_32 for the TK1, sm_53 for the TX1) and you should be fine.
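To check that the flags took effect, a small kernel can report the architecture it was actually compiled for. This is a rough sketch: the nvcc command lines in the comment are assumed typical usage and may need adjusting for your toolkit version, and `compiled_arch` is a hypothetical helper name.

```cuda
// Assumed per-board builds (adjust to your toolkit and file names):
//   TK1: nvcc -arch=sm_32 -o arch_check arch_check.cu
//   TX1: nvcc -arch=sm_53 -o arch_check arch_check.cu
#include <cuda_runtime.h>

// Writes the architecture this device code was compiled for,
// e.g. 320 for compute capability 3.2 or 530 for 5.3.
__global__ void compiled_arch_kernel(int* out) {
#ifdef __CUDA_ARCH__
    *out = __CUDA_ARCH__;
#endif
}

// Returns the __CUDA_ARCH__ value seen by the running kernel,
// or -1 if the kernel could not be launched on this device.
int compiled_arch() {
    int* d_out = 0;
    int h_out = -1;
    if (cudaMalloc((void**)&d_out, sizeof(int)) != cudaSuccess) {
        return -1;
    }
    // Seed the device value with -1 so a failed launch is visible.
    cudaMemcpy(d_out, &h_out, sizeof(int), cudaMemcpyHostToDevice);
    compiled_arch_kernel<<<1, 1>>>(d_out);
    if (cudaGetLastError() == cudaSuccess &&
        cudaDeviceSynchronize() == cudaSuccess) {
        cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    }
    cudaFree(d_out);
    return h_out;
}
```

A result of -1 usually means the binary contains no code the device can run. A value lower than the device's compute capability suggests the kernel was JIT-compiled from PTX built for an older architecture.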

As for Kepler vs. Maxwell architecture differences, that’s not really a Jetson TX1-specific question.

You should generally see a fairly large performance increase, but certain kernels, for example those with a lot of local memory spilling, might not scale as well.

We do image processing on both platforms, and we noticed some interesting things when migrating from the TK1 to the TX1. It’s probably a Kepler vs. Maxwell thing, but because of unified memory there might be TK1/TX1-specific effects as well.

You can use the same CUDA kernels on both machines without problems. To benefit from the TX1 improvements, make sure to compile for the right compute capability and launch enough blocks and threads to keep all CUDA cores busy.
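One common way to launch "enough" work on both boards without hard-coding a grid size is a grid-stride loop with the grid scaled by the number of multiprocessors (one SMX on the TK1, two SMMs on the TX1). The following is a sketch under assumptions: `scale_kernel`, `pick_grid_size`, and `scale_on_device` are hypothetical names, and 256 threads per block with 8 blocks per SM are untuned starting values, not recommendations from this thread.

```cuda
#include <cuda_runtime.h>

// Grid-stride loop: each thread processes several elements, so the
// result is correct for any grid size.
__global__ void scale_kernel(float* data, int n, float factor) {
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x) {
        data[i] *= factor;
    }
}

// Host helper: blocks needed to cover n elements, at least 1 and
// at most max_blocks.
int pick_grid_size(int n, int threads_per_block, int max_blocks) {
    int needed = (n + threads_per_block - 1) / threads_per_block;
    if (needed < 1) needed = 1;
    return needed < max_blocks ? needed : max_blocks;
}

// Launches scale_kernel with a grid sized from the device's SM count.
cudaError_t scale_on_device(float* d_data, int n, float factor) {
    int device = 0;
    int sm_count = 0;
    cudaError_t err = cudaGetDevice(&device);
    if (err != cudaSuccess) return err;
    err = cudaDeviceGetAttribute(&sm_count, cudaDevAttrMultiProcessorCount,
                                 device);
    if (err != cudaSuccess) return err;

    const int threads = 256;      // assumed starting point, tune per board
    const int blocks_per_sm = 8;  // assumed starting point, tune per board
    int blocks = pick_grid_size(n, threads, sm_count * blocks_per_sm);
    scale_kernel<<<blocks, threads>>>(d_data, n, factor);
    return cudaGetLastError();
}
```

Because the cap depends on the SM count queried at run time, the same code launches a larger grid on the TX1 than on the TK1 for large inputs.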

The interesting effect is that some kernels are slower on the TX1 when compiled with debug information and without optimization. But with maximum optimization and all debug options switched off, the TX1 can deliver almost double the performance, i.e. half the processing time.

Another interesting effect: while optimizing a CUDA kernel, we found that a change that improved performance on the TX1 led to longer computing times on the TK1.

Make sure to optimize for each platform specifically, and always profile in release mode (maximum compiler optimization and no debug settings). If you think you found an improvement while working with a debug build, you might be missing optimization potential or even making the release build slower.
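For comparing builds and boards, CUDA events give a simple way to time a kernel. The sketch below is illustrative only: `saxpy_kernel` is a hypothetical stand-in for the kernel under test, `time_kernel_ms` is a hypothetical helper, and the build flags in the comment are assumed typical usage rather than the settings used in this thread.

```cuda
// Assumed builds to compare (adjust the -arch value per board):
//   release: nvcc -O3 -arch=sm_53 ...   (device code is optimized by default)
//   debug:   nvcc -g -G -arch=sm_53 ... (-G turns device optimization off)
#include <cuda_runtime.h>

// Hypothetical stand-in for the kernel being benchmarked.
__global__ void saxpy_kernel(const float* x, float* y, int n, float a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Returns the average time per launch in milliseconds over `iters`
// launches, or -1.0f if a CUDA error occurred.
float time_kernel_ms(const float* d_x, float* d_y, int n, int iters) {
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    cudaEvent_t start, stop;
    if (cudaEventCreate(&start) != cudaSuccess) return -1.0f;
    if (cudaEventCreate(&stop) != cudaSuccess) {
        cudaEventDestroy(start);
        return -1.0f;
    }

    // Warm-up launch so one-time startup costs are not measured.
    saxpy_kernel<<<blocks, threads>>>(d_x, d_y, n, 2.0f);

    cudaEventRecord(start, 0);
    for (int i = 0; i < iters; ++i) {
        saxpy_kernel<<<blocks, threads>>>(d_x, d_y, n, 2.0f);
    }
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);  // wait until all timed launches finish

    float ms = 0.0f;
    cudaError_t err = cudaEventElapsedTime(&ms, start, stop);
    if (err == cudaSuccess) err = cudaGetLastError();
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return (err == cudaSuccess && iters > 0) ? ms / iters : -1.0f;
}
```

Running the same measurement on a release build and a debug build, on each board, should show whether an apparent improvement survives with optimization enabled.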