I have recently started working with an ARM Cortex-A15 on the Jetson TK1 board, where I have to optimize code. With my algorithm I am getting the following results:

Running at 1 GHz: 5-6 fps. Running at 2 GHz: 10-11 fps.

But I want to optimize it to at least 18-19 fps at 1 GHz. I have read a lot about VFP and NEON, and I found that VFP is not a parallel architecture like NEON. I also read that mixing NEON and VFP instructions gives poor performance.

Also, most of the functionality in my algorithm is sequentially dependent, i.e. each step depends on the result of the previous ones.

I have not used VFP or NEON in my algorithm yet. I want to know how I could use VFP and NEON to optimize it.

Hi sanket, kanzarkar,

I'm not sure about the details of your algorithm, but please check the topic below in case it's helpful: https://devtalk.nvidia.com/default/topic/1021997/


Can you tell me the g++ options for the TK1 in order to properly use VFP or NEON?

I don’t know about VFP, but the options for NEON with the hard float calling convention are:

-march=armv7-a -mfpu=neon -mfloat-abi=hard

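If you are compiling natively on the TK1 you won’t need “-march=armv7-a” or “-mfloat-abi=hard”, since these are the defaults there. The “-mfpu=neon” option is the one which makes NEON available, but you still have to use NEON in your code, for example through the intrinsics in arm_neon.h or by letting the compiler auto-vectorize your loops.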
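As a rough illustration of what "using NEON in your code" can look like, here is a minimal sketch with NEON intrinsics. The function name add_arrays is hypothetical and has nothing to do with your algorithm; it only shows the pattern of handling four floats per iteration, with a plain scalar loop for the leftover elements and for builds where NEON is not enabled.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>  // NEON intrinsics, available when built with -mfpu=neon
#endif

// Hypothetical helper (not from the original post): out[i] = a[i] + b[i].
void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr && out != nullptr));
    std::size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    // Process four floats per iteration in 128-bit NEON registers.
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);      // load 4 floats from a
        float32x4_t vb = vld1q_f32(b + i);      // load 4 floats from b
        vst1q_f32(out + i, vaddq_f32(va, vb));  // add lanes and store
    }
#endif
    // Scalar tail; this is also the whole loop when NEON is not enabled.
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

This only pays off for work that is independent per element. If each step of the algorithm depends on the previous result, there may be little for NEON to run in parallel without restructuring the computation first.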

EDIT: Just noticed you are talking about hardware floating point when you used the VFP acronym (I sometimes suffer from “acronym psychosis”). Some older ARM hardware had no hardware floating point unit, so floating point was done in software (“soft float”). If you were to install a cross compiler on a PC you’d look for “hf” in the compiler name (for example arm-linux-gnueabihf-g++; Debian and Ubuntu call this architecture “armhf”). A compiler with that name uses hardware floating point instructions and the hard float calling convention, which determines whether floating point arguments and return values are passed in the registers of the floating point unit or in the way the software methods expect. You won’t need a separate command line argument for the compiler to use or recognize hardware floating point if the compiler itself is the right one. For naming purposes “armhf” is the “EABI” calling convention using “hardware floating point”. This is what the compilers on the TK1 are, and also the one installed on the host if you ran JetPack for the host itself. The Ubuntu cross-architecture 32-bit ARM compilers are also available as armhf (and if you install to the host via JetPack this is likely part of what you will get).
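If you want to confirm what a given compiler and set of flags actually gives you, one option is to look at the predefined macros. The sketch below assumes GCC-style ARM macros (__ARM_PCS_VFP for the hard float calling convention, __ARM_NEON or __ARM_NEON__ for NEON); the function name fp_build_summary is made up for illustration.

```cpp
#include <string>

// Hypothetical helper: reports how this translation unit was compiled,
// based on macros that ARM compilers such as GCC predefine.
std::string fp_build_summary() {
    std::string s = "hard-float ABI: ";
#if defined(__ARM_PCS_VFP)
    s += "yes";  // floating point arguments are passed in VFP registers
#else
    s += "no";   // soft/softfp calling convention, or not a 32-bit ARM build
#endif
    s += ", NEON: ";
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    s += "yes";  // NEON instructions and arm_neon.h intrinsics are enabled
#else
    s += "no";
#endif
    return s;
}
```

Printing the returned string from a small test program built with the same g++ options as your real code shows whether those options took effect.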

One more edit: you can explore the options with “man gcc” or “man g++”. While viewing a man page, the “/” key starts a regular expression search, so for example you can type “/float-abi”.