Libtensorflow_jni for Jetson Nano

Hello,
I would like to use TensorFlow for Java on my Jetson Nano board. In order to run my application on it, I need the libtensorflow_jni.so and libtensorflow_framework.so files optimized for the board.
I already compiled TensorFlow 1.13.1 with Bazel 0.24.1. The CPU-only version works but is slow. The GPU version compiled, but at runtime I get the following error: conv_2d_gpu.h:614 NON-OK-status: CudaLaunchKernel( SwapDimension1And2InTensor3UsingTiles…) status: Internal: unknown error.
Does someone know what this message means?
What would be best for me is to get those files pre-built and optimized for the Jetson Nano…

Thank you.

Hi,

We don’t have a pre-built JNI library, so you will need to build it from source.

Are you using TensorFlow v2.0? A similar issue was reported on v2.0 before.
If so, please switch back to TensorFlow v1.13.1, which has been verified with the Python interface.

Thanks.

Hi,
I am using TensorFlow 1.13.1, which I checked in Python and which is confirmed when logging the version from within Java.
The build was "successful", but when I run the prediction I get the unknown-error message quoted above.

Hi,

If it is a runtime bug, it may be related to the CUDA toolkit and driver.

Could you share your Bazel configuration with us?
Nano’s compute capability is 5.3. Did you set it correctly?

Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,5.2] 5.3
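For reference, a non-interactive way to answer that prompt is via environment variables before running the configure script (the variable names below are the ones TensorFlow's configure script reads; the values are illustrative for the Nano):

```shell
# Illustrative: pre-answer the configure prompts for a Jetson Nano build
export TF_NEED_CUDA=1
export TF_CUDA_COMPUTE_CAPABILITIES=5.3   # Jetson Nano (Maxwell) is 5.3
./configure
```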

Thanks.

Hi,

I am mostly using default settings in the Bazel configuration: Python 2.7; XLA JIT support: yes; OpenCL SYCL support: no; ROCm support: no; CUDA support: yes; TensorRT: tested both yes and no; compute capabilities: 3.5,7.0; clang as CUDA compiler: no; gcc: 7.4.0; MPI support: no; Bazel optimization flags: -march=native -Wno-sign-compare; Android workspace: no.
Then I run "bazel build --config opt //tensorflow/java:tensorflow //tensorflow/java:libtensorflow_jni" and
get a successful build 17 hours later…

nvcc --version returns V10.0.166.

I will change the compute capabilities to 5.3 and rebuild later today. If I have set up something wrong, please let me know.

Thank you.

Hello,

I was able to build TensorFlow 1.12.2 for Java and it works now.
I followed this tutorial: https://jkjung-avt.github.io/build-tensorflow-1.12.2/ , with some modifications to get the .so files.
But when running MobileNet-V2 (1.4, 224x224) for classification, I only get 4 FPS, whereas the NVIDIA benchmark indicates 64 FPS with MobileNet-V2 (300x300). I used the model provided here: https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/README.md . Does your model have some specific optimizations?

Thank you.

Hi,

That benchmark result was measured with TensorRT rather than TensorFlow.
https://developer.nvidia.com/tensorrt

It’s recommended to convert your model to TensorRT to get some acceleration.
Here is a Python-based example for your reference:
https://github.com/NVIDIA-AI-IOT/tf_trt_models

Thanks.

Thank you for your help.
I converted the model with TF-TRT but I can’t load it: I get an "Invalid GraphDef" error when loading it directly in TensorFlow. I also loaded the "_trt_engine_op.so" library, as I have seen suggested on some forums, with no success.
I am wondering if I could translate the Python example to Java. For this I would need the trt.create_inference_graph function. Is it accessible via a native function?

Thanks.

Hi,

Sorry, we are not really familiar with the TensorFlow Java interface.
Would another language be a possible option for you?

Thanks.

Hello,

I can use C / C++ and call it from Java.

Hi,

Let me confirm first.

The process would then follow the C++ interface rather than the Java wrapper.
Is that correct?

Thanks.

Hi,
Yes, I could do all the AI parts in C++ and just send the results to Java.

I tried with Python using TensorFlow and TensorRT together and get an average of 18 FPS. This is much better than with TensorFlow alone, but still far from the 64 FPS announced in the benchmarks. In your previous message, did you mean using TensorRT without TensorFlow? If so, I would need some examples.

Thanks.

Hi,

I am now using TensorRT from C++, which I call from within Java. I converted the model to a UFF file and now get about 18 FPS (29 ms for inference alone, using FP16). That is approaching my target (I would like 25 FPS).

Loading the UFF model takes about 20 minutes; I guess some conversion takes place during loading. Is this correct? Is there a way to save the converted model once (during installation) and reload it (many times) so it loads faster?

Thank you.

Hi,

You can serialize the TensorRT engine instead of converting the UFF file each time.
https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#serial_model_c

This should give you start-up performance close to the 29 ms inference time.
Thanks.

Hello,
OK, I will try that when I get back from vacation.

Many thanks !