How to preload libnvidia-ptxjitcompiler.so.1?

Custom hardware device: Orin NX 8G/16G
Software version: JetPack 5.1.5

On my custom board, the visual recognition algorithm is set to start automatically at power-on. When started this way, loading the algorithm model takes 5 to 6 seconds. However, if I log in to the device over SSH and run the recognition algorithm manually, loading the model takes only 1.1 to 2 seconds. I compared the libraries loaded in the two run modes and found a difference: when started automatically, the algorithm process loads /usr/lib/aarch64-linux-gnu/tegra/libnvidia-ptxjitcompiler.so.1, but it does not load this library when run manually.

I tried preloading this dynamic library, but it didn’t help with the loading time of the algorithm model:

LD_PRELOAD=/usr/lib/aarch64-linux-gnu/tegra/libnvidia-ptxjitcompiler.so.1 nohup nice -n -10 ./bin/visual_algo_agent > /dev/null &

Is there any way to make the algorithm model load quickly during automatic startup? If the slowdown is related to libnvidia-ptxjitcompiler.so.1, how should I optimize it? Any help is appreciated.
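
Because the same binary behaves differently depending on how it is launched, one low-cost check is to compare the environment of the two processes. The following is a minimal sketch, assuming a Linux /proc filesystem; proc_env is a hypothetical helper name and the PIDs in the usage comment are placeholders. A difference in HOME would be worth noting, since the CUDA JIT cache lives under the user's home directory by default, but whether that matters here is an assumption, not something confirmed in this thread.

```shell
#!/bin/sh
# Print the environment a process was started with, one VAR=value per line,
# sorted so that two dumps can be compared with diff.
# Reading /proc/<pid>/environ requires the same user as the process, or root.
proc_env() {
    tr '\0' '\n' < "/proc/$1/environ" | sort
}

# Example (PIDs are placeholders for the autostarted and the SSH-started run):
#   proc_env 2783 > /tmp/env_autostart.txt
#   proc_env 3120 > /tmp/env_ssh.txt
#   diff /tmp/env_autostart.txt /tmp/env_ssh.txt
```

Variables that appear in only one of the two dumps are candidates to set explicitly in the startup script.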

Dynamic libraries loaded by the process when started automatically:

2783:   /vendor_app/bin/output/bin/visual_algo_agent
linux-vdso.so.1
/vendor_app/bin/output/lib/libzmq.so.5
/vendor_app/bin/output/lib/libagent_manager.so
/lib/aarch64-linux-gnu/libpthread.so.0
/vendor_app/bin/output/lib/libhlog.so
/vendor_app/bin/output/lib/libcommon_proto.so
/vendor_app/bin/output/lib/libds_algo_api.so
/vendor_app/bin/output/lib/libprotobuf.so.28
/lib/aarch64-linux-gnu/libstdc++.so.6
/lib/aarch64-linux-gnu/libgcc_s.so.1
/lib/aarch64-linux-gnu/libc.so.6
/lib/ld-linux-aarch64.so.1
/vendor_app/bin/output/lib/libthread_manager.so
/vendor_app/bin/output/lib/libzmq_util.so
/usr/local/cuda/targets/aarch64-linux/lib/libcudart.so.11.0
/opt/nvidia/deepstream/deepstream-6.3/lib/libnvdsgst_meta.so
/opt/nvidia/deepstream/deepstream-6.3/lib/libnvds_meta.so
/opt/nvidia/deepstream/deepstream-6.3/lib/libnvdsgst_helper.so
/opt/nvidia/deepstream/deepstream-6.3/lib/libnvdsgst_customhelper.so
/opt/nvidia/deepstream/deepstream-6.3/lib/libnvdsgst_smartrecord.so
/opt/nvidia/deepstream/deepstream-6.3/lib/libnvds_msgbroker.so
/lib/aarch64-linux-gnu/libyaml-cpp.so.0.6
/lib/aarch64-linux-gnu/libgstrtspserver-1.0.so.0
/lib/aarch64-linux-gnu/libnvinfer.so.8
/lib/aarch64-linux-gnu/libgstreamer-1.0.so.0
/lib/aarch64-linux-gnu/libjson-glib-1.0.so.0
/lib/aarch64-linux-gnu/libgobject-2.0.so.0
/lib/aarch64-linux-gnu/libglib-2.0.so.0
/lib/aarch64-linux-gnu/libm.so.6
/lib/aarch64-linux-gnu/libz.so.1
/lib/aarch64-linux-gnu/libdl.so.2
/lib/aarch64-linux-gnu/librt.so.1
/lib/aarch64-linux-gnu/libgstrtp-1.0.so.0
/lib/aarch64-linux-gnu/libgstpbutils-1.0.so.0
/opt/nvidia/deepstream/deepstream-6.3/lib/libnvds_logger.so
/lib/aarch64-linux-gnu/libgstnet-1.0.so.0
/lib/aarch64-linux-gnu/libgstbase-1.0.so.0
/lib/aarch64-linux-gnu/libgstrtsp-1.0.so.0
/lib/aarch64-linux-gnu/libgstsdp-1.0.so.0
/lib/aarch64-linux-gnu/libgstapp-1.0.so.0
/lib/aarch64-linux-gnu/libgio-2.0.so.0
/usr/lib/aarch64-linux-gnu/tegra/libnvdla_compiler.so
/usr/local/cuda/targets/aarch64-linux/lib/libcudla.so.1
/lib/aarch64-linux-gnu/libgmodule-2.0.so.0
/lib/aarch64-linux-gnu/libffi.so.7
/lib/aarch64-linux-gnu/libpcre.so.3
/lib/aarch64-linux-gnu/libgstvideo-1.0.so.0
/lib/aarch64-linux-gnu/libgstaudio-1.0.so.0
/lib/aarch64-linux-gnu/libgsttag-1.0.so.0
/lib/aarch64-linux-gnu/libmount.so.1
/lib/aarch64-linux-gnu/libselinux.so.1
/lib/aarch64-linux-gnu/libresolv.so.2
/usr/lib/aarch64-linux-gnu/tegra/libnvos.so
/usr/lib/aarch64-linux-gnu/tegra/libnvdla_runtime.so
/usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
/lib/aarch64-linux-gnu/liborc-0.4.so.0
/lib/aarch64-linux-gnu/libblkid.so.1
/lib/aarch64-linux-gnu/libpcre2-8.so.0
/usr/lib/aarch64-linux-gnu/tegra/libnvrm_host1x.so
/usr/lib/aarch64-linux-gnu/tegra/libnvrm_mem.so
/usr/lib/aarch64-linux-gnu/tegra/libnvsocsys.so
/usr/lib/aarch64-linux-gnu/tegra/libnvrm_gpu.so
/usr/lib/aarch64-linux-gnu/tegra/libnvrm_sync.so
/usr/lib/aarch64-linux-gnu/tegra/libnvrm_chip.so
/usr/lib/aarch64-linux-gnu/tegra/libnvsciipc.so
/usr/local/cuda/targets/aarch64-linux/lib/libnvrtc.so
/lib/aarch64-linux-gnu/libnvcucompat.so
/lib/aarch64-linux-gnu/libnss_files.so.2
/usr/lib/aarch64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_multistream.so
/usr/lib/aarch64-linux-gnu/tegra/libnvbufsurface.so.1.0.0
/usr/lib/aarch64-linux-gnu/tegra/libgstnvdsseimeta.so.1.0.0
/usr/lib/aarch64-linux-gnu/tegra/libnvdsbufferpool.so.1.0.0
/usr/lib/aarch64-linux-gnu/tegra/libnvbufsurftransform.so.1.0.0
/usr/lib/aarch64-linux-gnu/tegra/libnvrm_surface.so
/lib/aarch64-linux-gnu/libEGL.so.1
/usr/lib/aarch64-linux-gnu/tegra/libnvbuf_fdmap.so.1.0.0
/usr/lib/aarch64-linux-gnu/tegra/libnvvic.so
/lib/aarch64-linux-gnu/libGLdispatch.so.0
/usr/lib/aarch64-linux-gnu/tegra/libnvrm_stream.so
/usr/lib/aarch64-linux-gnu/tegra/libnvcolorutil.so
/usr/lib/aarch64-linux-gnu/tegra-egl/libEGL_nvidia.so.0
/usr/lib/aarch64-linux-gnu/tegra/libnvidia-glsi.so.35.6.2
/usr/lib/aarch64-linux-gnu/tegra/libnvidia-rmapi-tegra.so.35.6.2
/usr/lib/aarch64-linux-gnu/tegra/libnvidia-egl-gbm.so.1
/lib/aarch64-linux-gnu/libdrm.so.2
/lib/aarch64-linux-gnu/libgbm.so.1
/lib/aarch64-linux-gnu/libwayland-server.so.0
/lib/aarch64-linux-gnu/libexpat.so.1
/usr/lib/aarch64-linux-gnu/tegra/libnvidia-eglcore.so.35.6.2
/usr/lib/aarch64-linux-gnu/tegra/libnvdc.so
/usr/lib/aarch64-linux-gnu/tegra/libnvimp.so
/usr/lib/aarch64-linux-gnu/tegra/libnvidia-egl-wayland.so.1
/lib/aarch64-linux-gnu/libwayland-client.so.0
/lib/aarch64-linux-gnu/libEGL_mesa.so.0
/lib/aarch64-linux-gnu/libglapi.so.0
/lib/aarch64-linux-gnu/libX11-xcb.so.1
/lib/aarch64-linux-gnu/libxcb.so.1
/lib/aarch64-linux-gnu/libxcb-dri2.so.0
/lib/aarch64-linux-gnu/libxcb-xfixes.so.0
/lib/aarch64-linux-gnu/libxcb-dri3.so.0
/lib/aarch64-linux-gnu/libxcb-present.so.0
/lib/aarch64-linux-gnu/libxcb-sync.so.1
/lib/aarch64-linux-gnu/libxshmfence.so.1
/lib/aarch64-linux-gnu/libXau.so.6
/lib/aarch64-linux-gnu/libXdmcp.so.6
/lib/aarch64-linux-gnu/libbsd.so.0
/usr/local/cuda/targets/aarch64-linux/lib/libnvToolsExt.so.1
/usr/lib/aarch64-linux-gnu/gstreamer-1.0/libgstrtsp.so
/usr/lib/aarch64-linux-gnu/gstreamer-1.0/libgstrealmedia.so
/usr/lib/aarch64-linux-gnu/gstreamer-1.0/libgstasf.so
/lib/aarch64-linux-gnu/libgstriff-1.0.so.0
/usr/lib/aarch64-linux-gnu/gstreamer-1.0/libgstcoreelements.so
/usr/lib/aarch64-linux-gnu/gstreamer-1.0/libgstplayback.so
/usr/lib/aarch64-linux-gnu/gstreamer-1.0/deepstream/libgstnvvideoconvert.so
/usr/lib/aarch64-linux-gnu/gstreamer-1.0/libgstvideoparsersbad.so
/lib/aarch64-linux-gnu/libgstcodecparsers-1.0.so.0
/usr/lib/aarch64-linux-gnu/gstreamer-1.0/libgstrtp.so
/usr/lib/aarch64-linux-gnu/gstreamer-1.0/libgstnvvideo4linux2.so
/lib/aarch64-linux-gnu/libgstallocators-1.0.so.0
/lib/aarch64-linux-gnu/libv4l2.so.0
/usr/lib/aarch64-linux-gnu/tegra/libgstnvcustomhelper.so
/lib/aarch64-linux-gnu/libv4lconvert.so.0
/usr/lib/aarch64-linux-gnu/gstreamer-1.0/libgstudp.so
/usr/lib/aarch64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_osd.so
/opt/nvidia/deepstream/deepstream-6.3/lib/libnvds_osd.so
/lib/aarch64-linux-gnu/libcairo.so.2
/lib/aarch64-linux-gnu/libpango-1.0.so.0
/lib/aarch64-linux-gnu/libpangocairo-1.0.so.0
/opt/nvidia/deepstream/deepstream-6.3/lib/libnvds_utils.so
/lib/aarch64-linux-gnu/libpixman-1.so.0
/lib/aarch64-linux-gnu/libfontconfig.so.1
/lib/aarch64-linux-gnu/libfreetype.so.6
/lib/aarch64-linux-gnu/libpng16.so.16
/lib/aarch64-linux-gnu/libxcb-shm.so.0
/lib/aarch64-linux-gnu/libxcb-render.so.0
/lib/aarch64-linux-gnu/libXrender.so.1
/lib/aarch64-linux-gnu/libX11.so.6
/lib/aarch64-linux-gnu/libXext.so.6
/lib/aarch64-linux-gnu/libfribidi.so.0
/lib/aarch64-linux-gnu/libthai.so.0
/lib/aarch64-linux-gnu/libharfbuzz.so.0
/lib/aarch64-linux-gnu/libpangoft2-1.0.so.0
/lib/aarch64-linux-gnu/libuuid.so.1
/lib/aarch64-linux-gnu/libdatrie.so.1
/lib/aarch64-linux-gnu/libgraphite2.so.3
/usr/lib/aarch64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_infer.so
/opt/nvidia/deepstream/deepstream-6.3/lib/libnvds_infer.so
/lib/aarch64-linux-gnu/libnvinfer_plugin.so.8
/lib/aarch64-linux-gnu/libnvonnxparser.so.8
/lib/aarch64-linux-gnu/libnvparsers.so.8
/opt/nvidia/deepstream/deepstream-6.3/lib/libnvds_inferlogger.so
/opt/nvidia/deepstream/deepstream-6.3/lib/libnvds_inferutils.so
/usr/local/cuda/targets/aarch64-linux/lib/libcublas.so.11
/usr/local/cuda/targets/aarch64-linux/lib/libcublasLt.so.11
/lib/aarch64-linux-gnu/libcudnn.so.8
/lib/aarch64-linux-gnu/libcrypto.so.1.1
/usr/lib/aarch64-linux-gnu/tegra/libnvidia-ptxjitcompiler.so.1
/vendor_app/bin/output/chcnav_algo/fvs_ai_core/libs/libnvdsinfer_custom_impl_Yolo.so
/usr/lib/aarch64-linux-gnu/gio/modules/libgiognomeproxy.so
/usr/lib/aarch64-linux-gnu/gio/modules/libdconfsettings.so
/usr/lib/aarch64-linux-gnu/gio/modules/libgiolibproxy.so
/lib/aarch64-linux-gnu/libproxy.so.1
/usr/lib/aarch64-linux-gnu/libproxy/0.4.15/modules/network_networkmanager.so
/lib/aarch64-linux-gnu/libdbus-1.so.3
/lib/aarch64-linux-gnu/libsystemd.so.0
/lib/aarch64-linux-gnu/liblzma.so.5
/lib/aarch64-linux-gnu/liblz4.so.1
/lib/aarch64-linux-gnu/libgcrypt.so.20
/lib/aarch64-linux-gnu/libgpg-error.so.0
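
A list like the one above can be captured for both launch modes and compared mechanically. This is a minimal sketch, assuming a Linux /proc filesystem; loaded_libs and diff_libs are hypothetical helper names, and the PIDs in the usage comment are placeholders.

```shell
#!/bin/sh
# List the shared objects currently mapped into a process, one path per line.
# Reads /proc/<pid>/maps, so it needs permission to inspect that process.
# Paths containing spaces are not handled.
loaded_libs() {
    awk '$6 ~ /\.so(\.|$)/ { print $6 }' "/proc/$1/maps" | sort -u
}

# Print the entries that appear in only one of two saved lists.
# Lines starting with "<" are only in the first file, ">" only in the second.
diff_libs() {
    diff "$1" "$2" | grep '^[<>]'
}

# Example (PIDs are placeholders):
#   loaded_libs 2783 > /tmp/libs_autostart.txt
#   loaded_libs 3120 > /tmp/libs_ssh.txt
#   diff_libs /tmp/libs_autostart.txt /tmp/libs_ssh.txt
```

Capturing both lists at the same point in the program, for example right after the model has loaded, keeps the comparison meaningful, because libraries opened later with dlopen only show up once they are in use.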

Hi,
It seems your algorithm does not need the lib. As a test, you may back up the lib and remove it from its path, then see whether the algorithm still runs successfully.

By default the lib is required, and we don’t test environments without it. You may give it a try, but please note there may be potential issues.

Hi, DaneLLL:

I removed libnvidia-ptxjitcompiler.so.1 and the algorithm program ran successfully, and the engine loading time dropped from 4.7 seconds to 2.9 seconds. I have two questions:

1. Will deleting this library have any impact on the system? As far as I can tell, it is a library the NVIDIA GPU stack uses for compilation.
2. Loading another small model before running the algorithm program, so that the GPU is already running, significantly reduces the time required to load the engine. The larger the model file loaded, the less time it takes to start our algorithm. We need the algorithm to start very quickly (within 1 second). Do you have any suggestions to achieve this?
Please help me.

Hi,

The library is used for just-in-time (JIT) compilation of PTX code.
If all the CUDA kernels have been built for the correct GPU architecture, the library may not be needed.

You can confirm this with:

CUDA_DISABLE_PTX_JIT=1 ./test
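
As a sketch of this check: CUDA_DISABLE_PTX_JIT=1 tells the CUDA driver not to JIT-compile embedded PTX, so a kernel that has no binary code for the GPU fails to load instead of being compiled at run time. The wrapper name run_without_ptx_jit is hypothetical, and ./test stands for whatever binary is being checked.

```shell
#!/bin/sh
# Run a command with PTX JIT compilation disabled and report how it exited.
# The variable is passed through env so it applies to that one command only.
run_without_ptx_jit() {
    status=0
    env CUDA_DISABLE_PTX_JIT=1 "$@" || status=$?
    echo "exit status with CUDA_DISABLE_PTX_JIT=1: $status"
    return "$status"
}

# Example:
#   run_without_ptx_jit ./test
```

If the program runs normally under this wrapper, none of its kernels depended on JIT compilation in that run; a failure suggests at least one kernel was shipped as PTX only.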

Thanks.

Hi, AastaLLL:

In the Orin startup script, I added the command:

CUDA_DISABLE_PTX_JIT=1 ./bin/visual_algo_agent > /home/nvidia/visual.log &

When the algorithm program is launched from the startup script this way, it exits as soon as it loads the algorithm model. However, if I run the same command in a terminal (over SSH), the model loads and the program runs normally.

Here is part of the code:

    g_print("[ALGO] step 10.1: set pipeline to PAUSED (will trigger TensorRT engine loading)...\n");

    HLOG(ALGO_API, INFO, "step 10.1.0-260109-zx-debug-0");
    hlog_flush();

    GstStateChangeReturn ret = gst_element_set_state(
        g_ctx.appCtx->pipeline.pipeline, GST_STATE_PAUSED);

    HLOG(ALGO_API, INFO, "step 10.1.1-260109-zx-debug-1");
    hlog_flush();

    if (ret == GST_STATE_CHANGE_FAILURE) {
        g_print("[ALGO] Failed to set pipeline to PAUSED\n");
        set_error_state(ALGO_ERROR_PIPELINE_FAILED);
        g_main_loop_unref(g_ctx.main_loop);
        destroy_pipeline(g_ctx.appCtx);
        g_free(g_ctx.testAppCtx);
        g_free(g_ctx.appCtx);
        return 1;  // failure
    }

    // Wait up to 5 seconds for the state change to complete, then log the elapsed time.
    ret = gst_element_get_state(g_ctx.appCtx->pipeline.pipeline, NULL, NULL, 5 * GST_SECOND);
    gettimeofday(&step10_state_changed, NULL);
    step10_paused_total = (step10_state_changed.tv_sec - step10_start.tv_sec) * 1000000 + (step10_state_changed.tv_usec - step10_start.tv_usec);
    g_print("[ALGO] [TIME] step 10.1-set PAUSED (including TensorRT loading): %lld us (%.3f ms)\n", step10_paused_total, step10_paused_total / 1000.0);
    HLOG(ALGO_API, INFO, "step 10.1-set PAUSED (including TensorRT loading): %.3f ms", step10_paused_total / 1000.0);
    hlog_flush();

The program exits while executing this call:

GstStateChangeReturn ret = gst_element_set_state(
        g_ctx.appCtx->pipeline.pipeline, GST_STATE_PAUSED);

Hi,

Is there any log/error shown in the /home/nvidia/visual.log?

Thanks.

Hi, AastaLLL:

No log output and no coredump file generated.

The program exited without any errors.
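
Note that the startup command shown earlier redirects only stdout, so anything the program or its libraries write to stderr is not captured in visual.log. The following is a minimal sketch of a launcher that records both streams and the exit status; launch_logged is a hypothetical helper name, and the paths in the usage comment are the ones from this thread.

```shell
#!/bin/sh
# Start a command in the background with stdout and stderr both appended to
# a log file, and append a line with its exit status when it ends.
# A status above 128 usually means the process was killed by a signal.
launch_logged() {
    launch_log=$1
    shift
    (
        status=0
        "$@" >>"$launch_log" 2>&1 || status=$?
        echo "[launcher] '$1' exited with status $status" >>"$launch_log"
    ) &
}

# Example:
#   CUDA_DISABLE_PTX_JIT=1 launch_logged /home/nvidia/visual.log ./bin/visual_algo_agent
```

Exporting GST_DEBUG=3 before the launch makes GStreamer print warnings and errors to stderr, which would then land in the same log.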

Hi,
If it does not work well when the lib is removed, we would suggest keeping it. We have never tried an environment with the lib removed, so there may be potential issues.

Hi, DaneLLL:

But if I simply delete libnvidia-ptxjitcompiler.so.1, the algorithm program still runs, and startup takes only 2.4 seconds. With libnvidia-ptxjitcompiler.so.1 in place, startup takes 5 seconds.

Hi,
If you prefer to remove the lib, you may do so as a customization. If certain applications do not run properly, you may add it back.

Hi, DaneLLL:

I’m not sure about the risks of removing the library. Could the work done by libnvidia-ptxjitcompiler.so.1 be handled when the model is generated? That way, the compiler library would not be needed when running the algorithm.
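
One way to explore this question, as a sketch rather than a confirmed fix: PTX JIT is only needed for kernels that ship without binary code (SASS) for the GPU, and Orin is compute capability 8.7 (sm_87). cuobjdump from the CUDA toolkit can list the architectures embedded in a library. The helper names below are hypothetical, the parsing assumes that cuobjdump --list-elf prints one cubin file name per line containing the architecture tag, and the custom YOLO library from the list earlier in the thread is used only as an example of a locally built library worth checking.

```shell
#!/bin/sh
# Succeed if the cuobjdump listing on stdin names a cubin for the given
# architecture tag (for example sm_87 on Orin).
listing_has_arch() {
    grep -q -w -- "$1"
}

# Succeed if the given library embeds binary code for the architecture.
# Returns 2 when cuobjdump is not installed, so a missing tool is not
# mistaken for missing binary code.
has_sass_for() {
    command -v cuobjdump >/dev/null 2>&1 || {
        echo "cuobjdump not found" >&2
        return 2
    }
    cuobjdump --list-elf "$2" 2>/dev/null | listing_has_arch "$1"
}

# Example:
#   has_sass_for sm_87 /vendor_app/bin/output/chcnav_algo/fvs_ai_core/libs/libnvdsinfer_custom_impl_Yolo.so \
#       && echo "sm_87 code present" || echo "no sm_87 code found"
```

If a locally built library turns out to lack sm_87 code, rebuilding its CUDA sources with nvcc -gencode arch=compute_87,code=sm_87 would embed it; whether that removes the JIT step for this application is untested here.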