Tegra System Profiler fails with a multithreaded application on TX1

CUPTI fails to allocate buffers when a multithreaded application is launched with the Tegra System Profiler injection library, libToolsInjection32.so.

Running the following command:

CUDA_INJECTION32_PATH=/opt/nvidia/tegra_system_profiler/libToolsInjection32.so ./my_app

The output shows that TegraProfilerInjection spawns multiple worker threads:

I:TegraProfilerInjection: Naming service thread 17826 as [TSP].
I:TegraProfilerInjection: Naming service thread 17827 as CUPTI worker thread.
I:TegraProfilerInjection: Naming service thread 17839 as [TSP].
I:TegraProfilerInjection: Naming service thread 17840 as CUPTI worker thread.

Shortly afterwards, it emits many repeated error messages about failing to allocate the CUPTI buffer:

E:[TegraProfilerCUDA] /home/android/buildAgent/work/target-arm-linux-32/Rel/QuadD_Hotel.1/sw/devtools/Agora/Rel/QuadD_Hotel.1/QuadD/Common/InjectionSupp/Injection/Cuda/CudaInjectionInit.cpp 366 bufferRequested: Cannot allocate CUPTI buffer
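
To see how often the error repeats, the output can be captured to a file and the failures counted. This is only a minimal sketch: it assumes the injection library writes its messages to stdout or stderr, and the helper name count_cupti_failures and the file name profiler.log are hypothetical, chosen just for illustration.

```shell
# Hypothetical helper (not part of Tegra System Profiler):
# prints the number of lines in a captured log that contain the
# CUPTI buffer allocation error shown above.
count_cupti_failures() {
    grep -c 'Cannot allocate CUPTI buffer' "$1"
}

# Example usage (assumes the messages go to stdout or stderr):
#   CUDA_INJECTION32_PATH=/opt/nvidia/tegra_system_profiler/libToolsInjection32.so ./my_app > profiler.log 2>&1
#   count_cupti_failures profiler.log
```

Since grep -c counts matching lines, each repeated error message contributes one to the total.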

For reference, this works fine with a simple single-threaded app:

CUDA_INJECTION32_PATH=/opt/nvidia/tegra_system_profiler/libToolsInjection32.so ./fp16ScalarProduct
GPU Device 0: "GM20B" with compute capability 5.3

I:TegraProfilerInjection: Naming service thread 17822 as [TSP].
I:TegraProfilerInjection: Naming service thread 17823 as CUPTI worker thread.
Result: 585487.625000