Timeout when connecting ncu to a Python application

When running a Python application under ncu, I keep hitting this error message:

==ERROR== Failed to connect to process

This happens in both launch-and-attach mode and launch mode (followed by a separate attach). To get more debug information, I created a config that flushes the log output, as described here (NSight Compute not finding kernels - #6 by veraj). The resulting logs point to a timeout when attaching to the process; see the logs below.

I16:24:03:852|CmdlineProfiler|PID415001|TID415001|:294[]:Connecting to new process 414901

I16:24:03:852|profiler_multi_api_debugger_client|PID415001|TID415001|:270[]:Creating an API debugger and attaching...

I16:24:03:852|TPS_Comms|PID415001|TID415001|:87[]:Remaining transactions: 1

I16:24:03:852|tps_sync|PID415001|TID415001|:54[]:Begin WaitForCompletion

I16:24:03:852|tps_sync|PID415001|TID415001|:30[]:Create: 0x1f981080

I16:24:08:852|tps_sync|PID415001|TID415001|:40[]:result: Timeout

I16:24:08:852|tps_sync|PID415001|TID415001|:97[]:End WaitForCompletion

I16:24:08:852|tps_sync|PID415001|TID415001|:35[]:Destroy: 0x1f981080

E16:24:08:852|synchronous_multi_api_debugger_client|PID415001|TID415001|:285[]:Failed to attach to ApiDebugger.

E16:24:08:852|profiler_multi_api_debugger_client|PID415001|TID415001|:278[]:Failed to attach ApiDebugger.

E16:24:08:852|CmdlineProfiler|PID415001|TID415001|:331[]:Failed to create ProfilerMultiApiDebugger.

I16:24:08:852|CmdlineProfiler|PID415001|TID415001|:1307[]:End

I16:24:08:852|CmdlineProfiler|PID415001|TID415001|:1331[]:Profiler shutdown requested
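
To make the timeout easier to see, here is a minimal sketch that parses these log lines and measures how long the attach waited. It assumes the line layout seen above (a one-letter level, an `HH:MM:SS:mmm` timestamp, then `|component|PID…|TID…|:line[]:message`); that layout is inferred from this output only, not from any documented format, and the names `parse_line` and `wait_duration_seconds` are my own. On the lines above it reports a wait of exactly 5 seconds between Begin and End WaitForCompletion.

```python
import re
from datetime import timedelta

# Layout inferred from the log excerpt in this post (an assumption, not a documented format):
#   <level><HH>:<MM>:<SS>:<mmm>|<component>|PID<pid>|TID<tid>|:<source line>[]:<message>
LOG_LINE = re.compile(
    r"^(?P<level>[A-Z])"
    r"(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2}):(?P<ms>\d{3})"
    r"\|(?P<component>[^|]+)\|PID(?P<pid>\d+)\|TID(?P<tid>\d+)"
    r"\|:(?P<src_line>\d+)\[[^\]]*\]:(?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict for one log line, or None if it does not match the assumed layout."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    timestamp = timedelta(
        hours=int(m["h"]),
        minutes=int(m["m"]),
        seconds=int(m["s"]),
        milliseconds=int(m["ms"]),
    )
    return {
        "level": m["level"],
        "time": timestamp,
        "component": m["component"],
        "msg": m["msg"],
    }


def wait_duration_seconds(lines):
    """Seconds between the first tps_sync Begin/End WaitForCompletion pair, or None."""
    begin = None
    for raw in lines:
        entry = parse_line(raw)
        if entry is None or entry["component"] != "tps_sync":
            continue
        if entry["msg"] == "Begin WaitForCompletion":
            begin = entry["time"]
        elif entry["msg"] == "End WaitForCompletion" and begin is not None:
            return (entry["time"] - begin).total_seconds()
    return None


# Lines copied from the log above.
SAMPLE = [
    "I16:24:03:852|tps_sync|PID415001|TID415001|:54[]:Begin WaitForCompletion",
    "I16:24:03:852|tps_sync|PID415001|TID415001|:30[]:Create: 0x1f981080",
    "I16:24:08:852|tps_sync|PID415001|TID415001|:40[]:result: Timeout",
    "I16:24:08:852|tps_sync|PID415001|TID415001|:97[]:End WaitForCompletion",
]

if __name__ == "__main__":
    print(wait_duration_seconds(SAMPLE))  # 5.0
```
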

I am able to profile simple Python applications with my setup, but I’m having issues with one specific Python application (which I am unable to share more details about). I am using CUDA 12.4 with NCU 2023.1 and Python 3.9.