Random freezes and CUDA errors

Hi,

I might have a similar problem. I see similar messages in dmesg.

Can you check what the C stack trace looks like at the time it hangs? E.g. via gdb -p $PID -ex 'thread apply all bt' -ex="set confirm off" -ex quit. In my case, I see cuEventSynchronize in the trace.
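If it helps, here is a rough sketch of how the gdb call above could be wrapped so it is easy to run repeatedly while the process is hung. The function name dump_stacks is made up for illustration, and it uses gdb's -batch flag instead of the "set confirm off" / quit pair, which I assume gives the same result here (run the commands, detach, exit). Depending on your system's ptrace restrictions, you may need to run it as the same user as the process, or as root.

```shell
#!/bin/sh
# dump_stacks is a hypothetical helper name, not something from the original post.
# It prints a backtrace of every thread of the given process using gdb.
dump_stacks() {
    pid="$1"
    if [ -z "$pid" ]; then
        echo "usage: dump_stacks PID" >&2
        return 2
    fi
    # kill -0 sends no signal; it only checks that the process exists
    # and that we are allowed to signal it.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "no such process (or no permission): $pid" >&2
        return 1
    fi
    # -batch runs the -ex commands non-interactively, then detaches and exits.
    gdb -p "$pid" -batch -ex 'thread apply all bt'
}
```

Usage would be something like dump_stacks "$PID" > stacks.txt, taken a few times some seconds apart. If the same threads stay parked in the same CUDA call in every dump, that points to a real hang rather than slow progress.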

See here for more details:

Maybe this is also related: