I'm not sure I understand your problem correctly.
It seems that you are trying to run inference inside a callback function that is triggered by requests coming in over the network.
In that setup, a common error is that the CUDA context current on the calling thread gets replaced or mixed up with contexts used by other applications or threads.
Please save the CUDA context before leaving the yolo_detection function and restore it when execution comes back into it.
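A minimal sketch of that save/restore pattern follows. It assumes PyCUDA, where a `pycuda.driver.Context` has `push()` (make it current on the calling thread) and `pop()` (remove it again). The class `YoloDetector`, the helper `cuda_context_scope`, and `_infer` are hypothetical names for illustration, and the engine setup is elided. The context is pushed on entry and popped in a `finally`, so the thread is handed back unchanged even if inference raises.

```python
from contextlib import contextmanager


@contextmanager
def cuda_context_scope(ctx):
    """Make `ctx` current on this thread for the duration of the block.

    Assumption: `ctx` behaves like a pycuda.driver.Context, i.e. push()
    makes it current on the calling thread and pop() removes it again.
    """
    ctx.push()  # restore our context when entering the callback
    try:
        yield ctx
    finally:
        ctx.pop()  # leave the thread as we found it, even on errors


class YoloDetector:
    """Hypothetical wrapper; engine and buffer setup are elided."""

    def __init__(self, ctx):
        # With PyCUDA this would be something like (assumption):
        #   import pycuda.driver as cuda
        #   cuda.init()
        #   ctx = cuda.Device(0).make_context()  # creates AND pushes
        #   ... deserialize engine, allocate buffers ...
        #   ctx.pop()                            # detach until first call
        self.ctx = ctx

    def yolo_detection(self, image):
        # Every callback re-activates the same context before touching the GPU.
        with cuda_context_scope(self.ctx):
            return self._infer(image)

    def _infer(self, image):
        # Placeholder for the actual TensorRT/CUDA inference calls.
        return []
```

The detector is created once at startup and `detector.yolo_detection(img)` is called from the network callback; whichever thread the framework uses, the context is pushed at the start of the call and popped before returning.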