I am using a Jetson Xavier NX, JetPack 4.4.1, CUDA 10.2.89, TensorRT 7.1.3.
And I happen to know that unified memory is a good fit for Jetson, as its iGPU and CPU share the same physical memory.
I use cudaMallocManaged to allocate the input and output buffers. I fill the input with data from the CPU, then pass it to the TensorRT context to infer, then call cudaStreamSynchronize() since I use a stream (I assume this is equivalent to calling cudaDeviceSynchronize()). I reuse the input and output buffers across inferences.
It turns out the inference succeeds 2 times, and then there is a segmentation fault or bus error.
PS: I check whether the return value of cudaStreamSynchronize() is cudaSuccess; if not, I call synchronize again until it is ready. But I have found the return value is always cudaSuccess, even when the segmentation fault occurs.
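A minimal sketch of the loop I described, assuming an already-built IExecutionContext named `context`; `fillInput`, `readOutput`, `inputBytes`, `outputBytes`, `numIterations`, and `batchSize` are placeholders for my actual code:

```cpp
#include <cuda_runtime_api.h>
#include <cstdio>

void runLoop(nvinfer1::IExecutionContext* context) {
    void* buffers[2];  // buffers[0] = input binding, buffers[1] = output binding

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    cudaMallocManaged(&buffers[0], inputBytes);   // unified memory for input
    cudaMallocManaged(&buffers[1], outputBytes);  // unified memory for output

    for (int i = 0; i < numIterations; ++i) {
        fillInput(buffers[0]);                    // CPU writes the input
        context->enqueue(batchSize, buffers, stream, nullptr);
        cudaError_t err = cudaStreamSynchronize(stream);
        if (err != cudaSuccess)
            printf("sync failed: %s\n", cudaGetErrorString(err));
        readOutput(buffers[1]);                   // CPU reads the output
        // crash happens here on the 3rd iteration when touching the buffers
    }
}
```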
It is different.
cudaStreamSynchronize only waits for the tasks submitted on the given CUDA stream.
Did you launch TensorRT with the same CUDA stream?
Hi, AastaLLL, thanks for your message. I have also tried cudaDeviceSynchronize(), and even used them together, putting cudaDeviceSynchronize() after cudaStreamSynchronize(); there is still a segmentation fault or bus error. The error occurs when I try to access the input buffer or output buffer from the CPU.
I pass the stream to the context->enqueue() method.
BTW, in my process there are many inferences running, and some do not use unified memory, i.e. they copy from CPU memory to CUDA memory and vice versa.
The bus error is usually caused by concurrent CPU and GPU access, which is NOT supported on Jetson.
Could you double-check if there is any concurrent access in your source?
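One pattern worth checking, sketched below under the assumption that the crash comes from other streams keeping the GPU busy while the CPU touches the managed buffers: a cudaMallocManaged allocation is globally attached by default, so on Jetson any in-flight GPU work can make CPU access unsafe. Attaching the buffers to a single stream with cudaStreamAttachMemAsync means synchronizing that one stream is enough before CPU access. (`buffers`, `inputBytes`, and `stream` are the hypothetical names from the description above.)

```cpp
// Attach each managed buffer to the inference stream only.
// After this, cudaStreamSynchronize(stream) alone makes CPU
// access to these buffers safe, even while other streams run.
cudaMallocManaged(&buffers[0], inputBytes, cudaMemAttachGlobal);
cudaStreamAttachMemAsync(stream, buffers[0], 0, cudaMemAttachSingle);
cudaStreamAttachMemAsync(stream, buffers[1], 0, cudaMemAttachSingle);
cudaStreamSynchronize(stream);  // attachment takes effect after a sync

// ... then per iteration: CPU fill -> enqueue on `stream`
//     -> cudaStreamSynchronize(stream) -> CPU read.
```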