Models used: Detector: PeopleNet (ResNet34); Classification: ResNet50 (classification-1), ResNet18 (classification-2), and ResNet18 (classification-3).
I used the deepstream-imagedata-multistream test application and customized its pipeline for my use case. The application runs fine for about 2 hours, but after that it fails with an illegal memory access error.
An error screenshot is attached below.
Hi @Pritam ,
Could you run nvidia-smi to monitor memory usage while your application is running, to check whether CUDA memory consumption increases continuously?
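If it helps, here is a small polling sketch (it assumes `nvidia-smi` is on the PATH; the parsing helper is the only part that does not need a GPU):

```python
import subprocess

def parse_memory_used(csv_value: str) -> int:
    """Parse one 'memory.used' value (MiB) from nvidia-smi CSV output."""
    return int(csv_value.strip())

def gpu_memory_used_mib(device_index: int = 0) -> int:
    """Query current GPU memory usage in MiB for one device via nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits", "-i", str(device_index)],
        text=True,
    )
    return parse_memory_used(out)

# Call gpu_memory_used_mib() on a timer (e.g. every minute) and log the value;
# a steadily climbing number over hours points at a CUDA memory leak.
```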
We have monitored the memory in the meantime; it does not rise beyond 3.5 GB of GPU memory out of 11 GB, so a leak does not seem to be the issue. We have observed the same problem again with this low memory usage. Please check the attached log in case it provides more clues: surveillance_gateway.err.log (2.0 MB)
It mostly reports a corrupted double-linked list, along with errors like the following:
nvbufsurftransform.cpp(2369) : getLastCudaError() CUDA error : Recevied NvBufSurfTransformError_Execution_Error : (400) invalid resource handle.
corrupted double-linked list
nvbufsurftransform.cpp(2369) : getLastCudaError() CUDA error : Recevied NvBufSurfTransformError_Execution_Error : (709) context is destroyed.
corrupted double-linked list
nvbufsurftransform.cpp(2369) : getLastCudaError() CUDA error : Recevied NvBufSurfTransformError_Execution_Error : (400) invalid resource handle.
nvbufsurftransform.cpp(2369) : getLastCudaError() CUDA error : Recevied NvBufSurfTransformError_Execution_Error : (400) invalid resource handle.
corrupted size vs. prev_size
nvbufsurftransform.cpp(2369) : getLastCudaError() CUDA error : Recevied NvBufSurfTransformError_Execution_Error : (700) an illegal memory access was encountered.
corrupted double-linked list
nvbufsurftransform.cpp(2369) : getLastCudaError() CUDA error : Recevied NvBufSurfTransformError_Execution_Error : (46) all CUDA-capable devices are busy or unavailable.
free(): corrupted unsorted chunks
GPUassert: an illegal memory access was encountered src/modules/cuDCF/cudaCropScaleInTexture2D.cu 1254
corrupted double-linked list
GPUassert: an illegal memory access was encountered src/modules/NvDCF/NvDCF.cpp 3461
corrupted double-linked list
From the log, the earliest failure is the "context is destroyed" error quoted above; it seems the CUDA context is destroyed while a CUDA task is still running. Could you check why and where the CUDA context is destroyed?
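To locate the earliest failure in a long log such as surveillance_gateway.err.log, a quick scan for the error strings quoted above can help. This is just a sketch; the pattern list is taken from the errors posted in this thread:

```python
import re

# Error strings seen in the attached log (quoted earlier in this thread).
ERROR_PATTERNS = [
    "context is destroyed",
    "illegal memory access",
    "invalid resource handle",
    "corrupted double-linked list",
    "corrupted size vs. prev_size",
]

def first_failure(lines):
    """Return (line_number, text) of the earliest known failure line, or None."""
    rx = re.compile("|".join(map(re.escape, ERROR_PATTERNS)))
    for i, line in enumerate(lines, start=1):
        if rx.search(line):
            return i, line.rstrip()
    return None

# Usage:
# with open("surveillance_gateway.err.log") as f:
#     print(first_failure(f))
```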
Hi,
I am also facing the same issue. I have customized the deepstream-test Python apps; the application runs 24x7 with around 10-15 RTSP streams, and we hit this issue after 5-6 hours of running.
As you have suggested checking the CUDA context state, can you please elaborate on how to do that? Sorry if this is a basic question; I am not an expert at this.
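For anyone following along: one low-level way to check whether the calling thread still has a live CUDA context is the driver API's `cuCtxGetCurrent`, reachable from Python via ctypes. This is a sketch, not an official recipe; it assumes the driver library is named `libcuda.so.1` and returns None whenever the check cannot be performed:

```python
import ctypes

def current_cuda_context():
    """Best-effort check of the calling thread's CUDA context.

    Returns the context pointer value (0 means no current context),
    or None if libcuda is unavailable or the call fails.
    """
    try:
        libcuda = ctypes.CDLL("libcuda.so.1")
    except OSError:
        return None  # driver library not present on this machine
    ctx = ctypes.c_void_p()
    rc = libcuda.cuCtxGetCurrent(ctypes.byref(ctx))  # CUresult, 0 == success
    if rc != 0:
        return None
    return ctx.value or 0
```

Calling this from the thread that runs your buffer-processing probe, before and after the failure window, can show whether the context has disappeared out from under the pipeline.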