Problem: TensorRT 5 Python API with a Flask service

Hi, we want to use Python Flask to start a web service that runs TensorRT. When the program looks like the snippet below, everything is OK:

@app.route(...)
def inference():
    cuda.init()
    device = cuda.Device(0)
    ctx = device.make_context()
    ctx.push()

    engine = build_engine()

As you know, build_engine() takes too much time, which wipes out TensorRT's speed advantage. I want to move build_engine() outside of inference(), like this:
def init():
    build_engine()

def inference():
    cuda.init()
    ctx = device …

But now it fails at cuda.memcpy_dtoh_async() and prints: cuStreamSynchronize failed: an illegal memory access was encountered.
Please help me find out how to solve this.

Hello,

I’m not sure where or how cuda.memcpy_dtoh_async() is called in your workflow or where the cuStreamSynchronize error is raised, but I’d recommend serializing your TRT engine/model to a file offline, ready to deploy.

Take a look at this presentation on designing Flask application to expose TRT models via REST endpoints.
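For reference, the serialize-offline / deserialize-at-startup flow looks roughly like this (a sketch with the TensorRT Python API; `build_my_engine()` and the file name are placeholders for whatever builder code and path you use today):

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger()

# Offline, one-time step: build the engine and serialize the plan to a file.
# build_my_engine() stands in for your existing builder/parser code.
engine = build_my_engine()
with open("yolo.engine", "wb") as f:
    f.write(engine.serialize())  # serialize() returns the engine plan as bytes

# At service startup: deserialize the plan instead of rebuilding it.
with open("yolo.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
```

Deserializing a plan file is typically orders of magnitude faster than rebuilding the engine, which is why it belongs in startup rather than in the request handler.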

regards,
NVIDIA Enterprise Support

@NVES, thanks for your reply. Here is the code:

import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit

TRT_LOGGER = trt.Logger()

def get_engine(engineFilePath):
    with open(engineFilePath, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

@app.route('/yoloimg', methods=['POST'])
def img():
    cuda.init()
    device = cuda.Device(0)
    ctx = device.make_context()
    ctx.push()

    engine = get_engine(engineFilePath)
    # ... inference code ...

    ctx.pop()

The code above works, but get_engine() takes a long time, so I want to move that piece of code outside of img(), like this:

import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit

TRT_LOGGER = trt.Logger()

def get_engine(engineFilePath):
    with open(engineFilePath, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

++++++ engine = get_engine(engineFilePath)

@app.route('/yoloimg', methods=['POST'])
def img():
    cuda.init()
    device = cuda.Device(0)
    ctx = device.make_context()
    ctx.push()

    ------ engine = get_engine(engineFilePath)
    # ... inference code ...

    ctx.pop()
but this fails. I found that if I write:

import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
++++++ cuda.init()
++++++ device = cuda.Device(0)
++++++ ctx = device.make_context()

TRT_LOGGER = trt.Logger()

def get_engine(engineFilePath):
    with open(engineFilePath, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

++++++ engine = get_engine(engineFilePath)
++++++ ctx.push()

@app.route('/yoloimg', methods=['POST'])
def img():
    ------ cuda.init()
    ------ device = cuda.Device(0)
    ------ ctx = device.make_context()
    ctx.push()

    ------ engine = get_engine(engineFilePath)
    # ... inference code ...

    ctx.pop()
it only succeeds once, since cuda.init() only initializes once. Can you help me find the solution? If needed, I will provide the full code.
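For anyone hitting the same error: the pattern usually reported to fix this is to initialize CUDA and create a single context once at module load, then push/pop that same context around each request instead of calling cuda.init()/make_context() per request. A hedged sketch of that pattern (the engine path and route are assumptions carried over from the snippets above):

```python
import tensorrt as trt
import pycuda.driver as cuda
from flask import Flask

app = Flask(__name__)

cuda.init()                   # initialize the driver API exactly once, at import time
device = cuda.Device(0)
ctx = device.make_context()   # one context for the whole process
ctx.pop()                     # make_context leaves it current; pop so each request can push it

TRT_LOGGER = trt.Logger()
with open("yolo.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())  # deserialize once at startup

@app.route('/yoloimg', methods=['POST'])
def img():
    ctx.push()                # bind the shared context to this request's thread
    try:
        # ... inference code using `engine` ...
        return "ok"
    finally:
        ctx.pop()             # always balance the push, even if inference raises
```

The try/finally keeps the push/pop count balanced across requests; an unbalanced pop (or a second make_context per request) is a common cause of the illegal-memory-access errors described above.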

@NVES, close this issue, I have solved it.

@mengzhangjian @mengzhangjianwi7oq
Could you please share how you solved this issue? I am encountering the same issue as well.

I have the same problem. Could you share how you solved this issue?