Problem: TensorRT 5 Python API with a Flask service

Hi, we want to use Python Flask to start a web service that runs TensorRT. When the program looks like the code below, everything is OK:

@app()
def inference():
    cuda.init()
    device = cuda.Device(0)
    ctx = device.make_context()
    ctx.push()

    engine = build_engine()

As you know, build_engine() takes too much time, which cancels out TensorRT's speed advantage. I want to move build_engine() outside of inference(), like this:

def init():
    build_engine()

def inference():
    cuda.init()
    ctx = device …

With that arrangement it no longer works: cuda.memcpy_dtoh_async() fails and prints "cuStreamSynchronize failed: an illegal memory access was encountered".
Please help me figure out how to solve this.
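One way to read the error (my interpretation; the thread doesn't state this explicitly): device memory allocated while one CUDA context is current cannot be touched while a different context is current, so an engine and its buffers created at startup fail inside a handler that makes a fresh context per request. A toy sketch of that ownership rule, using hypothetical classes rather than pycuda:

```python
# Toy model of CUDA context ownership (hypothetical classes, not pycuda):
# an allocation made while one context is current cannot be used while a
# different context is current.
class ToyContext:
    stack = []  # currently active contexts, innermost last

    def push(self):
        ToyContext.stack.append(self)

    def pop(self):
        ToyContext.stack.pop()

class ToyAllocation:
    def __init__(self):
        # remember which context was current at allocation time
        self.owner = ToyContext.stack[-1]

    def read(self):
        if ToyContext.stack[-1] is not self.owner:
            raise RuntimeError("illegal memory access: wrong context")
        return "ok"

startup_ctx = ToyContext()
startup_ctx.push()
buf = ToyAllocation()          # allocated under startup_ctx
startup_ctx.pop()

request_ctx = ToyContext()     # a fresh context made inside the handler
request_ctx.push()
try:
    buf.read()                 # fails: buf belongs to startup_ctx
except RuntimeError as e:
    error_seen = str(e)
request_ctx.pop()
```

This mirrors the symptom in the thread: an engine deserialized at import time under one context, then used inside a handler that created its own context.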

Hello,

I'm not sure where or how cuda.memcpy_dtoh_async() is called in your workflow or what triggers the cuStreamSynchronize failure, but I'd recommend serializing your TRT engine/model to a file offline, ready to deploy.

Take a look at this presentation on designing a Flask application to expose TRT models via REST endpoints.
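As a minimal sketch of the serialize-once idea, with stand-in names (`build_and_serialize`, `load_engine`, and `model.plan` are hypothetical): build the expensive artifact once offline and write it to disk, then only read the bytes back at service startup. With TensorRT the two halves would correspond to `engine.serialize()` and `runtime.deserialize_cuda_engine()`.

```python
import os
import tempfile

def build_and_serialize(path):
    # Stand-in for an expensive TensorRT build; with TRT this would be
    # building the engine and writing engine.serialize() to the file.
    fake_engine_bytes = b"\x00serialized-engine-plan\x01"
    with open(path, "wb") as f:
        f.write(fake_engine_bytes)
    return fake_engine_bytes

def load_engine(path):
    # Fast startup path; with TRT this would be
    # trt.Runtime(logger).deserialize_cuda_engine(f.read()).
    with open(path, "rb") as f:
        return f.read()

plan_path = os.path.join(tempfile.mkdtemp(), "model.plan")
original = build_and_serialize(plan_path)   # done once, offline
restored = load_engine(plan_path)           # done at service startup
assert restored == original
```

The point of the split is that the slow build step never runs inside the web service at all.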

regards,
NVIDIA Enterprise Support

@NVES, thanks for your reply. Here is the code.

import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit

TRT_LOGGER = trt.Logger()

def get_engine(engineFilePath):
    with open(engineFilePath, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

@app.route('/yoloimg', methods=[''])
def img():
    cuda.init()
    device = cuda.Device(0)
    ctx = device.make_context()
    ctx.push()
    # ...
    engine = get_engine()
    # inference code
    # ...
    ctx.pop()

The code above works, but get_engine() takes a long time, so I want to move that piece of code outside of img(), like this:

import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit

TRT_LOGGER = trt.Logger()

def get_engine(engineFilePath):
    with open(engineFilePath, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

engine = get_engine()   # + moved to module level

@app.route('/yoloimg', methods=[''])
def img():
    cuda.init()
    device = cuda.Device(0)
    ctx = device.make_context()
    ctx.push()
    # - engine = get_engine()   (removed from the handler)
    # inference code
    ctx.pop()
This fails with the same illegal-memory-access error. I found that if I write it like this:

import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
cuda.init()                   # + moved to module level
device = cuda.Device(0)       # +
ctx = device.make_context()   # +

TRT_LOGGER = trt.Logger()

def get_engine(engineFilePath):
    with open(engineFilePath, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

engine = get_engine()   # +
ctx.push()              # +

@app.route('/yoloimg', methods=[''])
def img():
    # - cuda.init()                 (removed from the handler)
    # - device = cuda.Device(0)
    # - ctx = device.make_context()
    ctx.push()
    # inference code
    ctx.pop()
then it succeeds only once, since cuda.init() is only called once at startup. Can you help me find a solution? If needed, I can provide the full code.
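For later readers, the arrangement that usually resolves this (a sketch of one common pattern, not the poster's confirmed fix): create a single context at startup, load the engine under it, pop it after setup, and then push/pop that same context around each request, never calling cuda.init()/make_context() inside the handler. No GPU is assumed here, so `StubContext` and `get_engine` below are stand-ins; comments note the pycuda calls they mirror.

```python
build_count = 0

class StubContext:
    # Stand-in for pycuda's context; mirrors only push()/pop() semantics.
    stack = []

    def push(self):
        StubContext.stack.append(self)

    def pop(self):
        assert StubContext.stack.pop() is self

def get_engine():
    # Stand-in for runtime.deserialize_cuda_engine(...): slow, so run it once.
    global build_count
    build_count += 1
    return object()

# --- startup, runs once (mirrors cuda.init(); device.make_context()) ---
ctx = StubContext()
ctx.push()
engine = get_engine()   # engine loaded under the startup context
ctx.pop()               # leave the context inactive between requests

# --- per request (mirrors the body of the Flask handler) ---
def img():
    ctx.push()                    # re-activate the SAME context
    result = engine is not None   # inference code would go here
    ctx.pop()
    return result

assert all(img() for _ in range(3))
assert build_count == 1            # engine built once, not per request
assert StubContext.stack == []     # every push is balanced by a pop
```

The key design point is that the engine and the context it was created under stay paired for the lifetime of the process; the handler only toggles that one context's active state.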

@NVES, you can close this issue; I have solved it.

@mengzhangjian
Could you please share how you solved this issue? I encountered the same issue as well.

I have the same problem. Could you share how you solved this issue?

Can you let us know how to solve this? I've experienced the same problem.

Hi,
Please check the link below, as it might answer your concerns.

Thanks!