Problem: TensorRT 5 Python API with a Flask service

Hi, we want to use Python Flask to start a web service that runs TensorRT. When the program looks like the snippet below, everything is OK:

@app.route(...)
def inference():
    cuda.init()
    device = cuda.Device(0)
    ctx = device.make_context()
    ctx.push()

    engine = build_engine()

As you know, build_engine() takes too much time, which wipes out TensorRT's speed advantage. I want to move build_engine() outside of inference(), like this:
def init():
    build_engine()

def inference():
    cuda.init()
    ctx = device …

But now it fails at cuda.memcpy_dtoh_async() and prints: cuStreamSynchronize failed: an illegal memory access was encountered.
Please help me find out how to solve this.

Hello,

I’m not sure where or how cuda.memcpy_dtoh_async() is called in your workflow or where the cuStreamSynchronize error is raised, but I’d recommend serializing your TRT engine/model to a file offline, ready to deploy.

Take a look at this presentation on designing Flask application to expose TRT models via REST endpoints.
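For reference, the serialize-offline / deserialize-at-startup flow looks roughly like this (a sketch with the TensorRT Python API; `build_my_engine()` and the file name are placeholders for whatever builder code and path you use today):

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger()

# Offline, one-time step: build the engine and serialize the plan to a file.
# build_my_engine() stands in for your existing builder/parser code.
engine = build_my_engine()
with open("yolo.engine", "wb") as f:
    f.write(engine.serialize())  # serialize() returns the engine plan as bytes

# At service startup: deserialize the plan instead of rebuilding it.
with open("yolo.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
```

Deserializing a plan file is typically orders of magnitude faster than rebuilding the engine, which is why it belongs in startup rather than in the request handler.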

regards,
NVIDIA Enterprise Support

@NVES, thanks for your reply. Here is the code:

import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit

TRT_LOGGER = trt.Logger()

def get_engine(engineFilePath):
    with open(engineFilePath, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

@app.route('/yoloimg', methods=['POST'])
def img():
    cuda.init()
    device = cuda.Device(0)
    ctx = device.make_context()
    ctx.push()

    engine = get_engine(engineFilePath)
    # ... inference code ...

    ctx.pop()

The code above works, but get_engine() takes a long time, so I want to move that piece of code outside of img(), like this:

import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit

TRT_LOGGER = trt.Logger()

def get_engine(engineFilePath):
    with open(engineFilePath, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

++++++ engine = get_engine(engineFilePath)

@app.route('/yoloimg', methods=['POST'])
def img():
    cuda.init()
    device = cuda.Device(0)
    ctx = device.make_context()
    ctx.push()

    ------ engine = get_engine(engineFilePath)
    # ... inference code ...

    ctx.pop()
but this fails. I found that if I write:

import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
++++++ cuda.init()
++++++ device = cuda.Device(0)
++++++ ctx = device.make_context()

TRT_LOGGER = trt.Logger()

def get_engine(engineFilePath):
    with open(engineFilePath, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

++++++ engine = get_engine(engineFilePath)
++++++ ctx.push()

@app.route('/yoloimg', methods=['POST'])
def img():
    ------ cuda.init()
    ------ device = cuda.Device(0)
    ------ ctx = device.make_context()
    ctx.push()

    ------ engine = get_engine(engineFilePath)
    # ... inference code ...

    ctx.pop()
it only succeeds once, since cuda.init() only initializes once. Can you help me find the solution? If needed, I will provide the full code.
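For anyone hitting the same error: the pattern usually reported to fix this is to initialize CUDA and create a single context once at module load, then push/pop that same context around each request instead of calling cuda.init()/make_context() per request. A hedged sketch of that pattern (the engine path and route are assumptions carried over from the snippets above):

```python
import tensorrt as trt
import pycuda.driver as cuda
from flask import Flask

app = Flask(__name__)

cuda.init()                   # initialize the driver API exactly once, at import time
device = cuda.Device(0)
ctx = device.make_context()   # one context for the whole process
ctx.pop()                     # make_context leaves it current; pop so each request can push it

TRT_LOGGER = trt.Logger()
with open("yolo.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())  # deserialize once at startup

@app.route('/yoloimg', methods=['POST'])
def img():
    ctx.push()                # bind the shared context to this request's thread
    try:
        # ... inference code using `engine` ...
        return "ok"
    finally:
        ctx.pop()             # always balance the push, even if inference raises
```

The try/finally keeps the push/pop count balanced across requests; an unbalanced pop (or a second make_context per request) is a common cause of the illegal-memory-access errors described above.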

@NVES, close this issue, I have solved it.

@mengzhangjian @mengzhangjianwi7oq
Could you please share how you solved this issue? I am encountering the same issue as well.

I have the same problem. Could you share how you solved this issue?