TensorRT engine failed to infer in a Flask server

import os
os.environ["CUDA_VISIBLE_DEVICES"]="0"
import tensorrt as trt
import cv2
import time
import numpy as np
import _init_paths

# import server modules
from flask import Flask, request, Response
import jsonpickle

import jsonpickle.ext.numpy as jsonpickle_numpy
jsonpickle_numpy.register_handlers()

from trt_demo import create_face_detector


face_detector = create_face_detector()

# initialize Flask application
app = Flask(__name__)


# route HTTP POST requests to this method
@app.route("/api/mtcnn", methods=["POST"])
def test():
    r = request
    nparr = np.fromstring(r.data, np.uint8)
    # decode image
    img = cv2.imdecode(nparr, cv2.IMREAD_COLOR)
    # print("Shape of image is ", img.shape)
    # detect face

    all_boxes, landmarks = face_detector.detect_face(img)

    response = {"bboxes" : all_boxes, "landmarks" : landmarks}

    response_pickled = jsonpickle.encode(response)

    return Response(response=response_pickled, status=200, mimetype="application/json")


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)

Hi, I built an HTTP server as shown above. When I send an image to the server, the engine builds fine, but inference fails like below:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 2309, in __call__
    return self.wsgi_app(environ, start_response)
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 2295, in wsgi_app
    response = self.handle_exception(e)
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1741, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 2292, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1815, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1718, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1799, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/src/app/mtcnn_trt/utils/server.py", line 125, in test
    all_boxes, landmarks = face_detector.detect_face(img)
  File "/usr/src/app/mtcnn_trt/lib/face_detector/mtcnn_face_detector.py", line 441, in detect_face
    boxes, boxes_c, landmark = self.detect_pnet(img)
  File "/usr/src/app/mtcnn_trt/lib/face_detector/mtcnn_face_detector.py", line 210, in detect_pnet
    cls_cls_map, reg = self.pnet_predict(im_resized, scale_cnt)
  File "/usr/src/app/mtcnn_trt/lib/face_detector/mtcnn_face_detector.py", line 191, in pnet_predict
    cls_prob, bbox_pred, _ = engine.infer(np.transpose(img, (2, 0, 1)))
  File "/usr/lib/python2.7/dist-packages/tensorrt/lite/engine.py", line 658, in infer
    stream = cuda.Stream()
LogicError: explicit_context_dependent failed: invalid device context - no currently active context?

But if I run detection on an image outside this Flask app, it works fine. Any advice would be greatly appreciated!
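For reference, the client side that sends the image is roughly this (a minimal sketch; the image file name is just a placeholder):

import cv2
import requests

# Encode a test image as JPEG and POST the raw bytes; the server above
# decodes them with np.fromstring + cv2.imdecode.
img = cv2.imread("test.jpg")
_, buf = cv2.imencode(".jpg", img)
resp = requests.post(
    "http://localhost:8000/api/mtcnn",
    data=buf.tobytes(),
    headers={"content-type": "application/octet-stream"},
)
print(resp.text)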


I have the same problem. Have you resolved this? Thanks.

The problem is that under a Python HTTP server context, PyCUDA would fail, and it's inevitable. I ended up switching to a TCP server. If you insist on using an HTTP server, try a Go + C++ combo, which does not impose any context on your handler function. (I am the guy who posted this; I lost that account because it was locked.)
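For what it's worth, the TCP version looks roughly like this (a minimal sketch with toy framing; it reuses the same detector factory as the Flask script above):

import socket
import cv2
import numpy as np

from trt_demo import create_face_detector

# One process, one thread: the CUDA context created during detector init
# stays active for every request, unlike with Flask's request workers.
face_detector = create_face_detector()

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("0.0.0.0", 8000))
srv.listen(1)

while True:
    conn, _ = srv.accept()
    # Toy framing: the client sends the JPEG bytes, then shuts down its
    # write side, so recv() returning b"" marks the end of the payload.
    data = b""
    while True:
        chunk = conn.recv(65536)
        if not chunk:
            break
        data += chunk
    img = cv2.imdecode(np.frombuffer(data, np.uint8), cv2.IMREAD_COLOR)
    all_boxes, landmarks = face_detector.detect_face(img)
    conn.sendall(repr((all_boxes, landmarks)).encode())  # toy response
    conn.close()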

Can you share your C++ code? Thanks.

Check out this GitHub repo; it's great and has auto-scaling capability. We added a lot of our own development on top of it:
https://github.com/NVIDIA/gpu-rest-engine

Got it! Thank you very much.

Could you please try this one and let me know the result?

app.run(host="localhost", port=5000, debug=False, use_reloader=False)

It will prevent Flask from spawning a new process to handle the request (PyCUDA would fail with multiprocessing in Python).

Sorry for my delayed reply; I was learning TF Serving the other day. The result is the same as before.
[2018-08-15 06:15:13,012] ERROR in app: Exception on /classify [POST]
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 2292, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1815, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1718, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1799, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/local/lib/python2.7/dist-packages/tensorrt/examples/resnet_as_a_service/resnet_as_a_service.py", line 117, in json_classify
    results = engine.infer(image_to_np_CHW(img))[0]
  File "/usr/local/lib/python2.7/dist-packages/tensorrt/lite/engine.py", line 658, in infer
    stream = cuda.Stream()
LogicError: explicit_context_dependent failed: invalid device context - no currently active context?
127.0.0.1 - - [15/Aug/2018 06:15:13] "POST /classify HTTP/1.1" 500 -

I think I fixed it:
1. You must put your model init and inference in the same thread.
2. When doing inference, attach and detach the context around the buffer allocation, like this:
ctx = cuda.Context.attach()
inputs, outputs, bindings, stream = common.allocate_buffers(self.engine)
ctx.detach()
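Expanded a bit, the idea is this (just a sketch: run_trt_inference is a placeholder for the actual engine call, and an explicit make_context/push/pop is an alternative to Context.attach/detach):

import pycuda.driver as cuda

cuda.init()
ctx = cuda.Device(0).make_context()    # context is current on this thread
ctx.pop()                              # release it so any thread can push it

def safe_infer(img):
    ctx.push()                         # make the context current on whichever
    try:                               # thread the request was dispatched to
        return run_trt_inference(img)  # placeholder for engine.infer(...)
    finally:
        ctx.pop()                      # always release the context again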

I tried it, but it doesn't solve the problem. Thanks a lot for your hint, though.

Hi,
We recommend raising this query in the Triton forum for better assistance.

Thanks!

Use threaded=False:
app.run("0.0.0.0", port=5000, threaded=False)