Multiple Processes performing inference using the same TF Graph

Hello,

I’m currently working on a Python-based application that performs basic object detection using TensorFlow. My application is modeled on https://github.com/datitran/object_detector_app: the main process takes frames from the camera (which runs in a separate process) and passes them into a queue, which is then read by n worker processes that perform the inference. The application works fine with only one worker performing inference. With more than one worker, each worker loads the graph into memory, processes exactly one frame, and then fails without raising an exception. The only way I can tell it has failed is that the application begins spawning another worker process.
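For reference, here is a minimal sketch of the queue-based layout described above, assuming Linux (as on the Jetson) and the standard-library multiprocessing module. `fake_inference` is a hypothetical stand-in for the TensorFlow call. In a real worker the graph and tf.Session would be created inside the child process, since CUDA state initialized in the parent is generally not usable after a fork.

```python
import multiprocessing as mp


def fake_inference(frame):
    # Hypothetical stand-in for running the detection graph on one frame.
    return frame * 2


def inference_worker(input_q, output_q):
    # A real worker would load the graph and create its own tf.Session here,
    # inside the child process, before entering the loop.
    while True:
        item = input_q.get()
        if item is None:  # sentinel: no more frames for this worker
            break
        idx, frame = item
        output_q.put((idx, fake_inference(frame)))


def run_pipeline(frames, n_workers=2):
    ctx = mp.get_context("fork")  # POSIX-only start method (assumes Linux)
    input_q = ctx.Queue(maxsize=5)  # bounded, so the producer cannot race ahead
    output_q = ctx.Queue()
    workers = [ctx.Process(target=inference_worker, args=(input_q, output_q))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    n_frames = 0
    for item in enumerate(frames):  # (frame index, frame) pairs
        input_q.put(item)
        n_frames += 1
    for _ in workers:
        input_q.put(None)  # one sentinel per worker
    results = [output_q.get() for _ in range(n_frames)]  # drain before joining
    for w in workers:
        w.join()
    return [out for _, out in sorted(results)]  # restore frame order
```

Note that if a worker dies silently, `output_q.get()` in this sketch blocks forever, which is one reason to monitor worker exit codes from the parent.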

I’ve run the exact same code on a PC running the CPU version of TF with no problems. If anyone has any ideas why this is happening, I would love to hear them. Otherwise, does anyone have an idea for increasing the frame rate without spawning additional worker processes?

Thanks,
Ben

Hi,

May I know the number of TensorFlow sessions you created?
Do you use the same TF session or create one for each process?

By default, TensorFlow allocates all the available GPU memory, which may leave the second session without enough memory.
Could this be the cause in your application?

Maybe you can try limiting the amount of memory first:

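import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4  # cap this process at 40% of GPU memory
session = tf.Session(config=config)  # pass graph=... as well if you use a non-default graph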

Thanks.

So by default I spawn two workers, which means two separate TensorFlow sessions. I used a separate TensorFlow session for each graph because I thought that sharing one session across two workers would not increase the total FPS (it would be bottlenecked at the TF graph). I don’t know for sure, but when the TF session instantiates, the stdout log lists the total and available GPU memory. From what I remember, after the first two graphs are created it lists ~8GB of available GPU memory. The problem is that after the first two workers fail (for whatever reason; no exception is thrown), they still appear to hold on to the memory, because after the next two workers and graphs are created it lists only ~150MB available. I usually exit the program at this point, because if the second pair of workers fails and a third pair is created, the Jetson typically freezes.
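If the earlier workers still seem to hold GPU memory, one possibility worth ruling out is that they never actually exited (for example, they hung rather than crashed); the OS normally reclaims a process's memory once it terminates. Below is a sketch, assuming the workers are multiprocessing.Process objects on Linux, of making sure a worker is really gone before spawning its replacement. `reap_worker` and `hung_worker` are hypothetical names.

```python
import multiprocessing as mp
import time


def hung_worker():
    # Simulates a worker that stopped making progress but never exited.
    time.sleep(3600)


def reap_worker(p, timeout=5.0):
    """Ensure process p has exited before a replacement is started."""
    p.join(timeout)  # give it a chance to exit on its own
    if p.is_alive():
        p.terminate()  # sends SIGTERM on POSIX
        p.join(timeout)
    if p.is_alive():
        p.kill()  # sends SIGKILL; Process.kill requires Python 3.7+
        p.join()
    return p.exitcode  # negative value -N means "killed by signal N"
```

Calling this on the old worker object right before starting a new one would at least show whether the previous process was still alive.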

I’m working remotely on a different project today, so I don’t have access to the Jetson, but I will provide a log of the stdout tomorrow. I will also try limiting the amount of memory.

Hi,

Here are two more suggestions for your reference:

1. Try limiting each worker to a memory fraction of no more than 0.4.

2. Try adding session.close() to force the app to return the memory,
although this may not help in the case of an unexpected failure.
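To illustrate the caveat in point 2, here is a sketch (assuming Linux and multiprocessing) showing that cleanup in a finally block, where session.close() would normally go, runs after a Python exception but never after the process is killed by a signal. The shared flag is a hypothetical stand-in for the real session.close() call.

```python
import multiprocessing as mp
import os
import signal


def cleanup_demo_worker(crash_mode, cleaned_up):
    try:
        if crash_mode == "exception":
            raise RuntimeError("simulated Python-level failure")
        if crash_mode == "signal":
            # Simulates a hard crash: SIGKILL cannot be caught or handled.
            os.kill(os.getpid(), signal.SIGKILL)
    finally:
        # Stand-in for session.close(); only reached while the interpreter is alive.
        cleaned_up.value = 1


def cleanup_ran(crash_mode):
    ctx = mp.get_context("fork")  # assumes Linux
    flag = ctx.Value("i", 0)  # shared integer visible to parent and child
    p = ctx.Process(target=cleanup_demo_worker, args=(crash_mode, flag))
    p.start()
    p.join()
    return bool(flag.value)
```

The same applies to a `with tf.Session(...) as sess:` block: its exit handler is Python code, so it cannot run if the process is killed outright.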

Thanks.

Hey, sorry for the late reply on this. Work got crazy with a two-week sprint on a separate project, and this was sidelined for the last two weeks.

I will try limiting the amount of memory that each worker receives. Does this number (0.4) change if I also use TensorRT? If I decide to use more than two workers, what total maximum memory usage do you recommend?

I have a session.close() in my worker code; however, it doesn’t seem to work. Moving the session object into a context manager wouldn’t help in this case, because Python is not raising an exception.
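Since no exception reaches Python, the parent process is probably the best place to find out how a worker died. multiprocessing.Process.exitcode is negative (-N) when the child was terminated by signal N, for example SIGSEGV from a native crash or SIGKILL from the kernel's out-of-memory killer. A minimal sketch, assuming Linux; `describe_exit`, `crashing_worker`, and `run_and_report` are hypothetical helper names.

```python
import multiprocessing as mp
import os
import signal


def describe_exit(exitcode):
    """Turn multiprocessing.Process.exitcode into a readable reason."""
    if exitcode is None:
        return "still running"
    if exitcode < 0:
        # -N means the child was killed by signal N, so no Python exception,
        # finally block, or context-manager __exit__ ever ran in it.
        return "killed by " + signal.Signals(-exitcode).name
    return "exited with code %d" % exitcode


def crashing_worker():
    # Simulates a hard crash of the kind that leaves no Python traceback.
    os.kill(os.getpid(), signal.SIGKILL)


def run_and_report(target):
    ctx = mp.get_context("fork")  # assumes Linux, as on the Jetson
    p = ctx.Process(target=target)
    p.start()
    p.join()
    return describe_exit(p.exitcode)
```

In the real application, logging `describe_exit(p.exitcode)` just before respawning a worker would show whether it crashed (for example SIGSEGV) or was killed for running out of memory (SIGKILL).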

Hi,

Thanks for your feedback.

The value limits the amount of memory allocated by TensorFlow.
For example, 0.4 means tf.Session() will request [Total Memory] * 0.4 of memory when it is created.
So you can set the value based on the number of workers you want to use.
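As a worked example of sizing the fraction by worker count, here is a sketch in plain Python. The `reserve` share left for the rest of the system (on Jetson the GPU shares memory with the CPU) is an assumption, not a recommended value. With reserve=0.2, two workers get 0.4 each, matching the value suggested above; if total memory is ~8GB, that is roughly 3.2GB per session.

```python
def per_worker_fraction(n_workers, reserve=0.2):
    """Split GPU memory evenly across n_workers, keeping `reserve` unallocated.

    The result is meant for config.gpu_options.per_process_gpu_memory_fraction.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    if not 0.0 <= reserve < 1.0:
        raise ValueError("reserve must be in [0, 1)")
    return (1.0 - reserve) / n_workers


def per_worker_budget_mb(total_mb, n_workers, reserve=0.2):
    # [Total Memory] * fraction, i.e. roughly what each tf.Session() will request.
    return total_mb * per_worker_fraction(n_workers, reserve)
```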

Thanks.