Hello everyone,
We are currently developing a real-time Computer Vision system at Ajinomoto, focused on guided operations in an industrial environment. The application consists of multiple custom neural networks running in parallel, each responsible for validating distinct steps within structured operational flows (called “conduct groups”).
Each group comprises several stages, such as object detection, product-transfer validation, and wrapping confirmation, all executed and validated live against video streams. We rely heavily on Docker containers, PyTorch/TensorFlow models, and GPU inference for continuous processing.
📌 Technical Stack:
- Programming Language: Python
- Frameworks: PyTorch, TensorFlow (hybrid usage)
- Containers: Multiple Docker containers (one per neural network pipeline)
- Real-time video processing with continuous GPU inference
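To make the container setup concrete: with Docker Compose, each pipeline container can request the GPU through the device-reservation syntax. A trimmed sketch of one service (service and image names are placeholders; this assumes the NVIDIA Container Toolkit is installed on the host):

```yaml
services:
  detection-pipeline:
    image: detection-pipeline:latest   # placeholder image name
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```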
⚙️ Current Hardware:
- CPU: Intel64 Family 6 Model 183 (a 13th-gen "Raptor Lake" Intel Core part)
- RAM: 32 GB
- GPU: NVIDIA GeForce RTX 4060
- OS: Windows 10
- Deployment: On-premises industrial desktop
📈 Performance Observations:
| Phase | GPU Usage | RAM Usage |
|---|---|---|
| Standby | 4–22% | 49% |
| Process Start | Peaks at 90% | 52% |
| Mid-Process Load | 63–66% | 53% |

CPU usage stays in the 12–19% range throughout.
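For reproducibility, these utilization figures can be read by polling `nvidia-smi`. A minimal helper sketch (the query fields are standard `nvidia-smi` options; the function returns `None` when the tool is not on `PATH`):

```python
import csv
import io
import shutil
import subprocess

def gpu_utilization():
    """Return a list of (gpu_util_percent, memory_used_mib) tuples,
    one per GPU, or None if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=utilization.gpu,memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [tuple(int(v) for v in row) for row in csv.reader(io.StringIO(out))]
```

Running this in a loop (e.g. once per second) during each phase is enough to reproduce the table above.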
The GPU is clearly the bottleneck, especially as we scale with more neural networks and additional logic (PLCs, APIs, parallel inspections).
❓ Our Questions:
- Optimization: How can we better optimize GPU usage? Should we integrate NVIDIA SDKs like TensorRT, DeepStream, or CUDA directly into our pipelines?
- Hardware Recommendation: What would be the ideal hardware setup as we scale? Should we move to RTX Professional, A-Series, or even Jetson devices for decentralized processing nodes?
- Architecture: Any suggestions for architectural improvements when dealing with parallel inference pipelines?
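On the architecture question, one direction we are evaluating: instead of each pipeline issuing single-frame inference calls, a small collector merges frames from several streams into one batched call, which amortizes per-launch overhead on the GPU. A toy stdlib sketch (purely illustrative; the function name, batch size, and timeout are placeholders, not from any NVIDIA SDK):

```python
import queue

def batch_collector(frame_queue, max_batch=8, wait_s=0.01):
    """Block for one frame, then gather up to max_batch - 1 more,
    waiting at most wait_s for each, so latency stays bounded."""
    batch = [frame_queue.get()]          # block until at least one frame
    while len(batch) < max_batch:
        try:
            batch.append(frame_queue.get(timeout=wait_s))
        except queue.Empty:
            break                        # partial batch: run now, don't wait
    return batch

# Usage: camera threads put frames; one inference thread drains in batches.
if __name__ == "__main__":
    q = queue.Queue()
    for i in range(5):
        q.put(f"frame-{i}")
    print(batch_collector(q))            # up to 8 frames per inference call
```

The timeout bounds added latency: a stream that happens to be alone still gets served within `wait_s` per missing slot rather than waiting for a full batch.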
Any recommendations or pointers to similar use cases would be greatly appreciated. Our goal is to build a scalable, robust solution aligned with NVIDIA’s ecosystem both in hardware and software.
Thank you!