System Setup
- Hardware Platform: Jetson Orin NX 16GB/8GB
- JetPack Version: 6.2 (L4T 36.4.3)
- Metropolis Components:
  - ai_nvr 2.0.1 (jps_v1.2.9)
  - nvidia-jetson-services/stable 2.0.0
- Monitoring Stack: Prometheus + Grafana
Relevant Prometheus Configuration
We have a valid and working `prometheus.yaml` file with a specific job defined for the emdat-analytics service:

```yaml
- job_name: 'emdat-analytics'
  scrape_interval: 10s
  static_configs:
    - targets: ['localhost:6000']
```
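For completeness, these are the checks we run on the Prometheus side to confirm the config parses and the target is registered (assuming the Prometheus API is reachable on its default port 9090; the config path is a placeholder):

```bash
# Validate the configuration file (path is a placeholder for our actual file).
promtool check config /path/to/prometheus.yaml

# Ask Prometheus which targets it sees and their current health
# (assumes the Prometheus API is on the default port 9090).
curl -s http://localhost:9090/api/v1/targets \
  | jq '.data.activeTargets[] | {job: .labels.job, health: .health, lastError: .lastError}'
```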
Issue Description
We are experiencing intermittent failures on one of our Prometheus targets, specifically the emdat-analytics endpoint on `localhost:6000`.
Prometheus shows the following error when scraping the target:

```
Get "http://localhost:6000/metrics": dial tcp 127.0.0.1:6000: connect: connection refused
```
- The endpoint also fails locally on the host (`curl http://localhost:6000/metrics`).
- After some time, the port appears closed and no process is listening on it (our triage commands are sketched after this list).
- We have confirmed that the target is correctly defined in `prometheus.yaml` (not autogenerated).
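For reference, this is roughly the triage we run on the device when the scrape starts failing; the container names are guesses based on our compose service names and may not match exactly:

```bash
# Is anything still listening on port 6000?
ss -ltnp | grep ':6000'

# Which container (if any) publishes port 6000?
docker ps --filter "publish=6000"

# Are the analytics containers still up? (names assumed from our compose services)
docker ps -a --filter "name=emdx"

# Recent logs from the suspected service (container name is an assumption).
docker logs --tail 200 emdx-analytics-01
```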
Runtime Observations
This behavior seems to be associated with the `emdx-analytics` module, likely involving Redis TimeSeries and streaming conditions:
- We are using RTSP streams with DeepStream.
- RTSP cameras intermittently disconnect (likely due to network or source instability).
- After disconnection:
  - DeepStream doesn’t reconnect.
  - The SDR service does not re-aggregate the sensor streams.
  - Visual elements such as ROIs and Tripwires are visible in the VST, but their counters remain stuck at 0.
- In some cases, restarting services manually restores functionality (a sketch of the selective restart is shown after this list):
  - Restarting `sdr`, `sdr-emdx`, and `emdx-analytics-01/02` is sometimes enough.
  - Other times, we must restart the entire Docker Compose stack.
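The selective restart mentioned above looks roughly like this; service names are taken from our docker-compose file, and we run it from the directory containing the compose file:

```bash
# Restart only the services that appear to be stuck (names from our compose file).
docker compose restart sdr sdr-emdx emdx-analytics-01 emdx-analytics-02

# When that is not enough, we recycle the whole stack.
docker compose down && docker compose up -d
```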
Technical Questions
- Which service or component is responsible for exposing the `/metrics` endpoint on port 6000?
  - We suspect it is part of `emdx-analytics`, but need confirmation.
  - Is there a specific binary, script, or container responsible?
- Why does the `/metrics` endpoint become unavailable over time?
  - Are there any known issues with `RedisTS` or the metric exporters under unstable input conditions?
  - What logs or traces can we inspect to better understand the root cause?
- How can we automate service recovery when this happens?
  - We are considering using Grafana Alerts + Webhooks to restart services automatically; a rough sketch of the interim watchdog we have in mind follows this list.
  - Is this a valid approach in the context of Metropolis?
  - Would restarting specific containers (instead of a full `docker-compose down && up`) be safe?
- Are there any NVIDIA-recommended tools or patterns for high-availability service recovery?
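To make the automation question concrete, below is the kind of interim watchdog we have in mind while we evaluate the Grafana Alerts + Webhooks route. The compose directory, service names, and timeout are assumptions from our environment, and we would drop this in favor of any NVIDIA-recommended pattern:

```bash
#!/usr/bin/env bash
# Watchdog sketch: poll the metrics endpoint and restart the analytics services
# if it becomes unreachable. Intended to run from cron or a systemd timer.
set -euo pipefail

COMPOSE_DIR=/opt/ai_nvr                      # placeholder: location of our compose file
METRICS_URL=http://localhost:6000/metrics

if ! curl -sf --max-time 5 "$METRICS_URL" > /dev/null; then
    echo "$(date -Is) metrics endpoint unreachable, restarting analytics services" >&2
    cd "$COMPOSE_DIR"
    # Try the lightweight restart first; fall back to recycling the full stack.
    docker compose restart sdr sdr-emdx emdx-analytics-01 emdx-analytics-02 \
        || { docker compose down && docker compose up -d; }
fi
```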
Available Artifacts
We can provide:
- The full `prometheus.yaml`
- Docker Compose setup and logs (`docker logs`, `journalctl`, etc.); collection commands are sketched after this list
- Screenshots from Prometheus and Grafana dashboards
- Any container logs relevant to `emdx-analytics`, `sdr`, and DeepStream
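If it helps, this is how we would collect those logs (service names are assumptions from our compose file; the time window is arbitrary):

```bash
# Logs from the analytics-related services over the last 24 hours.
docker compose logs --since 24h sdr sdr-emdx emdx-analytics-01 emdx-analytics-02 > emdx_stack.log

# Full system journal for the current boot.
journalctl -b --no-pager > journal.log
```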
Summary
We’re trying to:
- Identify why the `emdat-analytics` Prometheus target becomes unavailable.
- Understand the underlying failure mode of the `/metrics` endpoint on port 6000.
- Implement an automated recovery strategy using monitoring alerts.
We would appreciate any guidance from the NVIDIA team or community users who have experienced similar behavior.