I’m trying to track the behavior of GPU memory oversubscription.
For example, when eviction is triggered, how page faults are handled, and so on.
I know that the nvprof profiler and the Visual Profiler exist, but
are there any tools or APIs that track the behavior of oversubscription without visualization?
For example,
something like the NVML API, which can track the current GPU memory usage programmatically.
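As a rough illustration of the kind of non-visual, NVML-level polling described here, the sketch below shells out to `nvidia-smi` (which is built on top of NVML) and parses per-GPU memory usage. It assumes `nvidia-smi` is on the `PATH`; the function names are hypothetical. Note that it only reports how much device memory is in use, not individual page faults or evictions.

```python
import subprocess

# Fields requested from nvidia-smi; with "nounits", memory values are plain MiB integers.
QUERY = "index,memory.used,memory.total"


def parse_gpu_memory_csv(text):
    """Parse CSV rows like '0, 1234, 16384' into per-GPU dicts (values in MiB)."""
    rows = []
    for line in text.strip().splitlines():
        index, used, total = (field.strip() for field in line.split(","))
        rows.append({"index": int(index), "used_mib": int(used), "total_mib": int(total)})
    return rows


def query_gpu_memory():
    """Hypothetical helper: run nvidia-smi once and return parsed memory usage."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if nvidia-smi exits non-zero
    )
    return parse_gpu_memory_csv(result.stdout)
```

Calling `query_gpu_memory()` in a loop with `time.sleep` gives a crude time series of device memory usage; on a machine without `nvidia-smi` it raises `FileNotFoundError`.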
I would be very grateful if you could point me to anything related to this.
Hi Woosungkang! There are plenty of APIs that can help you track the behavior of oversubscription without visualization. Here are a few that might have what you’re looking for:
Prometheus is an open-source systems monitoring and alerting toolkit. It collects metrics from configured targets at given intervals, evaluates rule expressions, and can trigger alerts if specified conditions are observed. Prometheus provides a query language called PromQL, which you can use to extract specific data without visualization.
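A minimal sketch of pulling a value out of Prometheus without any dashboard, using only the Python standard library and Prometheus’ HTTP instant-query endpoint (`/api/v1/query`). The server address and the example metric `DCGM_FI_DEV_FB_USED` (exported by NVIDIA’s dcgm-exporter, if you happen to run it) are assumptions, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def parse_instant_vector(payload):
    """Turn an instant-vector query response into (labels, float_value) pairs."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each sample's "value" is [unix_timestamp, "value-as-string"].
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


def query_prometheus(base_url, promql, timeout=5.0):
    """Run one PromQL instant query against a Prometheus server's HTTP API."""
    url = f"{base_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_instant_vector(json.load(resp))
```

For example, `query_prometheus("http://localhost:9090", "DCGM_FI_DEV_FB_USED")` would return one `(labels, value)` pair per GPU series that Prometheus has scraped, which a script can log or compare against a threshold.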
psutil is a cross-platform Python library for retrieving information on system utilization (CPU, memory, disks, network, sensors), running processes, and system uptime. It’s mainly meant to be a programmatic interface that lets developers monitor system resources from within their applications.
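On the host side, a minimal psutil sketch might sample RAM and swap at a fixed interval. This only sees host memory, so at best it hints at pressure from pages evicted off the GPU into system memory; it cannot attribute that traffic to unified memory. It assumes psutil is installed (`pip install psutil`), and the function names are hypothetical.

```python
import time

import psutil


def host_memory_snapshot():
    """Return one reading of host RAM and swap usage (bytes and percent)."""
    vm = psutil.virtual_memory()
    sw = psutil.swap_memory()
    return {
        "time": time.time(),
        "ram_total": vm.total,
        "ram_available": vm.available,
        "ram_percent": vm.percent,
        "swap_used": sw.used,
        "swap_percent": sw.percent,
    }


def sample_host_memory(count, interval_s):
    """Collect `count` snapshots, sleeping `interval_s` seconds between them."""
    snapshots = []
    for i in range(count):
        snapshots.append(host_memory_snapshot())
        if i + 1 < count:  # no need to sleep after the last sample
            time.sleep(interval_s)
    return snapshots
```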
cAdvisor (short for Container Advisor) is an open-source tool from Google that helps container users understand the resource usage and performance characteristics of their running containers. It collects, processes, and exports information about running containers, allowing you to track resource usage without visualization.
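cAdvisor also exposes its data in Prometheus text format on its `/metrics` endpoint, so a script can read it directly. The sketch below assumes cAdvisor is listening on `localhost:8080` and uses its `container_memory_usage_bytes` metric. The parser is a deliberately simplified, hypothetical helper; for anything serious, the `prometheus_client` package ships a full text-format parser in `prometheus_client.parser`.

```python
import urllib.request


def parse_metric(text, name):
    """Return (labels, value) pairs for one metric from Prometheus text exposition.

    Simplified: assumes one sample per line and ignores the optional timestamp.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        if "{" in line:
            metric, _, rest = line.partition("{")
            labels, _, tail = rest.rpartition("}")
        else:
            metric, _, tail = line.partition(" ")
            labels = ""
        if metric == name:
            samples.append((labels, float(tail.split()[0])))
    return samples


def scrape_metric(url, name, timeout=5.0):
    """Fetch a /metrics page (e.g. cAdvisor's) and extract one metric."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metric(resp.read().decode("utf-8"), name)
```

For example, `scrape_metric("http://localhost:8080/metrics", "container_memory_usage_bytes")` would list current memory usage per container, keyed by the raw label string.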
OpenTelemetry is a set of APIs, libraries, agents, and instrumentation that generate and collect telemetry data (metrics, logs, and traces) from your services. It provides a unified way to collect, process, and export telemetry data to different backends for analysis and visualization.