How to output UVM page fault memory addresses to the terminal using nsys 2024.1?

Dear community,
I replaced cudaMalloc with cudaMallocManaged in c10/cuda/CUDACachingAllocator.cpp in the PyTorch open-source code (v1.13.0) and compiled it successfully. It works as expected.

When training a GNN, I was able to oversubscribe the GPU memory.

I then profiled a Python program with nsys:
nsys profile --stats=true --cuda-um-gpu-page-faults=true --cuda-um-cpu-page-faults=true --trace=cuda --cuda-memory-usage=true --show-output=true python

However, the output shows no page faults at all.

Then I wrote a test Python program using my modified PyTorch.

import torch
import torch.nn as nn
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GCNConv, SAGEConv, GATConv

import torch_geometric.transforms as T
from torch_geometric.logging import init_wandb, log
import torch_sparse,torch_scatter

# Create a large neural network
model = nn.Sequential(
    nn.Linear(1000, 10000),
    nn.Linear(10000, 10000),
    nn.Linear(10000, 10000),
    nn.Linear(10000, 100),  # input size must match the previous layer's 10000 outputs
)

# Move the model to the GPU
model = model.cuda()

# Run the network repeatedly to occupy GPU memory
# while True:
for i in range(1):
    # Create a random input
    input_data = torch.randn(100, 1000).cuda()
    # Run the forward pass on the GPU
    output = model(input_data)
    print("Output:", output)
    # Show the current GPU memory usage
    print("GPU memory allocated:", torch.cuda.memory_allocated() / (1024 ** 3), "GB")

With this program, nsys does show GPU page faults with UVM, but I still have a few questions:

  1. Why does the output still contain no CUDA Unified Memory CPU page fault data?
  2. How can I output the page fault memory addresses to the terminal? I know they can be found in the GUI, but we need to collect this data with scripts for analysis. Output in CSV would be even better.
  3. How can I get the CUDA Kernel Statistics both in the terminal and in the nsys-rep file? Which parameters should I pass to nsys when profiling?

The “nsys profile --stats” option only exports a default set of items. So I am not surprised that it isn’t part of the default output.

@jasoncohen has this been added to the sqlite export or is it only visible in the GUI in tooltips?
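
One way to check this from a script, as a rough sketch only: export the report to SQLite (for example with nsys export --type sqlite) and dump whatever page-fault tables are present as CSV. This assumes the export contains tables with PAGE_FAULT in their names; the exact table and column names for 2024.1 are not confirmed in this thread, so the sketch discovers them at runtime instead of hard-coding them. The function name dump_page_fault_tables and the file name report.sqlite are placeholders.

```python
import csv
import sqlite3
import sys


def dump_page_fault_tables(db_path, out=sys.stdout):
    """Write every table whose name contains PAGE_FAULT to `out` as CSV.

    Table names are discovered from sqlite_master rather than assumed.
    Returns the list of table names that were written.
    """
    conn = sqlite3.connect(db_path)
    try:
        all_tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        names = [name for name in all_tables if "PAGE_FAULT" in name.upper()]

        writer = csv.writer(out)
        for name in names:
            cursor = conn.execute(f'SELECT * FROM "{name}"')
            columns = [col[0] for col in cursor.description]
            # One header row per table, since the tables may have different columns.
            writer.writerow(["table"] + columns)
            for row in cursor:
                writer.writerow([name, *row])
        return names
    finally:
        conn.close()


# Example (placeholder file name):
# dump_page_fault_tables("report.sqlite")
```

If the returned list is empty, the export contains no page-fault tables, which would suggest the data is only available in the GUI for that version.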

Hi @zwu065 - could you please share the report file and provide us with the output of nvidia-smi command on the target system?

How can I share the report file? It seems .nsys-rep is not an allowed upload format on this forum.

I am not sure why it is not letting you upload the report file in your reply. Could you upload it to Google Drive or OneDrive and share the link here, or DM me?

When I try to upload, it gives me an error like this. Can you access it?

I have requested access to the google drive link you shared. Please accept it.

Done. Please check it.