Error exporting .ncu-rep files

While executing the command:

ncu -i yolo_infer_bs_32_epoch_20.ncu-rep --csv --page raw 

I get the following error:

[libprotobuf ERROR /dvs/p4/build/sw/devtools/Agora/Rel/DTC_F/Imports/Source/ProtoBuf/protobuf-3_21_1/src/google/protobuf/message_lite.cc:133] Can't parse message of type "NV.Profiler.Messages.ProfileResult" because it is missing required fields: RuleResults[0].Body.Items[0].Message.Type
==ERROR== Failed to load report file 'yolo_infer_bs_64_epoch_20.ncu-rep'.

Can someone help me out here?

Can you attach the report file so we can check it?

I am not able to attach the file, but I am attaching the code to reproduce the results:

Main bash script:

#!/bin/bash

batch_sizes=(32 64)
# batch_sizes=(128 256 1024)
input_script="inference_scripts/yolo_infer.py"
output_dir="results/ncu-reps/29-Jan-automation/yolo"

echo "$input_script"

script_name=$(basename "$input_script" .py)
for batch_size in "${batch_sizes[@]}"; do
    echo "Profiling for batch size $batch_size"
    sudo ncu --target-processes all --set roofline -f -o "$output_dir/${script_name}_bs_${batch_size}_epoch_20" bash exp_script.sh "$input_script" "$batch_size"
done
echo "PROFILING SUCCESSFUL!"

exp_script.sh

#!/bin/bash
python3 "$1" --batch_size "$2"

yolo_infer.py

import numpy as np
from ultralytics import YOLO
import argparse
import torch
from warnings import filterwarnings
import logging
import os

filterwarnings('ignore')

parser = argparse.ArgumentParser()
parser.add_argument('--epochs', default=20, type=int)
parser.add_argument('--batch_size', default=32, type=int)
args = parser.parse_args()

epochs = args.epochs
batch_size = args.batch_size

# Set up logging
log_filename = f'yolo_epochs_{epochs}_batchsize_{batch_size}.log'
logging.basicConfig(filename=log_filename, level=logging.INFO, format='%(asctime)s - %(message)s')

# Load the YOLOv8 model (pre-trained on COCO dataset)
model = YOLO('yolov8n.pt')  # Use 'yolov8n.pt' for a small model; adjust as needed.
model.to('cuda')

# Function to generate random images
def generate_random_images(bs=16, width=640, height=640):
    images = np.array([np.random.randint(0, 256, (height, width, 3), dtype=np.uint8)
                       for _ in range(bs)])
    return torch.tensor(images).permute(0, 3, 1, 2).float().cuda()

# Perform inference using YOLOv8
def perform_inference(model, images):
    results = []
    for idx, img in enumerate(images):
        # YOLOv8 requires images in BGR format (OpenCV)
        result = model.predict(img, conf=0.25, verbose=False)  # Set confidence threshold as needed
        results.append(result)
    return results

# Perform inference
for i in range(epochs):
    random_images = generate_random_images(bs=batch_size)
    print(random_images.shape)
    inference_results = perform_inference(model, random_images)
    logging.info(f'Epoch {i+1}/{epochs} completed')
    # print(inference_results)

NVIDIA (R) Nsight Compute Command Line Profiler
Copyright (c) 2018-2023 NVIDIA Corporation
Version 2023.1.1.0 (build 32678585) (public-release)

Did you generate the report with the same version you are using to display it? If not, could you give that a try?

I am not able to attach the file

Can you try changing the file extension of the report to, e.g., “txt” and attaching it again? Thanks!

I tried opening it in the same version as well. I also tried changing the extension to .txt and uploading it, but the YOLO report files are very large and exceed the maximum upload size.

There are other problems with YOLO that I have raised in the forum but that have not been solved yet. These are:

  1. YOLO profiling takes too long, several hours for both inference and training, whereas even larger models like BERT are profiled in less time.
  2. YOLO report files are very large because of this longer profiling time.

I tried opening it in the same version as well.

Can you reproduce this issue with a more minimal example/app as well?

Looking at the error, I assume the report file got corrupted, possibly because profiling was interrupted or because of the sheer size of the report when it was written to disk.

YOLO profiling takes too long
YOLO files are very large in size

Nsight Compute is a kernel-level profiler that should be used to profile individual kernels and understand their optimization potential. To understand the overall performance of your application, start with Nsight Systems instead. Nsight Systems can help you identify which kernel(s) (if any) you should focus on to improve the performance of your application. Once identified, you can profile only these by using the various filtering options of NCU. See here for a large set of filter option examples.
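To make the filtering idea concrete, here is a minimal sketch in Python (the language of the reproduction script) that assembles an ncu command line restricted to a few launches of selected kernels. `--kernel-name` with a `regex:` prefix, `--launch-skip`, and `--launch-count` are existing ncu options. The helper name `ncu_filtered_command`, the kernel pattern `gemm`, and the output name are assumptions for illustration only, so substitute the kernel names that Nsight Systems reports for your run.

```python
import shlex


def ncu_filtered_command(app_args, output, kernel_regex=None,
                         launch_skip=0, launch_count=None):
    """Build an ncu command that profiles only a subset of kernel launches.

    Hypothetical helper: it only assembles the argument list, it does not
    run the profiler.
    """
    cmd = ["ncu", "--target-processes", "all", "-f", "-o", output]
    if kernel_regex is not None:
        # Profile only kernels whose name matches the regular expression.
        cmd += ["--kernel-name", f"regex:{kernel_regex}"]
    if launch_skip:
        # Skip the first N matching launches (e.g. warm-up iterations).
        cmd += ["--launch-skip", str(launch_skip)]
    if launch_count is not None:
        # Stop collecting after this many profiled launches.
        cmd += ["--launch-count", str(launch_count)]
    return cmd + list(app_args)


if __name__ == "__main__":
    cmd = ncu_filtered_command(
        ["python3", "inference_scripts/yolo_infer.py", "--batch_size", "32"],
        output="yolo_filtered",   # placeholder report name
        kernel_regex="gemm",      # placeholder kernel pattern
        launch_skip=10,
        launch_count=5,
    )
    print(shlex.join(cmd))
```

Capping the launch count is usually what brings both the profiling time and the report size down, since each profiled launch is replayed several times to collect its metrics.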

If you run NCU to profile hundreds or thousands of kernels within a complex application, it will not only take a very long time but also produce very large reports. And even if you wait long enough to collect them, they won’t be useful to you, because the UI is not designed to let you comprehend data from thousands of kernels at once (not to mention the reduced usability from slower UI response times).

Two things you can do apart from/along with filtering to decrease profiling time and report sizes:

  • include fewer metrics (instead of using --set roofline, specify metrics with --metrics individually; also see the Metrics Reference)
  • disable rules with --apply-rules no
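The two suggestions above can be sketched as a small Python helper that builds the corresponding ncu command line. `--metrics` (comma-separated) and `--apply-rules no` are existing ncu options. The helper name `ncu_light_command` is hypothetical, and the three metric names are only examples: whether they are available depends on your GPU, which you can check with `ncu --query-metrics`. A short metric list like this does not reproduce the roofline charts; it is meant for cases where a few numbers per kernel are enough.

```python
import shlex


def ncu_light_command(app_args, output, metrics):
    """Build an ncu command that collects only the named metrics, without rules.

    Hypothetical helper: it only assembles the argument list, it does not
    run the profiler.
    """
    if not metrics:
        raise ValueError("at least one metric name is required")
    return [
        "ncu", "--target-processes", "all",
        "--metrics", ",".join(metrics),  # replaces --set roofline
        "--apply-rules", "no",           # skip rule evaluation and its results
        "-f", "-o", output,
        *app_args,
    ]


if __name__ == "__main__":
    example_metrics = [
        "gpu__time_duration.sum",
        "sm__throughput.avg.pct_of_peak_sustained_elapsed",
        "dram__throughput.avg.pct_of_peak_sustained_elapsed",
    ]
    cmd = ncu_light_command(
        ["python3", "inference_scripts/yolo_infer.py", "--batch_size", "32"],
        output="yolo_light",  # placeholder report name
        metrics=example_metrics,
    )
    print(shlex.join(cmd))
```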

You should also consider using range replay or application range replay if you’re ultimately interested in overall hardware unit utilization rather than in individual kernels. To do this, you will need to add NVTX ranges to your application, see this guide.
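As a rough illustration of adding NVTX ranges to a PyTorch script like the one posted earlier, the sketch below wraps each iteration in a push/pop range using `torch.cuda.nvtx.range_push` and `torch.cuda.nvtx.range_pop`. The names `nvtx_range`, `run_epochs`, and the range name `inference` are assumptions for illustration. The wrapper falls back to a no-op when PyTorch or CUDA is unavailable, so the script still runs outside the profiler.

```python
from contextlib import contextmanager

try:
    import torch
    _HAVE_CUDA = torch.cuda.is_available()
except ImportError:  # PyTorch not installed: ranges become no-ops
    torch = None
    _HAVE_CUDA = False


@contextmanager
def nvtx_range(name):
    """Open an NVTX push/pop range around the enclosed code (no-op without CUDA)."""
    if _HAVE_CUDA:
        torch.cuda.nvtx.range_push(name)
    try:
        yield
    finally:
        if _HAVE_CUDA:
            torch.cuda.nvtx.range_pop()


def run_epochs(step, epochs, range_name="inference"):
    """Call step(i) once per epoch, each call inside its own NVTX range."""
    results = []
    for i in range(epochs):
        with nvtx_range(range_name):
            results.append(step(i))
    return results


if __name__ == "__main__":
    # Stand-in for perform_inference(model, random_images) from the script above.
    print(run_epochs(lambda i: f"epoch {i + 1} done", epochs=3))
```

With the ranges in place, an invocation along the lines of `ncu --nvtx --nvtx-include "inference/" --replay-mode range ...` should restrict collection to them. The exact include expression shown here is an assumption, so check it against the NVTX filtering section of the Nsight Compute CLI documentation for your version.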