==ERROR== Failed to prepare kernel for profiling when using MPS with Nsight Compute 2025.2

2023055089 · July 4, 2025, 7:01am

When I’m not using MPS, ncu works fine as follows:

However, when I turn on MPS (Multi-Process Service), I can’t profile on any device. I’ve used the --mps control option from version 2025.2 and still get this error. What’s the problem?

veraj · July 4, 2025, 9:28am

Hi, @2023055089

Can you please try another simple sample to see if this still reproduces?

2023055089 · July 5, 2025, 3:59pm

Hi. I tried a simple program, as follows:

    #include <stdio.h>

    // Adds A[idx] to an accumulator M times, then writes the sum back.
    __global__ void kernel_A(double* A, int N, int M)
    {
        double d = 0.0;
        int idx = threadIdx.x + blockIdx.x * blockDim.x;
        // printf("Kernel A\n");

        if (idx < N) {
    #pragma unroll(100)
            for (int j = 0; j < M; ++j) {
                d += A[idx];
            }

            A[idx] = d;
        }
    }

    // Same work as kernel_A; launched below with 48 KiB of dynamic shared memory.
    __global__ void kernel_B(double* A, int N, int M)
    {
        double d = 0.0;
        int idx = threadIdx.x + blockIdx.x * blockDim.x;

        if (idx < N) {
    #pragma unroll(100)
            for (int j = 0; j < M; ++j) {
                d += A[idx];
            }

            A[idx] = d;
        }
    }

    __global__ void kernel_C(double* A, const double* B, int N)
    {
        int idx = threadIdx.x + blockIdx.x * blockDim.x;
        // printf("Kernel C\n");

        // Strided memory access: warp 0 accesses (0, stride, 2*stride, ...), warp 1 accesses
        // (1, stride + 1, 2*stride + 1, ...).
        const int stride = 16;
        int strided_idx = threadIdx.x * stride + blockIdx.x % stride + (blockIdx.x / stride) * stride * blockDim.x;

        if (strided_idx < N) {
            A[idx] = B[strided_idx] + B[strided_idx];
        }
    }

    int main() {
        double* A;
        double* B;

        int N = 80 * 2048 * 100;
        size_t sz = N * sizeof(double);

        cudaMalloc((void**) &A, sz);
        cudaMalloc((void**) &B, sz);

        cudaMemset(A, 0, sz);
        cudaMemset(B, 0, sz);

        int threadsPerBlock = 64;
        int numBlocks = (N + threadsPerBlock - 1) / threadsPerBlock;

        int M = 10000;
        kernel_A<<<numBlocks, threadsPerBlock>>>(A, N, M);

        cudaFuncSetAttribute(kernel_B, cudaFuncAttributeMaxDynamicSharedMemorySize, 48 * 1024);
        kernel_B<<<numBlocks, threadsPerBlock, 48 * 1024>>>(A, N, M);

        kernel_C<<<numBlocks, threadsPerBlock>>>(A, B, N);

        cudaDeviceSynchronize();
    }

When I profile without MPS, the program runs successfully and the metric values are normal:

However, when I profile with MPS, there is an error:

In addition, some of the collected metrics have a value of 0!

Once I use MPS for profiling, I can’t get correct results. What’s the problem? How can I solve it?
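
The sample above never checks a CUDA return code, so a failed allocation or launch would go unnoticed by the program itself. As a minimal sketch (not part of the original post), the same pattern can be written with explicit checks. `CUDA_CHECK` and the kernel `touch` are hypothetical names introduced here for illustration; whether such checks would surface the ncu error reported in this thread is not established.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original post): abort with a message
// when a CUDA runtime call returns an error.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,            \
                    cudaGetErrorString(err_));                            \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

// Trivial illustrative kernel: adds 1.0 to every element of A.
__global__ void touch(double* A, int N)
{
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    if (idx < N) {
        A[idx] += 1.0;
    }
}

int main()
{
    const int N = 1 << 20;
    double* A = nullptr;
    CUDA_CHECK(cudaMalloc((void**)&A, N * sizeof(double)));
    CUDA_CHECK(cudaMemset(A, 0, N * sizeof(double)));

    const int threadsPerBlock = 64;
    const int numBlocks = (N + threadsPerBlock - 1) / threadsPerBlock;
    touch<<<numBlocks, threadsPerBlock>>>(A, N);
    CUDA_CHECK(cudaGetLastError());       // errors detected at launch time
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    CUDA_CHECK(cudaFree(A));
    printf("all CUDA calls succeeded\n");
    return 0;
}
```

Checking `cudaGetLastError()` right after the launch and then the result of `cudaDeviceSynchronize()` separates launch-time failures from failures that occur while the kernel executes.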

veraj · July 6, 2025, 6:31am

Thanks.
Can you also tell me the driver version and the GPU you used?

2023055089 · July 6, 2025, 7:16am

The driver version is 570.124.06 and the GPU used is Tesla V100.

2023055089 · July 6, 2025, 9:44am

Hi. I think I have found one possible cause. In version 2025.2, passing --devices on the command line reports that devices cannot be specified when using MPS. GPU 0 on my server was recently unavailable because it was running another program, so I was profiling on GPU 7, selected with export CUDA_VISIBLE_DEVICES=7, and that is when I got the error. So I would like to ask: in version 2025.2, does profiling with MPS default to the GPU with index 0?

veraj · July 7, 2025, 2:34am

Hi, @2023055089

Your analysis is correct. --devices is not supported when profiling with MPS.
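
For readers following the index discussion: CUDA_VISIBLE_DEVICES renumbers the GPUs a process sees, so with CUDA_VISIBLE_DEVICES=7 the application addresses that single GPU as device 0. The host-side sketch below illustrates the renumbering only. `visibleDeviceMap` is a hypothetical helper, it handles plain integer indices only (the real runtime also accepts GPU UUIDs), and it says nothing about how ncu or the MPS control daemon select devices.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of CUDA or Nsight Compute): element i of the
// result is the index, as written in a CUDA_VISIBLE_DEVICES-style string,
// of the GPU that the process would address as device i.
// Simplification: only plain non-negative integers are handled, and parsing
// stops at the first entry that is not one.
std::vector<int> visibleDeviceMap(const std::string& visibleDevices) {
    std::vector<int> localToListed;
    std::stringstream entries(visibleDevices);
    std::string entry;
    while (std::getline(entries, entry, ',')) {
        if (entry.empty() ||
            entry.find_first_not_of("0123456789") != std::string::npos) {
            break;  // UUIDs and malformed entries are out of scope here
        }
        localToListed.push_back(std::stoi(entry));
    }
    return localToListed;
}
```

Under these assumptions, `visibleDeviceMap("7")` yields a single entry, so the only device visible to the process has local index 0 and corresponds to the GPU listed as 7.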

2023055089 · July 7, 2025, 2:40am

Yes, I’ve solved the problem: as long as the profiling is done on the GPU with index 0, there is no error. But I found another problem. In version 2025.2, when I use --mps control, the L2 utilization and DRAM utilization metrics come back as nan. How can I solve this?

veraj · July 7, 2025, 6:27am

Please use the 575.57.08 driver (Driver Details | NVIDIA).

veraj · July 28, 2025, 8:46am

Since you have posted this as a separate topic, “Using --mps control in version2025.2 gets nan”, I will close this one. Thanks!

veraj · July 30, 2025, 8:47am

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.
