Nsight Systems fails to record CUDA kernels on WSL2

I am profiling a CUDA program on WSL2 using nsys, but the resulting report contains no trace of CUDA kernel calls. Here is a minimal example:

First, I wrote a simple CUDA program, test.cu:

#include <iostream>

__global__ void test(int* a){
	int idx = blockIdx.x * blockDim.x + threadIdx.x;
	++a[idx];
}

int main(){
	int* arr;
	cudaMalloc(&arr, sizeof(arr[0]) * 10);
	cudaMemset(arr, 0, sizeof(arr[0]) * 10);
	test<<<10,1>>>(arr);
	int arr_res[10];
	cudaMemcpy(arr_res, arr, sizeof(arr[0]) * 10, cudaMemcpyDeviceToHost);
	for(int i=0; i<10; ++i){
		if(arr_res[i] != 1) std::cout << i << ':' << arr_res[i] << std::endl;
	}
	return 0;
}

Then I compiled and profiled it in WSL with nvcc test.cu -o test && nsys profile ./test, and opened the generated report1.nsys-rep in Nsight Systems on Windows.

The timeline does not show the GPU (no kernel calls), and Nsight Systems warns that there is an error.

However, when I reproduced the whole procedure on another Linux server, the report did show kernel executions and no error at all.

So how can I get kernel execution traces in WSL?

Version information:

  • The WSL server has an NVIDIA GeForce RTX 3090 Ti GPU
  • The WSL server runs WSL2 with the Ubuntu 20.02 distribution
  • The WSL server has the CUDA Toolkit installed; nvcc is version 12.6.85
  • nsys on the WSL server is version 2024.5.1.113-245134619542v0
  • Nsight Systems on Windows is version 2024.7.1

P.S. Since I am quite new to CUDA programming, I am not sure how to debug this. I am happy to provide any files, command outputs, or results that would help.

I found that the solution in the following post works:
https://forums.developer.nvidia.com/t/nsys-doesnt-show-cuda-kernel-and-memory-data/315536
The problem is now solved. The error message still appears, but it doesn’t affect me in my context.
