When I use Nsight Systems to profile GPU applications, the report contains no kernel performance information, although CUDA memory-related data does show up.
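To check this symptom without the GUI, the report can be exported to SQLite and the rows in the CUDA activity tables counted. The sketch below makes three assumptions:

- the export was produced with `nsys export --type sqlite`;
- kernel records land in a table named `CUPTI_ACTIVITY_KIND_KERNEL`;
- memory-copy records land in a table named `CUPTI_ACTIVITY_KIND_MEMCPY`.

Table names can change between Nsight Systems releases. The helper `count_cuda_activity` is hypothetical.

```python
import sqlite3

# Assumed table names in the Nsight Systems SQLite export.
ACTIVITY_TABLES = {
    "kernels": "CUPTI_ACTIVITY_KIND_KERNEL",
    "memcpy": "CUPTI_ACTIVITY_KIND_MEMCPY",
}


def count_cuda_activity(conn):
    """Return {label: row_count}; a table absent from the export counts as 0."""
    existing = {
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    counts = {}
    for label, table in ACTIVITY_TABLES.items():
        if table in existing:
            # Table names come from the fixed dict above, so formatting them in is safe.
            counts[label] = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        else:
            counts[label] = 0
    return counts
```

To use it, open the exported file with `sqlite3.connect(...)` and pass the connection in. A nonzero `memcpy` count with `kernels` at 0 would match the symptom described above.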
Background:
I'm using an RTX 3060 Ti under WSL on Windows 11 (Ubuntu 18.04).
The CUDA driver version is 511.79 (installer: 511.79-desktop-win10-win11-64bit-international-dch-whql.exe).
The Nsight Systems version is 2022.1.1.
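The GPU, driver and Nsight Systems versions above can also be gathered in one step. The sketch below assumes `nvidia-smi` and `nsys` are on the WSL `PATH`. The helper names are hypothetical. It avoids `capture_output` so that it also runs on the Python 3.6 that Ubuntu 18.04 ships.

```python
import shutil
import subprocess


def run_if_available(cmd):
    """Return the stripped stdout of cmd, or None if the tool is missing or fails."""
    if shutil.which(cmd[0]) is None:
        return None
    # Explicit pipes + universal_newlines keep this compatible with Python 3.6.
    proc = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True
    )
    if proc.returncode != 0:
        return None
    return proc.stdout.strip()


def parse_gpu_query(csv_line):
    """Parse one 'name, driver_version' line from nvidia-smi's CSV output."""
    name, driver = (field.strip() for field in csv_line.rsplit(",", 1))
    return {"gpu": name, "driver": driver}


def collect_environment():
    """Collect GPU name, driver version and nsys version (None where unavailable)."""
    info = {"gpu": None, "driver": None}
    gpu_csv = run_if_available(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"]
    )
    if gpu_csv:
        first = gpu_csv.splitlines()[0]  # one line per GPU; keep the first
        if "," in first:
            info.update(parse_gpu_query(first))
    info["nsys"] = run_if_available(["nsys", "--version"])
    return info


if __name__ == "__main__":
    print(collect_environment())
```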
I reproduced the problem with a TensorFlow Python script, matmul_tf.py, which runs a TensorFlow MatMul op. I profile it with this command:
nsys nvprof python matmul_tf.py
Other CUDA applications cannot be profiled either.
Could someone help me solve this problem?
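The reproduction step can be scripted so it is rerun identically each time. In the sketch below:

- `build_repro_command` mirrors the `nsys nvprof` command from the post.
- `build_profile_command` is an assumed variant, not from the original post. It asks `nsys profile` for CUDA, cuDNN and cuBLAS tracing explicitly with `--trace`, prints summary statistics with `--stats=true`, and writes the report under a hypothetical name.

All helper names are hypothetical.

```python
import shlex
import subprocess


def build_repro_command(script="matmul_tf.py", python="python"):
    """The command from the post: nsys nvprof python matmul_tf.py."""
    return ["nsys", "nvprof", python, script]


def build_profile_command(script="matmul_tf.py", python="python", output="matmul_report"):
    """Assumed variant: request CUDA-related tracing explicitly via nsys profile."""
    return [
        "nsys", "profile",
        "--trace=cuda,cudnn,cublas",  # trace CUDA activity plus cuDNN and cuBLAS calls
        "--stats=true",               # print summary tables after collection
        "-o", output,                 # report file name (hypothetical)
        python, script,
    ]


def as_shell(cmd):
    """Render a command list as a copy-pasteable shell string."""
    return " ".join(shlex.quote(part) for part in cmd)


def run(cmd):
    """Run the command and return its exit code (nsys must be on PATH)."""
    print("running:", as_shell(cmd))
    return subprocess.run(cmd).returncode
```

For example, `run(build_repro_command())` reruns the original step. `as_shell(build_profile_command())` prints the variant for pasting into a terminal.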
matmul_tf.py script:
```python
import tensorflow as tf
import numpy as np

# Use TF1-style graph mode so the MatMul runs inside a Session.
tf.compat.v1.reset_default_graph()
tf.compat.v1.disable_eager_execution()
print("support gpu:", tf.test.is_gpu_available())

# Matrix sizes: A is m x k, B is k x n.
m = 512
k = 512
n = 512
a_shape = [m, k]
b_shape = [k, n]

# Fixed seed so every profiling run uses the same inputs.
np.random.seed(0)
kernel_np = np.random.uniform(low=0.0, high=1.0, size=b_shape).astype("float32")
input_np = np.random.uniform(low=0.0, high=1.0, size=a_shape).astype("float32")

# A is fed at run time through a placeholder; B is embedded in the graph as a constant.
pld1 = tf.compat.v1.placeholder(dtype="float32", shape=a_shape, name="input1")
kernel = tf.constant(kernel_np, dtype="float32")
feed_dict = {pld1: input_np}
result_tf = tf.raw_ops.MatMul(a=pld1, b=kernel, transpose_a=False, transpose_b=False)

# Run the MatMul 10 times in one session.
with tf.compat.v1.Session() as sess:
    for i in range(10):
        result = sess.run(result_tf, feed_dict=feed_dict)
print("result shape:", result.shape)
```