ncu profile tf2.0 session run
Hi, @948496254
Sorry for the trouble. With the limited information provided, we can't tell exactly what the issue is.
Can you describe it in more detail?
We can try to reproduce it if you provide detailed test steps.
Thanks!
I think I hit the same error, so maybe I can provide some more info.
The test is wrapped in gtest cases like:
TEST_F(GpuPoolingKernelTest, test_gpu_pooling_forward_light)
Inside the gtest body, I call cudaMalloc, cudaMemcpy, etc., and then the kernel itself to test its correctness. If I just run the test binary directly, everything runs normally.
The error I get when calling ncu ... ./test_file
looks like this:
[==========] Running 9 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 9 tests from GpuPoolingKernelTest
[ RUN ] GpuPoolingKernelTest.test_gpu_pooling_forward_light
2024-09-16 23:14:49.623707: I test/gpu_pooling_kernel_test.cu:65] start cuda malloc...
unknown file: Failure
C++ exception with description "Invalid or unsupported charset:ANSI_X3.4-1968" thrown in the test body.
[ FAILED ] GpuPoolingKernelTest.test_gpu_pooling_forward_light (1 ms)
[ RUN ] GpuPoolingKernelTest.test_gpu_pooling_forward_complicated
2024-09-16 23:14:49.624634: I /test/gpu_pooling_kernel_test.cu:1779] start cuda malloc...
The only dependencies I use when compiling the gtests, besides the CUDA libs, are gtest 1.10.0, glog 0.4.0, Eigen, and TF2.5.0-gpu.
The output of locale is:
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8
I suspect a compatibility problem between ncu (which might use ANSI_X3.4 somewhere) and TF/gtest/glog (which might use Boost or some other library that does not support ANSI_X3.4).
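One way to narrow this down is to print the codeset the test process actually sees, once when run directly and once under ncu. On glibc, ANSI_X3.4-1968 is the codeset name reported for the default "C"/POSIX locale, so seeing it would suggest the en_US.UTF-8 settings are not in effect inside the profiled process. The sketch below is only a diagnostic, not a fix; the helper names (current_codeset, apply_environment_locale, env_or_unset) are made up for illustration and are not part of ncu, TF, or Boost.

```cpp
#include <clocale>
#include <cstdlib>
#include <string>
#include <langinfo.h>  // nl_langinfo, CODESET (POSIX)

// Hypothetical helper: codeset of the process's current C locale.
// On glibc this is "ANSI_X3.4-1968" while the locale is still "C".
std::string current_codeset() {
    const char* cs = nl_langinfo(CODESET);
    return cs ? cs : "";
}

// Hypothetical helper: apply the locale named by LANG / LC_* in the
// environment. Returns false if the C library rejects it, for example
// because the locale is not generated on the machine or the variables
// did not reach this process.
bool apply_environment_locale() {
    return std::setlocale(LC_ALL, "") != nullptr;
}

// Hypothetical helper: read an environment variable, or "(unset)".
std::string env_or_unset(const char* name) {
    const char* v = std::getenv(name);
    return v ? v : "(unset)";
}
```

Logging env_or_unset("LANG"), env_or_unset("LC_ALL"), the result of apply_environment_locale(), and current_codeset() at the top of the gtest body, then comparing a direct run against a run under ncu, should show whether the locale differs between the two.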
I also looked for the gconv library, based on the discussion in another post, and gconv is installed on my system:
shaopu@n027:~/test$ ls /usr/lib/x86_64-linux-gnu/ | grep gconv
gconv
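The directory being present does not by itself show that charset conversion works inside the profiled process. A small check, assuming (not confirmed in this thread) that the Boost.Locale build in question converts through iconv, is to ask iconv directly whether it can open a converter for the charset name from the error message. The function name iconv_supports is hypothetical.

```cpp
#include <iconv.h>
#include <string>

// Hypothetical helper: true if iconv can open a converter from
// charset `from` to charset `to` in the current process.
bool iconv_supports(const std::string& to, const std::string& from) {
    iconv_t cd = iconv_open(to.c_str(), from.c_str());
    if (cd == reinterpret_cast<iconv_t>(-1)) {
        return false;  // unknown charset name or converter unavailable
    }
    iconv_close(cd);
    return true;
}
```

Calling iconv_supports("UTF-8", "ANSI_X3.4-1968") from the test body, both directly and under ncu, would show whether the charset name is resolvable in each case.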
The system environment is Debian 10 with CUDA 11.2 installed.
OS: Debian GNU/Linux 10
gconv:
ls /usr/lib/x86_64-linux-gnu/ | grep gconv
gconv
Command: ncu session_run_bin (depends on TF 2.5 and CUDA 11)
ncu version: tried 2024.3 and 2020.3
Error:
2024-09-18 11:15:06.752230: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
terminate called after throwing an instance of 'N5boost6locale4conv21invalid_charset_errorE'
what(): Invalid or unsupported charset:ANSI_X3.4-1968
==ERROR== The application returned an error code (6).
==WARNING== No kernels were profiled.
==WARNING== Profiling kernels launched by child processes requires the --target-processes all option.
Other info: without TF, running only the CUDA function produces no error.
@veraj can you help?
Hi, @948496254
Is this issue only reproduced when profiling tf2.0?
Does NCU work well with other simple CUDA samples?
If it is only reproduced with tf2.0, how do you set it up and run it?
Is this issue only reproduced when profiling tf2.0? Yes.
Does NCU work well with other simple CUDA samples? Yes.
If it is only reproduced with tf2.0, how do you set it up and run it? I run the C++ session run API.
@veraj
Hi, @948496254
We set up a tf2 Docker image on Debian 12 and cannot reproduce the issue. Please clarify the exact command you used for “run C++ session run api”.
Thanks!