Jetson AGX Xavier Python Tensorflow Issue

I am currently trying to run this Git project: GitHub - hccho2/wavenet-tf.layers.conv1d: wavenet, dilation convolution tf.layers.conv1d, fast wavenet generation
I installed TensorFlow 1.15.2 like this:

sudo pip3 install --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v44 tensorflow==1.15.2

Then I try to run the following command from the repository:
sudo python3 train.py --data_dir="/home/ark626/Billy Talent" --logdir="/home/ark626/ai/BillyTalent"

The result is always a segmentation fault at this line: wavenet-tf.layers.conv1d/train.py at 5d8166d39aba27928429276c83caf59ce0f4ae3a · hccho2/wavenet-tf.layers.conv1d · GitHub

I managed to fix most of the issues with librosa and the other dependencies, but this one I cannot get to work.
The installed packages are listed below.

librosa in /usr/local/lib/python3.6/dist-packages (0.5.0)
Requirement already satisfied: joblib>=0.7.0 in /usr/local/lib/python3.6/dist-packages (from librosa) (0.14.1)
Requirement already satisfied: scikit-learn>=0.14.0 in /usr/local/lib/python3.6/dist-packages (from librosa) (0.23.0)
Requirement already satisfied: decorator>=3.0.0 in /usr/lib/python3/dist-packages (from librosa) (4.1.2)
Requirement already satisfied: resampy>=0.1.2 in /usr/local/lib/python3.6/dist-packages (from librosa) (0.2.2)
Requirement already satisfied: numpy>=1.8.0 in /usr/local/lib/python3.6/dist-packages (from librosa) (1.16.1)
Requirement already satisfied: scipy>=0.13.0 in /usr/local/lib/python3.6/dist-packages (from librosa) (1.4.1)
Requirement already satisfied: six>=1.3 in /home/ark626/.local/lib/python3.6/site-packages (from librosa) (1.14.0)
Requirement already satisfied: audioread>=2.0.0 in /usr/local/lib/python3.6/dist-packages (from librosa) (2.1.8)
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.6/dist-packages (from scikit-learn>=0.14.0->librosa) (2.0.0)
Requirement already satisfied: numba>=0.32 in /usr/local/lib/python3.6/dist-packages (from resampy>=0.1.2->librosa) (0.42.0)
Requirement already satisfied: llvmlite>=0.27.0dev0 in /usr/local/lib/python3.6/dist-packages (from numba>=0.32->resampy>=0.1.2->librosa) (0.32.1)

dmesg LOG:
[ 8114.805485] nvgpu: 17000000.gv11b gk20a_tsg_unbind_channel:164 [ERR] Channel 510 unbind failed, tearing down TSG 0
[ 8139.415356] CPU2: Booted secondary processor [4e0f0040]
[ 8139.438539] ras_fhi_enable: FHI 480 enabled on CPU2
[ 8139.438700] carmel_ras_enable: RAS enabled on cpu2
[ 8139.457637] CPU3: Booted secondary processor [4e0f0040]
[ 8139.473497] ras_fhi_enable: FHI 481 enabled on CPU3
[ 8139.473605] carmel_ras_enable: RAS enabled on cpu3
[ 8139.491122] CPU4: Booted secondary processor [4e0f0040]
[ 8139.502383] ras_fhi_enable: FHI 482 enabled on CPU4
[ 8139.502571] carmel_ras_enable: RAS enabled on cpu4
[ 8139.520444] CPU5: Booted secondary processor [4e0f0040]
[ 8139.530007] ras_fhi_enable: FHI 483 enabled on CPU5
[ 8139.530141] carmel_ras_enable: RAS enabled on cpu5
[ 8139.555474] CPU6: Booted secondary processor [4e0f0040]
[ 8139.566389] ras_fhi_enable: FHI 484 enabled on CPU6
[ 8139.566586] carmel_ras_enable: RAS enabled on cpu6
[ 8139.594388] CPU7: Booted secondary processor [4e0f0040]
[ 8139.598070] ras_fhi_enable: FHI 485 enabled on CPU7
[ 8139.598177] carmel_ras_enable: RAS enabled on cpu7
[ 8139.617831] nvgpu: 17000000.gv11b tpc_pg_mask_store:843 [INFO] no value change, same mask already set
[ 8276.150491] nvgpu: 17000000.gv11b gk20a_fifo_tsg_unbind_channel_verify_status:2200 [ERR] Channel 510 to be removed from TSG 0 has NEXT set!
[ 8276.150731] nvgpu: 17000000.gv11b gk20a_tsg_unbind_channel:164 [ERR] Channel 510 unbind failed, tearing down TSG 0
[ 8967.239609] nvgpu: 17000000.gv11b gk20a_fifo_tsg_unbind_channel_verify_status:2200 [ERR] Channel 508 to be removed from TSG 0 has NEXT set!
[ 8967.239850] nvgpu: 17000000.gv11b gk20a_tsg_unbind_channel:164 [ERR] Channel 508 unbind failed, tearing down TSG 0
[ 9554.655969] nvgpu: 17000000.gv11b gk20a_fifo_tsg_unbind_channel_verify_status:2200 [ERR] Channel 508 to be removed from TSG 0 has NEXT set!
[ 9554.656241] nvgpu: 17000000.gv11b gk20a_tsg_unbind_channel:164 [ERR] Channel 508 unbind failed, tearing down TSG 0
[10782.645552] nvgpu: 17000000.gv11b gk20a_fifo_tsg_unbind_channel_verify_status:2200 [ERR] Channel 508 to be removed from TSG 0 has NEXT set!
[10782.645793] nvgpu: 17000000.gv11b gk20a_tsg_unbind_channel:164 [ERR] Channel 508 unbind failed, tearing down TSG 0
[11086.759649] nvgpu: 17000000.gv11b gk20a_fifo_tsg_unbind_channel_verify_status:2200 [ERR] Channel 508 to be removed from TSG 0 has NEXT set!
[11086.759893] nvgpu: 17000000.gv11b gk20a_tsg_unbind_channel:164 [ERR] Channel 508 unbind failed, tearing down TSG 0
[11331.803615] nvgpu: 17000000.gv11b gk20a_fifo_tsg_unbind_channel_verify_status:2200 [ERR] Channel 508 to be removed from TSG 0 has NEXT set!
[11331.803853] nvgpu: 17000000.gv11b gk20a_tsg_unbind_channel:164 [ERR] Channel 508 unbind failed, tearing down TSG 0
[12750.009863] nvgpu: 17000000.gv11b gk20a_fifo_tsg_unbind_channel_verify_status:2200 [ERR] Channel 508 to be removed from TSG 0 has NEXT set!
[12750.010117] nvgpu: 17000000.gv11b gk20a_tsg_unbind_channel:164 [ERR] Channel 508 unbind failed, tearing down TSG 0
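
The log above mixes nvgpu errors with unrelated CPU bring-up messages. A minimal sketch for pulling out only the nvgpu [ERR] lines might look like this; the regular expression is an assumption based on the line layout shown in this log, not an official dmesg format.

```python
import re

# Assumed layout, taken from the log above:
# [ 8276.150731] nvgpu: 17000000.gv11b gk20a_tsg_unbind_channel:164 [ERR] Channel 510 unbind failed, ...
NVGPU_ERR = re.compile(
    r"^\[\s*(?P<time>\d+\.\d+)\]\s+nvgpu:\s+\S+\s+"
    r"(?P<func>\w+):(?P<line>\d+)\s+\[ERR\]\s+(?P<msg>.*)$"
)


def nvgpu_errors(lines):
    """Return (seconds_since_boot, function, message) for each nvgpu [ERR] line."""
    found = []
    for raw in lines:
        match = NVGPU_ERR.match(raw.strip())
        if match:
            found.append(
                (float(match.group("time")), match.group("func"), match.group("msg"))
            )
    return found
```

Feeding it the lines of a saved dmesg dump returns one tuple per error, so the timestamps can be compared with the time of each crash.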

Okay, I figured out a way to work around it.
It seems there was an issue with scikit-learn, so I have now installed version 0.20.0.
(scikit-learn is required by librosa.) Now it seems to be working.
So there may be an issue with the latest scikit-learn release (0.23.0 was installed before).
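
One way to check which dependency actually crashes is to import each module in its own interpreter and look at how the process exits. The following is a rough sketch, not something taken from the project; `probe_import` is a hypothetical helper name, and the module names passed to it are up to you.

```python
import signal
import subprocess
import sys


def probe_import(module_name, timeout=120):
    """Import module_name in a fresh interpreter and describe how it exited."""
    proc = subprocess.run(
        # -X faulthandler makes the child print a Python traceback on a crash
        [sys.executable, "-X", "faulthandler", "-c", "import " + module_name],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
    )
    if proc.returncode == 0:
        return "ok"
    if proc.returncode < 0:
        # On POSIX a negative return code means the child was killed by a signal
        signum = -proc.returncode
        try:
            return "killed by " + signal.Signals(signum).name
        except ValueError:
            return "killed by signal %d" % signum
    return "failed (exit code %d)" % proc.returncode
```

Calling `probe_import("sklearn")`, `probe_import("librosa")` and `probe_import("tensorflow")` one after another should show "killed by SIGSEGV" for whichever import crashes, without taking down the calling script.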

I also recompiled OpenCV following this tutorial:

Note that this generally requires external storage, since the 32 GB on the AGX are not enough to hold the data generated during compilation.

My guess is that the scikit-learn version was the cause.

 pip show librosa
Name: librosa
Version: 0.7.2
Summary: Python module for audio and music processing
Home-page: http://github.com/librosa/librosa
Author: Brian McFee
Author-email: brian.mcfee@nyu.edu
License: ISC
Location: /usr/local/lib/python3.6/dist-packages
Requires: scikit-learn, six, audioread, numba, joblib, scipy, decorator, soundfile, resampy, numpy
  
Name: scikit-learn
Version: 0.20.0
Summary: A set of python modules for machine learning and data mining
Home-page: http://scikit-learn.org
Author: None
Author-email: None
License: new BSD
Location: /usr/local/lib/python3.6/dist-packages
Requires: scipy, numpy
Required-by: librosa
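
The same version information can also be read from Python, which is handy for comparing a working and a broken setup. This is a minimal sketch; `installed_version` and `report` are hypothetical helper names, and the fallback to `pkg_resources` assumes setuptools is installed, which is usually the case on Python 3.6.

```python
def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        # Python 3.6/3.7: fall back to pkg_resources (part of setuptools)
        import pkg_resources

        try:
            return pkg_resources.get_distribution(dist_name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def report(dist_names):
    """Map each distribution name to its installed version (None if missing)."""
    return {name: installed_version(name) for name in dist_names}
```

For example, `report(["librosa", "scikit-learn", "numba", "llvmlite", "tensorflow"])` gives a dictionary that can be pasted into a forum post.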

I also figured out that the AGX has different performance modes, so before compiling anything it is a good idea to maximize the clocks with:

sudo /usr/bin/jetson_clocks

=> This automatically sets the fan speed to 255 and sets the CPU and GPU clocks to their maximum. (It can save you some hours while compiling.)

Hi,

It looks like this issue is caused by the dependencies between modules.

By the way, please note that the installed TensorFlow package is built for JetPack 4.4.
Please make sure you are using JetPack 4.4 to avoid other compatibility issues.

Thanks.
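
For anyone who wants to verify the release on the device: the L4T version can usually be read from /etc/nv_tegra_release. The sketch below assumes the first line of that file looks like "# R32 (release), REVISION: 4.3, ..."; both the path and the layout are assumptions here, and the helper names are hypothetical. As far as I know, JetPack 4.4 corresponds to L4T 32.4.3 (32.4.2 for the Developer Preview), but please check the release notes.

```python
import re

# Assumed first-line layout: "# R32 (release), REVISION: 4.3, GCID: ..., BOARD: ..."
RELEASE_RE = re.compile(
    r"#\s*R(?P<major>\d+)\s*\(release\),\s*REVISION:\s*(?P<revision>[\d.]+)"
)


def parse_l4t_release(text):
    """Return the L4T version such as '32.4.3', or None if the text does not match."""
    match = RELEASE_RE.search(text)
    if match is None:
        return None
    return "%s.%s" % (match.group("major"), match.group("revision"))


def read_l4t_release(path="/etc/nv_tegra_release"):
    """Read the release file (path is an assumption) and parse its first line."""
    try:
        with open(path) as handle:
            return parse_l4t_release(handle.readline())
    except OSError:
        return None
```

On a machine without that file, `read_l4t_release()` simply returns None.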

I am using JetPack 4.4 :D

Yeah, as mentioned, the issue seems to be solved by using scikit-learn 0.20.0.