Why can't I train on the GPU after installing TensorFlow?

I installed TensorFlow using the following commands:
sudo apt-get install libhdf5-serial-dev hdf5-tools libhdf5-dev zlib1g-dev zip libjpeg8-dev liblapack-dev libblas-dev gfortran
sudo apt-get install python3-pip
sudo python3 -m pip install --upgrade pip
sudo pip3 install -U testresources setuptools==65.5.0
sudo pip3 install -U numpy==1.22 future==0.18.2 mock==3.0.5 keras_preprocessing==1.1.2 keras_applications==1.0.8 gast==0.4.0 protobuf pybind11 cython pkgconfig packaging h5py==3.7.0
sudo pip3 install --extra-index-url Index of /compute/redist/jp/v512 tensorflow==2.12.0+nv23.06

The command sudo python3 -c 'import tensorflow as tf; print("Num GPUs Available: ", len(tf.config.list_physical_devices("GPU")))' shows that TensorFlow detects the GPU.
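The one-liner only prints a device count. A slightly more explicit check, sketched here assuming TensorFlow 2.x, also reports whether the installed wheel was built with CUDA. `gpu_report` is a hypothetical helper name, not part of TensorFlow.

```python
# Minimal GPU sanity check (sketch; assumes TensorFlow 2.x is installed).
import tensorflow as tf


def gpu_report():
    """Return a dict describing what TensorFlow can see on this machine."""
    gpus = tf.config.list_physical_devices("GPU")
    return {
        "tf_version": tf.__version__,
        # True if this TensorFlow build was compiled with CUDA support.
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "num_gpus": len(gpus),
        "gpu_names": [gpu.name for gpu in gpus],
    }


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```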

When I start training my model, TensorFlow logs the following error:
E tensorflow/core/grappler/optimizers/meta_optimizer.cc:1014] layout failed: INVALID_ARGUMENT: Size of values 0 does not match size of permutation 4 @ fanin shape ingestureCNN/dropout/dropout/SelectV2-2-TransposeNHWCToNCHW-LayoutOptimizer
Training still continues, but GPU utilization stays at about 0%. GPU memory usage does go up, so I'm not sure whether the training process is actually using the GPU.
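Rising GPU memory alone does not prove that ops run on the GPU, because TensorFlow typically allocates a large block of GPU memory as soon as it initializes the device. One way to check placement, sketched here assuming TensorFlow 2.x in eager mode, is to enable device-placement logging and inspect where a result tensor ends up. `matmul_device` is a hypothetical helper.

```python
import tensorflow as tf

# Log the device chosen for every op (very verbose; enable only while debugging).
tf.debugging.set_log_device_placement(True)


def matmul_device():
    """Run one matrix multiply and return the device name of the result tensor."""
    # Fall back to the CPU so the sketch also runs on machines without a GPU.
    device = "/GPU:0" if tf.config.list_physical_devices("GPU") else "/CPU:0"
    with tf.device(device):
        a = tf.random.uniform((1024, 1024))
        b = tf.random.uniform((1024, 1024))
        c = tf.matmul(a, b)
    return c.device


if __name__ == "__main__":
    print("Result tensor lives on:", matmul_device())
```

Putting the same `tf.debugging.set_log_device_placement(True)` call at the top of the training script logs the device assigned to each op of the model as well.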

Here are the versions of the software I am using:
aarch64
Jetpack 5.1.1
Ubuntu 20.04
CUDA 11.4
cuDNN 8.6

Hi,

This looks like a known TensorFlow issue.
Could you check the suggestion below to see if it helps?
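One workaround often tried for this "layout failed" message (an assumption here; it may differ from the suggestion referred to above) is to disable Grappler's layout optimizer, the graph pass that inserts the TransposeNHWCToNCHW-LayoutOptimizer nodes named in the log. Since training continues despite the message, this may only silence it rather than change whether the GPU is used. A minimal sketch, assuming TensorFlow 2.x:

```python
import tensorflow as tf

# Disable Grappler's layout optimizer, the pass that inserts the
# "...TransposeNHWCToNCHW-LayoutOptimizer" nodes named in the error.
# Call this at the start of the script, before building and training the model.
tf.config.optimizer.set_experimental_options({"layout_optimizer": False})

# Print the active optimizer options to confirm the change took effect.
print(tf.config.optimizer.get_experimental_options())
```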

Thanks.

But I don't use tf.where. Here is my code:

# -*- coding: UTF-8 -*-

import os

# Select which GPU to use (set this before TensorFlow initializes CUDA)
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

import tensorflow as tf
from keras.preprocessing.image import ImageDataGenerator
import gesture_model as gm

# Path for saving the weights
checkpoint_path = "weights/gestureCNN_16_50/cp-{epoch:04d}.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)

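# Create a callback that saves the model weights every 5 epochs
# (`period` is deprecated in favor of `save_freq`, which counts batches)
cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path,
    verbose=1,
    save_weights_only=True,
    period=5)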

sh = 96  # input image height and width

mymodel = gm.gestureCNN(input_shape=(sh, sh, 3), num_classes=3)
mymodel.summary()
parallel_model = mymodel

epochs = 100
batch_size = 32

train_datagen = ImageDataGenerator(
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    rescale=1. / 255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode="nearest",
    validation_split=0.2)

# Not used below: the validation subset is drawn from train_datagen
validation_datagen = ImageDataGenerator(rescale=1. / 255)

train_generator = train_datagen.flow_from_directory(
    'gesture_dataset/gesture_train',
    target_size=(sh, sh),
    batch_size=batch_size,
    class_mode='categorical',  # or 'binary'
    subset='training')

validation_generator = train_datagen.flow_from_directory(
    'gesture_dataset/gesture_train',
    target_size=(sh, sh),
    batch_size=batch_size,
    class_mode='categorical',  # or 'binary'
    subset='validation')

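# Compile the network
parallel_model.compile(optimizer='adadelta',
                       loss='categorical_crossentropy',
                       metrics=['accuracy'])

# Model.fit accepts generators directly (fit_generator is deprecated in TF 2.x).
# Step counts are numbers of batches, so take them from the generators.
parallel_model.fit(train_generator,
                   validation_data=validation_generator,
                   steps_per_epoch=len(train_generator),
                   validation_steps=len(validation_generator),
                   epochs=epochs,
                   callbacks=[cp_callback])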
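`steps_per_epoch` and `validation_steps` count batches, not samples. A minimal sketch of the arithmetic, assuming the number of images in each split is known (`steps_for` is a hypothetical helper): rounding up keeps the final partial batch, which truncating division such as `int(592 / 32)` drops if 592 is the number of training images.

```python
import math


def steps_for(num_samples, batch_size):
    """Batches needed to cover every sample once; the last batch may be partial."""
    if num_samples <= 0 or batch_size <= 0:
        raise ValueError("num_samples and batch_size must be positive")
    return math.ceil(num_samples / batch_size)


if __name__ == "__main__":
    print(steps_for(592, 32))  # 19: 18 full batches plus one batch of 16
    print(int(592 / 32))       # 18: the last 16 images are skipped
```

For a Keras `DirectoryIterator` such as those returned by `flow_from_directory`, `len(generator)` gives the same rounded-up value.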

Hi,

Could you run a simple test model to check whether training runs on the GPU?
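A minimal test along those lines, assuming TensorFlow 2.x: a tiny CNN trained briefly on random 96x96x3 images with 3 classes, mirroring the input shape and class count of the gesture script and including a `Dropout` layer like the one named in the error. The data is synthetic and `train_tiny_model` is a hypothetical name. While it runs on a Jetson, GPU load can be watched with `tegrastats`.

```python
import numpy as np
import tensorflow as tf


def train_tiny_model(epochs=2):
    """Train a tiny CNN on random data; return (device used, final training loss)."""
    # Use the GPU if TensorFlow sees one, otherwise fall back to the CPU.
    device = "/GPU:0" if tf.config.list_physical_devices("GPU") else "/CPU:0"

    # Synthetic inputs shaped like the 96x96 RGB gesture images, 3 classes.
    x = np.random.rand(256, 96, 96, 3).astype("float32")
    y = tf.keras.utils.to_categorical(np.random.randint(0, 3, size=256), num_classes=3)

    with tf.device(device):
        model = tf.keras.Sequential([
            tf.keras.layers.Input(shape=(96, 96, 3)),
            tf.keras.layers.Conv2D(16, 3, activation="relu"),
            tf.keras.layers.MaxPooling2D(),
            tf.keras.layers.Dropout(0.25),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(3, activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="categorical_crossentropy",
                      metrics=["accuracy"])
        history = model.fit(x, y, batch_size=32, epochs=epochs, verbose=2)

    return device, history.history["loss"][-1]


if __name__ == "__main__":
    print(train_tiny_model())
```

If this model keeps the GPU busy but the gesture model does not, the difference lies in the model or its input pipeline rather than in the installation.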

If the GPU is unused only with your custom model, the problem is likely in the TensorFlow implementation.
In that case, we recommend checking with the TensorFlow team for further help.
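It may also be worth ruling out an input-pipeline bottleneck first. This is a hypothesis, not a diagnosis from the thread: `ImageDataGenerator` performs its augmentation on the CPU, and a slow generator can leave the GPU mostly idle. A rough sketch with a hypothetical helper that times how long fetching one batch takes:

```python
import time


def seconds_per_batch(batch_iter, num_batches=10):
    """Average wall-clock time, in seconds, to fetch one batch from an iterator."""
    if num_batches <= 0:
        raise ValueError("num_batches must be positive")
    start = time.perf_counter()
    for _ in range(num_batches):
        next(batch_iter)  # e.g. a DirectoryIterator from flow_from_directory
    return (time.perf_counter() - start) / num_batches
```

For example, call `seconds_per_batch(train_generator)` on the generator from the training script. If the result is close to the per-step time Keras reports during training, the GPU is probably waiting on data.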

Thanks.
