Custom key point detection (not face related)

I’m looking to implement a key point detection model for a custom object that is not face related.

It could also be similar to applying transfer learning from body pose estimation to a non-human object.

What is the best way to do that in TAO?

Thanks!

For generic key point detection, please download the fpenet notebook. See the TAO Toolkit Quick Start Guide (TAO Toolkit 4.0 documentation) for the steps.
notebooks/tao_launcher_starter_kit/fpenet/fpenet.ipynb

Thanks!!

I’ll try that

Dave

@Morganh I ran all the steps in the notebook, but it fails with no clear error:

2023-02-17 13:26:07.302193: F ./tensorflow/core/kernels/random_op_gpu.h:225] Non-OK-status: GpuLaunchKernel(FillPhiloxRandomKernelLaunch<Distribution>, num_blocks, block_size, 0, d.stream(), gen, data, size, dist) status: Internal: the provided PTX was compiled with an unsupported toolchain.
[71ad73f96a9a:00198] *** Process received signal ***
[71ad73f96a9a:00198] Signal: Aborted (6)
[71ad73f96a9a:00198] Signal code:  (-6)
[71ad73f96a9a:00198] [ 0] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x43090)[0x7f66a9358090]
[71ad73f96a9a:00198] [ 1] /usr/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f66a935800b]
[71ad73f96a9a:00198] [ 2] /usr/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f66a9337859]
[71ad73f96a9a:00198] [ 3] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so(+0x20baf4)[0x7f66a5840af4]
[71ad73f96a9a:00198] [ 4] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/../libtensorflow_cc.so.1(_ZN10tensorflow7functor16FillPhiloxRandomIN5Eigen9GpuDeviceENS_6random19UniformDistributionINS4_12PhiloxRandomEfEEEclEPNS_15OpKernelContextERKS3_S6_PfxS7_+0x1d5)[0x7f65f890e7e5]
[71ad73f96a9a:00198] [ 5] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/../libtensorflow_cc.so.1(+0x8ed75ea)[0x7f65f890b5ea]
[71ad73f96a9a:00198] [ 6] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/../libtensorflow_framework.so.1(_ZN10tensorflow13BaseGPUDevice7ComputeEPNS_8OpKernelEPNS_15OpKernelContextE+0x3cb)[0x7f66a47333db]
[71ad73f96a9a:00198] [ 7] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/../libtensorflow_framework.so.1(+0x113caa7)[0x7f66a4790aa7]
[71ad73f96a9a:00198] [ 8] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/../libtensorflow_framework.so.1(+0x113d10f)[0x7f66a479110f]
[71ad73f96a9a:00198] [ 9] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/../libtensorflow_framework.so.1(_ZN5Eigen15ThreadPoolTemplIN10tensorflow6thread16EigenEnvironmentEE10WorkerLoopEi+0x285)[0x7f66a4845725]
[71ad73f96a9a:00198] [10] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/../libtensorflow_framework.so.1(_ZNSt17_Function_handlerIFvvEZN10tensorflow6thread16EigenEnvironment12CreateThreadESt8functionIS0_EEUlvE_E9_M_invokeERKSt9_Any_data+0x48)[0x7f66a4842268]
[71ad73f96a9a:00198] [11] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/../libtensorflow_framework.so.1(+0x18d69a0)[0x7f66a4f2a9a0]
[71ad73f96a9a:00198] [12] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609)[0x7f66a92fa609]
[71ad73f96a9a:00198] [13] /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f66a9434133]
[71ad73f96a9a:00198] *** End of error message ***
Telemetry data couldn't be sent, but the command ran successfully.
[WARNING]: <urlopen error [Errno -2] Name or service not known>
Execution status: FAIL
2023-02-17 15:26:08,361 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

The complete error log here:

fpenet error log 2023 02 17.txt (22.8 KB)

The spec file unmodified:

experiment_spec.yaml (2.3 KB)

Thanks!

David

Please update the NVIDIA driver. If you search this forum, you will find similar error logs.

That’s a scary proposition: in the past, updating the NVIDIA GPU driver corrupted two workstations, which then needed a full reinstall…

My current driver version is:

nvidia-smi
Sun Feb 19 21:50:19 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.108.03   Driver Version: 510.108.03   CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+

And I’ll be updating to the latest, 525…

Thanks!

And sure enough, the computer now gets stuck on boot after upgrading to driver 525…

Use the approach below.
Check the driver version with nvidia-smi. If it is version 470, then:

Uninstall the current driver:
sudo apt purge nvidia-driver-470
sudo apt autoremove
sudo apt autoclean

Install the new driver:
sudo apt install nvidia-driver-520
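
Before purging anything, it may help to confirm which driver version is actually installed. The following is a minimal Python sketch, not an official TAO check: it parses the "Driver Version" field from nvidia-smi header text like the output shown earlier in this thread. The threshold of 520 is an assumption taken from the package name suggested above, and the function names are hypothetical.

```python
import re
import subprocess

# Assumption: 520 comes from the driver package suggested in this thread,
# not from a documented TAO Toolkit requirement.
MIN_DRIVER_MAJOR = 520


def parse_driver_version(smi_text):
    """Return the driver version found in nvidia-smi header text as a tuple of ints."""
    match = re.search(r"Driver Version:\s*([0-9]+(?:\.[0-9]+)*)", smi_text)
    if match is None:
        raise ValueError("no 'Driver Version' field found")
    return tuple(int(part) for part in match.group(1).split("."))


def driver_is_new_enough(version, min_major=MIN_DRIVER_MAJOR):
    """Compare only the major version number against the assumed threshold."""
    return version[0] >= min_major


def read_nvidia_smi():
    """Run nvidia-smi and return its text output (needs a working NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi"], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # Header line copied from the nvidia-smi output posted in this thread.
    sample = "| NVIDIA-SMI 510.108.03   Driver Version: 510.108.03   CUDA Version: 11.6     |"
    version = parse_driver_version(sample)
    print(version, driver_is_new_enough(version))
```

On a machine with the driver installed, `parse_driver_version(read_nvidia_smi())` would check the live system instead of the pasted sample.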

I was able to complete training after upgrading the driver to 525.78.01.

However, the results are not very good. In the following image, used in section 7 of the default notebook, one face is not detected at all and no key points are generated for it. For the second face, some of the key points are very far from their expected locations…
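
One way to put a number on "very far" is the per-point Euclidean distance in pixels between predicted and expected key points. The sketch below is a generic illustration in Python. It is not necessarily the metric TAO reports, and the function names and the coordinate format (lists of (x, y) pairs) are assumptions.

```python
import math


def keypoint_errors(predicted, expected):
    """Return the Euclidean pixel distance for each predicted/expected key point pair."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same number of points")
    return [
        math.hypot(px - ex, py - ey)
        for (px, py), (ex, ey) in zip(predicted, expected)
    ]


def mean_error(errors):
    """Average the per-point distances into a single score."""
    if not errors:
        raise ValueError("no key points to average")
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    predicted = [(10.0, 10.0), (53.0, 44.0)]
    expected = [(10.0, 10.0), (50.0, 40.0)]
    errors = keypoint_errors(predicted, expected)
    print(errors, mean_error(errors))
```

Looking at the per-point list rather than only the mean shows whether a few outlying key points or a uniform offset is responsible for the poor result.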

Thanks!

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

Please share the models/exp1/result.txt file.
