MVTec Halcon does not recognize TX2 Pascal GPU

I can’t seem to get MVTec Halcon to recognize the Pascal GPU inside a TX2. Here is the code for the test app I’m running:

// HalconTest.cpp

#include <iostream>
#include <string>
#include <cstdlib>
#include <cctype>

#include <halconcpp/HalconCpp.h>

// function prototypes
bool stringContainsSubstringIgnoreCase(std::string fullString, std::string substring);

int main(int argc, char *argv[])
{
  // comment out this first try-catch if you don't want to try to use the GPU

  try
  {
    // query for GPUs
    HalconCpp::HTuple possibleGpuIdentifiers;
    HalconCpp::QueryAvailableComputeDevices(&possibleGpuIdentifiers);

    std::cout << "1" << "\n\n";

    auto possibleGpuIdentifiersLength = possibleGpuIdentifiers.Length();

    std::cout << "possibleGpuIdentifiersLength = " << possibleGpuIdentifiersLength << "\n\n";

    if (possibleGpuIdentifiersLength <= 0)
    {
      std::cout << "HalconCpp::QueryAvailableComputeDevices() was not able to identify any compute devices" << "\n\n";
    }
    else
    {
      HalconCpp::HTuple gpuDeviceId;
      for (int i = 0; i < possibleGpuIdentifiers.Length(); i++)
      {
        std::cout << "1a" << "\n\n";

        // get the GPU name as a HTuple
        HalconCpp::HTuple tupPossibleGpuName;
        HalconCpp::GetComputeDeviceInfo(possibleGpuIdentifiers[i], "name", &tupPossibleGpuName);

        std::cout << "1b" << "\n\n";

        // convert the GPU name to a string and log the name
        auto charArrPossibleGpuName = tupPossibleGpuName.SArr();
        std::string possibleGpuName(*charArrPossibleGpuName);

        std::cout << "possibleGpuName = " << possibleGpuName << "\n\n";

        std::cout << "1c" << "\n\n";

        if ((stringContainsSubstringIgnoreCase(possibleGpuName, "GTX") && (stringContainsSubstringIgnoreCase(possibleGpuName, "1070") || stringContainsSubstringIgnoreCase(possibleGpuName, "1080"))) ||
            (stringContainsSubstringIgnoreCase(possibleGpuName, "Quadro") && stringContainsSubstringIgnoreCase(possibleGpuName, "P2000")) ||
            (stringContainsSubstringIgnoreCase(possibleGpuName, "Pascal")))
        {
          gpuDeviceId = possibleGpuIdentifiers[i];
        }

        std::cout << "1d" << "\n\n";
      }

      std::cout << "2" << "\n\n";

      // get the GPU name as a HTuple
      HalconCpp::HTuple tupGpuName;
      HalconCpp::GetComputeDeviceInfo(gpuDeviceId, "name", &tupGpuName);

      std::cout << "3" << "\n\n";

      // convert the GPU name to a string and log the name
      auto charArrGpuName = tupGpuName.SArr();
      std::string gpuName(*charArrGpuName);

      std::cout << "4" << "\n\n";

      // open the device handle
      HalconCpp::HTuple deviceHandle;
      HalconCpp::OpenComputeDevice(gpuDeviceId, &deviceHandle);

      std::cout << "5" << "\n\n";

      // set the GPU params
      HalconCpp::SetComputeDeviceParam(deviceHandle, "asynchronous_execution", "false");

      std::cout << "6" << "\n\n";

      // use the GPU for all possible Halcon functions
      HalconCpp::InitComputeDevice(deviceHandle, "all");

      std::cout << "7" << "\n\n";

      // finally we can activate the GPU with Halcon
      HalconCpp::ActivateComputeDevice(deviceHandle);

      std::cout << "8" << "\n\n";

      std::cout << "GPU configuration successful, gpuName = " << gpuName << "\n\n";
    }
  }
  catch (HalconCpp::HException &ex)
  {
    std::cout << "unable to configure GPU with Halcon" << "\n" << ex.ErrorCode() << "\n" << ex.ErrorMessage() << "\n\n";
  }

// from here down is GPU-independent, show an image as a test

  try
  {
    // open the image
    HalconCpp::HImage image("image.png");

    // get the image width and height
    HalconCpp::HTuple imageWidth;
    HalconCpp::HTuple imageHeight;
    HalconCpp::GetImageSize(image, &imageWidth, &imageHeight);

    // show the image width and height
    std::cout << "imageWidth = " << imageWidth.ToString() << "\n\n";
    std::cout << "imageHeight = " << imageHeight.ToString() << "\n\n";

    // instantiate an HWindow
    HalconCpp::HWindow hWindow(0, 0, imageWidth, imageHeight);

    // show the HImage in the HWindow
    hWindow.DispImage(image);

    // wait for a click, then clear the window
    hWindow.Click();
    hWindow.ClearWindow();
  }
  catch (HalconCpp::HException& exception)
  {
    std::cout << "Halcon error: " << exception.ErrorCode() << "\n" << exception.ErrorMessage() << "\n";
  }

}

bool stringContainsSubstringIgnoreCase(std::string fullString, std::string substring)
{
  // note that these string parameters are passed by value, so changing them here does not affect the caller's variables

  // convert fullString to lower case for case-insensitive comparison
  // (cast to unsigned char first -- passing a negative char to std::tolower is undefined behavior)
  for (std::size_t i = 0; i < fullString.length(); i++)
  {
    fullString[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(fullString[i])));
  }

  // convert substring to lower case for case-insensitive comparison
  for (std::size_t i = 0; i < substring.length(); i++)
  {
    substring[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(substring[i])));
  }

  // if we find the substring before going off the end of the full string, then the full string contains the substring
  return fullString.find(substring) != std::string::npos;
}

This code successfully recognizes and uses a GTX 1070 or 1080 in a desktop computer and a Quadro P2000 in a server; however, when I run it on a TX2, it never recognizes the GPU. The portion at the end that shows the image as a test still works on the TX2, and since the GPU portion of the code (the first two-thirds) works in the other two cases, I'm fairly convinced the code is doing the right things to shake hands with Halcon.

I’ve run this code on a desktop with a GTX 1070/1080 under both Ubuntu 16.04 with CUDA 9.0 and cuDNN 7.1 and Ubuntu 18.04 with CUDA 9.2 and cuDNN 7.2, and it works on both. It also works on a server running Ubuntu 18.04 Server with a Quadro P2000, CUDA 9.2, and cuDNN 7.2.

For the TX2, I’m developing on a native-install Ubuntu 16.04 host. I installed JetPack 3.3 and performed the full flash of the TX2 as recommended by the JetPack 3.3 install process. I’m convinced the flash went well and the TX2 hardware is good, since I can run GPU-accelerated OpenCV 3.3.1 (as installed/flashed by JetPack), and I successfully compiled TensorFlow 1.10 with Bazel 0.18.0 on the TX2, which also runs GPU-accelerated on the TX2.

I’m developing using Nsight on the Ubuntu 16.04 host. I’m confident the cross-compile setup is correct, since GPU-accelerated OpenCV and TensorFlow programs build and run fine.

I also know the JetPack flash was successful based on the following commands and output:

nvidia@tegra-ubuntu:~/HalconTest$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Sun_Nov_19_03:16:56_CST_2017
Cuda compilation tools, release 9.0, V9.0.252
nvidia@tegra-ubuntu:~/HalconTest$ cat /usr/local/cuda/version.txt
CUDA Version 9.0.252
nvidia@tegra-ubuntu:~/HalconTest$ cat /usr/include/aarch64-linux-gnu/cudnn_v7.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 5
--
#define CUDNN_VERSION    (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

#include "driver_types.h"
nvidia@tegra-ubuntu:~/HalconTest$ dpkg -l | grep libopencv
ii  libopencv                                   3.3.1                                        arm64        Open Computer Vision Library

I’d like to reiterate that the Halcon test app runs and shows the test image successfully, however it does not recognize the TX2’s Pascal GPU, so as far as I can tell I’ve correctly installed and configured Halcon. I did not have to perform any other special steps to get Halcon to recognize the GPU in the case of the desktop and server. I’m using Halcon 18, which is stated to work on all these platforms. Unfortunately not recognizing the TX2’s Pascal GPU is unacceptable in my case since we will be deploying a vision application on the TX2 and the processing power of the GPU is essential.

Any idea what I’m doing wrong? Has anybody else gotten Halcon to recognize the TX2’s Pascal GPU? Any suggestions as to other steps to try or stuff to check?

– Edit –

I was just looking at the Halcon documentation for the function QueryAvailableComputeDevices and I found the following:

At present, HALCON only supports OpenCL compatible GPUs supporting the OpenCL extension cl_khr_byte_addressable_store and image objects. If you are not sure whether a certain device is supported, please refer to the manufacturer.

Does the TX2’s Pascal GPU support the OpenCL extension cl_khr_byte_addressable_store and image objects? What does this mean, and how can I check?

– Edit 2 –

I’ve discovered some more information here, and things are not looking good for getting Halcon to use the TX2’s GPU.

Bearing in mind that, per the Halcon documentation, for a platform to use Halcon’s GPU acceleration it must support OpenCL and specifically the cl_khr_byte_addressable_store device extension, I did this on my Ubuntu 16.04 host with a GTX 1080, where Halcon can successfully use the GPU:

sudo apt-get install clinfo
clinfo

I got these results:

$ clinfo
Number of platforms                               1
  Platform Name                                   NVIDIA CUDA
  Platform Vendor                                 NVIDIA Corporation
  Platform Version                                OpenCL 1.2 CUDA 10.0.132
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer
  Platform Extensions function suffix             NV

  Platform Name                                   NVIDIA CUDA
Number of devices                                 1
  Device Name                                     GeForce GTX 1080
  Device Vendor                                   NVIDIA Corporation
  Device Vendor ID                                0x10de
  Device Version                                  OpenCL 1.2 CUDA
  Driver Version                                  415.27
  Device OpenCL C Version                         OpenCL C 1.2 
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Topology (NV)                            PCI-E, 01:00.0
  Max compute units                               20
  Max clock frequency                             1733MHz
  Compute Capability (NV)                         6.1
  Device Partition                                (core)
    Max number of sub-devices                     1
    Supported partition types                     None
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x64
  Max work group size                             1024
  Preferred work group size multiple              32
  Warp size (NV)                                  32
  Preferred / native vector sizes                 
    char                                                 1 / 1       
    short                                                1 / 1       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 0 / 0        (n/a)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Address bits                                    64, Little-Endian
  Global memory size                              8511488000 (7.927GiB)
  Error Correction support                        No
  Max memory allocation                           2127872000 (1.982GiB)
  Unified memory for Host and Device              No
  Integrated memory (NV)                          No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       4096 bits (512 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        327680
  Global Memory cache line                        128 bytes
  Image support                                   Yes
    Max number of samplers per kernel             32
    Max size for 1D images from buffer            134217728 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             16384x32768 pixels
    Max 3D image size                             16384x16384x16384 pixels
    Max number of read image args                 256
    Max number of write image args                16
  Local memory type                               Local
  Local memory size                               49152 (48KiB)
  Registers per block (NV)                        65536
  Max constant buffer size                        65536 (64KiB)
  Max number of constant args                     9
  Max size of kernel argument                     4352 (4.25KiB)
  Queue properties                                
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Prefer user sync for interop                    No
  Profiling timer resolution                      1000ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Kernel execution timeout (NV)                 Yes
  Concurrent copy and kernel execution (NV)       Yes
    Number of async copy engines                  2
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
  clCreateContext(NULL, ...) [default]            No platform
  clCreateContext(NULL, ...) [other]              Success [NV]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  No platform

The important part of these results is that cl_khr_byte_addressable_store is listed under Device Extensions towards the bottom.

Then I did the same on the TX2:

sudo apt-get install clinfo
clinfo

And I got this:

Number of platforms     0

Clearly this is not encouraging. Is there some way to install OpenCL and the cl_khr_byte_addressable_store device extension on the TX2? Or am I dead in the water here?

Most of this I can’t answer, but beware that the driver in the Jetson is not PCI. The Jetson GPU is integrated with the memory controller. If the software depends on PCI query features, then the GPU won’t be visible. For the same reason the same GPU from a desktop won’t work with the existing driver on the Jetson.

linuxdev, thanks for the suggestion. How would I know whether Halcon depends on PCI query features? I still keep coming back to the idea that if OpenCV and TensorFlow can utilize the GPU, Halcon should be able to as well.

I lack the knowledge to answer that. Perhaps someone else here could suggest a way to know if the software is trying to use PCI to detect the GPU.

I received an answer from MVTec on this:

Chris,

we had your case forwarded to MVTec, here's their answer:
To our knowledge NVIDIA doesn't provide OpenCL support for the Jetson boards. We have tested this previously.

According to https://elinux.org/Jetson/Installing_CUDA the group 'video' is the group of user accounts that are allowed to access the GPU on the Jetson. This has more to do with Ubuntu user administration and little with HALCON.

Best regards,
Andreas Heindl

I tried the following to add both the nvidia user and root to the video group as https://elinux.org/Jetson/Installing_CUDA suggests:

sudo usermod -a -G video nvidia
sudo usermod -a -G video root

Then I did this to verify nvidia and root are in the video group:

getent group video

Which they were, but the GPU in the TX2 is still not recognized by Halcon.

I should mention that the new deep learning functions in Halcon do seem to use the GPU on the TX2, but none of the other operators can.

For the record I should cross-link a few other posts here:

https://devtalk.nvidia.com/default/topic/1010166/jetson-tx2/does-jetson-tx1-or-tx2-support-opencl/

https://superuser.com/questions/1402778/how-to-check-if-a-gpu-is-an-opencl-compatible-gpu-supporting-the-opencl-extensio

https://devtalk.nvidia.com/default/topic/1047164/jetson-agx-xavier/can-the-xavier-run-opencl-applications-/

https://askubuntu.com/questions/1116221/how-to-install-opencl-on-an-nvidia-tx2-running-ubuntu-16-04

Is there any news? Has anyone managed to use the TX2’s GPU power in Halcon?
