Building Apache Arrow with CUDA on Jetsons

Has anyone successfully built Apache Arrow (C++ and PyArrow) for the Jetson / L4T with CUDA enabled? There have been attempts - see https://gist.github.com/heavyinfo/04e1326bb9bed9cecb19c2d603c8d521 - and Conda Forge has CPU-only packages that will run on the Jetson via miniforge, but that’s the best I’ve been able to come up with.

Hi znmeb,

We have never tried that ourselves, so we are not sure how to do it. Perhaps other developers who have done something similar can share their experience.

Hi @znmeb, I managed to compile version 0.17.1 after 7h of battling against it.

wget https://github.com/apache/arrow/archive/apache-arrow-0.17.1.zip

I imagine you need this for TF 2.3 + object_detection; at least that's why we need it.
I more or less followed the instructions in the link you posted. The C++ build worked fine, but the Python build was the real pain, as there seems to be a bug in how the CMake files handle the architecture.
When running the Python build, the key is to force some CMake variables so that it passes.
See the example:

PYARROW_CMAKE_OPTIONS="-DARROW_ARMV8_ARCH=armv8 -DCXX_SUPPORTS_ARMV8_ARCH=true" python3 setup.py build_ext --inplace
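For anyone who wants to script that step so it is repeatable, here is a minimal sketch in Python. The two CMake overrides are the ones quoted above. Everything else is an assumption: the helper names `pyarrow_build_env` and `describe_build` are made up for illustration, and `PYARROW_WITH_CUDA=1` is the pyarrow build switch for the CUDA extension, which was not part of the command above and only makes sense if Arrow C++ was itself configured with `-DARROW_CUDA=ON`.

```python
import os
import shlex

# CMake overrides quoted from the post above.
ARM_CMAKE_OPTIONS = [
    "-DARROW_ARMV8_ARCH=armv8",
    "-DCXX_SUPPORTS_ARMV8_ARCH=true",
]

# The build command from the post, run from the arrow/python directory.
BUILD_COMMAND = ["python3", "setup.py", "build_ext", "--inplace"]


def pyarrow_build_env(with_cuda=False, base_env=None):
    """Return an environment dict for the pyarrow build (hypothetical helper)."""
    env = dict(os.environ if base_env is None else base_env)
    env["PYARROW_CMAKE_OPTIONS"] = " ".join(ARM_CMAKE_OPTIONS)
    if with_cuda:
        # Assumption: not in the original command. Requires an Arrow C++
        # build that was configured with -DARROW_CUDA=ON.
        env["PYARROW_WITH_CUDA"] = "1"
    return env


def describe_build(with_cuda=False):
    """Render the equivalent one-line shell command, for logging."""
    env = pyarrow_build_env(with_cuda, base_env={})
    assignments = " ".join(
        f"{key}={shlex.quote(value)}" for key, value in sorted(env.items())
    )
    return f"{assignments} {' '.join(BUILD_COMMAND)}"


if __name__ == "__main__":
    print(describe_build(with_cuda=True))
    # To actually run the build from arrow/python:
    #   import subprocess
    #   subprocess.run(BUILD_COMMAND, env=pyarrow_build_env(True), check=True)
```

Keeping the environment in one function makes it easy to log exactly what was forced when a build takes hours and fails late.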

It would be nice if @Nvidia could make sure that binaries are available for all the dependencies needed to run TF stuff properly (especially now that they have bought ARM, it is in their best interest). I know there is close to an infinite number of libraries, but just the ones needed to run the basic TF packages would be a great gain. The same happens with the latest version of OpenCV: when installing from pip you need to build from source, and it takes an eternity.

Let me know if you find any issues there.

I have a semi-kosher way to do this working, but I want to stress-test it a bunch before I call it solved. The strategy is:

  1. Install Miniforge for aarch64 (https://github.com/conda-forge/miniforge). That gets you conda and all of the binaries in the conda-forge channel that run on aarch64. The CPU versions of Arrow C++ and pyarrow are already in conda-forge, so if you don’t care about CUDA you’re done.
  2. Clone the conda-forge feedstock for Apache Arrow. Hack the build scripts so they see the CUDA libraries on the Jetson, then do a “conda build”. I had that running but I’m not sure what the numpy version has to be for things to work, and I don’t have any CUDA-aware Arrow tests to run to make sure it’s working.
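On the missing CUDA-aware test: a minimal smoke test, assuming the build exposes the `pyarrow.cuda` module, could look like the sketch below. It copies a few bytes to device 0 and back and compares them. The function name `arrow_cuda_status` and the status strings are made up for illustration, and this only shows that the CUDA extension loads and can move memory; it is not a substitute for Arrow's own test suite.

```python
import importlib


def arrow_cuda_status(device=0):
    """Report whether pyarrow can round-trip bytes through GPU memory.

    Hypothetical helper: returns a short status string instead of raising,
    so it can be run on CPU-only installs as well.
    """
    try:
        pa = importlib.import_module("pyarrow")
    except ImportError:
        return "pyarrow-missing"

    try:
        # This import fails when pyarrow was built without CUDA support.
        cuda = importlib.import_module("pyarrow.cuda")
    except ImportError:
        return "cpu-only"

    try:
        payload = b"arrow-on-jetson"
        ctx = cuda.Context(device)
        # Copy host bytes into a freshly allocated device buffer...
        device_buf = ctx.buffer_from_data(pa.py_buffer(payload))
        # ...and copy them back to the host for comparison.
        round_trip = device_buf.copy_to_host().to_pybytes()
    except Exception as exc:  # driver, device, or allocation problems
        return f"cuda-error: {exc}"

    return "cuda-ok" if round_trip == payload else "cuda-mismatch"


if __name__ == "__main__":
    print(arrow_cuda_status())
```

On the stock conda-forge CPU packages this should report `cpu-only`, which gives a quick before/after check for a hacked feedstock build.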

IMHO NVIDIA should look at integrating with conda-forge. Miniforge has the CPU-only builds all automated and wired up to continuous integration, so it would simply be a matter of creating a Jetson channel and creating feedstocks for all the Python packages that use CUDA. I’m going to do Arrow as part of my personal project (https://github.com/edgyR) but I have no plans to contribute it upstream.


That sounds like a reasonable solution too. Just know that it is possible to get it compiled; it's just a big pain :D
Sounds like an interesting project, good luck!!

The biggest pain point is the lack of widely available free CI/CD resources for building the packages in the cloud. You have to have a Jetson in your home / lab to build and test things. Sure, an AGX Xavier is only US$700, but for repeatability you need a cloud build service.