Ubuntu 22.04: failed to install apex, cuda_profiler_api.h: No such file or directory

I tried to install apex on Ubuntu 22.04 with two 4090 GPUs, but the build always fails. Here is the output:

Processing /media/lihongzheng/data/apex
  Running command python setup.py egg_info


  torch.__version__  = 1.13.1


  running egg_info
  creating /tmp/pip-pip-egg-info-uoma3915/apex.egg-info
  writing /tmp/pip-pip-egg-info-uoma3915/apex.egg-info/PKG-INFO
  writing dependency_links to /tmp/pip-pip-egg-info-uoma3915/apex.egg-info/dependency_links.txt
  writing requirements to /tmp/pip-pip-egg-info-uoma3915/apex.egg-info/requires.txt
  writing top-level names to /tmp/pip-pip-egg-info-uoma3915/apex.egg-info/top_level.txt
  writing manifest file '/tmp/pip-pip-egg-info-uoma3915/apex.egg-info/SOURCES.txt'
  reading manifest file '/tmp/pip-pip-egg-info-uoma3915/apex.egg-info/SOURCES.txt'
  adding license file 'LICENSE'
  writing manifest file '/tmp/pip-pip-egg-info-uoma3915/apex.egg-info/SOURCES.txt'
  Preparing metadata (setup.py) ... done
Requirement already satisfied: packaging>20.6 in /home/lihongzheng/anaconda3/lib/python3.9/site-packages (from apex==0.1) (21.3)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /home/lihongzheng/anaconda3/lib/python3.9/site-packages (from packaging>20.6->apex==0.1) (3.0.9)
Skipping wheel build for apex, due to binaries being disabled for it.
Installing collected packages: apex
  Running command Running setup.py install for apex


  torch.__version__  = 1.13.1



  Compiling cuda extensions with
  nvcc: NVIDIA (R) Cuda compiler driver
  Copyright (c) 2005-2022 NVIDIA Corporation
  Built on Wed_Jun__8_16:49:14_PDT_2022
  Cuda compilation tools, release 11.7, V11.7.99
  Build cuda_11.7.r11.7/compiler.31442593_0
  from /home/lihongzheng/anaconda3/bin

  running install
  /home/lihongzheng/anaconda3/lib/python3.9/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
    warnings.warn(
  running build
  running build_py
  running build_ext
  /home/lihongzheng/anaconda3/lib/python3.9/site-packages/torch/utils/cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
    warnings.warn(msg.format('we could not find ninja.'))
  building 'scaled_upper_triang_masked_softmax_cuda' extension
  gcc -pthread -B /home/lihongzheng/anaconda3/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -Wall -fPIC -O2 -isystem /home/lihongzheng/anaconda3/include -I/home/lihongzheng/anaconda3/include -fPIC -O2 -isystem /home/lihongzheng/anaconda3/include -fPIC -I/media/lihongzheng/data/apex/csrc -I/home/lihongzheng/anaconda3/lib/python3.9/site-packages/torch/include -I/home/lihongzheng/anaconda3/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lihongzheng/anaconda3/lib/python3.9/site-packages/torch/include/TH -I/home/lihongzheng/anaconda3/lib/python3.9/site-packages/torch/include/THC -I/home/lihongzheng/anaconda3/include -I/home/lihongzheng/anaconda3/include/python3.9 -c csrc/megatron/scaled_upper_triang_masked_softmax.cpp -o build/temp.linux-x86_64-cpython-39/csrc/megatron/scaled_upper_triang_masked_softmax.o -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=scaled_upper_triang_masked_softmax_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
  /home/lihongzheng/anaconda3/bin/nvcc -I/media/lihongzheng/data/apex/csrc -I/home/lihongzheng/anaconda3/lib/python3.9/site-packages/torch/include -I/home/lihongzheng/anaconda3/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lihongzheng/anaconda3/lib/python3.9/site-packages/torch/include/TH -I/home/lihongzheng/anaconda3/lib/python3.9/site-packages/torch/include/THC -I/home/lihongzheng/anaconda3/include -I/home/lihongzheng/anaconda3/include/python3.9 -c csrc/megatron/scaled_upper_triang_masked_softmax_cuda.cu -o build/temp.linux-x86_64-cpython-39/csrc/megatron/scaled_upper_triang_masked_softmax_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O3 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=scaled_upper_triang_masked_softmax_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++14
  csrc/megatron/scaled_upper_triang_masked_softmax_cuda.cu:21:10: fatal error: cuda_profiler_api.h: No such file or directory
     21 | #include <cuda_profiler_api.h>
        |          ^~~~~~~~~~~~~~~~~~~~~
  compilation terminated.
  csrc/megatron/scaled_upper_triang_masked_softmax_cuda.cu:21:10: fatal error: cuda_profiler_api.h: No such file or directory
     21 | #include <cuda_profiler_api.h>
        |          ^~~~~~~~~~~~~~~~~~~~~
  compilation terminated.
  error: command '/home/lihongzheng/anaconda3/bin/nvcc' failed with exit code 255
  error: subprocess-exited-with-error

  × Running setup.py install for apex did not run successfully.
  │ exit code: 1
  ╰─> See above for output.

  note: This error originates from a subprocess, and is likely not a problem with pip.
  full command: /home/lihongzheng/anaconda3/bin/python -u -c '
  exec(compile('"'"''"'"''"'"'
  # This is <pip-setuptools-caller> -- a caller that pip uses to run setup.py
  #
  # - It imports setuptools before invoking setup.py, to enable projects that directly
  #   import from `distutils.core` to work with newer packaging standards.
  # - It provides a clear error message when setuptools is not installed.
  # - It sets `sys.argv[0]` to the underlying `setup.py`, when invoking `setup.py` so
  #   setuptools doesn'"'"'t think the script is `-c`. This avoids the following warning:
  #     manifest_maker: standard file '"'"'-c'"'"' not found".
  # - It generates a shim setup.py, for handling setup.cfg-only projects.
  import os, sys, tokenize

  try:
      import setuptools
  except ImportError as error:
      print(
          "ERROR: Can not execute `setup.py` since setuptools is not available in "
          "the build environment.",
          file=sys.stderr,
      )
      sys.exit(1)

  __file__ = %r
  sys.argv[0] = __file__

  if os.path.exists(__file__):
      filename = __file__
      with tokenize.open(__file__) as f:
          setup_py_code = f.read()
  else:
      filename = "<auto-generated setuptools caller>"
      setup_py_code = "from setuptools import setup; setup()"

  exec(compile(setup_py_code, filename, "exec"))
  '"'"''"'"''"'"' % ('"'"'/media/lihongzheng/data/apex/setup.py'"'"',), "<pip-setuptools-caller>", "exec"))' --cpp_ext --cuda_ext --deprecated_fused_adam --xentropy --fast_multihead_attn install --record /tmp/pip-record-_t2jl1qw/install-record.txt --single-version-externally-managed --compile --install-headers /home/lihongzheng/anaconda3/include/python3.9/apex
  cwd: /media/lihongzheng/data/apex/
  Running setup.py install for apex ... error
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> apex

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.

It seems that there is no cuda_profiler_api.h file on my machine; I really cannot find it.

Something else is strange. The last time I tried to reinstall CUDA, I first uninstalled NVIDIA driver 525 and reinstalled it, and then found a /usr/local/cuda directory even though I had not installed CUDA. The directory name has no version number (it is not, for example, cuda-11.7). So I also wonder: after installing NVIDIA driver 525 for the 4090 GPUs, is CUDA installed automatically as well?

Here is the environment info of my machine:

  • Ubuntu 22.04 with two 4090 GPUs.
  • nvcc -V shows CUDA version 11.7 (but nvidia-smi shows NVIDIA-SMI 525.60.11, Driver Version: 525.60.11, CUDA Version: 12.0).
  • Python 3.9
  • PyTorch 1.13.1

So where can I find the cuda_profiler_api.h file, and how can I solve the apex installation problem? Can anyone help me? Thank you very much!

Installing “cuda-profiler-api” should fix this:
conda install -c nvidia cuda-profiler-api
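
If you want to check whether the header is actually present before and after that install, here is a minimal sketch in Python. It is not part of apex or conda. The environment variable names (CUDA_HOME, CUDA_PATH, CONDA_PREFIX) and the targets/x86_64-linux/include layout are assumptions about common setups, and the function names are made up for illustration, so adjust the list to your machine.

```python
import os
from pathlib import Path

HEADER = "cuda_profiler_api.h"


def candidate_include_dirs(env=None):
    """Return include directories worth checking, in priority order.

    The variables and paths below are assumptions about typical installs,
    not locations that apex itself guarantees.
    """
    env = os.environ if env is None else env
    roots = []
    for var in ("CUDA_HOME", "CUDA_PATH", "CONDA_PREFIX"):
        if env.get(var):
            roots.append(Path(env[var]))
    roots.append(Path("/usr/local/cuda"))

    dirs = []
    for root in roots:
        dirs.append(root / "include")
        # Some CUDA installs keep headers under targets/<arch>/include.
        dirs.append(root / "targets" / "x86_64-linux" / "include")
    return dirs


def find_header(dirs, name=HEADER):
    """Return the first existing path to `name` in `dirs`, or None."""
    for d in dirs:
        path = Path(d) / name
        if path.is_file():
            return path
    return None


if __name__ == "__main__":
    found = find_header(candidate_include_dirs())
    print(found if found else f"{HEADER} not found in the usual locations")
```

Run it inside the same conda environment you build apex from. Your log shows nvcc coming from /home/lihongzheng/anaconda3/bin, so the header most likely needs to be in that environment rather than only under /usr/local/cuda.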