HPC SDK 23.1 RPM packages: baseurl is non-indexable?

We’re unable to download RPM packages from the publicly declared nvhpc repo. The baseurl in the nvhpc.repo file points to an upstream target that appears to be null, empty, or non-indexable via dnf reposync, wget, curl, or rsync, at least from the *.nrl.navy.mil domains.

Our system is a cluster of x86_64 infrastructure servers and aarch64 compute nodes; the latter are not allowed to talk to the Internet at large. We must run ‘createrepo’ on a cache, mirror, or copy of the desired files in order to provision the compute pool with nvhpc packages. This works fine for the NVIDIA GPU driver and OFED stacks; those baseurl targets are all indexable by wget, curl, firefox, etc. But the HPC SDK baseurl looks like it’s either empty or (more likely) non-indexable from our domain.
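
For anyone following along, the mirror-then-createrepo workflow described above might look roughly like the sketch below. It is a dry run that only prints the commands. The repo id `nvhpc`, the `/srv/mirror` path, and the `plan_mirror` helper are assumptions for illustration, not our actual configuration. Pulling aarch64 packages from an x86_64 host may need additional dnf options depending on how the baseurl is written; the sketch does not address that.

```shell
#!/bin/sh
# Hypothetical values; adjust for your site.
REPO_ID="nvhpc"             # assumed repo id declared in nvhpc.repo
MIRROR_ROOT="/srv/mirror"   # hypothetical local mirror directory

# Print (do not run) the commands that would mirror one architecture.
# By default dnf reposync places packages in <download-path>/<repo id>,
# so createrepo is pointed at that subdirectory.
plan_mirror() {
    arch="$1"
    dest="$MIRROR_ROOT/$arch"
    echo "dnf reposync --repoid=$REPO_ID --arch=$arch --download-path=$dest"
    echo "createrepo $dest/$REPO_ID"
}

plan_mirror aarch64
```

The compute nodes would then use a local .repo file whose baseurl points at the mirrored directory instead of the upstream target.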

This topic was originally posted under a different title; here are the salient points:

I get a 404 from both of these URLs from home using firefox, chrome, and safari. From home and onsite, wget and curl give me results I associate with the same failure (see below).

https://developer.download.nvidia.com/hpc-sdk/rhel/x86_64
https://developer.download.nvidia.com/hpc-sdk/rhel/aarch64

I can’t currently get to the GP/GPU cluster, but here’s the OFED repo from the A64FX cluster:

https://linux.mellanox.com/public/repo/mlnx_ofed/latest/rhel8.7/x86_64/

That link is indexable by wget and curl, and browsable by firefox, chrome, and safari, from home and onsite.

Here’s the verbose curl output from attempting the aarch64 HPC SDK URL:

16:53:28 ~/nvtemp $ curl --verbose https://developer.download.nvidia.com/hpc-sdk/rhel/aarch64
*   Trying 152.195.19.142...
* TCP_NODELAY set
* Connected to developer.download.nvidia.com (152.195.19.142) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/cert.pem
  CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: C=US; ST=California; L=Santa Clara; O=Nvidia Corporation; CN=developer.download.nvidia.com
*  start date: Dec  2 00:00:00 2022 GMT
*  expire date: Jan  2 23:59:59 2024 GMT
*  subjectAltName: host "developer.download.nvidia.com" matched cert's "developer.download.nvidia.com"
*  issuer: C=US; O=DigiCert Inc; CN=DigiCert TLS RSA SHA256 2020 CA1
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x7f926f80d600)
> GET /hpc-sdk/rhel/aarch64 HTTP/2
> Host: developer.download.nvidia.com
> User-Agent: curl/7.64.1
> Accept: */*
> 
* Connection state changed (MAX_CONCURRENT_STREAMS == 100)!
< HTTP/2 301 
< cache-control: max-age=604800
< date: Thu, 16 Feb 2023 21:53:36 GMT
< expires: Thu, 23 Feb 2023 21:53:36 GMT
< location: https://developer.download.nvidia.com/hpc-sdk/rhel/aarch64/
< server: EOS (vny/B289)
< x-vdms-version: 3.42
< content-length: 0
< 
* Connection #0 to host developer.download.nvidia.com left intact
* Closing connection 0
16:53:36 ~/nvtemp $
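
A note on reading that transcript: the only response shown is the `301` with a `location:` header, because curl does not follow redirects unless given `-L`/`--location`. The following is a small sketch for pulling the status and redirect target out of saved `curl --verbose` output. The `summarize_response` helper is a hypothetical name, and the sample input is copied from the lines above.

```shell
#!/bin/sh
# Summarize the HTTP status and any redirect target found in
# `curl --verbose` output read from stdin.
summarize_response() {
    tr -d '\r' | awk '
        # Response status line, e.g. "< HTTP/2 301"
        /^< HTTP\// { status = $3 }
        # Response header line, e.g. "< location: https://..."
        $1 == "<" && tolower($2) == "location:" { target = $3 }
        END {
            printf "status=%s\n", status
            if (target != "") printf "redirect=%s\n", target
        }
    '
}

summarize_response <<'EOF'
> GET /hpc-sdk/rhel/aarch64 HTTP/2
< HTTP/2 301 
< cache-control: max-age=604800
< location: https://developer.download.nvidia.com/hpc-sdk/rhel/aarch64/
< content-length: 0
EOF
# prints:
# status=301
# redirect=https://developer.download.nvidia.com/hpc-sdk/rhel/aarch64/
```

The redirect target is the same path with a trailing slash, so rerunning with `curl --verbose --location` (or requesting the trailing-slash URL directly) would show the final response rather than just the redirect. That final response may well be the 404 seen in the browsers.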

Here’s the same for the OFED:

16:58:50 ~/nvtemp $ curl --verbose https://linux.mellanox.com/public/repo/mlnx_ofed/latest/rhel8.7/x86_64/
*   Trying 168.62.212.37...
* TCP_NODELAY set
* Connected to linux.mellanox.com (168.62.212.37) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/cert.pem
  CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: C=US; ST=California; L=Santa Clara; O=NVIDIA Corporation; CN=linux.mellanox.com
*  start date: Jul  5 00:00:00 2022 GMT
*  expire date: Jul  5 23:59:59 2023 GMT
*  subjectAltName: host "linux.mellanox.com" matched cert's "linux.mellanox.com"
*  issuer: C=US; O=DigiCert Inc; CN=DigiCert TLS RSA SHA256 2020 CA1
*  SSL certificate verify ok.
> GET /public/repo/mlnx_ofed/latest/rhel8.7/x86_64/ HTTP/1.1
> Host: linux.mellanox.com
> User-Agent: curl/7.64.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< Date: Thu, 16 Feb 2023 21:58:56 GMT
< Server: Apache/2.4.6 (CentOS)
< Connection: close
< Transfer-Encoding: chunked
< Content-Type: text/html;charset=UTF-8
< 
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
 <head>
    ...snip snip...

Moderator jmudd reports that a fix is on the way, early next week, possibly Monday.
https://forums.developer.nvidia.com/t/re-hpc-sdk-23-1-tarball-fails-checksum-and-rpm-packages-baseurl-is-nonindexable/243125/6

This issue was resolved. The seven repository areas for HPC SDK now have index files.

If there are any future issues/questions, just let us know.