Local CUDA mirror

Hi,

We operate an HPC cluster with many, many, many nodes and NVIDIA GPUs.
For installing nodes (OS as well as libs) we maintain local mirrors (yum/deb/you-name-it).

We currently have some scripts that scrape [1] using lftp to create a local mirror that is accessible from our cluster. This is done both to save our time and your bandwidth.

This is currently broken because the directory listing at [1] is incomplete.
E.g. the repodata folder [2] exists, but is not listed on the index page [1].
It also seems that many RPMs and other files are missing from the index. They may exist, but if they are not listed in the directory indexes, lftp (or wget or whatever) cannot retrieve them.
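
To make the symptom easy to reproduce, here is a minimal Python sketch that checks whether a directory index links to the entries we expect. The class and function names are my own, and the sample HTML is a made-up excerpt rather than the real page; in practice the saved index.html from [1] would be fed in instead.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a directory index page."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.add(value)


def missing_from_index(index_html, expected):
    """Return the expected entries that the index page does not link to."""
    collector = LinkCollector()
    collector.feed(index_html)
    # Compare without trailing slashes so "repodata" matches "repodata/".
    listed = {link.rstrip("/") for link in collector.links}
    return [name for name in expected if name.rstrip("/") not in listed]


# Hypothetical index excerpt (assumption), not the real page contents.
sample = """<html><body>
<a href="nvidia-kmod-396.44-2.el7.x86_64.rpm">nvidia-kmod-396.44-2.el7.x86_64.rpm</a>
</body></html>"""

print(missing_from_index(sample, ["repodata/"]))
```

Anything this reports as missing is invisible to a crawler such as lftp, even if the URL itself can still be fetched directly.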

Could you please check why the yum repos and their directory listings are incomplete? This could be caused by wrong file permissions, e.g. the web server not having permission to read the folders/files.

If at all possible, could you set up an rsync server? This would make every large-scale customer's life much easier, as keeping a local copy of the NVIDIA repo up to date could be trivially automated.

Cheers,
Steven

[1] http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/
[2] http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/repodata/

Actually, looking at this again, it is clear that the index file [1] is truncated at line 268.

[1] http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/index.html

Some more information.

I’ve worked around the aforementioned problems by mirroring via reposync instead of lftp. This finds and downloads the repo metadata, but there are RPMs missing from the rhel7 repo.

For example, there is no nvidia-kmod-418.39-2 RPM for rhel7, while there is one for rhel6:

http://developer.download.nvidia.com/compute/cuda/repos/rhel6/x86_64/nvidia-kmod-418.39-2.el6.x86_64.rpm

For older versions, however, the RPMs exist in both repos:

http://developer.download.nvidia.com/compute/cuda/repos/rhel6/x86_64/nvidia-kmod-396.44-2.el6.x86_64.rpm
http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/nvidia-kmod-396.44-2.el7.x86_64.rpm
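
In case it helps with checking the repos against each other, the comparison above can be sketched in Python roughly as follows. The helper names are my own, and the file lists are just the three names from this post, not a full repo listing; it assumes the rhel6 and rhel7 builds differ only in the `.el6.` / `.el7.` part of the file name.

```python
import re

# Matches the distribution tag in file names such as
# nvidia-kmod-418.39-2.el6.x86_64.rpm
DIST_TAG = re.compile(r"\.el\d+\.")


def strip_dist_tag(filename):
    """Replace '.el6.' / '.el7.' with '.' so builds can be compared."""
    return DIST_TAG.sub(".", filename, count=1)


def missing_builds(reference_files, candidate_files):
    """Return builds found in reference_files but absent from candidate_files."""
    have = {strip_dist_tag(f) for f in candidate_files}
    return sorted({strip_dist_tag(f) for f in reference_files} - have)


# File names taken from the URLs in this post (not a complete listing).
rhel6 = [
    "nvidia-kmod-396.44-2.el6.x86_64.rpm",
    "nvidia-kmod-418.39-2.el6.x86_64.rpm",
]
rhel7 = ["nvidia-kmod-396.44-2.el7.x86_64.rpm"]

print(missing_builds(rhel6, rhel7))
```

With the three file names above, this reports the 418.39-2 build as the one missing from rhel7.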

Can you please sync/update/rebuild the rhel7 repo into a working state?

Hi, we are also affected by this and would be very happy for a timely solution. Thanks.

An rsync server would be nice
