GPUDirect RDMA with 64 bit addressing

gpu_control · October 30, 2018, 6:28pm

Has anyone had any success using GPUDirect RDMA with a third-party PCIe device and one of the newer (Pascal or later) Tesla/Quadro GPUs using 64-bit addressing?

Additional details:
I have been using a Tesla K20 with a PCIe NIC, and GPUDirect RDMA was working fine. The Tesla K20 maps 256 MB into the BAR1 region. I upgraded to a Tesla P40 and had to enable “above 4G decoding” (https://www.supermicro.com/support/faqs/faq.cfm?faq=17088) on my motherboard, which switches all PCIe addresses to 64-bit. The Tesla P40 now maps 32768 MB into the BAR1 region, and I can no longer do RDMA with the NIC. The NIC manufacturer has informed me that the device is only 32-bit capable.

Also, does anyone know if it is possible to get the GPU to map less memory into the BAR1 region, so that 32-bit addressing would work again?
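
For readers following the numbers above, here is a minimal sketch of the arithmetic behind the problem. It assumes the sizes quoted in the post are binary megabytes, and the helper name `can_fit_below_4g` is hypothetical, not part of any NVIDIA or PCIe tooling. A device that can only generate 32-bit bus addresses can reach at most 4 GiB of address space, so a 32768 MB BAR1 cannot be placed where such a device could see all of it, while a 256 MB BAR1 can.

```python
# Sketch only: compares the BAR1 sizes quoted in the post against the
# range a 32-bit-only PCIe device can address. Names are illustrative.

MIB = 1024 * 1024
ADDRESS_SPACE_32BIT = 2 ** 32  # 4 GiB: every address a 32-bit value can name


def can_fit_below_4g(bar_size_bytes: int) -> bool:
    """Return True if a BAR of this size could be placed entirely below 4 GiB.

    This is a necessary condition, not a sufficient one: the platform also
    needs room below 4 GiB for other devices, so a BAR that passes this
    check may still end up mapped above 4 GiB.
    """
    return 0 < bar_size_bytes < ADDRESS_SPACE_32BIT


if __name__ == "__main__":
    # BAR1 sizes as reported in the post (assumed to be binary megabytes).
    for gpu, size_mib in (("Tesla K20", 256), ("Tesla P40", 32768)):
        fits = can_fit_below_4g(size_mib * MIB)
        print(f"{gpu}: BAR1 = {size_mib} MiB, can fit below 4 GiB: {fits}")
```

The check only says whether a 32-bit mapping is possible in principle. Where the BAR actually lands is decided by the system firmware and OS.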

gpu_control · October 31, 2018, 12:05am

According to the Tesla P40 product brief, page 5 (http://images.nvidia.com/content/pdf/tesla/Tesla-P40-Product-Brief.pdf), the GPU supports Compute and Graphics modes. In Graphics mode it only maps 256 MB into BAR1. Does anyone know how to enable Graphics mode? “nvidia-smi” doesn’t mention anything about it.

Robert_Crovella · October 31, 2018, 12:18am

Switching between compute and graphics mode uses a particular utility which is discussed here:

https://docs.nvidia.com/grid/latest/grid-gpumodeswitch-user-guide/index.html

The P40 cannot be switched into graphics mode.

https://gridforums.nvidia.com/default/topic/2092/tesla-boards/p40-graphics-mode-support-/

gpu_control · October 31, 2018, 12:22am

@Robert_Crovella, so NVIDIA’s product brief is inaccurate? Seems like false advertising to me.

njuffa · October 31, 2018, 8:46am

In the thread linked by Robert Crovella it says “Graphics support for P40 needs GRID5.0” but also “There is no modeswitch any more for Pascal boards”. I read that to mean that there is a way to use graphics mode with a P40, just that it’s not accessible via the legacy mode-switching utility.

While one can write unit tests to ensure code works as intended, there is no technique yet that I am aware of that can ensure the accuracy of documentation. Mistakes in documentation do happen, though in this case it has not even been established there is one.

In practical terms, deploying a 64-bit capable NIC is probably the best way forward. At least that’s the direction I would pursue.

Robert_Crovella · October 31, 2018, 6:48pm

At the time the product brief was originally written, it was anticipated that there may be a need to support graphics mode in certain configurations (if it were deemed to be needed for that config). There is no indication in the product brief that this would be end-user configurable. Graphics mode was intended for situations where it might be needed to support particular features of the GRID product offering, and/or support particular OEM system configurations. As already pointed out, where there is discussion of actual modification of the mode in-situ, the P40 is explicitly excluded from that discussion.

During subsequent product development and the lifecycle to date, this feature has never been needed. Therefore all P40s are shipped in Compute mode.

Tesla products are only intended to be sold by certified partners, and then only in systems that are certified to support those products. We don’t support Tesla products for use in a platform that was not certified by the OEM for that specific Tesla product. For OEM supported configurations, the specific product offerings and supported configurations are defined by the OEM and are not necessarily end-user modifiable.

No OEM has ever offered this product with support for Graphics mode, therefore placing the product into graphics mode would be an unsupported configuration across the board. Due to that, no tools have ever been developed or provided to manage the configuration, since there is effectively only one supported configuration (with respect to the mode).

If you purchased the Tesla product from a certified OEM in a certified configuration, you also have the option to address your support concerns with that OEM or the channel through which you acquired the product.

njuffa · October 31, 2018, 7:44pm

Thank you for the detailed clarification.
