Hello,
My organization is migrating to MATLAB R2021a, so I have to install the versions of CUDA, cuDNN and TensorRT that are compatible with this release (the required software is listed on the page “Documentation - MATLAB & Simulink - MathWorks France /gpucoder/gs/install-prerequisites.html”). As a test, I installed CUDA 11.0.3, cuDNN 8.1.0 and TensorRT 7.2.3.4. I am carrying out two installations in parallel: one on a workstation with a Xeon Gold 6146, 128 GB of RAM and a Quadro P2000, and the other on a laptop with an i7-10870H, 32 GB of RAM and a GeForce RTX 3070.
As recommended by MathWorks, I also installed Nsight Visual Studio Edition, version 2021.1.0. I took the opportunity to add Nsight Integration to the current version of Visual Studio Community (16.11.4 at the time of posting).
I carefully followed the installation instructions in the documentation matching each software version, and I am now in the testing phase. As indicated in section 2.2 of the CUDA 11.0 Quick Start Guide (Quick Start Guide :: CUDA Toolkit Documentation), to verify my installation I built the example solution and successfully generated an executable on both computers (see the VS 2019 output below).
Build started…
1>------ Build started: Project: nbody, Configuration: Debug x64 ------
1>Compiling CUDA source file bodysystemcuda.cu…
1>
1>C:\ProgramData\NVIDIA Corporation\CUDA Samples\v11.0\5_Simulations\nbody>“C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin\nvcc.exe” -gencode=arch=compute_35,code="sm_35,compute_35" -gencode=arch=compute_37,code="sm_37,compute_37" -gencode=arch=compute_50,code="sm_50,compute_50" -gencode=arch=compute_52,code="sm_52,compute_52" -gencode=arch=compute_60,code="sm_60,compute_60" -gencode=arch=compute_61,code="sm_61,compute_61" -gencode=arch=compute_70,code="sm_70,compute_70" -gencode=arch=compute_75,code="sm_75,compute_75" -gencode=arch=compute_80,code="sm_80,compute_80" --use-local-env -ccbin “C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\HostX86\x64” -x cu -I./ -I…/…/common/inc -I./ -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0/include” -I…/…/common/inc -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\include” -G --keep-dir x64\Debug -maxrregcount=0 --machine 64 --compile -cudart static -Xcompiler “/wd 4819” -g -DWIN32 -DWIN32 -D_MBCS -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /Fdx64/Debug/vc142.pdb /FS /Zi /RTC1 /MTd " -o x64/Debug/bodysystemcuda.cu.obj “C:\ProgramData\NVIDIA Corporation\CUDA Samples\v11.0\5_Simulations\nbody\bodysystemcuda.cu”
1>CUDACOMPILE : nvcc warning : The ‘compute_35’, ‘compute_37’, ‘compute_50’, ‘sm_35’, ‘sm_37’ and ‘sm_50’ architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
1>bodysystemcuda.cu
1>Done building project “nbody_vs2019.vcxproj”.
1>nbody.cpp
1>render_particles.cpp
1>Generating Code…
1> Creating library …/…/bin/win64/Debug/nbody.lib and object …/…/bin/win64/Debug/nbody.exp
1>nbody_vs2019.vcxproj → C:\ProgramData\NVIDIA Corporation\CUDA Samples\v11.0\bin\win64\Debug\nbody.exe
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========
On each computer, the executable runs correctly and prints the following information in a console window.
With the Quadro P2000, the output is:
Run “nbody -benchmark [-numbodies = ]” to measure performance.
-fullscreen (run n-body simulation in fullscreen mode)
-fp64 (use double precision floating point values for simulation)
-hostmem (stores simulation data in host memory)
-benchmark (run benchmark to measure performance)
-numbodies = (number of bodies (> = 1) to run in simulation)
-device = (where d = 0,1,2 … for the CUDA device to use)
-numdevices = (where i = (number of CUDA devices> 0) to use for simulation)
-compare (compares simulation results running once on the default GPU and once on the CPU)
-cpu (run n-body simulation on the CPU)
-tipsy = <file.bin> (load a tipsy model file for simulation)
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
Windowed mode
Simulation data stored in video memory
Single precision floating point simulation
1 Devices used for simulation
GPU Device 0: “Pascal” with compute capability 6.1
Compute 6.1 CUDA device: [NVIDIA Quadro P2000]
With the GeForce RTX 3070, the output is:
Run “nbody -benchmark [-numbodies = ]” to measure performance.
-fullscreen (run n-body simulation in fullscreen mode)
-fp64 (use double precision floating point values for simulation)
-hostmem (stores simulation data in host memory)
-benchmark (run benchmark to measure performance)
-numbodies = (number of bodies (> = 1) to run in simulation)
-device = (where d = 0,1,2 … for the CUDA device to use)
-numdevices = (where i = (number of CUDA devices> 0) to use for simulation)
-compare (compares simulation results running once on the default GPU and once on the CPU)
-cpu (run n-body simulation on the CPU)
-tipsy = <file.bin> (load a tipsy model file for simulation)
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
Windowed mode
Simulation data stored in video memory
Single precision floating point simulation
1 Devices used for simulation
MapSMtoCores for SM 8.6 is undefined. Default to use 64 Cores / SM
MapSMtoArchName for SM 8.6 is undefined. Default to use Ampere
GPU Device 0: “Ampere” with compute capability 8.6
Now to my question. On both machines, I notice that the parameters “MapSMtoCores” and “MapSMtoArchName” are undefined and fall back to a default value. Does this affect performance and, if so, how can I set them to the correct values for each GPU?
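To make the question concrete: my assumption is that these messages come from a version lookup table in the samples' helper code (under common/inc) that has no entry for SM 8.6 and therefore falls back to its last known entry. Below is a minimal sketch of that kind of lookup in plain C++. The function name `sm_to_cores` and the shortened table are my own illustration, not the actual code of the samples.

```cpp
#include <cassert>
#include <cstdio>

// Illustrative subset of an "SM version -> FP32 cores per SM" table.
// The table deliberately stops at SM 8.0 to mimic a toolkit release
// that predates SM 8.6 (assumption made for this sketch).
struct SmEntry {
    int sm;     // encoded as 0xMm: M = major version, m = minor version
    int cores;  // FP32 cores per SM for that architecture
};

static const SmEntry kTable[] = {
    {0x60, 64},   // Pascal  6.0
    {0x61, 128},  // Pascal  6.1 (e.g. Quadro P2000)
    {0x70, 64},   // Volta   7.0
    {0x75, 64},   // Turing  7.5
    {0x80, 64},   // Ampere  8.0
};
static const int kTableSize = sizeof(kTable) / sizeof(kTable[0]);

// Hypothetical helper: returns the cores per SM for a compute capability.
// If the version is not in the table, it reports that and falls back to
// the last (newest) entry it knows about.
int sm_to_cores(int major, int minor) {
    assert(major >= 0 && minor >= 0 && minor < 16);
    const int key = (major << 4) + minor;
    for (int i = 0; i < kTableSize; ++i) {
        if (kTable[i].sm == key) {
            return kTable[i].cores;
        }
    }
    const int fallback = kTable[kTableSize - 1].cores;
    std::printf("MapSMtoCores for SM %d.%d is undefined. "
                "Default to use %d Cores / SM\n", major, minor, fallback);
    return fallback;
}
```

With this sketch, `sm_to_cores(6, 1)` finds its entry and returns 128 silently, while `sm_to_cores(8, 6)` prints the "undefined" message and returns the 8.0 value of 64. If my reading is right, the default would only affect what the sample reports about the GPU rather than how the simulation kernel runs, but I would appreciate confirmation.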
Thank you for your attention and for your potential responses!