CUDA: how to set MapSMtoCores and MapSMtoArchName parameters (Windows 10; Quadro P2000 & GeForce RTX 3070)?

Hello,
My organization is migrating to MATLAB R2021a, so I need to install the versions of CUDA, cuDNN and TensorRT that are compatible with this release (the required software is listed on the page “Documentation Home - MathWorks France /gpucoder/gs/install-prerequisites.html”). As a test, I installed CUDA 11.0.3, cuDNN 8.1.0 and TensorRT 7.2.3.4. I am carrying out two installations in parallel: one on a workstation with a Xeon Gold 6146, 128 GB of RAM and a Quadro P2000, and the other on a laptop with an i7-10870H, 32 GB of RAM and a GeForce RTX 3070.
As recommended by MathWorks, I also installed Nsight Visual Studio Edition 2021.1.0, and I took the opportunity to add the Nsight integration to the up-to-date version of Visual Studio Community (16.11.4 at the time of posting).
I carefully followed the installation instructions in the matching version of the documentation for each product, and I am now in the testing phase. As indicated in section 2.2 of the CUDA 11.0 Quick Start Guide (Quick Start Guide :: CUDA Toolkit Documentation), to verify my installation I built the example solution and successfully generated an executable on both computers (see the VS 2019 output below).
Build started…
1>------ Build started: Project: nbody, Configuration: Debug x64 ------
1>Compiling CUDA source file bodysystemcuda.cu…
1>
1>C:\ProgramData\NVIDIA Corporation\CUDA Samples\v11.0\5_Simulations\nbody>“C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin\nvcc.exe” -gencode=arch=compute_35,code=“sm_35,compute_35” -gencode=arch=compute_37,code=“sm_37,compute_37” -gencode=arch=compute_50,code=“sm_50,compute_50” -gencode=arch=compute_52,code=“sm_52,compute_52” -gencode=arch=compute_60,code=“sm_60,compute_60” -gencode=arch=compute_61,code=“sm_61,compute_61” -gencode=arch=compute_70,code=“sm_70,compute_70” -gencode=arch=compute_75,code=“sm_75,compute_75” -gencode=arch=compute_80,code=“sm_80,compute_80” --use-local-env -ccbin “C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\HostX86\x64” -x cu -I./ -I…/…/common/inc -I./ -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0/include” -I…/…/common/inc -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\include” -G --keep-dir x64\Debug -maxrregcount=0 --machine 64 --compile -cudart static -Xcompiler “/wd 4819” -g -DWIN32 -DWIN32 -D_MBCS -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /Fdx64/Debug/vc142.pdb /FS /Zi /RTC1 /MTd " -o x64/Debug/bodysystemcuda.cu.obj “C:\ProgramData\NVIDIA Corporation\CUDA Samples\v11.0\5_Simulations\nbody\bodysystemcuda.cu”
1>CUDACOMPILE : nvcc warning : The ‘compute_35’, ‘compute_37’, ‘compute_50’, ‘sm_35’, ‘sm_37’ and ‘sm_50’ architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
1>bodysystemcuda.cu
1>Done building project “nbody_vs2019.vcxproj”.
1>nbody.cpp
1>render_particles.cpp
1>Generating Code…
1> Creating library …/…/bin/win64/Debug/nbody.lib and object …/…/bin/win64/Debug/nbody.exp
1>nbody_vs2019.vcxproj → C:\ProgramData\NVIDIA Corporation\CUDA Samples\v11.0\bin\win64\Debug\nbody.exe
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========
On each computer, the executable runs correctly and prints some information in a console window.
With the Quadro P2000, the output is:
Run “nbody -benchmark [-numbodies = ]” to measure performance.
-fullscreen (run n-body simulation in fullscreen mode)
-fp64 (use double precision floating point values for simulation)
-hostmem (stores simulation data in host memory)
-benchmark (run benchmark to measure performance)
-numbodies=<N> (number of bodies (>= 1) to run in simulation)
-device=<d> (where d=0,1,2… for the CUDA device to use)
-numdevices=<i> (where i=(number of CUDA devices > 0) to use for simulation)
-compare (compares simulation results running once on the default GPU and once on the CPU)
-cpu (run n-body simulation on the CPU)
-tipsy = <file.bin> (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

Windowed mode
Simulation data stored in video memory
Single precision floating point simulation
1 Devices used for simulation
GPU Device 0: “Pascal” with compute capability 6.1

Compute 6.1 CUDA device: [NVIDIA Quadro P2000]

With the GeForce RTX 3070, the output is:
Run “nbody -benchmark [-numbodies = ]” to measure performance.
-fullscreen (run n-body simulation in fullscreen mode)
-fp64 (use double precision floating point values for simulation)
-hostmem (stores simulation data in host memory)
-benchmark (run benchmark to measure performance)
-numbodies=<N> (number of bodies (>= 1) to run in simulation)
-device=<d> (where d=0,1,2… for the CUDA device to use)
-numdevices=<i> (where i=(number of CUDA devices > 0) to use for simulation)
-compare (compares simulation results running once on the default GPU and once on the CPU)
-cpu (run n-body simulation on the CPU)
-tipsy = <file.bin> (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

Windowed mode
Simulation data stored in video memory
Single precision floating point simulation
1 Devices used for simulation
MapSMtoCores for SM 8.6 is undefined. Default to use 64 Cores / SM
MapSMtoArchName for SM 8.6 is undefined. Default to use Ampere
GPU Device 0: “Ampere” with compute capability 8.6

Now for my question. In the output above, “MapSMtoCores” and “MapSMtoArchName” are reported as undefined for the RTX 3070 (SM 8.6) and fall back to default values. Does this affect performance and, if so, how can I set these parameters to the correct values for each GPU?

Thank you for your attention and for any replies!

You have installed CUDA 11.0. Support for SM 8.6 devices (such as the RTX 3070) requires CUDA 11.1 or later.
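
To illustrate why the message appears: as far as I know, "MapSMtoCores" and "MapSMtoArchName" are not driver or toolkit settings. They are lookups in small tables inside the samples' helper header (helper_cuda.h), keyed by compute capability. When a GPU's SM version is missing from the table shipped with that toolkit, the helper prints the message and uses the last entry it knows. The sketch below is a simplified, standalone C++ imitation of that idea, not the actual header code. All names in it (SmInfo, Lookup, table_without_sm86, table_with_sm86, lookup_sm, describe) are hypothetical, and the tables are deliberately abbreviated.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, abbreviated imitation of an SM-version lookup table.
// It is NOT the code from helper_cuda.h, only a sketch of the same idea.
struct SmInfo {
    int major;              // compute capability, major part
    int minor;              // compute capability, minor part
    int cores_per_sm;       // CUDA cores per multiprocessor
    const char* arch_name;  // marketing architecture name
};

// Table as it might look in a toolkit that predates SM 8.6 (last entry: 8.0).
inline std::vector<SmInfo> table_without_sm86() {
    return {
        {6, 0, 64, "Pascal"},
        {6, 1, 128, "Pascal"},
        {7, 0, 64, "Volta"},
        {7, 5, 64, "Turing"},
        {8, 0, 64, "Ampere"},
    };
}

// Same table with an SM 8.6 entry appended, as in a newer toolkit.
inline std::vector<SmInfo> table_with_sm86() {
    std::vector<SmInfo> t = table_without_sm86();
    t.push_back({8, 6, 128, "Ampere"});
    return t;
}

struct Lookup {
    SmInfo info;
    bool defaulted;  // true when the SM version was not found in the table
};

// Look up an SM version; on a miss, fall back to the last known entry.
inline Lookup lookup_sm(const std::vector<SmInfo>& table, int major, int minor) {
    assert(!table.empty());
    for (const SmInfo& e : table) {
        if (e.major == major && e.minor == minor) {
            return {e, false};
        }
    }
    SmInfo fallback = table.back();  // values of the newest known entry
    fallback.major = major;
    fallback.minor = minor;
    return {fallback, true};
}

// Format the result as a one-line summary.
inline std::string describe(const Lookup& r) {
    std::string s = "SM " + std::to_string(r.info.major) + "." +
                    std::to_string(r.info.minor) + ": " +
                    std::to_string(r.info.cores_per_sm) + " cores/SM, " +
                    r.info.arch_name;
    if (r.defaulted) {
        s += " (defaulted)";
    }
    return s;
}
```

With the older table, lookup_sm(table_without_sm86(), 8, 6) falls back to 64 cores/SM and "Ampere", which matches the two messages in the RTX 3070 output. With the extended table, the same call finds the 8.6 entry (128 cores/SM) and no default is used. The Quadro P2000 (SM 6.1) is found in both tables. Since the lookup only feeds the information the sample prints and its device selection, the fallback by itself should not change how kernels execute. Editing the samples' table would silence the message, but it would not add SM 8.6 code generation to CUDA 11.0.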

OK, thanks for your answer. I now understand what is going wrong between the RTX 3070 and CUDA 11.0. However, I have to keep CUDA 11.0, because it is the version MathWorks recommends for MATLAB R2021a to work correctly with CUDA…
Thanks again.