CUDA compilation error: class template has already been defined, and nvprof invalid records warnings

I posted the same question on Stack Overflow, but it seems nobody there knows the solution.
I hope I can get some help here.

POST:
I am new to CUDA and tried to run a simple vector-add sample that I found online to get started. I am using 64-bit Windows 10 and Visual Studio 2017.

#include "cuda_runtime.h"
#include "cuda.h"
#include "device_launch_parameters.h"
#include <iostream>


#include <math.h>
// Kernel function to add the elements of two arrays
__global__
void add(int n, float *x, float *y)
{
    int index = threadIdx.x;
    int stride = blockDim.x;
    for (int i = index; i < n; i += stride)
        y[i] = x[i] + y[i];
}

int main(void)
{
    int N = 1 << 20;
    float *x, *y;

    // Allocate Unified Memory – accessible from CPU or GPU
    cudaMallocManaged(&x, N * sizeof(float));
    cudaMallocManaged(&y, N * sizeof(float));

    // initialize x and y arrays on the host
    for (int i = 0; i < N; i++) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

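    // Run kernel on 1M elements on the GPU
    add<<<1, 1>>>(N, x, y);

    // Wait for GPU to finish before accessing on host
    cudaDeviceSynchronize();

    // Check for errors (all values should be 3.0f)
    float maxError = 0.0f;
    for (int i = 0; i < N; i++)
        maxError = fmax(maxError, fabs(y[i] - 3.0f));
    std::cout << "Max error: " << maxError << std::endl;

    // Free memory
    cudaFree(x);
    cudaFree(y);

    return 0;
}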

I used the “Developer Command Prompt for VS2017” because the regular Windows command prompt gives me

nvcc fatal : Cannot find compiler 'cl.exe' in PATH

and the solutions I found online didn’t work for me. Then I ran this command (the --compiler-options flag already resolved some of the errors):

nvcc add.cu --compiler-options "-D _WIN64"

but the compiler is still giving me these errors

C:\Programming\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.14.26428\include\type_traits(95): error: class template "std::_Is_function" has already been defined
(this error is printed 24 times)

C:\Programming\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.14.26428\include\type_traits(139): error: class template "std::_Is_memfunptr" has already been defined
(this error is printed 36 times)

C:\Programming\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.14.26428\include\type_traits(1824): error: class template "std::result_of" has already been defined
(this error is printed 2 times)

I have been looking for solutions online. Other people with similar problems seemed to have issues with their included headers, but my sample code was downloaded from the internet and the person who uploaded it didn’t have any problem with it (I don’t see any problem with it either), so I am confused about which part of the program is at fault.

P.S.: I don’t know whether CUDA is installed properly. Details: I wasn’t able to install CUDA on Windows; the installer kept telling me the installation had failed. Then I found a solution at post #19 of this thread: https://devtalk.nvidia.com/default/topic/1035535/cuda-setup-and-installation/cuda-9-2-does-not-work-with-visual-studio-2017-15-7-1/2 It seems to work fine for me, but I don’t know whether it is causing the problem.

Your CUDA install is incorrect.

Follow the instructions here:

CUDA 9.1 cannot install due to failed Visual Studio Integration - CUDA Setup and Installation - NVIDIA Developer Forums

If it still fails to install correctly, it means you’ve not removed every last scrap of NVIDIA software from your system before running the install. Go back and repeat the process. It works as confirmed by many folks in that thread and elsewhere.

Also note that current versions of VS 2017 toolchain will not work with CUDA 9.2. You must use a 15.6 or prior toolchain, which you can get from here:

https://docs.microsoft.com/en-us/visualstudio/productinfo/installing-an-earlier-release-of-vs2017

Since this is not a programming question, further questions in this area should be posted to the CUDA install forum section:

https://devtalk.nvidia.com/default/board/58/cuda-setup-and-installation/

Where you will also find a large number of similar topics.

After install, if you cannot follow the instructions in the windows install guide to build projects and test your CUDA install, your CUDA install is incorrect, and there is not much point in going on to try other methods of compiling/running codes.

I have followed the instructions from your first link to reinstall CUDA 9.2 and VS2017 15.6.7.

After reinstalling a couple of times with the same method, the same error was still there.
Then I repeated the same reinstallation process with CUDA 9.1.

But when I open the “Command Prompt for VS2017” and run

nvcc add.cu -o add_cuda

This error pops up

c:\program files\nvidia gpu computing toolkit\cuda\v9.1\include\crt/host_config.h(135): fatal error C1189: #error:  -- unsupported Microsoft Visual Studio version! Only the versions 2012, 2013, 2015 and 2017 are supported!

The solutions I found online didn’t work for me, such as installing the “VC++ 2015.3 v140 toolset” and modifying line 133 of “host_config.h”…

But I found out by chance that the error doesn’t show up when I use the “VS2015 x64 Native Tools Command Prompt”. (I don’t know why; can someone explain this to me?)
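
A likely explanation, offered as a sketch rather than a confirmed diagnosis: nvcc uses whichever cl.exe the current command prompt puts on PATH, and the C1189 message comes from a preprocessor guard in host_config.h that rejects host compilers whose _MSC_VER lies outside a supported range. The VS2015 prompt selects the VS2015 compiler (_MSC_VER 1900), while recent VS2017 toolsets report higher values (1913 for 15.6, 1914 for 15.7). The small program below prints the value seen by the host compiler and mimics such a range check. The bounds 1600 and 1911 are an assumption about the CUDA 9.1 header, so check your own host_config.h for the real ones; the helper name is hypothetical.

```cuda
// msc_ver_check.cu (hypothetical file name)
#include <cstdio>

// Hypothetical helper: mimics the kind of range guard found in host_config.h.
// The bounds are parameters because the real ones differ between CUDA releases.
bool host_compiler_supported(int msc_ver, int min_ver, int max_ver)
{
    return msc_ver >= min_ver && msc_ver <= max_ver;
}

int main()
{
#ifdef _MSC_VER
    // ASSUMPTION: 1600..1911 is the range accepted by CUDA 9.1.
    const int assumed_min = 1600;
    const int assumed_max = 1911;
    std::printf("_MSC_VER = %d\n", _MSC_VER);
    std::printf("inside assumed CUDA 9.1 range: %s\n",
                host_compiler_supported(_MSC_VER, assumed_min, assumed_max) ? "yes" : "no");
#else
    std::printf("_MSC_VER is not defined; the host compiler is not MSVC\n");
#endif
    return 0;
}
```

The file contains no device code, so it can also be saved as a .cpp file and built directly with cl.exe from each command prompt to compare the values.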

Here is a test showing that it works:

Here is the code:

#include "cuda_runtime.h"
#include "cuda.h"
#include "device_launch_parameters.h"
#include "stdio.h"

__global__ void mykernel() {
	printf("Hello world from GPU!\n");
}

int main(){
	printf("Hello world from CPU!\n");
	mykernel <<<1,1>>> ();
	return 0;
}

(Newbie question: I don’t know why it doesn’t work with “std::cout”; it would be great if someone could explain this too.)
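
On the std::cout question: std::cout is a host-side C++ stream object and cannot be used inside a __global__ or __device__ function; in device code the supported way to print is the printf that CUDA provides. Host code in the same .cu file can still use std::cout. A minimal sketch (the names hello_kernel and launch_hello are hypothetical) that also calls cudaDeviceSynchronize(), so the program waits for the kernel and its printf output before exiting:

```cuda
#include <cstdio>
#include <iostream>
#include <cuda_runtime.h>

// Device code: printf is available here, std::cout is not.
__global__ void hello_kernel()
{
    printf("Hello world from GPU!\n");
}

// Launch the kernel and wait for it; returns the status of the synchronization.
cudaError_t launch_hello()
{
    hello_kernel<<<1, 1>>>();
    // Kernel launches are asynchronous; waiting also flushes the device printf buffer.
    return cudaDeviceSynchronize();
}

int main()
{
    // Host code: std::cout works as in any C++ program.
    std::cout << "Hello world from CPU!" << std::endl;

    cudaError_t err = launch_hello();
    if (err != cudaSuccess) {
        std::cerr << "CUDA error: " << cudaGetErrorString(err) << std::endl;
        return 1;
    }
    return 0;
}
```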

Here are the results:

>nvcc helloworld.cu -o helloworld_cuda
helloworld.cu
   Creating library helloworld_cuda.lib and object helloworld_cuda.exp
>helloworld_cuda
Hello world from CPU!
Hello world from GPU!
>nvprof helloworld_cuda
Hello world from CPU!
==22376== NVPROF is profiling process 22376, command: helloworld_cuda
Hello world from GPU!
==22376== Profiling application: helloworld_cuda
==22376== Warning: Found 63 invalid records in the result.
==22376== Warning: This can happen if device ran out of memory or if a device kernel was stopped due to an assertion.
==22376== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:  100.00%  60.161us         1  60.161us  60.161us  60.161us  mykernel(void)
      API calls:   76.22%  167.30ms         1  167.30ms  167.30ms  167.30ms  cudaLaunch
                   22.87%  50.205ms         1  50.205ms  50.205ms  50.205ms  cuDevicePrimaryCtxRelease
                    0.56%  1.2224ms        32  38.200us     514ns  645.14us  cuDeviceGetAttribute
                    0.23%  501.20us         1  501.20us  501.20us  501.20us  cuModuleUnload
                    0.11%  236.47us         1  236.47us  236.47us  236.47us  cuDeviceGetName
                    0.01%  12.852us         1  12.852us  12.852us  12.852us  cuDeviceTotalMem
                    0.00%  9.7670us         1  9.7670us  9.7670us  9.7670us  cudaConfigureCall
                    0.00%  5.6540us         2  2.8270us  1.5420us  4.1120us  cuDeviceGet
                    0.00%  3.0850us         2  1.5420us     514ns  2.5710us  cuDeviceGetCount

With both the code above and the “add.cu” I showed earlier, warnings show up in the “nvprof” output. Here is the output for add.cu:

==23196== NVPROF is profiling process 23196, command: add_cuda
Max error: 0
==23196== Profiling application: add_cuda
==23196== Warning: Found 62 invalid records in the result.
==23196== Warning: This can happen if device ran out of memory or if a device kernel was stopped due to an assertion.
==23196== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:  100.00%  165.23ms         1  165.23ms  165.23ms  165.23ms  add(int, float*, float*)
      API calls:   39.10%  178.61ms         2  89.305ms  3.9659ms  174.64ms  cudaMallocManaged
                   36.20%  165.40ms         1  165.40ms  165.40ms  165.40ms  cudaDeviceSynchronize
                   12.67%  57.871ms         1  57.871ms  57.871ms  57.871ms  cuDevicePrimaryCtxRelease
                   11.29%  51.574ms         1  51.574ms  51.574ms  51.574ms  cudaLaunch
                    0.53%  2.4130ms         2  1.2065ms  908.85us  1.5041ms  cudaFree
                    0.13%  581.40us        33  17.618us     514ns  273.48us  cuDeviceGetAttribute
                    0.05%  210.76us         1  210.76us  210.76us  210.76us  cuModuleUnload
                    0.04%  164.50us         1  164.50us  164.50us  164.50us  cuDeviceGetName
                    0.00%  8.2250us         1  8.2250us  8.2250us  8.2250us  cuDeviceTotalMem
                    0.00%  4.1120us         1  4.1120us  4.1120us  4.1120us  cudaConfigureCall
                    0.00%  3.5980us         3  1.1990us     514ns  2.5700us  cuDeviceGetCount
                    0.00%  2.5710us         2  1.2850us     514ns  2.0570us  cudaSetupArgument
                    0.00%  1.5420us         2     771ns     514ns  1.0280us  cuDeviceGet

==23196== Unified Memory profiling result:
Device "GeForce GTX 1080 (0)"
   Count  Avg Size  Min Size  Max Size  Total Size  Total Time  Name
    2048  4.0000KB  4.0000KB  4.0000KB  8.000000MB  38.02006ms  Host To Device
     384  32.000KB  32.000KB  32.000KB  12.00000MB  11.05322ms  Device To Host

I don’t know what to do with this warning:

==23196== Warning: Found 62 invalid records in the result.
==23196== Warning: This can happen if device ran out of memory or if a device kernel was stopped due to an assertion.
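
The warning names two possible causes: running out of device memory, or a kernel stopped by an assertion. One way to rule those out, sketched here on the assumption that the program is the add.cu shown earlier, is to check the status returned by every runtime call and by the kernel launch. The sketch also calls cudaDeviceReset() before exiting, which the CUDA profiler documentation recommends so that profile data is flushed; whether that removes the “invalid records” warning on this setup is not confirmed anywhere in this thread. CUDA_CHECK and max_error are hypothetical helpers, not part of the CUDA API.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cmath>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a readable message if a runtime call fails.
#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                         cudaGetErrorString(err_));                     \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

__global__ void add(int n, const float *x, float *y)
{
    int index = threadIdx.x;
    int stride = blockDim.x;
    for (int i = index; i < n; i += stride)
        y[i] = x[i] + y[i];
}

// Hypothetical host-side check: largest deviation from the expected value.
float max_error(const float *y, int n, float expected)
{
    float err = 0.0f;
    for (int i = 0; i < n; i++)
        err = std::fmax(err, std::fabs(y[i] - expected));
    return err;
}

int main()
{
    const int N = 1 << 20;
    float *x = nullptr, *y = nullptr;

    CUDA_CHECK(cudaMallocManaged(&x, N * sizeof(float)));  // reports allocation failures
    CUDA_CHECK(cudaMallocManaged(&y, N * sizeof(float)));
    for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    add<<<1, 1>>>(N, x, y);
    CUDA_CHECK(cudaGetLastError());       // errors detected at launch time
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    std::printf("Max error: %g\n", max_error(y, N, 3.0f));

    CUDA_CHECK(cudaFree(x));
    CUDA_CHECK(cudaFree(y));
    CUDA_CHECK(cudaDeviceReset());        // flush profiling data before exit
    return 0;
}
```

If every check passes and the warning still appears, the two causes named in the message are unlikely to be the explanation, and the profiler and driver versions become the next thing to compare.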

This thread describes the same problem I have: https://devtalk.nvidia.com/default/topic/1028975/cuda-programming-and-performance/always-got-this-warning-when-nvprof-cuda-file-quot-this-can-happen-if-device-ran-out-of-memory-or-if-a-device-kernel-was-stopped-due-to-an-assertion-quot-on-just-hellowworld-gpu/ but it doesn’t provide any solution in the end, and I didn’t find a solution anywhere else. So I will show the details of my setup below for reference:

For deviceQuery, here are the details:

C:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.1\1_Utilities\deviceQuery\../../bin/win64/Debug/deviceQuery.exe Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1080"
  CUDA Driver Version / Runtime Version          9.1 / 9.1
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 8192 MBytes (8589934592 bytes)
  (20) Multiprocessors, (128) CUDA Cores/MP:     2560 CUDA Cores
  GPU Max Clock rate:                            1734 MHz (1.73 GHz)
  Memory Clock rate:                             5005 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 2097152 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  CUDA Device Driver Mode (TCC or WDDM):         WDDM (Windows Display Driver Model)
  Device supports Unified Addressing (UVA):      Yes
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 8 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.1, CUDA Runtime Version = 9.1, NumDevs = 1
Result = PASS

For vectorAdd, here are the details:

When I compiled the .cu file with the “VS2015 x64 Native Tools Command Prompt”, this error showed up:

vectorAdd.cu(25): fatal error C1083: Cannot open include file: 'helper_cuda.h': No such file or directory

Then I tried to build the solution with VS2017, and these errors showed up:

Severity	Code	Description	Project	File	Line	Suppression State
Error	C1189	#error:  -- unsupported Microsoft Visual Studio version! Only the versions 2012, 2013, 2015 and 2017 are supported!	vectorAdd	c:\program files\nvidia gpu computing toolkit\cuda\v9.1\include\crt\host_config.h	135	

Severity	Code	Description	Project	File	Line	Suppression State
Error	MSB3721	The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\bin\nvcc.exe" -gencode=arch=compute_30,code=\"sm_30,compute_30\" -gencode=arch=compute_35,code=\"sm_35,compute_35\" -gencode=arch=compute_37,code=\"sm_37,compute_37\" -gencode=arch=compute_50,code=\"sm_50,compute_50\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" -gencode=arch=compute_60,code=\"sm_60,compute_60\" -gencode=arch=compute_61,code=\"sm_61,compute_61\" -gencode=arch=compute_70,code=\"sm_70,compute_70\" --use-local-env --cl-version 2017 -ccbin "C:\Programming\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\HostX86\x64" -x cu  -I./ -I../../common/inc -I./ -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\/include" -I../../common/inc -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\include"  -G   --keep-dir x64\Debug -maxrregcount=0  --machine 64 --compile -cudart static -Xcompiler "/wd 4819" -g   -DWIN32 -DWIN32 -D_MBCS -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /FS /Zi /RTC1 /MTd " -o x64/Debug/vectorAdd.cu.obj "C:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.1\0_Simple\vectorAdd\vectorAdd.cu"" exited with code 2.	vectorAdd	C:\Programming\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\BuildCustomizations\CUDA 9.1.targets	707

For “nvidia-smi”, here are the details:

Fri Jul 06 22:43:16 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 388.19                 Driver Version: 388.19                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080   WDDM  | 00000000:08:00.0  On |                  N/A |
| 59%   46C    P2    37W / 180W |    446MiB /  8192MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1448    C+G   Insufficient Permissions                   N/A      |
|    0      2860    C+G   ...xperience\NVIDIA GeForce Experience.exe N/A      |
|    0      3672    C+G   Insufficient Permissions                   N/A      |
|    0      5396    C+G   ...o\2017\Community\Common7\IDE\devenv.exe N/A      |
|    0      7308    C+G   ...t_cw5n1h2txyewy\ShellExperienceHost.exe N/A      |
|    0      7520    C+G   ...dows.Cortana_cw5n1h2txyewy\SearchUI.exe N/A      |
|    0      8800    C+G   ...osoft.LockApp_cw5n1h2txyewy\LockApp.exe N/A      |
|    0     16280    C+G   ...DIA GeForce Experience\NVIDIA Share.exe N/A      |
|    0     17312    C+G   ... Files (x86)\Dell Update\DellUpTray.exe N/A      |
+-----------------------------------------------------------------------------+

For “nvprof --version”, here are the details:

nvprof: NVIDIA (R) Cuda command line profiler
Copyright (c) 2012 - 2017 NVIDIA Corporation
Release version 9.1.85 (21)

I hope someone knows why there are so many problems with my setup. I can provide more details if needed.
Thanks for helping!!!

Severity	Code	Description	Project	File	Line	Suppression State
Error	C1189	#error:  -- unsupported Microsoft Visual Studio version! Only the versions 2012, 2013, 2015 and 2017 are supported!

txbob pointed out in #2 why you are likely seeing this error, and what you can do to fix the issue (install a slightly older version of MSVS).

An update on the vectorAdd problem:

This error from building the solution:

Severity	Code	Description	Project	File	Line	Suppression State
Error	C1189	#error:  -- unsupported Microsoft Visual Studio version! Only the versions 2012, 2013, 2015 and 2017 are supported!	vectorAdd	c:\program files\nvidia gpu computing toolkit\cuda\v9.1\include\crt\host_config.h	135	

Severity	Code	Description	Project	File	Line	Suppression State
Error	MSB3721	The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\bin\nvcc.exe" -gencode=arch=compute_30,code=\"sm_30,compute_30\" -gencode=arch=compute_35,code=\"sm_35,compute_35\" -gencode=arch=compute_37,code=\"sm_37,compute_37\" -gencode=arch=compute_50,code=\"sm_50,compute_50\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" -gencode=arch=compute_60,code=\"sm_60,compute_60\" -gencode=arch=compute_61,code=\"sm_61,compute_61\" -gencode=arch=compute_70,code=\"sm_70,compute_70\" --use-local-env --cl-version 2017 -ccbin "C:\Programming\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\HostX86\x64" -x cu  -I./ -I../../common/inc -I./ -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\/include" -I../../common/inc -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\include"  -G   --keep-dir x64\Debug -maxrregcount=0  --machine 64 --compile -cudart static -Xcompiler "/wd 4819" -g   -DWIN32 -DWIN32 -D_MBCS -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /FS /Zi /RTC1 /MTd " -o x64/Debug/vectorAdd.cu.obj "C:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.1\0_Simple\vectorAdd\vectorAdd.cu"" exited with code 2.	vectorAdd	C:\Programming\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\BuildCustomizations\CUDA 9.1.targets	707

was solved by following this link: https://blogs.msdn.microsoft.com/vcblog/2017/11/15/side-by-side-minor-version-msvc-toolsets-in-visual-studio-2017/ and the build succeeded. Here are the results:

1>------ Build started: Project: vectorAdd, Configuration: Debug x64 ------
1>Compiling CUDA source file vectorAdd.cu...
1>
1>C:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.1\0_Simple\vectorAdd>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\bin\nvcc.exe" -gencode=arch=compute_30,code=\"sm_30,compute_30\" -gencode=arch=compute_35,code=\"sm_35,compute_35\" -gencode=arch=compute_37,code=\"sm_37,compute_37\" -gencode=arch=compute_50,code=\"sm_50,compute_50\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" -gencode=arch=compute_60,code=\"sm_60,compute_60\" -gencode=arch=compute_61,code=\"sm_61,compute_61\" -gencode=arch=compute_70,code=\"sm_70,compute_70\" --use-local-env --cl-version 2017 -ccbin "C:\Programming\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.11.25503\bin\HostX86\x64" -x cu -I./ -I../../common/inc -I./ -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\/include" -I../../common/inc -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\include" -G --keep-dir x64\Debug -maxrregcount=0 --machine 64 --compile -cudart static -Xcompiler "/wd 4819" -g -DWIN32 -DWIN32 -D_MBCS -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /FS /Zi /RTC1 /MTd " -o x64/Debug/vectorAdd.cu.obj "C:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.1\0_Simple\vectorAdd\vectorAdd.cu"
1>vectorAdd.cu
1>   Creating library ../../bin/win64/Debug/vectorAdd.lib and object ../../bin/win64/Debug/vectorAdd.exp
1>vectorAdd_vs2017.vcxproj -> C:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.1\0_Simple\vectorAdd\../../bin/win64/Debug/vectorAdd.exe
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========

>vectorAdd
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
>nvprof vectorAdd
[Vector addition of 50000 elements]
==13136== NVPROF is profiling process 13136, command: vectorAdd
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
==13136== Profiling application: vectorAdd
==13136== Warning: Found 65 invalid records in the result.
==13136== Warning: This can happen if device ran out of memory or if a device kernel was stopped due to an assertion.
==13136== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:   70.67%  182.59us         2  91.296us  91.264us  91.329us  [CUDA memcpy HtoD]
                   25.27%  65.280us         1  65.280us  65.280us  65.280us  [CUDA memcpy DtoH]
                    4.06%  10.496us         1  10.496us  10.496us  10.496us  vectorAdd(float const *, float const *, float*, int)
      API calls:   77.91%  191.04ms         3  63.681ms  4.1120us  191.03ms  cudaMalloc
                   20.72%  50.819ms         1  50.819ms  50.819ms  50.819ms  cuDevicePrimaryCtxRelease
                    0.46%  1.1397ms         1  1.1397ms  1.1397ms  1.1397ms  cuModuleUnload
                    0.39%  963.34us        31  31.075us     514ns  506.35us  cuDeviceGetAttribute
                    0.30%  727.90us         3  242.63us  131.08us  371.66us  cudaMemcpy
                    0.12%  290.96us         3  96.985us  28.273us  223.10us  cudaFree
                    0.07%  169.64us         1  169.64us  169.64us  169.64us  cuDeviceGetName
                    0.02%  38.040us         1  38.040us  38.040us  38.040us  cudaLaunch
                    0.00%  8.2250us         1  8.2250us  8.2250us  8.2250us  cuDeviceTotalMem
                    0.00%  3.5990us         3  1.1990us     514ns  2.5710us  cuDeviceGetCount
                    0.00%  3.5980us         1  3.5980us  3.5980us  3.5980us  cudaConfigureCall
                    0.00%  3.5980us         2  1.7990us  1.0280us  2.5700us  cudaSetupArgument
                    0.00%  2.0560us         1  2.0560us  2.0560us  2.0560us  cudaGetLastError
                    0.00%  1.0280us         2     514ns     514ns     514ns  cuDeviceGet

This method let me build the CUDA sample solution successfully, and the resulting “vectorAdd” application runs in both the “Developer Command Prompt for VS 2017” and the “VS2015 x64 Native Tools Command Prompt”.

As I mentioned before, my own “add.cu” can only run in the “VS2015 x64 Native Tools Command Prompt”.
With this method, I am able to build the project solution with VS and run the application in the “Developer Command Prompt for VS 2017”.
P.S. I am still unable to build “add.cu” in the “Developer Command Prompt for VS 2017” using the “nvcc” command.

Here are the results:
(“add.cu” is part of the “test.sln” solution, so the generated application is called “test”.)

>test
Max error: 0
>nvprof test
==24104== NVPROF is profiling process 24104, command: test
Max error: 0
==24104== Profiling application: test
==24104== Warning: Found 63 invalid records in the result.
==24104== Warning: This can happen if device ran out of memory or if a device kernel was stopped due to an assertion.
==24104== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:  100.00%  820.82ms         1  820.82ms  820.82ms  820.82ms  add(int, float*, float*)
      API calls:   75.52%  821.02ms         1  821.02ms  821.02ms  821.02ms  cudaDeviceSynchronize
                   19.21%  208.91ms         2  104.45ms  3.7845ms  205.13ms  cudaMallocManaged
                    4.88%  53.010ms         1  53.010ms  53.010ms  53.010ms  cudaLaunch
                    0.32%  3.4879ms         2  1.7439ms  1.5884ms  1.8994ms  cudaFree
                    0.06%  631.26us        32  19.726us     514ns  308.43us  cuDeviceGetAttribute
                    0.01%  139.82us         1  139.82us  139.82us  139.82us  cuDeviceGetName
                    0.00%  7.7110us         1  7.7110us  7.7110us  7.7110us  cuDeviceTotalMem
                    0.00%  6.6820us         1  6.6820us  6.6820us  6.6820us  cudaConfigureCall
                    0.00%  3.0850us         3  1.0280us     514ns  2.0570us  cuDeviceGetCount
                    0.00%  3.0840us         3  1.0280us     514ns  2.0560us  cudaSetupArgument
                    0.00%  1.0280us         1  1.0280us  1.0280us  1.0280us  cuDeviceGet

==24104== Unified Memory profiling result:
Device "GeForce GTX 1080 (0)"
   Count  Avg Size  Min Size  Max Size  Total Size  Total Time  Name
    2048  4.0000KB  4.0000KB  4.0000KB  8.000000MB  38.50172ms  Host To Device
     384  32.000KB  32.000KB  32.000KB  12.00000MB  15.64681ms  Device To Host

Setting aside the problem of being unable to build with “nvcc” in the “Developer Command Prompt for VS 2017”, the main problem is still the warnings from “nvprof”.

Thanks for reading the long post!

Thanks for the reminder. It was unclear to me at first, but I have now solved that problem.
Do you have any idea about the warnings from “nvprof”?