NVCC on Windows

Question 1 - How do I compile and run a CUDA program with NVCC from the Windows command line?
Question 2 - How do I use deviceQuery from the Windows command line?

Environment - Windows 7, Visual Studio 2015, CUDA 8.0

(1) Do you mean how to compile a CUDA program with nvcc from the Windows command prompt? A simple case, with one source file and one target architecture, would look like this:

nvcc -o [executable_name].exe -arch=[compute_capability] [source_file].cu

For example:

nvcc -o foo.exe -arch=sm_50 foo.cu

(2) Building and running deviceQuery from the Windows command prompt (you will need to adjust the paths to match your directory structure and your MSVC and CUDA versions; I used MSVS 10.0 and CUDA 7.5 here):

C:\Users\Norbert\My Programs>cl /EHsc /I"c:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\INCLUDE" /I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include" /I"C:\Users\All Users\NVIDIA Corporation\CUDA Samples\v7.5\common\inc" deviceQuery.cpp "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\lib\x64\cudart.lib"
Copyright (C) Microsoft Corporation.  All rights reserved.

deviceQuery.cpp
Microsoft (R) Incremental Linker Version 10.00.40219.01
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:deviceQuery.exe
deviceQuery.obj
"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\lib\x64\cudart.lib"

C:\Users\Norbert\My Programs>deviceQuery
deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Quadro K2200"
  CUDA Driver Version / Runtime Version          8.0 / 7.5
  CUDA Capability Major/Minor version number:    5.0
  Total amount of global memory:                 4096 MBytes (4294967296 bytes)
  ( 5) Multiprocessors, (128) CUDA Cores/MP:     640 CUDA Cores
  GPU Max Clock rate:                            1124 MHz (1.12 GHz)
  Memory Clock rate:                             2505 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 2097152 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  CUDA Device Driver Mode (TCC or WDDM):         WDDM (Windows Display Driver Model)
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 7.5, NumDevs = 1, Device0 = Quadro K2200
Result = PASS

Hi,

So when I attempt to execute the command you mentioned from the location (folder) where file.cu exists, the error is as follows:
Error:
LINK: fatal error LNK1104: cannot open file ‘uuid.lib’

Next, I try to execute it from the command prompt path, where the exe is located, i.e. inside the debug folder. The error is as follows:
Error:
c1xx: fatal error C1083: Cannot open source file: ‘file.cu’: No such file or directory

So clearly the location for the second option is wrong.

So please guide me, with the following details:

  1. What is uuid.lib and how can I fix the error?
  2. From which location in the command prompt should I execute the nvcc command you spoke of?
  3. From which location, and with what command, should I execute deviceQuery?

Hi,

I tried the method you suggested for building deviceQuery. Here is the command with the changes I made, followed by the error I am getting:

c:\Users\ns76685w>cl /EHsc /I"C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\include" /I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include" /I"C:\ProgramData\NVIDIA Corporation\CUDA Samples\v8.0\common\inc" deviceQuery.cpp "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\lib\x64\cudart.lib"

Error:

Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24215.1 for x86
Copyright (C) Microsoft Corporation. All rights reserved.

deviceQuery.cpp
c1xx: fatal error C1083: Cannot open source file: ‘deviceQuery.cpp’: No such file or directory

Obviously the source file deviceQuery.cpp must exist in the current directory when using the file name without a path, as I did in my worked example.

Either copy the file deviceQuery.cpp from the CUDA examples directory to your current directory (which is what I did), run the compilation from the directory where the CUDA 8 installation placed this file, or reference the file using the appropriate path.

This is not an issue specific to CUDA or programming under Windows, just basic build mechanics that work pretty much the same for all tool chains and operating systems.

Hi,

So now I made a folder called MyProjects and copied deviceQuery.cpp into that folder. On the command line, I changed the current directory to this folder, where deviceQuery.cpp is located. I then ran the same command, shown below, but I still get an error.

Current Directory:
C:\Users\ns76685w\Documents\Visual Studio 2015\MyProjects

Command:

cl /EHsc /I"C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\include" /I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include" /I"C:\ProgramData\NVIDIA Corporation\CUDA Samples\v8.0\common\inc" deviceQuery.cpp "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\lib\x64\cudart.lib"

Error:
LINK: fatal error LNK1104: Cannot open the file ‘libcpmt.lib’


deviceQuery.cpp (9.72 KB)
MyProjects.zip (15.1 KB)

I don’t know what libcpmt.lib is, presumably it is a library used by MSVC. Have you had a chance to familiarize yourself with command line operation of the Microsoft compiler? In particular, have you set up the environment for the Microsoft compiler, for example by invoking the appropriate batch file (“shell script”) that ships with MSVS? E.g.

"c:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\vcvarsall.bat" amd64

Note that what is relevant for compiling the deviceQuery application is the host compiler, not the CUDA compiler. The only thing CUDA-specific in the build process for deviceQuery is to point the host compiler at the header files and libraries supplied by CUDA. So a good place to start is to make sure that a simple “Hello, World” program can be successfully compiled with the command line version of MSVC.

Hi,

I invoked the batch file as you instructed, and now the set of errors is different. Kindly have a look at the attached error files. I have compiled simpler programs on the MSVC command line, such as Hello World and a few experiments with threads, but I am not able to use the NVCC compiler on the command line.

I need to know how to go about it. Please guide.

Error2.JPG


As I stated, compiling deviceQuery has nothing to do with the CUDA tool chain, or nvcc in particular.

As for your build error (you only show a partial log) you seem to be linking to a CUDART library version that is incompatible with your Windows platform. Do you have a 32-bit or 64-bit Windows platform?

Notice that the batch file invocation I showed above takes the argument “amd64” to indicate a 64-bit system. The batch file may resort to setting up a 32-bit environment if you leave off that argument. If you supply the argument “amd64” the output from the batch file should look similar to this:

Setting environment for using Microsoft Visual Studio 2010 x64 tools.

Note the “x64” indicating that this is set up for a 64-bit Windows platform.

But I cannot compile or run any program with NVCC from the command line.
So are you saying that I cannot run commands with the NVCC compiler, and that it is mandatory to use MSVS for compiling and executing?

I have heard that people are able to use nvcc fluently on Linux. I wanted to use it on Windows, now that I have a GTX 1080, but somehow I cannot figure out where I have gone wrong.
Error2.JPG


Yes, you can compile programs with nvcc from the Windows command prompt, and run them from there. But deviceQuery.cpp is a C++ program (not a CUDA program, note the .cpp extension instead of a .cu extension) that does not require the CUDA toolchain because it does not contain code that is executed on the GPU. The CUDA tool chain is, by design, tightly integrated with the host toolchain, so in order to use the CUDA tool chain you must first have an operational and supported version of the host toolchain installed.

Here is a worked example of building and running a CUDA program from the Windows command line, on my 64-bit Windows 7 machine. Note: error checking is omitted for brevity; do not leave it out of production code.

C:\Users\Norbert\My Programs>cat helloworld.cu
#include <stdio.h>
#include <stdlib.h>

__global__ void kernel (void)
{
    printf ("Hello, world, from the GPU\n");
}

int main (void)
{
    printf ("Hello, world, from the CPU\n");
    kernel<<<1,1>>>();
    cudaDeviceSynchronize();
    return EXIT_SUCCESS;
}

C:\Users\Norbert\My Programs>nvcc -o helloworld.exe -arch=sm_50 helloworld.cu
helloworld.cu
   Creating library helloworld.lib and object helloworld.exp

C:\Users\Norbert\My Programs>helloworld
Hello, world, from the CPU
Hello, world, from the GPU

So, I tried typing cat filename.cu, which threw an error stating that cat is not an internal or external command. Then I typed the code into a text file, saved it as helloworld.cu in the current folder, and tried the nvcc command you suggested. It still threw the error: LINK: fatal error: cannot open file ‘uuid.lib’. Please see the attachment.
Error3.JPG


“cat” works on my machine because I have Cygwin installed. If you plan to do a reasonable amount of programming using command line tools on Windows, I would strongly suggest installing it, so you have all the standard *nix tools available to you. I think the Windows equivalent of “cat” is “type”.

Missing uuid.lib sounds like a familiar problem of not having set up the environment variables for the Microsoft SDK. I think the following may be the kind of path you need to append to the LIB environment variable:

"C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\lib\x64"

Likewise append to the INCLUDE environment variable:

"C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\include"

Also, you should have the WindowsSdkDir variable in the environment:

WindowsSdkDir=C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\

Obviously the specific paths are going to depend on what version you have installed. Note that running the vcvarsall.bat script should have set up these paths, so I am not sure what is going on with your system.

Thank you Norbert.

I set everything up (Cygwin, the Windows SDK paths, etc.) and now at least I am able to execute nvcc commands (see the attachment). Really a sigh of relief. Earlier, the Windows SDK version 7.0 folder did not have include and lib folders, which is probably why I was getting the error. So I tried other folders, and it turned out that version 7.1 had include and lib folders. Let me go ahead and perform more experiments with it.

But deviceQuery is still a mystery.
Actually, the reason I am after deviceQuery is that my GeForce GTX 1080 FTW GPU shows 4 GB of global memory in the CUDA program that I have written. But a user on the internet has published deviceQuery output for his GTX 1080 that indicates 8 GB of total global memory. After seeing the specs, I thought mine should have 8 GB too. That is why I want to build deviceQuery with cl.

Any suggestions?

Regards,
Nikhil
NY,USA

You still have a target architecture mismatch for your host code. The deviceQuery app is apparently built for 32-bit x86, while you are linking to a 64-bit CUDART library.

I assume you are actually on a 64-bit Windows platform? If so, make sure you are picking up the correct version of ‘cl’ (the Microsoft compiler). Your PATH may be set up incorrectly, causing you to pick up the 32-bit compiler instead of the 64-bit compiler. Now that you have Cygwin installed, check the location of ‘cl’ with ‘which’. E.g.:

C:\Users\Norbert\My Programs>which cl
/cygdrive/c/Program Files (x86)/Microsoft Visual Studio 10.0/VC/BIN/amd64/cl

Note the “amd64” in the path which indicates the location of the 64-bit compiler.
