Thrust or NVCC Bug - NVCC hangs

Using Cuda compilation tools, release 9.2, V9.2.148
Using latest Nsight (5.6.0.18146)
Using MSVC 2015

The following snippet hangs indefinitely when I try to compile it. Can you all confirm?

This is roughly based on the Stack Overflow question "segmentation fault on resizing a vector of large structures". I was trying to recreate a stack overflow that I saw with Thrust in real code.

Consider the following:

Kernel.cuh:
struct tmp_t {
int a_data[100000000];
};
int doAdd();

Kernel.cu:
#include "kernel.cuh"
#include <thrust/host_vector.h>

int doAdd()
{
thrust::host_vector<tmp_t> v_tmp;
v_tmp.resize(1);
return v_tmp.size();
}

Main.cpp:
#include "kernel.cuh"
int main() {
int ret = doAdd();
}

The following command hangs indefinitely and chews up memory:
1> C:\Users.…\CuTest>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2\bin\nvcc.exe" -gencode=arch=compute_37,code="sm_37,compute_37" --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\x86_amd64" -x cu -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2\include" -G --keep-dir x64\Debug -maxrregcount=0 --machine 64 --compile -cudart static -g -DWIN32 -DWIN64 -D_DEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /Fdx64\Debug\vc140.pdb /FS /Zi /RTC1 /MDd " -o x64\Debug\kernel.cu.obj "C:\Users.…\CuTest\kernel.cu"

Yes, it seems to take a very long time to compile. According to my test the compiler is using about 4GB for the compilation phase.

I’m not sure that code using Thrust vectors in which each element is 400 MB is sensible in any way.
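For reference, the 400 MB figure follows directly from the struct in the original post. Below is a minimal sketch of that arithmetic, assuming a 4-byte int (which holds for MSVC on x64); element_bytes is a hypothetical helper introduced only for illustration and is not part of the thread's code.

```cpp
#include <cstddef>

// Same layout as the struct in the original post.
struct tmp_t {
    int a_data[100000000];
};

// Hypothetical helper (not from the thread): size of one vector element in bytes.
constexpr std::size_t element_bytes() { return sizeof(tmp_t); }

// Assumption: int is 4 bytes on the target platform.
static_assert(sizeof(int) == 4, "this sketch assumes a 4-byte int");

// 100000000 ints * 4 bytes = 400000000 bytes, i.e. 400 MB per element.
static_assert(element_bytes() == 400000000u, "one element is 400 MB");
```

Both checks are evaluated at compile time, so nothing of this size is ever allocated.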

Feel free to file a bug at developer.nvidia.com.

As a workaround, compile a release project instead of a debug project.

The bug submit form doesn’t seem to work - I get a forbidden error.

File a nearly empty bug report. Then come back and update it later.

There may be a simple workaround available here. If you add an assignment operator, a copy constructor, and a default constructor to the structure, the compile time is okay.

Changing the structure to

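// ASIZE is assumed to be defined as the array length
// (100000000 in the original struct).
struct tmp_t {
int a_data[ASIZE];

// User-provided default constructor; leaves a_data uninitialized.
__host__ __device__ tmp_t () {}
// User-provided copy constructor: element-wise copy.
__host__ __device__ tmp_t (const tmp_t &in) {
  for (int i = 0; i < ASIZE; ++i) {
    a_data[i] = in.a_data[i];
  }
}
// User-provided copy assignment: element-wise copy.
__host__ __device__ tmp_t & operator= (const tmp_t &in) {
  for (int i = 0; i < ASIZE; ++i) {
    a_data[i] = in.a_data[i];
  }
  return *this;
}

};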

fixes the issue here. Can you try this on your end and let us know if it helps?
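To sanity-check the copy semantics of a struct written this way without waiting on a 400 MB type, here is a minimal host-only sketch. It makes several assumptions that are not from the thread: ASIZE is set to a small value (16) purely for the test, the __host__ and __device__ qualifiers are defined as empty when the file is not compiled by nvcc, and copies_round_trip is a hypothetical helper. The assignment operator in this sketch returns *this.

```cpp
#include <cassert>

// When not compiling with nvcc, let the CUDA qualifiers expand to nothing
// so this sketch also builds with a plain C++ compiler.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Assumption: a small size for a quick test; the thread uses 100000000.
#define ASIZE 16

struct tmp_t {
    int a_data[ASIZE];

    __host__ __device__ tmp_t() {}
    __host__ __device__ tmp_t(const tmp_t &in) {
        for (int i = 0; i < ASIZE; ++i) {
            a_data[i] = in.a_data[i];
        }
    }
    __host__ __device__ tmp_t &operator=(const tmp_t &in) {
        for (int i = 0; i < ASIZE; ++i) {
            a_data[i] = in.a_data[i];
        }
        return *this;
    }
};

// Hypothetical helper: returns true if copy construction and copy
// assignment both preserve every element of the array.
inline bool copies_round_trip() {
    tmp_t a;
    for (int i = 0; i < ASIZE; ++i) {
        a.a_data[i] = i * 3;
    }
    tmp_t b(a);  // copy constructor
    tmp_t c;
    c = b;       // copy assignment
    for (int i = 0; i < ASIZE; ++i) {
        if (b.a_data[i] != i * 3 || c.a_data[i] != i * 3) {
            return false;
        }
    }
    return true;
}
```

This only checks that the hand-written members behave like the implicit ones. Whether they also avoid the long compile is something to confirm with the full-size ASIZE under nvcc.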