Error when compiling for architectures > 3.5

I recently switched to a GTX 1070 (from a GTX 760) and wanted to test my code for higher architectures. Unfortunately, when I compile with any architecture higher than sm_35,compute_35 I get the following error:

LNK2001 unresolved external symbol _fatbinwrap_66_tmpxft_00002350_00000000_18_cuda_device_runtime_compute_61_cpp1_ii_8b1a5d37

If I compile with sm_35 or lower, I get no compilation errors, but my program crashes at runtime in a thrust::stable_sort. I have not been able to create a small test application that reproduces this, so I was hoping someone could point me in the right direction.

I’ve tried both CUDA 7.5 and CUDA 8.0RC. Neither installer detected my graphics card, but since it is a newer card I assume this is normal. I’m using Visual Studio 2013 and compile my code in x64 mode, with relocatable device code enabled, into a static library. Disabling relocatable device code, or building an executable instead of a static library, made no difference.

I’ve added the /include and /lib/x64 directories from the CUDA installation to the project, which makes me wonder even more why I still get an unresolved external symbol.

Can you compile the CUDA sample code projects successfully?

You probably have a corrupted VS setup or VS project setup.

Yes, I can compile and run those just fine, even for the higher architectures. I recreated a project file from the CUDA template in VS2013 and threw in all my code, but compilation still results in the same error.

Meanwhile, I also uninstalled both CUDA versions and installed only the 8.0RC. This also leads to the same error.

CUDA 7.5 shouldn’t even know anything about sm_61, so you’ve got some kind of strange install there.

I finally found it. Below is a minimal example that reproduces the error. I compiled it on another machine with only CUDA 7.5 installed. The error is similar to the one stated above, only for compute_52:

unresolved external symbol __fatbinwrap_66_tmpxft_00000888_00000000_17_cuda_device_runtime_compute_52_cpp1_ii_8b1a5d37
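
Both unresolved symbols in this thread contain a `compute_NN` fragment that matches the architecture being targeted (61 in the first report, 52 here). As a small diagnostic aid, the sketch below pulls that fragment out of a symbol name. The helper name `fatbin_arch` is hypothetical and not part of CUDA or Visual Studio, and the symbol layout is assumed only from the two examples quoted in this thread.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of any CUDA tool): returns the digits that
// follow "compute_" in a linker symbol name, or an empty string if the
// symbol contains no such fragment.
std::string fatbin_arch(const std::string& symbol) {
    static const std::regex pattern("compute_([0-9]+)");
    std::smatch match;
    if (std::regex_search(symbol, match, pattern)) {
        return match[1].str();
    }
    return "";
}
```

Passing the symbol from the error above to `fatbin_arch` yields "52", and the symbol from the first post yields "61".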

Compile with the following settings:

  • Generate relocatable device code: true
  • x64
  • application or static library, doesn’t matter.
  • compute_52,sm_52 (or anything higher than 35)

Main.cu

#include <thrust/device_vector.h>
#include <thrust/device_ptr.h>
#include <thrust/sort.h>
#include <thrust/copy.h>

typedef unsigned __int64 uint64;

struct is2to63
{
    __device__ bool operator()(const uint64 a) const
    {
        return a == 9223372036854775808ULL;
    }
};

int main()
{
    const int nData = 10;

    // Fill array with high value
    const uint64 pow2to63 = 9223372036854775808ULL;
    thrust::device_vector<uint64> v_data(nData);
    thrust::fill(v_data.data(), v_data.data() + nData, pow2to63);

    // Parallel stream compaction
    thrust::device_ptr<uint64> d_newEnd = thrust::remove_if(v_data.data(), v_data.data() + nData, is2to63());
}
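
For reference, here is a host-only sketch of what the example above is meant to compute, using `std::vector` and `std::remove_if` in place of the thrust calls. It is not the thrust code and does not exercise the linker problem; it only gives an expected result to compare against. The names `kPow2To63`, `Is2To63` and `surviving_after_compaction` are made up for this illustration. Note that 2^63 is one past the largest signed 64-bit value, so it has to be built as an unsigned constant.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// 2^63 does not fit in a signed 64-bit integer, so build it from an
// unsigned value.
constexpr std::uint64_t kPow2To63 = std::uint64_t{1} << 63;
static_assert(kPow2To63 == 9223372036854775808ULL, "2^63 as unsigned 64-bit");

// Host-side counterpart of the is2to63 predicate.
struct Is2To63 {
    bool operator()(std::uint64_t a) const { return a == kPow2To63; }
};

// Hypothetical helper: fills n elements with 2^63, compacts the range with
// std::remove_if, and returns how many elements survive.
std::size_t surviving_after_compaction(std::size_t n) {
    std::vector<std::uint64_t> data(n, kPow2To63);
    auto new_end = std::remove_if(data.begin(), data.end(), Is2To63());
    return static_cast<std::size_t>(new_end - data.begin());
}
```

With 10 elements, every value matches the predicate, so nothing survives. The thrust version should correspondingly return a `d_newEnd` equal to `v_data.data()`.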

I found that it does compile when you turn off “generate relocatable device code”. However, this would be a major change to the current structure of my project.

The weird thing is, it does compile for architecture compute_30,sm_30. I also ran this code successfully on my GTX 760.

I was able to build your code successfully for cc 5.2 as a release x64 project on Windows with CUDA 7.5 and relocatable device code. Here is my full VS command console output:

1>------ Rebuild All started: Project: t17, Configuration: Release x64 ------
1>  
1>  c:\Users\bob-tosh\documents\visual studio 2013\Projects\t17\t17>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\bin\nvcc.exe" -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\x86_amd64"  -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include"     --keep-dir x64\Release -maxrregcount=0  --machine 64 --compile      -DWIN32 -DWIN64 -DNDEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /O2 /Zi  /MD " -o x64\Release\kernel.cu.obj "c:\Users\bob-tosh\documents\visual studio 2013\Projects\t17\t17\kernel.cu" -clean 
1>  kernel.cu
1>  Compiling CUDA source file kernel.cu...
1>  
1>  c:\Users\bob-tosh\documents\visual studio 2013\Projects\t17\t17>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\bin\nvcc.exe" -gencode=arch=compute_52,code=\"sm_52,compute_52\" --use-local-env --cl-version 2013 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\x86_amd64"  -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include"     --keep-dir x64\Release -maxrregcount=0  --machine 64 --compile -cudart static     -DWIN32 -DWIN64 -DNDEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /O2 /Zi  /MD " -o x64\Release\kernel.cu.obj "c:\Users\bob-tosh\documents\visual studio 2013\Projects\t17\t17\kernel.cu" 
1>  kernel.cu
1>C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/bulk/execution_policy.hpp(241): warning C4267: 'argument' : conversion from 'size_t' to 'thrust::system::cuda::detail::bulk_::parallel_group<thrust::system::cuda::detail::bulk_::agent<0x01>,0x00>::size_type', possible loss of data
1>C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/bulk/execution_policy.hpp(303): warning C4267: 'argument' : conversion from 'size_t' to 'thrust::system::cuda::detail::bulk_::parallel_group<thrust::system::cuda::detail::bulk_::agent<0x01>,0x00>::size_type', possible loss of data
1>C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/bulk/execution_policy.hpp(319): warning C4267: 'argument' : conversion from 'size_t' to 'thrust::system::cuda::detail::bulk_::parallel_group<thrust::system::cuda::detail::bulk_::agent<0x01>,0x00>::size_type', possible loss of data
1>C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/bulk/execution_policy.hpp(439): warning C4267: 'argument' : conversion from 'size_t' to 'thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x01>,0x00>::size_type', possible loss of data
1>C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/bulk/execution_policy.hpp(250): warning C4267: 'argument' : conversion from 'size_t' to 'thrust::system::cuda::detail::bulk_::parallel_group<thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x01>,0x00>,0x00>::size_type', possible loss of data
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/bulk/execution_policy.hpp(625) : see reference to function template instantiation 'thrust::system::cuda::detail::bulk_::parallel_group<thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x01>,0x00>,0x00> thrust::system::cuda::detail::bulk_::par<thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x01>,0x00>>(ExecutionAgent,size_t)' being compiled
1>          with
1>          [
1>              ExecutionAgent=thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x01>,0x00>
1>          ]
1>C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/bulk/detail/cuda_launcher/cuda_launcher.hpp(155): warning C4267: 'return' : conversion from 'size_t' to 'int', possible loss of data
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/bulk/detail/cuda_launcher/cuda_launcher.hpp(148) : while compiling class template member function 'int thrust::system::cuda::detail::bulk_::detail::cuda_launcher_base<0,thrust::system::cuda::detail::bulk_::parallel_group<thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x01>,0x00>,0x00>,Closure>::choose_group_size(int)'
1>          with
1>          [
1>              Closure=thrust::system::cuda::detail::bulk_::detail::closure<thrust::system::cuda::detail::for_each_n_detail::for_each_kernel,thrust::tuple<thrust::system::cuda::detail::bulk_::detail::cursor<0>,thrust::device_ptr<unsigned __int64>,thrust::detail::wrapped_function<thrust::detail::device_generate_functor<thrust::detail::fill_functor<unsigned __int64>>,void>,unsigned int,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type>>
1>          ]
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/bulk/detail/cuda_launcher/cuda_launcher.hpp(305) : see reference to function template instantiation 'int thrust::system::cuda::detail::bulk_::detail::cuda_launcher_base<0,thrust::system::cuda::detail::bulk_::parallel_group<thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x01>,0x00>,0x00>,Closure>::choose_group_size(int)' being compiled
1>          with
1>          [
1>              Closure=thrust::system::cuda::detail::bulk_::detail::closure<thrust::system::cuda::detail::for_each_n_detail::for_each_kernel,thrust::tuple<thrust::system::cuda::detail::bulk_::detail::cursor<0>,thrust::device_ptr<unsigned __int64>,thrust::detail::wrapped_function<thrust::detail::device_generate_functor<thrust::detail::fill_functor<unsigned __int64>>,void>,unsigned int,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type>>
1>          ]
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/bulk/detail/cuda_launcher/cuda_launcher.hpp(228) : see reference to class template instantiation 'thrust::system::cuda::detail::bulk_::detail::cuda_launcher_base<0,thrust::system::cuda::detail::bulk_::parallel_group<thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x01>,0x00>,0x00>,Closure>' being compiled
1>          with
1>          [
1>              Closure=thrust::system::cuda::detail::bulk_::detail::closure<thrust::system::cuda::detail::for_each_n_detail::for_each_kernel,thrust::tuple<thrust::system::cuda::detail::bulk_::detail::cursor<0>,thrust::device_ptr<unsigned __int64>,thrust::detail::wrapped_function<thrust::detail::device_generate_functor<thrust::detail::fill_functor<unsigned __int64>>,void>,unsigned int,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type>>
1>          ]
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/bulk/detail/choose_sizes.inl(41) : see reference to class template instantiation 'thrust::system::cuda::detail::bulk_::detail::cuda_launcher<thrust::system::cuda::detail::bulk_::parallel_group<thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x01>,0x00>,0x00>,Closure>' being compiled
1>          with
1>          [
1>              Closure=thrust::system::cuda::detail::bulk_::detail::closure<thrust::system::cuda::detail::for_each_n_detail::for_each_kernel,thrust::tuple<thrust::system::cuda::detail::bulk_::detail::cursor<0>,thrust::device_ptr<unsigned __int64>,thrust::detail::wrapped_function<thrust::detail::device_generate_functor<thrust::detail::fill_functor<unsigned __int64>>,void>,unsigned int,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type>>
1>          ]
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/bulk/detail/choose_sizes.inl(96) : see reference to function template instantiation 'thrust::pair<int,int> thrust::system::cuda::detail::bulk_::detail::choose_sizes<thrust::system::cuda::detail::bulk_::detail::closure<thrust::system::cuda::detail::for_each_n_detail::for_each_kernel,thrust::tuple<thrust::system::cuda::detail::bulk_::detail::cursor<0>,thrust::device_ptr<T>,thrust::detail::wrapped_function<thrust::detail::device_generate_functor<thrust::detail::fill_functor<T>>,void>,unsigned int,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type>>>(thrust::system::cuda::detail::bulk_::parallel_group<thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x01>,0x00>,0x00>,Closure)' being compiled
1>          with
1>          [
1>              T=unsigned __int64
1>  ,            Closure=thrust::system::cuda::detail::bulk_::detail::closure<thrust::system::cuda::detail::for_each_n_detail::for_each_kernel,thrust::tuple<thrust::system::cuda::detail::bulk_::detail::cursor<0>,thrust::device_ptr<unsigned __int64>,thrust::detail::wrapped_function<thrust::detail::device_generate_functor<thrust::detail::fill_functor<unsigned __int64>>,void>,unsigned int,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type>>
1>          ]
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/for_each.inl(125) : see reference to function template instantiation 'thrust::pair<int,int> thrust::system::cuda::detail::bulk_::choose_sizes<thrust::system::cuda::detail::for_each_n_detail::for_each_kernel,thrust::system::cuda::detail::bulk_::detail::cursor<0>,RandomAccessIterator,thrust::detail::wrapped_function<thrust::detail::device_generate_functor<thrust::detail::fill_functor<unsigned __int64>>,void>,unsigned int>(thrust::system::cuda::detail::bulk_::parallel_group<thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x01>,0x00>,0x00>,Function,Arg1,Arg2,Arg3,Arg4)' being compiled
1>          with
1>          [
1>              RandomAccessIterator=thrust::device_ptr<unsigned __int64>
1>  ,            Function=thrust::system::cuda::detail::for_each_n_detail::for_each_kernel
1>  ,            Arg1=thrust::system::cuda::detail::bulk_::detail::cursor<0>
1>  ,            Arg2=thrust::device_ptr<unsigned __int64>
1>  ,            Arg3=thrust::detail::wrapped_function<thrust::detail::device_generate_functor<thrust::detail::fill_functor<unsigned __int64>>,void>
1>  ,            Arg4=unsigned int
1>          ]
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/detail/allocator/allocator_traits.inl(249) : while compiling class template member function 'void thrust::detail::allocator_traits<Alloc>::deallocate(thrust::detail::no_throw_allocator<thrust::detail::temporary_allocator<T,System>> &,thrust::pointer<unsigned __int64,thrust::system::cuda::detail::tag,thrust::use_default,thrust::use_default>,unsigned __int64)'
1>          with
1>          [
1>              Alloc=thrust::detail::no_throw_allocator<thrust::detail::temporary_allocator<unsigned __int64,thrust::system::cuda::detail::tag>>
1>  ,            T=unsigned __int64
1>  ,            System=thrust::system::cuda::detail::tag
1>          ]
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/detail/contiguous_storage.inl(172) : see reference to function template instantiation 'void thrust::detail::allocator_traits<Alloc>::deallocate(thrust::detail::no_throw_allocator<thrust::detail::temporary_allocator<T,System>> &,thrust::pointer<unsigned __int64,thrust::system::cuda::detail::tag,thrust::use_default,thrust::use_default>,unsigned __int64)' being compiled
1>          with
1>          [
1>              Alloc=thrust::detail::no_throw_allocator<thrust::detail::temporary_allocator<unsigned __int64,thrust::system::cuda::detail::tag>>
1>  ,            T=unsigned __int64
1>  ,            System=thrust::system::cuda::detail::tag
1>          ]
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/detail/contiguous_storage.inl(169) : while compiling class template member function 'void thrust::detail::contiguous_storage<T,thrust::detail::no_throw_allocator<thrust::detail::temporary_allocator<T,System>>>::deallocate(void)'
1>          with
1>          [
1>              T=unsigned __int64
1>  ,            System=thrust::system::cuda::detail::tag
1>          ]
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/detail/contiguous_storage.inl(64) : see reference to function template instantiation 'void thrust::detail::contiguous_storage<T,thrust::detail::no_throw_allocator<thrust::detail::temporary_allocator<T,System>>>::deallocate(void)' being compiled
1>          with
1>          [
1>              T=unsigned __int64
1>  ,            System=thrust::system::cuda::detail::tag
1>          ]
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/detail/contiguous_storage.inl(38) : while compiling class template member function 'thrust::detail::contiguous_storage<T,thrust::detail::no_throw_allocator<thrust::detail::temporary_allocator<T,System>>>::contiguous_storage(const thrust::detail::no_throw_allocator<thrust::detail::temporary_allocator<T,System>> &)'
1>          with
1>          [
1>              T=unsigned __int64
1>  ,            System=thrust::system::cuda::detail::tag
1>          ]
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/detail/temporary_array.inl(131) : see reference to function template instantiation 'thrust::detail::contiguous_storage<T,thrust::detail::no_throw_allocator<thrust::detail::temporary_allocator<T,System>>>::contiguous_storage(const thrust::detail::no_throw_allocator<thrust::detail::temporary_allocator<T,System>> &)' being compiled
1>          with
1>          [
1>              T=unsigned __int64
1>  ,            System=thrust::system::cuda::detail::tag
1>          ]
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/detail/contiguous_storage.inl(90) : while compiling class template member function 'thrust::detail::normal_iterator<thrust::pointer<unsigned __int64,thrust::system::cuda::detail::tag,thrust::use_default,thrust::use_default>> thrust::detail::contiguous_storage<T,thrust::detail::no_throw_allocator<thrust::detail::temporary_allocator<T,System>>>::begin(void)'
1>          with
1>          [
1>              T=unsigned __int64
1>  ,            System=thrust::system::cuda::detail::tag
1>          ]
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/detail/generic/remove.inl(89) : see reference to function template instantiation 'thrust::detail::normal_iterator<thrust::pointer<unsigned __int64,thrust::system::cuda::detail::tag,thrust::use_default,thrust::use_default>> thrust::detail::contiguous_storage<T,thrust::detail::no_throw_allocator<thrust::detail::temporary_allocator<T,System>>>::begin(void)' being compiled
1>          with
1>          [
1>              T=unsigned __int64
1>  ,            System=thrust::system::cuda::detail::tag
1>          ]
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/detail/temporary_array.h(38) : see reference to class template instantiation 'thrust::detail::contiguous_storage<T,thrust::detail::no_throw_allocator<thrust::detail::temporary_allocator<T,System>>>' being compiled
1>          with
1>          [
1>              T=unsigned __int64
1>  ,            System=thrust::system::cuda::detail::tag
1>          ]
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/detail/generic/remove.inl(86) : see reference to class template instantiation 'thrust::detail::temporary_array<unsigned __int64,DerivedPolicy>' being compiled
1>          with
1>          [
1>              DerivedPolicy=thrust::system::cuda::detail::tag
1>          ]
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/detail/remove.inl(76) : see reference to function template instantiation 'ForwardIterator thrust::system::detail::generic::remove_if<thrust::system::cuda::detail::tag,ForwardIterator,Predicate>(thrust::execution_policy<thrust::system::cuda::detail::tag> &,ForwardIterator,ForwardIterator,Predicate)' being compiled
1>          with
1>          [
1>              ForwardIterator=thrust::device_ptr<unsigned __int64>
1>  ,            Predicate=is2to63
1>          ]
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/detail/remove.inl(181) : see reference to function template instantiation 'ForwardIterator thrust::remove_if<DerivedPolicy,ForwardIterator,Predicate>(const thrust::detail::execution_policy_base<DerivedPolicy> &,ForwardIterator,ForwardIterator,Predicate)' being compiled
1>          with
1>          [
1>              ForwardIterator=thrust::device_ptr<unsigned __int64>
1>  ,            DerivedPolicy=thrust::system::cuda::detail::tag
1>  ,            Predicate=is2to63
1>          ]
1>          c:/Users/bob-tosh/documents/visual studio 2013/Projects/t17/t17/kernel.cu(26) : see reference to function template instantiation 'ForwardIterator thrust::remove_if<thrust::device_ptr<T>,is2to63>(ForwardIterator,ForwardIterator,Predicate)' being compiled
1>          with
1>          [
1>              ForwardIterator=thrust::device_ptr<unsigned __int64>
1>  ,            T=unsigned __int64
1>  ,            Predicate=is2to63
1>          ]
1>C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/bulk/detail/cuda_launcher/cuda_launcher.hpp(85): warning C4267: 'return' : conversion from 'size_t' to 'int', possible loss of data
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/bulk/detail/cuda_launcher/cuda_launcher.hpp(84) : while compiling class template member function 'int thrust::system::cuda::detail::bulk_::detail::cuda_launcher_base<0,thrust::system::cuda::detail::bulk_::parallel_group<thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x01>,0x00>,0x00>,Closure>::max_active_blocks_per_multiprocessor(const thrust::system::cuda::detail::bulk_::detail::device_properties_t &,const thrust::system::cuda::detail::bulk_::detail::function_attributes_t &,int,int)'
1>          with
1>          [
1>              Closure=thrust::system::cuda::detail::bulk_::detail::closure<thrust::system::cuda::detail::for_each_n_detail::for_each_kernel,thrust::tuple<thrust::system::cuda::detail::bulk_::detail::cursor<0>,thrust::device_ptr<unsigned __int64>,thrust::detail::wrapped_function<thrust::detail::device_generate_functor<thrust::detail::fill_functor<unsigned __int64>>,void>,unsigned int,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type>>
1>          ]
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/bulk/detail/cuda_launcher/cuda_launcher.hpp(96) : see reference to function template instantiation 'int thrust::system::cuda::detail::bulk_::detail::cuda_launcher_base<0,thrust::system::cuda::detail::bulk_::parallel_group<thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x01>,0x00>,0x00>,Closure>::max_active_blocks_per_multiprocessor(const thrust::system::cuda::detail::bulk_::detail::device_properties_t &,const thrust::system::cuda::detail::bulk_::detail::function_attributes_t &,int,int)' being compiled
1>          with
1>          [
1>              Closure=thrust::system::cuda::detail::bulk_::detail::closure<thrust::system::cuda::detail::for_each_n_detail::for_each_kernel,thrust::tuple<thrust::system::cuda::detail::bulk_::detail::cursor<0>,thrust::device_ptr<unsigned __int64>,thrust::detail::wrapped_function<thrust::detail::device_generate_functor<thrust::detail::fill_functor<unsigned __int64>>,void>,unsigned int,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type>>
1>          ]
1>C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/detail/pair.inl(46): warning C4267: 'initializing' : conversion from 'size_t' to 'int', possible loss of data
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/bulk/detail/cuda_launcher/cuda_launcher.hpp(101) : see reference to function template instantiation 'thrust::pair<int,int>::pair<size_t,int>(const thrust::pair<size_t,int> &)' being compiled
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/bulk/detail/cuda_launcher/cuda_launcher.hpp(101) : see reference to function template instantiation 'thrust::pair<int,int>::pair<size_t,int>(const thrust::pair<size_t,int> &)' being compiled
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/bulk/detail/cuda_launcher/cuda_launcher.hpp(94) : while compiling class template member function 'thrust::pair<int,int> thrust::system::cuda::detail::bulk_::detail::cuda_launcher_base<0,thrust::system::cuda::detail::bulk_::parallel_group<thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x01>,0x00>,0x00>,Closure>::dynamic_smem_occupancy_limit(const thrust::system::cuda::detail::bulk_::detail::device_properties_t &,const thrust::system::cuda::detail::bulk_::detail::function_attributes_t &,int,int)'
1>          with
1>          [
1>              Closure=thrust::system::cuda::detail::bulk_::detail::closure<thrust::system::cuda::detail::for_each_n_detail::for_each_kernel,thrust::tuple<thrust::system::cuda::detail::bulk_::detail::cursor<0>,thrust::device_ptr<unsigned __int64>,thrust::detail::wrapped_function<thrust::detail::device_generate_functor<thrust::detail::fill_functor<unsigned __int64>>,void>,unsigned int,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type>>
1>          ]
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/bulk/detail/cuda_launcher/cuda_launcher.hpp(119) : see reference to function template instantiation 'thrust::pair<int,int> thrust::system::cuda::detail::bulk_::detail::cuda_launcher_base<0,thrust::system::cuda::detail::bulk_::parallel_group<thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x01>,0x00>,0x00>,Closure>::dynamic_smem_occupancy_limit(const thrust::system::cuda::detail::bulk_::detail::device_properties_t &,const thrust::system::cuda::detail::bulk_::detail::function_attributes_t &,int,int)' being compiled
1>          with
1>          [
1>              Closure=thrust::system::cuda::detail::bulk_::detail::closure<thrust::system::cuda::detail::for_each_n_detail::for_each_kernel,thrust::tuple<thrust::system::cuda::detail::bulk_::detail::cursor<0>,thrust::device_ptr<unsigned __int64>,thrust::detail::wrapped_function<thrust::detail::device_generate_functor<thrust::detail::fill_functor<unsigned __int64>>,void>,unsigned int,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type>>
1>          ]
1>C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/bulk/execution_policy.hpp(458): warning C4267: 'argument' : conversion from 'size_t' to 'thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x03>,0x0200>::size_type', possible loss of data
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/bulk/execution_policy.hpp(667) : see reference to function template instantiation 'thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x03>,0x0200> thrust::system::cuda::detail::bulk_::con<0x0200,0x03>(size_t)' being compiled
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/scan.inl(235) : see reference to function template instantiation 'thrust::system::cuda::detail::bulk_::async_launch<thrust::system::cuda::detail::bulk_::parallel_group<thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x03>,0x0200>,0x00>> thrust::system::cuda::detail::bulk_::grid<0x0200,3>(size_t,size_t,cudaStream_t)' being compiled
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/scan.inl(406) : see reference to function template instantiation 'OutputIterator thrust::system::cuda::detail::scan_detail::inclusive_scan<thrust::system::cuda::detail::tag,InputIterator,OutputIterator,AssociativeOperator>(thrust::system::cuda::detail::execution_policy<thrust::system::cuda::detail::tag> &,InputIterator,InputIterator,OutputIterator,AssociativeOperator)' being compiled
1>          with
1>          [
1>              OutputIterator=thrust::detail::normal_iterator<thrust::pointer<__int64,thrust::system::cuda::detail::tag,thrust::use_default,thrust::use_default>>
1>  ,            InputIterator=thrust::detail::normal_iterator<thrust::pointer<__int64,thrust::system::cuda::detail::tag,thrust::use_default,thrust::use_default>>
1>  ,            AssociativeOperator=thrust::plus<__int64>
1>          ]
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/iterator/iterator_facade.h(309) : while compiling class template member function 'thrust::reference<Element,thrust::pointer<Element,thrust::system::cuda::detail::tag,thrust::use_default,thrust::use_default>,thrust::use_default> thrust::iterator_facade<Derived,__int64,thrust::system::cuda::detail::tag,thrust::random_access_traversal_tag,thrust::reference<Element,thrust::pointer<Element,thrust::system::cuda::detail::tag,thrust::use_default,thrust::use_default>,thrust::use_default>,__int64>::operator *(void) const'
1>          with
1>          [
1>              Element=__int64
1>  ,            Derived=thrust::detail::normal_iterator<thrust::pointer<__int64,thrust::system::cuda::detail::tag,thrust::use_default,thrust::use_default>>
1>          ]
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/iterator/iterator_facade.h(328) : see reference to function template instantiation 'thrust::reference<Element,thrust::pointer<Element,thrust::system::cuda::detail::tag,thrust::use_default,thrust::use_default>,thrust::use_default> thrust::iterator_facade<Derived,__int64,thrust::system::cuda::detail::tag,thrust::random_access_traversal_tag,thrust::reference<Element,thrust::pointer<Element,thrust::system::cuda::detail::tag,thrust::use_default,thrust::use_default>,thrust::use_default>,__int64>::operator *(void) const' being compiled
1>          with
1>          [
1>              Element=__int64
1>  ,            Derived=thrust::detail::normal_iterator<thrust::pointer<__int64,thrust::system::cuda::detail::tag,thrust::use_default,thrust::use_default>>
1>          ]
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/iterator/iterator_adaptor.h(121) : see reference to class template instantiation 'thrust::iterator_facade<Derived,__int64,thrust::system::cuda::detail::tag,thrust::random_access_traversal_tag,thrust::reference<Element,thrust::pointer<Element,thrust::system::cuda::detail::tag,thrust::use_default,thrust::use_default>,thrust::use_default>,__int64>' being compiled
1>          with
1>          [
1>              Derived=thrust::detail::normal_iterator<thrust::pointer<__int64,thrust::system::cuda::detail::tag,thrust::use_default,thrust::use_default>>
1>  ,            Element=__int64
1>          ]
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/iterator/detail/normal_iterator.h(36) : see reference to class template instantiation 'thrust::iterator_adaptor<thrust::detail::normal_iterator<thrust::pointer<__int64,thrust::system::cuda::detail::tag,thrust::use_default,thrust::use_default>>,Pointer,thrust::use_default,thrust::use_default,thrust::use_default,thrust::use_default,thrust::use_default>' being compiled
1>          with
1>          [
1>              Pointer=thrust::pointer<__int64,thrust::system::cuda::detail::tag,thrust::use_default,thrust::use_default>
1>          ]
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/detail/tuple.inl(256) : see reference to class template instantiation 'thrust::detail::normal_iterator<thrust::pointer<__int64,thrust::system::cuda::detail::tag,thrust::use_default,thrust::use_default>>' being compiled
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/detail/tuple.inl(257) : see reference to class template instantiation 'thrust::detail::cons<T0,thrust::detail::cons<__int64,thrust::detail::cons<T0,thrust::detail::cons<thrust::plus<__int64>,thrust::detail::map_tuple_to_cons<thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type>::type>>>>' being compiled
1>          with
1>          [
1>              T0=thrust::detail::normal_iterator<thrust::pointer<__int64,thrust::system::cuda::detail::tag,thrust::use_default,thrust::use_default>>
1>          ]
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/tuple.h(197) : see reference to class template instantiation 'thrust::detail::cons<T0,thrust::detail::cons<thrust::detail::normal_iterator<thrust::pointer<__int64,thrust::system::cuda::detail::tag,thrust::use_default,thrust::use_default>>,thrust::detail::cons<__int64,thrust::detail::cons<thrust::detail::normal_iterator<thrust::pointer<__int64,thrust::system::cuda::detail::tag,thrust::use_default,thrust::use_default>>,thrust::detail::cons<thrust::plus<__int64>,thrust::detail::map_tuple_to_cons<thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type>::type>>>>>' being compiled
1>          with
1>          [
1>              T0=thrust::system::cuda::detail::bulk_::detail::cursor<1>
1>          ]
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/bulk/detail/closure.hpp(70) : see reference to class template instantiation 'thrust::tuple<thrust::system::cuda::detail::bulk_::detail::cursor<1>,thrust::detail::normal_iterator<thrust::pointer<__int64,thrust::system::cuda::detail::tag,thrust::use_default,thrust::use_default>>,__int64,thrust::detail::normal_iterator<thrust::pointer<__int64,thrust::system::cuda::detail::tag,thrust::use_default,thrust::use_default>>,thrust::plus<__int64>,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type>' being compiled
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/bulk/detail/cuda_task.hpp(61) : see reference to class template instantiation 'thrust::system::cuda::detail::bulk_::detail::closure<thrust::system::cuda::detail::scan_detail::inclusive_scan_n,thrust::tuple<thrust::system::cuda::detail::bulk_::detail::cursor<1>,thrust::detail::normal_iterator<thrust::pointer<__int64,thrust::system::cuda::detail::tag,thrust::use_default,thrust::use_default>>,__int64,thrust::detail::normal_iterator<thrust::pointer<__int64,thrust::system::cuda::detail::tag,thrust::use_default,thrust::use_default>>,thrust::plus<__int64>,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type>>' being compiled
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/bulk/detail/cuda_task.hpp(202) : see reference to class template instantiation 'thrust::system::cuda::detail::bulk_::detail::task_base<thrust::system::cuda::detail::bulk_::parallel_group<thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x03>,0x0200>,0x00>,Closure>' being compiled
1>          with
1>          [
1>              Closure=thrust::system::cuda::detail::bulk_::detail::closure<thrust::system::cuda::detail::scan_detail::inclusive_scan_n,thrust::tuple<thrust::system::cuda::detail::bulk_::detail::cursor<1>,thrust::detail::normal_iterator<thrust::pointer<__int64,thrust::system::cuda::detail::tag,thrust::use_default,thrust::use_default>>,__int64,thrust::detail::normal_iterator<thrust::pointer<__int64,thrust::system::cuda::detail::tag,thrust::use_default,thrust::use_default>>,thrust::plus<__int64>,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type>>
1>          ]
1>          c:\users\bob-tosh\appdata\local\temp\tmpxft_00000bf0_00000000-2_kernel.cudafe1.stub.c(119) : see reference to class template instantiation 'thrust::system::cuda::detail::bulk_::detail::cuda_task<thrust::system::cuda::detail::bulk_::parallel_group<thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x03>,0x0200>,0x00>,thrust::system::cuda::detail::bulk_::detail::closure<thrust::system::cuda::detail::scan_detail::inclusive_scan_n,thrust::tuple<thrust::system::cuda::detail::bulk_::detail::cursor<1>,thrust::detail::normal_iterator<thrust::pointer<__int64,thrust::system::cuda::detail::tag,thrust::use_default,thrust::use_default>>,__int64,thrust::detail::normal_iterator<thrust::pointer<__int64,thrust::system::cuda::detail::tag,thrust::use_default,thrust::use_default>>,thrust::plus<__int64>,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type>>>' being compiled
1>C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/bulk/execution_policy.hpp(458): warning C4267: 'argument' : conversion from 'size_t' to 'thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x09>,0x080>::size_type', possible loss of data
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/bulk/execution_policy.hpp(667) : see reference to function template instantiation 'thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x09>,0x080> thrust::system::cuda::detail::bulk_::con<0x080,0x09>(size_t)' being compiled
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/scan.inl(259) : see reference to function template instantiation 'thrust::system::cuda::detail::bulk_::async_launch<thrust::system::cuda::detail::bulk_::parallel_group<thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x09>,0x080>,0x00>> thrust::system::cuda::detail::bulk_::grid<0x080,0x09>(size_t,size_t,cudaStream_t)' being compiled
1>C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/bulk/execution_policy.hpp(458): warning C4267: 'argument' : conversion from 'size_t' to 'thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x03>,0x0100>::size_type', possible loss of data
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/bulk/execution_policy.hpp(667) : see reference to function template instantiation 'thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x03>,0x0100> thrust::system::cuda::detail::bulk_::con<0x0100,0x03>(size_t)' being compiled
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/scan.inl(267) : see reference to function template instantiation 'thrust::system::cuda::detail::bulk_::async_launch<thrust::system::cuda::detail::bulk_::parallel_group<thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x03>,0x0100>,0x00>> thrust::system::cuda::detail::bulk_::grid<0x0100,0x03>(size_t,size_t,cudaStream_t)' being compiled
1>C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/bulk/execution_policy.hpp(250): warning C4267: 'argument' : conversion from 'size_t' to 'thrust::system::cuda::detail::bulk_::parallel_group<thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x03>,0x0200>,0x00>::size_type', possible loss of data
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/bulk/execution_policy.hpp(311) : see reference to function template instantiation 'thrust::system::cuda::detail::bulk_::parallel_group<thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x03>,0x0200>,0x00> thrust::system::cuda::detail::bulk_::par<ExecutionAgent>(ExecutionAgent,size_t)' being compiled
1>          with
1>          [
1>              ExecutionAgent=thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x03>,0x0200>
1>          ]
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/bulk/execution_policy.hpp(667) : see reference to function template instantiation 'thrust::system::cuda::detail::bulk_::async_launch<thrust::system::cuda::detail::bulk_::parallel_group<thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x03>,0x0200>,0x00>> thrust::system::cuda::detail::bulk_::par<thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x03>,0x0200>>(cudaStream_t,ExecutionAgent,size_t)' being compiled
1>          with
1>          [
1>              ExecutionAgent=thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x03>,0x0200>
1>          ]
1>C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/bulk/execution_policy.hpp(250): warning C4267: 'argument' : conversion from 'size_t' to 'thrust::system::cuda::detail::bulk_::parallel_group<thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x09>,0x080>,0x00>::size_type', possible loss of data
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/bulk/execution_policy.hpp(311) : see reference to function template instantiation 'thrust::system::cuda::detail::bulk_::parallel_group<thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x09>,0x080>,0x00> thrust::system::cuda::detail::bulk_::par<ExecutionAgent>(ExecutionAgent,size_t)' being compiled
1>          with
1>          [
1>              ExecutionAgent=thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x09>,0x080>
1>          ]
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/bulk/execution_policy.hpp(667) : see reference to function template instantiation 'thrust::system::cuda::detail::bulk_::async_launch<thrust::system::cuda::detail::bulk_::parallel_group<thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x09>,0x080>,0x00>> thrust::system::cuda::detail::bulk_::par<thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x09>,0x080>>(cudaStream_t,ExecutionAgent,size_t)' being compiled
1>          with
1>          [
1>              ExecutionAgent=thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x09>,0x080>
1>          ]
1>C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/bulk/execution_policy.hpp(250): warning C4267: 'argument' : conversion from 'size_t' to 'thrust::system::cuda::detail::bulk_::parallel_group<thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x03>,0x0100>,0x00>::size_type', possible loss of data
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/bulk/execution_policy.hpp(311) : see reference to function template instantiation 'thrust::system::cuda::detail::bulk_::parallel_group<thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x03>,0x0100>,0x00> thrust::system::cuda::detail::bulk_::par<ExecutionAgent>(ExecutionAgent,size_t)' being compiled
1>          with
1>          [
1>              ExecutionAgent=thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x03>,0x0100>
1>          ]
1>          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\thrust/system/cuda/detail/bulk/execution_policy.hpp(667) : see reference to function template instantiation 'thrust::system::cuda::detail::bulk_::async_launch<thrust::system::cuda::detail::bulk_::parallel_group<thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x03>,0x0100>,0x00>> thrust::system::cuda::detail::bulk_::par<thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x03>,0x0100>>(cudaStream_t,ExecutionAgent,size_t)' being compiled
1>          with
1>          [
1>              ExecutionAgent=thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<0x03>,0x0100>
1>          ]
1>  LINK : /LTCG specified but no code generation required; remove /LTCG from the link command line to improve linker performance
1>  t17.vcxproj -> c:\Users\bob-tosh\documents\visual studio 2013\Projects\t17\x64\Release\t17.exe
1>  copy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\bin\cudart*.dll" "c:\Users\bob-tosh\documents\visual studio 2013\Projects\t17\x64\Release\"
1>  C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\bin\cudart32_75.dll
1>  C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\bin\cudart64_75.dll
1>          2 file(s) copied.
========== Rebuild All: 1 succeeded, 0 failed, 0 skipped ==========

I opened a new CUDA 7.5 runtime project, dropped your code into the existing kernel.cu, replacing the code that was there, then changed the project settings:

  1. Change from debug/win32 to release/x64.
  2. In project properties, turn on “generate relocatable device code” and also change from the project default of compute_20,sm_20 to compute_52,sm_52 (and turn off “inherit from project default”).

Then I rebuilt the project.

Thanks for your time.

Following the same steps, I get exactly the same output if I disable “generate relocatable device code”.

If enabled, I get this little addition at the end:

1>CudaLink:
1>  
1>  D:\Projects\CudaTest3>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\bin\nvcc.exe" -dlink -o x64\Release\CudaTest3.device-link.obj -Xcompiler "/EHsc /W3 /nologo /O2 /Zi  /MD " -L"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\lib\x64" cudart.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib  -gencode=arch=compute_52,code=sm_52  --machine 64 x64\Release\kernel.cu.obj 
1>  cudart.lib
1>  kernel32.lib
1>  user32.lib
1>  gdi32.lib
1>  winspool.lib
1>  comdlg32.lib
1>  advapi32.lib
1>  shell32.lib
1>  ole32.lib
1>  oleaut32.lib
1>  uuid.lib
1>  odbc32.lib
1>  odbccp32.lib
1>  kernel.cu.obj
1>CudaTest3.device-link.obj : error LNK2001: unresolved external symbol __fatbinwrap_66_tmpxft_00000888_00000000_17_cuda_device_runtime_compute_52_cpp1_ii_8b1a5d37
1>D:\Projects\CudaTest3\x64\Release\CudaTest3.exe : fatal error LNK1120: 1 unresolved externals
1>
1>Build FAILED.
1>
1>Time Elapsed 00:00:28.96
========== Rebuild All: 0 succeeded, 1 failed, 0 skipped ==========

The only thing the two machines I tested on have in common is that both have VS2013 and VS2015 installed on Windows 10. I used only VS2013 to run these examples, though. Any further suggestions I could investigate?

Sorry, my previous output was indeed for the case where I had not turned on “generate relocatable device code”. After I turned that on, I was able to get the same error you are seeing.

Devices of compute capability 3.5 or higher support dynamic parallelism. It’s possible for code to have different code paths depending on what architecture is being compiled for. The thrust code you are compiling may be an example of that. If this code path uses any of the device runtime API:

Programming Guide :: CUDA Toolkit Documentation

then it’s necessary to link against cudadevrt.lib (on Windows):

http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compiling-and-linking

In any event, according to my testing, the fix to remove the link error is to add cudadevrt.lib as an additional dependency to the Linker input.

Specifically, Project…Properties…Linker (not CUDA linker)…Input…Additional Dependencies.

Here you should see a list of libraries already being linked against, such as cudart.lib, kernel32.lib, etc.

Add cudadevrt.lib to the beginning of this list.

I think that will make this link error go away.
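
As a rough illustration of the kind of architecture-dependent code path described above, here is a small sketch. It is not thrust's actual code: the kernel names `parent` and `child` and the file name `sketch.cu` are made up for the example. On sm_35 and higher the parent kernel launches a child kernel from the device, which uses the device runtime; on older architectures it falls back to doing the work itself.

```cuda
// Sketch only: shows why an sm_35+ build with relocatable device code
// can need cudadevrt. Kernel names are hypothetical, not taken from thrust.
//
// Assumed command-line build (the VS equivalent is the linker setting above):
//   nvcc -arch=sm_52 -rdc=true sketch.cu -o sketch -lcudadevrt
#include <cstdio>
#include <cuda_runtime.h>

// Child kernel: each thread writes its own index.
__global__ void child(int *out)
{
    out[threadIdx.x] = threadIdx.x;
}

__global__ void parent(int *out, int n)
{
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 350
    // Device-side launch (dynamic parallelism). This path uses the
    // device runtime, so the final link needs cudadevrt.
    if (threadIdx.x == 0)
        child<<<1, n>>>(out);
#else
    // Fallback for older architectures: no device runtime involved.
    if (threadIdx.x < n)
        out[threadIdx.x] = threadIdx.x;
#endif
}

int main()
{
    const int n = 8;
    int *d_out = 0;
    cudaMalloc(&d_out, n * sizeof(int));

    parent<<<1, n>>>(d_out, n);
    cudaError_t err = cudaDeviceSynchronize();

    int h_out[n];
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    printf("status: %s, last element: %d\n", cudaGetErrorString(err), h_out[n - 1]);

    cudaFree(d_out);
    return 0;
}
```

Leaving `-lcudadevrt` off the command line (or cudadevrt.lib out of the linker input in Visual Studio) for a build like this should produce the same kind of unresolved `cuda_device_runtime` symbol as in the error above.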

Thanks, that is indeed the solution. It compiles just fine now!