Using Thrust and rtBufferGetDevicePointer

I am using a modified version of Sample6. After my trace (launch()) I want to find the max and min of a buffer I wrote to in my kernel. I want to use Thrust’s sort() function to accomplish this, but I keep on getting a compile error from Visual Studio 2012.

In my header file MeshViewer.h:

Buffer m_buffer;

In my source file in MeshViewer::initContext():

m_buffer = context->createBuffer( RT_BUFFER_INPUT_OUTPUT | RT_BUFFER_GPU_LOCAL, RT_FORMAT_FLOAT,
		WIDTH, HEIGHT );
m_context["buffer"]->set( m_buffer );

During MeshViewer::trace():

context->launch();
reduceBuffer();

And in MeshViewer::reduceBuffer():

786	float *device_ptr = rtGetBDP<float*>(m_context, m_context["buffer"]->getBuffer()->get(), 0);
787	thrust::device_ptr<float> thrust_dev_ptr(device_ptr);
788	thrust::sort( thrust_dev_ptr, thrust_dev_ptr+(WIDTH*HEIGHT) );

Do I need to put this code in a .cu file or can I run it from my .cpp?


Compiler error:

c:\program files\nvidia gpu computing toolkit\cuda\v5.0\include\thrust\detail\backend\cuda\sort.inl(400): error C2027: use of undefined type ‘thrust::detail::STATIC_ASSERTION_FAILURE’
2> with
2> [
2> x=false
2> ]
2> c:\program files\nvidia gpu computing toolkit\cuda\v5.0\include\thrust\detail\backend\sort.inl(70) : see reference to function template instantiation ‘void thrust::detail::backend::cuda::stable_sort<RandomAccessIterator,StrictWeakOrdering>(RandomAccessIterator,RandomAccessIterator,StrictWeakOrdering)’ being compiled
2> with
2> [
2> RandomAccessIterator=thrust::device_ptr,
2> StrictWeakOrdering=thrust::less
2> ]
2> c:\program files\nvidia gpu computing toolkit\cuda\v5.0\include\thrust\detail\backend\sort.inl(158) : see reference to function template instantiation ‘void thrust::detail::backend::dispatch::stable_sort<RandomAccessIterator,StrictWeakOrdering>(RandomAccessIterator,RandomAccessIterator,StrictWeakOrdering,thrust::detail::cuda_device_space_tag)’ being compiled
2> with
2> [
2> RandomAccessIterator=thrust::device_ptr,
2> StrictWeakOrdering=thrust::less
2> ]
2> C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include\thrust/detail/backend/generic/sort.h(41) : see reference to function template instantiation ‘void thrust::detail::backend::stable_sort<RandomAccessIterator,StrictWeakOrdering>(RandomAccessIterator,RandomAccessIterator,StrictWeakOrdering)’ being compiled
2> with
2> [
2> RandomAccessIterator=thrust::device_ptr,
2> StrictWeakOrdering=thrust::less
2> ]
2> c:\program files\nvidia gpu computing toolkit\cuda\v5.0\include\thrust\detail\backend\sort.inl(48) : see reference to function template instantiation ‘void thrust::detail::backend::generic::sort<RandomAccessIterator,StrictWeakOrdering>(RandomAccessIterator,RandomAccessIterator,StrictWeakOrdering)’ being compiled
2> with
2> [
2> RandomAccessIterator=thrust::device_ptr,
2> StrictWeakOrdering=thrust::less
2> ]
2> c:\program files\nvidia gpu computing toolkit\cuda\v5.0\include\thrust\detail\backend\sort.inl(147) : see reference to function template instantiation ‘void thrust::detail::backend::dispatch::sort<RandomAccessIterator,StrictWeakOrdering,thrust::detail::cuda_device_space_tag>(RandomAccessIterator,RandomAccessIterator,StrictWeakOrdering,Backend)’ being compiled
2> with
2> [
2> RandomAccessIterator=thrust::device_ptr,
2> StrictWeakOrdering=thrust::less,
2> Backend=thrust::detail::cuda_device_space_tag
2> ]
2> C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include\thrust/detail/sort.inl(52) : see reference to function template instantiation ‘void thrust::detail::backend::sort<RandomAccessIterator,StrictWeakOrdering>(RandomAccessIterator,RandomAccessIterator,StrictWeakOrdering)’ being compiled
2> with
2> [
2> RandomAccessIterator=thrust::device_ptr,
2> StrictWeakOrdering=thrust::less
2> ]
2> C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include\thrust/detail/sort.inl(43) : see reference to function template instantiation ‘void thrust::sort<RandomAccessIterator,thrust::less>(RandomAccessIterator,RandomAccessIterator,StrictWeakOrdering)’ being compiled
2> with
2> [
2> RandomAccessIterator=thrust::device_ptr,
2> T=KeyType,
2> StrictWeakOrdering=thrust::less
2> ]
2> …\sample6\MeshViewer.cpp(788) : see reference to function template instantiation ‘void thrust::sort<thrust::device_ptr>(RandomAccessIterator,RandomAccessIterator)’ being compiled
2> with
2> [
2> T=float,
2> RandomAccessIterator=thrust::device_ptr
2> ]
2>
2>Build FAILED

Hi jeesh,

yes, you have to put the thrust calls in a .cu file, and it has to be compiled by nvcc.

Best,
GM

What? Really? Is this a Windows thing? I recall successfully using thrust to copy to/from device pointers in a class not compiled by nvcc in linux. Should this not have worked?

Thanks, I will try that …

No, this is a Thrust requirement to use the CUDA backend. See: https://github.com/thrust/thrust/wiki/Frequently-Asked-Questions

Best,
GM

Ack! Now I get linker errors when I put the thrust call inside a .cu file. I take my pointer returned by rtBufferGetDevicePointer, wrap it with thrust::device_ptr, and then pass that to a function in a .cu file:

#include "callSort.h"
#include <thrust/device_ptr.h>

...

float *device_ptr = rtGetBDP<float*>(m_context, m_context["buffer"]->getBuffer()->get(), 0);
thrust::device_ptr<float> thrust_dev_ptr(device_ptr);
callsort_kernel( thrust_dev_ptr, WIDTH*HEIGHT );

callSort.h is simply:

#pragma once

#include <thrust/device_ptr.h>

void callsort_kernel( thrust::device_ptr<float>& g_data, unsigned int n );

callsort_kernel.cu is:

/*
 * Call thrust sort
 */

#include <thrust/sort.h>
#include <thrust/device_ptr.h>
#include "callSort.h"

void callsort_kernel( thrust::device_ptr<float>& g_data, unsigned int n ) {
	thrust::sort( g_data, g_data + n );
}

Linker error:
MeshViewer.obj : error LNK2019: unresolved external symbol “void __cdecl callsort_kernel(class thrust::device_ptr &,unsigned int)” (?callsort_kernel@@YAXAAV?$device_ptr@M@thrust@@I@Z) referenced in function “private: void __thiscall MeshViewer::reduceDevice(unsigned int,unsigned int,float &,float &)” (?reduceDevice@MeshViewer@@AAEXIIAAM0@Z)

: fatal error LNK1120: 1 unresolved externals

If it helps, during compilation in Visual Studio 2010 I get in my output window the following:
CustomBuild:
2> Building NVCC ptx file lib/ptx/sample6_generated_callsort_kernel.cu.ptx
2> callsort_kernel.cu
2> callsort_kernel.cu
2> callsort_kernel.cu
2> tmpxft_0000083c_00000000-5_callsort_kernel.cudafe1.gpu
2> tmpxft_0000083c_00000000-10_callsort_kernel.cudafe2.gpu

Hi jeesh,

Please check that you are telling CMake to compile the .cu as an object file in the CMakeLists.txt. This is needed because you need to have an object file to link to. Like this:

# Tag callsort_kernel.cu for OBJ format intead of PTX (which is the default for
# OPTIX_add_test_executable).  Do this before calling OPTIX_add_test_executable.
set_source_files_properties(${CMAKE_CURRENT_SOURCE_DIR}/callsort_kernel.cu
  PROPERTIES CUDA_SOURCE_PROPERTY_FORMAT OBJ
)

It is also possible that the name mangling is different in host and device code. If you are not using namespaces, try declaring the function as extern “C” in callSort.h:

extern "C" void callsort_kernel( thrust::device_ptr<float>& g_data, unsigned int n );

Best,
GM

I am embarassed as we use set_source_files_properties() for the CUDA kernel–I should have noticed. Thank you for your help!