Invalid device function error when using Thrust

The following code starts to build a kd-tree, but on either of the two lines marked with an exclamation mark in a comment (choose between them with #if 0/#if 1), execution fails with this error message:

CUDA error 98 [C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\include\thrust/system/cuda/detail/parallel_for.h, 143]: invalid device function

A message box also says “abort() called”.

The code itself:

// Thrust headers for the containers, iterators, and algorithms used below
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>
#include <thrust/transform.h>
#include <thrust/copy.h>
#include <thrust/sort.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/permutation_iterator.h>

__forceinline__ __device__
float min(float a, float b, float c)
{
    return fminf(fminf(a, b), c);
}

__forceinline__ __device__
float max(float a, float b, float c)
{
    return fmaxf(fmaxf(a, b), c);
}

using Vertex = float3;
struct Triangle { Vertex A, B, C; };
struct AABB { Vertex min, max; };

void KdTreeBuilderPrivate::build(const Triangle * t, size_t triangleCount)
{
    using U = unsigned int;
    //auto p = thrust::cuda::par.on(stream);
    auto p = thrust::device;
    thrust::device_vector< Triangle > triangles{t + 0, t + triangleCount};

    thrust::device_vector< Vertex > aabb{triangleCount + triangleCount};
    {
        auto even = thrust::make_transform_iterator(thrust::counting_iterator< U >(0), [] __device__ (U i) -> U { return i + i; });
        {
            auto minbb = [] __device__ (const Triangle & t) -> Vertex { return {min(t.A.x, t.B.x, t.C.x), min(t.A.y, t.B.y, t.C.y), min(t.A.z, t.B.z, t.C.z)}; };
            auto dest = thrust::make_permutation_iterator(aabb.begin(), even);
            thrust::transform(p, triangles.cbegin(), triangles.cend(), dest, minbb);
        }
        {
            auto maxbb = [] __device__ (const Triangle & t) -> Vertex { return {max(t.A.x, t.B.x, t.C.x), max(t.A.y, t.B.y, t.C.y), max(t.A.z, t.B.z, t.C.z)}; };
            auto dest = thrust::make_permutation_iterator(thrust::next(aabb.begin()), even); // odd
            thrust::transform(p, triangles.cbegin(), triangles.cend(), dest, maxbb);
        }
    }

    thrust::device_vector< U > X{aabb.size()};
    {
        auto halves = [] __device__ (U i) { return i / 2; };
        auto bb = thrust::make_transform_iterator(thrust::make_counting_iterator< U >(0), halves);
#if 1
        thrust::copy_n(p, bb, X.size(), X.begin()); // !
#else
        X.assign(bb, thrust::next(bb, X.size())); // !
#endif
    }

    auto Y = X, Z = Y;

    {
        auto xless = [] __device__ (const Vertex & l, const Vertex & r) -> bool { return l.x < r.x; };
        auto aabbCopy = aabb;
        thrust::stable_sort_by_key(p, aabbCopy.begin(), aabbCopy.end(), X.begin(), xless);

        auto yless = [] __device__ (const Vertex & l, const Vertex & r) -> bool { return l.y < r.y; };
        aabbCopy = aabb;
        thrust::stable_sort_by_key(p, aabbCopy.begin(), aabbCopy.end(), Y.begin(), yless);

        auto zless = [] __device__ (const Vertex & l, const Vertex & r) -> bool { return l.z < r.z; };
        aabbCopy = aabb;
        thrust::stable_sort_by_key(p, aabbCopy.begin(), aabbCopy.end(), Z.begin(), zless);
    }
    // ...
}

My guess is that the code generated for the bb iterator is somehow broken.

How can I fix the error?

If you provide a short, complete example that I can compile and run to see the issue, without having to add or change anything, I’ll take a look. You should also indicate what compute capability you are compiling for and what device you are running on.

If not, perhaps someone else will be able to help you.
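
(For reference, the compute capability of the device in use can be queried with the standard CUDA runtime API. The standalone sketch below is illustrative and not part of the thread’s code; its output can be compared against the -gencode/-arch options passed to nvcc.)

#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    int device = 0;
    cudaGetDevice(&device);
    cudaDeviceProp prop{};
    cudaGetDeviceProperties(&prop, device);
    // Compare this against the architectures the project is compiled for.
    std::printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    return 0;
}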

Here is an MCVE for this topic: mcve/thrust1 at master · tomilov/mcve · GitHub
I use Visual Studio 2017 to compile it on Windows 10 x64.
The CUDA version is 10.1 with the corresponding driver version. The GPU is a GeForce RTX 2060.
The error is reliably reproducible with this short, complete example: you can compile it, run it, and see the issue.
If you need something other than CMake, let me know; I can make a single .bat file specifically for your environment, but I would need to know which version of Visual Studio (C++ compiler) you have installed.

The problem appears to be in how thrust handles the lambda definition. It may be a bug in thrust; you may wish to file a bug report using the instructions in the sticky post at the top of this forum.

In the meantime, according to my testing, you can work around it in several ways (a minimal sketch of each follows the list):

  1. decorate your lambda with __host__ __device__ instead of just __device__
  2. replace the lambda with a thrust placeholder expression, e.g. _1 / 2
  3. use an ordinary functor as the operator
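
For illustration, here is a minimal sketch of the three workarounds applied to the halves lambda from the snippet above. It reuses the names from the original code (U, p, X); the Halves functor and variable names are illustrative, and this is not necessarily the exact fix applied in the MCVE.

// 1. Decorate the lambda with __host__ __device__ instead of just __device__:
auto halves = [] __host__ __device__ (U i) { return i / 2; };
thrust::copy_n(p, thrust::make_transform_iterator(thrust::make_counting_iterator< U >(0), halves), X.size(), X.begin());

// 2. Replace the lambda with a thrust placeholder expression
//    (thrust/functional.h provides thrust::placeholders):
using namespace thrust::placeholders;
thrust::copy_n(p, thrust::make_transform_iterator(thrust::make_counting_iterator< U >(0), _1 / 2), X.size(), X.begin());

// 3. Use an ordinary functor (defined at namespace scope) as the operator:
struct Halves
{
    __host__ __device__ unsigned int operator () (unsigned int i) const { return i / 2; }
};

thrust::copy_n(p, thrust::make_transform_iterator(thrust::make_counting_iterator< U >(0), Halves{}), X.size(), X.begin());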