Intrinsic __popcll not working in OptiX 7.4


It appears that the __popcll intrinsic is not working in OptiX 7.4. It’s easy enough to reproduce: in the optixHello sample, change the raygen program to this:

extern "C"
__global__ void __raygen__draw_solid_color()
{
    if (__popcll(params.image_width) != 1) // image_width is 512, so popc should be 1.
        return;

    uint3 launch_index = optixGetLaunchIndex();
    RayGenData* rtData = (RayGenData*)optixGetSbtDataPointer();
    params.image[launch_index.y * params.image_width + launch_index.x] =
        make_color( make_float3( rtData->r, rtData->g, rtData->b ) );
}

The resulting image is black except for the bottom row, which is a bit odd. It suggests the intrinsic is working on some OptiX threads but not others.

It’s a bit tricky to avoid, as the optimizer will replace code with the popc intrinsic when it can. For example, this function gets replaced by a single popc instruction in the PTX, and so it also fails:

inline __device__ int my_popcll(unsigned long long int x)
{
    int count = 0;
    while (x)
    {
        x &= x - 1; // clear the lowest set bit
        count++;
    }
    return count;
}

The release notes mentioned watching for possible warning messages about unimplemented intrinsics, although there do not appear to be any such warnings in my log.

Thanks for the report @juggler! Is this when using the most recent driver, numbered 510 or higher?


Yes, this is with 511.23, Windows 10. Is this something that could hopefully be fixed with a new driver?

This will definitely be fixed in a future driver update. From my quick search so far, I don’t think it’s fixed in any existing driver builds already, so I’m not exactly sure when, but it will be as soon as possible, and I’ll try to post once I know exactly. Sorry about the bug!

Are you able to work around this at all, or are you blocked until it’s fixed? It’s a bummer the optimizer is undoing your workaround!


Well, a manual loop over all 64 bits to count them doesn’t get optimized, so that works at least, but it’s not very fast. I will keep looking for a better way to trick the optimizer in the meantime. Thanks for looking into it!
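
For anyone who needs the same stopgap, the manual 64-bit loop could look roughly like the sketch below. The name popcll_bitloop and the POPC_HD macro are made up for illustration; they are not part of OptiX or CUDA. The macro only exists so the same function compiles both under nvcc and with a plain C++ compiler. Whether the optimizer leaves this loop alone is an observation from this thread and may differ with other compiler or driver versions, so it is worth checking the generated PTX for a popc instruction.

```cpp
// Expands to CUDA qualifiers under nvcc and to nothing in a plain C++ build.
#if defined(__CUDACC__)
#define POPC_HD __host__ __device__
#else
#define POPC_HD
#endif

// Hypothetical helper (the name is an assumption): counts set bits by testing
// each of the 64 bit positions in a fixed-trip-count loop, instead of calling
// __popcll or using the "x &= x - 1" idiom.
POPC_HD inline int popcll_bitloop(unsigned long long x)
{
    int count = 0;
    for (int i = 0; i < 64; ++i)
        count += static_cast<int>((x >> i) & 1ull); // add bit i (0 or 1)
    return count;
}
```

In the raygen program above, the check would then read popcll_bitloop(params.image_width) != 1. The loop always runs 64 iterations regardless of the input, which is why it is slower than the intrinsic.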