Virtual methods problem

Hi all,

I have the following code:

class BaseLight
{
public:
    __device__ virtual float3 Illuminate(
        const float3       &aReceivingPosition,
        const float2       &aRndTuple,
        float3             &oDirectionToLight,
        float             &oDistance,
        float             &oDirectPdfW,
        float             *oEmissionPdfW,
        float             *oCosAtLight) const { return make_float3(0, 0, 0); }

    __device__ virtual float3 Emit(
        const SceneSphere &aSceneSphere,
        const float2      &aDirRndTuple,
        const float2      &aPosRndTuple,
        float3            &oPosition,
        float3            &oDirection,
        float             &oEmissionPdfW,
        float             *oDirectPdfA,
        float             *oCosThetaLight) const { return make_float3(0, 0, 0); };

    __device__ virtual float3 GetRadiance(
        const float3       &aRayDirection,
        const float3       &aHitPoint,
        float             *oDirectPdfA,
        float             *oEmissionPdfW)  const { return make_float3(0, 0, 0); };

	// Whether the light has a finite extent (area, point) or not (directional, env. map)
    __device__ virtual bool IsFinite() const { return false; };

    // Whether the light has delta function (point, directional) or not (area)
    __device__ virtual bool IsDelta() const { return false; };
};

class BackgroundLight : public BaseLight
{
public:
    __device__ BackgroundLight() : BaseLight()
    {
    }

    __device__ void Setup(
		const SceneSphere &sp,
		const float3 &backEmission)
    {
		...
    }

    __device__ float3 Illuminate(
        const float3       &aReceivingPosition,
        const float2       &aRndTuple,
        float3             &oDirectionToLight,
        float             &oDistance,
        float             &oDirectPdfW,
        float             *oEmissionPdfW,
        float             *oCosAtLight) const
    {
		...
    }

    __device__ float3 Emit(
        const SceneSphere &aSceneSphere,
        const float2      &aDirRndTuple,
        const float2      &aPosRndTuple,
        float3            &oPosition,
        float3            &oDirection,
        float             &oEmissionPdfW,
        float             *oDirectPdfA,
        float             *oCosThetaLight) const
    {
		...
    }

    __device__ float3 GetRadiance(
        const float3       &aRayDirection,
        const float3       &aHitPoint,
        float             *oDirectPdfA,
        float             *oEmissionPdfW) const
    {
		...
    }

    // Whether the light has a finite extent (area, point) or not (directional, env. map)
    __device__ bool IsFinite() const { return false; }

    // Whether the light has delta function (point, directional) or not (area)
    __device__ bool IsDelta() const { return false; }

public:
    float3 mBackgroundColor;
    float mInvSceneRadiusSqr;
};

class AreaLight : public BaseLight
{
public:
    __device__ AreaLight() : BaseLight()
    {
    }

    __device__ void Setup(
        const float3 &aP0,
        const float3 &aP1,
        const float3 &aP2,
        const float3 &it)
    {
		...
    }

    __device__ float3 Illuminate(
        const float3       &aReceivingPosition,
        const float2       &aRndTuple,
        float3             &oDirectionToLight,
        float             &oDistance,
        float             &oDirectPdfW,
        float             *oEmissionPdfW,
        float             *oCosAtLight) const
    {
		...
    }

    __device__ float3 Emit(
        const SceneSphere &aSceneSphere,
        const float2      &aDirRndTuple,
        const float2      &aPosRndTuple,
        float3            &oPosition,
        float3            &oDirection,
        float             &oEmissionPdfW,
        float             *oDirectPdfA,
        float             *oCosThetaLight) const
    {
		...
    }

    __device__ float3 GetRadiance(
        const float3       &aRayDirection,
        const float3       &aHitPoint,
        float             *oDirectPdfA,
        float             *oEmissionPdfW) const
    {
		...
    }

    // Whether the light has a finite extent (area, point) or not (directional, env. map)
    __device__ bool IsFinite() const { return true; }

    // Whether the light has delta function (point, directional) or not (area)
    __device__ bool IsDelta() const { return false; }

public:
    float3 p0, e1, e2;
    float3 mIntensity;
    float mInvArea;
};

Basically, I have an array of lights passed from the host (called ‘lightBuffer’). Every element of that array can be a background light or an area light. They hold different data but share the same interface (Emit, GetRadiance and Illuminate).
I use the code above as follows:

BackgroundLight bl;
AreaLight tl;
BaseLight *light;
if (lightBuffer[lightIx].isBackground)
{
    bl.Setup(mSceneSphere, lightBuffer[lightIx].emission);
    light = &bl;
}
else
{
    tl.Setup(lightBuffer[lightIx].v[0], lightBuffer[lightIx].v[1], lightBuffer[lightIx].v[2], lightBuffer[lightIx].emission);
    light = &tl;
}

float emissionPdfW, directPdfW, cosLight;
oLightState.mThroughput = light->Emit(mSceneSphere, rndDirSamples, rndPosSamples,
		oLightState.mOrigin, oLightState.mDirection,
		emissionPdfW, &directPdfW, &cosLight);

I get the following error at runtime (there is no error at compile time):

OptiX Error: Parse error (Details: Function "_rtProgramCreateFromPTXFile" caught
exception: ./debug/pinhole_camera.cu.ptx: error: Cannot find function "%r403" in PTX [4850039])

If I remove all the virtual keywords from BaseLight, it works (but the methods always return zero).

What am I doing wrong?

Thanks in advance,

  • AGPX

(CUDA 5.0.35.3, OptiX 3.0.1).

P.S.: I use compute_20, sm_20.

By the way, is there a better (read: faster) way to handle a situation like this? (Note that I have to pick a light at random from the light array, and it can be a background or an area light.)

In CUDA, virtual functions are compiled to calls through function pointers, and those are not supported in OptiX.

It sounds as if, with the virtual keyword removed, the call through the BaseLight pointer always uses the base class implementation, which returns black.
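
To illustrate why the base class implementation gets picked once virtual is removed, here is a minimal host-only sketch. The names BaseLightNV, AreaLightNV and isFiniteViaBase are made up for this example and only mirror IsFinite() from the classes in the question. Without virtual, the derived method merely hides the base one, and a call through a base pointer is bound at compile time to the pointer's static type:

```cpp
// Host-only stand-ins for the light classes in the question (hypothetical names).
struct BaseLightNV
{
    // No virtual keyword: calls are bound to the static type at compile time.
    bool IsFinite() const { return false; }
};

struct AreaLightNV : BaseLightNV
{
    // This hides BaseLightNV::IsFinite, it does not override it.
    bool IsFinite() const { return true; }
};

// The parameter's static type is BaseLightNV, so this always calls
// BaseLightNV::IsFinite, whatever object the pointer really refers to.
inline bool isFiniteViaBase(const BaseLightNV *light)
{
    return light->IsFinite();
}
```

Calling IsFinite() directly on an AreaLightNV object gives true, while isFiniteViaBase(&areaLight) gives false. That matches the "always returns zero" behaviour seen with light->Emit(...).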

What you’re looking for can be done with callable programs.
Please have a look at the chapter about callable programs in the OptiX programming guide.

I’m doing almost the same in a light sampling function which is implemented as a callable program at context scope to be shared by all materials.
I have a light definition buffer just like you, with environment lights, area lights, etc. The callable program picks one light, importance samples a location on it, and returns the sample point, its probability, distance, and emission in a LightSample structure. The lighting calculation is then done inside the material’s closest hit implementation.

Currently the light type would still need to be checked to determine which calculation to use for the sampling. The single if-statement in your code is cheap. I use a switch-case because I have more light types.
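
A minimal sketch of that kind of type check, written so it compiles both with nvcc and with a plain C++ compiler. Everything here is an assumption made for illustration and not taken from either poster's code: the LightDef layout, the LightType values, the LIGHT_FN macro and the two helper functions. The real sampling math is left out.

```cpp
#ifndef LIGHT_FN
#ifdef __CUDACC__
#define LIGHT_FN __forceinline__ __device__
#else
// Host-only fallbacks so the sketch also builds without nvcc.
#define LIGHT_FN inline
struct float3 { float x, y, z; };
inline float3 make_float3(float x, float y, float z) { return float3{x, y, z}; }
#endif
#endif

// Hypothetical tagged light record replacing the class hierarchy.
enum LightType { LIGHT_BACKGROUND = 0, LIGHT_AREA = 1 };

struct LightDef
{
    int    type;      // one of LightType
    float3 emission;  // background color or area light intensity
};

// Dispatch on the type tag instead of a virtual call.
LIGHT_FN bool lightIsFinite(const LightDef &light)
{
    switch (light.type)
    {
    case LIGHT_AREA:       return true;   // finite extent
    case LIGHT_BACKGROUND: return false;  // infinitely far away
    default:               return false;  // unknown type
    }
}

// Returns the stored emission for known types and black for unknown ones.
LIGHT_FN float3 lightRadiance(const LightDef &light)
{
    switch (light.type)
    {
    case LIGHT_AREA:
    case LIGHT_BACKGROUND:
        return light.emission;
    default:
        return make_float3(0.0f, 0.0f, 0.0f);
    }
}
```

Since every case is a plain inlined function body, no function pointer is involved, and adding a new light type means adding one enum value and one case per function.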

Small coding tip: When implementing functions in OptiX I always use a define RT_FUNCTION which helps a lot to distinguish functions from RT_PROGRAM and RT_CALLABLE_PROGRAM types. Looks like this:

#ifndef RT_FUNCTION
#define RT_FUNCTION __forceinline__ __device__
#endif

RT_FUNCTION float cube(const float x)
{
  return x * x * x;
}

Mind that __forceinline__ sometimes helps to get things working at all, in cases where CUDA would not have inlined a device function but kept it as a function call, which OptiX wouldn’t handle! That happens when functions get bigger or use more parameters.

Thanks for the explanation. I have implemented the callable programs successfully. However, when I use them, performance drops by 16% to 30% on average. It looks like the call overhead is quite costly. Maybe my GPU is too old (NVIDIA GeForce GT 555M, compute capability 2.0). Can I expect better callable program performance on a newer video card?

The callable program overhead is known to be on the software side, which means it’s going to be addressed in a future OptiX version.
Until then, function inlining will give you the higher performance. For that I’m simply using templates a lot to build up my materials.
Callable programs do have the benefit of shared code (a smaller PTX kernel), and they can be exchanged dynamically (though that triggers a recompile in OptiX). I’m using a callable program for the light sample routine anyway, because it is called in all materials’ closest hit implementations. I used them first for procedural texture generation.
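
A rough sketch of what inlining through templates can look like for the light classes in this thread. All names here (LIGHT_FN, BackgroundLightT, AreaLightT, lightFlags, lightFlagsByType) are hypothetical and not taken from my actual material code. The light type becomes a template parameter, so each call is resolved at compile time and no function pointer is needed:

```cpp
#ifndef LIGHT_FN
#ifdef __CUDACC__
#define LIGHT_FN __forceinline__ __device__
#else
#define LIGHT_FN inline  // host-only fallback so the sketch builds without nvcc
#endif
#endif

// Plain structs with the same interface, no base class and no virtual.
struct BackgroundLightT
{
    LIGHT_FN bool IsFinite() const { return false; }
    LIGHT_FN bool IsDelta()  const { return false; }
};

struct AreaLightT
{
    LIGHT_FN bool IsFinite() const { return true; }
    LIGHT_FN bool IsDelta()  const { return false; }
};

// Generic code written once; the compiler emits one inlined copy per Light type.
// Bit 0 is set for finite lights, bit 1 for delta lights.
template <typename Light>
LIGHT_FN int lightFlags(const Light &light)
{
    return (light.IsFinite() ? 1 : 0) | (light.IsDelta() ? 2 : 0);
}

// Picking a light at runtime still needs one branch; each side then calls
// a fully resolved instantiation.
LIGHT_FN int lightFlagsByType(bool isBackground)
{
    return isBackground ? lightFlags(BackgroundLightT())
                        : lightFlags(AreaLightT());
}
```

The trade-off is code size: every instantiation is duplicated in the PTX, which is exactly what callable programs avoid.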

You’re using an entry-level Fermi architecture mobile consumer GPU. You can definitely expect much higher base ray tracing performance from recent Kepler desktop boards. Maxwell isn’t supported by OptiX yet.
That means any desktop GTX 760 and up (GK104 and GK110 GPUs) should be night and day compared to your GT 555M.
You can never have enough video memory, and you would have paging support in OptiX with the GTX level boards.
For enthusiast level :-) private ray tracing use I would look at the GeForce GTX 780 and up (because of the GK110 GPU).
Professionally I’m using the ultra high-end Quadro and Tesla boards with Kepler GPUs.

Actually, I have implemented a bidirectional path tracer, just for fun.
The slowness of my GPU is not very encouraging ;(, but sooner or later (when it becomes a bit more affordable for my pockets), I’ll buy nothing less than a GTX 780!