How to configure the launch grid for a computational task?

I am trying to use OptiX for some intensive calculation tasks. Based on the simple optixTriangle sample, I would like to shoot 10000 rays (to a predefined direction) from 10000 different cam.eye points (origins). Can I do this in parallel using OptiX?

Yes.

In addition, I am a bit confused about the differences between the OptiX and CUDA grid configurations.

Answered here: https://forums.developer.nvidia.com/t/optixtriangle-how-to-shoot-rays-to-specific-set-of-co-ordinates/295901/10
OptiX uses a single-ray programming model, so you only need to care about the OptiX launch dimension and the resulting launch index inside your device programs.

What is the optimum OptiX launch configuration for this problem?
My initial thought was to create a grid of 10000 x 10000 (width x height), but I am not sure if this is a valid configuration.

optixLaunch(..., width = 10000, height = 10000, depth = 1);

That is a 2D launch of 10,000 * 10,000 == 100,000,000 elements.
That is below the OptiX launch dimension limit of 2^30 (1,073,741,824), and although it means 100 million rays are shot, that should be manageable.
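As a quick host-side sanity check of that arithmetic, here is a minimal C++ sketch. The helper names (`launchElementCount`, `fitsLaunchLimit`) are hypothetical and not part of the OptiX API, and the strict `<` comparison is a conservative assumption about how the 2^30 limit is applied.

```cpp
#include <cstdint>

// Launch size limit quoted above: 2^30 launch indices per optixLaunch.
constexpr std::uint64_t kMaxLaunchElements = 1ull << 30;

// Hypothetical helper: total number of launch indices for a launch dimension.
// Computed in 64 bits so large dimensions cannot overflow.
constexpr std::uint64_t launchElementCount(std::uint32_t width,
                                           std::uint32_t height,
                                           std::uint32_t depth)
{
    return static_cast<std::uint64_t>(width) * height * depth;
}

// Hypothetical helper: true if the launch stays below the limit.
// Uses a strict comparison to stay on the safe side (assumption).
constexpr bool fitsLaunchLimit(std::uint32_t width,
                               std::uint32_t height,
                               std::uint32_t depth)
{
    return launchElementCount(width, height, depth) < kMaxLaunchElements;
}
```

For the launch in question, `launchElementCount(10000, 10000, 1)` is 100,000,000, which is roughly a tenth of the limit.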

An alternative thought is to create a 100x100 grid for the directions and then add a depth of 10000 for the different origins.

That’s not necessary and potentially slower.

Could you give some basic instructions of how to design such an optix kernel using the __raygen__rg() and computeRay( uint3 idx, uint3 dim, float3& origin, float3& direction ) functions as shown in the optixTriangle sample?

With the things I said in this post: https://forums.developer.nvidia.com/t/optixtriangle-how-to-shoot-rays-to-specific-set-of-co-ordinates/295901/8 you should be able to figure that out.

If you previously had 10,000 world positions and 1 camera position, you just need to provide a new array of camera positions the same way as you provided that array of rayCoords.

Then you increase the launch dimension from (10000, 1, 1) to (10000, 10000, 1), and the respective result buffer also needs to contain 100 million elements.
So if you only need to know which camera ray hit something when shooting through which of your rayCoords, you could, for example, allocate a 10,000 x 10,000 array of bytes on the device (as a replacement for the uchar4 image buffer used inside the optixTriangle example) and write the hit or miss result into each byte element per launch index.
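To put numbers on that buffer, here is a small C++ sketch of the size calculation. The function name `resultBufferBytes` is made up for illustration; the element sizes (1 byte per flag, 4 bytes per uchar4) are the only inputs.

```cpp
#include <cstddef>

// Hypothetical helper: size in bytes of a width x height result buffer
// with the given element size.
constexpr std::size_t resultBufferBytes(std::size_t width,
                                        std::size_t height,
                                        std::size_t bytesPerElement)
{
    return width * height * bytesPerElement;
}
```

With one byte per launch index, `resultBufferBytes(10000, 10000, 1)` is 100,000,000 bytes, compared to 400,000,000 bytes if the 4-byte uchar4 element type were kept.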

The indices into the camera array and the rayCoords array are simply the .x and .y values of optixGetLaunchIndex(), which go from 0 to 9999.
So inside the raygen program you would read one element from the camera array and one element from the rayCoords array. Each pair defines one ray direction and one result.

uint3 theLaunchDimensions = optixGetLaunchDimensions();
uint3 theLaunchIndex = optixGetLaunchIndex();

float3 origin    = cameraCoords[theLaunchIndex.x];
float3 rayCoord  = rayCoords[theLaunchIndex.y];
float3 direction = normalize(rayCoord - origin);

// Since you wanted to count miss events, use the visibility ray logic here.
// The default payloadMissed == 0 assumes the miss shader is not reached.
// You must implement a miss shader which sets this to one.
// No closest hit or anyhit shader is required, set the ray flags accordingly! (See links below.)
unsigned int payloadMissed = 0; 

optixTrace(..., payloadMissed);

// Camera indices run to the right per row (.x), rayCoord indices run down per column (.y).
bufferResult[theLaunchDimensions.x * theLaunchIndex.y + theLaunchIndex.x] = (unsigned char) payloadMissed;

That optixTrace call should implement the fastest visibility ray as described in the links here:
https://forums.developer.nvidia.com/t/optix-payload-value-incorrect/294505/8
with the ray flags OPTIX_RAY_FLAG_DISABLE_ANYHIT | OPTIX_RAY_FLAG_DISABLE_CLOSESTHIT | OPTIX_RAY_FLAG_TERMINATE_ON_FIRST_HIT
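If you want to check the ray setup and the result indexing on the CPU before running it on the device, the following is a minimal host-side C++ sketch of the same logic. `Float3`, `Ray`, `normalized`, `makeRay` and `resultIndex` are hypothetical stand-ins written for this example, not OptiX or CUDA types, and no tracing happens here.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUDA's float3.
struct Float3 { float x, y, z; };

struct Ray { Float3 origin; Float3 direction; };

// Returns v scaled to unit length. Assumes v is not the zero vector.
inline Float3 normalized(const Float3& v)
{
    const float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return { v.x / len, v.y / len, v.z / len };
}

// Builds the ray for launch index (ix, iy): the origin comes from the
// camera array (.x), the target point from the rayCoords array (.y).
// The camera position and the target point must differ.
inline Ray makeRay(const std::vector<Float3>& cameraCoords,
                   const std::vector<Float3>& rayCoords,
                   unsigned ix, unsigned iy)
{
    const Float3 o = cameraCoords[ix];
    const Float3 t = rayCoords[iy];
    return { o, normalized({ t.x - o.x, t.y - o.y, t.z - o.z }) };
}

// Row-major linear index into the result buffer: one row per rayCoord (.y),
// one column per camera (.x). dimX is the launch width.
inline std::size_t resultIndex(unsigned dimX, unsigned ix, unsigned iy)
{
    return static_cast<std::size_t>(dimX) * iy + ix;
}
```

On the device, the same index arithmetic is what the raygen snippet uses when it writes into bufferResult.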

As a result you get a 10000 x 10000 matrix of bytes which are 0 on a hit and 1 on a miss.
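Once that matrix has been copied back to the host, counting the miss events per camera is a simple column sum. This is a sketch only: `countMissesPerCamera` is a hypothetical helper, and it assumes the row-major layout described above (cameras along .x, rayCoords along .y).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: sums the miss flags (0 = hit, 1 = miss) per camera.
// result holds numCameras * numRayCoords bytes in row-major order,
// with one row per rayCoord and one column per camera.
inline std::vector<unsigned> countMissesPerCamera(
    const std::vector<unsigned char>& result,
    std::size_t numCameras,
    std::size_t numRayCoords)
{
    std::vector<unsigned> misses(numCameras, 0u);
    for (std::size_t y = 0; y < numRayCoords; ++y)
    {
        for (std::size_t x = 0; x < numCameras; ++x)
        {
            misses[x] += result[numCameras * y + x];
        }
    }
    return misses;
}
```

Iterating with x in the inner loop walks the buffer in memory order, which keeps the host-side pass cache friendly for a 100 million element matrix.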
