The problem is that I don’t know the radius of each beam ahead of time.
At what time inside that algorithm do you know the radius of each beam then?
When it comes to summing the contributions of large numbers of beams to large numbers of point receivers
Could you give some absolute numbers for these two large numbers?
Where do the beams come from?
Is there any influence of the receivers on the beams?
In other words, do beams simply pass through everything and contribute to every receiver along the way that lies within the beam’s radius, or are there any absorption or scattering effects affecting the beam?
Is there any other geometry inside the scene which would need to be intersected?
(How would you do that with beams of different radii?)
Because if not, there wouldn’t be a need to solve this with ray tracing at all.
Calculating the distance of a line to point could simply be done inside a native CUDA kernel.
 You have a number of receivers (points) and a number of rays (lines).
 Calculate the minimum distance between each point and each line.
 If the distance is below a beam’s radius, add the beam’s contribution to the receiver.
 Repeat until all beams are handled.
If the number of receivers is big enough to saturate the GPU, this would be a super simple CUDA kernel which would just need to be launched as often as there are rays (gather algorithm, no atomics needed).
If one beam at a time does not saturate the GPU, partition the work into bigger chunks and handle more beams per kernel launch.
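To make the gather variant concrete, here is a minimal host-side C++ sketch of the steps above. The `Beam` layout, the function names, and the choice to treat each beam as a half-line starting at its origin (closest-approach parameter clamped to zero) are my assumptions for illustration, not something given in the question. In a CUDA kernel the outer loop over receivers would become one thread per receiver, which is why no atomics are needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

static Vec3  sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Hypothetical beam layout: center ray, radius, and a scalar contribution.
struct Beam {
    Vec3  origin;
    Vec3  dir;          // assumed to be normalized
    float radius;
    float contribution;
};

// Squared distance from point p to the beam's center ray.
// Assumption: the beam is a half-line, so points behind the origin
// are measured against the origin itself (t is clamped to 0).
float distanceSquaredToRay(Vec3 p, const Beam& b)
{
    const Vec3  op = sub(p, b.origin);
    const float t  = std::fmax(0.0f, dot(op, b.dir));
    const Vec3  closest = {b.origin.x + t * b.dir.x,
                           b.origin.y + t * b.dir.y,
                           b.origin.z + t * b.dir.z};
    const Vec3  d = sub(p, closest);
    return dot(d, d);
}

// Gather over one chunk of beams [first, first + count).
// Each receiver only writes its own sum, so no atomics are required.
void gatherChunk(const std::vector<Vec3>& receivers,
                 const std::vector<Beam>& beams,
                 std::size_t first, std::size_t count,
                 std::vector<float>& sums)
{
    assert(sums.size() == receivers.size());
    const std::size_t last = std::min(first + count, beams.size());
    for (std::size_t r = 0; r < receivers.size(); ++r) { // one GPU thread per receiver
        float sum = 0.0f;
        for (std::size_t i = first; i < last; ++i) {
            const Beam& b = beams[i];
            if (distanceSquaredToRay(receivers[r], b) < b.radius * b.radius)
                sum += b.contribution;
        }
        sums[r] += sum;
    }
}
```

Calling `gatherChunk` repeatedly with increasing `first` corresponds to multiple kernel launches; the chunk size is the knob for how many beams each launch handles.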
It could also be implemented as a scatter algorithm over the beams where contributions are added to receivers with atomics if that saturates the GPU better. Might be slower though.
Now if you really need to use ray tracing for this, to be able to hit something, the BVH traversal algorithm needs to find AABBs which intersect with the ray. That traversal part is fully hardware accelerated on RTX boards when using single-level (GAS) or two-level (IAS → GAS) acceleration structure scene graphs in OptiX.
You cannot change the AABB sizes during traversal. Similar idea rejected here:
https://forums.developer.nvidia.com/t/sphereintersectionwithraydistancedependentradius/60405
The simplest approach would be to implement the receivers as custom sphere primitives. Make them as big as the biggest radius of all beams so that a beam of any radius can hit the AABB. Inside the custom intersection program, calculate the actual distance from the beam’s center ray. If it’s below the beam’s radius (which needs to be stored inside the ray payload, which is accessible inside the intersection program in OptiX 7), add the beam’s contribution to the receiver.
Because this is a scatter algorithm (many beams can hit the same receiver) adding the contribution must happen with atomics.
This approach has the drawback that the receivers’ AABBs all have the maximum necessary radius, which might result in a lot of unnecessary custom intersection program invocations, and these also interrupt the hardware BVH traversal.
It’s the simplest approach though and I would recommend implementing this first to see if this is good enough.
I would expect this to be slower than the native CUDA kernel method if that is applicable.
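As a rough sketch of the pieces this custom-primitive approach needs, the following plain C++ shows the inflated AABB, the distance test the intersection program would run, and an atomic accumulation. All names here (`receiverAabb`, `beamCoversReceiver`, `atomicAddFloat`) are hypothetical, the `[tmin, tmax]` check on the closest-approach parameter is my assumption, and the compare-exchange loop only stands in for what would be `atomicAdd` on the device.

```cpp
#include <atomic>
#include <cassert>

struct Vec3 { float x, y, z; };
struct Aabb { Vec3 lo, hi; };

static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// AABB of a receiver, inflated to the largest radius of all beams
// so that the center ray of any beam that covers the receiver hits it.
Aabb receiverAabb(Vec3 c, float maxBeamRadius)
{
    assert(maxBeamRadius > 0.0f);
    const float r = maxBeamRadius;
    return {{c.x - r, c.y - r, c.z - r}, {c.x + r, c.y + r, c.z + r}};
}

// The test the custom intersection program would perform.
// dir is assumed normalized; beamRadius would come from the ray payload.
bool beamCoversReceiver(Vec3 origin, Vec3 dir, float tmin, float tmax,
                        Vec3 center, float beamRadius)
{
    const Vec3  op = {center.x - origin.x, center.y - origin.y, center.z - origin.z};
    const float t  = dot(op, dir);             // parameter of closest approach
    if (t < tmin || t > tmax) return false;    // assumption: respect the ray interval
    const float d2 = dot(op, op) - t * t;      // squared distance to the center ray
    return d2 < beamRadius * beamRadius;
}

// Host-side stand-in for an atomic float add (scatter: many beams, one receiver).
void atomicAddFloat(std::atomic<float>& target, float value)
{
    float old = target.load(std::memory_order_relaxed);
    while (!target.compare_exchange_weak(old, old + value, std::memory_order_relaxed)) {
        // old is refreshed by compare_exchange_weak on failure; retry.
    }
}
```

The ratio between the maximum radius and a typical beam radius gives a feeling for how many of the intersection program invocations end in a rejected test.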
Other idea:
Implement the receivers as built-in triangle geometry.
It needs to be some convex hull containing the required receiver sphere.
For example, use a simple box made of 12 triangles. (Example code: https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/rtigo10/src/Box.cpp)
You could of course also use a geometry which is more spherical, like an icosahedron, to approximate the sphere better.
Intersecting triangles is again fully hardware accelerated on RTX boards.
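For illustration, a box around a receiver could be generated like this. This is not the code from the linked Box.cpp, just a minimal sketch with hypothetical names (`BoxMesh`, `makeReceiverBox`, `isOutwardFacing`). It assumes that counter-clockwise winding seen from outside counts as front-facing, which matters for the face culling idea discussed next, so please verify that convention against your setup.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
struct Tri  { unsigned a, b, c; };

struct BoxMesh {
    std::vector<Vec3> vertices;   // 8 corners
    std::vector<Tri>  triangles;  // 12 triangles, 2 per face
};

// Axis-aligned box with half-extent r around center c; it contains the sphere (c, r).
BoxMesh makeReceiverBox(Vec3 c, float r)
{
    assert(r > 0.0f);
    BoxMesh m;
    // Corner i: bit 0 selects +x, bit 1 selects +y, bit 2 selects +z.
    for (unsigned i = 0; i < 8; ++i)
        m.vertices.push_back({c.x + ((i & 1u) ? r : -r),
                              c.y + ((i & 2u) ? r : -r),
                              c.z + ((i & 4u) ? r : -r)});
    // Counter-clockwise when viewed from outside the box.
    m.triangles = {
        {0, 4, 6}, {0, 6, 2},   // -x
        {1, 3, 7}, {1, 7, 5},   // +x
        {0, 1, 5}, {0, 5, 4},   // -y
        {2, 6, 7}, {2, 7, 3},   // +y
        {0, 2, 3}, {0, 3, 1},   // -z
        {4, 5, 7}, {4, 7, 6}};  // +z
    return m;
}

// True if the geometric normal (v1 - v0) x (v2 - v0) points away from the center.
bool isOutwardFacing(const BoxMesh& m, Vec3 center, std::size_t tri)
{
    const Vec3 v0 = m.vertices[m.triangles[tri].a];
    const Vec3 v1 = m.vertices[m.triangles[tri].b];
    const Vec3 v2 = m.vertices[m.triangles[tri].c];
    const Vec3 e1 = {v1.x - v0.x, v1.y - v0.y, v1.z - v0.z};
    const Vec3 e2 = {v2.x - v0.x, v2.y - v0.y, v2.z - v0.z};
    const Vec3 n  = {e1.y * e2.z - e1.z * e2.y,
                     e1.z * e2.x - e1.x * e2.z,
                     e1.x * e2.y - e1.y * e2.x};
    const Vec3 out = {v0.x - center.x, v0.y - center.y, v0.z - center.z};
    return n.x * out.x + n.y * out.y + n.z * out.z > 0.0f;
}
```

With the half-extent set to the maximum beam radius, the box plays the same role as the inflated AABB of the custom primitive, only as hardware-intersected triangles.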
Now there would be two different approaches.

Using closest hit and continuation rays (iterative path tracer):
Implement a closest hit program which gets the geometry intersection.
You only want one intersection per geometry, which could be done with face culling.
Construct the box geometry to have all front faces on the outside.
Enable face culling for backfaces on all boxes.
When hitting the box, calculate the actual distance of the ray to the receiver center inside the closest hit program and when below the radius, add the contribution to the receiver with atomics.
Return to the ray generation program and if the ray had not missed, shoot a new ray with the same direction and changed origin or t_min values to skip the previously hit box. Repeat until there is no more hit.
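The control flow of that continuation loop can be sketched on the CPU like this. `traceClosest` is a hypothetical stand-in for one trace call with a closest hit program, implemented here as a brute-force slab test over boxes that only reports the entry point, which mimics culling the back faces. In this mock a hit must have t strictly greater than t_min, so two boxes with exactly the same entry distance would lose one hit; a real implementation would need to handle that case.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

struct Vec3 { float x, y, z; };
struct Box  { Vec3 lo, hi; };

// Slab test. Reports the distance at which the ray enters the box (front faces only).
bool entryDistance(Vec3 o, Vec3 d, const Box& b, float& tEntry)
{
    const float org[3] = {o.x, o.y, o.z};
    const float dir[3] = {d.x, d.y, d.z};
    const float lo[3]  = {b.lo.x, b.lo.y, b.lo.z};
    const float hi[3]  = {b.hi.x, b.hi.y, b.hi.z};
    float tNear = -std::numeric_limits<float>::infinity();
    float tFar  =  std::numeric_limits<float>::infinity();
    for (int a = 0; a < 3; ++a) {
        if (dir[a] == 0.0f) {                       // ray parallel to this slab
            if (org[a] < lo[a] || org[a] > hi[a]) return false;
            continue;
        }
        float t0 = (lo[a] - org[a]) / dir[a];
        float t1 = (hi[a] - org[a]) / dir[a];
        if (t0 > t1) std::swap(t0, t1);
        tNear = std::max(tNear, t0);
        tFar  = std::min(tFar, t1);
    }
    if (tNear > tFar) return false;
    tEntry = tNear;
    return true;
}

// Mock of one trace: index of the nearest box entered at t > tmin, or -1 on a miss.
int traceClosest(const std::vector<Box>& boxes, Vec3 o, Vec3 d, float tmin, float& tHit)
{
    int hit = -1;
    tHit = std::numeric_limits<float>::infinity();
    for (std::size_t i = 0; i < boxes.size(); ++i) {
        float t;
        if (entryDistance(o, d, boxes[i], t) && t > tmin && t < tHit) {
            tHit = t;
            hit  = static_cast<int>(i);
        }
    }
    return hit;
}

// Ray generation side: keep shooting the same ray with an advanced t_min until it misses.
std::vector<int> visitAll(const std::vector<Box>& boxes, Vec3 o, Vec3 d)
{
    assert(d.x != 0.0f || d.y != 0.0f || d.z != 0.0f);
    std::vector<int> order;
    float tmin = 0.0f;
    for (;;) {
        float t;
        const int id = traceClosest(boxes, o, d, tmin, t);
        if (id < 0) break;       // miss: no more receivers along the beam
        order.push_back(id);     // the closest hit work (distance test, atomic add) goes here
        tmin = t;                // skip the box that was just hit
    }
    return order;
}
```

Note that every receiver box along the beam costs one full trace from the ray generation program in this scheme.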

Using an anyhit program:
You could also use an anyhit program and do the calculation of the receiver to ray distance there, add the contribution to the receiver, again using atomics, and then ignore the intersection to continue the BVH traversal.
Anyhit program invocations are not in ray direction order but depend on the BVH traversal order.
You also need to specify that the geometry can invoke an anyhit program only once with OPTIX_GEOMETRY_FLAG_REQUIRE_SINGLE_ANYHIT_CALL.
This method assumes no ordering is required and receivers do not affect the beams.
It will end when there are no more intersections (which would reach the miss program, which isn’t required here).
Additional idea:
Nest up to 8 of these AABB boxes for each receiver to handle different radii and assign different instance.visibilityMask bits to the instances holding each level.
Use the optixTrace visibilityMask to partition the intersected geometry by the beam’s radius: set only the one of the 8 bits that selects the smallest required box level and let the hardware intersect only that.
I have no idea if that is actually faster than a single AABB with max radius. Nested AABB inside the traversal are usually bad.
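Picking the mask bit for a beam could look like the following sketch. The function name and the idea of keeping the level radii in an ascending array are assumptions for illustration; bit i is meant to match the visibility mask assigned to the instance holding level i.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// levelRadii: radii of the nested box levels, sorted ascending, at most 8 entries
// (one per visibility mask bit). Returns a mask with exactly one bit set.
unsigned int visibilityMaskForBeam(const std::vector<float>& levelRadii, float beamRadius)
{
    assert(!levelRadii.empty() && levelRadii.size() <= 8);
    for (std::size_t i = 0; i < levelRadii.size(); ++i) {
        if (beamRadius <= levelRadii[i])
            return 1u << i;                    // smallest level that still covers the beam
    }
    // Beam is wider than every level: fall back to the largest level.
    // The largest level should be built for the maximum beam radius so this never misses receivers.
    return 1u << (levelRadii.size() - 1);
}
```

For example, with levels {0.5, 1, 2, 4} a beam of radius 0.75 selects bit 1, so only the boxes built for radius 1 would be intersected. How the level radii are spaced (linear or geometric) is another thing that would need experimentation.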
This is all draft thinking and would need to be tried to see what works reasonably well.