OptiX 7 and MSVS 2017 - Hello World on Windows 10

Right, these examples and more are listed in both sticky posts I linked to in my first answer.

For performance reasons, I would be wary of the vector types in gdt/math/vec.h used in the SIGGRAPH course examples, because I do not expect them to result in vectorized loads and stores for 2- and 4-component vectors, as happens with the CUDA built-in vector types.
Instead, I would recommend using the CUDA built-in vector types, or at least deriving your own types from them, to benefit from the faster vectorized load and store instructions.
Fast memory accesses really are important.

Last question, if that is okay: can anyone tell me if there is a programmatic way to determine and/or set the total number of rays being cast?

There is no automatic way to count or set that. You would need to implement that yourself.

“Setting” the number of rays cast depends entirely on your implementation. You have complete control over whether and when an optixTrace() call is made.
It’s common to limit the ray depth or path length globally in ray tracers. You must know the maximum number of recursive optixTrace() calls up front anyway, because you cannot calculate the OptiX pipeline’s stack size otherwise.
I do not recommend using recursive ray tracers, especially not when they require a lot of stack space, because the maximum stack size currently has a hard limit of 64 kB.

Counting the number of optixTrace() calls is pretty simple though.
You would just need to increment a counter before each optixTrace() call in your code.

There are different ways to manage that counter.

I would hold that in the per-ray payload and write it out to a buffer at the end of the ray generation program. That would also allow visualizing the number of rays as a heat map to see which parts of the scene used the most rays.
Compare that with this method: https://forums.developer.nvidia.com/t/timing-rttrace-via-nvapi/129726

Then you can sum up the total number of rays counted in that buffer after each launch or after the final frame, either on the host or with a separate CUDA kernel, and print the result to the console.

In a progressive renderer, you could also initialize the counter buffer to zero on the initial sub-frame and accumulate the counted rays over multiple sub-frames.

If this is meant to be used in a benchmark, the issue is that the additional counting slows down performance, so you should run the exact same launch with and without the counting mechanism to get really accurate results.

If performance doesn’t matter for the counting, you could also combine all results with atomicAdd() calls into a single counter location.