Can OptiX 3.9.1 work with CUDA 9 or 10?

I assume you’re using OptiX 6.5.0 now. This would be a different forum thread then, unrelated to OptiX 3.9.1 which doesn’t support this at all.

Generally, if possible you should avoid calling trace inside callable programs for performance reasons.

How many of these InternalTrace() callable programs do you have?
Do you need to switch between them dynamically inside a single launch?

This is effectively the same explanation as in https://forums.developer.nvidia.com/t/unknown-value-when-trying-to-figure-out-pointer-space/74158/4
(That issue has been solved by moving to the OptiX SDK 7 which doesn’t have this issue due to the explicit continuation callables API in there.)

This OptiX Programming Guide chapter explains how to call rtTrace from callable programs, including source code examples for the individual cases depending on how you declared the program ID variable (individually or in a buffer of callable program IDs).

Listing 4.43 in there explains exactly which two use cases of callable programs calling rtTrace are handled by OptiX automatically, and which others require additional instrumentation by the developer to indicate manually which callable programs are calling trace. That is done with the functions rtMarkedCallableProgramId on the device side and rtProgramCallsiteSetPotentialCallees on the host side, so that OptiX can handle the additional resource management in a callable program hierarchy.

Your two lines will obviously not work together when using the same name.
It’s either one or the other, and the host code would need to be different when setting the individual callable program IDs inside a variable or in a buffer of callable program IDs.

I also wouldn’t name either the same as any of the callable programs assigned to those IDs.

Also, both of your failing declarations use an incorrect function signature that doesn’t match the actual callable program, and the buffer has a dimension of zero!

If your callable program function signature is
void InternalTrace(const float3& vOrigin, const float3& vDir, Scene_PolishPRD& prdPolish, Scene_InternalPRD& prdInternal)
then the declaration for a callable program ID variable holding that would be
rtDeclareVariable(rtCallableProgramId<void(const float3&, const float3&, Scene_PolishPRD&, Scene_InternalPRD&)>, myInternalTraceVariable, , );
or as a 1-dimensional buffer of callable program IDs (you had a zero for the dimension! One is also the default template argument):
rtBuffer<rtCallableProgramId<void(const float3&, const float3&, Scene_PolishPRD&, Scene_InternalPRD&)>, 1> myInternalTraceBufferOfCallableProgramIDs;
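
To make the two declarations and their call sites concrete, here is a minimal OptiX 6.x device code sketch. It is only an illustration under assumptions: the two PRD struct bodies, the callable program name internal_trace_default, the index variable internalTraceIndex and the ray generation program are hypothetical and not taken from your code. It needs the OptiX SDK headers and is compiled to PTX with nvcc, so it is not a standalone program. On the host, the IDs would come from rtProgramGetId on the callable program and be written either into the variable or into the buffer.

```cuda
// OptiX 6.x device code sketch (assumes the OptiX SDK include path is set).
#include <optix.h>
#include <optixu/optixu_math_namespace.h>

using namespace optix;

// Hypothetical stand-ins for the per-ray data structs in the question.
struct Scene_PolishPRD   { float3 radiance; };
struct Scene_InternalPRD { int depth; };

// One typedef keeps the variable, the buffer and the program signature in sync.
typedef rtCallableProgramId<void(const float3&, const float3&,
                                 Scene_PolishPRD&, Scene_InternalPRD&)> InternalTraceId;

// Case 1: a single callable program ID in a variable.
rtDeclareVariable(InternalTraceId, myInternalTraceVariable, , );

// Case 2: a 1-dimensional buffer of callable program IDs.
// A real program would normally use only one of the two cases.
rtBuffer<InternalTraceId, 1> myInternalTraceBufferOfCallableProgramIDs;

// Hypothetical index selecting an entry of the buffer, set by the host.
rtDeclareVariable(int, internalTraceIndex, , );

// The callable program itself. Its name differs from both declarations above
// and its parameter list matches the template argument exactly.
// It deliberately does not call rtTrace.
RT_CALLABLE_PROGRAM void internal_trace_default(const float3& vOrigin,
                                                const float3& vDir,
                                                Scene_PolishPRD& prdPolish,
                                                Scene_InternalPRD& prdInternal)
{
  prdPolish.radiance = make_float3(0.0f, 0.0f, 0.0f);
  prdInternal.depth += 1;
}

// Hypothetical ray generation program showing both call forms.
RT_PROGRAM void raygeneration()
{
  const float3 origin = make_float3(0.0f, 0.0f, 0.0f);
  const float3 dir    = make_float3(0.0f, 0.0f, 1.0f);

  Scene_PolishPRD prdPolish;
  prdPolish.radiance = make_float3(0.0f, 0.0f, 0.0f);
  Scene_InternalPRD prdInternal;
  prdInternal.depth = 0;

  // Call through the variable.
  myInternalTraceVariable(origin, dir, prdPolish, prdInternal);

  // Call through the buffer of callable program IDs.
  myInternalTraceBufferOfCallableProgramIDs[internalTraceIndex](origin, dir, prdPolish, prdInternal);
}
```

The typedef is just a convenience so the long signature is written once; spelling the full rtCallableProgramId<...> type out in each declaration works as well.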

This is explained inside the programming guide chapter about callable programs.

My old OptiX Advanced Examples use four buffers with callable program IDs for lens shaders, light samplers, BSDF samplers and BSDF evaluation.
I do not call rtTrace from callable programs.
For example, see the lens shader in this buffer of callable program IDs and its call.