cuda.h & cuda_runtime_api.h interception layers with cuHook

I’m developing an interception/injection layer for closed-source, CUDA-enabled applications that will let multiple applications run safely on the same GPU at the same time (through memory-allocation control and virtual GPU creation). I’ve been experimenting with the cuHook sample in the CUDA samples directory, but I can’t find any documentation explaining how to access the data an application actually passes into each call; the sample only seems to expose call counts.
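For what it’s worth, one common way to see the arguments (not just counts) is an LD_PRELOAD shim that exports the same symbol as libcuda, inspects the arguments, and forwards to the real driver via `dlsym(RTLD_NEXT, ...)`. The sketch below is a minimal, hypothetical example of that idea applied to allocation control: `g_quota_bytes` and `g_bytes_granted` are made-up names for a per-process budget, and the few cuda.h types are re-declared locally (values assumed to mirror cuda.h) so it builds without the toolkit. In a real shim you would include `<cuda.h>` instead.

```c
#define _GNU_SOURCE          /* for RTLD_NEXT */
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-ins for the cuda.h declarations this sketch needs (assumption:
   values mirror the real header; include <cuda.h> in a real build). */
typedef unsigned long long CUdeviceptr;
typedef enum {
    CUDA_SUCCESS = 0,
    CUDA_ERROR_OUT_OF_MEMORY = 2,
    CUDA_ERROR_NOT_INITIALIZED = 3
} CUresult;

typedef CUresult (*cuMemAlloc_fn)(CUdeviceptr *, size_t);

/* Hypothetical per-process budget enforced by the shim (not a CUDA setting). */
size_t g_quota_bytes = (size_t)1 << 30;   /* 1 GiB */
size_t g_bytes_granted = 0;

/* cuda.h maps cuMemAlloc to cuMemAlloc_v2, so that is the symbol an
   application linked against libcuda actually imports. */
CUresult cuMemAlloc_v2(CUdeviceptr *dptr, size_t bytesize)
{
    static cuMemAlloc_fn real = NULL;

    /* Unlike a call counter, the shim sees the actual arguments. */
    fprintf(stderr, "[shim] cuMemAlloc_v2(%zu bytes)\n", bytesize);

    /* Refuse over-budget requests without touching the GPU.
       g_bytes_granted never exceeds g_quota_bytes, so this cannot underflow. */
    if (bytesize > g_quota_bytes - g_bytes_granted)
        return CUDA_ERROR_OUT_OF_MEMORY;

    /* Look up the next definition in load order, normally libcuda's. */
    if (real == NULL)
        real = (cuMemAlloc_fn)dlsym(RTLD_NEXT, "cuMemAlloc_v2");
    if (real == NULL)
        return CUDA_ERROR_NOT_INITIALIZED;   /* libcuda is not loaded */

    CUresult rc = real(dptr, bytesize);
    if (rc == CUDA_SUCCESS)
        g_bytes_granted += bytesize;
    return rc;
}
```

It would be built with something like `gcc -shared -fPIC -o libshim.so shim.c -ldl` and loaded with `LD_PRELOAD=./libshim.so ./your_app`. A usable quota would also need to hook `cuMemFree_v2` (tracking pointer-to-size) and keep the budget in state shared between processes, since each preloaded copy only sees its own application.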
My eventual goal is not only to intercept calls in real time, but to have CUDA route calls through my library so it acts as a layer between the application and the GPU device itself, and can replace calls (e.g., answer cuDeviceGetAttribute with the attributes of a synthetic device).
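Replacing a call with a synthetic answer can follow the same interposition pattern: handle selected cases in the shim and forward everything else. Below is a minimal sketch, assuming the shim is preloaded; `g_virtual_sm_count` is a hypothetical description of the virtual device, and only the multiprocessor-count attribute is overridden. The cuda.h types are again local stand-ins whose values are assumed to match the real header. Note that this only changes what the application is told; it does not by itself confine kernels to part of the GPU, which would need a separate mechanism such as MPS or MIG.

```c
#define _GNU_SOURCE          /* for RTLD_NEXT */
#include <dlfcn.h>
#include <stddef.h>

/* Stand-ins for cuda.h (assumption: values mirror the real header). */
typedef int CUdevice;
typedef enum {
    CUDA_SUCCESS = 0,
    CUDA_ERROR_INVALID_VALUE = 1,
    CUDA_ERROR_NOT_INITIALIZED = 3
} CUresult;
typedef enum {
    CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT = 16
} CUdevice_attribute;

typedef CUresult (*cuDeviceGetAttribute_fn)(int *, CUdevice_attribute, CUdevice);

/* Hypothetical shape of the synthetic device this layer advertises. */
int g_virtual_sm_count = 8;

CUresult cuDeviceGetAttribute(int *pi, CUdevice_attribute attrib, CUdevice dev)
{
    static cuDeviceGetAttribute_fn real = NULL;

    if (pi == NULL)
        return CUDA_ERROR_INVALID_VALUE;

    /* Answer selected attributes from the virtual device's description. */
    if (attrib == CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT) {
        *pi = g_virtual_sm_count;
        return CUDA_SUCCESS;
    }

    /* Everything else passes through to the real driver. */
    if (real == NULL)
        real = (cuDeviceGetAttribute_fn)dlsym(RTLD_NEXT, "cuDeviceGetAttribute");
    if (real == NULL)
        return CUDA_ERROR_NOT_INITIALIZED;   /* libcuda is not loaded */
    return real(pi, attrib, dev);
}
```

A fuller synthetic device would cover the other queries an application uses to size its work (for example `cuDeviceTotalMem_v2` for memory), answered from the same virtual-device description so the values stay consistent.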

Does anyone know where I can find further documentation on cuHook, or other interception/injection techniques that have worked for them? I noticed that cuHook uses dlsym, which makes sense to me since I’m only interested in host-to-GPU calls.
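The dlsym angle suggests a complementary technique: override `dlsym` itself, so that code which `dlopen()`s libcuda and looks up entry points by name receives the hook instead of the driver’s function. This is a minimal sketch for glibc on Linux, not a description of how cuHook is written. `hooked_cuMemAlloc_v2` and `g_alloc_calls` are hypothetical names, and the symbol-version strings passed to `dlvsym` are platform assumptions (they differ between glibc releases and architectures). Recent CUDA releases also hand out driver entry points through `cuGetProcAddress`, which a complete layer would likely need to cover too.

```c
#define _GNU_SOURCE          /* for RTLD_NEXT, RTLD_DEFAULT, dlvsym */
#include <dlfcn.h>
#include <stddef.h>
#include <string.h>

typedef unsigned long long CUdeviceptr;  /* stand-in for cuda.h */
typedef int CUresult;                    /* cuda.h uses an enum; 0 == CUDA_SUCCESS */
typedef CUresult (*cuMemAlloc_fn)(CUdeviceptr *, size_t);
typedef void *(*dlsym_fn)(void *, const char *);

static cuMemAlloc_fn real_cuMemAlloc = NULL;  /* driver entry point, once seen */
unsigned long g_alloc_calls = 0;

/* Hypothetical hook handed out in place of the driver's cuMemAlloc_v2. */
static CUresult hooked_cuMemAlloc_v2(CUdeviceptr *dptr, size_t bytesize)
{
    g_alloc_calls++;
    if (real_cuMemAlloc == NULL)
        return 3;  /* CUDA_ERROR_NOT_INITIALIZED */
    return real_cuMemAlloc(dptr, bytesize);
}

/* The real dlsym cannot be fetched with dlsym itself, so request it by
   symbol version (assumption: GLIBC_2.34 on glibc >= 2.34, GLIBC_2.2.5
   on older x86-64). */
static dlsym_fn real_dlsym(void)
{
    static dlsym_fn fn = NULL;
    if (fn == NULL)
        fn = (dlsym_fn)dlvsym(RTLD_NEXT, "dlsym", "GLIBC_2.34");
    if (fn == NULL)
        fn = (dlsym_fn)dlvsym(RTLD_NEXT, "dlsym", "GLIBC_2.2.5");
    return fn;
}

/* Overrides dlsym for the whole process when preloaded. Caveat: RTLD_NEXT
   lookups passed through here resolve relative to this object, not the
   original caller. */
void *dlsym(void *handle, const char *symbol)
{
    dlsym_fn next = real_dlsym();
    if (next == NULL)
        return NULL;

    void *sym = next(handle, symbol);
    if (strcmp(symbol, "cuMemAlloc_v2") == 0) {
        real_cuMemAlloc = (cuMemAlloc_fn)sym;  /* remember the driver's version */
        return (void *)hooked_cuMemAlloc_v2;
    }
    return sym;
}
```

The same name-to-hook check could be driven by a table once more than a few entry points are involved; unmatched names fall through to the real lookup unchanged.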