Recreating a 'scene capture' component - access camera buffer?

I am trying to recreate how the scene capture component works in Unreal Engine.

What I want to do is grab the buffer / view of a certain camera, and access it in code so I can create a material from it.

I have this working well by assigning the camera to a viewport and grabbing the viewport view with:

viewport = vp_utils.get_viewport_from_window_name(self.camName)

if viewport:
    print(f"Found viewport: {self.camName}")
    capture_viewport_to_buffer(viewport, self.on_viewport_captured)
    asyncio.ensure_future(omni.kit.app.get_app().next_update_async())
    capture_viewport_to_buffer(viewport, self.on_viewport_captured)
else:
    print("Viewport not found.")

def on_viewport_captured(self, buffer, buffer_size, width, height, format):
    """Handles viewport capture and updates UI with image data."""
    size = (width, height)
    try:
        # Unwrap the PyCapsule handed to the callback to reach the raw RGBA bytes
        ctypes.pythonapi.PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]
        ctypes.pythonapi.PyCapsule_GetPointer.restype = ctypes.POINTER(ctypes.c_byte * buffer_size)
        content = ctypes.pythonapi.PyCapsule_GetPointer(buffer, None)
        img = Image.frombytes("RGBA", size, content.contents)
        np_data = np.asarray(img).data
        # self.providers[0].set_data_array(np_data, img.size)
        self._dynamic_texture.set_data_array(np_data, img.size)
        print('capture complete')
    except Exception as e:
        carb.log_error(f"Capture failed: {e}")

How can I do this WITHOUT creating a viewport, and just grab the camera view directly?

The other thing I am stuck on is hiding specific objects from the camera (UE5’s ‘hidden from scene capture’).

I can hide an object from all cameras, but seemingly not from a specific one. How can I do this?

Thanks!

Hey @ben115

Great question! It’s a bit complicated, as UE and OV are not 1:1, and OV relies on the viewport to generate a render output.
So, to answer your first question simply: there is no way to skip creating the viewport.
The only thing you can do is work ‘offscreen’ - without a visible viewport - and render a camera to an image path, which it looks like you are already doing by utilizing omni.kit.viewport.utility.
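For instance, a minimal sketch of that offscreen-style capture (the camera path and file path below are placeholders):

from omni.kit.viewport.utility import get_active_viewport, capture_viewport_to_file

# Point the viewport at a specific camera, then write the next rendered
# frame to disk. Both paths are placeholders.
viewport = get_active_viewport()
viewport.camera_path = "/World/MyCamera"
capture_viewport_to_file(viewport, file_path="/tmp/my_camera.png")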
For your second question, you can change the visibility of a specific prim from your Python script with the following code:

from pxr import Sdf, Usd
import omni.usd

# Get the open stage
stage = omni.usd.get_context().get_stage()

# Get the prim and its visibility attribute, then set it invisible
# (the path must be an absolute prim path, e.g. "/World/MyMesh")
prim = stage.GetPrimAtPath("/Path/To/Prim")
prim_visibility = prim.GetAttribute("visibility")
prim_visibility.Set("invisible")

# ...or set it back to visible
prim_visibility.Set("inherited")

Let me know if this works for you :)

Thanks for getting back to me.

For the first question, is it possible in C++ with low-level code?

I came across this snippet:

PyObject* BufferAccessor::getCameraBuffer(const char* camera_path, int width, int height)
{
    carb::log_info("Getting camera buffer for camera: %s (%dx%d)", camera_path, width, height);
    
    try {
        // Get the USD stage
        auto stage = pxr::UsdStage::GetCurrentStage();
        if (!stage) {
            carb::log_error("No active USD stage");
            Py_RETURN_NONE;
        }
        
        // Get the camera prim
        auto cameraPrim = stage->GetPrimAtPath(pxr::SdfPath(camera_path));
        if (!cameraPrim) {
            carb::log_error("Camera prim not found: %s", camera_path);
            Py_RETURN_NONE;
        }
        
        // Create USD camera
        pxr::UsdGeomCamera camera(cameraPrim);
        if (!camera) {
            carb::log_error("Invalid camera prim: %s", camera_path);
            Py_RETURN_NONE;
        }
        
        // Get camera parameters (transform, projection, etc.)
        pxr::GfCamera gfCamera;
        camera.GetCamera(&gfCamera);
        
        // Get the Hydra engine
        auto& hydraEngine = omni::hydra::getHydraInterface();
        
        // Create a render target
        omni::gpu::RenderTargetDesc rtDesc;
        rtDesc.width = width;
        rtDesc.height = height;
        rtDesc.colorFormat = omni::gpu::Format::RGBA8_UNORM;
        rtDesc.depthStencilFormat = omni::gpu::Format::D32_FLOAT;
        
        // Get GPU interface
        auto& renderInterface = omni::gpu::getRenderInterface();
        auto renderTarget = renderInterface.createRenderTarget(rtDesc);
        
        // Set up camera view and projection matrices
        // Note: In a real implementation, these would be extracted from the camera
        pxr::GfMatrix4d viewMatrix = gfCamera.GetTransform().GetInverse();
        pxr::GfMatrix4d projMatrix = gfCamera.GetProjectionMatrix();
        
        // Render scene from camera view to our render target
        // The exact API calls here will depend on the specific Omniverse version
        hydraEngine.RenderToTarget(renderTarget, viewMatrix, projMatrix);
        
        // Get the rendered image data
        size_t bufferSize = width * height * 4; // RGBA, 1 byte per channel
        std::vector<uint8_t> pixelData(bufferSize);
        
        // Read pixels from GPU
        renderInterface.readPixels(renderTarget->getColorAttachment(0), 
                                  0, 0, width, height, 
                                  omni::gpu::Format::RGBA8_UNORM, 
                                  pixelData.data());
        
        // Clean up the render target
        renderInterface.destroyRenderTarget(renderTarget);
        
        // Create numpy array from the pixel data
        // Using pybind11's numpy array creation which is simpler than raw PyArray API
        auto dtype = py::dtype::of<uint8_t>();
        std::vector<size_t> shape = {static_cast<size_t>(height), static_cast<size_t>(width), 4};
        std::vector<size_t> strides = {static_cast<size_t>(width * 4), 4, 1};
        
        // Create numpy array object - this will copy the data
        auto array = py::array(dtype, shape, strides, pixelData.data());
        
        // Return the numpy array (no need to manually increment reference count with pybind11)
        return array.release().ptr();
        
    } catch (const std::exception& e) {
        carb::log_error("Exception getting camera buffer: %s", e.what());
        PyErr_SetString(PyExc_RuntimeError, e.what());
        return nullptr;
    }
}

but it relies on these headers:

#include <omni/gpu/RenderTarget.h>
#include <omni/hydra/HydraEngine.h>
#include <omni/gpu/TextureDesc.h>

which I can't find.

Is this valid code?

Thanks!

Also, back in Python, I have found this:

which works great for getting the camera buffer as a render product. Now, however, I am stuck. Can I grab the pixels somehow after running that code? Or do I need to move this functionality into C++?

Can I run the results of:

ldr_color_res = ldr_texture.get("rp_resource")

into something like:

size = (width, height)
try:
    ctypes.pythonapi.PyCapsule_GetPointer.restype = ctypes.POINTER(ctypes.c_byte * (width * height))
    content = ctypes.pythonapi.PyCapsule_GetPointer(ldr_texture, None)
    img = Image.frombytes("RGBA", size, content.contents)
    np_data = np.asarray(img).data
    print(f"Successfully obtained pixel data with shape: {np_data.shape}")
    # self.providers[0].set_data_array(np_data, img.size)
    # self._dynamic_texture.set_data_array(np_data, img.size)
    # print('capture complete')
except Exception as e:
    carb.log_error(f"Capture failed: {e}")

Let me grab a rendering engineer and see if they can answer your question - thanks for your patience! :)

@ben115 may I ask what you are trying to achieve overall, so we can better understand your approach here? You are trying to capture a camera view and save it as a file? And what is that captured file for? And why are you looking to do this without capturing from a viewport?

@ben115
As Richard suggested above, knowing your full use case would be helpful, but to add: a few of the RTX engineers agreed that sticking with omni.kit.viewport.utility would be more appropriate, and that ldr_color_res = ldr_texture.get("rp_resource") is not a good idea.

Here is the code again for reference:

from omni.kit.viewport.utility import get_active_viewport, capture_viewport_to_file
capture_viewport_to_file(get_active_viewport(), file_path="/path/to/file.png")

from omni.kit.viewport.utility import get_active_viewport, capture_viewport_to_buffer
capture_viewport_to_buffer(get_active_viewport(), capture_callback)
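(For the buffer variant, capture_callback is a function you supply; a minimal placeholder matching the signature used earlier in this thread:)

def capture_callback(buffer, buffer_size, width, height, format):
    # `buffer` is a PyCapsule wrapping the raw RGBA pixels; unwrap and
    # consume it here, as in on_viewport_captured above.
    print(f"Captured {width}x{height} frame ({buffer_size} bytes)")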

Hi Richard, thanks for jumping in. I am building a complicated application around Kit. The first part of this is that I need to access the camera buffer from multiple cameras, adjust the pixels, and apply them as a dynamic material (one per camera) to objects in the scene. I don’t want to keep multiple viewports open just to do this, so I need to grab the image data directly from the camera.

Right now I am using the example linked above in Python:

from omni.kit.hydra_texture import acquire_hydra_texture_factory_interface
import omni

texture_factory = acquire_hydra_texture_factory_interface()
# Make sure to run any initialization/startup that must occur
texture_factory.startup()

engine_name = "rtx"
usd_context_name = ""
usd_context = omni.usd.get_context(usd_context_name)
if not usd_context:
    raise RuntimeError(f"UsdContext named '{usd_context_name}' does not exist")
if engine_name not in usd_context.get_attached_hydra_engine_names():
    omni.usd.add_hydra_engine(engine_name, usd_context)

hydra_texture = texture_factory.create_hydra_texture(
    name="your_unique_name",
    width=1280,
    height=720,
    usd_context_name=usd_context_name,
    usd_camera_path="/OmniverseKit_Persp",
    hydra_engine_name=engine_name)


# Import carb for logging and type-hinting support with the callback
import carb

def renderer_completed(event: carb.events.IEvent):
    if event.type != omni.kit.hydra_texture.EVENT_TYPE_DRAWABLE_CHANGED:
        carb.log_error("Wrong event captured for EVENT_TYPE_DRAWABLE_CHANGED!")
        return

    # Get a handle to the result
    result_handle = event.payload['result_handle']
    # And pass that to the HydraTexture instance to get the AOV's that are available
    aov_info = hydra_texture.get_aov_info(result_handle)
    print(f"Available AOVs: {aov_info}")

    # Get an object for a specific AOV and include the GPU texture in the info
    ldr_info_array = hydra_texture.get_aov_info(result_handle, 'LdrColor', include_texture=True)
    print(f"LdrColor AOVs are: {ldr_info_array}")

    ldr_texture = ldr_info_array[0].get("texture", None)
    assert ldr_texture is not None

    ldr_color_res = ldr_texture.get("rp_resource")
    ldr_color_gpu_tex = ldr_texture.get("rp_resource")
    print(f"LdrColor[0]: {ldr_color_res}, {ldr_color_gpu_tex}")

    # YOU CANNOT USE THE RESOURCE OUTSIDE OF THIS FUNCTION/SCOPE
    # ldr_color_gpu_tex must be consumed now or passed to another object
    # that will use / add-ref the underlying GPU-texture.

event_sub = hydra_texture.get_event_stream().create_subscription_to_push_by_type(
    omni.kit.hydra_texture.EVENT_TYPE_DRAWABLE_CHANGED,
    renderer_completed,
    name="Your unique event name for profiling/debug purposes",
)

# When event_sub object is destroyed (from going out of scope or being re-assigned)
# then the callback will stop being triggered.
# event_sub = None

I am just missing the next step: how can I go from ldr_color_gpu_tex = ldr_texture.get("rp_resource") to a numpy array for the camera view? I believe this is how the ViewportWidget works, so it seems possible.

I am happy to do this in Python or C++, whichever exposes what I need. Please let me know if this is the best approach, or if there is a better way in low-level C++.

Thanks again.

Hey Ashley, I have added more info for Richard below. I cannot use the viewport method here, so I do need to find a solution for accessing the camera buffer. Can you ask the engineers if there is a C++ approach that does what I want? Why is ldr_color_res = ldr_texture.get("rp_resource") not a good idea? Thanks!

“apply them as a dynamic material” - Yes, why? This is the part I am trying to understand. The only use case I can think of for this, is you are trying to create real-time camera feedback loop in the rtx viewport. For example, if you have an array of security monitors, inside your rtx scene, and you wanted those security monitors to display other cameras in the scene, all in real-time. So you are viewing a camera, through a camera.

Whilst this may be possible, it is incredibly risky for performance and stability. You are creating a serious feedback loop. If you are just grabbing one frame and not doing full real-time streaming, it is a little better, but still.

Just for a moment, let’s talk workflow and not code. What is it you are trying to do? A real-time usd scene, capable of displaying other real-time cameras? This is the “mirror-within-a-mirror” effect. An infinite loop.

Hi Richard, the broad strokes of the application are this: (Can this discussion be made ‘private’, between myself and NVIDIA employees? Then I can add more details.)

An application that allows users to set up scenarios using multiple projectors that will project images from a USD scene onto complex geometry.
This involves setting ‘projector’ transforms for very precise positioning, allowing multiple projected images to fit together and cover the geometry without seams or overlap.

In OV, I am using cameras as ‘projectors’ to calculate these positions.
The ‘live’ part of this is launching OV on different machines, and sending the view from each ‘projector’ camera to a real-world projector, which projects onto real-world geometry.

This is something I have achieved in other packages already, but I want to move to OV for the USD aspect and the vastly improved image quality.

The steps I need are:

  1. Place the camera precisely via code to cover the section of the geometry that is required, and capture the scene from that exact viewpoint.
  2. Grab the camera buffer directly.
  3. Edit the camera buffer to allow for precise projection coverage (using an implementation of MPCDI to calculate the projection math).
  4. Apply the edited image as a dynamic texture onto the geometry of the scene.

These steps I already have working in a rough proof of concept, using the viewport capture API. However, I want to swap to accessing the buffer directly, as I mentioned previously. To avoid the mirror-in-mirror effect you mention, I am using the camera clipping planes to ensure the projection geometry is ‘hidden’ from the camera view.
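For reference, here is a minimal sketch of that clipping-plane trick; the camera path and range values are placeholders:

from pxr import Gf, UsdGeom
import omni.usd

stage = omni.usd.get_context().get_stage()

# Narrow the near/far clipping range so the projection surface falls
# outside it and is excluded from the captured view. Placeholder values.
cam = UsdGeom.Camera(stage.GetPrimAtPath("/World/ProjectorCam"))
cam.GetClippingRangeAttr().Set(Gf.Vec2f(1.0, 10000.0))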

The last part of this is to send this edited camera buffer to a viewport, for the ‘live’ mode.

I am happy to get as deep into C++ as I need to here; I understand I may be stretching what OV is built to do a little!

Thank you for your help!

OK, so this is real-world projection mapping. Yes, I have set those up myself. And yes, I have to say that you are way outside the Omniverse scope, but we can try to help :-)

So now I know what you are trying to do, let me give you a better way of doing it. Don’t use cameras as projectors. Use projector lights as projectors. We have that ability built in already. You still have your cameras to view the scene geometry position, but you want to project the texture onto the geometry using the projector lights.
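For illustration, one way to approximate a projector from Python is a textured rect light; this is only a sketch (it may not be the exact projector-light feature meant here), and the paths and values are placeholders:

from pxr import Sdf, UsdLux
import omni.usd

stage = omni.usd.get_context().get_stage()

# A RectLight with a texture file projects that image into the scene.
# Prim path, texture path, and dimensions are placeholders.
light = UsdLux.RectLight.Define(stage, Sdf.Path("/World/ProjectorLight"))
light.CreateTextureFileAttr().Set("/path/to/projected_image.png")
light.CreateWidthAttr(1.6)
light.CreateHeightAttr(0.9)
light.CreateIntensityAttr(30000.0)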

Secondly, how are you going to achieve this: “Apply the edited image as a dynamic texture onto the geometry of the scene”?

Right now I am doing this:


self._dynamic_texture = omni.ui.DynamicTextureProvider(self.camName + "_tex")

material_path = f"{plane_path}/Material"
material = UsdShade.Material.Define(stage, material_path)
shader = UsdShade.Shader.Define(stage, f"{material_path}/Shader")

# Set up OmniPBR shader
shader.SetSourceAsset("OmniPBR.mdl", "mdl")
shader.SetSourceAssetSubIdentifier("OmniPBR", "mdl")
shader.CreateIdAttr("OmniPBR")

# Set the dynamic texture
texture_name = self.camName + "_tex"
shader.CreateInput("diffuse_texture", Sdf.ValueTypeNames.Asset).Set(f"dynamic://{texture_name}")

# Connect shader to material
material.CreateSurfaceOutput().ConnectToSource(shader.ConnectableAPI(), "surface")

# Bind the material to the mesh
# Make sure the mesh has the MaterialBindingAPI
plane_prim.ApplyAPI(UsdShade.MaterialBindingAPI)
UsdShade.MaterialBindingAPI(plane_prim).Bind(material)
print(f"Material with dynamic texture bound to: {plane_path}")

and then viewport capture with:

# Put the viewport capture outside the if statement, matching CreateNodeCamera
viewport = vp_utils.get_viewport_from_window_name(self.camName)
if viewport:
    print(f"Found viewport: {self.camName}")
    capture_viewport_to_buffer(viewport, self.on_viewport_captured)
    asyncio.ensure_future(omni.kit.app.get_app().next_update_async())
    capture_viewport_to_buffer(viewport, self.on_viewport_captured)
else:
    print("Viewport not found.")

and:

    def on_viewport_captured(self, buffer, buffer_size, width, height, format):
        """Handles viewport capture and updates UI with image data."""
        size = (width, height)
        try:
            ctypes.pythonapi.PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]
            ctypes.pythonapi.PyCapsule_GetPointer.restype = ctypes.POINTER(ctypes.c_byte * buffer_size)
            content = ctypes.pythonapi.PyCapsule_GetPointer(buffer, None)
            img = Image.frombytes("RGBA", size, content.contents)
            np_data = np.asarray(img).data
            # self.providers[0].set_data_array(np_data, img.size)
            self._dynamic_texture.set_data_array(np_data, img.size)
            # print('capture complete')
        except Exception as e:
            carb.log_error(f"Capture failed: {e}")

This is the part I want to move to direct camera-buffer access!

Yes, looking at your history, I can see that you have been asking about direct G-Buffer access for a while. I remember asking the engineers about this, and they were not sure it is easy to do. I can ask again.

Is this projection mapping going to be running a “live” animated texture that is changing in real-time, or just a static one? Is that why capturing from a viewport is too slow?

I would have 5 computers running 5 copies of the USD scene in Omniverse, each displaying the rtx viewport in real-time, fullscreen, and run your MPCDI calculations on the direct rtx viewport output. A lot of projection software does that. That way you don’t need to access the G-buffer.

Anyway, I will ask the engineers again about G-Buffer access, but in the meantime, I think you need to plan on using the rtx viewport directly. I don’t think you lose anything by doing so.

The output will be live, animated, dynamic, and updating. Each camera needs to capture the same USD scene, as you describe, in real time.

Ah, I see - so I could intercept the viewport output, edit it, and ‘replace’ it with the edited version?

The other issue with viewports is they need to be open, as I understand it. So if I capture 5, or even 10 cameras, it will open 10 viewports. As a compromise here, is it possible to render viewports ‘offscreen’? Or create them, stream data from them to my textures, but hide the viewport itself?

If there is a C++, super-low-latency way to grab the camera view without a viewport, I would definitely rather dig into that, so thanks for checking with the engineers.

Exactly. You are deep in impressive code, but there is easy “keying” software that does all of this for you in real-time at the push of a button. You just run the raw rtx viewports out to a projector and use off-the-shelf MPCDI software to “dial them in”. I have done this in other software before, without the code - just using good real-time software on the output. Once you have your projectors dialed in, you can run any dynamic lighting and texturing out of Omniverse that you want.

Ideally, stick to ONE viewport per Composer, per computer. That way each projector gets its direct rtx feed from one rock-solid real-time output.

I need to do this all in one piece of software, so I will need to dig in deeper. While we wait for the engineers’ response:

  1. Is it possible to render viewports ‘offscreen’?
  2. If I intercept the viewport with viewport capture and edit it, how would I then ‘replace’ the image back to the viewport?
  3. Is there a resolution setting on the projector lights? I need to keep very high image quality.

Thanks again for your help!

Well then, let me talk to our rtx dev team first. But as far as I know, we do not render “offscreen”. We are built from the ground up to be fully GPU accelerated and real-time.

For clarity, when I say offscreen, I mean fully real-time, but from a hidden viewport.

I also sent you a DM with a bit more context. Thanks!