[Help Needed] Issue Deploying Nvidia Omniverse Kit App Streaming on AWS – Load Balancer / Target Group Errors

Hi everyone,

I’m currently working on deploying the NVIDIA Omniverse Kit App Streaming solution on AWS Cloud following the architecture provided in the official docs:

I’ve successfully set up the core infrastructure and am trying to stream a test app (usd-viewer). The core services involved are:

  • RMCP
  • Streaming Session Manager
  • App and Profile Manager
  • Nvidia AWS NLB Manager

❌ Problem Encountered

I’m attempting to create a stream using the Streaming Session Manager’s /stream API, but it’s failing with the following response:

Error code 500
{
  "detail": "Failed to start a new stream due to an unknown error"
}

I then checked the streaming pod logs and found this:

ERROR: Failed request to http://nlb/allocation: 429, {"detail":"Unable to find a Load Balancer with available Target Groups."}
ERROR: Failed to start a new stream: <class 'nv.svc.streaming._csp._APIError'>, Error 429: Failed request to http://nlb/allocation: 429, {"detail":"Unable to find a Load Balancer with available Target Groups."}

On the NLB Manager side, I observed the following errors:

ERROR: Failed to create Target Group (TCP/41001): ValidationError - Target group name '-TCP-41001' cannot begin or end with '-'
ERROR: Failed to create Listener (TCP/41001): cannot access local variable 'targetGroupArn' where it is not associated with a value
... and 10 more similar errors on TCP ports

ERROR: Failed to create Target Group (UDP/41026): ValidationError - Target group name '-UDP-41026' cannot begin or end with '-'
ERROR: Failed to create Listener (UDP/41026): cannot access local variable 'targetGroupArn' where it is not associated with a value
... and 10 more similar errors on UDP ports

Then, after the create stream API is called, I see the logs below:

ERROR: Unable to reserve a Target Group for allocation: media / ListenerProtocol.udp
ERROR: Unable to allocate sufficient resources.
ERROR: Unable to find a Load Balancer with available Target Groups.

🔍 What I’ve Tried

  • Confirmed all Omniverse services are deployed and discoverable.
  • Verified IAM permissions for the NLB manager pod.

❓ Questions

  • Has anyone run into this Target Group naming issue before? Could this be a bug in the NLB manager’s name generation logic?
  • Are there specific naming conventions or pre-checks I can enforce to avoid invalid target group names (those starting/ending with -)?
  • Is there a way to manually pre-provision the Target Groups and have the NLB manager pick them up?
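On the pre-check question, here is a minimal sketch of a validator for the documented AWS target group naming rules (at most 32 characters, only alphanumerics or hyphens, no leading or trailing hyphen). The `build_target_group_name` format is an assumption inferred only from the '-TCP-41001' error above; the NLB manager's real naming code is not shown in this thread.

```python
import re

# AWS CreateTargetGroup naming rules: at most 32 characters, only
# alphanumerics or hyphens, and no leading or trailing hyphen.
_NAME_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,30}[A-Za-z0-9])?")


def build_target_group_name(alias: str, protocol: str, port: int) -> str:
    """Hypothetical name format, guessed from the '-TCP-41001' error."""
    return f"{alias}-{protocol}-{port}"


def is_valid_target_group_name(name: str) -> bool:
    """Return True if the name satisfies the AWS naming rules above."""
    return _NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for alias in ("", "omni"):  # "omni" is a made-up alias
        name = build_target_group_name(alias, "TCP", 41001)
        print(repr(name), is_valid_target_group_name(name))
```

With an empty alias the composed name starts with a hyphen and fails the check, which matches the ValidationError in the logs.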

Any help, pointers, or ideas would be appreciated!

Thanks in advance

Hi and welcome to the forum. Thank you for your question. I will try to find a streaming expert to answer it for you.

Here is what the engineers have to say:

If I had to guess, you possibly have an empty “value” for the tag lookup. From the docs:

The LBM searches for a configurable tag and uses its value as the NLB’s alias for the streaming session.

Tag Lookup Configuration

The LBM supports dynamic configuration of NLBs at service startup and through the GET:/refresh API endpoint. These settings can be configured via the service’s application.toml and Helm chart values file using the following parameters:

nv.ov.svc.streaming.aws.nlb.resource.lookup.tag.key = ""
nv.ov.svc.streaming.aws.nlb.resource.lookup.tag.value = ""

Any NLBs with a matching tag key/value are configured by the LBM.

This error is also something to look into:

ERROR: Failed to create Listener (TCP/41001): cannot access local variable 'targetGroupArn' where it is not associated with a value
… and 10 more similar errors on TCP ports

I suspect both are empty and it’s not finding a load balancer to manage.
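To illustrate the tag lookup described in the docs excerpt, here is a minimal sketch that matches load balancers by tag key/value and refuses an empty lookup setting. The input mirrors the `TagDescriptions` list returned by boto3's elbv2 `describe_tags`; the function name, tag names, and ARNs are hypothetical, and the LBM's actual matching logic is not shown in this thread.

```python
from typing import Dict, List


def matching_load_balancers(
    tag_descriptions: List[Dict], key: str, value: str
) -> List[str]:
    """Return the ARNs whose tags contain exactly key=value.

    An empty key or value is rejected up front, since the tag value is
    what the docs say is used as the NLB alias.
    """
    if not key or not value:
        raise ValueError("lookup tag key and value must both be non-empty")
    matches = []
    for desc in tag_descriptions:
        tags = {t["Key"]: t.get("Value", "") for t in desc.get("Tags", [])}
        if tags.get(key) == value:
            matches.append(desc["ResourceArn"])
    return matches


if __name__ == "__main__":
    # Made-up sample data in the shape of a describe_tags response.
    sample = [
        {"ResourceArn": "arn:example:nlb-a",
         "Tags": [{"Key": "stream-alias", "Value": "omni"}]},
        {"ResourceArn": "arn:example:nlb-b",
         "Tags": [{"Key": "stream-alias", "Value": ""}]},
    ]
    print(matching_load_balancers(sample, "stream-alias", "omni"))
```

Running the same check against the real tags on the NLB is one way to confirm whether the configured key/value actually matches anything.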

Hi Richard, thanks for the support. I will share my values file for the kit-appstreaming-aws-nlb Helm chart. I have added the lookup key and value, but I want to know whether I added them under the right scope. Please go through the file once.
Thank you so much.
values-aws-nlb.yaml.zip (2.7 KB)

Ok we will take a look. Let me know how it goes.

I have tried other annotations as well, Richard, and still got the same logs. The problem is the target group names generated when creating them.

I am also trying to start a stream without the NLB Manager pod, since it is optional, by calling the create stream API with the profile changed from nlb to default.
The API call and its payload:

curl --location 'http://k8s-omnivers-streamin-xxxxxxxxxxxxxxxxx.elb.amazonaws.com/stream' \
--header 'accept: application/json' \
--header 'Content-Type: application/json' \
--data '{
  "id": "usd-viewer",
  "profile": "default",
  "version": "0.2.0"
}'

The API responded with status 500:

{
    "detail": "Failed to start a new stream due to an unknown error"
}

The corresponding pod logs are:

{"Timestamp": 1747117346449246566, "SeverityNumber": 17, "SeverityText": "ERROR", "InstrumentationScope": "", "Body": "Failed to start a new stream: <class 'KeyError'>, 'service'", "Attributes": {"app.clock": 142559.816913605, "timestamp": "2025-05-13T06:22:26.449246566Z", "correlation_id": "10ff7d5b62ba45shvdhsds6eda35b13873e1"}, "Resource": {"app.namespace": "nv.svc", "app.name": "nv.svc.streaming", "app.instance.id": "0ydysd89dc1-07bf-41d3-a623-99f71hhss5760c", "app.version": "1.9.0"}}
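Structured log lines like the one above are easier to scan once reduced to a few fields. A minimal sketch, assuming each line is a single JSON object with the same field names as in that log (`SeverityText`, `Body`, `Attributes.timestamp`, `Resource`); the helper name is hypothetical.

```python
import json


def summarize_log_line(line: str) -> str:
    """Reduce one JSON log record to 'timestamp severity [app version] body'."""
    record = json.loads(line)
    attrs = record.get("Attributes", {})
    resource = record.get("Resource", {})
    return "{ts} {sev} [{app} {ver}] {body}".format(
        ts=attrs.get("timestamp", "?"),
        sev=record.get("SeverityText", "?"),
        app=resource.get("app.name", "?"),
        ver=resource.get("app.version", "?"),
        body=record.get("Body", ""),
    )


if __name__ == "__main__":
    import sys

    # Usage: kubectl logs <pod> | python summarize.py
    for raw in sys.stdin:
        raw = raw.strip()
        if raw.startswith("{"):  # skip lines that are not JSON records
            print(summarize_log_line(raw))
```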

During startup there are some debug logs that can help us troubleshoot further.

nlb:
  serviceConfig:
    logging:
      level: "DEBUG"

In addition to that, we need to get really detailed here. We need:

  • Config for the services you are using
  • Logs for all the services deployed when the failure happens
  • The profile and applications you are using
  • Assuming you are using Flux to deploy the Helm charts, you should be able to see the pods and everything else coming up. It would be good to get that information as well: which pods are starting and what errors you are seeing in the k8s cluster

Hi Richard, thanks for the response. Yes, we are using Flux for this deployment, as shown in the architecture diagram.
omni.zip (14.6 KB) contains the values config we are using for every pod; the sensitive content is masked as xxxxxxxx.

We have two versions of the sample app, one for default streaming and one for NLB streaming; you will find them in the kit-app-registration folder. Because Flux needs the NGC API token to release the Helm charts from the ngc-omniverse Helm repo, those config files are in the fluxcd folder.

The startup logs of the aws-nlb-manager pod with the config below

nlb:
  serviceConfig:
    logging:
      level: "DEBUG"  # also tried WARN and ERROR

gave the same logs:

WARNING:root:Unable to autoconfigure tracing, ensure nv.svc.core is installed with 'tracing' extra.
ERROR:root:Failed to create Target Group (TCP/41001): An error occurred (ValidationError) when calling the CreateTargetGroup operation: Target group name '-TCP-41001' cannot begin or end with '-'
ERROR:root:Failed to create Listener (TCP/41001): cannot access local variable 'targetGroupArn' where it is not associated with a value
ERROR:root:Failed to create Target Group (UDP/41026): An error occurred (ValidationError) when calling the CreateTargetGroup operation: Target group name '-UDP-41026' cannot begin or end with '-'
ERROR:root:Failed to create Listener (UDP/41026): cannot access local variable 'targetGroupArn' where it is not associated with a value
ERROR:root:Operation completed with some errors: ["Failed to create Target Group (TCP/41001): An error occurred (ValidationError) when calling the CreateTargetGroup operation: Target group name '-TCP-41001' cannot begin or end with '-'", "Failed to create Listener (TCP/41001): cannot access local variable 'targetGroupArn' where it is not associated with a value", "Failed to create Target Group (UDP/41026): An error occurred (ValidationError) when calling the CreateTargetGroup operation: Target group name '-UDP-41026' cannot begin or end with '-'", "Failed to create Listener (UDP/41026): cannot access local variable 'targetGroupArn' where it is not associated with a value"]
WARNING:root:Failed to create NLB resources, service will attempt to start anyway...

Case 1 (default profile): when we call the POST /stream API of the streaming-session-manager pod,

curl --location 'http://k8s-xxxx-xxx-xxx-xxx.ap-south-1.elb.amazonaws.com/stream' \
--header 'accept: application/json' \
--header 'Content-Type: application/json' \
--data '{
  "id": "usd-viewer",
  "profile": "default",
  "version": "0.2.0"
}'

we get a 200 response with this body:

{ 
  "id": "bcd94dbe-e997-4068-ac94-69e7a2d75d14", 
  "routes": {}, 
  "status": { 
    "condition": "reconciling", 
    "status": false, 
    "message": "" 
  } 
}

The routes are supposed to contain port information, but in my case they are empty. Flux CD creates the stream-session Helm release successfully, and the pod runs in the namespace as expected. Because the routes are empty, we are unable to view the stream, so help is needed here.
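The response above shows condition "reconciling" with status false, so the session may simply not be ready yet at the time of the POST. A minimal sketch of a polling loop, under the unverified assumption that routes are filled in once reconciliation finishes and that the session can be re-fetched. How to re-fetch it is not shown in this thread, so `fetch` is a caller-supplied callable and both function names are hypothetical.

```python
import time
from typing import Any, Callable, Dict


def is_stream_ready(session: Dict[str, Any]) -> bool:
    """Ready means status.status is true and routes is non-empty."""
    status = session.get("status", {})
    return bool(status.get("status")) and bool(session.get("routes"))


def wait_for_routes(
    fetch: Callable[[], Dict[str, Any]],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch() until the session is ready or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        session = fetch()
        if is_stream_ready(session):
            return session
        if time.monotonic() >= deadline:
            condition = session.get("status", {}).get("condition")
            raise TimeoutError(f"stream not ready, last condition: {condition}")
        sleep(interval_s)
```

If the loop times out while the condition stays at "reconciling", the empty routes are not just a timing issue, and the session pod and Helm release events are the next place to look.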

Case 2 (nlb profile): we now call the POST /stream API of the streaming-session-manager pod with this payload:

curl --location 'http://k8s-xxxx-xxx-xxx-xxx.ap-south-1.elb.amazonaws.com/stream' \
--header 'accept: application/json' \
--header 'Content-Type: application/json' \
--data '{
  "id": "usd-viewer",
  "profile": "nlb",
  "version": "0.2.0"
}'

We get a 500 response:

{
    "detail": "Failed to start a new stream due to an unknown error"
}

The logs of the streaming-session-manager pod:

ERROR: Failed request to http://nlb/allocation: 429, {"detail":"Unable to find a Load Balancer with available Target Groups."}
ERROR: Failed to start a new stream: <class 'nv.svc.streaming._csp._APIError'>, Error 429: Failed request to http://nlb/allocation: 429, {"detail":"Unable to find a Load Balancer with available Target Groups."}

and the corresponding logs of the aws-nlb-manager pod:

ERROR: Unable to reserve a Target Group for allocation: media / ListenerProtocol.udp
ERROR: Unable to allocate sufficient resources.
ERROR: Unable to find a Load Balancer with available Target Groups.

Note: I have already created an NLB with TCP 41001 and UDP 41026 listeners, with a tag matching the lookup config in values-aws-nlb.yaml.

This is where we are now, Richard. At least for the first case, we should be able to stream the app without the aws-nlb-manager pod, since it is optional.
For the second case, the NLB pod, which is supposed to create an NLB and listeners with target groups, is failing, and so the API returns a 500 response.

We hope to get some support on this, since there are no other blogs or resources available.
Thank you so much.