ACE / IGI FAQ


Q: What is NVIDIA In-Game Inferencing?

A: The NVIDIA In-Game Inferencing (NVIGI) SDK streamlines AI model deployment and integration for PC application developers. The SDK supports on-device inferencing with in-process (C++) execution, and it supports all major inference backends across different hardware accelerators (GPU, NPU, CPU).

Q: What are NVIGI Plugins?

A: NVIGI is architected as a suite of plugins, containing both core inferencing plugins and helper plugins, for integration into end-user applications. The “helper” plugins are shared among the various inference plugins; examples include network functionality such as the OpenAI API and gRPC, as well as D3D12 device/queue/command-list management for integrating 3D and AI workloads. The core AI inferencing plugins implement many different models using multiple runtimes, but they all share the same creation and inference APIs. As a result, all of the LLM plugins share a functionality-specific API, as do all of the ASR (speech recognition) plugins, so an application can swap one plugin for another with only minor code modifications. The core plugin architecture makes this possible by defining interfaces that are shared by every plugin implementing a given functionality.
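To make the pattern concrete, here is a minimal, purely hypothetical C++ sketch. The interface, factory, and plugin identifiers below are illustrative stand-ins, not the actual NVIGI API; the real creation and inference entry points are defined in the SDK's headers.

```cpp
// Hypothetical sketch of the "one shared interface per functionality" pattern.
// ITextGeneration, the stub backends, and the plugin identifiers are
// illustrative only; consult the NVIGI headers for the real API.
#include <iostream>
#include <memory>
#include <stdexcept>
#include <string>

// Interface shared by every plugin that implements text generation.
struct ITextGeneration {
    virtual ~ITextGeneration() = default;
    virtual std::string evaluate(const std::string& prompt) = 0;
};

// Stand-in for a local GGML/CUDA backend.
struct StubLocalGpt : ITextGeneration {
    std::string evaluate(const std::string& prompt) override {
        return "[local] response to: " + prompt;
    }
};

// Stand-in for a cloud REST backend.
struct StubCloudGpt : ITextGeneration {
    std::string evaluate(const std::string& prompt) override {
        return "[cloud] response to: " + prompt;
    }
};

// Factory: only the identifier changes when the application swaps backends.
std::unique_ptr<ITextGeneration> createTextGeneration(const std::string& pluginId) {
    if (pluginId == "gpt.local.ggml.cuda") return std::make_unique<StubLocalGpt>();
    if (pluginId == "gpt.cloud.rest")      return std::make_unique<StubCloudGpt>();
    throw std::runtime_error("unknown plugin: " + pluginId);
}

int main() {
    auto gpt = createTextGeneration("gpt.local.ggml.cuda"); // or "gpt.cloud.rest"
    std::cout << gpt->evaluate("Greet the player by name.") << "\n";
}
```

The point of the pattern is that swapping one backend for another only changes the identifier handed to the creation call; the code that drives inference stays the same.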

Q: What local inference backends are supported?

A:

  • ONNX Runtime with DirectML
    • A cross-platform machine-learning model accelerator for Windows, allowing access to hardware-specific optimizations.
  • GGML
    • An open-source machine learning library written in C and C++ with a focus on Transformer inference using CUDA or CPU backends.
  • TensorRT
    • NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs.

Q: Does NVIGI support PyTorch?

A: No, the NVIGI SDK does not include a local Python interpreter. The core components of the NVIGI SDK include C++ headers, libraries, and Windows native DLLs.

However, the NVIGI plugin “gpt.cloud.rest” implements the GPT feature using a REST API for remote execution with a cloud backend such as NVIDIA NIM™, which is built on robust foundations, including inference engines like TensorRT, TensorRT-LLM, and PyTorch.
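For reference, an OpenAI-style chat-completion request looks like the sketch below, here issued directly with libcurl. The endpoint URL, model name, and API key are placeholders chosen for illustration (an NVIDIA API catalog endpoint is assumed); the gpt.cloud.rest plugin performs this kind of exchange on the application's behalf, so this is not the plugin's actual code.

```cpp
// Minimal sketch of an OpenAI-style chat-completion request, similar to what
// a cloud GPT backend expects. Endpoint, model, and API key are placeholders.
#include <curl/curl.h>
#include <iostream>
#include <string>

static size_t collect(char* data, size_t size, size_t nmemb, void* out) {
    static_cast<std::string*>(out)->append(data, size * nmemb);
    return size * nmemb;
}

int main() {
    CURL* curl = curl_easy_init();
    if (!curl) return 1;

    // OpenAI-compatible chat-completion payload.
    const std::string body = R"({
        "model": "meta/llama-3.1-8b-instruct",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}]
    })";

    std::string response;
    struct curl_slist* headers = nullptr;
    headers = curl_slist_append(headers, "Content-Type: application/json");
    headers = curl_slist_append(headers, "Authorization: Bearer $API_KEY"); // placeholder

    curl_easy_setopt(curl, CURLOPT_URL, "https://integrate.api.nvidia.com/v1/chat/completions");
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body.c_str());
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, collect);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);

    if (curl_easy_perform(curl) == CURLE_OK)
        std::cout << response << "\n";

    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);
}
```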

Q: What local AI models are supported?

A:

  • GGML-based LLMs on CPU or GPU (CUDA)
  • GGML-based Speech Recognition on CPU or GPU (CUDA)
  • GGML-based embeddings on CPU or GPU (CUDA)
  • ONNX GenAI Runtime-based LLMs on GPU
  • (*Coming Soon) TensorRT-based Speech Recognition on GPU (CUDA)

See the full list here.

Q: Does the SDK include sample projects?

A: NVIGI provides several precompiled samples, along with their source code, which allows instant experimentation:

  • Basic: A command-line sample that shows the use of individual ASR and GPT (for SLM or LLM models) plugins to implement conversational AI. The user can provide input by typing their queries or by using a microphone to pass their verbal query to speech recognition. The GPT plugin responds to the query with conversational context. The GPT plugin may be switched from local to cloud models via the command line.
  • Pipeline: A command-line sample that shows the use of a pipeline plugin, capable of running a sequence of ASR and GPT plugins via a single evaluation call. This sample uses audio input from an audio file.
  • RAG: A command-line sample that shows how to use the GPT and embedding plugins to implement Retrieval-Augmented Generation, or RAG. Specifically, the sample takes a text file to use as its reference, or “corpus,” when answering queries, along with a prompt to guide how it uses the corpus. The user may type in queries for the RAG. A conceptual sketch of the retrieval step appears at the end of this answer.

A 3D sample is provided in a top-level sample directory. It exercises a wider range of plugins and renders a 3D scene alongside a GUI for interaction, with support for local and cloud GPT as well as ASR via GUI-based recording.
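To clarify what the RAG sample's retrieval step does conceptually, here is a small, self-contained sketch. The embed() function is a stand-in for the NVIGI embedding plugin and the corpus strings are invented; the real sample obtains embeddings from the plugin and splits its corpus out of the reference text file.

```cpp
// Conceptual sketch of RAG retrieval: embed the corpus chunks and the query,
// pick the closest chunk by cosine similarity, and prepend it to the prompt.
// Illustrative only; embed() is a placeholder for the embedding plugin.
#include <algorithm>
#include <cmath>
#include <iostream>
#include <string>
#include <vector>

using Vec = std::vector<float>;

// Placeholder embedding; the real sample gets vectors from the embedding plugin.
Vec embed(const std::string& text) {
    Vec v(8, 0.0f);
    for (size_t i = 0; i < text.size(); ++i) v[i % v.size()] += float(text[i]) / 255.0f;
    return v;
}

float cosine(const Vec& a, const Vec& b) {
    float dot = 0, na = 0, nb = 0;
    for (size_t i = 0; i < a.size(); ++i) { dot += a[i]*b[i]; na += a[i]*a[i]; nb += b[i]*b[i]; }
    return dot / (std::sqrt(na) * std::sqrt(nb) + 1e-9f);
}

int main() {
    // Corpus chunks would normally come from the reference text file.
    std::vector<std::string> corpus = {
        "The castle gate opens at dawn.",
        "Healing potions are brewed from red moss.",
        "The ferry to the island departs hourly."
    };

    std::string query = "How do I make a healing potion?";
    Vec q = embed(query);

    // Choose the chunk whose embedding is closest to the query embedding.
    auto best = *std::max_element(corpus.begin(), corpus.end(),
        [&](const std::string& a, const std::string& b) {
            return cosine(embed(a), q) < cosine(embed(b), q);
        });

    // The retrieved chunk is prepended to the prompt sent to the GPT plugin.
    std::string prompt = "Answer using only this context:\n" + best + "\n\nQuestion: " + query;
    std::cout << prompt << "\n";
}
```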

Q: Can I use NVIGI for other applications besides games?

A: Yes. NVIGI is an open-source solution that simplifies integration of the latest deep learning technologies from NVIDIA and other providers into both games and real-time applications. The framework lets developers implement a single integration and enable multiple technologies supported by the hardware vendor or cloud provider. Supported technologies include AI inference but, due to the generic nature of the NVIGI SDK, can be expanded to graphics or any other field of interest.

Q: How does CPU performance compare to GPU?

A: The NVIGI SDK includes plugins that use the CPU for inference in addition to the GPU, and the developer can choose whichever is appropriate for their workload. Model inference on the GPU is much faster than on the CPU, but in some cases the CPU may provide adequate responsiveness.

Q: Can I bring my own model and have it work with NVIGI?

A: It depends on the particular model and how it was built. If the developer is using a model built on the architecture of a well-known deep learning network, or is fine-tuning a well-known model, the NVIGI SDK should support it. If the architecture is new, or the model is a hybrid combination of networks, the developer may need to create a custom NVIGI plugin or extend one of the existing plugins to handle loading and inference of the custom model.

For more information on writing custom plugins, please see the NVIGI Plugin Development Kit documentation.
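As a purely illustrative continuation of the hypothetical sketch shown earlier (and not the Plugin Development Kit API), a custom plugin amounts to another implementation of the functionality's shared interface that wraps the custom model's loading and inference code:

```cpp
// Illustrative only: a custom backend behind the same hypothetical
// ITextGeneration interface used in the earlier sketch. The real plugin entry
// points are described in the NVIGI Plugin Development Kit documentation.
#include <iostream>
#include <string>

struct ITextGeneration {
    virtual ~ITextGeneration() = default;
    virtual std::string evaluate(const std::string& prompt) = 0;
};

class MyHybridModelPlugin : public ITextGeneration {
public:
    explicit MyHybridModelPlugin(const std::string& modelPath) {
        // Load the custom model's weights with its runtime here (placeholder).
        (void)modelPath;
    }
    std::string evaluate(const std::string& prompt) override {
        // Run the custom network's inference here (placeholder).
        return "custom-model response to: " + prompt;
    }
};

int main() {
    MyHybridModelPlugin plugin("models/my_hybrid_model.bin"); // hypothetical path
    std::cout << plugin.evaluate("Hello?") << "\n";
}
```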

Q: Do all the models that work on-device also work in the cloud?

A: NVIGI is compatible with many different AI models and includes a selection for developers to start experimenting with straight away. Some of these models have endpoints on NVIDIA’s build.nvidia.com AI discovery site. If a developer wants to connect to a model on build.nvidia.com or via another provider, they can do so using the IGI network plugin. NVIGI allows easy connections to any endpoint running an NVIDIA NIM or using the OpenAI API.