Exploits & Vulnerabilities
NVIDIA Riva Vulnerabilities Leave AI-Powered Speech and Translation Services at Risk
Trend Research uncovered misconfigurations in NVIDIA Riva deployments, with two vulnerabilities, CVE-2025-23242 and CVE-2025-23243, contributing to their exposure. These security flaws could lead to unauthorized access, resource abuse, and potential misuse or theft of AI-powered inference services, including speech recognition and text-to-speech processing.
Summary:
- Trend Micro Research discovered a pattern of exposed NVIDIA Riva API endpoints across multiple organizations deploying Riva in cloud environments. These exposed instances were found operating without authentication, potentially leaving them accessible to threat actors. We identified two vulnerabilities (CVE-2025-23242 and CVE-2025-23243) that contributed to these exposures.
- Misconfigured Riva deployments enable unauthorized access, allowing attackers to abuse GPU resources and API keys without restriction. Exposed APIs also increase the risk of data leakage, denial-of-service (DoS) attacks, and system disruptions.
- Organizations running AI-powered services with proprietary models could be at risk of intellectual property theft, particularly if their models or inference services are exposed through misconfigured APIs.
- Companies should assess their deployments, particularly those running default or misconfigured settings in cloud environments, as these can lead to unintended exposure of Riva services. Organizations using Triton Inference Server in advanced configurations should also verify their security posture, as misconfigurations or excessive exposure could introduce security risks and potential attack vectors.
- Trend Vision One™ Cloud Risk Management proactively detects unintended network exposure in cloud deployments. For additional best practices and detailed recommendations, see the mitigation guidance below.
NVIDIA Riva is a breakthrough in AI speech recognition, translation, and synthesis, enabling companies to integrate high-performance models into applications such as transcription, voice assistants, and conversational AI.
However, its implementation brings new and unique security challenges. The rush to harness advanced speech recognition capabilities can expose enterprises to security risks, as the complex nature of the deployment architecture and intricate layers of AI models and APIs create an expansive attack surface that demands careful consideration.
We discovered a concerning pattern of exposed NVIDIA Riva API endpoints across multiple organizations deploying Riva in cloud environments. These instances reflect a common security oversight: they operate without authentication safeguards, leaving them accessible to any potential threat actor.
These issues would allow anyone to use Riva services free of charge, consuming expensive GPU resources and paid API keys in the process. They also disclose underlying service information, making deployments susceptible to denial-of-service (DoS) and memory attacks against the Triton Inference Server when advanced configuration is applied.
We identified two vulnerabilities (CVE-2025-23242 and CVE-2025-23243) that consistently contributed to these exposures. Following a responsible disclosure process in cooperation with Trend Zero Day Initiative™ (ZDI), these vulnerabilities have been fixed and disclosed under ZDI-25-145 and ZDI-25-144.

To understand these vulnerabilities, let's start with the NVIDIA Riva Quick Start Guide. Following the tutorial, we obtain the NGC command-line utility and download the QuickStart resource. The initialization script then downloads the necessary container images and AI models from the NVIDIA Artifact Registry. Note that suitable hardware with GPUs and valid API keys are required to obtain and use the models.

Once initialization is complete, we can start the Riva service with the "bash riva_start.sh" command. After a successful start, the Riva server listens for gRPC connections on port 50051. The underlying implementation can remain hidden due to the use of third-party libraries; the package serves as a convenient all-in-one, out-of-the-box solution for complex software.
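The quick-start flow above can be summarized with the following commands, a sketch based on the public Quick Start Guide; the exact resource version and directory name vary by release, so the version placeholder is illustrative:

```shell
# Download the Quick Start scripts via the NGC CLI (version placeholder is illustrative)
ngc registry resource download-version "nvidia/riva/riva_quickstart:<version>"
cd riva_quickstart_v<version>

# Pull container images and models (requires a GPU host and a valid NGC API key)
bash riva_init.sh

# Start the server; by default it listens for gRPC connections on port 50051
bash riva_start.sh
```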
Multiple container ports are published to the host, listening on 0.0.0.0 (all IP addresses). In practice, this exposure is comparable to running with Docker's --network host option and, without any firewall rules, makes the services accessible to everyone.
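A quick way to audit a host for this kind of exposure is to probe the default ports from the outside. A minimal sketch, assuming the default port assignments described in this article (a real audit should also respect your organization's scanning policy):

```python
import socket

# Default ports published by the Riva quick-start container, per the deployment above
RIVA_DEFAULT_PORTS = {
    50051: "Riva gRPC API",
    8000: "Triton HTTP/REST API",
    8001: "Triton gRPC API",
    8002: "Triton metrics (/metrics)",
}

def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def audit(host: str) -> dict:
    """Map each default Riva/Triton port to whether it is reachable."""
    return {port: is_port_open(host, port) for port in RIVA_DEFAULT_PORTS}
```

Any port reported open here is reachable by anyone on the same network path, which for a cloud host with a permissive security group means the entire internet.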
The Riva gRPC API ships with gRPC reflection enabled, allowing anyone to identify the service type and reconstruct the binary protocol. This is not a flaw on its own, since security by obscurity is not good practice. When the endpoint is left exposed, however, reflection simplifies service identification. The Riva gRPC protocol is in any case well known to developers through multiple open-source repositories available on GitHub.
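With reflection enabled, off-the-shelf tooling is enough to enumerate an exposed endpoint. For example, using grpcurl (the host and service name are placeholders):

```shell
# List services advertised via gRPC reflection on an exposed endpoint
grpcurl -plaintext <host>:50051 list

# Describe a discovered service to reconstruct its request/response types
grpcurl -plaintext <host>:50051 describe <service-name>
```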
This raises the question: Can these gRPC endpoints be secured? Going through the documentation and available examples, a user might assume that the service can be configured in a secure way by modifying the config.sh script and generating appropriate certificates.

However, even when all the certificate parameters are provided in the config.sh from the NVIDIA QuickStart package, the gRPC server enforces only a TLS/SSL connection and encrypts the traffic between the client and the server. This means a client can verify that the server is what it claims to be, but nobody verifies the client, so everyone can still use the service. This behavior can create a false sense of security while the services remain exposed to everyone.
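The distinction at play is between plain TLS, where only the server is authenticated, and mutual TLS, where the server also verifies a client certificate. The following generic sketch uses Python's standard ssl module to illustrate the difference; it is not Riva's actual configuration code:

```python
import ssl

def make_server_context(require_client_cert: bool) -> ssl.SSLContext:
    """Build a TLS server context; with require_client_cert=True this is mutual TLS."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # In a real deployment the server's certificate and key are loaded here:
    # ctx.load_cert_chain(certfile="server.crt", keyfile="server.key")
    if require_client_cert:
        # Mutual TLS: reject any client that cannot present a certificate
        # signed by a CA we trust (loaded via ctx.load_verify_locations("ca.crt")).
        ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx

# Plain TLS (what the QuickStart certificate parameters effectively give you):
# traffic is encrypted and the server is authenticated, but any client may connect.
plain_tls = make_server_context(require_client_cert=False)

# Mutual TLS: only clients holding a trusted certificate can connect.
mutual_tls = make_server_context(require_client_cert=True)
```

Without the client-verification half, encryption alone does nothing to restrict who can call the service.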
What about the other exposed ports? The Riva server internally communicates with the Triton Inference Server; in effect, it translates API requests into requests the Triton Inference Server understands. The following ports expose the Triton Inference Server directly due to the container configuration:
- REST API endpoint (default 8000)
- gRPC API endpoint (default 8001)
- HTTP metrics endpoint (default 8002; serves only the /metrics endpoint)
This makes the Triton Inference Server REST and gRPC APIs available to everyone. So even when the Riva server gRPC endpoint is successfully secured, it can be bypassed entirely by sending equivalent requests to the Triton Inference Server endpoints.
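Because Triton implements the standard KServe-style HTTP/REST inference protocol, identifying an exposed instance takes only a handful of unauthenticated GET requests. A minimal sketch — the endpoint paths follow Triton's documented HTTP API, and the host is a placeholder:

```python
def triton_probe_urls(host: str, http_port: int = 8000, metrics_port: int = 8002) -> dict:
    """URLs that reveal an exposed Triton Inference Server without any credentials."""
    base = f"http://{host}:{http_port}"
    return {
        # Server name, version, and supported protocol extensions
        "server_metadata": f"{base}/v2",
        # Readiness probe: an HTTP 200 here confirms a reachable, serving instance
        "health_ready": f"{base}/v2/health/ready",
        # Prometheus-format metrics, including GPU utilization details
        "metrics": f"http://{host}:{metrics_port}/metrics",
    }

# 203.0.113.10 is a TEST-NET documentation address used purely as a placeholder
urls = triton_probe_urls("203.0.113.10")
```

A plain GET against any of these URLs (for example with urllib or curl) returning a valid response confirms that the Riva authentication layer, if any, has been bypassed.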
Notably, some of the endpoints pose a significant security risk when Triton Inference Server is configured with advanced settings, as they might expose inherent flaws and previously disclosed vulnerabilities.
Some might argue that these issues are the user's responsibility rather than a security problem. To weigh that, we first need to understand the problem's scope and prevalence, which means describing how we identified it. We have previously written about insecure gRPC implementations, for instance.
Extending our previous research, we found 54 unique IP addresses with exposed NVIDIA Riva services, all belonging to multiple cloud service providers. These findings led us to analyze the root cause of the problem.
Security best practices and recommendations
We recommend all Riva service administrators check their configuration against unintended service exposure and ensure that they’re running the latest version of the Riva framework. In addition to NVIDIA’s best practices, consider implementing the following security measures:
- Implement a secure API gateway and expose only intended gRPC or REST API endpoints. These help prevent unauthorized access and protect back-end services.
- Apply network segmentation by restricting access to the Riva server and Triton Inference Server to trusted networks. This helps minimize the attack surface and prevents unauthorized access from the internet.
- Require strong authentication mechanisms and enforce role-based access control to ensure only authorized users and services can interact with Riva APIs. Consider zero-trust approaches, such as identity-aware access, to ensure that only authenticated and authorized users and devices can interact with Riva services.
- Review and modify container settings to disable unnecessary services, remove unused ports, and restrict privileged execution. This prevents attackers from exploiting exposed services or misconfigurations.
- Enable logging and monitoring on Riva and Triton Inference Server to detect unusual access patterns, anomalous activities, or potential abuse.
- Consider rate limiting and API request throttling, particularly if gRPC or REST endpoints are exposed to external networks or integrated into environments where threat actors could attempt brute-force or DoS attacks.
- Keep the Riva framework, Triton Inference Server, and dependencies up to date to mitigate known vulnerabilities and protect against newly discovered exploits.
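As a concrete illustration of the rate-limiting recommendation above, here is a minimal token-bucket sketch in Python. This is a generic pattern that in practice you would get from an API gateway or reverse proxy rather than implement yourself:

```python
import time

class TokenBucket:
    """Allow at most `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity      # start with a full bucket
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Consume one token if available; otherwise reject the request."""
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Placing such a limiter (or its gateway-level equivalent) in front of the gRPC and REST endpoints caps the damage an attacker can do with brute-force or resource-exhaustion attempts.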

Trend Micro protection
Cloud Risk Management proactively detects unintended network exposure in cloud deployments similar to those we found exposed in this research. Cloud Risk Management IDs EC2-016 and EC2-001 are examples of security checks that prevent such exposure.
- EC2-016 can help ensure that Amazon EC2 default security groups restrict all inbound public traffic, requiring AWS users (administrators, resource managers, etc.) to create custom security groups that follow the Principle of Least Privilege (POLP) instead of using the default security groups.
- EC2-001 can help ensure that Amazon EC2 security groups don't have ranges of ports open for inbound traffic, protecting the associated EC2 instances against denial-of-service (DoS) or brute-force attacks. Cloud Risk Management strongly recommends opening only the specific ports your applications require.