NVIDIA-accelerated AI Models

Explore and deploy top AI models built by the community, accelerated by NVIDIA’s AI inference platform, and run on NVIDIA-accelerated infrastructure.

DeepSeek

DeepSeek is a family of open-source models, several of which use a mixture-of-experts (MoE) architecture and provide advanced reasoning capabilities. DeepSeek models can be optimized for data center deployments with TensorRT-LLM. You can try the models with NIM or customize them with the open-source NeMo framework.

Gemma

Gemma is Google DeepMind’s family of lightweight, open models. Gemma models span a variety of sizes and specialized domains to meet each developer’s needs. NVIDIA has worked with Google to enable these models to run optimally on a variety of NVIDIA platforms, from data center GPUs based on the NVIDIA Blackwell and NVIDIA Hopper architectures to Windows RTX and Jetson devices, so you get maximum performance on your hardware. Enterprise customers can deploy optimized containers using NVIDIA NIM microservices for production-grade support and customize the models using the end-to-end NeMo framework. With the latest release, Gemma 3n, these models are now natively multilingual and multimodal across text, image, video, and audio data.

gpt-oss

NVIDIA and OpenAI began pushing the boundaries of AI with the launch of NVIDIA DGX™ back in 2016. The collaborative AI innovation continues with the OpenAI gpt-oss-20b and gpt-oss-120b launch. NVIDIA has optimized both new open-weight models for 10x inference performance on NVIDIA Blackwell architecture, delivering up to 1.5 million tokens per second (TPS) on an NVIDIA GB200 NVL72 system.
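To put the rack-level figure above in per-GPU terms, here is a minimal sketch of the arithmetic. It assumes the 1.5 million TPS number is aggregate throughput across the whole system and that a GB200 NVL72 rack contains 72 Blackwell GPUs; the function name `per_gpu_tps` is hypothetical and used only for illustration. Because the quoted figure is an "up to" peak, the result is an upper bound, not a measured per-GPU rate.

```python
def per_gpu_tps(system_tps: float, gpu_count: int) -> float:
    """Divide an aggregate tokens-per-second figure evenly across GPUs."""
    if gpu_count <= 0:
        raise ValueError("gpu_count must be positive")
    return system_tps / gpu_count


if __name__ == "__main__":
    # Assumption: 72 GPUs per GB200 NVL72 rack, peak aggregate of 1.5M TPS.
    peak_system_tps = 1_500_000
    gpus_per_rack = 72
    print(f"{per_gpu_tps(peak_system_tps, gpus_per_rack):,.0f} TPS per GPU (peak)")
```

Real per-GPU throughput depends on batch size, sequence length, and latency targets, so this even split is only a rough way to compare systems of different sizes.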

Kimi

Kimi is a family of open-weight models, including MoE, thinking, and specialized models, from Moonshot AI. Kimi K2 is a state-of-the-art MoE language model with 32 billion activated parameters and 1 trillion total parameters. The Kimi K2 Thinking MoE model—ranked as the most intelligent open-source model on the Artificial Analysis leaderboard—saw a 10x performance leap on the NVIDIA GB200 NVL72 rack-scale system compared with NVIDIA HGX™ H200. Fireworks AI has deployed Kimi K2 on the NVIDIA B200 platform to achieve the highest performance on the Artificial Analysis leaderboard.

Llama

Llama is Meta’s collection of open foundation models, most recently made multimodal with the 2025 release of Llama 4. NVIDIA worked with Meta to advance inference of these models with NVIDIA TensorRT™-LLM (TRT-LLM) to get maximum performance from data center GPUs like NVIDIA Blackwell and NVIDIA Hopper™ architecture GPUs. Optimized versions of several Llama models are available as NVIDIA NIM™ microservices for an easy-to-deploy experience. You can also customize Llama with your own data using the end-to-end NVIDIA NeMo™ framework.

NVIDIA Nemotron

The NVIDIA Nemotron™ family of open models, including Llama Nemotron, excels in reasoning along with a diverse set of agentic tasks. The models are optimized for various use cases: Nano offers cost-efficiency, Super balances accuracy and compute, and Ultra delivers maximum accuracy. With an open license, these models ensure commercial viability and data control.

Phi

Microsoft Phi is a family of Small Language Models (SLMs) that provide efficient performance for commercial and research tasks. These models are trained on high-quality data and excel in mathematical reasoning, code generation, advanced reasoning, summarization, long-document QA, and information retrieval. Due to their small size, Phi models can be deployed on devices in single-GPU environments, such as Windows RTX and Jetson. With the launch of the Phi-4 series of models, Phi has expanded to include advanced reasoning and multimodality.

Qwen

Alibaba has released Tongyi Qwen3, a family of open-source hybrid-reasoning large language models (LLMs). The Qwen3 family consists of two MoE models, 235B-A22B (235B total parameters and 22B active parameters) and 30B-A3B, and six dense models: 0.6B, 1.7B, 4B, 8B, 14B, and 32B. With ultra-fast token generation, developers can efficiently integrate Qwen3 models into production applications and deploy them on NVIDIA GPUs using frameworks such as NVIDIA TensorRT-LLM, Ollama, SGLang, and vLLM.
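The MoE naming above encodes total and active parameter counts, and the ratio between them gives a rough sense of how much of the model runs for each token. The following is a minimal sketch of that arithmetic. The 235B/22B split comes from the description above; reading 30B-A3B as 30B total and 3B active is an assumption based on the same naming pattern, and the helper names are hypothetical.

```python
# Parameter counts in billions as (total, active).
# 235B-A22B is stated in the text; the 30B-A3B split is inferred from the name.
MOE_MODELS = {
    "Qwen3-235B-A22B": (235.0, 22.0),
    "Qwen3-30B-A3B": (30.0, 3.0),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Return the share of parameters that are active per token."""
    if total_b <= 0 or not 0 < active_b <= total_b:
        raise ValueError("expected 0 < active_b <= total_b")
    return active_b / total_b


def summarize(models: dict) -> dict:
    """Map each model name to its active-parameter percentage, rounded to 0.1."""
    return {
        name: round(100 * active_fraction(total, active), 1)
        for name, (total, active) in models.items()
    }


if __name__ == "__main__":
    for name, pct in summarize(MOE_MODELS).items():
        print(f"{name}: {pct}% of parameters active per token")
```

The active fraction is only a proxy for per-token compute. Memory footprint still scales with the total parameter count, because all experts must be resident to serve arbitrary tokens.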

NVIDIA Blackwell Ultra Delivers up to 50x Better Performance and 35x Lower Cost for Agentic AI

Built to accelerate the next generation of agentic AI, NVIDIA Blackwell Ultra delivers breakthrough inference performance with dramatically lower cost. Cloud providers such as Microsoft, CoreWeave, and Oracle Cloud Infrastructure are deploying NVIDIA GB300 NVL72 systems at scale for low-latency and long-context use cases, such as agentic coding and coding assistants.

This is enabled by deep co-design across NVIDIA Blackwell, NVLink™, and NVLink Switch for scale-out; NVFP4 for low-precision accuracy; and NVIDIA Dynamo and TensorRT™-LLM for speed and flexibility, as well as development with community frameworks such as SGLang and vLLM.

Data center illustration showing multi-modal AI tokens for image, audio, visual and more as part of the NVIDIA “Think SMART” framework.

Ethical AI

NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When a model is downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure the model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.

Try top community models today.