OpenInference

Browse models provided by OpenInference (Terms of Service)

3 models

Google: Gemma 4 31B
Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. It features a 256K-token context window, a configurable thinking/reasoning mode, native function calling, and multilingual support across 140+ languages. It is strong on coding, reasoning, and document understanding tasks and is released under the Apache 2.0 license.
by google | Apr 2, 2026 | 262K context | $0/M input tokens | $0/M output tokens

OpenAI: gpt-oss-120b

gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized to run on a single H100 GPU with native MXFP4 quantization. The model supports configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation.

by openai | Aug 5, 2025 | 131K context | $0/M input tokens | $0/M output tokens

OpenAI: gpt-oss-20b

gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for lower-latency inference and deployability on consumer or single-GPU hardware. The model is trained in OpenAI’s Harmony response format and supports reasoning level configuration, fine-tuning, and agentic capabilities including function calling, tool use, and structured outputs.

by openai | Aug 5, 2025 | 131K context | $0/M input tokens | $0/M output tokens