What is Step 3.7 Flash?

Step 3.7 Flash is a powerful, multimodal sparse Mixture-of-Experts (MoE) model designed specifically for the agentic era. With 198 billion parameters and 11 billion active parameters, it is built to handle complex workflows, including coding, web search, and visual interface interaction, at lightning speeds.

Why Founders Need It

If you are building autonomous AI agents, latency and reliability are your biggest enemies. Step 3.7 Flash provides up to 400 tokens per second, making it ideal for high-frequency, multi-turn agent loops where the model must see, think, and act in real-time. Its ability to navigate UI elements and terminal commands makes it a top-tier choice for building automated software engineers and research assistants.

How to Use It

You can integrate the model via API through the StepFun platform or OpenRouter. Developers can choose between three reasoning levelsβ€”low, medium, and highβ€”to optimize for cost and cognitive depth. It is fully compatible with vLLM, SGLang, and NVIDIA TensorRT-LLM for flexible deployment.

Pricing

  • Input (cache hit): $0.04/M tokens
  • Input (cache miss): $0.20/M tokens
  • Output: $1.15/M tokens

Integrations & Alternatives

The model integrates with major AI frameworks and supports wide compatibility through the OpenRouter ecosystem. Compared to Claude 3.5 Sonnet or DeepSeek V3, Step 3.7 Flash distinguishes itself through superior visual agency and specialized benchmarks in tool calling and repository-level coding tasks.