According to monitoring by 1M AI News, AI inference infrastructure company Fireworks AI has released a preview of Fireworks Training, expanding from a pure inference platform into an integrated platform for both training and deployment. Fireworks AI was founded by Lin Qiao, a former Meta engineer who helped build PyTorch; the company is currently valued at $4 billion and processes 15 trillion tokens daily.

The platform offers three tiers:

1. Training Agent: designed for product teams without ML infrastructure. Teams describe a task and upload data, and the agent handles the entire process from training to deployment. It currently supports only LoRA.
2. Managed Training: aimed at ML engineers, supporting SFT, DPO, and reinforcement learning fine-tuning, including full-parameter training.
3. Training API: targeted at research teams, allowing customization of loss functions and training loops. It supports algorithms such as GRPO and DAPO, with full-parameter training at scales ranging from Qwen3 8B on a single node to the trillion-parameter Kimi K2.5 on 64 NVIDIA B200s.

Several of Fireworks AI's production inference customers, including the AI coding tool Cursor as well as Vercel and Genspark, have already completed frontier reinforcement learning training on the platform. Vercel trained an automatic error-correction model for its code generation product v0, reaching a 93% error-free code generation rate versus 62% with Sonnet 3.5, and cutting end-to-end latency by 40x compared with the closed-source model it used previously. Genspark fine-tuned the trillion-parameter open-source model Kimi K2 with reinforcement learning to build a deep research agent, increasing tool usage by 33% and cutting costs by 50%. Cursor ran distributed reinforcement learning training for Composer 2 across 3 to 4 clusters worldwide (the model currently ranks first on CursorBench), with training and production inference sharing the same GPU pool.
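For readers unfamiliar with LoRA, the technique the Training Agent tier supports, here is a minimal sketch of the low-rank adaptation idea itself. This is the generic LoRA formulation (a frozen weight matrix W plus a trainable low-rank update scaled by alpha/r), not Fireworks' implementation; all dimensions and values below are made up for illustration.

```python
# Minimal LoRA sketch: effective weight W_eff = W + (alpha / r) * B @ A,
# where W is frozen and only the small matrices A and B are trained.
# Generic illustration only, not Fireworks' actual training code.
import random

def matmul(X, Y):
    """Plain nested-list matrix multiply."""
    inner, cols = len(Y), len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(len(X))]

def lora_delta(B, A, alpha, r):
    """Low-rank update delta_W = (alpha / r) * B @ A."""
    scale = alpha / r
    return [[scale * x for x in row] for row in matmul(B, A)]

d_out, d_in, r = 4, 4, 2
# Frozen base weight (stands in for a pretrained layer).
W = [[random.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
# Trainable adapters: A is randomly initialized, B starts at zero,
# so the adapted model is identical to the base model before training.
A = [[random.gauss(0, 0.02) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]

delta = lora_delta(B, A, alpha=16, r=r)
W_eff = [[w + dw for w, dw in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]
# With B initialized to zero, delta is all zeros and W_eff equals W exactly.
```

Because only A and B (2 * r * d values instead of d_out * d_in) receive gradients, LoRA keeps fine-tuning cheap enough to automate end to end, which is presumably why it is the first method offered in the agent tier.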
Fireworks AI emphasizes that its core technological differentiation lies in numerical consistency between training and inference. MoE (Mixture of Experts) models are numerically more fragile than dense models: small changes in hidden states can flip expert routing, and the resulting errors cascade and amplify. Fireworks has published the KL divergence between training and inference for all supported models; every value is below 0.01.
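The published metric can be illustrated generically: take the training engine's and the inference engine's next-token distributions at the same position as P and Q and compute KL(P || Q). The logits below are invented for illustration (a tiny perturbation standing in for numerical drift between engines); only the formula corresponds to the metric described in the article.

```python
# Generic KL-divergence check between two softmax distributions,
# illustrating the kind of training/inference consistency metric described.
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(P || Q) = sum_i p_i * log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical logits for the same token position from a training engine
# and an inference engine; the second set is slightly perturbed to mimic
# small numerical differences between the two stacks.
train_logits = [2.0, 1.0, 0.5, -1.0]
infer_logits = [2.01, 0.99, 0.51, -1.02]

p = softmax(train_logits)
q = softmax(infer_logits)
print(kl_divergence(p, q))  # a small value, well below the 0.01 threshold
```

For an MoE model the stakes are higher than this toy example suggests: the perturbed logits feed a router, so a drift that barely moves a dense model's output can select a different expert entirely, which is why a tight KL bound between the two engines is a meaningful guarantee.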