According to monitoring by Dongcha Beating, Liquid AI has open-sourced two small multimodal models, LFM2.5-VL-1.6B-Extract and LFM2.5-VL-450M-Extract, with 1.6 billion and 450 million parameters respectively, released under the LFM Open License v1.0. The models are optimized for extracting structured data from images: given a specified list of fields, they convert an image directly into JSON on the device, avoiding the usual two-step approach in which a general multimodal model generates full text that is then parsed separately. According to Liquid AI's own evaluations, the models perform well in document scanning, in-vehicle cabin understanding, and industrial inspection. In benchmark tests, the 1.6B model is competitive with general multimodal models at the 4-billion-parameter (4B) level, while the 450M model is comparable to 2-billion-parameter (2B) models. For deployment, the models are adapted to a range of smart-hardware and edge-device SoCs, enabling offline use in those same scenarios. The model weights are available for download on Hugging Face.
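
To make the field-list workflow concrete, here is a minimal Python sketch of the surrounding application logic only. It does not call the models, and the prompt wording, the function names build_prompt and parse_extraction, and the sample fields are all hypothetical illustrations; the actual input format the Extract models expect is not described in this report and should be taken from Liquid AI's model cards.

```python
import json


def build_prompt(fields):
    """Build a plain-text instruction listing the fields to extract.

    The wording is a hypothetical placeholder, not the models' documented format.
    """
    field_list = ", ".join(fields)
    return f"Extract the following fields as a JSON object: {field_list}"


def parse_extraction(raw_output, fields):
    """Parse the model's text output and keep only the requested fields."""
    data = json.loads(raw_output)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    # Missing fields become None so downstream code sees a stable schema;
    # keys that were not requested are dropped.
    return {name: data.get(name) for name in fields}


fields = ["invoice_number", "total", "date"]
prompt = build_prompt(fields)

# Stand-in for a real model response; no model is called in this sketch.
fake_output = '{"invoice_number": "INV-001", "total": 42.5, "extra": "ignored"}'
result = parse_extraction(fake_output, fields)
print(result)
```

Because the output is already JSON keyed by the requested fields, the application needs only a single json.loads call and a schema check, rather than a second pass that parses free-form text.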