A reasoning model that converts image and text inputs into structured text output.
Discovered on Hugging Face; the specific source listing is unknown.