An image-to-text pipeline: it takes images as input and generates text as output, drawing on advanced reasoning capabilities.
Discovered on HuggingFace via HuggingFace:unknown
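
To make the image-in, text-out shape of such a pipeline concrete, here is a minimal sketch of how the stages might compose. The description above does not name a model or library, so everything here is an assumption for illustration: `ImageToTextPipeline`, the three stage names, and the stub functions are hypothetical and stand in for whatever vision encoder and text generator the real pipeline uses.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical type: an "image" is just raw bytes in this sketch.
Image = bytes


@dataclass
class ImageToTextPipeline:
    """Hypothetical three-stage wrapper: preprocess -> generate -> postprocess."""

    preprocess: Callable[[Image], List[float]]
    generate: Callable[[List[float]], List[str]]
    postprocess: Callable[[List[str]], str]

    def __call__(self, image: Image) -> str:
        features = self.preprocess(image)   # image -> numeric features
        tokens = self.generate(features)    # features -> text tokens
        return self.postprocess(tokens)     # tokens -> final string


# Stub stages so the sketch runs without any model weights (assumptions only).
def stub_preprocess(image: Image) -> List[float]:
    # Scale each byte into [0, 1], standing in for real image normalization.
    return [b / 255 for b in image]


def stub_generate(features: List[float]) -> List[str]:
    # Stand-in for a model: report the input size and average brightness.
    mean = sum(features) / len(features) if features else 0.0
    tone = "bright" if mean >= 0.5 else "dark"
    return ["image", "with", str(len(features)), "values,", tone]


def stub_postprocess(tokens: List[str]) -> str:
    # Join tokens into a single output string.
    return " ".join(tokens)


pipe = ImageToTextPipeline(stub_preprocess, stub_generate, stub_postprocess)
caption = pipe(bytes([255, 255, 0]))
print(caption)
```

Keeping each stage as a plain callable means a real encoder or generator could replace a stub without changing the calling code; whether the actual pipeline is structured this way is not stated in the description.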