A pipeline for the image-text-to-text task: it takes an image together with a text prompt and generates text, with the aim of enhancing reasoning for users who integrate several models.
Discovered on HuggingFace.