A pipeline that converts images and text into a coherent text format.
Discovered on HuggingFace via HuggingFace:unknown
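
The description above does not say how the conversion is done, so the following is only a minimal sketch of one way such a pipeline could be shaped, assuming the input is an ordered mix of text and image items. Every name here (`TextItem`, `ImageItem`, `to_text`, `placeholder_describer`) is hypothetical and not taken from the project. The image-to-text step is passed in as a plain callable so that an OCR or captioning model could be substituted.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Union


@dataclass
class TextItem:
    text: str


@dataclass
class ImageItem:
    path: str  # hypothetical: location of the image file


Item = Union[TextItem, ImageItem]


def to_text(items: Iterable[Item], describe_image: Callable[[str], str]) -> str:
    """Flatten mixed text/image items into one text, preserving input order."""
    parts = []
    for item in items:
        if isinstance(item, TextItem):
            # Collapse runs of whitespace so the output reads as one document.
            cleaned = " ".join(item.text.split())
            if cleaned:
                parts.append(cleaned)
        else:
            # Replace each image with a bracketed textual description.
            parts.append(f"[Image: {describe_image(item.path)}]")
    return "\n\n".join(parts)


def placeholder_describer(path: str) -> str:
    # Stand-in only (assumption): a real pipeline would call an OCR or
    # captioning model here instead of returning a fixed message.
    return f"no description available for {path}"


if __name__ == "__main__":
    doc = [
        TextItem("Quarterly  results:"),
        ImageItem("chart.png"),
        TextItem("Revenue grew."),
    ]
    print(to_text(doc, placeholder_describer))
```

Keeping the image describer as a parameter means the ordering and merging logic can be tested without loading any model.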