A model that specializes in generating text from visual input (an image-to-text task).
Discovered on Hugging Face (source: HuggingFace:unknown).