A model that converts image inputs into descriptive text outputs.
Discovered on Hugging Face via HuggingFace:unknown.