Meta, the company formerly known as Facebook, has introduced a new tool called CM3leon, which offers text-to-image and image-to-text generation capabilities. CM3leon achieves impressive performance in text-to-image generation despite being trained on significantly less computing power compared to previous transformer-based methods, according to Meta’s blog post.
The architecture of CM3leon is based on a decoder-only transformer, similar to other generative AI models like GPT-3, GPT-4, ChatGPT and LaMDa. However, what sets CM3leon apart is its unique ability to input and generate both text and images, as claimed by Meta.
While text-only generative models are fine-tuned for various tasks to enhance their accuracy in generating results based on instruction prompts, image generation models are typically specialized for specific tasks. In the case of CM3leon, Meta engineers have applied large-scale multitask instruction tuning, which is commonly used for text-only generative models, for both text and image generation.