ShareGPT-4o-Image is a large-scale, high-quality image generation dataset where all images are generated using GPT-4o’s image generation capabilities.

This dataset aims to combine the advantages of open-source multimodal models with GPT-4o’s strengths in visual content creation.

It includes roughly 45,000 text-to-image and 46,000 text-plus-image-to-image samples, making it a practical resource for enhancing multimodal models in image generation and editing tasks.

Janus-4o is a multimodal LLM capable of text-to-image and text+image-to-image generation. It is based on Janus-Pro and fine-tuned using the ShareGPT-4o-Image dataset. Compared to Janus-Pro, Janus-4o introduces text+image-to-image generation capabilities and achieves significant improvements in text-to-image generation.

Dataset Overview

The ShareGPT-4o-Image dataset contains roughly 91,000 GPT-4o image generation samples, categorized as follows:

  • Text-to-image: 45,717
  • Text-plus-image-to-image: 46,539
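As a quick check on the composition, the two counts above can be summed and turned into per-task shares. This is only arithmetic on the numbers listed here; the dictionary keys are illustrative labels, not field names from the dataset. The exact sum comes out slightly above the rounded 91K figure quoted in the text.

```python
# Sample counts exactly as listed in the overview above.
counts = {
    "text-to-image": 45_717,
    "text-plus-image-to-image": 46_539,
}

total = sum(counts.values())
shares = {task: n / total for task, n in counts.items()}

for task, n in counts.items():
    print(f"{task}: {n:,} samples ({shares[task]:.1%})")
print(f"total: {total:,}")  # total: 92,256
```

The split is close to even, so neither task dominates a uniformly shuffled training mix.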

Related Links

Code: github click here

Model: get the ShareGPT-4o-Image model

Paper: click here

Paper Introduction

Recent advancements in multimodal generation models have unlocked realistic, instruction-aligned image generation. However, leading systems like GPT-4o-Image remain proprietary and inaccessible.

To make these capabilities accessible to the public, the paper introduces ShareGPT-4o-Image, the first dataset containing 45,000 text-to-image and 46,000 text-plus-image-to-image examples, all synthesized using GPT-4o’s image generation capabilities with the goal of distilling those abilities into open models. Using this dataset, the authors developed Janus-4o, a multimodal large language model capable of text-to-image and text-plus-image-to-image generation.

Janus-4o not only significantly improves text-to-image generation over its predecessor Janus-Pro but also introduces text-plus-image-to-image generation. Notably, it achieves impressive performance in text-plus-image-to-image generation from scratch, using only 91K synthetic samples and 6 hours of training on an 8×A800 GPU machine.

The authors hope the release of ShareGPT-4o-Image and Janus-4o will promote open research in photorealistic, instruction-aligned image generation.

Method Overview

ShareGPT-4o-Image enhances image generation performance. Fine-tuning Janus-Pro on ShareGPT-4o-Image yields Janus-4o, which demonstrates significantly improved image generation performance. Janus-4o also supports both text-to-image and text-plus-image-to-image generation, outperforming the compared baselines with only 91,000 training samples.

Janus-4o Model Overview. The model is built by fine-tuning Janus-Pro on ShareGPT-4o-Image, with enhancements to support both text-to-image and text-plus-image-to-image generation. The two tasks are trained jointly.
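One simple way to picture joint training is a single shuffled stream that mixes samples from both tasks. The sketch below is a minimal illustration under that assumption, not the authors' training pipeline: the `Sample` class, its fields, the file names, and `mixed_batches` are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Sample:
    """Hypothetical record layout; not the dataset's actual schema."""
    prompt: str
    target_image: str                  # path to the GPT-4o-generated image
    input_image: Optional[str] = None  # set only for text-plus-image-to-image

    @property
    def task(self) -> str:
        return "text+image-to-image" if self.input_image else "text-to-image"


def mixed_batches(t2i: List[Sample], ti2i: List[Sample],
                  batch_size: int, seed: int = 0) -> Iterator[List[Sample]]:
    """Shuffle both tasks into one pool and yield fixed-size batches."""
    pool = t2i + ti2i
    random.Random(seed).shuffle(pool)
    for start in range(0, len(pool), batch_size):
        yield pool[start:start + batch_size]


# Toy data standing in for the two subsets.
t2i = [Sample(f"prompt {i}", f"out_{i}.png") for i in range(6)]
ti2i = [Sample(f"edit {i}", f"edited_{i}.png", input_image=f"src_{i}.png")
        for i in range(6)]

batches = list(mixed_batches(t2i, ti2i, batch_size=4))
for batch in batches:
    print([s.task for s in batch])
```

Because the two subsets are nearly the same size, a uniform shuffle like this would expose the model to both tasks at roughly equal rates; any task weighting the authors may have used is not described in the text.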

Experimental Results

Conclusions

ShareGPT-4o-Image is the first large-scale dataset to capture GPT-4o’s advanced image generation capabilities in both text-to-image and text-plus-image-to-image generation. Based on this dataset, the authors developed Janus-4o, a multimodal large language model (MLLM) capable of generating high-quality images from pure text or from image-text combinations.

Janus-4o achieves significant improvements in text-to-image generation and highly competitive results in text-plus-image-to-image tasks, demonstrating the high quality and practicality of ShareGPT-4o-Image.

Thanks to the efficiency of MLLM-based autoregressive image generation, Janus-4o can be trained in just 6 hours on an 8×A800 GPU machine, achieving significant performance improvements at a very low computational cost.
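To put the stated training budget in more familiar units, the figures above can be converted to GPU-hours. This is plain arithmetic on the numbers in the text; the samples-per-GPU-hour ratio is only a rough scale indicator, since the number of epochs is not stated here.

```python
hours = 6          # wall-clock training time stated in the text
gpus = 8           # one 8×A800 machine
samples = 91_000   # approximate dataset size stated in the text

gpu_hours = hours * gpus
print(f"{gpu_hours} GPU-hours")  # 48 GPU-hours

# Dataset size relative to compute; not a throughput figure,
# because the number of passes over the data is not given.
print(f"{samples / gpu_hours:.0f} dataset samples per GPU-hour")
```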
