The New Star of Multimodal Image Generation: Janus-4o? ShareGPT-4o-Image Sets a New Standard for Datasets, Aligning Image Generation with GPT-4o.

ShareGPT-4o-Image is a large-scale, high-quality image generation dataset where all images are generated using GPT-4o’s image generation capabilities.

This dataset aims to align open-source multimodal models with GPT-4o’s strengths in visual content creation.

It includes roughly 45,000 text-to-image and 46,000 text-plus-image-to-image samples, making it a practical resource for enhancing multimodal models in image generation and editing tasks.

Janus-4o is a multimodal LLM capable of text-to-image and text-plus-image-to-image generation. It is based on Janus-Pro and fine-tuned on the ShareGPT-4o-Image dataset. Compared to Janus-Pro, Janus-4o adds text-plus-image-to-image generation and achieves significant improvements in text-to-image generation.

Dataset Overview

The ShareGPT-4o-Image dataset contains 91,000 GPT-4o image generation samples, categorized as follows:

Text-to-image: 45,717
Text-plus-image-to-image: 46,539
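
As a quick sanity check on the breakdown above, the short sketch below sums the two listed counts and reports each category's share. The function name `summarize` is only illustrative. Note that the exact sum of the two counts is 92,256, slightly above the rounded 91K figure quoted in the text.

```python
# Sample counts exactly as listed in the dataset overview.
COUNTS = {
    "text-to-image": 45_717,
    "text-plus-image-to-image": 46_539,
}


def summarize(counts):
    """Return the total sample count and each category's share of it."""
    total = sum(counts.values())
    shares = {name: n / total for name, n in counts.items()}
    return total, shares


total, shares = summarize(COUNTS)
print(f"total samples: {total:,}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The two categories are close to evenly balanced, with each accounting for about half of the data.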

Paper Introduction

Recent advancements in multimodal generation models have unlocked realistic, instruction-aligned image generation. However, leading systems like GPT-4o-Image remain proprietary and inaccessible.

To make these capabilities accessible to the public, the paper introduces ShareGPT-4o-Image, the first dataset of its kind, containing 45,000 text-to-image and 46,000 text-plus-image-to-image examples, all synthesized with GPT-4o’s image generation capabilities in order to distill its advanced image generation abilities. Using this dataset, the authors developed Janus-4o, a multimodal large language model capable of text-to-image and text-plus-image-to-image generation.

Janus-4o not only significantly improves text-to-image generation over its predecessor Janus-Pro but also introduces text-plus-image-to-image generation. Notably, it achieves impressive text-plus-image-to-image performance from scratch, using only 91K synthetic samples and 6 hours of training on an 8×A800 GPU machine.

The authors hope the release of ShareGPT-4o-Image and Janus-4o will promote open research in photo-realistic, instruction-aligned image generation.

Method Overview

ShareGPT-4o-Image enhances image generation performance. Fine-tuning Janus-Pro on ShareGPT-4o-Image yields Janus-4o, which demonstrates significantly improved image generation performance. Janus-4o also supports text-plus-image-to-image generation, outperforming other baselines with only 91,000 training samples.

Janus-4o Model Overview. The model is based on Janus-Pro and built by fine-tuning it on ShareGPT-4o-Image. It incorporates enhancements to support both text-to-image and text-plus-image-to-image generation. The text-to-image and text-plus-image-to-image tasks are trained jointly.
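
To make "trained jointly" concrete, here is a minimal sketch of one plausible way to mix the two tasks into shared batches. It is not the paper's training pipeline: the `Sample` record, its field names, and the `mixed_batches` helper are all hypothetical, and the sketch assumes that a text-to-image sample is simply one with no input image.

```python
import random
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Sample:
    """Hypothetical training record; field names are assumptions."""
    prompt: str
    target_image: str                  # path of the image to generate
    input_image: Optional[str] = None  # None for text-to-image samples

    @property
    def task(self) -> str:
        if self.input_image is None:
            return "text-to-image"
        return "text-plus-image-to-image"


def mixed_batches(t2i: List[Sample], ti2i: List[Sample],
                  batch_size: int, seed: int = 0) -> Iterator[List[Sample]]:
    """Pool both tasks, shuffle once, and yield batches that may mix tasks."""
    pool = list(t2i) + list(ti2i)
    random.Random(seed).shuffle(pool)
    for start in range(0, len(pool), batch_size):
        yield pool[start:start + batch_size]


# Toy data standing in for the two dataset categories.
t2i = [Sample(prompt=f"prompt {i}", target_image=f"t2i_{i}.png")
       for i in range(3)]
ti2i = [Sample(prompt=f"edit {i}", target_image=f"out_{i}.png",
               input_image=f"in_{i}.png") for i in range(3)]

batches = list(mixed_batches(t2i, ti2i, batch_size=4))
for batch in batches:
    print([s.task for s in batch])
```

Pooling and shuffling is only one option; a real setup might instead sample each task at a fixed ratio per batch.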

Experimental Results

Conclusions

ShareGPT-4o-Image is the first large-scale dataset to capture GPT-4o’s advanced image generation capabilities in both text-to-image and text-plus-image-to-image generation. Based on this dataset, the authors developed Janus-4o, a multimodal large language model (MLLM) capable of generating high-quality images from pure text or from image-text combinations.

Janus-4o achieves significant improvements in text-to-image generation and highly competitive results in text-plus-image-to-image tasks, demonstrating the high quality and practicality of ShareGPT-4o-Image.

Thanks to the efficiency of MLLM-based autoregressive image generation, Janus-4o can be trained in just 6 hours on an 8×A800 GPU machine, achieving significant performance improvements with very low computational requirements.
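
For a rough sense of scale, the sketch below converts the stated training setup into GPU-hours. The helper name `gpu_hours` is illustrative, and the samples-per-GPU-hour figure counts unique samples only; it is not a throughput measurement, since the number of training epochs is not stated here.

```python
def gpu_hours(num_gpus: int, wall_clock_hours: float) -> float:
    """Total compute budget as GPUs multiplied by wall-clock hours."""
    return num_gpus * wall_clock_hours


# Figures quoted in the text: an 8xA800 machine, 6 hours, about 91K samples.
budget = gpu_hours(num_gpus=8, wall_clock_hours=6)
unique_samples_per_gpu_hour = 91_000 / budget

print(f"{budget} GPU-hours")
print(f"{unique_samples_per_gpu_hour:.0f} unique samples per GPU-hour")
```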
