Safari is not supported yet.
Janus Pro WebGPU is a cutting-edge application designed for in-browser unified multimodal understanding and generation. It leverages the Janus-Pro-1B model, which is an autoregressive framework developed to handle both text and image inputs and outputs, making it a versatile tool for various AI tasks.
Overview of Janus Pro WebGPU
- Framework: The application is built using React and Vite, utilizing Transformers.js for model integration and WebGPU for hardware acceleration.
- Model Capabilities: Janus-Pro-1B handles multimodal tasks in both directions: users can supply text and images and receive either text responses or generated images. The same model both interprets visual input and produces visual output.
- Performance: The model operates efficiently in web browsers that support WebGPU, such as Chrome. Users have reported significant performance benefits, including faster inference times compared to traditional GPU setups.
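Because the application depends on WebGPU, it can help to check for support before loading the model. The following is a minimal sketch, not code from the application itself: it relies on the standard `navigator.gpu.requestAdapter()` entry point, while the `hasWebGPU` helper and the `NavigatorLike` and `GpuLike` interfaces are hypothetical names introduced here so the check can be exercised outside a browser.

```typescript
// Minimal shape of the parts of the WebGPU API this sketch touches.
// These interfaces are illustrative; in a browser project the real
// types would normally come from WebGPU type definitions.
interface GpuLike {
  requestAdapter(): Promise<unknown | null>;
}

interface NavigatorLike {
  gpu?: GpuLike;
}

// Resolves to true only if the WebGPU entry point exists and the
// browser hands back a usable adapter.
async function hasWebGPU(nav: NavigatorLike): Promise<boolean> {
  if (!nav.gpu) {
    // Browsers without WebGPU do not expose `navigator.gpu` at all.
    return false;
  }
  try {
    const adapter = await nav.gpu.requestAdapter();
    // requestAdapter() resolves to null when no suitable GPU is found.
    return adapter !== null && adapter !== undefined;
  } catch {
    // Treat any failure while probing as "not supported".
    return false;
  }
}
```

In a browser, the global `navigator` object would be passed in. If the result is `false`, the page can show a message such as the Safari notice at the top of this page instead of attempting to load the model.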
Getting Started with Janus Pro WebGPU
To set up and run the Janus Pro WebGPU application locally, follow these steps:
- Clone the Repository:
    git clone https://github.com/huggingface/transformers.js-examples.git
- Navigate to the Project Directory:
    cd transformers.js-examples/janus-webgpu
- Install Dependencies:
    npm install
- Run the Development Server:
    npm run dev
After executing these commands, open your browser and navigate to http://localhost:5173 to interact with the application.
Model Specifications
- Training: Janus-Pro-1B is trained using a lightweight distributed training framework, achieving competitive performance across various benchmarks. It features a unique architecture that separates visual encoding pathways for understanding and generation tasks, enhancing both stability and performance.
- Input Limitations: The model supports image inputs of up to 384 × 384 pixels, which can affect its performance in detailed tasks like optical character recognition (OCR). Users may notice that while the generated images are semantically rich, they might lack fine detail due to this resolution limitation.
- Open Source: Janus Pro is available under an open-source license, allowing developers to explore its capabilities freely while adhering to ethical usage guidelines.
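To get a feel for the 384 × 384 input limit noted above, the sketch below computes how far a larger image would need to be downscaled to fit. It is plain arithmetic under stated assumptions: the `fitWithin` helper and the `Size` type are hypothetical, and the model's actual preprocessing may resize, crop, or pad differently.

```typescript
// Maximum side length taken from the input limit described above.
const MAX_SIDE = 384;

interface Size {
  width: number;
  height: number;
}

// Returns the dimensions an image would have after being scaled down
// (never up) so that its longer side fits within maxSide, preserving
// the aspect ratio. Illustrative only; not the model's own pipeline.
function fitWithin(size: Size, maxSide: number = MAX_SIDE): Size & { scale: number } {
  if (size.width <= 0 || size.height <= 0) {
    throw new RangeError("width and height must be positive");
  }
  const longest = Math.max(size.width, size.height);
  const scale = Math.min(1, maxSide / longest);
  return {
    width: Math.max(1, Math.round(size.width * scale)),
    height: Math.max(1, Math.round(size.height * scale)),
    scale,
  };
}

// A 1920 x 1080 screenshot shrinks to one fifth of its size per side.
const fitted = fitWithin({ width: 1920, height: 1080 });
console.log(fitted.width, fitted.height); // 384 216
```

At that scale, text that was 20 pixels tall in the original is only about 4 pixels tall, which illustrates why OCR-style tasks suffer at this resolution.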