Introducing PaliGemma 2 mix: A vision-language model for multiple tasks

This past December, we launched PaliGemma 2, an upgraded vision-language model in the Gemma family. The release included pretrained checkpoints in three sizes (3B, 10B, and 28B parameters) that can be easily fine-tuned for high performance on a wide range of vision-language tasks and domains, such as image segmentation, short video captioning, scientific question answering, and text-related tasks.

Now, we’re thrilled to announce the launch of the PaliGemma 2 mix checkpoints. These are models fine-tuned on a mixture of tasks, so you can explore the model’s capabilities directly and use it out of the box for common use cases.

What’s new in PaliGemma 2 mix?

  • Multiple tasks with one model: PaliGemma 2 mix can solve tasks such as short and long captioning, optical character recognition (OCR), image question answering, object detection and segmentation.
  • Developer-friendly sizes: Use the best model for your needs thanks to the different model sizes (3B, 10B, and 28B parameters) and resolutions (224px and 448px).

If you were already using the original PaliGemma mix checkpoints, you can upgrade directly to PaliGemma 2 without making any changes. The model performs different tasks depending on how it’s prompted. You can review the prompt syntax for each task in the official documentation and learn more about how PaliGemma 2 was developed in our technical report.
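To make the idea of prompt-driven task selection concrete, here is a minimal Python sketch that builds prompt strings for a few tasks. The helper `build_prompt` is hypothetical and not part of any PaliGemma library. The `detect` prefix matches the detection example in this post; the other prefixes (`caption`, `describe`, `ocr`, `answer`, `segment`) and the `lang` argument are assumptions based on the published PaliGemma prompt syntax, so check the official documentation before relying on them.

```python
# Hypothetical helper: builds a task prompt string for a PaliGemma-style
# mix checkpoint. The prefixes are assumptions drawn from the published
# prompt syntax; only "detect <object>\n" appears in this post itself.

def build_prompt(task: str, argument: str = "", lang: str = "en") -> str:
    """Return a newline-terminated prompt for the given task."""
    needs_argument = {"answer", "detect", "segment"}
    if task in needs_argument and not argument.strip():
        raise ValueError(f"task {task!r} needs an argument (question or object)")

    if task in ("caption", "describe"):
        # Short ("caption") or longer ("describe") captions in a language.
        body = f"{task} {lang}"
    elif task == "ocr":
        body = "ocr"
    elif task == "answer":
        # Visual question answering: the argument is the question.
        body = f"answer {lang} {argument.strip()}"
    elif task in ("detect", "segment"):
        # The argument is the object to locate, e.g. "android".
        body = f"{task} {argument.strip()}"
    else:
        raise ValueError(f"unknown task: {task!r}")
    return body + "\n"


if __name__ == "__main__":
    for task, arg in [("detect", "android"), ("ocr", ""), ("caption", "")]:
        print(repr(build_prompt(task, arg)))
```

The same image can then be paired with different prompts to switch tasks, without changing the checkpoint.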


Detection

  • Task: Detection (PaliGemma-2-3b-mix-224)
  • Input: “detect android\n”

Result: a cow standing on a beach next to a sign that says warning dangerous rip current.
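For a detection prompt, the model's text output has to be turned into pixel coordinates before you can draw boxes. The Python sketch below assumes the output format described in the PaliGemma documentation: four `<locNNNN>` tokens per object in the order y_min, x_min, y_max, x_max, each a value out of 1024, followed by the label. The function name `parse_detections` and the sample string are made up for illustration, so treat this as a starting point and verify the format against the official docs.

```python
import re

# Assumption: coordinates are quantized into 1024 bins and emitted as
# <locNNNN> tokens in the order y_min, x_min, y_max, x_max, then the label.
LOC_BINS = 1024
_DETECTION = re.compile(r"((?:<loc\d{4}>){4})\s*([^<;]+)")


def parse_detections(text: str, width: int, height: int) -> list:
    """Convert a detection string into pixel boxes (x_min, y_min, x_max, y_max)."""
    results = []
    for loc_tokens, label in _DETECTION.findall(text):
        # Normalize each token value to the range [0, 1).
        y_min, x_min, y_max, x_max = (
            int(value) / LOC_BINS for value in re.findall(r"<loc(\d{4})>", loc_tokens)
        )
        results.append(
            {
                "label": label.strip(),
                "box": (x_min * width, y_min * height, x_max * width, y_max * height),
            }
        )
    return results


if __name__ == "__main__":
    # Illustrative string only, not real model output.
    sample = "<loc0256><loc0128><loc0768><loc0512> android"
    print(parse_detections(sample, width=1024, height=512))
```

Returning boxes in (x_min, y_min, x_max, y_max) pixel order is a design choice here, since that is the order most drawing libraries expect.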

Optical Character Recognition (OCR)

Result: A cow standing on a beach next to a warning sign.

Result:
WARNING DANGEROUS
RIP CURRENT


Get Started Today

Ready to discover the potential of PaliGemma 2? Here is how you can explore the mix model capabilities:

  • Try out the mix model with a few clicks: Explore the mix model capabilities directly on the Hugging Face demo.
  • Learn how to run the model: Try out the Keras inference notebook directly in Google Colab or locally.

While PaliGemma 2 mix performs strongly across multiple tasks, you will get the best results by fine-tuning PaliGemma 2 on your own task or domain. To learn how, dive into our comprehensive documentation, check our official example notebooks for Keras and JAX, or use the Hugging Face transformers example. We’re looking forward to seeing what you build with it!
