AI image generators can do some impressive things, but they are often limited by your ability to put your vision into words for a prompt. Even when the AI partly captures the image in your head, getting the right mix of characters, location, and style into one image can be difficult.
DALL-E and other tools can create images based on pictures you upload, but even then, it can be tough to get the right mix. That's what makes the new Google Whisk experiment so interesting.
Using Google Gemini and the Imagen 3 image creation model, Whisk can create entirely new images by blending existing ones. Whisk skips the hassle of descriptive poetry by taking images assigned as either subject, scene, or style and combining them appropriately. Should you prefer not to hunt down the right image for one or more of those facets, you can describe it and see what Google makes of it before creating the final form.
For example, I was able to take a picture of my dog and ask to see it as a plushie, an enamel pin, and a sticker, and then get the results below.
How to Whisk
Whisk is available on Google Labs, though only in the U.S. for now. Once you're in, the interface is refreshingly simple. You've got three slots, and for each one you can upload an image, write a prompt that Google will expand on, or ask for a random image from Google's library. First, you pick the subject for the image. You aren't limited to one, and each subject can be a person, animal, or object. Next, you choose the scene: the backdrop or location you want. Finally, you select the style, which can be literally any form of art or, as with the plushie, even a crafted object.
Each image gets a text description written by Gemini that you can edit if you think it got something wrong. If it's a generated image, you can also play around with the description to get something else. You can then add more details for the final image, for instance, having my dog balance on a ball while wearing a funny hat.
With those in place, Whisk generates two images that don't just combine your inputs; they interpret them. This isn't Photoshop layering; it's full-on AI remix culture.
Whisk is at its best when you lean into the unexpected and the fun. It thrives on experimentation, and half the fun is watching how it interprets your wildly mismatched inputs. Sometimes it gets it right; sometimes you're left with something gloriously weird. Either way, it's a win.
For example, the first image below started with a picture of a pocket watch, a library, and a gothic painting. The second used a photo of a punk rocker, an old alley photo from New York City, and a written description of classic comic book art. The third took a photo of a bear in the wild, a photo of an old diner, and an illustration from a children's book. The results speak for themselves.
Whisked Away
While Whisk is intuitive, a few tricks can help you get the most out of it. High-quality images help a lot, especially if you want the subject to stay close to the original character or object. The AI does its best work when it knows what it's looking at.
Also, think outside the box. You never know what these combinations will lead to. And if a result isn't working the way you want, it's easy to upload new photos of whoever or whatever you want the AI to play with. Lastly, you can always tweak the underlying captions and inputs for more fine-tuned results.
Not needing meticulously written prompts will likely make Whisk far more attractive to the average person. That said, it will probably face more pushback from creators whose work was used to train the AI models behind it.
Still, if you struggle to put your creative vision into words, an AI image creator that focuses on visuals instead of vocabulary might be your new favorite toy, even if it's just to see what you would look like as a plushie.
erichs211@gmail.com (Eric Hal Schwartz)