The mania for AI tools often centers around image generators for the obvious reason that they are, by definition, more visually interesting to play with and demonstrate. OpenAI recently dropped a new image creator inside ChatGPT, showcasing that fact.
The new model is not an upgrade to DALL-E 3, the standard AI image creator from OpenAI, but an entirely new technology.
Not to give away too much early in this article, but yes, the new image creator makes some impressive art. It takes some time to produce, sometimes a couple of minutes, compared to the 30 seconds or less from DALL-E, but the results speak for themselves.
It’s good to the point of being problematic, in fact. It mimics the style of human artists to a degree that feels too close. Irrespective of that, I decided to match the two up in a few prompt comparisons.
Here’s how it went, with DALL-E 3’s images on the left and those from ChatGPT’s new generator on the right.
Photorealism and text
The first thing I wanted to test was whether either model could nail a classic AI Achilles’ heel: readable text in images. So I asked for: a street sign in New York City that says, “Welcome to the Future.”
Both managed to get the text of the sign right, but DALL-E’s New York didn’t look nearly as real as ChatGPT’s. Plus, the other signs in the ChatGPT image were spelled correctly, while the One Way sign from DALL-E wasn’t quite right.
Object fusion
Next up was a test of how each model handled the challenge of merging two very different animals: a lion and an eagle. The idea was to get something regal, something mythic. My prompt was: “Make a hybrid creature that combines features of a lion and an eagle, perched majestically on a mountain peak.”
DALL-E had a pretty good landscape, and the animal looked fairly realistic, but it was mainly a lion with wings. It also had some random feather strips and a weird tail. ChatGPT made a creature that looks like a painting of a griffin from an alternate-world natural history museum. Even the coloring blended, and the wings had musculature that looked as if it would actually let them fold onto the creature’s back.
Artistic emulation
After the unpleasantness of the Ghibli mimicry, I wanted to emulate an artist who is long gone, Raphael, but with an event he would never have painted. I asked for “A depiction of scientists unveiling a groundbreaking invention, painted in the style of Raphael.”
ChatGPT responded with an image that looked like a sci-fi Renaissance depiction of the invention of the light bulb, with people not dissimilar from what you’d find in the homes of rich people five hundred years ago, minus the electricity. DALL-E 3 had a more spectacular representation of the same kind of concept. It’s hard to tell if it’s exactly like Raphael, but it is Renaissance-esque, at least. And, honestly, a more fun vision of the idea.
History alive
After the artistic style mimicry, I decided to get very distinct and historical. Recreating something as specific as the Wright brothers’ first flight is no small task. I wanted a scene that felt like a documentary photo. I asked the two to “Make a photo of the Wright brothers’ first flight at Kitty Hawk, with the aircraft in mid-air and spectators watching.”
DALL-E gave me a very odd airplane that looked little like the one from the real first flight, and frankly, the crowd and landscape veered into the surreal. ChatGPT made a very impressive imitation of a photo, with spectators who look like real people and the correct number of passengers in the first plane (one).
Which one is best?
It’s worth noting that I was only looking at image generation here. You can also perform impressive image edits on photos you upload to ChatGPT, which you can’t do with DALL-E, but that’s a whole different subject.
ChatGPT’s new image generator is amazingly creative and good at following your intent in its images. That led to things like the Ghibli controversy and other questions about artistic ethics. Besides that, it’s the clear winner in every matchup. On the other hand, it takes approximately five times as long to make an image, and it only does one at a time.
DALL-E makes good images quickly and two at a time. It also doesn’t have the limits I discovered with ChatGPT, where I had to wait for eight minutes to start making images again at one point, despite being a ChatGPT Plus subscriber. If I want to impress someone with AI image-making, though, it’s ChatGPT all the way.
The winner: ChatGPT
Source link (Eric Hal Schwartz)