
Background information
by Michelle Brändle
While AI models such as ChatGPT respond to text input, and others such as Midjourney create images from text, MiniGPT-4 takes a different approach: it creates text from images.
In March 2023, OpenAI introduced GPT-4, which is available through ChatGPT. It’s a multimodal model, meaning it can handle more than just text. However, the feature for generating text from images hasn’t been released to the public yet. To test the potential of such a tool, a research group has built a smaller program that offers exactly this image feature: MiniGPT-4.
MiniGPT-4 currently fills a gap, as other large language models (LLMs) can’t process images yet. In this respect, the program already shows what will be possible once chatbots can process visual input, in other words once they can «see». And that’s a lot. MiniGPT-4 can write a poem based on an uploaded image, create HTML code from a sketch of a website and tell you what’s wrong with your sick plant. Give it a try if you want. I gave it a go and was amused by the results.
The program is really simple to use. You upload a picture and write a short description of what you want MiniGPT-4 to do. Check out the demo video below to see what’s possible. For example, the program gives you tips on how to revive your sick plant. It also creates detailed descriptions of artwork.
I can already imagine asking MiniGPT-4 what kind of flower I’ve come across on a hike, and the program telling me whether I can make tea out of it or whether it will poison me.
But I’ll start with something less daring. Using a photo of my favourite dish, Pad Thai, I want MiniGPT-4 to give me the recipe.
The chatbot understands and responds in English.
I take a photo of the noodle dish and write:
What is the recipe for this dish?
MiniGPT-4’s answer is short and sweet, but not what I was hoping for:
This dish appears to be a stir-fry with chicken, noodles, and lime. It is likely made with a combination of ingredients such as chicken, noodles, and lime. The dish may be served with a side of rice or other starch. It is likely a popular dish in Southeast Asia.
MiniGPT-4’s answer is worded very cautiously, and it doesn’t include a recipe. So I follow up with a more specific prompt:
How do I make it?
And I’m rewarded with a recipe (emphasis on a recipe):
To make this dish, you will need the following ingredients:
Instructions:
Right, that seems a bit half-finished and not quite accurate. There’s certainly no butter in Pad Thai. The tamarind paste is missing, and so is the egg. I wouldn’t follow this recipe for now, but it comes pretty close. For something based only on a picture, that’s quite impressive.
The program is supposed to be able to compose lyrics and poems based on a picture. I take a picture I drew myself and have it create a few romantic lines for me. My prompt to MiniGPT-4 is:
Make a poem out of this picture.
The result:
The girl with blue hair and headset
Is surrounded by bubbles in the air
She listens to music on her phone
With headphones on, she’s in her own lair
The blue and white checkerboard pattern
Creates a fun and playful atmosphere
The girl with blue hair and headset
Is surrounded by bubbles in the air
She listens to music on her phone
With headphones on, she’s in her own lair
I smile. MiniGPT-4 wrote this poem in two minutes. It’s interesting to see what the program picked up on. But Shakespeare definitely doesn’t need to worry just yet.
I’m a fan of rock music, bands like the Foo Fighters. But I’m really bad with names. Apart from Dave Grohl and the late drummer Taylor Hawkins, I’ve no idea what the band members are called. A case for Detective MiniGPT-4. I upload a photo of the band, ask for their names and let the program do the work.
First, I ask who’s in the picture, and MiniGPT-4 immediately recognises them as the Foo Fighters. After a few more questions, the program lists all the band members for me. It probably does this based on prior knowledge of the band’s line-up, since the order of the names doesn’t match their positions in the picture.
MiniGPT-4’s answers aren’t always perfect, but definitely impressive.
My head is overflowing with ideas. Will MiniGPT-4 be able to tell me what kind of cake I could bake with the ingredients in my cupboard? Or will it know exciting facts about works of art I discover at the museum? Or…
You’re bound to have some ideas, too. Let me know in the comment section what you asked MiniGPT-4, and if you were happy with the answer.
In my world, Super Mario chases Stormtroopers with a unicorn and Harley Quinn mixes cocktails for Eddie and Peter at the beach bar. Wherever I can live out my creativity, my fingers tingle. Or maybe it's because nothing flows through my veins but chocolate, glitter and coffee.