Step-by-Step Guide to Building Visual Conversation Apps

Ever wished you could have a conversation with a Large Language Model (LLM) about the images you see? With vision-capable LLMs, this is now possible. You can show an image to an LLM and ask questions about it, and the LLM will respond in real time. It’s like chatting with a smart assistant that can "see" the picture and understand it.

In this post, we’ll walk you through a simple setup that lets you start a visual conversation with an LLM, using just an image and your questions. You’ll learn how to set up this system and have a conversation with an LLM about anything you like in the image.

The Problem

Traditional image-analysis models usually return only a few labels or categories. For example, a classifier might tell you that an image contains a "dog" or a "tree," but it doesn’t let you interact or ask follow-up questions. What if you could ask, "What’s happening in this image?" or "Can you describe the scene for me?" This is where our visual conversation tool comes in.

This system allows you to have a back-and-forth conversation with an LLM, where the LLM answers questions about what it sees in the image you provide.

The Solution

With this tool, you can show an image to the LLM, ask it what’s in the picture, and keep the conversation going. A vision-capable model analyzes the image, and the full conversation history is sent with every request, so you can ask follow-up questions based on what the LLM describes.

Key Components:

  • Groq Client: The Python client that sends your messages to the model and returns its responses.
  • Base64 Encoding: Converts the image’s binary data into text so it can be embedded in the request as a data URL.
  • Llama 3.2-90B Vision Model: The vision-capable model (llama-3.2-90b-vision-preview in the code below) that answers your questions about the image.
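
To make the Base64 component concrete, here is a minimal sketch that uses only the Python standard library. The helper name to_data_url is hypothetical and is not part of the script in this post. It guesses the MIME type from the file extension and falls back to JPEG when the extension is unknown, which is an assumption made for illustration.

```python
import base64
import mimetypes


def to_data_url(image_path):
    """Return the image at image_path as a base64 data URL string.

    Hypothetical helper for illustration. The MIME type is guessed from
    the file extension and falls back to image/jpeg when it is unknown.
    """
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None:
        mime_type = "image/jpeg"  # assumption: treat unknown files as JPEG
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string has the same shape as the value placed in the "url" field of the image_url message part in the full script.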

Implementation

Here’s how to set everything up in Python. The script needs the groq and python-dotenv packages, plus a GROQ_API_KEY entry in a .env file (or in your environment), which load_dotenv() loads before the client is created.

import os
from groq import Groq
import base64
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Initialize Groq client
client = Groq(api_key=os.getenv("GROQ_API_KEY"))

def encode_image(image_path):
    """Read the image file and return its contents as a base64 string."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

def visual_conversation(image_path):
    base64_image = encode_image(image_path)

    conversation_history = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What do you see in this image?"},
                # The data URL declares JPEG; change the MIME type for PNG or other formats.
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}
                }
            ]
        }
    ]

    while True:
        response = client.chat.completions.create(
            model="llama-3.2-90b-vision-preview",
            messages=conversation_history
        )

        assistant_response = response.choices[0].message.content
        print("\nAssistant:", assistant_response)

        user_input = input("\nYou (type 'exit' to end): ")
        if user_input.lower() == 'exit':
            break

        # Record both sides of the exchange so the next request carries the full context
        conversation_history.append({"role": "assistant", "content": assistant_response})
        conversation_history.append({"role": "user", "content": user_input})

def main():
    image_path = "path/to/your/image.jpg"
    print("Starting a visual conversation. You can ask questions about the image.")
    visual_conversation(image_path)

if __name__ == "__main__":
    main()

Note

Preview models are intended for evaluation purposes only and should not be used in production environments as they may be discontinued at short notice.
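
Because a preview model can be retired, one way to soften the impact is to avoid hard-coding the model ID. The sketch below reads it from an environment variable and falls back to the ID used in this post. The variable name VISION_MODEL and the function get_model_name are hypothetical names chosen for illustration.

```python
import os

# Default taken from the script above; preview models may be discontinued.
DEFAULT_MODEL = "llama-3.2-90b-vision-preview"


def get_model_name():
    """Return the model ID, preferring the hypothetical VISION_MODEL variable."""
    # An unset or empty variable falls back to the default ID.
    return os.getenv("VISION_MODEL") or DEFAULT_MODEL
```

With this in place, the call in visual_conversation could pass model=get_model_name(), so switching models would only need a change to the .env file.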

Output

Starting a visual conversation. You can ask questions about the image.

Assistant: The image features a kingfisher bird as its central focus. The bird is positioned facing left, with its body turned towards the viewer and its head turned to the left side of the image. It has a long, black beak and vibrant blue feathers on its head, back, and wings, complemented by white feathers around its neck and orange feathers on its chest and belly. The bird's feet are a deep red color.

The bird is perched on a light-colored wooden post or branch, which is situated in the lower right corner of the image. The background of the image is blurred, but it appears to be a natural setting, possibly a field or meadow, with shades of green and brown. The overall atmosphere of the image suggests that it was taken during the daytime, likely in a sunny or partly cloudy environment.

You (type 'exit' to end): What species of bird is shown in this image?

Assistant: This bird in the image is a kingfisher bird. There are 114 different kingfisher species around the globe, and they can be found in all regions except the polar areas. Kingfishers generally dwell in forests, deserts, coastal areas, and other bodies of water that contain their primary diet, fish. Despite their name, not all kingfisher species feed on fish, including the giant kingfisher which eats large insects, and the belted kingfisher which likes to consume crayfish.

You (type 'exit' to end): What habitat is this bird likely to be found in?

Assistant: Kingfishers inhabit a wide range of environments, extending from dense forests to arid deserts, and from mountainous regions to coastal areas and various bodies of water. They adapt to diverse habitats, but typically prioritize areas with access to a suitable body of water where they can hunt for food, of which fish is a major component of their diet.

Some kingfisher species can even be found dwelling by dry riverbeds if available. Most kingfisher species reside near a body of water or even a riverbed that can still flow with water during periods of heavy rain.

You (type 'exit' to end): Is this bird known for any specific behaviors or characteristics?

Assistant: Kingfishers are recognized for their remarkable agility and hunting prowess. They are characterized by swift flight, which is occasionally accompanied by a series of distinctive quick wingbeats followed by a glide-like motion. Their hunting technique typically involves hovering above the water's surface before swooping down in a swift dive to catch fish, insects, and worms. They also exhibit a fascinating courtship behavior, where two kingfishers may fly close together while engaging in a "pebble dance", which involves passing pebbles to one another in midair to symbolize their commitment.

Another notable characteristic of kingfishers is their propensity for burrowing into rocks, sand, or cave walls. These burrows serve as safe havens from predators and harsh weather conditions, and also provide a cozy space for them to incubate their eggs and raise their young. Some species can venture as deep as 6 feet into a bank to create their burrows.

You (type 'exit' to end): Are there any conservation concerns for this bird species?

Assistant: Yes, there are conservation concerns for kingfisher bird species. Several kingfisher species are currently considered to be facing threats to their survival on a global scale. Among these, six species are listed as Critically Endangered, while an additional eight species are classified as Near Threatened.

The primary contributors to the decline of kingfisher populations are habitat destruction, climate change, pesticide pollution, and hunting. Unfortunately, even though some cultures believe birds to be sacred, this notion has not protected kingfishers from certain forms of human exploitation, such as hunting.

You (type 'exit' to end): How can one identify this bird in the wild?

Assistant: Kingfisher birds tend to be visually striking, showcasing vibrant colors such as blue, turquoise, and white plumage with striking orange or reddish orange belts across their chest or just below their neck. They also sport bright yellow bills and feet, featuring a black stripe that may be situated above or below their eyes.

For identification, note the distinctive shape and size of the kingfisher species you encounter. While on the whole, they possess rounded wings and are on the smaller side, as the majority of kingfisher species typically range in size from 10-15 cm. Some larger kingfisher species can exceed up to 30 cm in length. The shape and size of their beak are also indicative of their eating habits, as kingfishers that spend most of their time hunting for insects and other small organisms have shorter, more pointed beaks.

While in flight, watch how the kingfisher bird progresses through its "glides and plunge" flight pattern. Many kingfishers glide at increased speeds, followed by a series of three to four wingbeats. You may also watch as they take off with their tail feathers up in the air and retract them only once the bird has achieved higher altitudes.

Lastly, the location where the kingfisher has been spotted can help in its identification. Each species migrates and lives in distinct regions of the world, and knowing which species are found in your area could assist in determining which kingfisher species you are observing. More specifically, you should note the quality of the land and water, as most kingfishers live along freshwater sources. A kingfisher sighting that lacks a body of water would be unusual unless it is part of a migration pattern or a mistake.

You (type 'exit' to end): What are the breeding habits of this bird?

Assistant: Breeding habits of kingfishers are unique among birds, with deep burrows commonly used to shelter their nests and raise their offspring. An excavated burrow can span up to 8 feet long and 12-20 inches deep, and serves as insulation from extreme temperatures and protection against predators.

Kingfishers raise their chicks inside intricate networks of interconnected tunnels and chambers, while utilizing one large chamber to house the actual nest. The entire network functions as an earth-friendly incubator, shielded from predators and external environmental threats.

Diet typically includes fish, mollusks, crustaceans, and insects. Males will usually present their mates with an offering of a freshly caught meal during the mating season, demonstrating his potential as an excellent provider.

Overall, kingfisher breeding habits involve a careful selection of nesting sites and mates, precise excavation techniques, and vigilant care for young.

You (type 'exit' to end): exit

How It Works

  1. Encoding Images: The encode_image function reads the image file as bytes and converts it to a base64 string. The string is embedded in the request as a data URL, which is how the LLM receives the image.

  2. Starting the Conversation: The visual_conversation function sends the encoded image to the LLM together with the opening question, "What do you see in this image?"

  3. Interactive Dialogue: Once the LLM responds, you can ask follow-up questions. Each reply and each new question is appended to conversation_history, and the whole history is sent with every request, so the LLM keeps the context. The conversation continues until you type "exit".
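
The way the history grows across these steps can be seen without calling the API. The sketch below replays the same append pattern with scripted questions and a stand-in for the model. Both fake_model and run_turns are hypothetical names for illustration, and the image part of the first message is left out to keep the example short.

```python
def fake_model(messages):
    """Stand-in for the real API call; reports how many messages it received."""
    return f"(reply after seeing {len(messages)} message(s))"


def run_turns(first_message, follow_ups, ask=fake_model):
    """Replay the conversation loop with scripted user questions."""
    history = [first_message]
    # The first request contains only the opening user message.
    history.append({"role": "assistant", "content": ask(history)})
    for question in follow_ups:
        # Each later request contains everything said so far.
        history.append({"role": "user", "content": question})
        history.append({"role": "assistant", "content": ask(history)})
    return history
```

Calling run_turns with an opening message and two follow-up questions returns six messages that alternate between the user and assistant roles, and the final request sees five messages. This is why the model can answer "this bird" questions without the image being described again.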

Conclusion

This tool makes it easy to start a conversation with an LLM about an image. Whether you’re curious about the contents of a picture, want to analyze it, or just enjoy chatting with an LLM, this setup gives you everything you need. It’s a simple but powerful way to integrate LLMs into your projects.

Ready to try it? Run the code above with your own image and start chatting with the LLM. You can also expand this tool by adding new features or trying it with other vision-capable models.
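
As one example of such a feature, a long chat makes every request larger because the full history is resent each time. The sketch below shows one possible way to cap it: keep the first message, which carries the image, plus the most recent messages. The function trim_history and its max_recent parameter are hypothetical, and whether trimming suits your use case is an assumption to check against your own conversations.

```python
def trim_history(history, max_recent=6):
    """Keep the first message (with the image) plus the most recent messages.

    Hypothetical helper. Use an even max_recent so that, when the history
    ends with a user message, the kept messages still alternate roles.
    """
    if len(history) <= max_recent + 1:
        return list(history)
    return [history[0]] + history[-max_recent:]
```

In the script above, this could be applied to conversation_history just before each client.chat.completions.create call.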