Sunday, January 18, 2026

Python 3.13.0 : Using an image with the Ollama models Moondream and BakLLaVA.

Today I tested Ollama with one large image (1366 x 768 px) on my laptop, which has only 16 GB of RAM and an Intel video card.
This script demonstrates how to send the same image to two different multimodal AI models (Moondream and BakLLaVA) using the Ollama API. It loads an image, encodes it in Base64 (the Ollama API expects images as Base64 strings inside the JSON request), sends it to both models, and prints each model's output and response time so you can compare how they interpret the same visual input.
import requests
import base64
import time

IMAGE_PATH = "test.png"
OLLAMA_URL = "http://localhost:11434/api/generate"

# Load and encode image
with open(IMAGE_PATH, "rb") as f:
    img_bytes = f.read()

img_b64 = base64.b64encode(img_bytes).decode("utf-8")

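def query_model(model_name, prompt):
    """Send the prompt and the encoded image to one model; return (text, seconds)."""
    payload = {
        "model": model_name,
        "prompt": prompt,
        "images": [img_b64],  # Base64-encoded image data
        "stream": False       # ask for one complete JSON response
    }

    # perf_counter is a monotonic clock, better suited to measuring durations
    start = time.perf_counter()
    r = requests.post(OLLAMA_URL, json=payload)
    elapsed = time.perf_counter() - start

    r.raise_for_status()
    response_text = r.json().get("response", "").strip()

    return response_text, elapsed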

# Run Moondream
moondream_output, moondream_time = query_model("moondream", "Describe the image in detail")

# Run BakLLaVA
bakllava_output, bakllava_time = query_model("bakllava", "Describe the image in detail")

# Print results
print("=== Moondream Output ===")
print(moondream_output)
print(f"\n[Moondream time: {moondream_time:.2f} seconds]")

print("\n=== BakLLaVA Output ===")
print(bakllava_output)
print(f"\n[BakLLaVA time: {bakllava_time:.2f} seconds]")
I ran the script and this is the output:
python test_time.py
=== Moondream Output ===
The image shows a video game screen with four players engaged in an intense battle. The player on the left is holding 
a sword, while another player stands nearby wielding a shield. A third player is seen running towards them, and the 
fourth player is positioned further back, also armed with a sword. The background of the scene features a dark and 
mysterious setting, possibly a cave or underground area.

[Moondream time: 506.34 seconds]

=== BakLLaVA Output ===
The scene displays a group of monsters standing around a central point, creating an atmosphere reminiscent of a 
Dungeons & Dragons game. A total of nine monsters can be seen throughout the image, with some located closer to 
the center and others nearer corners.

In addition to the monsters, there are several items scattered around, such as a bottle situated in the lower-left
part of the scene. Interestingly, this image also contains text box information and appears to have a web address 
displayed, possibly related to the game or providing additional context for the scene.

[BakLLaVA time: 597.49 seconds]
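
To compare more than two models, the same request logic could be wrapped in a loop. The following is only a minimal sketch under a few assumptions: the helper names build_payload, post_json and compare_models are hypothetical and not part of the script above, and it uses the standard library's urllib instead of requests so that it runs without extra packages. The send parameter is there so the function can be tried with a fake sender, without a running Ollama server.

```python
import base64
import json
import time
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model_name, prompt, image_bytes):
    """Build the JSON body for one request (hypothetical helper)."""
    return {
        "model": model_name,
        "prompt": prompt,
        # The image travels as a Base64 string inside the JSON body
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def post_json(url, payload):
    """POST a JSON payload and return the decoded JSON reply.

    urlopen raises an HTTPError for 4xx/5xx status codes.
    """
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def compare_models(model_names, prompt, image_bytes, send=post_json):
    """Query each model in turn; return {model_name: (text, seconds)}."""
    results = {}
    for name in model_names:
        start = time.perf_counter()
        reply = send(OLLAMA_URL, build_payload(name, prompt, image_bytes))
        elapsed = time.perf_counter() - start
        results[name] = (reply.get("response", "").strip(), elapsed)
    return results
```

With a local Ollama server running, a call such as compare_models(["moondream", "bakllava"], "Describe the image in detail", img_bytes) should return one (text, seconds) pair per model, in the order the models were given.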
This is the image, a screenshot from my favorite game, Star Trek Online: