analitics

Friday, June 12, 2026

Python Qt: Simple script to use wandb and weave.

WandB and Weave work together as complementary tools that enhance the process of evaluating, monitoring, and understanding machine-learning and large-language-model behavior, each focusing on a different layer of the workflow while sharing the same ecosystem.
WandB functions primarily as an experiment-tracking platform that records metrics, logs model outputs, stores configuration details, and organizes results into interactive dashboards, making it easy to compare multiple runs, visualize performance trends, and maintain a structured history of experiments across time. It acts like a scientific notebook that automatically captures everything relevant during evaluation, from scores and prompts to timing information, enabling reproducibility and long-term analysis.
Weave complements this by focusing on the granular tracing of LLM calls, capturing each prompt, response, intermediate step, and metadata associated with model execution, which allows developers to inspect how a model arrived at a particular answer, debug unexpected behavior, and analyze qualitative aspects of model reasoning. While WandB summarizes experiments at a high level, Weave dives deep into the internals of each interaction, providing structured logs that can be searched, filtered, and compared.
Together, they create a unified workflow where WandB offers experiment-level insights and Weave provides call-level transparency, giving developers a complete picture of model performance, reliability, and behavior across different prompts, models, or configurations, especially useful when benchmarking or refining LLMs.
Let's install these:
python -m pip install wandb weave
Collecting wandb
...
Successfully installed abnf-2.2.0 backoff-2.2.1 chardet-7.4.3 cint-1.0.0 diskcache-weave-5.6.3.post1 fickling-0.1.11
googleapis-common-protos-1.75.0 gql-4.0.0 graphql-core-3.2.11 intervaltree-3.2.1 kaitaistruct-0.11 
opentelemetry-api-1.42.1 opentelemetry-exporter-otlp-proto-common-1.42.1 opentelemetry-exporter-otlp-proto-http-1.42.1
opentelemetry-proto-1.42.1 opentelemetry-sdk-1.42.1 opentelemetry-semantic-conventions-0.63b1 pdfminer.six-20260107
polyfile-weave-0.5.9 protobuf-6.33.6 sentry-sdk-2.62.0 sortedcontainers-2.4.0 wandb-0.27.2 weave-0.52.42
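Before the full example, here is a minimal sketch of how the two packages are typically combined: Weave traces each model call, while WandB records per-prompt scores and a run summary. The helper names (`summarize_scores`, `track_run`), the project name, and the `answer_fn` / `score_fn` callables are hypothetical placeholders for illustration, not part of the original script. It assumes you are already logged in to W&B (for example with `wandb login`).

```python
from typing import Callable, Dict, List


def summarize_scores(scores: List[float]) -> Dict[str, float]:
    """Collapse per-prompt scores into run-level numbers for a dashboard."""
    if not scores:
        return {"mean_score": 0.0, "min_score": 0.0, "max_score": 0.0}
    return {
        "mean_score": sum(scores) / len(scores),
        "min_score": min(scores),
        "max_score": max(scores),
    }


def track_run(
    project: str,
    prompts: List[str],
    answer_fn: Callable[[str], str],        # hypothetical: returns the model answer
    score_fn: Callable[[str, str], float],  # hypothetical: returns a quality score
) -> Dict[str, float]:
    # Imported lazily so summarize_scores() works without the packages installed.
    import wandb
    import weave

    weave.init(project)                  # start call-level tracing for this project
    traced_answer = weave.op(answer_fn)  # every call is now recorded as a trace

    run = wandb.init(project=project, config={"num_prompts": len(prompts)})
    scores: List[float] = []
    for index, prompt in enumerate(prompts):
        answer = traced_answer(prompt)
        score = score_fn(prompt, answer)
        scores.append(score)
        run.log({"prompt_index": index, "score": score})  # one row per prompt

    summary = summarize_scores(scores)
    for key, value in summary.items():
        run.summary[key] = value         # run-level values shown in the run overview
    run.finish()
    return summary
```

Calling `track_run("llm-eval-demo", prompts, answer_fn, score_fn)` would create one WandB run with the scores and one Weave trace per prompt; the project name here is only an example.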
Let's see one example with my custom artificial intelligence model and PyQt6.
The PyQt6 script is a small LLM evaluation application that takes two inputs: the Ollama model you select and a fixed set of short test prompts. When you start the evaluation, the script sends each prompt to the chosen model, collects the generated responses, then sends those responses to a smaller judge model to obtain a numerical quality score. All generation uses reduced context and limited output length to keep execution fast on an i3 CPU. As it runs, the script displays each answer in the text panel and updates a progress bar. When all prompts are processed, it compiles the collected scores and displays them as a bar chart in the canvas, giving you a quick visual summary of the model’s performance.
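The generate-then-judge loop described above can be sketched without the GUI as follows. This is only an illustration under stated assumptions, not the original script: it assumes a local Ollama server on its default port, a 1 to 10 score scale, and a judge prompt of my own wording. The values for `num_ctx` and `num_predict` are example settings for the reduced context and limited output length, and all function names are hypothetical.

```python
import json
import re
import urllib.request
from typing import Callable, List, Optional, Tuple

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint


def build_payload(model: str, prompt: str, num_ctx: int = 1024, num_predict: int = 64) -> dict:
    """Request body with a small context window and a short output (assumed values)."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for a single JSON object instead of a stream
        "options": {"num_ctx": num_ctx, "num_predict": num_predict},
    }


def ollama_generate(model: str, prompt: str) -> str:
    """Send one prompt to Ollama and return the generated text."""
    data = json.dumps(build_payload(model, prompt)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.loads(response.read().decode("utf-8"))["response"]


def parse_score(text: str, low: float = 1.0, high: float = 10.0) -> Optional[float]:
    """Take the first number in the judge reply and clamp it to [low, high]."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        return None  # the judge did not return a usable number
    return max(low, min(high, float(match.group())))


def evaluate(
    model: str,
    judge: str,
    prompts: List[str],
    generate: Callable[[str, str], str] = ollama_generate,
) -> List[Tuple[str, str, Optional[float]]]:
    """Generate an answer for each prompt, then ask the judge model to score it."""
    results = []
    for prompt in prompts:
        answer = generate(model, prompt)
        verdict = generate(
            judge,
            "Rate this answer from 1 to 10. Reply with only the number.\n"
            f"Question: {prompt}\nAnswer: {answer}",
        )
        results.append((prompt, answer, parse_score(verdict)))
    return results
```

Passing `generate` as a parameter makes it possible to try the loop with a fake function before a model is available. In a PyQt6 application this loop would normally run in a worker thread so that the progress bar and text panel stay responsive.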
The online tool shows this result for this script: