Today, this script tool is a small desktop tool that lets you visually synchronize spoken audio with written dialogue. You load an audio file and the program generates a waveform so you can click or drag to select exact time ranges. At the same time, you select the matching text, and the tool creates timestamped dialogue segments.
Each segment includes a dialog ID, start time, end time, and the associated text. Segments appear in a list, can be played individually, deleted, or tested in a separate window where each dialog ID becomes a playback button. This makes it easy to verify timing and structure.
The entire project—audio path, full text, and all segments—can be saved or loaded as a JSON file. Older JSON formats containing only segments are automatically converted. The final JSON is ready for use in game engines like Godot or Unity for precise voice‑over playback.
I used artificial intelligence to fix some issues, this is the result:
