Audio To Image
Transform spoken descriptions into images with this workflow. Record or upload audio, which is transcribed by Whisper and then visualized by Stable Diffusion. Perfect for quickly generating images from verbal ideas without typing.

The workflow
Nodes in this workflow
3 nodes · 3 types
- Audio Input: nodetool.input.AudioInput
- Automatic Speech Recognition: nodetool.text.AutomaticSpeechRecognition
- Text To Image: nodetool.image.TextToImage
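To make the data flow concrete, the three nodes can be pictured as a small directed graph: audio goes into speech recognition, and the transcript becomes the image prompt. The sketch below is plain Python and is not NodeTool's API. The `Node` and `Edge` classes, the node ids, and the `execution_order` helper are all hypothetical names chosen for illustration, and the wiring is assumed from the description above. Only the three type strings come from the node list.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class Node:
    id: str    # hypothetical id, made up for this sketch
    type: str  # node type string as listed above


@dataclass(frozen=True)
class Edge:
    source: str  # id of the node producing a value
    target: str  # id of the node consuming it


# The three node types from the list above.
NODES = [
    Node("audio", "nodetool.input.AudioInput"),
    Node("transcribe", "nodetool.text.AutomaticSpeechRecognition"),
    Node("image", "nodetool.image.TextToImage"),
]

# Assumed wiring: audio -> transcript -> image prompt.
EDGES = [
    Edge("audio", "transcribe"),
    Edge("transcribe", "image"),
]


def execution_order(nodes, edges):
    """Return node ids ordered so each node comes after its inputs."""
    sorter = TopologicalSorter({node.id: set() for node in nodes})
    for edge in edges:
        # The target depends on the source, so the source must run first.
        sorter.add(edge.target, edge.source)
    return list(sorter.static_order())


if __name__ == "__main__":
    print(execution_order(NODES, EDGES))
```

Because the graph is a simple chain, there is only one valid order. Rewiring the template, for example by adding a prompt-editing step between transcription and image generation, amounts to adding a node and changing the edges.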
How to run it
- 01 Download NodeTool Studio
  Install the free desktop app for macOS, Windows, or Linux. It runs on your own machine, and no account is required to start.
- 02 Open the Audio To Image template
  Browse the built-in template library inside Studio and open this workflow onto the canvas. Every node is already wired up.
- 03 Add your keys
  Connect the providers this workflow uses (Audio Input, Text To Image). Bring your own keys; you pay the provider directly.
- 04 Run and remix
  Hit Run to execute the graph and watch results stream in. Swap models, edit prompts, or rewire nodes to make it yours.
Run Audio To Image on your machine
Free, open source, and yours to run. Download Studio, open the template, and run it with your own keys.






