I have a 7-minute linear algebra final coming up. Instead of studying for it, here's what I did today.
The Goal
I wanted to transcribe audio and video files locally on my Mac. No cloud APIs, no subscriptions, just something that runs on my hardware.
What Didn't Work
I started with insanely-fast-whisper. After some troubleshooting, it became clear that my Mac (Intel i9 with an AMD GPU) wasn't a good fit. The AMD GPU's VRAM constraints and the lack of Apple Silicon's Unified Memory meant it wasn't going to be fast at all, let alone insanely fast.
Next I tried whisper-ctranslate2, a faster-whisper implementation. pipx install whisper-ctranslate2 failed building the av package dependency. The culprit was a missing pkg-config. I installed it via Homebrew, but then the Python build still failed with Cython compilation errors clashing with the brand-new FFmpeg 8.1 I had just installed.
At that point I gave up on Python wrappers entirely.
What Worked: whisper.cpp
I installed whisper.cpp directly via Homebrew. It's written in C/C++ and optimized for Intel CPUs. No Python, no GPU nonsense.
I wrote a bash script called transcribe that wraps ffmpeg (to convert any audio or video into the 16kHz WAV format that whisper.cpp requires) and then runs the transcription. A few things went wrong along the way:
- FFmpeg filename issue. macOS's
mktempgenerated a random filename likewhisper.XXXXXX.wav.iZqpgm7lZ1that didn't end cleanly in.wav, so FFmpeg didn't know what format to output. I fixed it by usingmktemp -dto create a temp directory instead, then placing anoutput.wavinside it. - Wrong binary name. I called
whisper-cppin my script, but Homebrew actually installs the binary aswhisper-cli. - Cleanup. I added a
trap 'rm -rf "$TMP_DIR"' EXITto the script so the temp WAV files get cleaned up automatically, even if the script crashes midway.
First test: an MKV file, about 7.5 minutes of audio. It processed in roughly 2.5 minutes using the base.en model, entirely on CPU. The test file was mostly silence, though, so the model just hallucinated the word "You" over and over. Not the most convincing demo.
Multilingual Support and Cleanup Check
After the initial test, I verified a few things:
- Temp files. Checked my system's temp directories. The
trapworked. No straywhisper.*directories or WAV files left behind. - Output. By default,
whisper-cliprints the transcription with timestamps to the terminal. If you need it saved, just redirect:transcribe file.mkv base.en > output.txt. - Passthrough flags. I updated the script to accept additional arguments via
"${@:3}"so I could pass nativewhisper.cppflags straight through. - Hebrew test. I downloaded the multilingual
basemodel (different from the English-onlybase.en) and ran it on a Hebrew song ("Chabad Eshet Chayil") using-l heto force the language. It output the Hebrew transcription to my terminal.
Unrelated: Codex Statusline
While I was at it, I also configured Codex's /statusline command. I recommend enabling it. Mine shows something like:
gpt-5.4 xhigh · 100% left · /private/tmp · 5h 82% · weekly 75% · [550e8400-e29b-41d4-a716-446655440000]
You get the model, quality tier, remaining quota (daily and weekly), working directory, session uptime, and session ID at a glance. Useful for keeping track of what each Codex instance is doing, especially when you're running more than one.
Back to Linear Algebra
Or not. We'll see.
Enjoyed this post?
Get notified when I publish something new. No spam, unsubscribe anytime.