I Skipped Studying for My Linear Algebra Final to Set Up Whisper Transcription

Posted by Michael S. on March 29, 2026

I have a 7-minute linear algebra final coming up. Instead of studying for it, here's what I did today.


The Goal

I wanted to transcribe audio and video files locally on my Mac. No cloud APIs, no subscriptions, just something that runs on my hardware.


What Didn't Work

I started with insanely-fast-whisper. After some troubleshooting, it became clear that my Mac (Intel i9 with an AMD GPU) wasn't a good fit. The AMD GPU's VRAM constraints and the lack of Apple Silicon's Unified Memory meant it wasn't going to be fast at all, let alone insanely fast.

Next I tried whisper-ctranslate2, a faster-whisper implementation. pipx install whisper-ctranslate2 failed building the av package dependency. The culprit was a missing pkg-config. I installed it via Homebrew, but then the Python build still failed with Cython compilation errors clashing with the brand-new FFmpeg 8.1 I had just installed.

At that point I gave up on Python wrappers entirely.


What Worked: whisper.cpp

I installed whisper.cpp directly via Homebrew. It's written in C/C++ and optimized for Intel CPUs. No Python, no GPU nonsense.

I wrote a bash script called transcribe that wraps ffmpeg (to convert any audio or video into the 16kHz WAV format that whisper.cpp requires) and then runs the transcription. A few things went wrong along the way:

  1. FFmpeg filename issue. macOS's mktemp generated a random filename like whisper.XXXXXX.wav.iZqpgm7lZ1 that didn't end cleanly in .wav, so FFmpeg didn't know what format to output. I fixed it by using mktemp -d to create a temp directory instead, then placing an output.wav inside it.
  2. Wrong binary name. I called whisper-cpp in my script, but Homebrew actually installs the binary as whisper-cli.
  3. Cleanup. I added a trap 'rm -rf "$TMP_DIR"' EXIT to the script so the temp WAV files get cleaned up automatically, even if the script crashes midway.

First test: an MKV file, about 7.5 minutes of audio. It processed in roughly 2.5 minutes using the base.en model, entirely on CPU. The test file was mostly silence, though, so the model just hallucinated the word "You" over and over. Not the most convincing demo.


Multilingual Support and Cleanup Check

After the initial test, I verified a few things:

  • Temp files. Checked my system's temp directories. The trap worked. No stray whisper.* directories or WAV files left behind.
  • Output. By default, whisper-cli prints the transcription with timestamps to the terminal. If you need it saved, just redirect: transcribe file.mkv base.en > output.txt.
  • Passthrough flags. I updated the script to accept additional arguments via "${@:3}" so I could pass native whisper.cpp flags straight through.
  • Hebrew test. I downloaded the multilingual base model (different from the English-only base.en) and ran it on a Hebrew song ("Chabad Eshet Chayil") using -l he to force the language. It output the Hebrew transcription to my terminal.

Unrelated: Codex Statusline

While I was at it, I also configured Codex's /statusline command. I recommend enabling it. Mine shows something like:

gpt-5.4 xhigh · 100% left · /private/tmp · 5h 82% · weekly 75% · [550e8400-e29b-41d4-a716-446655440000]

You get the model, quality tier, remaining quota (daily and weekly), working directory, session uptime, and session ID at a glance. Useful for keeping track of what each Codex instance is doing, especially when you're running more than one.


Back to Linear Algebra

Or not. We'll see.

Enjoyed this post?

Get notified when I publish something new. No spam, unsubscribe anytime.