Voice Memos to Markdown

May 17, 2026

Categories: AI Dev Tools Optimizations

Tags: claude-code dev workflow workflow-automation

Recently I started recording a lot of voice memos on my iPhone and on my Mac. The nice part is that you can get a transcription of these memos and feed the text to an AI to summarize it, extract ideas, and so on.

With the recent AI advancements, I started to build a personal wiki, Karpathy style. I would just type notes in different contexts and dump them all in one place for an agent to pick up, organize, correlate, and mine for ideas: all the bells and whistles.

With voice memos the process was a lot more annoying than with typed notes. I would end up with lots of .m4a files named things like New Recording 18, and no idea what I had said unless I listened to them again.

So I built a small Mac app to fix that for me. It's called voice2md.

How it works

You pick two folders in Settings: an input folder and an output folder. After that you forget about the app. Drop a file into the input folder — by AirDrop, by iCloud sync, by drag and drop, whatever — and a few seconds later a .md file shows up in the output folder. The filename looks like 2026-05-17_0930_meeting-with-tom.md, which sorts nicely by date and is easy to grep.
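To make the naming scheme concrete, here is a rough Python sketch (the app itself is Swift, so this is an illustration, not its code). The helper names `slugify` and `output_filename` are made up for this example, and the "untitled" fallback is an assumption; how the app actually derives the title is not covered here.

```python
import re
from datetime import datetime


def slugify(title: str) -> str:
    """Lowercase the title and join runs of letters/digits with hyphens."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    # Assumption: fall back to a fixed name when nothing usable is left.
    return "-".join(words) or "untitled"


def output_filename(recorded_at: datetime, title: str) -> str:
    """Build a sortable name such as 2026-05-17_0930_meeting-with-tom.md."""
    stamp = recorded_at.strftime("%Y-%m-%d_%H%M")
    return f"{stamp}_{slugify(title)}.md"


print(output_filename(datetime(2026, 5, 17, 9, 30), "Meeting with Tom"))
# prints 2026-05-17_0930_meeting-with-tom.md
```

Putting the date first is what makes a plain alphabetical sort also a chronological one.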

For the transcription itself the app tries two things, in order:

If the file is an Apple Voice Memo recorded on a recent iPhone, the transcript is already inside the file. Apple writes it into a small piece of metadata in the .m4a. The app just reads it out. No model, no waiting.
Otherwise it falls back to whisper.cpp running locally on your machine. You pick which model size you want in Settings — small and fast, or big and accurate. Nothing leaves your laptop.
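The "try two things, in order" logic is a plain fallback chain. Below is a minimal Python sketch of that idea, not the app's Swift code. The two step functions are hypothetical stand-ins: one pretends the file has no embedded transcript, the other pretends to be the local whisper.cpp run. Reading the real metadata or calling the real model is left out on purpose.

```python
from pathlib import Path
from typing import Callable, Optional

# Each step takes an audio path and returns a transcript, or None if it cannot help.
Transcriber = Callable[[Path], Optional[str]]


def transcribe(path: Path, steps: list[Transcriber]) -> str:
    """Try each step in order and return the first non-empty transcript."""
    for step in steps:
        text = step(path)
        if text and text.strip():
            return text.strip()
    raise RuntimeError(f"no transcript produced for {path.name}")


# Hypothetical stand-ins for the two steps described above.
def read_embedded_transcript(path: Path) -> Optional[str]:
    return None  # pretend this file carries no embedded transcript


def run_local_whisper(path: Path) -> Optional[str]:
    return "So the thing I wanted to talk about was..."


steps = [read_embedded_transcript, run_local_whisper]
print(transcribe(Path("New Recording 12.m4a"), steps))
```

Keeping the cheap step first means the slow model only runs when the free transcript is missing.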

What you get out

By default the output is plain and boring, which is exactly what I want. A bit of frontmatter at the top with the date, duration, source filename and a few other fields, then a heading, then the transcript verbatim. That is it. No summaries, no bullet points, no robot voice telling you what you just said.

---
title: meeting with tom
date: 2026-05-17
duration: 00:04:12
source: New Recording 12.m4a
---

# meeting with tom

So the thing I wanted to talk about was...
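Rendering a note in that shape takes very little code. Here is a hedged Python sketch covering only the four fields shown in the sample. The function names are invented for the example, and it assumes the title contains nothing that YAML would need quoted (a colon, for instance).

```python
from datetime import date


def format_duration(seconds: int) -> str:
    """Format a length in seconds as HH:MM:SS."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render_note(title: str, day: date, seconds: int, source: str, transcript: str) -> str:
    """Render frontmatter, a heading, and the verbatim transcript."""
    lines = [
        "---",
        f"title: {title}",  # assumption: title needs no YAML quoting
        f"date: {day.isoformat()}",
        f"duration: {format_duration(seconds)}",
        f"source: {source}",
        "---",
        "",
        f"# {title}",
        "",
        transcript.strip(),
        "",
    ]
    return "\n".join(lines)


print(render_note("meeting with tom", date(2026, 5, 17), 252,
                  "New Recording 12.m4a", "So the thing I wanted to talk about was..."))
```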

If you do want a structured version with summary, key points, action items, topics, there is an optional toggle in Settings that sends the transcript to a model of your choice.

You can point it at Claude, at an Azure OpenAI deployment, or at a local Ollama instance if you would rather not send anything to the cloud. It is off by default.

The boring plain Markdown is the default on purpose.

Where the personal wiki agent comes in

This is the part I actually care about. The output folder is not just a dumping ground — it is the inbox for my personal wiki agent. The agent watches the same folder, picks up new Markdown files, decides where they belong in my notes (work, ideas, errands, a particular project), links them to related pages, and files them away.

So the full loop is: I talk into my phone → the recording syncs to my Mac → voice2md drops a Markdown file in the output folder → the wiki agent picks it up and files it. I never touch a file manager. I never rename anything. I just talk and a few minutes later the thought is in my notes, in the right place, searchable.

Because the output is plain Markdown with predictable frontmatter, you can plug whatever agent or script you like on the other end. Obsidian vault, a static site, a script that pipes everything into a search index — it does not matter to the app. It writes files. You decide what reads them.
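As an example of "whatever reads them", here is a small Python sketch of how a script on the other end might split a note into its frontmatter and body. It is a deliberately naive parser written for this post's sample format: it handles flat `key: value` lines only, not full YAML, and the function name is made up.

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a note into (frontmatter fields, body).

    Handles only flat `key: value` lines, not full YAML.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    fields: dict[str, str] = {}
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            body = "\n".join(lines[index + 1:]).lstrip("\n")
            return fields, body
        # Split on the first colon only, so values like 00:04:12 stay whole.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}, text  # no closing delimiter: treat the whole file as body


sample = "---\ntitle: meeting with tom\nduration: 00:04:12\n---\n\n# meeting with tom\n\nHello.\n"
fields, body = split_frontmatter(sample)
print(fields["title"], fields["duration"])
```

An agent can then route on `fields` and index `body` without caring how the note was produced.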

The small things that make it nice to use

It is a menu-bar app, no Dock icon, no window in the way. The icon is a little waveform.
There is a Pause toggle for when you are copying a pile of files in and do not want them all processed at once. I use it more than I expected.
The "Recent" menu shows the last few files it produced; one click reveals them in Finder.
"Process all in input folder" is for when you want to backfill. Dump a year of old recordings in and walk away.
"Re-process missing files" looks at the input folder, looks at the output folder, and transcribes anything that does not have a matching .md yet. Useful after you delete a note by accident.
It remembers what it has already processed by hashing the file contents, so dropping the same recording in twice does not give you two transcripts.
The waveform turns into a spinner while it is working. There is a desktop notification when something fails.
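The duplicate check from the list above (hashing the file contents) could look roughly like this in Python. SHA-256 and the in-memory `seen` set are assumptions made for the sketch. The real app would need to persist what it has seen between launches, and the post does not go into which hash it uses.

```python
import hashlib
from pathlib import Path


def content_hash(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_new(path: Path, seen: set[str]) -> bool:
    """Record the file's hash and report whether it was unseen before."""
    key = content_hash(path)
    if key in seen:
        return False
    seen.add(key)
    return True
```

Because the key is the content and not the filename, a renamed copy of the same recording is still recognized as a duplicate.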

What it deliberately does not do

It does not try to be clever. No speaker diarization, no multi-language support (English only for now), no upload to a wiki, no account, no telemetry.

The app does one job — audio file in, Markdown out — and leaves the rest to other tools. The wiki agent is one of those other tools, and that is on purpose.

Should you build one

Probably, yes. The whole thing is maybe two thousand lines of Swift, most of which is settings UI and tests. The actual pipeline — watch a folder, hash the file, transcribe it, render the Markdown — is a couple of hundred lines. The hard parts (the transcription, the file watching) are libraries other people wrote. If you have a folder full of recordings you have not listened to, a weekend is enough to wire something like this up for yourself.
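To give a feel for how small that core loop is, here is a hedged Python sketch of a single pass over the input folder. Everything interesting is passed in as a function, so the names `is_new`, `transcribe` and `render` are placeholders for your own pieces. The suffix list is an assumption, and the output name is simplified to the input's stem, not the dated filename described earlier.

```python
from pathlib import Path
from typing import Callable

AUDIO_SUFFIXES = {".m4a", ".mp3", ".wav"}  # assumed list of accepted formats


def run_once(
    input_dir: Path,
    output_dir: Path,
    is_new: Callable[[Path], bool],
    transcribe: Callable[[Path], str],
    render: Callable[[Path, str], str],
) -> list[Path]:
    """Process every unseen audio file once and return the notes written."""
    written = []
    for audio in sorted(input_dir.iterdir()):
        # Skip non-audio files first, so is_new only sees real candidates.
        if audio.suffix.lower() not in AUDIO_SUFFIXES or not is_new(audio):
            continue
        note = output_dir / f"{audio.stem}.md"  # simplified naming
        note.write_text(render(audio, transcribe(audio)), encoding="utf-8")
        written.append(note)
    return written
```

Calling `run_once` from a timer is the crudest possible "watcher"; a file-system notification library can trigger the same function with no other changes.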

The thing I keep coming back to is how much friction it removes. Before, a voice memo was a task: I have to listen to this and write it down before I forget what I meant. Now it is just a thought I said out loud once, and it shows up in my notes on its own. That turns out to be a much nicer way to think.

Build this yourself: https://github.com/adrianprecub/blog/tree/main/voice2md