About · tomevox.com

Why We Built TomeVox

Dany
Founder, TomeVox

The story behind TomeVox starts with a problem I ran into while trying to publish an audiobook edition of a book I'd been involved with. The book was finished. It was already selling as an ebook and in print. Adding an audiobook edition seemed like the obvious next step — audiobooks are the fastest-growing format in publishing, and Audible has a massive built-in audience. It should have been straightforward.

It was not straightforward at all.

The Cost of Traditional Audiobook Production

The first thing I did was get quotes from professional narrators on ACX. I posted the title, received auditions, listened to a dozen voices, and started having conversations about pricing. What I found was that producing a single audiobook title through traditional narration costs between $3,000 and $5,000 — and that's for a standard-length book with a mid-range narrator. A literary fiction title, or anything requiring character voices or specific accent work, could go higher.

$3–5k: Typical cost to produce a single audiobook title through professional narration services in 2024–2025, including recording, editing, mastering, and QC.

10–14 weeks: Realistic timeline from narrator contract to live Audible listing, including recording, revisions, post-production, and ACX review queue.

These numbers might make sense for a bestselling author with a known track record and a publisher backing the investment. But for an independent author, a small press, or anyone with a backlist of titles to produce, the math simply doesn't work. Spend $4,000 producing an audiobook and you need to sell several hundred copies just to break even — before you've earned anything from the book itself. Most independent audiobooks don't sell that many copies. The economics are broken.
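To make the break-even math concrete, here is a rough sketch. The $4,000 production cost comes from the paragraph above; the per-copy earnings are hypothetical placeholders chosen for illustration, not real royalty rates, and the function name is made up for this example.

```python
import math


def break_even_copies(production_cost: float, net_per_copy: float) -> int:
    """Return how many copies must sell before the production cost is recovered."""
    if net_per_copy <= 0:
        raise ValueError("net_per_copy must be positive")
    return math.ceil(production_cost / net_per_copy)


# Hypothetical net earnings per copy sold (assumptions, not published rates).
for net in (5.00, 8.00, 10.00):
    copies = break_even_copies(4000, net)
    print(f"${net:.2f} per copy: {copies} copies to break even")
```

Under any of these assumed figures the answer lands in the hundreds of copies, which is the range the paragraph above describes.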

The royalty share model on ACX — where the narrator works for a percentage of future earnings rather than upfront pay — helps with the cash flow problem, but introduces its own issues. Experienced narrators who can attract upfront work have little incentive to take royalty share deals on unknown titles. You end up with a choice between paying a lot of money you may not recoup, or working with less experienced narrators on a bet that the book will sell.

The Attempt to Use Existing TTS Tools

After the sticker shock of traditional narration quotes, I spent several months testing every text-to-speech and AI voice tool I could find. I went in genuinely hoping to find something that worked. I just needed a tool that could convert a book into a listenable audiobook without costing thousands of dollars or taking three months.

What I found was a landscape of tools built for entirely different use cases.

Tools Evaluated: Long-Form Audiobook Production

ElevenLabs (verdict: Short-form only)
Excellent voice quality for short clips. Character limit per generation requires splitting a full manuscript into 50–80 individual requests. No chapter detection, no M4B output, no ACX compliance processing. DIY post-production required.

Google TTS / Cloud (verdict: Not usable)
API-only, requires technical development to build a production pipeline. Output quality was robotic and unsuitable for fiction. No audiobook-specific features whatsoever.

Amazon Polly (verdict: Not usable)
Developer API, no end-user product. Good for utility voice applications. Voice expressiveness is limited for narrative fiction. No book ingestion or chapter handling.

Speechify (verdict: Wrong use case)
Designed as a reading-assistance tool, not audiobook production. Outputs for personal listening only; no commercial distribution rights, no M4B, no ACX output.

PlayHT (verdict: Synthesis only)
Better voice quality than earlier TTS tools. Still character-limited per generation. No EPUB parsing, no chapter structure, no M4B output. Usable as a synthesis layer but requires building a full pipeline around it.

Various others (verdict: Wrong use case)
Every tool I evaluated was either a developer API without an author-facing product, a reading assistant without production output, or a short-form synthesis tool that maxed out at a few thousand characters per generation.
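The per-generation character limits mentioned above are what force a manuscript to be split into dozens of requests. A minimal sketch of that splitting step might look like the following. The function name and the 5,000-character limit are assumptions for illustration, not the documented limit of any of the tools listed.

```python
def split_into_chunks(text: str, max_chars: int) -> list[str]:
    """Greedily pack paragraphs into chunks of at most max_chars characters.

    Paragraphs are separated by blank lines. A single paragraph longer than
    max_chars is hard-split as a fallback.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        # Fallback: cut an oversized paragraph into max_chars-sized pieces.
        while len(para) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


# Hypothetical manuscript and limit, for illustration only.
manuscript = "\n\n".join(f"Paragraph {i}. " + "word " * 50 for i in range(200))
chunks = split_into_chunks(manuscript, max_chars=5000)
print(f"{len(chunks)} requests, longest is {max(len(c) for c in chunks)} characters")
```

Splitting is the easy part. Each chunk still comes back as a separate audio file that has to be stitched together, grouped into chapters, and mastered, which is the work none of these tools took on.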

The pattern was clear: there were plenty of tools that could synthesize speech from text. There were zero tools that could take an EPUB file and produce a properly structured, ACX-compliant M4B audiobook without significant technical work to build a pipeline around them.

Every existing tool solved the synthesis problem but ignored the production problem. Converting speech synthesis into a finished, distributable audiobook requires chapter detection, proper file structure, loudness mastering, ACX compliance, and M4B packaging — none of which any tool was handling end to end.

Building What Should Have Existed

I had enough technical background to understand what building this would take. The synthesis layer is the part that looks most impressive in demos, but the real work is everything around it: the document parsing pipeline that can handle EPUB, PDF, DOCX, and TXT formats and extract clean, structured text; the chapter detection logic that reads heading hierarchy and produces one audio file per chapter; the post-production pipeline that normalizes loudness, applies peak limiting, verifies noise floor, and adds the required room tone buffers; the M4B assembly that embeds chapter markers, cover art metadata, and title information into a properly formed container file.
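To give a feel for the chapter detection step, here is a deliberately simplified sketch that splits XHTML (the markup EPUB content is written in) into chapters at every top-level heading. It is not TomeVox's actual logic: it only looks at `<h1>`, ignores the rest of the heading hierarchy, and the class and function names are invented for this example.

```python
from html.parser import HTMLParser


class ChapterSplitter(HTMLParser):
    """Start a new chapter at every <h1> and collect the text that follows it."""

    def __init__(self) -> None:
        super().__init__()
        self.chapters: list[dict[str, str]] = []
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_heading = True
            self.chapters.append({"title": "", "text": ""})

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_heading = False
        elif tag == "p" and self.chapters:
            # Keep paragraph boundaries as line breaks in the chapter text.
            self.chapters[-1]["text"] += "\n"

    def handle_data(self, data):
        if not self.chapters:
            return  # skip front matter that appears before the first heading
        key = "title" if self._in_heading else "text"
        self.chapters[-1][key] += data


def split_chapters(xhtml: str) -> list[dict[str, str]]:
    """Return a list of {"title", "text"} dicts, one per <h1> section."""
    parser = ChapterSplitter()
    parser.feed(xhtml)
    parser.close()
    return [
        {"title": c["title"].strip(), "text": c["text"].strip()}
        for c in parser.chapters
    ]


sample = (
    "<html><body><p>Copyright notice</p>"
    "<h1>Chapter One</h1><p>It began.</p><p>It went on.</p>"
    "<h1>Chapter Two</h1><p>It ended.</p></body></html>"
)
for chapter in split_chapters(sample):
    print(chapter["title"], "|", repr(chapter["text"]))
```

Real books are messier than this: chapters marked with styled paragraphs instead of headings, nested parts and sections, and front matter that should be skipped or read selectively. Those are the edge cases that take the time.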

None of that is trivial to build. But once it's built, it runs in a couple of hours on any book file an author uploads — versus weeks of coordination with a human narrator.

The first version of TomeVox was rough. Chapter detection missed edge cases. The loudness normalization had occasional outliers. The PDF parser struggled with certain formatting patterns. I tested it on dozens of books across different genres — fiction, nonfiction, romance, thriller, children's books, business books, memoirs — and fixed every failure case I found. It took most of 2025 to get the pipeline to the point where the output was something I'd be comfortable putting my own name on.

The Genre Problem Nobody Talks About

One thing that surprised me during development was how differently TTS models perform across genres — and how little attention most tools pay to this.

A voice that sounds natural reading a business book often sounds flat reading a romance novel. The emotional register is different. The pacing of dramatic scenes in a thriller requires different handling than the measured, informative tone of narrative nonfiction. Children's books need a warmth and animation that technical documentation doesn't. A model trained primarily on news articles and business content — which describes most of the underlying models in this space — brings those rhythms and intonations to everything it reads, even when the source material calls for something entirely different.

TomeVox's voices are specifically tuned for long-form narrative content. That means expressiveness without overacting. Emotional range without instability. Consistent pacing that serves the story rather than fighting it. The difference is most apparent in fiction, but it shows up in memoir, creative nonfiction, and anywhere the author's voice — their actual authorial style and rhythm — is part of what makes the reading experience work.

Who TomeVox Is For

TomeVox was built for independent authors who have already written a book and want to publish an audiobook edition without spending $5,000 and waiting fourteen weeks. It was built for small publishers with backlist titles that have never had audio editions because the economics never made sense. It was built for self-publishing authors who treat their writing as a business and understand that audiobooks are a meaningful revenue channel they can't afford to ignore.

It was also built with a specific respect for the reader's experience. An audiobook isn't just a different container for text — it's a different kind of encounter with a story, one that happens during commutes and workouts and quiet evenings with eyes closed. The quality of the voice, the pacing of the delivery, and the integrity of the production matter. TomeVox is not a rough draft or a workaround. It's meant to produce something a listener would choose to pay for and enjoy.

Made for authors, not developers

You upload a file and download an audiobook. No API keys, no audio engineering, no DAW software. The technical complexity is ours to carry.

Flat pricing, no surprises

A single flat fee per book. No subscription, no per-minute charges, no character-limit math. You know exactly what production costs before you commit.

Distribution-ready by default

Output meets ACX technical specifications (44.1 kHz, 192 kbps, normalized to distribution loudness standards) and is ready to upload to Spotify, Apple Books, INaudio, Google Play, and Kobo without any post-processing.

Try before you pay

The free first-chapter preview means you hear your actual book in the actual voice before spending a dollar. No demos. No leap of faith.
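For readers curious what a "distribution-ready" check involves, the sketch below measures RMS loudness and peak level on decoded audio samples and compares them with target values. The 44.1 kHz sample rate comes from the specification quoted above. The RMS window of -23 dB to -18 dB and the -3 dB peak ceiling are assumptions based on ACX's published audio requirements and should be verified against the current version. The function names are hypothetical, and bitrate is a property of the encoded file, so it is not checked here.

```python
import math

# Assumed targets; confirm against the distributor's current requirements.
REQUIRED_SAMPLE_RATE = 44100
RMS_RANGE_DB = (-23.0, -18.0)
MAX_PEAK_DB = -3.0


def to_dbfs(amplitude: float) -> float:
    """Convert a linear amplitude (1.0 = full scale) to decibels full scale."""
    return 20 * math.log10(amplitude) if amplitude > 0 else float("-inf")


def measure(samples: list[float]) -> dict[str, float]:
    """Return RMS and peak levels in dBFS for samples in the range [-1.0, 1.0]."""
    if not samples:
        raise ValueError("samples must not be empty")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    peak = max(abs(s) for s in samples)
    return {"rms_db": to_dbfs(rms), "peak_db": to_dbfs(peak)}


def check_levels(samples: list[float], sample_rate: int) -> dict[str, bool]:
    """Report which of the assumed targets a chapter's audio meets."""
    levels = measure(samples)
    return {
        "sample_rate": sample_rate == REQUIRED_SAMPLE_RATE,
        "rms": RMS_RANGE_DB[0] <= levels["rms_db"] <= RMS_RANGE_DB[1],
        "peak": levels["peak_db"] <= MAX_PEAK_DB,
    }


# One second of a 440 Hz test tone standing in for decoded chapter audio.
tone = [0.1414 * math.sin(2 * math.pi * 440 * n / 44100) for n in range(44100)]
print(measure(tone))
print(check_levels(tone, sample_rate=44100))
```

A production pipeline would also verify the noise floor and the silence at the start and end of each file, which need more than a single pass over the samples.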

What We're Building Toward

TomeVox launched in 2025 with a focused scope: take any book file, produce a professional audiobook, handle all the technical requirements, deliver output that authors can distribute immediately. That scope is complete and working.

The roadmap from here goes in a few directions. Better chapter-level customization — the ability to adjust pacing, emphasis, or voice style for specific scenes within a book. Expanded voice options across more styles and regional accents. Faster processing times as the production infrastructure scales. Direct distribution integrations that submit your audiobook to Spotify, Apple Books, INaudio, and other platforms without requiring you to manage file uploads yourself.

The core problem — the one that started all of this — is solved. Producing an audiobook edition of your book no longer requires $5,000 and fourteen weeks. It requires an afternoon and a flat fee that makes economic sense even for modestly selling titles. That was the thing worth building.

If you've written a book that doesn't have an audio edition yet, I'd like to help you fix that.

— Dany, founder of TomeVox

Try TomeVox on your book

Upload your EPUB, PDF, DOCX, or TXT file and get your first chapter produced as a free audio preview. No credit card required until you decide to go forward.
