About · tomevox.com

Why We Built TomeVox

Daniel Shilansky

Founder, TomeVox

The story behind TomeVox starts with a problem I ran into while trying to publish an audiobook edition of a book I'd been involved with. The book was finished. It was already selling as an ebook and in print. Adding an audiobook edition seemed like the obvious next step: audiobooks are the fastest-growing format in publishing, and Audible has a massive built-in audience. It should have been straightforward.

Traditional audiobook production turned out to be anything but straightforward.

The Cost of Traditional Audiobook Production

The first thing I did was get quotes from professional narrators on ACX. I posted the title, received auditions, listened to a dozen voices, and started having conversations about pricing. What I found was that producing a single audiobook title through traditional narration costs between $3,000 and $5,000 — and that's for a standard-length book with a mid-range narrator. A literary fiction title, or anything requiring character voices or specific accent work, could go higher.

$3–5k: Typical cost to produce a single title through professional narration services in 2024–2025, including recording, editing, mastering, and QC.

10–14 weeks: Realistic timeline from narrator contract to live Audible listing, including recording, revisions, post-production, and ACX review queue.

Traditional audiobook production costs of $3,000–$5,000 per title might make sense for a bestselling author with a known track record and a publisher backing the investment. But for an independent author, a small press, or anyone with a backlist of titles to produce, the math simply doesn't work. Spend $4,000 producing an audiobook and you need to sell several hundred copies just to break even — before you've earned anything from the book itself. Most independent audiobooks don't sell that many copies. The economics are broken.
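The break-even point above is simple division, and a short sketch makes it concrete. The per-copy earnings below are hypothetical placeholders, not figures from ACX or any retailer; substitute your own numbers.

```python
import math


def break_even_copies(production_cost: float, net_per_copy: float) -> int:
    """Return how many copies must sell before earnings cover production cost."""
    if net_per_copy <= 0:
        raise ValueError("net_per_copy must be positive")
    return math.ceil(production_cost / net_per_copy)


# Hypothetical net earnings per copy (assumed for illustration only).
for net in (5.00, 8.00, 12.00):
    copies = break_even_copies(4000, net)
    print(f"${net:.2f} per copy: {copies} copies to recover $4,000")
```

Under these assumed earnings, a $4,000 production needs somewhere between roughly 330 and 800 sales before the audiobook has paid for itself.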

The royalty share model on ACX — where the narrator works for a percentage of future earnings rather than upfront pay — helps with the cash flow problem, but introduces its own issues. Experienced narrators who can attract upfront work have little incentive to take royalty share deals on unknown titles. You end up with a choice between paying a lot of money you may not recoup, or working with less experienced narrators on a bet that the book will sell.

The Attempt to Use Existing TTS Tools

After the sticker shock of traditional narration quotes, I spent several months testing every text-to-speech and AI voice tool I could find. I went in genuinely hoping to find something that worked — I just needed a tool that could convert a book into a listenable audiobook without costing thousands of dollars and taking three months.

The existing TTS and AI voice landscape turned out to be built for entirely different use cases than full-length audiobook production.

Tools Evaluated — Long-Form Audiobook Production

ElevenLabs
Excellent voice quality. Their Studio product accepts EPUB/PDF/DOCX, detects chapters, and provides a full timeline editor with multi-voice support. It's a capable production environment, but it's a production tool, not a production pipeline. You still spend hours in the timeline editor, and the output is MP3/WAV (no M4B, no ACX compliance). For authors who want creative control, it's strong. For authors who just want a finished audiobook, the production overhead is significant.
Verdict: Powerful tool, requires production work

Google Cloud Text-to-Speech
API-only, requires technical development to build a production pipeline. Output quality was robotic and unsuitable for fiction. No audiobook-specific features whatsoever.
Verdict: Not usable

Amazon Polly
Developer API, no end-user product. Good for utility voice applications. Voice expressiveness is limited for narrative fiction. No book ingestion or chapter handling.
Verdict: Not usable

Speechify
Designed as a reading-assistance tool, not audiobook production. Outputs for personal listening only; no commercial distribution rights, no M4B, no ACX output.
Verdict: Wrong use case

PlayHT
Better voice quality than earlier TTS tools. Still character-limited per generation. No EPUB parsing, no chapter structure, no M4B output. Usable as a synthesis layer but requires building a full pipeline around it.
Verdict: Synthesis only

Various others
Every tool I evaluated was either a developer API without an author-facing product, a reading assistant without production output, or a short-form synthesis tool that maxed out at a few thousand characters per generation.
Verdict: Wrong use case

The pattern was clear: there were plenty of tools that could synthesize speech from text. There were zero tools that could take an EPUB file and produce a properly structured, ACX-compliant M4B audiobook without a significant amount of technical work wrapping them.

Every existing tool solved the synthesis problem but ignored the production problem. Converting speech synthesis into a finished, distributable audiobook requires chapter detection, proper file structure, loudness mastering, ACX compliance, and M4B packaging — none of which any tool was handling end to end.

Building What Should Have Existed

I had enough technical background to understand what building this would take. The synthesis layer is the part that looks most impressive in demos, but the real work is everything around it: the document parsing pipeline that can handle EPUB, PDF, DOCX, and TXT formats and extract clean, structured text; the chapter detection logic that reads heading hierarchy and produces one audio file per chapter; the post-production pipeline that normalizes loudness, applies peak limiting, verifies noise floor, and adds the required room tone buffers; the M4B assembly that embeds chapter markers, cover art metadata, and title information into a properly formed container file.
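To give a sense of what "chapter detection that reads heading hierarchy" can look like, here is a minimal sketch using only Python's standard library. It is not TomeVox's actual implementation: the class name, the choice of h1/h2 as chapter boundaries, and the decision to skip text before the first heading are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class ChapterSplitter(HTMLParser):
    """Start a new chapter at every <h1> or <h2> in an XHTML document."""

    HEADINGS = {"h1", "h2"}                      # assumed chapter-level tags
    BLOCKS = {"p", "div", "li", "blockquote"}    # tags that end a text block

    def __init__(self):
        super().__init__()
        self.chapters = []          # each item: {"title": str, "text": str}
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._in_heading = True
            self.chapters.append({"title": "", "text": ""})

    def handle_endtag(self, tag):
        if tag in self.HEADINGS:
            self._in_heading = False
        elif tag in self.BLOCKS and self.chapters:
            # Keep adjacent paragraphs from running together.
            self.chapters[-1]["text"] += " "

    def handle_data(self, data):
        if not self.chapters:
            return  # front matter before the first heading is skipped
        key = "title" if self._in_heading else "text"
        self.chapters[-1][key] += data


def split_chapters(xhtml: str):
    """Return a list of chapters with whitespace-normalized title and text."""
    parser = ChapterSplitter()
    parser.feed(xhtml)
    parser.close()
    return [
        {"title": c["title"].strip(), "text": " ".join(c["text"].split())}
        for c in parser.chapters
    ]


sample = "<h1>One</h1><p>First  para.</p><p>Second.</p><h1>Two</h1><p>Third.</p>"
for chapter in split_chapters(sample):
    print(chapter["title"], "->", chapter["text"])
```

Real manuscripts are messier than this: headings used for decoration, chapters with no heading at all, and PDFs with no markup to read. Those are the edge cases that take the time.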

Building an end-to-end audiobook production pipeline — from document parsing to M4B assembly — is not trivial. But once built, TomeVox runs the entire process in a couple of hours on any book file an author uploads, versus weeks of coordination with a human narrator.
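As one illustration of the M4B assembly step, chapter markers can be described in the plain-text FFMETADATA format that ffmpeg reads. The sketch below only builds that text; whether TomeVox uses ffmpeg at all is an assumption, and the function names and sample durations are hypothetical.

```python
def escape(value: str) -> str:
    """Backslash-escape the characters FFMETADATA treats as special."""
    # The backslash must be handled first so later escapes are not doubled.
    for ch in ("\\", "=", ";", "#", "\n"):
        value = value.replace(ch, "\\" + ch)
    return value


def build_ffmetadata(book_title: str, chapters) -> str:
    """Build FFMETADATA text from (chapter_title, duration_ms) pairs in order."""
    lines = [";FFMETADATA1", f"title={escape(book_title)}"]
    start = 0
    for title, duration_ms in chapters:
        end = start + duration_ms
        lines += [
            "[CHAPTER]",
            "TIMEBASE=1/1000",          # START and END are in milliseconds
            f"START={start}",
            f"END={end}",
            f"title={escape(title)}",
        ]
        start = end                      # next chapter begins where this ends
    return "\n".join(lines) + "\n"


# Hypothetical chapter durations, for illustration only.
print(build_ffmetadata("My Book", [("Chapter 1", 60000), ("Chapter 2: A=B", 30000)]))
```

A file like this can be supplied to ffmpeg as an extra input so that its chapter list is copied into the output container alongside the audio.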

The first version of TomeVox was rough. Chapter detection missed edge cases. The loudness normalization had occasional outliers. The PDF parser struggled with certain formatting patterns. I tested it on dozens of books across different genres — fiction, nonfiction, romance, thriller, children's books, business books, memoirs — and fixed every failure case I found. It took most of early 2026 to get the pipeline to the point where the output was something I'd be comfortable putting my own name on.

The Genre Problem Nobody Talks About

AI text-to-speech models perform dramatically differently across book genres — and most TTS tools pay little attention to genre-specific voice tuning.

A voice that sounds natural reading a business book often sounds flat reading a romance novel. The emotional register is different. The pacing of dramatic scenes in a thriller requires different handling than the measured, informative tone of narrative nonfiction. Children's books need a warmth and animation that technical documentation doesn't. A model trained primarily on news articles and business content — which describes most of the underlying models in this space — brings those rhythms and intonations to everything it reads, even when the source material calls for something entirely different.

TomeVox's voices are specifically tuned for long-form narrative content. That means expressiveness without overacting. Emotional range without instability. Consistent pacing that serves the story rather than fighting it. The difference is most apparent in fiction, but it shows up in memoir, creative nonfiction, and anywhere the author's voice — their actual authorial style and rhythm — is part of what makes the reading experience work.

Who TomeVox Is For

TomeVox was built for independent authors who have already written a book and want to publish an audiobook edition without spending $5,000 and waiting up to fourteen weeks. It was built for small publishers with backlist titles that have never had audio editions because the economics never made sense. It was built for self-publishing authors who treat their writing as a business and understand that audiobooks are a meaningful revenue channel they can't afford to ignore.

TomeVox was also built with a specific respect for the listener's experience. An audiobook isn't just a different container for text — it's a different kind of encounter with a story, one that happens during commutes and workouts and quiet evenings with eyes closed. The quality of the voice, the pacing of the delivery, and the integrity of the production all matter. TomeVox is meant to produce audiobooks a listener would choose to pay for and enjoy.

Made for authors, not developers

TomeVox is a done-for-you audiobook production service. Authors upload a manuscript file and download a finished audiobook — no API keys, no audio engineering, no DAW software required.

Flat pricing, no surprises

TomeVox charges a single flat fee per book based on word count: $49 for short books, $79 for standard, $99 for long books. No subscription, no per-minute charges, no character-limit math.

Distribution-ready by default

TomeVox audiobook output meets ACX technical specifications (44.1 kHz, 192 kbps, normalized to distribution loudness standards) — ready to upload to Spotify, Apple Books, INaudio, Google Play, and Kobo without any post-processing.
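For readers curious what a loudness check involves, here is a minimal sketch that measures RMS and peak level on raw samples. The default thresholds (RMS between -23 dB and -18 dB, peaks at or below -3 dB) are the figures commonly cited from ACX's published submission requirements; they are not stated on this page, so treat them as assumptions and confirm against current ACX documentation. The function names are hypothetical and this is not TomeVox's mastering code.

```python
import math


def to_dbfs(amplitude: float) -> float:
    """Convert a linear amplitude (1.0 = full scale) to decibels."""
    return 20 * math.log10(amplitude) if amplitude > 0 else float("-inf")


def measure(samples):
    """Return (rms_db, peak_db) for float samples in the range -1.0 to 1.0."""
    if not samples:
        raise ValueError("samples must not be empty")
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return to_dbfs(rms), to_dbfs(peak)


def within_targets(samples, rms_range=(-23.0, -18.0), peak_max=-3.0) -> bool:
    """Check measured levels against the assumed RMS window and peak ceiling."""
    rms_db, peak_db = measure(samples)
    return rms_range[0] <= rms_db <= rms_range[1] and peak_db <= peak_max


def sine(amplitude, seconds=1.0, rate=44100, freq=440.0):
    """Generate a test tone; stands in for decoded chapter audio."""
    count = int(seconds * rate)
    return [amplitude * math.sin(2 * math.pi * freq * n / rate) for n in range(count)]


quiet_tone = sine(0.15)   # moderate level
hot_tone = sine(0.9)      # far too loud, peaks near full scale

for name, tone in (("quiet_tone", quiet_tone), ("hot_tone", hot_tone)):
    rms_db, peak_db = measure(tone)
    print(f"{name}: RMS {rms_db:.1f} dB, peak {peak_db:.1f} dB, ok={within_targets(tone)}")
```

A production check would also verify the noise floor and the sample rate and bitrate of the encoded file, which this sketch leaves out.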

Try before you pay

TomeVox offers a free first-chapter preview — authors hear their actual book narrated in the actual voice before spending a dollar. No generic demos. No leap of faith.

What We're Building Toward

TomeVox launched in 2026 with a focused scope: take any book file, produce a professional audiobook, handle all the technical requirements, deliver output that authors can distribute immediately. That scope is complete and working — in 12 languages: Arabic, Chinese, English, French, German, Hindi, Italian, Japanese, Korean, Russian, Spanish, and Swedish.

TomeVox's roadmap includes better chapter-level customization (the ability to adjust pacing, emphasis, or voice style for specific scenes within a book), additional languages and voice options, faster processing times as the production infrastructure scales, and direct distribution integrations that submit audiobooks to Spotify, Apple Books, INaudio, and other platforms without requiring authors to manage file uploads themselves.

TomeVox solves the core problem of audiobook production economics. Producing an audiobook edition no longer requires $5,000 and fourteen weeks. With TomeVox, it requires an afternoon and a flat fee starting at $49 — a price that makes economic sense even for modestly selling titles.

If you've written a book that doesn't have an audio edition yet, TomeVox was built to change that.

— Daniel Shilansky, founder of TomeVox

Try TomeVox on your book

Upload your EPUB, PDF, DOCX, or TXT file to TomeVox and get your first chapter produced as a free audio preview. No credit card required until you decide to produce the full book.
