SapientX

Blazing Speed

founder @ SapientX

Published on Dec 10, 2020

Most graphs are... well... kind of boring. I would like to share one with you that is, to me, quite exciting. One of the holy grails of our business is to create a human-like synthetic voice. Last year, Google tantalized us with a very realistic voice. What they didn't tell you was that it needs to run on a supercomputer.

We are practical people, making solutions for everyone... not just those who own supercomputers. These days, most synthetic voices (text-to-speech, or TTS) are based on the open source Tacotron project. It's better than what came before it, but it needs a powerful computer to deliver a good voice. On simple systems, it produces an unacceptable 2-4 second delay before responding to people. Since many of our customers use low-powered systems, this is a big issue.

To address this, we began working on speed optimizations this summer. I'm happy to report that this hard work has yielded a roughly 4X speed improvement over Tacotron! This means that we can now deliver great voices even on small mobile devices. This may seem like a small detail, but we sweat the details here in our quest to deliver the best conversational voice experience for our customers' products.