Text-to-speech and speech-to-text machine learning platforms for English and Vietnamese.
The CodeLink platform converts text to speech and speech to text in both English and Vietnamese, with natural-sounding output available in a range of voices and tones.
AI & ML Lead
Quality Assurance Engineer
Google Cloud Platform (GCP)
How can we leverage Machine Learning to create a fluid and natural text-to-speech and speech-to-text model for Vietnamese?
CodeLink tasked its internal product team with creating a model and standards for text-to-speech and speech-to-text in English and Vietnamese.
We worked as an autonomous team to create the text-to-speech and speech-to-text model for Vietnamese.
CodeLink's internal teams spent four months developing the proprietary text-to-speech and speech-to-text AI model.
To find the best approach, we tested several TTS and STT models. Because no high-quality Vietnamese datasets were available on the market, we decided to start with Vietnamese text-to-speech. We built an annotation app with ReactJS, Firebase, and GCP to support the annotation process, and coordinated with local voice talent and text labelers to create the data. To optimize the core back end, we built a Kubernetes cluster for inference services. After training the model, we deployed the system.
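As an illustration of the kind of check an annotation pipeline like this might run before training, here is a minimal Python sketch. It is not CodeLink's implementation: the `AnnotationRecord` fields, the `clean_record` helper, and the duration limits are all hypothetical. The one grounded detail is that Vietnamese diacritics can be encoded in composed or decomposed Unicode forms, so normalizing transcripts to NFC keeps identical words identical at the code-point level.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotationRecord:
    """One labeled utterance. All field names are hypothetical."""
    clip_id: str
    speaker: str
    transcript: str
    duration_s: float


def clean_record(record, min_s=0.5, max_s=20.0):
    """Return a normalized copy of the record, or None if it should be skipped.

    The duration limits are illustrative defaults, not values from the project.
    """
    # NFC composes base letters and combining marks, so the same Vietnamese
    # word always maps to the same sequence of code points.
    text = unicodedata.normalize("NFC", record.transcript)
    # Collapse runs of whitespace and trim the ends.
    text = " ".join(text.split())
    if not text:
        return None  # empty transcript: nothing to train on
    if not (min_s <= record.duration_s <= max_s):
        return None  # clip too short or too long for this sketch's limits
    return AnnotationRecord(record.clip_id, record.speaker, text, record.duration_s)
```

A record whose transcript arrives in decomposed form with stray spaces comes back with a composed, single-spaced transcript, while blank transcripts and out-of-range clips are dropped.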