Text-to-speech and speech-to-text platforms for English and Vietnamese

Overview

The CodeLink platform converts text to speech and speech to text in both English and Vietnamese, producing natural-sounding output in a range of voices and tones.

Team Model

Front-end Developer

Back-end Developer

DevOps

AI & ML Lead

Quality Assurance Engineer

Technology

AI/ML

TensorFlow.js

RabbitMQ

Kubernetes

Python

Firebase Crashlytics

Google Cloud Platform (GCP)

ReactJS

Redis

NestJS

NextJS

Flutter

AI (PyTorch)

TensorFlow

Platform

Web

Challenge

How can we leverage Machine Learning to create a fluid and natural text-to-speech and speech-to-text model for Vietnamese?

Request

CodeLink tasked its internal product team with creating a model and standards for text-to-speech and speech-to-text in English and Vietnamese.

Engagement Model

We worked as an autonomous team to create the text-to-speech and speech-to-text model for Vietnamese.

Engagement Length and Scale

CodeLink's internal teams worked for four months to develop the proprietary text-to-speech and speech-to-text AI model.

Project Outcome

To find the best approach, we tested several text-to-speech (TTS) and speech-to-text (STT) models. Because no quality Vietnamese datasets were available on the market, we decided to start with Vietnamese text-to-speech and create the training data ourselves. We built an annotation app with ReactJS, Firebase, and GCP, and coordinated with local voice talent and text labelers to produce the data. To optimize the core back end, we built a Kubernetes cluster for the inference services. After training the model, we deployed the system.
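
As an illustration of the data-creation step, here is a minimal Python sketch of how annotated recordings might be turned into a training manifest. It is not CodeLink's actual pipeline: the `AnnotationRecord` fields, the `build_manifest` helper, and the pipe-delimited output format are all assumptions made for this example. The one Vietnamese-specific detail is Unicode normalization, since the same accented letter can be stored in composed or decomposed form, and mixing the two can make identical transcripts look different to a model.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotationRecord:
    """One labeled utterance. All field names are hypothetical."""
    audio_path: str   # recording made by a voice talent
    transcript: str   # text entered or verified by a labeler
    speaker_id: str   # identifier of the voice talent


def normalize_transcript(text: str) -> str:
    """Compose diacritics (Unicode NFC) and collapse runs of whitespace."""
    composed = unicodedata.normalize("NFC", text)
    return " ".join(composed.split())


def build_manifest(records):
    """Return one 'audio|speaker|text' line per record with a non-empty transcript."""
    lines = []
    for record in records:
        text = normalize_transcript(record.transcript)
        if not text:
            # Skip clips that have not been transcribed yet.
            continue
        lines.append(f"{record.audio_path}|{record.speaker_id}|{text}")
    return lines


if __name__ == "__main__":
    sample = [
        AnnotationRecord("clips/0001.wav", "  Xin   chào ", "speaker_01"),
        AnnotationRecord("clips/0002.wav", "   ", "speaker_01"),
    ]
    for line in build_manifest(sample):
        print(line)
```

Normalizing once, when the manifest is built, keeps the text consistent no matter which keyboard or input method each labeler used.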

