As we mentioned in our previous post, ML & AI: From Problem Framing to Integration, problem framing is crucial: a well-defined problem sets the direction for the entire AI project, influencing data collection, model selection, and the overall success of the solution.
In this blog, we will expand on the question framework we used in the problem-framing phase with sample answers for our Tanoto project.
What is the problem you want to solve?
This involves identifying the specific issue, understanding its context, and outlining the desired outcomes. This step helps you set the right strategic goal and avoid unnecessary complexity.
What specific output do you want to predict?
What input do you have?
How many training samples can you provide?
Are there reference solutions, like a rival company’s products or research papers?
Do you have a benchmark?
What evaluation metrics are you using?
What is the minimum level of metrics you expect?
What would a perfect solution look like?
Are there deadlines to be aware of?
When can you deliver the first result, and when does the customer expect the first result and the final solution?
Who will be involved? Who is the best backup team, and who is your domain expert?
What technologies can be used? Will our team need to learn new skills?
In this section, we will look at how Tanoto’s AI interviewer was built at CodeLink. The Tanoto application replicates an online interview experience with an AI interviewer and can easily be accessed by any user who wants to improve their interview skills.
First, we have to clarify the application’s motivation. We found that non-verbal expressions during communication are as significant as spoken words, so we want to offer recommendations and advice based on the emotions displayed in the user’s facial expressions during the interview, using an expression classifier.
Now we define our input and output. The model’s output will be the facial expression the interviewee is showing, from a list including happiness, fear, anger, disgust, surprise, sadness, and neutral.
The input will be a frame of the webcam feed showing the user’s face. Facial expression recognition is a very common problem, so there are plenty of public datasets; for this example, we use the FER2013 dataset.
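FER2013 is distributed as a CSV file where each row holds an integer emotion label and a flattened 48×48 grayscale image as space-separated pixel values. As a minimal sketch (assuming the usual column names `emotion`, `pixels`, and `Usage`), parsing it looks like this:

```python
import csv
import io

# FER2013 encodes labels as integers in this order.
EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise", "neutral"]

def parse_fer2013(csv_text):
    """Parse FER2013-style rows into (emotion, 48x48 pixel grid) pairs."""
    samples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        pixels = [int(p) for p in row["pixels"].split()]
        # Reshape the 2304 flat values into a 48x48 grid.
        image = [pixels[i * 48:(i + 1) * 48] for i in range(48)]
        samples.append((EMOTIONS[int(row["emotion"])], image))
    return samples

# A tiny fabricated row for illustration (the real file has ~35k rows).
demo = "emotion,pixels,Usage\n3," + " ".join(["128"] * 2304) + ",Training\n"
samples = parse_fer2013(demo)
```

In practice you would read the downloaded `fer2013.csv` from disk instead of an in-memory string, but the row format is the same.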
The model will receive an image frame, and then a square box containing a single face is extracted using face detection and segmentation. This is then fed into an image classifier to determine an emotion label.
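The two-stage pipeline described above (detect a face box, crop it, then classify the crop) can be sketched as follows. The detector and classifier here are placeholder stubs standing in for a real face detector (e.g. MediaPipe) and a trained classifier; only the pipeline structure is the point:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Box:
    """A square crop around a detected face."""
    x: int
    y: int
    w: int
    h: int

def crop(frame: List[List[int]], box: Box) -> List[List[int]]:
    """Extract the face region from a grayscale frame."""
    return [row[box.x:box.x + box.w] for row in frame[box.y:box.y + box.h]]

def classify_frame(frame, detect_face: Callable, classify_face: Callable) -> str:
    """Pipeline: detect a face box, crop it, classify the cropped face."""
    box = detect_face(frame)
    if box is None:
        return "no_face"  # nothing to classify in this frame
    return classify_face(crop(frame, box))

# Stubs standing in for MediaPipe detection and a trained emotion model.
fake_frame = [[0] * 100 for _ in range(100)]
label = classify_frame(
    fake_frame,
    detect_face=lambda f: Box(10, 10, 48, 48),
    classify_face=lambda face: "neutral",
)
```

Keeping detection and classification behind separate callables makes it easy to swap either stage later without touching the rest of the pipeline.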
Since we're working with a public dataset, we've got a handy benchmark to guide us. Head over to Paperswithcode.com to check dataset benchmarks. As of this writing, the FER2013 benchmark has a top accuracy of 76.826%. So, we've set our sights on hitting a minimum accuracy of 70%, and we're also tracking other evaluation metrics like precision, recall, and F1 score.
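To make those metrics concrete, here is a minimal, dependency-free sketch of computing accuracy plus per-class precision, recall, and F1 from true and predicted labels (in practice a library such as scikit-learn would do this for you):

```python
def evaluation_report(y_true, y_pred, labels):
    """Accuracy plus per-class precision, recall, and F1 score."""
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    report = {}
    for label in labels:
        # Count true positives, false positives, and false negatives per class.
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, report

# Small fabricated example: 4 predictions against ground truth.
y_true = ["happiness", "anger", "neutral", "happiness"]
y_pred = ["happiness", "neutral", "neutral", "anger"]
acc, rep = evaluation_report(y_true, y_pred, ["happiness", "anger", "neutral"])
```

Per-class scores matter here because FER2013 is imbalanced (disgust is rare), so a high overall accuracy can hide a class the model almost never gets right.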
We organized the team as follows:
Our Full-stack Web Developers haven’t worked on face detection projects, but they have strong JavaScript skills, so they can easily handle face detection with MediaPipe through TensorFlow.js. As for our AI Engineers, they have to learn basic JavaScript in less than a week to help the Full-stack Web Developers integrate the TFJS model into the website.
The project timeline was set at three months, divided into 2-week sprints. After each sprint, the development team delivers new changes to the test environment for others to test out.
Defining the problem upfront is like choosing your destination. It shapes everything from the look and feel of the app to how it solves real user issues. It influences decisions on what features to include, how to collect data, and what tech to use.
So, if you're ever wondering why problem framing is a big deal, remember: it's the difference between a road trip with a clear destination and one with endless detours. Choose wisely, and happy app building!