STAR: Speech-to-Text AR Engine
November 2021 - January 2022
Multimodal | Video Prototyping | Unity
Design and development of a prototype for real-time speech-to-text interactions in augmented 3D for Microsoft HoloLens users with hearing impairments
Role: Research, interface design
Outcome: Preliminary prototype that converts speech to on-screen text on Microsoft HoloLens
Contributors: Ritty Thomas, Navya Sanjna Joshi, Shreeya Sathe
Background
Individuals with hearing impairments often have difficulty in social interactions, particularly in following speech during conversations. An ergonomic speech-to-text system in Augmented Reality (AR) on the HoloLens is a potential way to support them in these situations.
Objective
Develop a prototype that can
- Convert speech to text
- Display text in real-time with AR and Unity
- Understand the existing system in Unity
- Implement multimodal interactions
- Design an integrated hearing aid feature
Process
1. Secondary Research
2. User Insights
3. Technical Solutions
4. Analyze AR Glasses
5. User Testing
Mixed Reality UI Components
Pixel vs. Real-world size comparison of UI Components
User testing with HoloLens
Interface
Figma prototype for ocularity accessibility
Figma prototype for legibility accessibility
Challenges
Recruiting participants with hearing impairments was difficult
Setting up the Unity development environment was an extensive process
Speech-to-text systems were frequently unreliable and had excessive latency
Outcomes
For this prototype, we decided to focus on designing for one-to-one conversation rather than one-to-many. This was mainly because the interface configuration in Unity was proving to be challenging and time-consuming. We prioritized our four key design takeaways:
Ocularity and luminance of the optics and interface are essential for accessibility
For increased readability, font characteristics such as stroke width and roundedness of text should be considered
Certain text color combinations, such as white on black and yellow on black, make the text more legible
Situational text alignment, a limit of two lines of text, and similar constraints support best visual practices
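The color and two-line takeaways above can be illustrated with a minimal Python sketch. It is not the prototype's code, which was built in Unity. It assumes the WCAG 2.x contrast-ratio formula as a rough proxy for legibility (that formula was designed for conventional screens, so it is only an approximation for a see-through display), and the function names, the 40-character line width, and the two-line window are hypothetical choices made for illustration.

```python
import textwrap


def relative_luminance(rgb):
    """WCAG 2.x relative luminance of an sRGB color given as 0-255 integers."""
    def linearize(channel):
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """WCAG contrast ratio between two colors, from 1 (none) up to 21."""
    lum_a = relative_luminance(foreground)
    lum_b = relative_luminance(background)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


def last_two_lines(transcript, width=40, max_lines=2):
    """Wrap the running transcript and keep only the most recent lines.

    width and max_lines are illustrative assumptions, not measured values.
    """
    lines = textwrap.wrap(transcript, width=width)
    return lines[-max_lines:]


if __name__ == "__main__":
    white, yellow, black = (255, 255, 255), (255, 255, 0), (0, 0, 0)
    print(round(contrast_ratio(white, black), 2))
    print(round(contrast_ratio(yellow, black), 2))
    print(last_two_lines("so I was thinking we could meet near the library "
                         "after class and then walk over together"))
```

White on black gives the maximum possible ratio of 21:1 under this formula, and yellow on black comes close to it, which is consistent with the color takeaway above.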
Takeaways
I found this project to be an enlightening experience in exploring design interactions through the medium of AR. It was challenging, but it gave me the opportunity to understand the applications of design through research and technology. My key takeaways were the importance of considering the human aspect while designing, and the value of learning and becoming familiar with new development tools.