A project by the University of Oxford and DeepMind, the UK-based AI lab owned by Google parent company Alphabet, trained an artificial-intelligence system to read lips by analyzing 5,000 hours of TV programs, New Scientist reported on Monday.
University of Oxford and DeepMind researchers trained the AI system on six TV programs that aired between January 2010 and December 2015. After training, the AI deciphered 46.8 percent of words without any error across 200 randomly selected clips. A professional lip-reader who attempted to annotate the same set deciphered only 12.4 percent of words without error.
Lip-reading AI would likely be used for consumer applications, including improving the accuracy of voice-recognition systems and enabling silent dictation, New Scientist reported.
More than 5,000 hours of footage from TV shows including Newsnight, Question Time, and the World Today was used to train DeepMind’s “Watch, Listen, Attend, and Spell” program. The videos included 118,000 different sentences and some 17,500 unique words, compared with LipNet’s test database of just 51 unique words.
DeepMind’s researchers suggest that the program could have a host of applications, including helping hearing-impaired people follow conversations. It could also be used to annotate silent films, or to let you control digital assistants like Siri or Alexa just by mouthing words at a camera (handy if you’re using the program in public).