This article was originally published on November 16, 2022.
Human voice work is ubiquitous across modern media, from video games to television and film. But increasingly, the voices you hear on screen aren’t entirely human-made; they are the result of artificial intelligence.
Respeecher, a voice cloning company founded in 2018 and based in Ukraine, is currently working with Lucasfilm to provide voice services for Star Wars projects. Respeecher’s speech-to-speech technology is responsible for synthesizing the voice of a younger Luke Skywalker in both The Mandalorian and The Book of Boba Fett, as well as restoring James Earl Jones’ iconic Darth Vader voice to its original quality in the Obi-Wan Kenobi series.
The company also digitally recreated the voice of the late NFL coach Vince Lombardi for a 2021 Super Bowl commercial and helped make possible an Aloe Blacc tribute to Avicii in which Blacc sings in multiple languages, some of which he doesn’t actually speak.
How AI voice replication works
Dmytro Bielievtsov, co-founder and chief technology officer at Respeecher, says the process starts with target voice recordings of a real person. These recordings, usually an hour or two long in total, are fed into the company’s artificial intelligence software and analyzed until the voice can be cloned.
Testing is then conducted to ensure that the cloned voice is indistinguishable from the original before the replicated voice is applied to a human “source speaker” (an actor who reads lines from whatever project is being produced). The result is synthetic speech that carries the emotions, intonations and nuances of a real human voice, beyond what robotic-sounding text-to-speech programs can offer.
“In other words,” Bielievtsov says, “you talk into a microphone and the technology can make you sound exactly like a young Luke Skywalker.” In the case of The Mandalorian, the company captured the younger target voice of actor Mark Hamill by analyzing old interviews, voiceovers, and automated dialogue replacement recordings, the latter of which are post-production tracks used to enhance an actor’s dialogue.
Respeecher also has a voice marketplace on its website; this allows clients to choose what voices they would like to use for their projects, whether they are making a TV commercial, an audiobook, or some other form of content.
The company is currently working on voice conversion technology that synthesizes a person’s voice in real time. Bielievtsov says the current system sacrifices some quality in favor of speed and has so far been used only in a limited capacity, but its applications are inspiring. In health care, he explains, the technology can help people whose voices have been impaired by procedures like laryngectomies, allowing them to “speak” in their natural voice again.
Videos on YouTube of what Respeecher’s technology can do with a person’s voice may provoke an uncanny valley response in some people. The revelation that director Morgan Neville digitally recreated the voice of the late Anthony Bourdain in the documentary Roadrunner to utter a few lines that Bourdain wrote but never spoke, for example, caused considerable controversy.
Similarly, the 2020 Emmy Award-winning short film In Event of Moon Disaster, produced by MIT’s Center for Advanced Virtuality to research deepfake technology, includes audio assistance from Respeecher. The documentary features Richard Nixon reading the speech that would have been delivered had the Apollo 11 mission to the moon never returned to Earth. Nixon, of course, never said those words. But in this alternate history, his deepfaked speech rewrites reality.
It’s not hard to imagine what voice cloning technology might look like in the wrong hands. Still, Bielievtsov says Respeecher takes the ethical and safety considerations of its technology very seriously.
“We achieve ethical use of synthetic voices by requiring permissions to clone voices and limiting the ability to copy someone’s voice in the Voice Marketplace,” he says, adding that the company is developing two technical safeguards for its technology: a synthetic speech detector and an audio watermark.
Is AI voice replication the way of the future?
Bielievtsov foresees widespread applications for AI voice replication across many fields. Some of these applications are already producing great results.
For example, the English actor Michael York (whom many know as Basil Exposition in the Austin Powers franchise) suffers from the rare disease amyloidosis. In recent years, his speech has been slurred due to swelling of the tongue, one of the symptoms of the disorder.
When he was tasked with recording new narration for an animated medical film he had narrated a few years earlier, York discovered that his voice was not what it used to be. Fortunately, AI technology from Respeecher helped match York’s target voice using data from the previous recording session, which allowed the film to be updated successfully.
Bielievtsov believes that the use of voice cloning in filmmaking, gaming, streaming and content creation is likely to increase in the coming years. Even call centers can now use it.
“Our team wants to democratize the technology so that smaller film and television studios and video game developers can use it to stretch their budgets even further,” he says. “We want small creators to compete with huge studios with their ideas, execution and creativity, but not budgets.”