Microsoft's VALL-E: The AI That Can Steal Your Voice in Just 3 Seconds

Microsoft has recently developed an AI program known as VALL-E that can replicate a person's voice after listening to them speak for only 3 seconds. This technology, referred to as voice cloning, is not new, but Microsoft's approach is noteworthy for how little audio it needs to replicate anyone's voice. The program, designed for text-to-speech synthesis, was created by a team of Microsoft researchers who trained the system on 60,000 hours of English audiobook narration from over 7,000 speakers to reproduce human-sounding speech. That training corpus is far larger than those used to build other text-to-speech programs.
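To put those figures in perspective, the short calculation below compares the training corpus to the 3-second clip needed at inference time. The sample rate is an assumption for illustration; the article does not state one.

```python
# Rough scale comparison between VALL-E's training corpus and the
# short audio prompt it needs to clone a voice (figures from the article).
TRAIN_HOURS = 60_000      # English audiobook narration used for training
PROMPT_SECONDS = 3        # audio needed to clone a voice
SAMPLE_RATE = 24_000      # Hz; assumed, typical for neural audio codecs

train_seconds = TRAIN_HOURS * 3600
prompt_samples = PROMPT_SECONDS * SAMPLE_RATE
ratio = train_seconds // PROMPT_SECONDS

print(f"Training audio: {train_seconds:,} seconds")
print(f"Cloning prompt: {prompt_samples:,} samples")
print(f"The corpus is {ratio:,}x longer than the prompt")
```

At these numbers, the training data is tens of millions of times longer than the clip a user has to supply, which is what makes the 3-second requirement striking.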

Benefits of VALL-E

[Audio demo: human voice vs. VALL-E voice]

The Microsoft team put up a website with several demos showing how VALL-E works. Using only a 3-second audio clip, the program can not only clone someone's voice but also make the cloned voice say new text. It can also imitate the emotion in a person's voice or be configured to sound like different people. The technology could be used to build more realistic text-to-speech systems, or to help people with speech impairments communicate more effectively. VALL-E could also power more personalized and realistic virtual assistants, or generate speech for characters in video games and animations.

Potential Risks

As with any new technology, there are potential risks and concerns. The ease of replicating anyone’s voice with only a short snippet of audio data means it’s not hard to imagine the same technology being used for cybercrime. The Microsoft team acknowledges this potential threat in their research paper, stating that “since VALL-E AI could synthesize speech that maintains speaker identity, it may carry potential risks in misuse of the model, such as spoofing voice identification or impersonating a specific speaker.” There is also a concern that the technology could be used to create fake audio recordings that could be used to spread misinformation or deceive people.

Preventing Misuse

To stop people from misusing VALL-E, the team suggests it may be possible to build models that can "tell if an audio clip was made by VALL-E or not." Such detection software could flag recordings made by VALL-E or similar programs as machine-generated. Still, it is important to be aware of the possible risks and take steps to prevent abuse: companies and organizations should adopt strict policies and guidelines for the use of voice cloning technology.

Limitations of VALL-E

(Image source: VALL-E GitHub)

Despite the impressive capabilities of VALL-E, the technology is still in its early stages and has some limitations. In their research paper, Microsoft's team notes that VALL-E sometimes mispronounces or skips certain words, and at other times the output can sound jumbled, robotic, or otherwise artificial. This means that while the technology has great potential, there is still work to be done to improve the accuracy and realism of the cloned speech.

In the end, Microsoft’s VALL-E program represents a significant advancement in the field of artificial intelligence and text-to-speech synthesis. With its ability to clone a person’s voice after only hearing them speak for 3 seconds, the program has the potential to revolutionize the way we interact with technology and communicate with others. However, it’s important to be aware of the potential risks and take steps to prevent misuse of the technology. Further research and development are needed to improve the cloned speech’s accuracy and realism, but this technology’s potential benefits are undeniable.

