Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have created a computer algorithm that can predict and create sound effects simply by watching videos of different surfaces being struck with a drumstick. The sounds are realistic enough that human test subjects often cannot distinguish the synthesized sound effects from real recorded ones.
The algorithm used in this study is built on a common type of artificial intelligence system called a recurrent neural network, which processes a sequence (such as the frames of a video) by maintaining an internal state that carries information from one step to the next. The same small set of connections is reused dynamically across the whole sequence, rather than in a fixed arrangement, loosely analogous to the way neurons in the brain can be reconfigured into many different networks.
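The recurrent idea can be sketched in a few lines. This is a deliberately tiny, illustrative model (a single scalar "vanilla" recurrent unit with made-up fixed weights), not the study's actual architecture: the point is only that one set of weights is applied at every time step, and the hidden state accumulates information from the whole input history.

```python
import math

def rnn_step(x, h, w_x=0.5, w_h=0.8, b=0.0):
    """One recurrent update: the new state mixes the current input
    with the previous state through the same weights every time."""
    return math.tanh(w_x * x + w_h * h + b)

def run_sequence(xs):
    """Process a whole sequence by repeatedly applying rnn_step."""
    h = 0.0
    for x in xs:  # identical weights reused at each time step
        h = rnn_step(x, h)
    return h

# The final state depends on the entire history, not just the last input:
# run_sequence([1.0, 0.0, 0.0]) differs from run_sequence([0.0, 0.0, 0.0]),
# even though both sequences end the same way.
```

In a real model the state and weights are large vectors and matrices learned from data, but the control flow is the same loop shown here.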
The researchers fed the neural network a series of 977 videos, each containing around 48 sound-producing actions, such as plastic bags, walls, or leaves being struck with a drumstick. The computer then “learned” what different surfaces look and sound like, and could make predictions about new surfaces without ever having “heard” them.
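The core idea of "learning what surfaces look and sound like" can be illustrated with a much simpler stand-in: store visual features paired with their recorded sounds during training, then predict a sound for a new clip by finding the most visually similar training example. The actual system learns this mapping with a neural network; the feature vectors and sound labels below are invented for illustration.

```python
def nearest_sound(query, examples):
    """Predict a sound for a new visual feature vector by retrieving
    the training example whose visual features are closest to it.
    examples: list of (visual_feature_vector, sound_label) pairs."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(examples, key=lambda ex: sq_dist(ex[0], query))[1]

# Toy "training set": crude visual features for three struck materials.
examples = [
    ((0.9, 0.1), "crinkle"),  # plastic bag
    ((0.1, 0.9), "thud"),     # wall
    ((0.5, 0.5), "rustle"),   # leaves
]

# A new clip that looks most like the plastic bag gets its sound:
nearest_sound((0.85, 0.2), examples)  # → "crinkle"
```

The published system is far richer (it predicts sound features frame by frame and then synthesizes audio), but retrieval over learned features captures why visually similar materials end up with similar predicted sounds.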
These computer-generated predictions were then played for human test participants alongside the real sounds, with a video of the sound-producing action shown for each. Participants were asked to choose which sound was the real one, and they chose the synthesized sound almost twice as often as they chose sounds produced by a baseline algorithm. Some sounds fooled human subjects more than others, such as complex noises found in the natural world:
Often when a participant was fooled, it was because the sound prediction was simple and prototypical (e.g., a simple thud noise), while the actual sound was complex and atypical. True leaf sounds, for example, are highly varied and may not be fully predictable from a silent video.
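Scoring a listening test like the one described above is straightforward: each trial records whether the participant picked the synthesized sound as the real one, and the "fooling rate" is the fraction of such trials. The function and example data below are illustrative, not the study's actual numbers.

```python
def fooling_rate(trials):
    """Fraction of trials in which the participant chose the
    synthesized sound as the real one. trials is a list of booleans:
    True = picked the fake, False = picked the real recording."""
    if not trials:
        return 0.0
    return sum(1 for picked_fake in trials if picked_fake) / len(trials)

# Hypothetical data: participants picked the fake sound in 6 of 10 trials.
trials = [True, True, False, True, False, True, False, True, True, False]
rate = fooling_rate(trials)  # 0.6; 0.5 would be chance-level guessing
```

A rate near 0.5 means listeners are guessing at random, i.e. the synthesized sounds are indistinguishable from real ones; a rate well below 0.5 means the fakes are easy to spot.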
This is a breakthrough because it represents one of the first machine learning systems able to predict sound using visual information alone. In other words, the computer can guess what something will sound like based on how it looks.
The team behind this study believes this technology will be useful not only for the sound design industry, but also for artificial intelligence research aimed at helping computers understand the physical world around them:
We see our work as opening two possible directions for future research. The first is producing realistic sounds from videos, treating sound production as an end in itself. The second direction is to use sound and material interactions as steps toward physical scene understanding.
This work has the potential to revolutionize virtual and augmented reality environments. Once artificial intelligence systems can fully predict how the natural world looks and sounds, virtual environments completely indistinguishable from reality might be possible. So, no matter what we do to destroy and poison our natural environment, at least our new virtual utopias will sound like the Earth that was.