A camera system developed by Carnegie Mellon University researchers can see sound vibrations with such precision and detail that it can reconstruct the music of a single instrument in a band or orchestra.
Even the most powerful and directional microphones cannot eliminate nearby sounds, ambient noise and the effects of acoustics when they capture audio. The new system, developed at the School of Computer Science's Robotics Institute (RI), uses two cameras and a laser to sense high-speed, low-amplitude surface vibrations. These vibrations can be used to reconstruct sound, capturing isolated audio without inference or a microphone.
“We’ve invented a new way to see sound,” said Mark Sheinin, a researcher at the Illumination and Imaging Laboratory (ILIM) in the RI. “It’s a new type of camera system, a new imaging device, that is able to see something invisible to the naked eye.”
The team completed several successful demonstrations of the system’s effectiveness in sensing vibrations and of the quality of the reconstructed sound. They captured isolated audio of separate guitars playing at the same time and of individual speakers playing different music simultaneously. They also analyzed the vibrations of a tuning fork, and used the vibrations of a bag of Doritos placed near a speaker to capture the sound coming from that speaker.
MIT researchers had previously introduced a visual microphone. However, the new CMU system is a vast improvement over past attempts to capture sound using computer vision. The team uses ordinary cameras, which cost a fraction of the price of the high-speed versions used in past research, yet produce a higher-quality recording. The dual-camera system can capture vibrations from objects in motion, such as the movements of a guitar while a musician plays it, and can simultaneously sense individual sounds from multiple points.
“We’ve made the optical microphone much more practical and usable,” said Srinivasa Narasimhan, head of ILIM. “We’ve improved the quality while reducing the cost.”
The system works by analyzing the differences in speckle patterns between images captured with a rolling shutter and images captured with a global shutter. An algorithm computes the difference between the speckle patterns in the two video streams and converts those differences into vibrations, from which the sound is reconstructed.
A speckle pattern describes how coherent light behaves in space after it reflects off a rough surface. The team creates a speckle pattern by aiming a laser at the surface of the vibrating object, such as the body of a guitar. The speckle pattern changes as the surface vibrates. A rolling shutter captures an image by rapidly scanning it, usually from top to bottom, building the image one row of pixels at a time. A global shutter captures the entire image at once.
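To make the rolling-shutter idea concrete, here is a toy sketch in Python. It is not the CMU team's algorithm. It assumes a noise-free speckle image whose rows slide sideways by whole pixels as the surface vibrates, an idealized sensor with no gap between row readouts, and a global-shutter frame that serves as the unshifted reference. All function names and parameters are hypothetical. Because each row of the rolling-shutter frame is exposed at a slightly different moment, comparing it with the matching reference row gives one vibration sample per row rather than one per frame.

```python
import math
import random


def circular_shift(row, shift):
    """Return `row` moved right by `shift` pixels, wrapping at the edges."""
    n = len(row)
    return [row[(i - shift) % n] for i in range(n)]


def estimate_shift(reference_row, observed_row, max_shift):
    """Integer shift in [-max_shift, max_shift] that best aligns the rows.

    The score is a plain dot product between the shifted reference and the
    observed row (a brute-force circular cross-correlation).
    """
    best_shift, best_score = 0, float("-inf")
    for s in range(-max_shift, max_shift + 1):
        candidate = circular_shift(reference_row, s)
        score = sum(a * b for a, b in zip(candidate, observed_row))
        if score > best_score:
            best_shift, best_score = s, score
    return best_shift


def simulate_and_recover(rows=64, width=256, frame_rate=60.0,
                         tone_hz=440.0, amplitude=3, seed=0):
    """Simulate one rolling-shutter frame and recover the per-row shifts."""
    rng = random.Random(seed)
    # Idealized assumption: rows are read back to back across the frame time.
    line_time = 1.0 / (frame_rate * rows)

    # Global-shutter reference: every row sees the undisplaced speckle.
    reference = [[rng.gauss(0.0, 1.0) for _ in range(width)]
                 for _ in range(rows)]

    # Surface displacement (in pixels) at the instant each row is read out.
    true_shifts = [
        round(amplitude * math.sin(2 * math.pi * tone_hz * r * line_time))
        for r in range(rows)
    ]

    # Rolling-shutter frame: row r is displaced by the vibration at its time.
    rolling = [circular_shift(reference[r], true_shifts[r])
               for r in range(rows)]

    recovered = [estimate_shift(reference[r], rolling[r], amplitude + 1)
                 for r in range(rows)]
    effective_rate = frame_rate * rows  # vibration samples per second
    return true_shifts, recovered, effective_rate


if __name__ == "__main__":
    true_shifts, recovered, rate = simulate_and_recover()
    print("effective sample rate (Hz):", rate)
    print("recovered matches simulated shifts:", recovered == true_shifts)
```

In this simplified setting, a 60-frame-per-second camera with 64 rows yields 3,840 displacement samples per second, which illustrates why a rolling shutter can stand in for a far faster camera. A real system would also have to handle sub-pixel motion, sensor noise and two-dimensional speckle movement.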
Sheinin and Narasimhan were joined in the study by Dorian Chan, a Ph.D. student in computer science, and Matthew O’Toole, an assistant professor in the RI and in computer science.
“This system pushes the boundaries of what can be done with computer vision,” said O’Toole. “It’s a new mechanism for capturing tiny vibrations at high speed, and it opens up a new area of research.”
Much of the work in computer vision has focused on training systems to recognize objects or track them through space. By letting systems see high-frequency vibrations that are otherwise imperceptible, this work opens up new applications for computer vision.
The dual-shutter optical vibration-sensing system could allow sound engineers to monitor the music of individual instruments, free from interference from the rest of the ensemble, in order to fine-tune the overall mix. Manufacturers could use the system to monitor the vibrations of individual machines on a factory floor and spot early signs of malfunction.
Meanwhile, in 2021, a team of scientists from the Swiss Federal Institute of Technology Zurich, together with specialists from the University of Edinburgh, developed a new concept for active acoustic cloaking. They were able to acoustically conceal existing objects, as well as create the illusion of objects that were not in the room. The object to be hidden was enclosed in a ring of microphones (control sensors), inside which sat a ring of speakers (control sources). The sensors recorded the external acoustic signals reaching the object. Based on these measurements, a computer calculated which secondary sounds the speakers needed to produce in order to augment the original sound field as required.