Lightweight Implicit Neural Network for Binaural Audio Synthesis

Abstract

High-fidelity binaural audio synthesis is crucial for immersive experiences, but existing methods require extensive computational resources, which limits their on-device use. To address this, we propose the Lightweight Implicit Neural Network (LINN), a novel two-stage framework. LINN first generates an initial estimate using time-domain warping, which is then refined by an Implicit Binaural Corrector (IBC) module. The IBC is an implicit neural network that predicts amplitude and phase corrections directly in the coordinate system, resulting in a highly compact model architecture. Experimental results show that LINN achieves perceptual quality statistically comparable to that of the best-performing baseline model while significantly improving computational efficiency. Compared to the most efficient existing method, our model has 3.6 times fewer parameters and requires significantly fewer multiply-accumulate operations (MACs). This demonstrates that our approach effectively addresses the trade-off between synthesis quality and computational efficiency, providing a new solution for high-fidelity on-device spatial audio applications.
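
The two-stage idea described above can be illustrated with a minimal NumPy sketch. This is not the LINN implementation: the function names, the choice of input coordinates (normalized frequency plus a 3-D source position), the tiny MLP, and the frame-wise frequency-domain correction are all assumptions made for illustration only.

```python
import numpy as np

def warp_mono(mono, delay_samples):
    """Stage 1 (assumed form): delay the mono signal by a per-sample,
    possibly fractional delay using linear interpolation."""
    t = np.arange(len(mono), dtype=np.float64)
    src = np.clip(t - delay_samples, 0, len(mono) - 1)
    return np.interp(src, t, mono)

def make_coords(n_bins, source_pos):
    """Hypothetical coordinates: one row per frequency bin, holding the
    normalized frequency followed by the source position (x, y, z)."""
    freq = np.linspace(0.0, 1.0, n_bins)[:, None]
    pos = np.broadcast_to(source_pos, (n_bins, 3))
    return np.concatenate([freq, pos], axis=1)

def implicit_corrector(coords, params):
    """Stage 2 (hypothetical tiny MLP): map each coordinate row to a
    log-amplitude correction and a phase correction."""
    w1, b1, w2, b2 = params
    hidden = np.tanh(coords @ w1 + b1)
    out = hidden @ w2 + b2
    return out[:, 0], out[:, 1]

def apply_correction(frame, log_amp, phase):
    """Apply the predicted corrections to one frame in the frequency domain."""
    spec = np.fft.rfft(frame)
    spec = spec * np.exp(log_amp + 1j * phase)
    return np.fft.irfft(spec, n=len(frame))

# Usage with untrained (random) weights, for one ear and one frame.
rng = np.random.default_rng(0)
frame_len, hidden_dim = 64, 8
mono = rng.standard_normal(frame_len)
warped = warp_mono(mono, np.full(frame_len, 2.5))  # assumed 2.5-sample delay
coords = make_coords(frame_len // 2 + 1, np.array([1.0, 0.0, 0.0]))
params = (
    0.1 * rng.standard_normal((4, hidden_dim)),
    np.zeros(hidden_dim),
    0.1 * rng.standard_normal((hidden_dim, 2)),
    np.zeros(2),
)
log_amp, phase = implicit_corrector(coords, params)
refined = apply_correction(warped, log_amp, phase)
```

Because the corrector in this sketch is a small function of coordinates rather than a network over whole waveforms, its parameter count depends only on the coordinate and hidden dimensions, which is one way to read the compactness claim above.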

Samples from Binaural Speech Datasets