Supplementary material for reviewers, ICASSP submission no. 4297:
Detection and suppression of keyboard transient noise in audio streams with auxiliary keybed microphone
By Simon Godsill (1), Herbert Buchner (1) and Jan Skoglund (2)
(1) Signal Processing Laboratory, Department of Engineering, University of Cambridge, England; (2) Google Inc., Mountain View, California
The figures below show the setup in the Pixel Chromebook used here and an example of waveforms recorded from the voice and keybed microphones.
Example of simultaneously recorded waveforms. Top: voice microphone, with speech and keystrokes occurring together. Bottom: keybed microphone, capturing (principally) just the keystrokes:
Example audio processing:
Example 1:
First we present some example audio recorded from the Pixel Chromebook at 44.1 kHz, using synchronised recording of the keybed microphone and the two voice microphones. The corresponding waveforms for these 10 s extracts are shown below:
Accompanying audio:
Voice mic with key clicks: download
Restored by muting detected frames: download
Restored with EM algorithm: download
Restored with EM plus post-processing for fidelity enhancement: download
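The ‘muting’ baseline silences every frame flagged as corrupted. A minimal sketch follows, assuming non-overlapping frames and a hypothetical energy-threshold detector applied to the keybed microphone; `detect_frames`, `frame_len` and `threshold` are illustrative names and this is not the detector used in this work.

```python
import numpy as np


def frame_signal(x, frame_len):
    """Split x into non-overlapping frames, dropping any incomplete tail."""
    n_frames = len(x) // frame_len
    return np.asarray(x[: n_frames * frame_len], dtype=float).reshape(n_frames, frame_len)


def detect_frames(keybed, frame_len, threshold):
    """Hypothetical detector (assumption): flag frames whose keybed-mic
    energy exceeds a fixed threshold. Returns a 0/1 mask, one entry per frame."""
    energy = np.sum(frame_signal(keybed, frame_len) ** 2, axis=1)
    return (energy > threshold).astype(int)


def mute_frames(voice, mask, frame_len):
    """Baseline restoration: zero every voice-mic frame flagged in the mask."""
    out = np.asarray(voice, dtype=float).copy()
    for k in np.flatnonzero(mask):
        out[k * frame_len:(k + 1) * frame_len] = 0.0
    return out


# Usage: one keystroke in frame 2 of the keybed signal mutes that frame of the voice signal.
keybed = np.zeros(8)
keybed[5] = 1.0
mask = detect_frames(keybed, frame_len=2, threshold=0.5)
restored = mute_frames(np.ones(8), mask, frame_len=2)
```

Because muting discards everything in a flagged frame, including any speech, it serves only as a reference point for the EM-based restoration.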
Example 2:
An example detection and restoration is shown in the figures below, with accompanying audio. In all three panels the frames detected as corrupted are indicated by the zero-one waveform overlaid in green. These detections agree well with visual inspection of the keyclick waveform. The top panel shows the corrupted input from the voice microphone, the middle panel the restored output, and the bottom panel the original voice signal (available in this test as ‘ground truth’). Note that the restoration in the middle panel preserves the speech envelope and speech events around 125 ksamples and 140 ksamples, while suppressing the disturbance well around 1.05 ksamples. The audio is significantly improved in the restoration, leaving just a little ‘click’ residue that can be removed by post-processing using standard techniques ([1], Ch. 4), whereas the simple ‘muting’ restoration removes far too much of the signal to be acceptable. In this fairly typical example, a favourable 10.1 dB improvement in segmental SNR over the muting restoration is obtained for corrupted frames, and a 2.5 dB improvement when all frames (including the uncorrupted ones) are considered.
Accompanying audio:
Voice mic with key clicks: download
Restored by muting detected frames: download
Restored with EM algorithm: download
Restored with EM plus post-processing for fidelity enhancement: download
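The segmental SNR figures above compare each restoration with the ground-truth voice signal frame by frame. A minimal sketch of one common definition follows: per-frame SNR in dB, clamped to the conventional range of -10 to 35 dB, then averaged either over the frames flagged by the zero-one detection mask or over all frames. The function name, frame length and clamping limits are assumptions, not necessarily the settings used for the reported numbers.

```python
import numpy as np


def segmental_snr(clean, estimate, frame_len, frame_mask=None,
                  floor_db=-10.0, ceil_db=35.0, eps=1e-12):
    """Mean per-frame SNR (dB) of `estimate` against `clean`.

    frame_mask: optional 0/1 array (one entry per frame); if given, only
    frames with mask == 1 (e.g. detected as corrupted) are averaged.
    Per-frame values are clamped to [floor_db, ceil_db] (assumed limits).
    """
    n = (len(clean) // frame_len) * frame_len
    s = np.asarray(clean[:n], dtype=float).reshape(-1, frame_len)
    e = s - np.asarray(estimate[:n], dtype=float).reshape(-1, frame_len)
    # eps avoids division by zero for silent frames or perfect estimates
    snr = 10.0 * np.log10((np.sum(s ** 2, axis=1) + eps) /
                          (np.sum(e ** 2, axis=1) + eps))
    snr = np.clip(snr, floor_db, ceil_db)
    if frame_mask is not None:
        snr = snr[np.asarray(frame_mask, dtype=bool)]
    return float(np.mean(snr))


# Usage: frame 0 of the estimate is muted (0 dB), frame 1 is exact (clamped to 35 dB).
clean = np.ones(4)
muted = np.array([0.0, 0.0, 1.0, 1.0])
snr_all = segmental_snr(clean, muted, frame_len=2)
snr_corrupted = segmental_snr(clean, muted, frame_len=2, frame_mask=[1, 0])
```

The improvement quoted in the text corresponds to a difference such as `segmental_snr(clean, em_output, L, mask) - segmental_snr(clean, muted_output, L, mask)`, where `em_output`, `muted_output`, `L` and `mask` are placeholder names. Dropping `mask` gives the all-frames figure.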