Signal Processing and Communications Laboratory

Department of Engineering

Extra materials for reviewers for ICASSP submission no. 4297:

Detection and suppression of keyboard transient noise in audio streams with auxiliary keybed microphone


By Simon Godsill (1), Herbert Buchner (1) and Jan Skoglund (2)

(1) Signal Processing Laboratory, Department of Engineering,
University of Cambridge, England

and

(2) Google Inc, Mountain View, California

The figures below show the Pixel Chromebook setup used here and an example of waveforms recorded from the voice and keybed microphones.

chromebook_setup1.png

Example of simultaneously recorded waveforms. Top: voice microphone with simultaneous speech and key strokes. Bottom: keybed microphone with (principally) just key strokes:

voice_key.png

Example audio processing:

Example 1:


First we present some example audio recorded from the Pixel Chromebook at 44.1 kHz, using synchronised recording of the keybed microphone and the two voice microphones. The corresponding waveforms for these 10 s extracts are shown below:

mic_record.png

Accompanying audio

Voice mic with key clicks: download

Restored by muting detected frames: download

Restored with EM algorithm: download

Restored with EM plus post-processing for fidelity enhancement: download

Example 2:

An example detection and restoration is shown in the figure below, with accompanying audio. In all three panels the frames detected as corrupted are indicated by the zero-one waveform overlaid in green. These detections agree well with visual inspection of the key-click waveform. The top panel shows the corrupted input from the voice microphone, the middle panel the restored output, and the bottom panel the original voice signal (available in this test as ‘ground truth’). Notice that the middle panel preserves the speech envelope and the speech events around 125 ksamples and 140 ksamples, while suppressing the disturbance well around 1.05 ksamples. The audio is significantly improved by the restoration, leaving just a little ‘click’ residue, which can be removed by post-processing using standard techniques [1], Ch. 4. The simple ‘muting’ restoration, by contrast, removes far too much of the signal to be acceptable. In this fairly typical example, the restoration improves segmental SNR by 10.1 dB over the muting restoration on the corrupted frames, and by 2.5 dB when all frames are considered (including the uncorrupted frames).

restored.png

Voice mic with key clicks: download

Restored by muting detected frames: download

Restored with EM algorithm: download

Restored with EM plus post-processing for fidelity enhancement: download
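
The segmental SNR figures quoted for Example 2 compare a restored signal with the ground-truth voice signal frame by frame, either over the corrupted frames only or over all frames. The sketch below shows one common way such a figure could be computed; it is not necessarily the exact definition used for the numbers above. The function name, the frame length, the `eps` guard and the absence of per-frame clamping are all assumptions made for illustration.

```python
import math


def segmental_snr_db(reference, estimate, frame_len, frame_mask=None, eps=1e-12):
    """Mean per-frame SNR in dB of `estimate` measured against `reference`.

    Hypothetical helper, not taken from the submission.
    frame_mask: optional sequence of 0/1 flags, one per frame. When given,
    only flagged frames (e.g. those detected as corrupted) are averaged.
    """
    n_frames = min(len(reference), len(estimate)) // frame_len
    frame_snrs = []
    for k in range(n_frames):
        if frame_mask is not None and not frame_mask[k]:
            continue  # skip frames that are not selected
        start = k * frame_len
        ref = reference[start:start + frame_len]
        est = estimate[start:start + frame_len]
        signal_energy = sum(r * r for r in ref)
        error_energy = sum((r - e) ** 2 for r, e in zip(ref, est))
        # eps (an assumed guard) avoids log of zero and division by zero
        frame_snrs.append(
            10.0 * math.log10((signal_energy + eps) / (error_energy + eps))
        )
    if not frame_snrs:
        raise ValueError("no frames selected")
    return sum(frame_snrs) / len(frame_snrs)


# Toy data: two frames of four samples, second frame flagged as corrupted.
ground_truth = [1.0] * 8
restored = [1.0] * 4 + [0.9] * 4   # small residual error in the flagged frame
muted = [1.0] * 4 + [0.0] * 4      # flagged frame set to zero
corrupted_frames = [0, 1]

improvement_db = (
    segmental_snr_db(ground_truth, restored, 4, corrupted_frames)
    - segmental_snr_db(ground_truth, muted, 4, corrupted_frames)
)
```

Passing the zero-one detection flags as `frame_mask` gives the corrupted-frames figure, and omitting it gives the all-frames figure. Muting a frame makes the error equal to the signal itself, so its per-frame SNR is 0 dB under this definition.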