Just came across this brand new (May 2016) paper, which is a very interesting read, http://www.mdpi.com/2076-3417/6/5/143/pdf.
The abstract:
Endowing machines with sensing capabilities similar to those of humans is a prevalent quest
in engineering and computer science. In the pursuit of making computers sense their surroundings, a
huge effort has been made to allow machines and computers to acquire, process, analyze and
understand their environment in a human-like way. Focusing on the sense of hearing, the ability of
computers to sense their acoustic environment as humans do goes by the name of machine hearing.
To achieve this ambitious aim, the representation of the audio signal is of paramount importance. In
this paper, we present an up-to-date review of the most relevant audio feature extraction techniques
developed to analyze the most usual audio signals: speech, music and environmental sounds. Besides
revisiting classic approaches for completeness, we include the latest advances in the field based
on new domains of analysis together with novel bio-inspired proposals. These approaches are
described following a taxonomy that organizes them according to their physical or perceptual basis,
being subsequently divided depending on the domain of computation (time, frequency, wavelet,
image-based, cepstral, or other domains). The description of the approaches is accompanied by
recent examples of their application to machine hearing related problems.
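To make the cepstral domain mentioned in the abstract concrete, here is a minimal sketch of one of the classic perceptually motivated features the paper surveys: mel-frequency cepstral coefficients (MFCCs). This is an illustrative NumPy-only implementation, not code from the paper; the frame size, hop, filter count, and test tone are arbitrary choices for the example.

```python
import numpy as np

def hz_to_mel(f):
    # Convert frequency in Hz to the perceptual mel scale.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters whose center frequencies are evenly spaced in mel.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            if center > left:
                fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            if right > center:
                fb[i - 1, k] = (right - k) / (right - center)
    return fb

def mfcc(signal, sr, n_fft=512, hop=256, n_filters=26, n_coeffs=13):
    # 1) Frame the signal and apply a Hann window.
    frames = [signal[s:s + n_fft] * np.hanning(n_fft)
              for s in range(0, len(signal) - n_fft + 1, hop)]
    # 2) Power spectrum of each frame.
    spec = np.abs(np.fft.rfft(np.array(frames), n_fft)) ** 2
    # 3) Mel filterbank energies, log-compressed (mimics loudness perception).
    energies = np.log(spec @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # 4) DCT-II decorrelates the log energies; keep the low-order coefficients.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * n + 1) / (2 * n_filters))
    return energies @ dct.T

# Hypothetical input: one second of a 440 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
feats = mfcc(tone, sr)
print(feats.shape)  # (frames, 13)
```

Each row of `feats` is a compact spectral envelope descriptor for one ~32 ms frame; the survey discusses many refinements of (and alternatives to) this pipeline.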