In recent years machine learning (e.g., training neural networks) has become common in many domains. I use it in my own research: I have a graduate student working on a project to determine the travel times of seismic waves passing through Earth’s interior, in order to expand our dataset beyond what is possible with hand-picked data (the model is trained on hand-picked data). These data will be used for imaging our planet’s interior. He also wants to use this approach to search for unusual seismic events in large data sets, which is impractical without computationally tractable methods. I have also watched with interest as digital image processing has embraced machine learning, with applications ranging from “super-resolution” to de-noising filters. A company called Topaz AI has taken this up, and their products are growing in popularity among photographers.
So I was naturally curious to see whether these approaches have been employed in digital audio... and of course they have. Speech recognition in digital recordings is a huge application and has developed into an entire industry. But what about hi-fi audio? How about common applications such as upsampling? It seems like neural networks could be an efficient way to perform these and other routine audio-processing tasks (certainly more efficient than the M Scaler).
I found some interesting research on upsampling, along with some code and examples:
https://kuleshov.github.io/audio-super-res/
Very interesting work. Here is a summary of their findings:
Machine learning algorithms are only as good as their training data. If you want to apply our method to your personal recordings, you will most likely need to collect additional labeled examples.
Interestingly, super-resolution works better on aliased input (no low-pass filter). This is not reflected well in objective benchmarks, but is noticeable when listening to the samples. For applications like compression (where you control the low-res signal), this may be important.
More generally, the model is very sensitive to how low resolution samples are generated. Even using a different low-pass filter (Butterworth, Chebyshev) at test time will reduce performance.
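To make those last two points concrete, here is a minimal sketch (my own illustration, not the authors' code; it assumes NumPy/SciPy and a 4x downsampling factor) of how the low-resolution inputs could be generated, either with a Butterworth or Chebyshev anti-aliasing filter, or with no filter at all (the aliased case the authors found works better):

```python
import numpy as np
from scipy import signal

def make_lowres(x, factor=4, filt="butter", order=8):
    """Downsample signal x by an integer factor.

    filt selects the anti-aliasing low-pass filter ("butter" or
    "cheby"); filt=None skips filtering entirely, producing the
    aliased low-resolution input.
    """
    if filt is not None:
        # Cutoff at the new Nyquist frequency (normalized: 1.0 = old Nyquist)
        wn = 1.0 / factor
        if filt == "butter":
            sos = signal.butter(order, wn, output="sos")
        elif filt == "cheby":
            sos = signal.cheby1(order, 0.05, wn, output="sos")  # 0.05 dB ripple
        else:
            raise ValueError(f"unknown filter: {filt}")
        x = signal.sosfiltfilt(sos, x)  # zero-phase filtering
    return x[::factor]  # keep every factor-th sample

# Example: a 440 Hz tone at 16 kHz, reduced to 4 kHz three different ways.
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)
lo_butter = make_lowres(x, filt="butter")
lo_cheby = make_lowres(x, filt="cheby")
lo_alias = make_lowres(x, filt=None)
```

The point of the sketch is that all three low-resolution versions look superficially similar, yet a model trained on one generation pipeline apparently degrades when tested on another, so the training data really does bake in the choice of filter.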
In any case, I thought it may be interesting to query the ASR community and see what else is happening in machine learning with applications to hi-fi. Anyone know of interesting work or other potential applications?