
DSP using 'AI'

ppataki
I have recently come across the new iZotope Ozone 11 plugin suite, where some of the components are offered with an 'AI' boost.
Yesterday I gave it a try, since I was intrigued by the concept - although I am not sure how much real AI is behind those plugins...

Anyway, I started with Master Rebalance and tried bringing the vocals forward: I selected the 'Vocals' option, adjusted the slider to +2 dB, and started playing a track.
This thing indeed works! I have tried it with various tracks and it worked extremely well with male and female vocals, but also with backing vocals. It really brings all of those forward, in a very non-fatiguing and non-artificial way. There is a delta button that lets you hear only the part that was processed by the plugin, so you can check what is actually happening to the sound.
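iZotope does not document the internals, but one plausible way to think about what a 'rebalance' module does is source separation followed by a gain change on the vocal stem and a remix. Below is a minimal offline sketch of that idea using the open-source Spleeter separator - purely my assumption about the concept, not iZotope's actual algorithm; the file names are invented and the +2 dB gain mirrors the slider setting above:

```python
# Conceptual sketch only -- not iZotope's actual algorithm.
# Requires: pip install spleeter soundfile numpy
import numpy as np
import soundfile as sf
from spleeter.separator import Separator

# Split the track into a vocal stem and an accompaniment stem.
separator = Separator('spleeter:2stems')
separator.separate_to_file('track.wav', 'stems/')  # writes stems/track/vocals.wav etc.

vocals, sr = sf.read('stems/track/vocals.wav')
backing, _ = sf.read('stems/track/accompaniment.wav')

gain_db = 2.0                      # "Vocals +2 dB", as on the Ozone slider
vocals *= 10 ** (gain_db / 20.0)   # convert dB to a linear factor

# Remix and guard against clipping from the added level.
mix = vocals + backing
mix /= max(1.0, np.max(np.abs(mix)))

sf.write('track_rebalanced.wav', mix, sr)
```

Doing this per block, with low latency and no audible seams, as Ozone does in real time, is of course the hard part.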



Then I tried the new Clarity module and used the Crisp Air preset, modifying the tilt to +6 dB/octave.
I was like: Wooooow! The highs literally came to life, but without any harshness. The plugin seems to apply a target curve, but very dynamically, similarly to a dynamic EQ.
You can see in real time which frequencies it boosts and cuts vs. the target curve.

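Ozone's internals are not public, but the behaviour described above - a target curve applied dynamically, per frame - can be roughly approximated with an STFT: compare each frame's spectrum against a tilt target and nudge every band partway toward it. A toy numpy/scipy sketch; the +6 dB/octave tilt matches the preset above, while the 1 kHz pivot, the correction strength and the ±6 dB limit are invented for illustration:

```python
# Toy dynamic "target curve" EQ -- a guess at the concept, not iZotope's Clarity.
# Requires: pip install numpy scipy soundfile
import numpy as np
import soundfile as sf
from scipy.signal import stft, istft

x, fs = sf.read('input.wav')
if x.ndim > 1:
    x = x.mean(axis=1)              # fold to mono for simplicity

nperseg = 2048
f, t, Z = stft(x, fs, nperseg=nperseg)

# Target curve: a spectral tilt of +6 dB/octave around a 1 kHz pivot (assumed).
tilt_db = 6.0 * np.log2(np.maximum(f, 1.0) / 1000.0)

mag_db = 20 * np.log10(np.abs(Z) + 1e-12)
frame_avg = mag_db.mean(axis=0, keepdims=True)   # per-frame level reference

# Per frame and per bin, move partway toward the target and clamp the
# correction, so the EQ stays gentle and content-dependent.
error_db = (frame_avg + tilt_db[:, None]) - mag_db
gain_db = np.clip(0.3 * error_db, -6.0, 6.0)

Z_eq = Z * 10 ** (gain_db / 20.0)
_, y = istft(Z_eq, fs, nperseg=nperseg)

y /= max(1.0, np.max(np.abs(y)))   # simple clip guard
sf.write('clarity_sketch.wav', y, fs)
```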



I do believe that plugins like these are the future - at least for me, they have actually elevated the listening experience to a new level.
I will try the Impact plugin later today; I am very interested in what it will do to the lows.

If any of you have experience with these or similar 'AI'-based plugins, please share - it would be great to keep a finger on the pulse of these new technologies.
 
The newer dynamic EQ and multi-band expander/compressor plugins have some very automated modes that make them quite easy to use in real-time playback with any player or audio editor that can handle VST plugins. One of the more capable automated EQ plugins I've found is FabFilter Pro-Q4, which I've used with both the foobar2000 freeware player and the open-source Audacity editor:


More on using both Pro-Q4 and Pro-MB (multi-band expander/compressor) in this thread.

I typically "roll my own" in terms of plugin settings, because the results are far more successful in keeping damaging effects down while enhancing the listening experience.

Chris
 
I am watching new developments with interest. About 1/3 of my music is historic recordings, i.e. recorded in the 1920s to 1950s. They are all mono, dynamically limited, scratchy, and low-res. I would love to see what AI can do about that.
 
I am watching new developments with interest. About 1/3 of my music is historic recordings, i.e. recorded in the 1920s to 1950s. They are all mono, dynamically limited, scratchy, and low-res. I would love to see what AI can do about that.
I would strongly recommend checking out Izotope's plugins, they have a bunch of those, especially for 'stereoizing', de-noise, de-crackle and upward dynamic expansion (something similar to what Chris was mentioning above with Pro-Q4)
The beauty of using plugins is that you can choose from literally thousands of solutions, not just those that were already mentioned above
 
I am watching new developments with interest. About 1/3 of my music is historic recordings, i.e. recorded in the 1920s to 1950s. They are all mono, dynamically limited, scratchy, and low-res. I would love to see what AI can do about that.
What would you predict as a result? I imagine that there would be virtually nothing such tools could do because of the limited information that was captured when the music was recorded. Perhaps I'm being overly pessimistic?
 
What would you predict as a result? I imagine that there would be virtually nothing such tools could do because of the limited information that was captured when the music was recorded. Perhaps I'm being overly pessimistic?
If they can separate the sound into individual instruments, like our brains can, then they can mix the resulting individual mono tracks into a stereo mix with a standard arrangement, e.g. vocals in the center, etc. This seems far-fetched for now.
 
...I would love to see what AI can do about that...
Perhaps I'm in a minority on this point, but I've found that it still takes an ear and some idea of the root issues involved with any music track; for popular music, I've found that each track on an album is separately "produced" (a.k.a. compressed, clipped and custom re-EQed) from the original recorded tracks.

With limited knowledge, you can only achieve limited results.

What would you predict as a result? I imagine that there would be virtually nothing such tools could do because of the limited information that was captured when the music was recorded. Perhaps I'm being overly pessimistic?
I've found that basically all that can be done is noise removal, clip fixing (to eliminate the artificially induced 3rd and higher odd harmonics produced during mastering limiting operations), re-EQing to provide an inverse EQ filter to bring out and re-balance what's already there in the recording, and expansion to restore some of what was lost during mastering compression operations.
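Of those operations, upward expansion is probably the easiest to illustrate: follow the signal envelope and add gain above a threshold to partially undo mastering compression. A bare-bones sketch - the threshold, ratio and release time are illustrative values, not tuned settings:

```python
# Minimal upward expander sketch -- illustrative values, not a mastering tool.
# Requires: pip install numpy soundfile
import numpy as np
import soundfile as sf

x, fs = sf.read('compressed.wav')
if x.ndim > 1:
    x = x.mean(axis=1)

# One-pole envelope follower (~50 ms release).
alpha = np.exp(-1.0 / (0.050 * fs))
env = np.zeros_like(x)
level = 0.0
for i, s in enumerate(np.abs(x)):
    level = max(s, alpha * level)   # instant attack, smooth release
    env[i] = level

threshold_db = -30.0
ratio = 0.8   # < 1.0 means material above threshold gets pushed up further

env_db = 20 * np.log10(env + 1e-9)
over = np.maximum(env_db - threshold_db, 0.0)
gain_db = over * (1.0 / ratio - 1.0)    # extra gain above the threshold
y = x * 10 ** (gain_db / 20.0)

y /= max(1.0, np.max(np.abs(y)))
sf.write('expanded.wav', y, fs)
```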

Otherwise, what you're wanting is to use the tracks to produce base tracks upon which someone builds new tracks to mix with the original track(s). Stereo is recorded, not "enhanced" from monophonic recordings. That was tried in the early 1960s--and it never did sound very good.

Chris
 
I think there are multiple directions this thread might cover:

- restoring old recordings (as mentioned by Keith and others) using SOTA technology, eventually involving AI
- using the same or similar plugins to elevate your listening experience in real time (this is what I was referring to in my original post), again eventually using AI-powered algorithms

I think both are very interesting, but they are slightly different.
 
I am watching new developments with interest. About 1/3 of my music is historic recordings, i.e. recorded in the 1920s to 1950s. They are all mono, dynamically limited, scratchy, and low-res. I would love to see what AI can do about that.
Others have mentioned expansion and de-noising.
There's a piece of software called "Capstan" from Celemony (the Melodyne people) for removing wow, flutter and other time and pitch artefacts. Some of the outcomes from using that tool are mighty impressive. I think you might be able to get a free trial of it...
Not AI, just good ol' fashioned algorithms...
 
One of the more insidious processes used on popular music tracks of the past ~50 years is the "Aphex Aural Exciter", applied to recordings up through ~1980-81. This was supposedly only applied to vocal tracks, but it was ubiquitous on very successful artists' albums, such as:

[in] the 1970s, certain recording artists, including Anne Murray, Neil Diamond, Jackson Browne, The Four Seasons, Olivia Newton-John, Linda Ronstadt and James Taylor stated in their liner notes "This album was recorded using the Aphex Aural Exciter."

What I've found is that when listening to these tracks using today's higher resolution hi-fi setups, the vocals actually sound damaged, like they were recorded with faulty analog tape recorders or blown microphone capsules. If there are any reverse-engineering plugins to partially undo the damage that was done by these "exciter" devices, that would be nice to have for restoration efforts.

One complication is that these devices were said to be used only on the individual vocal tracks, so in order to undo the effects of an exciter, it seems you would need access to the original multitrack studio recordings (used for building the stereo downmixes) in order to work on the vocal stems alone.

I've not yet seen any "remastered" albums that claim to have undone Aphex Aural Exciter damage to the tracks, unfortunately.
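For what it's worth, the exciter topology as it is commonly described (I don't have Aphex's actual circuit) is: high-pass the signal, run it through a nonlinearity to generate new harmonics, and mix a little of that back in. A rough sketch of that generic idea is below; it also shows why the process can't be cleanly reversed - the added harmonics are new, signal-dependent content, not a linear filter that can be inverted:

```python
# Rough sketch of a generic "exciter" -- not Aphex's proprietary design.
# Requires: pip install numpy scipy soundfile
import numpy as np
import soundfile as sf
from scipy.signal import butter, lfilter

x, fs = sf.read('vocal.wav')
if x.ndim > 1:
    x = x.mean(axis=1)

# 1) Isolate the top of the spectrum (pivot frequency assumed).
b, a = butter(2, 3000 / (fs / 2), btype='highpass')
hp = lfilter(b, a, x)

# 2) Nonlinearity: generates harmonics that were never in the original.
excited = np.tanh(4.0 * hp)

# 3) Mix a small amount back in with the dry signal.
y = x + 0.15 * excited
y /= max(1.0, np.max(np.abs(y)))
sf.write('vocal_excited.wav', y, fs)
```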

Chris
 
One of the more insidious processes used on popular music tracks of the past ~50 years is the "Aphex Aural Exciter", applied to recordings up through ~1980-81. This was supposedly only applied to vocal tracks, but it was ubiquitous on very successful artists' albums, such as:



What I've found is that when listening to these tracks using today's higher resolution hi-fi setups, the vocals actually sound damaged, like they were recorded with faulty analog tape recorders or blown microphone capsules. If there are any reverse-engineering plugins to partially undo the damage that was done by these "exciter" devices, that would be nice to have for restoration efforts.

One complication is that these devices were said to be used only on the individual vocal tracks, so in order to undo the effects of an exciter, it seems you would need access to the original multitrack studio recordings (used for building the stereo downmixes) in order to work on the vocal stems alone.

I've not yet seen any "remastered" albums that claim to have undone Aphex Aural Exciter damage to the tracks, unfortunately.

Chris
I have tried the Aphex Aural Exciter as a VST plugin and it indeed sounds horrible - so I can totally understand what you are referring to.
 
There's a piece of software called "Capstan" from Celemony (the Melodyne people) for removing wow, flutter and other time and pitch artefacts. Some of the outcomes from using that tool are mighty impressive. I think you might be able to get a free trial of it...
The free trial is full-featured, but you cannot save or export the results. I have been looking at this software for a while now, but it is rental-only at $199 for 5 days' use.
 
The free trial is full-featured, but you cannot save or export the results. I have been looking at this software for a while now, but it is rental-only at $199 for 5 days' use.

So it's like Aphex was in the 1970s: pay by the glass...and those were very expensive glasses of processed audio. Not very useful (and extremely greedy, I might add).

Otherwise, what you're wanting is to use the tracks to produce base tracks upon which someone builds new tracks to mix with the original track(s). Stereo is recorded, not "enhanced" from monophonic recordings.
Sorry to self-quote... A lot can be done to synthesize new recordings, in a similar fashion to the video AI of synthetic popular music singers:


I have reason to believe that's the type of thing that initiated this thread. Understand that there have been extensive restoration efforts in the past. Witness the following "re-recordings":


The effort was reportedly quite high to accomplish these "re-recordings". The enabling element was the Yamaha Disklavier (already in existence for several years) and some fairly tedious conversions of old recordings into Disklavier input files. It's nice to hear Glenn Gould without all the grunting and off-key humming, as well as Sergei Rachmaninoff and Art Tatum in stereo without the noise and distortion of the old recordings on piano (see quote below for more).

The real issue, however, is doing it all synthetically--leaving out the physical musical instruments and the human players altogether. What is the "sound quality" of the finished product, and will people come back to hear it again and again for musical pleasure, i.e., will it be "real music"?

One of the more interesting quotes that I recently read from Toole's book (1st Ed., pgs. 8-9):

...The early recordings that were formative in the development of jazz also suffered from recording difficulties. Spectral and dynamic limitations did not flatter instruments like pianos and drums, so substitute instruments were used. Live performances came to reflect some of these substitutions and sometimes even playing styles. For example “slap” bass playing was a means of minimizing bothersome low-frequency output but retaining some of the essential sound of the upright bass. Said Katz (2004), “The bass drum was a troublemaker even into the 1950s” (p. 81). (Wrapping it in a blanket was a common studio remedy.)

Katz explained as follows:
"Whether in France, the United States, or anywhere in the world, most listeners who knew jazz knew it through recordings; the jazz they heard, therefore, was something of a distortion, having been adapted in response to the nature of the medium. The peculiar strengths and limitations of the technology thus not only influenced jazz performance practice, it also shaped how listeners—some of whom were also performers and composers—understood jazz and expected it to sound." (p. 84)

Recordings also had significant effects on classical music. In a live performance, we wait intently while a musician pauses, lifts a bow, and leans forward to continue a work. In a recording, lacking the visual input, such a delay is “dead air.” Recorded music has, accordingly, better continuity. When sound emerges from the violin, it will likely be played with more vibrato than was customary in earlier times. Katz (2004) makes the case that this is linked to influences of sound recording, and he includes a CD of some examples with his book...

The key point to notice here is that the technology affected the performance and even the compositions. Why? A large payoff in the form of bought recordings. What is the payoff now, and will the quality be anything like what was easily available before the advent of these new tools (AIs)? It may be that "lowest common denominator" music that we hear on streaming--on earbuds--will in fact sink a bit lower. I'm not hopeful of the results.

Can it be done (i.e., converting very old low-quality recordings into stereo or multichannel recordings)? Sure. To some extent, it's already been done. The questions I ask are "what is the quality?" and "will people consume it like they presently listen to real recordings produced in real venues by real artists?" What is the payoff period of the investment? For the Zenph people, it turns out that they stopped producing those "re-recordings" just as quickly as they produced them initially.

Chris
 
If they can separate the sound into individual instruments, like our brains can, then they can mix the resulting individual mono tracks into a stereo mix with a standard arrangement, e.g. vocals in the center, etc. This seems far-fetched for now.
Not far-fetched at all; tools like this ("stem separators") have been available for years now.

The problem with bad recordings is not just bad sound, though; it's missing information. To get a modern-sounding recording from an old one, EQ and dynamics are not enough - you probably have to interpolate a lot.
 
What would you predict as a result? I imagine that there would be virtually nothing such tools could do because of the limited information that was captured when the music was recorded. Perhaps I'm being overly pessimistic?
Yes and no. These AI tools don't "restore" the tracks in the sense that they recover hidden/lost information. They "re-interpret" the data based on all their training data, removing what they are trained or told to avoid (noise, crackling, etc.) and adding what they are trained or told to expect (clear vocals, more bass on that trombone, and so on). What you get isn't what it actually sounded like in the studio 70 years ago, but it can still subjectively sound great and make you think "yeah, that's what it could have sounded like back then".

Same thing with AI image restoration, de-noising and all that stuff. In very simple terms, these neural networks are trained on pattern recognition and if trained well, they can add or remove specific pattern types in existing data.
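To make the "trained to add or remove specific pattern types" idea concrete, here is a toy denoiser: a tiny network trained on (noisy, clean) pairs of synthetic signals learns to suppress the noise pattern. Everything here - architecture, data, sizes - is illustrative, and has nothing to do with how any commercial restoration product is built:

```python
# Toy denoising network -- illustrates training on (noisy, clean) pairs.
# Requires: pip install torch
import torch
import torch.nn as nn

def make_batch(n=64, length=256):
    """Synthetic 'music': random sine waves; 'damage': added white noise."""
    t = torch.linspace(0, 1, length)
    freq = torch.randint(2, 20, (n, 1)).float()
    clean = torch.sin(2 * torch.pi * freq * t)
    noisy = clean + 0.3 * torch.randn_like(clean)
    return noisy.unsqueeze(1), clean.unsqueeze(1)   # (batch, channel, time)

# Small 1-D convolutional denoiser.
model = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=9, padding=4), nn.ReLU(),
    nn.Conv1d(16, 16, kernel_size=9, padding=4), nn.ReLU(),
    nn.Conv1d(16, 1, kernel_size=9, padding=4),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(500):
    noisy, clean = make_batch()
    loss = nn.functional.mse_loss(model(noisy), clean)
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 100 == 0:
        print(f'step {step}: loss {loss.item():.4f}')

# After training, the network maps a noisy pattern to its clean version --
# it removes what it was trained to treat as noise, nothing more.
```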
 
For the Zenph people, it turns out that they stopped producing those "re-recordings" just as quickly as they produced them initially.

I have Zenph CDs of the Gould and Rachmaninoff performances. I was hoping they would restore Schnabel's Beethoven performances, and some early Horowitz recordings. But they disappeared. I wonder what happened to them.
 
Yes and no. These AI tools don't "restore" the tracks in the sense that they recover hidden/lost information. They "re-interpret" the data based on all their training data, removing what they are trained or told to avoid (noise, crackling, etc.) and adding what they are trained or told to expect (clear vocals, more bass on that trombone, and so on). What you get isn't what it actually sounded like in the studio 70 years ago, but it can still subjectively sound great and make you think "yeah, that's what it could have sounded like back then".

Same thing with AI image restoration, de-noising and all that stuff. In very simple terms, these neural networks are trained on pattern recognition and if trained well, they can add or remove specific pattern types in existing data.
A good analogy might be the de-aging tech they use on older actors when they need to play younger versions of themselves (The Irishman, The Righteous Gemstones, etc.). It uses the "damaged" version to predict what a "clean" version would look like while preserving as much of the underlying performance as possible.

It's not real, but it can be realistic.

I'm conflicted on the possibility. Robert Johnson did some incredible performances on very rustic recording setups. I'd love to hear him in hi-fi, but a lot of "him" would actually be AI airbrushing. Is that a good thing, or is it selling a piece of our souls for subjective sound quality?

I think using mastering VSTs on existing recordings raises the same question to a lesser degree. What's the line between correcting and enhancing? How much do we care about authenticity and where does it come from in a recording?
 
I'm conflicted on the possibility.
Me, too. I'd prefer to let old recordings stand on their merits, based on the technology available in their era. The cynic in me sees the possibility of cleaning up old music and selling it at the expense of the development and release of new music and artists. Private equity buying up back catalogues does not bode well, in my opinion, for the future of new and emerging talent/genres.
 
Same thing with AI image restoration, de-noising and all that stuff. In very simple terms, these neural networks are trained on pattern recognition and if trained well, they can add or remove specific pattern types in existing data.
When you think about the Zenph "re-performance" process, this is what they are actually doing--adding or removing "patterns" (real piano string strikes and dampers) based mainly on the pitch and amplitude of the piano tones detected in the noisy, distorted input data. So it really is a pattern recognition and re-synthesis problem.

The good thing about pianos is that the Disklavier already exists to perform this re-synthesis, and everyone basically buys into what is occurring. And you only have four variables (see the sketch after the list):

1) pitch
2) strength of note strike
3) sustain
4) soft pedal muffling (typically on or off for a long sequence; not used like the sustain pedal)
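Those four variables map almost one-to-one onto MIDI piano events, which is presumably why the Disklavier route works. A minimal sketch with the mido library; the note, velocity and timing values are invented for illustration:

```python
# The four variables above expressed as MIDI events for a player piano.
# Requires: pip install mido
import mido

mid = mido.MidiFile()
track = mido.MidiTrack()
mid.tracks.append(track)

# 4) soft pedal (una corda) held on for a passage: CC 67
track.append(mido.Message('control_change', control=67, value=127, time=0))

# 1) pitch = note number; 2) strength of strike = velocity
track.append(mido.Message('note_on', note=60, velocity=72, time=0))

# 3) sustain: hold the note, and/or the damper pedal (CC 64)
track.append(mido.Message('control_change', control=64, value=127, time=120))
track.append(mido.Message('note_off', note=60, velocity=0, time=480))
track.append(mido.Message('control_change', control=64, value=0, time=240))

mid.save('reperformance_sketch.mid')
```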

Try to do this with a violin, or virtually any sort of wind, string, or percussion instrument (i.e., non-keyboard), and you begin to see the complexities, or better said, the dimensions inherent in the atomic music elements that can be produced.

So good neural nets and good training will tend to collapse the problem space in translating between noisy, distorted inputs and high-quality, believable substitutions for the original signals.

But they disappeared. I wonder what happened to them.
A good guess would be: lack of demand (expressed as revenue) and profit.

Another issue is performance rights held by some trust representing the original artist(s)--and the royalties they are trying to command. It seems like the longer time marches on, the more Congress tries to extend copyright periods--ad infinitum. It was, I believe, excessive to begin with. Now it's comical.

Remember that the albums/recordings that Zenph redid have run out of copyright protection due to age.

Chris
 
Just a quick update from my end: I have fine-tuned both the Clarity and the Master Rebalance plugins, and the result is frankly jaw-dropping.
Since the changes they make are not static but dynamic and content-dependent, it basically does not matter what type of music I play... I have never had such an experience before.
 