
"Things that cannot be measured"

Again, it's possible to go from the time domain to the frequency domain with Fourier and lose nothing, no matter what the signal is. It doesn't have to be steady state. Look up the FT of an impulse or step; they're used in engineering as often as sines. The video was supposed to explain that. What part of that drawing is steady state?
Here's how it's done, below. Go to 28 minutes to see how sines add up to give a signal that's random numbers; it actually looks like it could be music. Yes, the transforms are periodic, but the period can be 100 years if you want.
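As a minimal sketch of that claim (numpy only; the sample count is arbitrary), a forward and inverse FFT round trip reconstructs even a completely non-periodic random signal to floating-point precision:

```python
# Hedged toy: the DFT round trip loses nothing, even for a non-steady-state
# signal such as random noise (a stand-in for "could be music").
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(48_000)        # arbitrary length, nothing periodic

X = np.fft.rfft(x)                     # time domain -> frequency domain
x_back = np.fft.irfft(X, n=len(x))     # frequency domain -> time domain

print(np.allclose(x, x_back))          # True: reconstruction is exact
```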

Yeah, my point was that you get no help with compression when doing an FFT and inverse on the whole song.
However, one might if one can gate the windows smartly.
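To illustrate that point with a toy sketch (not a real codec; real ones use the MDCT with overlapping windows): transform coding on short blocks concentrates a tonal signal's energy into a few coefficients, which is where the compression comes from. The tone frequencies here are deliberately chosen to align with the block length, purely for illustration.

```python
# Toy energy-compaction demo: block-wise FFT, keep only the largest
# coefficients per block, reconstruct, and check how little is lost.
import numpy as np

fs = 8000
t = np.arange(2 * fs) / fs
# Tones at 250 and 750 Hz: exact bin frequencies for 512-sample blocks.
x = np.sin(2 * np.pi * 250 * t) + 0.4 * np.sin(2 * np.pi * 750 * t)

n = 512
blocks = x[: len(x) // n * n].reshape(-1, n)
spec = np.fft.rfft(blocks, axis=1)

# Keep only the 16 largest coefficients per block (~6% of them).
kept = np.zeros_like(spec)
idx = np.argsort(np.abs(spec), axis=1)[:, -16:]
np.put_along_axis(kept, idx, np.take_along_axis(spec, idx, axis=1), axis=1)
y = np.fft.irfft(kept, n, axis=1).ravel()

# Relative reconstruction error is tiny: the tones' energy sat in a few bins.
err = np.mean((x[: len(y)] - y) ** 2) / np.mean(x[: len(y)] ** 2)
print(err < 1e-6)   # True
```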
 
That's all well known - and has nothing to do with measuring speakers or electronics.
Currently, it (direct measurement of timbre) has nothing to do with measuring loudspeakers because we currently can't measure it. Timbre is indeed well known; it's how a newborn knows its mother's voice (actually, before they are born). But we cannot directly measure it because we don't know all of its characteristics yet. Four or five have been identified since the 60s.

At one time, it was thought that movie audiences preferred a very limited frequency response. Then Harry Olson did his famous experiments and figured out that people preferred a limited frequency range when the reproduction chain was filled with distortion. In blind testing, he determined that people actually preferred full range reproduction if it was distortion-free. Then a lot of work was put into measuring the distortion of loudspeakers. That was the 40s and 50s.

My guess (that's all it is) is that when we have microphones and analyzers sophisticated enough to distinguish one human voice from another (voice print) or one violin from another (timbre changes in an instrument over time), those measurements and analyses could probably be adapted to speaker preference testing in the way that frequency response, directivity, and distortion are now.

In other words, we are currently at 86% correlation; per Dr. Floyd Toole, we still have a ways to go. Will a valid direct measurement of timbre get us closer to 100%?

What I am NOT saying is that our inability to measure timbre renders what we currently have of little or no value (measurements are not an everything-or-nothing affair). As I said from the beginning, the question is whether the fact that we cannot yet measure timbre directly is significant in audio/speaker measurements. "Probably not" is what I said.

Suppose we could measure it to a high enough resolution that the output could say the opera singer in this waveform is Joan Sutherland, the guitar is a pre-war Martin, and the piano is a German Steinway. Humans (some) can identify all of those things right now, blind. Or is it something that most speakers reproduce well enough that it's not going to help refine preference testing?
 
Currently, because that is all we have. Timbre is well known; it's how a newborn knows its mother's voice (actually, before they are born). But we cannot directly measure it because we don't know all of its characteristics yet. Four or five have been identified since the 60s.
Saying it again doesn't make it true. Did you miss post 2081, or are you just ignoring it? There's a list of 10 characteristics of timbre and how to directly measure them. Synthesizers have been using these to recreate instruments for 40 years.
 
Timbre is well known; it's how a newborn knows its mother's voice (actually, before they are born). But we cannot directly measure it because we don't know all of its characteristics yet. Four or five have been identified since the 60s.
I'm not sure what progress has been made on psychoacoustics and the perception of timbre, or how many modes of audio perception are involved... and it's not measurable directly because it's best thought of as a subjective experience and not a quantity. But as @Cbdb2 says, we have all the tools to take timbre apart and put it back together again, and have for decades now.

Case in point, there are multiple types of synthesizer (from physical modeling to subtractive and everything in between) for every category of real instrument and some that are only imaginary.
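That decomposition is easy to sketch. Here is a minimal additive-synthesis toy (all numbers invented): a harmonic recipe plus an amplitude envelope already yields a recognizable instrument-like tone, and changing only the recipe changes the "instrument" while the pitch stays put.

```python
# Minimal additive synthesis: harmonic recipe + envelope = a crude "timbre".
import numpy as np

fs = 8000
t = np.arange(int(0.5 * fs)) / fs
f0 = 220.0                                   # pitch
harmonics = {1: 1.0, 2: 0.5, 3: 0.33, 4: 0.1}  # invented relative strengths

# Simple attack/decay envelope, a crude stand-in for a real instrument's.
env = np.minimum(t / 0.02, 1.0) * np.exp(-4.0 * t)

tone = env * sum(a * np.sin(2 * np.pi * k * f0 * t)
                 for k, a in harmonics.items())
```

Swap in a different `harmonics` dict or envelope shape and the same 220 Hz note takes on a different character, which is the whole game subtractive and physical-modeling synths play with more sophistication.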
 
It seems like you're envisioning a bridge between objective measurements and subjective descriptions, maybe? Or would it be a numerical comparison, like "Speaker A reproduces a violin with 94.2% accuracy" ? Or something else?
My speculation (hypothesis) is that a good, accurate measurement of timbre may get us from 86% correlation on good-sounding speakers to closer to 100%.

It's based on this paper from Dr. Toole (https://www.harman.com/documents/audioscience_0.pdf):

"The simple fact is that, without science, there would be no audio, as we know it. Without extensive and meticulous subjective evaluation to confirm the meaning of measurements, there would be no audio science, as we know it. Without audio science, audio engineering would be a trial and error exercise. Clearly, one must pay close attention to both the objective and subjective forms of product evaluation, because there is still more to learn. The recent work showing strong correlations between predicted (from technical measurements) and real subjective ratings is, perhaps, the strongest evidence that we are on the right track. A faith in the scientific method is not a blind faith. It is a faith built on a growing trust that measurements can guide us to produce better sounding products at every price level, for every application. The final proof, though, is in the listening because, as much as we have come to trust measurements, we must always be alert to new variables. Readers may have noted an absence of references to non-linear distortions. It is not that they are unimportant, but rather that current design practices have reduced them to the point where they are not normally factors in determining listener preferences. However, it does occasionally happen that they are, and that is another reason to listen. In the meantime, the search for more meaningful ways to quantify non-linear distortion continues. Present measures of harmonic and intermodulation distortion are crude in the extreme, exhibiting very poor correlation with how devices sound with music." [Emphasis is mine]

If we could measure timbre directly, would it move the ball forward? Would it lead to the discovery of a new variable? Encompass a more meaningful way to quantify non-linear distortion? If we knew what the characteristics of timbre were, and were able to analyze it with high enough resolution to associate a recorded voice with a particular individual (like a baby can with its mother), or a Telecaster with a Stratocaster, a pre-war Martin with a modern one, could that be carried over to speaker measurements?

I don't know if the measurement would be a percentage of accuracy. But if it led to something that advanced the science on preference, on what makes a measurably good speaker, and refined the predicted preference score so that a 7.2 was significantly better than a 6.8 to a statistically significant number of people in blind testing, I do know what I would like them to call it: the Toole Factor.
 
Directly testing for timbre would allow you to identify a specific person from a waveform - which we are not close to being able to do.

I'm sure there is software to identify who is or is not speaking, when it has a sample database of that person with which to compare.
 
If we could measure timbre directly, would it move the ball forward? Would it lead to the discovery of a new variable?
There's no way to measure Timbre directly, and never will be, because it's a word we use to describe the subjective experience of many variables at once.
It is no different than wanting to measure flavor directly.

What would that mean?

"I have measured the flavor of this liquid, and the flavor is ketchup."

"I have measured the timbre of this sound, and it's a violin with a slightly raspy upper register."

What more can you say about Timbre or flavor? You can't put a number on it, just say what it is.

If we knew what the characteristics of timbre were, and were able to analyze it with high enough resolution to associate a recorded voice with a particular individual (like a baby can with its mother),
This already exists. Voice print detection is nothing new.
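The underlying idea can be sketched in a few lines. This is a deliberately crude toy, nothing like a production speaker-ID system (real ones use MFCCs, formant tracks, embeddings, etc.); the synthetic "voices", band count, and noise level are all invented:

```python
# Toy voice-print idea: reduce each recording to a coarse spectral
# "signature", then compare signatures by distance.
import numpy as np

def signature(x, n_bands=24):
    """Crude spectral signature: mean log energy in coarse frequency bands."""
    power = np.abs(np.fft.rfft(x)) ** 2
    bands = np.array_split(power, n_bands)
    return np.log(np.array([b.mean() for b in bands]))

def distance(a, b):
    return np.linalg.norm(signature(a) - signature(b))

rng = np.random.default_rng(1)
t = np.arange(8000) / 8000.0           # 1 s at a hypothetical 8 kHz rate

def tones(f0, amps):
    """Harmonic series on fundamental f0 with given relative amplitudes."""
    return sum(a * np.sin(2 * np.pi * f0 * (k + 1) * t)
               for k, a in enumerate(amps))

# Two synthetic "voices" with different fundamentals and harmonic balance,
# each take with its own recording noise.
voice_a  = tones(120, [1.0, 0.5, 0.2]) + 0.05 * rng.standard_normal(len(t))
voice_b  = tones(180, [1.0, 0.2, 0.4]) + 0.05 * rng.standard_normal(len(t))
voice_a2 = tones(120, [1.0, 0.5, 0.2]) + 0.05 * rng.standard_normal(len(t))

# A second noisy take of voice A matches A's signature better than B does.
print(distance(voice_a, voice_a2) < distance(voice_a, voice_b))  # True
```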

could that be carried over to speaker measurements?
I'm objecting a lot to what you're asking, but I see what you're saying. Being better able to quantify why people react to certain small differences in the sound of instruments or voices could give us better tools to quantify speaker sound quality. Agree.

I just don't think a measurement of "timbre" is conceptually possible, but the things that drive perception of timbre are the same things that drive perception of speaker sound quality, or as Toole puts it,

the search for more meaningful ways to quantify non-linear distortion continues.
 
I'm not sure what progress has been made on psychoacoustics and perception of timbre, and how many modes of audio perception are involved... and it's not measurable directly because it's best thought of as a subjective experience and not a quantity, but as @Cbdb2 says, we have all the tools to take timbre apart and put it back together again, and have for decades now.

Case in point, there are multiple types of synthesizer (from physical modeling to subtractive and everything in between) for every category of real instrument and some that are only imaginary.
We do have the tools to take apart timbre in water (acoustic signatures), which is where my experience lies. Those measurements are on specific sounds that human perception cannot distinguish; they are buried in the recording. A recording would get you a very precise spectrogram but isn't going to tell you much. There are also other measurements of timbre, precise enough that you can program a mine to arm or disarm for "friendly" vessels. The same goes for torpedoes. Those measurements provide a 99.9+% correlation that what the mine/torpedo is "hearing" correctly identifies a good guy or a bad guy. God help you if it says you are a bad guy.

We may have the tools to take timbre apart on some of the known characteristics, but we don't know all of them. What we can measure currently isn't of sufficient resolution to render it useful.

For example, they have been working on "voice prints" for 50 years or more. However, no court allows an expert, FBI or otherwise, to come into court and say: "I have the 911 call recording of the anonymous caller; we compared that with a sample we obtained from the defendant, ran them through our program, and they are a match." That testimony isn't allowed because the measurements are not precise enough, and the characteristics are not fully enough identified, for an expert to say it meets the threshold for forensic evidence (a reasonable degree of scientific certainty). However, a lay person who is familiar with the defendant, like a family member or co-worker, is allowed to give a lay opinion and say, "I listened to the 911 call, I have worked/lived with the defendant for 10 years, I'm familiar with the defendant's voice, and it's my opinion that the person who made that call is the defendant."

My original post on this was a simple response to the question of the original post: are there things in sound that we cannot measure? Yes, there are; we can't directly measure timbre because not all of the characteristics of timbre are known. I also said I didn't think it really mattered much, in terms of audio (speaker) evaluation, that we couldn't.

My guess, knowing the precise capabilities of what they are able to get a hydrophone to measure on timbre/acoustic signatures, is that if that could be carried over to audio, it could lead to those missing variables that Dr. Toole has mentioned, to get us from 86% closer to 99%, or might move distortion measurements of loudspeakers from being "crude in the extreme" to something more meaningful.
 
I'm sure there is software to identify who is or is not speaking, when it has a sample database of that person with which to compare.
You would think, but not so far. At least not in court.
 
For example, they have been working on "voice prints" for 50 years or more. However, no court allows an expert, FBI or otherwise, to come into court and say: "I have the 911 call recording of the anonymous caller; we compared that with a sample we obtained from the defendant, ran them through our program, and they are a match."
I think we commented past each other and my previous comment addresses a lot, but, are you sure?


the majority of the courts which have considered the question have ruled that voiceprint evidence is admissible. See United States v. Smith, 869 F.2d 348, 351 (7th Cir. 1989); United States v. Williams, 583 F.2d 1194 (2d Cir. 1978), cert. denied 439 U.S. 1117 (1979).

And this was with tech from 30+ years ago. You can buy voice print-based authentication off the shelf. There are a ton of vendors offering it. We are way beyond self-proclaimed experts eyeballing spectrograms.


My guess, knowing the precise capabilities of what they are able to get a hydrophone to measure on timbre/acoustic signatures, is that if that could be carried over to audio, it could lead to those missing variables that Dr. Toole has mentioned, to get us from 86% closer to 99%,

I think these are actually two different approaches to two different things.

Detecting the sonic signature of something using a machine relies on whatever quantities that are different between samples and that the machine is most sensitive to. This may or may not have anything to do with how people hear.

What Toole is talking about is identifying quantities that humans are most sensitive to, in the context of audio reproduction.

As I read it, Toole is talking about drilling down into specific modes of distortion that may be less easy to measure or that haven't been studied as extensively with experiments using human listeners.

I guess a more detailed machine analysis of Timbre could reveal things to focus on in those experiments, but again, only using human input on what they're hearing.
 
There's no way to measure Timbre directly, and never will be, because it's a word we use to describe the subjective experience of many variables at once.
It is no different than wanting to measure flavor directly.
You would be measuring the acoustic signature of a voice or an instrument. In human perception we call it timbre, but scientifically it's an acoustic signature that is capable of being measured directly, in an accurate and repeatable way. That's only because in certain areas the need was pressing enough and there was enough money available to apply to it. The University of Texas gets over $100 million a year from the US Navy to study "acoustics" related to "national security." If Dr. Toole (and Dr. Olive) had had that kind of budget at the NRC or at Harman, I'm sure they would have been able to take distortion measurements from being "crude in the extreme" to something more useful, as well as identified more variables.

I have some doubts about whether the ability to produce precise acoustic signatures from identified timbre characteristics will end up being useful for speaker measurements, and ultimately for the preference and quality of a loudspeaker. When was the last time you heard someone put on Frank Sinatra and said, "I can't tell if that's Frank or Dino"? It doesn't matter if it's a TV speaker or a 6" oval dash-mounted speaker in a '66 Buick; I have never experienced that. However, a better speaker does help determine who is taking the lead when Luciano Pavarotti, Placido Domingo, and Jose Carreras are singing as "The Three Tenors," so maybe that degree of resolution of an acoustic signature would lead to something? I don't know.

I'm quite confident that Harry Olson ran into opposition when the "state of the art" (circa 1946) was that speakers should have a high-frequency cutoff of 5 kHz. He suspected something else was going on, and when he was given the RCA lab to run, he was able to figure it out. It wasn't that people preferred a high-frequency cutoff; they just didn't like distortion, and the 5 kHz cutoff made it more tolerable. (If he added distortion to full-range music, the audience went back to preferring the cutoff.) That launched the pursuit of "high fidelity" and gave audio pioneers the incentive to invest capital into finding it.

It's sort of like an "unknown known." (I think the well-known quote was about the purpose of an agency being to evaluate "unknown knowns," or "the things you think you know that it turns out you did not," or something close to that.)

To sum up where I am at: the measurement of loudspeakers (audio) has followed a path consistent with all scientific and engineering pursuits that follow the scientific method. It continues to move forward, sometimes in leaps and bounds. It uncovers dogma that was just plain wrong (listen to Dr. Toole describe what happened when he told the engineers in the lab at JBL that he wanted off-axis speaker measurements). Just when you think something is "known" and fully measurable, something comes along to either create a paradigm shift (less likely now) or move things ahead a step or two. It likewise tracks with how advances in one scientific discipline create improvement in others (measurements and studies of hearing aids carrying over to loudspeakers, voice intelligibility in movie theaters to many aspects of audio, pro cinema to home cinema, as examples).

I do understand that those of us in the "objective" camp run into the tired and misused response of "not everything can be measured." I'm not saying that the fact that the characteristics of timbre are not identified sufficiently to allow direct measurement, and thus useful acoustic signatures, diminishes in any way what science has brought us on the general characteristics of a "good sounding" loudspeaker in almost any room. It doesn't.
Machines can also recognize the baby's mother's voice, so timbre can be measured. FFT.
I wasn't aware of this. Last I heard, voice biometric identification was a good ways off. That means taking a known sample, when you open a bank account for example, and then when you call up and are asked to identify yourself, you speak what the prompt tells you to say and the computer says, "Yep, this is the dude, talk to him." I haven't seen it, and they have been wanting it for quite some time, so it means they are pretty close, if not there already. How long it will take to filter down, who knows.
 
I wasn't aware of this. Last I heard, voice biometric identification was a good ways off. That means taking a known sample, when you open a bank account for example, and then when you call up and are asked to identify yourself, you speak what the prompt tells you to say and the computer says, "Yep, this is the dude, talk to him." I haven't seen it, and they have been wanting it for quite some time, so it means they are pretty close, if not there already. How long it will take to filter down, who knows.
Banks are using the technology now.
 
I think we commented past each other and my previous comment addresses a lot, but, are you sure?
Yeah, I'm sure. What you cited was the DOJ position until the National Academy of Sciences report came out (2009?), which confirmed that it was junk science. As a result of the report, new standards were issued. If you want to take the deep dive into it, I can post all of the reports, etc. Attached is the current draft standard, and below is the significant language about how trashy this all was. The report, and the quote below, are from 2023. To get a voice identification in now, you have to be certified and accredited in phonetics. You have to use both a spectrogram and (wait for it) listen to the output and, based on education, training, and experience, be able to discuss the differences in vowel pronunciation and other characteristics and give an opinion on whether it is the same speaker or not. However, courts will not let that expert say "it's a match." They now limit the opinion to "inconclusive," "cannot be excluded as the speaker," or "excluded as the speaker."

ASR, also mentioned in the standard, is Automated Speaker Recognition; it gives a numerical score on the probability of a match. As far as I know, it hasn't been admitted anywhere; it is not admissible in Texas.

The science is there to do it, and I'm sure that if the private sector has figured out a way to monetize it, they have put in the money to do so for voice identification. There was an article in Science (the world's premier peer-reviewed publication) on the study and identification of the timbral properties of certain species of wood, in order to identify which woods are better for musical instruments and to identify specific wood blanks as having superior timbral qualities. What they could tell you about wood was quite extraordinary.

From the attached proposed standard (emphasis is mine):

5.3.1 The so-called "voiceprints" were the product of sound spectrography, a technology carried forward even to the present day, which is still of great utility to speech scientists. The most notable scientific failing of the voiceprint method was that it did not provide examiners with the vocal output (i.e., the audio). This inevitably obscured the phonetic nature of the patterns of acoustic energy and reduced the analysis to a simple pattern-matching exercise. Nevertheless, that exercise was initially heralded with outsized claims of success. The article that introduced the voiceprinting method in 1962 reported that phonetically naïve examiners were able to identify a target voice with 99% accuracy, even from a pool of a dozen speakers. Not surprisingly, this new methodology soon caught the attention of law enforcement, and was presented as evidence in a number of criminal prosecutions, in the US and elsewhere.

5.3.2 However, the scientific community remained skeptical. Well-known phoneticians such as Peter Ladefoged and Harry Hollien reported that mere pattern matching (which is all that the young voiceprint examiners were asked to do) was incapable of yielding the astonishing results reported in the 1962 article. Due to the variability present in speech productions from sample to sample, spectrographic template matching is not effective and it is inconsistent in speaker recognition work. In time, phoneticians began to provide expert testimony against the admissibility of voiceprint evidence, and in consequence, a number of lower-court convictions were eventually overturned. In response to these criticisms, another academic linguist, Oscar Tosi, initiated a more rigorous, and procedurally transparent, voiceprint study (Tosi et al. 1972). This yielded less vertiginous, but more scientifically reliable results – 6% false identification errors and 13% false elimination errors, under laboratory conditions. Still, given the high stakes of introducing a still largely unsupported procedure into courts of law, a report issued by the National Research Council concluded that the voiceprint method lacked an adequate scientific basis for estimating reliability in many practical situations, pointing out in addition that laboratory evaluations of the voiceprint method showed increasing errors as the conditions for evaluation moved toward real-life situations, such as poor signal-to-noise ratios and dissimilar recording conditions.

5.3.3 The Federal Rules of Evidence, adopted in 1975, further challenged the voiceprint methodology by shifting the standards for admissibility in favor of practitioners whose "scientific, technical, and other specialized knowledge" can help the trier of fact "to understand the evidence or to determine a fact in issue." Phonetically untrained voiceprint examiners, who sought to identify speakers simply by looking at pictures of their voice signals, were left at a marked disadvantage.
 

Attachments

Incidentally, this has been common practice in PA technology for a long time.
Really glad you mentioned this.

It was the standard, not just the practice, for PA and professional cinema since at least 1982; speakers from all the major players, JBL included, came with polar plots if they were capable of producing them. Yet when Dr. Toole started at JBL/Harman, he had a tough fight to get the consumer audio people to measure off-axis. (It's a really great story he tells about this.) Their attitude was "on-axis ±3 dB is all we need."

Controlled, consistent directivity as an important vs. unimportant factor in speaker design has been discussed by the AES for decades. Paul Klipsch said it was an essential factor and published on it in the 70s. A Directivity Index of certain values was a specified requirement of Holman's for professional cinema speakers to be certified as THX, at least as far back as the late 80s. THX specifications are proprietary, so they are not readily out there to see. This is what I was referring to in a post up above: in most scientific endeavors, there are advancements in one branch of a discipline that carry over to another. There are currently no standards for DI in consumer speakers, just "smooth." Holman/THX is the only one I'm aware of who created minimum (or maximum, depending on how you look at it) specs for DI.

There was a fair amount of disagreement at the time Holman came up with those specs about how important directivity was. Still, today, the official position of the AES on directivity is that more study is needed to come up with precise conclusions dependent upon the listening environment. There is a special technical group that deals with this. https://aes2.org/audio-topics/preferred-loudspeaker-directivity-2/ has the current position and lists 4 or 5 papers in chronological order on the subject.

Like all science, as much as we know on certain things, there is still room for refinement.
 
I think these are actually two different approaches to two different things.

Detecting the sonic signature of something using a machine relies on whatever quantities that are different between samples and that the machine is most sensitive to. This may or may not have anything to do with how people hear.

What Toole is talking about is identifying quantities that humans are most sensitive to, in the context of audio reproduction.

As I read it, Toole is talking about drilling down into specific modes of distortion that may be less easy to measure or that haven't been studied as extensively with experiments using human listeners.

I guess a more detailed machine analysis of Timbre could reveal things to focus on in those experiments, but again, only using human input on what they're hearing.
Agree with all of that. And, as I said, even if a device, machine, or processing software could do it with 100% accuracy, there are admittedly several steps between that and using it in some practical way to determine whether it can reveal something that results in a better/improved predictor of preference.

Part of me says you can pretty much identify any singer you are familiar with right off the bat, no matter how good or bad the speaker/room measures, so it will make no difference. On the other hand, subjectively, what makes a high-scoring speaker sound better is the resolution. You can, recording dependent, hear the instruments and vocals better (or at least that is the subjective result of objectively better-measuring speakers). So are we also catching more timbral aspects? If so, is that a dependent variable of FR and DI? If it is, then it will not make a dime's worth of difference to speaker measurements. If it's an independent variable, and you can add or subtract the "something" that's been identified during blind testing, then you will know whether it objectively sounds better, in the same way they can add or subtract distortion or group delay in blind testing.

Maybe it's been done in some respect; I'm not aware of it. Maybe there will be more in the 4th edition. More on timbre with every edition is my recollection.
 
And, as raised earlier, there is the question of strict accuracy of timbre to the original sound, apart from any preference effects. Nonetheless, the *signal* that alters timbre perceptions lies within frequency and amplitude as reproduced over time. Right?

It seems to me that signal accuracy necessary to timbre accuracy within the signal chain from mic input to speaker terminal should be covered in the standard kinds of distortion measurements. Transducers have much larger inaccuracies within which a timbre-altering signal distortion might lurk.

Disagree?
No, I agree completely; I wish you had posted it way earlier :). It's why Frank Sinatra sounds like Frank and not Dino no matter what you hear it on: the transistor radio I got for Christmas in '66 with the hard white plastic earpiece and a 1" speaker, the single 4" oval dash speaker in the New Yorker, the Ampex "Hi-Fi" console, or the 3" speaker in the TV. Voice timbre seems to be unaffected by anything.

The transducers may make timbre, voice or instrument, a dependent variable in terms of objective sound quality and thus a dead-end.

I need to go back and review Toole's 3rd edition and search for everything he mentions on timbre/timbral; I know he has written on it, but in a different context. He may have already covered it, but I don't think so.
 
Rick "no way any speaker can propagate sound in a room like a tuba" Denney
I would pay money to hear that. Especially if there was a blind testing of a tuba vs. Sousaphone at the end.
 
FFTs are absolutely essential to voice recognition, and are what would allow an algorithm to discern whether an input sound matches some expected characteristic like an overtone/harmonic fingerprint (voiceprint?). I agree the user could have spelled out a little more what they have to do with one another. What makes tone/timbre/voice recognition difficult is knowing how to interpret the captured information; the relatively straightforward part is the capturing. And the first step in interpreting it is almost always going to involve running an FFT.

Actually, most voice recognizer systems are built using various methods that capture the formants of the voice, often using LPC, and then a variety of much more advanced techniques based on an estimate of both the smoothed spectrum of the voice and the changes in that smoothed spectrum. Now, it's been a few years since I was involved in this sort of thing, but the advances by and large are not in the original analysis, but rather in the later processing of autocorrelations, cepstrums, partial loudnesses, etc.
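For readers who haven't met LPC: a textbook-toy sketch of the autocorrelation method with the Levinson-Durbin recursion, recovering a known all-pole resonance. The filter coefficients and "formant" location here are invented for the demo; real recognizers layer cepstra, deltas, and more on top of this.

```python
# Toy LPC: fit an all-pole model to a signal's autocorrelation; the peak of
# the smoothed spectrum 1/|A| approximates a formant.
import numpy as np

def lpc(x, order):
    """LPC via the autocorrelation method and the Levinson-Durbin recursion."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + a[1:i] @ r[i - 1:0:-1]
        k = -acc / err                        # reflection coefficient
        a_new = a.copy()
        a_new[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a_new[i] = k
        a = a_new
        err *= 1.0 - k * k
    return a

# Synthesize "speech" as white noise through a known all-pole (AR-2) filter
# with a resonance near 0.1 of the sample rate.
rng = np.random.default_rng(0)
e = rng.standard_normal(20_000)
true_a = np.array([1.0, -1.5857, 0.9604])     # poles at radius 0.98, angle 0.2*pi
x = np.zeros(len(e))
for m in range(len(e)):
    x[m] = e[m]
    if m >= 1:
        x[m] -= true_a[1] * x[m - 1]
    if m >= 2:
        x[m] -= true_a[2] * x[m - 2]

est_a = lpc(x, 2)                             # lands near true_a

# Peak of the all-pole envelope marks the resonance (in cycles/sample).
w = np.linspace(0.0, np.pi, 2048)
A = np.polyval(est_a[::-1], np.exp(-1j * w))
formant = w[np.argmax(1.0 / np.abs(A))] / (2 * np.pi)
print(round(formant, 2))                      # ~0.1
```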
 
Again, it's possible to go from the time domain to the frequency domain with Fourier and lose nothing, no matter what the signal is. It doesn't have to be steady state. Look up the FT of an impulse or step; they're used in engineering as often as sines. The video was supposed to explain that. What part of that drawing is steady state?
Here's how it's done, below. Go to 28 minutes to see how sines add up to give a signal that's random numbers; it actually looks like it could be music. Yes, the transforms are periodic, but the period can be 100 years if you want.
Any signal can be broken down into its even or odd frequency components, no matter the length of the signal. The more samples, or the longer the time window you use, the finer the frequency resolution. This also goes for Cosine Transforms, Sine Transforms, or, with more difficulty, Laplace Transforms. In fact, the choice of tight frames (orthonormal, 1:1 and onto, etc.) is pretty much infinite. You can even use the KLT, which does the best possible diagonalization by calculating the actual eigenstates of the actual signal, although that use is often limited by the need to keep those eigenstates around in order to understand what you just did. :)

HOWEVER, the way to capture timbre is probably not to capture the entire Ring cycle in one huge transform, but rather to capture it as a stream of overlapped short-term spectral analyses using some sort of window. It is these short-term features that need to be analyzed, over perhaps 200 milliseconds for very low frequencies and a few milliseconds for the highest frequencies. (The time window of human hearing is not symmetric, it's close-ish to minimum phase, there are IIR parts; ergo what the "window" is is not entirely simple to explain.)

It is this kind of cochlear analysis that will provide some good sense of timbre, including attacks, impulsive signals, and the like. But I agree that a signal absolutely does not have to have stationary statistics in order to take a Fourier Transform of it. All that's necessary is that it's finite energy/finite length for the Fourier integral to converge properly. For FFT/discrete-time systems, the Fourier Transform (most often done via FFT, of course) is an invertible algebraic transform, full stop.
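The overlapped short-term analysis described above can be sketched with a Hann window at 50% overlap, which satisfies the constant-overlap-add condition, so the frames both expose time-varying spectra and sum back to the original signal (frame size and hop are arbitrary choices):

```python
# Toy STFT/inverse-STFT: Hann window, 50% overlap, overlap-add resynthesis.
import numpy as np

def stft(x, n=256):
    hop = n // 2
    win = np.hanning(n + 1)[:-1]   # periodic Hann: overlapped copies sum to 1
    frames = [win * x[i:i + n] for i in range(0, len(x) - n + 1, hop)]
    return np.array([np.fft.rfft(f) for f in frames])

def istft(spec, n=256):
    hop = n // 2
    out = np.zeros(hop * (len(spec) - 1) + n)
    for i, frame in enumerate(spec):
        out[i * hop:i * hop + n] += np.fft.irfft(frame, n)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)      # stand-in for a non-stationary signal
y = istft(stft(x))

# Away from the first/last half-window, overlap-add reconstruction is exact.
print(np.allclose(x[128:-128], y[128:-128]))   # True
```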
 
It's why Frank Sinatra sounds like Frank and not Dino no matter what you hear it on.
Hi Travis,
If I'm not wrong, the tonal differences between Sinatra and Dean Martin are characterized by:

- Harmonics

- Formants

- Envelope

- Vibrato and modulation

- Phrasing and dynamics

A combination of spectral and time-frequency analysis should make these differences objectively measurable.
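Those cues can indeed be turned into numbers. As a hedged toy (all signal parameters invented, numpy only): synthesize a tone with a known second-harmonic level and a known vibrato, then recover the harmonic balance from one long spectrum and the vibrato rate from a short-term, time-frequency view:

```python
# Toy measurement of two timbre cues: harmonic balance and vibrato rate.
import numpy as np

fs = 8000
t = np.arange(4 * fs) / fs
f0, h2, vib_rate, vib_depth = 200.0, 0.3, 5.0, 3.0     # all invented values
phase = 2 * np.pi * f0 * t - (vib_depth / vib_rate) * np.cos(2 * np.pi * vib_rate * t)
x = np.sin(phase) + h2 * np.sin(2 * phase)             # fundamental + 2nd harmonic

# Harmonic balance from one long spectrum: band energy near 400 Hz vs 200 Hz.
spec = np.abs(np.fft.rfft(x))
f = np.fft.rfftfreq(len(x), 1 / fs)

def band_amp(fc):
    band = (f > fc - 20) & (f < fc + 20)
    return np.sqrt(np.sum(spec[band] ** 2))

ratio = band_amp(400) / band_amp(200)                  # recovers ~0.3

# Vibrato rate from a time-frequency view: track the spectral centroid of the
# fundamental frame by frame, then take the track's dominant frequency.
n, hop = 256, 64
win = np.hanning(n)
ff = np.fft.rfftfreq(n, 1 / fs)
low = ff < 300
track = []
for i in range(0, len(x) - n + 1, hop):
    mag = np.abs(np.fft.rfft(win * x[i:i + n]))[low]
    track.append(np.sum(ff[low] * mag) / np.sum(mag))
track = np.array(track) - np.mean(track)
rate = np.fft.rfftfreq(len(track), hop / fs)[np.argmax(np.abs(np.fft.rfft(track)))]

print(round(ratio, 2), round(rate, 1))                 # roughly 0.3 and 5.0
```

This is exactly the split the post describes: a single spectrum for the steady harmonic recipe, and a stream of short windows for anything that moves, like vibrato or envelope.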

If you like, you can read and experiment a little with the open-source software from the University of Amsterdam >>> Praat.



 