
Schiit's Jason Stoddard on blind testing

M00ndancer

Addicted to Fun and Learning
Forum Donor
Joined
Feb 4, 2019
Messages
719
Likes
728
Location
Sweden
The paper is mostly a demand for better mindfulness and higher quality research from an industry that tends to value extremely specialized knowledge and an economic small-minded sort of practicality. An engineer is more likely to be able to quote you Newton's laws of motion than to have read any of his writings (@andreasmaaan this is what I meant before when I said that textbooks don't help knowledge—most rip out the idea, formula, fact from the context of its invention, and present it as is, without any acknowledgement of what it took to come upon it—sorry I didn't reply before).

 

andreasmaaan

Master Contributor
Forum Donor
Joined
Jun 19, 2018
Messages
6,652
Likes
9,403
The paper is mostly a demand for better mindfulness and higher quality research from an industry that tends to value extremely specialized knowledge and an economic small-minded sort of practicality. An engineer is more likely to be able to quote you Newton's laws of motion than to have read any of his writings (@andreasmaaan this is what I meant before when I said that textbooks don't help knowledge—most rip out the idea, formula, fact from the context of its invention, and present it as is, without any acknowledgement of what it took to come upon it—sorry I didn't reply before).

Amazing post again @pozz, thanks.

On the side point about textbooks, I completely agree, although IMO this is a criticism of how many textbooks are written, not of the format itself. Better textbooks not only give context for the information they contain but also summarise ongoing debates, problems and controversies within the field. (Perhaps this tends to be less common in some fields than in others...)

PS would you consider copy/pasting your thoughts on the paper in the original thread in which it was discussed? Best response to it I’ve seen by far.
 

tuga

Major Contributor
Joined
Feb 5, 2020
Messages
3,984
Likes
4,285
Location
Oxford, England
The paper really doesn't explain anything of the sort. It needs to be read carefully. Foremost, it's a conceptual piece, manifesto-ish, and provides no direct experimental evidence. The way it presents long and short listening sessions is also not intuitive, nor does it say that all these sessions let one hear all kinds of heard differences equally.
View attachment 103974
I highlight this passage: "slow listening could take as long as it would take for the subject to learn a new language, maybe more." I will come back to this claim later. For now, let's add some substance to it. Earlier on, there's a sort of industry criticism:
View attachment 103975
The claim around "anyone interested in sound today" is weak, or at least poorly written. I could generally agree, allowing a lot of slack, since the cognitive aspects of hearing are well-known, as is the increasing standard for "life-like sound" when looking at historical commentary on various recording and reproduction technologies (a little technological leap gets people saying that the music is there with them in the room or the recording has come to life, and later on, after the enthusiasm fades, deficiencies are better appreciated). That's not the same thing as claiming that psychoacoustic encoding is ineffective, or that older codecs are worse than newer ones. This is where the paper should have quoted or looked for research on this issue specifically. It's too bad it didn't.

Regardless, let's take the mention of the 400 ms grey zone at the end of that passage and find a more detailed description:
View attachment 103976
"Phoneme discrimination" comes from linguistics. A phoneme is a unit of uttered language, kind of like a syllable, but related to speaking and hearing rather than written or grammatical language. The difficulty described is like learning to recognize the tonal components of languages like Vietnamese, or to hear the differences in pronunciation of various vowels across regions and accents. One of the standard fields in linguistics is the physiology of language and which parts of the mouth, tongue, throat and nose are used during speech (this research was used to establish minimum standards for intelligibility in communications systems in the 1940s and earlier by Fletcher, e.g., the frequency content of speech). This then used to represent acoustic differences using notation like IPA and technical vocabulary.

Cognitive linguistics steps a bit further back and notes that acoustic differences of vocalization are often not enough to help listeners recognize and pick out phonemes (from the linked paper: "Listeners, who are misinformed about a speaker’s (socio-)linguistic background, are more inclined to perceive the incoming stimulus according to their sociolinguistic expectations than to the acoustic characteristics of the stimulus."). Listeners often need context, like what to listen for, or where the speaker is coming from. It's like trying to understand the accent, grammar and vocabulary of Jamaican or Scottish English if you hail from elsewhere. It's not easy, but after a while in the country it becomes second nature. Same goes for sound and music.

Let's come back to this statement: "Slow listening could take as long as it would take for the subject to learn a new language, maybe more." So, all in all, the paper's emphasis is on learning, the idea that certain perceptions may not be accessible to a listener immediately, but may be easily recognizable later. That's straight out of psychology (note that psychoacoustics is considered a branch of psychology, not its own field), and well-designed experiments record not only subject responses but how those responses change over long periods. (My favorite study on memory, by Luria, took place over 30 years!) Note that the timescale is not defined beyond making this general claim.

As such the paper really doesn't focus on long listening sessions per se, but on what it has taken, historically speaking, to recognize what are now known as commonplace problems in audio. There is really no basis for concluding that the long-term review and impressions-type publications are in the right, or have any validity beyond the accidental or circumstantial. That this paper is used to defend those kinds of uncontrolled listening comparisons is a simple misreading.

This is the paper's conclusion:
View attachment 103992
View attachment 103993
The final sentence is the key. It says: don't take shortcuts in recording, reproduction or manufacturing technologies based on a simple idea of psychoacoustics, like accepting lenient standards for lossy compression or distortion or loudspeaker design. The implication that there are potential issues and differences between gear that we are not completely aware of is a pretty fair conclusion. But note that it does not support or anywhere say that those who are claiming to hear differences are in the right. All it says is not to take the easy way out when it is possible to do better, especially if the current research does not have all the answers about what is acceptable or what isn't. Note this, for example:
View attachment 104004
Seems to be pretty clear cut. Using new knowledge, rigorously test existing industry standards and see if they hold up to the science. If you have to use short listening tests, make sure that:
View attachment 104007
Which means that, as a manufacturer or researcher, you can't conclude that your listeners' reports are reliable until you take their frame of reference and abilities into account, and how you might bias the results by having too narrow a focus when designing the experiment.

The paper is mostly a demand for better mindfulness and higher quality research from an industry that tends to value extremely specialized knowledge and an economic small-minded sort of practicality. An engineer is more likely to be able to quote you Newton's laws of motion than to have read any of his writings (@andreasmaaan this is what I meant before when I said that textbooks don't help knowledge—most rip out the idea, formula, fact from the context of its invention, and present it as is, without any acknowledgement of what it took to come upon it—sorry I didn't reply before).

My first read prompted these few random layman thoughts.

Harman's research seems to indicate that "simpler" music (f.e. Fast Car) with few instruments is the most effective programme for discriminating differences in AB comparisons (chapter 3).
Does that mean that "complex" music (f.e. orchestral) has no use for assessing (particular) playback deficiencies?
What information does "simpler" music provide in regard to how playback handles (the requirements of) complex music?
It also appears as though listeners were trained to focus predominantly (perhaps exclusively) on frequency response errors. If this assumption is correct then perhaps one may question the effectiveness of those listening sessions and the programme used in assessing other issues (f.e. the effects of intermodulation distortion).

It is likely that I didn't look hard enough but I've never come across mention of any research which compares lab listening to domestic listening, nor have I seen any research addressing the role of familiarity on the subject of identifying playback issues.
In my view and anecdotal experience, long-term equipment assessment in one's familiar listening environment with one's own system and music may reveal differences which are not obvious in AB comparisons in unfamiliar conditions.
Long-term assessment relies on familiarity, references against which to compare any potential changes produced by the introduction of a variable in the system. This by the way is the reason why one should never assess two different new pieces of equipment / variables simultaneously.

It looks to me as though this might be what Genelec's thought-provoking piece is trying to address.

Time for another re-read.
 

Wombat

Master Contributor
Joined
Nov 5, 2017
Messages
6,722
Likes
6,463
Location
Australia
Academic stuff.

Far removed from the everyday, real world, home listener.
 

pozz

Слава Україні
Forum Donor
Editor
Joined
May 21, 2019
Messages
4,036
Likes
6,827
PS would you consider copy/pasting your thoughts on the paper in the original thread in which it was discussed? Best response to it I’ve seen by far.
There was a thread on the slow listening paper? Must have missed it.
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,701
Likes
37,441
My first read prompted these few random layman thoughts.

Harman's research seems to indicate that "simpler" music (f.e. Fast Car) with few instruments is the most effective programme for discriminating differences in AB comparisons (chapter 3).
Does that mean that "complex" music (f.e. orchestral) has no use for assessing (particular) playback deficiencies?
What information does "simpler" music provide in regard to how playback handles (the requirements of) complex music?
It also appears as though listeners were trained to focus predominantly (perhaps exclusively) on frequency response errors. If this assumption is correct then perhaps one may question the effectiveness of those listening sessions and the programme used in assessing other issues (f.e. the effects of intermodulation distortion).

It is likely that I didn't look hard enough but I've never come across mention of any research which compares lab listening to domestic listening, nor have I seen any research addressing the role of familiarity on the subject of identifying playback issues.
In my view and anecdotal experience, long-term equipment assessment in one's familiar listening environment with one's own system and music may reveal differences which are not obvious in AB comparisons in unfamiliar conditions.
Long-term assessment relies on familiarity, references against which to compare any potential changes produced by the introduction of a variable in the system. This by the way is the reason why one should never assess two different new pieces of equipment / variables simultaneously.

It looks to me as though this might be what Genelec's thought-provoking piece is trying to address.

Time for another re-read.
The reason simpler music is better is because our hearing becomes so deficient with more complex material.

Somewhat related is multi-channel. Mono is the way to best discriminate between speakers. Stereo is a bit less good. Multi-channel is much less good. Maybe with enough channels quality isn't much of an issue. And maybe stereo really is some kind of special artificial way to best appreciate music. Mono doesn't give us enough, stereo gets us lots, and as a musical fashion statement maybe it is its own best art of sound for human consumption of recorded music. While an illusion, it gives us a spatial perception that mono cannot, and while not accurate it allows us to hear deep into the quality of the recording in a way multi-channel obscures.
 

Wombat

Master Contributor
Joined
Nov 5, 2017
Messages
6,722
Likes
6,463
Location
Australia
How much can our auditory system process beyond short-term survival response? I find concentrated listening to be short-lived and mostly unrelated to overall enjoyment of content or good gear.
 

pozz

Слава Україні
Forum Donor
Editor
Joined
May 21, 2019
Messages
4,036
Likes
6,827
How much can our auditory system process beyond short-term survival response? I find concentrated listening to be short-lived and mostly unrelated to overall enjoyment of content or good gear.
I find I top out at about an hour of concentrated listening. I live for that hour, though. Most listening is casual and pleasant, I guess, and it's nice to have good gear for it, but that hour is just wow whenever I have the time for it.
 

anmpr1

Major Contributor
Forum Donor
Joined
Oct 11, 2018
Messages
3,739
Likes
6,449
The reason simpler music is better is because our hearing becomes so deficient with more complex material.
One problem with casual 'listening tests' is that nothing can be repeated consistently. This was David Hafler's criticism of what Gordon Holt was doing, years ago, during the '60s. Gordon ranked Dave's amp high subjectively, but was subsequently admonished for it by Hafler. I don't think Holt was expecting that, or perhaps even understood what David was telling him.

In the mid '70s Bob Carver and some Stereo Review editors ran a controlled test for the audibility of crossover distortion, injecting set values into test tones and simpler music. Gordon (who to be fair was one of the least tweaky of the tweaks) went off on Bob and SR stating that 'everyone knows' that real music is much more susceptible to the effects of distortion, even at low percentages. Therefore, what can test tones tell anyone about how an amp handles complex signals? But Gordon never asked himself, or anyone else as far as I could tell, how it was that he 'knew' that.

Later, when the (in)famous ABX device was first marketed and making some waves, Gordon wrote a Stereophile editorial happily announcing that finally it would be proven that the 'golden ears' were right. He assured readers that once they could test with the device they would show the world how they really hear what they claim to hear. However, once the device arrived in Santa Fe, and once they found out that nothing could be proven (that it didn't 'work'), the usual suspects came up with all the 'arguments' against DBT, arguments that are still common in the tweako press and on the Internet.

Anent this particular thread, one of their 'arguments' was to posit an intrinsic methodological difference between expected results from both 'short term' and 'long term' listening; the corollary being that you cannot 'slice' apart a musical phrase (the ABX protocol) and then consistently expect anyone to identify differences (that were, in their view, obviously present) --because the test itself somehow destroys the 'holistic' musical experience, thus invalidating the test. The more inventive of these 'theorists' invoked the name of Werner Heisenberg in order to explain their confusion theory. Of course such thinking was both magical and an example of question begging.

Cognitive linguistics steps a bit further back and notes that acoustic differences of vocalization are often not enough to help listeners recognize and pick out phonemes ...Listeners, who are misinformed about a speaker’s (socio-)linguistic background, are more inclined to perceive the incoming stimulus according to their sociolinguistic expectations than to the acoustic characteristics of the stimulus."). Listeners often need context, like what to listen for, or where the speaker is coming from. It's like trying to understand the accent, grammar and vocabulary of Jamaican or Scottish English if you hail from elsewhere. It's not easy, but after a while in the country it becomes second nature. Same goes for sound and music.

Although your 'speakers' were not referencing loudspeakers, it is possible to extrapolate. One 'problem' with speaker reviews is their general short term approach. One has a set loudspeaker used as a reference, and when inserting another of different design/brand into the chain, sonic differences will be pretty obvious. However, like understanding a foreign language (which often requires context, especially with a tonal language, or a language with a lot of homophones), once basic familiarity is established with the new loudspeaker one will begin to hear it 'normally'. That is, one will generally adapt to the new loudspeaker's sonic peculiarities.
 

tuga

Major Contributor
Joined
Feb 5, 2020
Messages
3,984
Likes
4,285
Location
Oxford, England
The reason simpler music is better is because our hearing becomes so deficient with more complex material.

I don't disagree with this. What I am suggesting is that "simpler" music, although better for AB comparisons (which rely on memory), may not be effective or adequate for revealing some issues.

Somewhat related is multi-channel. Mono is the way to best discriminate between speakers. Stereo is a bit less good. Multi-channel is much less good. Maybe with enough channels quality isn't much of an issue. And maybe stereo really is some kind of special artificial way to best appreciate music. Mono doesn't give us enough, stereo gets us lots, and as a musical fashion statement maybe it is its own best art of sound for human consumption of recorded music. While an illusion, it gives us a spatial perception that mono cannot, and while not accurate it allows us to hear deep into the quality of the recording in a way multi-channel obscures.

I don't disagree with this either. Stereo listening is in my view indispensable for evaluating speaker preference and stereo-related effects but less good than mono for assessing speaker-related issues (it even creates issues).
This is not new. In the '70s the BBC Research Department was already performing blind listening tests in mono, sometimes in their anechoic chamber.
They also rolled out stereo pair test samples to sound engineers for long-term listening.
I've taken the following extract from one of their white papers:

9 LOUDSPEAKER EVALUATION

9.1 Introduction
The obvious and definitive means of evaluating a loudspeaker is of course by listening to it.
An expert listener auditioning known programme material can learn a great deal from a listening test.
If all of the sound balancers who use a particular loudspeaker declare it to be excellent, then by definition it is excellent.
In the author's experience at least, such universal approbation is rare.
Although a group of users in an organisation like the BBC usually show remarkable accord in their evaluations, they tend to use adjectives like 'woolly', 'hard', or 'chesty', and nouns like 'honk', 'quack', or 'lisp'.
One can often hear what they refer to, but such quirks can rarely be identified by objective measurement, and are very poor guides indeed to any design modifications that might effect significant tonal improvements.
(Very rarely, complimentary expressions like 'clean' or 'uncoloured' are applied; perhaps one reason for the rarity of these is that a perfect loudspeaker should presumably have no perceptible characteristics of its own.)


What is required, of course, is a well-defined relationship between subjective peculiarities, measurable deviations from 'ideal' acoustic output, and oddities in physical behaviour.
A 'dreadful quack at 800 Hz' should be confirmed by a disturbance in the otherwise serene acoustic time-frequency-acceptability plot, and by an agonised writhing at 800 Hz to disturb the otherwise exemplary piston-like movement of the diaphragm.


Reality is otherwise.
'Good' loudspeaker drive units appear to exhibit just as complex mechanical and acoustic behaviour as 'bad' ones.
The author is currently engaged in a project to try to find some relationship between the subjective, acoustic, and mechanical facets of loudspeaker behaviour.
This has been undertaken in the knowledge that previous attempts during four decades have not yielded a final solution.
Results (positive or negative) will be published in due course.
Two reference works only are listed relating to this subject, each includes an extensive bibliography.


9.2 Subjective evaluation
Experience shows that comparative judgements of loudspeaker quality can be made more consistently than absolute ones.
An absolute assessment of a new design is something which emerges gradually out of weeks or months of use in control rooms.
Often, a pair of new loudspeakers sent out for 'field trial' will be received with cautious approval, yet returned after a month or two with a list of criticisms detailing points that have emerged only gradually from continuous use.
For comparative tests, a reference loudspeaker is of course needed.
This is provisionally selected during the early stages of commercial production as being a typical unit of acceptable quality; once production is well established, a new reference may be adopted as a clearer picture emerges of what is 'typical'.
In fact, at least three such units are selected in normal BBC practice, to provide a working standard for acceptance testing; a spare (which is carefully stored); and a standard by which the manufacturers can assess the consistency of their output, whether by listening or by measurement.
An established standard is also of course the only reasonable reference available in appraising a new design.


In listening tests, it is important that the listener should begin with as few preconceived ideas as possible.
For example, a look at a response plot may cause him, consciously or otherwise, to listen for some expected peculiarities. Normally, an A/B switch is provided, and the loudspeaker to be used as reference is indicated.
The loudspeakers are placed behind an acoustically transparent but optically opaque curtain, especially if any aspect of the units under test might be visually identifiable.
To help eliminate room effects, the test may be repeated with the loudspeaker positions interchanged.
If several units are to be tested, it is useful to include one twice — anonymously — to test the listener's consistency.
(Experienced listeners expect this.)


Finally, it is essential that the listener delivers his judgement before any additional information is given to him; not (one would trust) that he might 'cheat', but rather that he might re-interpret what he thought he had heard in the light of further knowledge.
Subsequent discussion may well prove valuable, but must be subsequent.


Formal tests involving a number of listeners may need further care, particularly if, as is likely, they permit less in the way of personal communication between subjects and test organiser.
Past experience suggests that a particular hazard is the use of descriptive terms whose meaning seems obvious to everyone, but which can actually mean different things to different people.
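As an aside, that duplicated-unit trick is simple enough that even a layman like me can picture it in a few lines of code. A toy sketch in Python with invented scores (nothing to do with any actual BBC data), just to show the idea of flagging an inconsistent listener:

# One physical loudspeaker appears under two hidden labels; a large gap between
# its two scores suggests the listener isn't judging consistently that session.
ratings = {"unit_1": 7.5, "unit_2": 6.0, "unit_3": 7.0}   # hidden label -> score out of 10
duplicate_pair = ("unit_1", "unit_3")                     # same speaker, relabelled
tolerance = 1.0                                           # arbitrary threshold for this sketch

gap = abs(ratings[duplicate_pair[0]] - ratings[duplicate_pair[1]])
verdict = "consistent" if gap <= tolerance else "inconsistent"
print(f"Score gap on the repeated unit: {gap:.1f} -> listener looks {verdict}")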
 
Last edited:

tuga

Major Contributor
Joined
Feb 5, 2020
Messages
3,984
Likes
4,285
Location
Oxford, England
How much can our auditory system process beyond short-term survival response? I find concentrated listening to be short-lived and mostly unrelated to overall enjoyment of content or good gear.

I don't doubt that critical assessment and listening for pleasure are two different tasks with very distinct goals and requirements.

I don't think that long-term listening is some sort of panacea, or that it is better for assessing all aspects. Mono has advantages, as do AB comparisons and even pink noise for assessing particular aspects. But a given track may unexpectedly reveal an issue that would not be apparent with, f.e., any of the material used in Harman's testing. It can be some nastiness when a violin hits a particular note, or how much better you can hear the decay of a piano and the room ambience that had gone unnoticed in a certain recording which you've listened to many times before. And unlike in AB sessions, you are most likely to be relaxed and not focused on any particular aspect.
Many people use particular tracks to assess soundstage effects, but I don't, as I find that soundstage is a byproduct and will be best triggered by the absence of gross issues.
 
Last edited:

pozz

Слава Україні
Forum Donor
Editor
Joined
May 21, 2019
Messages
4,036
Likes
6,827
Does that mean that "complex" music (f.e. orchestral) has no use for assessing (particular) playback deficiencies?
Pretty sure they used classical music too. I can't remember what Toole said exactly. Something like: sparse tracks aren't that good generally because, just probabilistically speaking, they may not excite the problem areas of speakers. He gave an anecdote where he was invited to listen to a prototype (IIRC by another company) that was subject to a lot of internal argument, some saying it sounded good, others bad, and the measurements (which Toole said were low-resolution) showed nothing in particular of issue. He found that only one particular note on one particular female vocal track excited a resonance. When it became obvious, they remeasured the speaker according to his requirements and found it plain as day. So program material definitely matters. I think the conclusion was that for effective testing you have to use a variety of genres and spectral densities.

In multichannel blind testing done by Francis Rumsey (non-Harman), the most important quality that listeners (from pros to laypeople) tended to use for their judgments was timbre, regardless of the program material.
It is likely that I didn't look hard enough but I've never come across mention of any research which compares lab listening to domestic listening, nor have I seen any research addressing the role of familiarity on the subject of identifying playback issues.
"Comparing lab listening to domestic listening": Well, unless you want to get into subject priming (which is a constant topic in psychology), acoustic properties are dominant. Harman's listening room was based on research of living room FR and reverb, for example. Actually they explicitly criticized prior approaches that considered only anechoic or steady-state responses without considering ordinary listening circumstances and the strong role of reflections therein (all old topics of discussion here).

"Any research addressing the role of familiarity on the subject of identifying playback issues": The Harman stats gathered on listener experience and training vs. reliability of reports are just that. This isn't a Harman-only claim btw. It's pretty common to report about subject performance and ability. Here, for example: "The high performers [i.e., those who could discriminate the smallest differences] were less variable in their performance than the low performers." So performance variability is a definite metric that should be captured. Not just, you know, results.
In my view and anecdotal experience, long-term equipment assessment in one's familiar listening environment with one's own system and music may reveal differences which are not obvious in AB comparisons in unfamiliar conditions.
Long-term assessment relies on familiarity, references against which to compare any potential changes produced by the introduction of a variable in the system. This by the way is the reason why one should never assess two different new pieces of equipment / variables simultaneously.
As above, if you're trained and know what to listen for, you'll find the difference. There's no way you can control for everything, especially things like mood and setting, so you have to rely on your listeners.

Let's get back to the Genelec paper. The main thing to take into account is audience: it was written for audio professionals, not home listeners. Specifically, it's addressing itself to the existing testing and assessment methods, as put forward by standards for example. It does not reference Harman or explicitly criticize short-timescale ABX tests apart from noting that if your listeners do not know what to listen for, they will miss things. And it is entirely possible for things to be missed, Lund's example being psychoacoustic codecs. In that sense it's not accurate to contrast Lund vs. Toole. They aren't at odds. In a related lecture Lund goes on to say that the tolerances between individual monitors which manufacturers struggle to achieve are swamped by room responses (and so Genelec offers GLM software and introduced the 8xx1 series of monitors to address the gross effects of directivity and room reflections). These examples venture far beyond the sorts of "electrical" component-level differences between nominally similar gear that obsess audiophiles and define the marketing lines for some manufacturers.

The main idea of his paper is to get working professionals to understand more of the background of perception and model their work and products such that they address the consumer's listening circumstances and eventual experience, his starting point being human physiology and the cognitive aspects underlying listening, such as the ability to listen to a whole auditory environment and zero in on a certain event or aspect at will, or that localization (perception of imaging/soundstage) takes, biologically speaking, a long time to consciously register and involves a lot of head movement (i.e., a partially inconsistent listening position and changing subjective frequency response). The criticism is that the industry as such has not produced products that address such basic physiological/psychoacoustic facts.
 

tuga

Major Contributor
Joined
Feb 5, 2020
Messages
3,984
Likes
4,285
Location
Oxford, England
"Comparing lab listening to domestic listening": Well, unless you want to get into subject priming (which is a constant topic in psychology), acoustic properties are dominant. Harman's listening room was based on research of living room FR and reverb, for example. Actually they explicitly criticized prior approaches that considered only anechoic or steady-state responses without considering ordinary listening circumstances and the strong role of reflections therein (all old topics of discussion here).

I can see the relevance of designing a room based on "research of living room FR and reverb" and "considering ordinary listening circumstances" for preference / tasting sessions, but I don't see any particular merit in doing so if the goal is to assess speaker performance.
Pro audio listeners will probably feel less confident when assessing in a room that sounds more like a domestic sitting room than a control room.

And even though Harman's room may have been modelled as much as possible on an average sitting room, in the scope of long-term listening its acoustic properties are still unfamiliar and thus introduce a significant variable.

"Any research addressing the role of familiarity on the subject of identifying playback issues": The Harman stats gathered on listener experience and training vs. reliability of reports are just that. This isn't a Harman-only claim btw. It's pretty common to report about subject performance and ability. Here, for example: "The high performers [i.e., those who could discriminate the smallest differences] were less variable in their performance than the low performers." So performance variability is a definite metric that should be captured. Not just, you know, results.

Do you think that this refers to listener training or to familiar listening conditions (system, room, music)?

I agree that training is paramount but that is a different subject.

As above, if you're trained and know what to listen for, you'll find the difference. There's no way you can control for everything, especially things like mood and setting, so you have to rely on your listeners.

So you disagree that changing one variable in your system in your (domestic/sitting or studio/control) room with your usual test tracks will make differences more obvious than if you do the same with a completely unfamiliar system in an unfamiliar room with unfamiliar music?

Let's get back to the Genelec paper. The main thing to take into account is audience: it was written for audio professionals, not home listeners. Specifically, it's addressing itself to the existing testing and assessment methods, as put forward by standards for example. It does not reference Harman or explicitly criticize short-timescale ABX tests apart from noting that if your listeners do not know what to listen for, they will miss things. And it is entirely possible for things to be missed, Lund's example being psychoacoustic codecs. In that sense it's not accurate to contrast Lund vs. Toole. They aren't at odds. In a related lecture Lund goes on to say that the tolerances between individual monitors which manufacturers struggle to achieve are swamped by room responses (and so Genelec offers GLM software and introduced the 8xx1 series of monitors to address the gross effects of directivity and room reflections). These examples venture far beyond the sorts of "electrical" component-level differences between nominally similar gear that obsess audiophiles and define the marketing lines for some manufacturers.

The ethos of audiophilia is to hop onto a lifelong journey of upgrades and to obsess about minuscule differences. The average audiophile is ignorant of the science involved and, in spite of his or her often high listening acuity, generally not trained to listen for issues and anomalies; he is usually preference-driven. That makes him easy prey, and the internet has made matters worse.
I have no knowledge of the pro audio world, but I would expect most people there to be less obsessed and better trained, though perhaps not particularly more knowledgeable, and there will surely be some audiophiles amongst them.

For most audiophiles listening is their sole means of assessing performance. Learning how to interpret measurements requires a level of investment that many are not prepared or willing to make.
Ignorance, and the fact that good measurements don't often correlate with their preference and/or magazine reviews, leads to mistrust of measurements. Alt-objectivists telling them that they are wrong and biased, and that non-ABX'ed differences are hallucinations, doesn't help either.

It's been a while since I last read the piece, and I need to re-read it and weigh it against your analysis, but this snippet, taken from an entry in the Genelec blog, does seem to refer to the importance of familiarity in long-term listening as well as the importance of training:

The popular English expression “acquired taste” characterises the way most sensing occurs after we reach the age of two, and later in life, we are so influenced by prior experience that unbiased judgment is no longer possible. Perceptual bandwidth, the speed by which we are able to sense, reduces during childhood. So as adults, we largely hear what we expect to hear and see what we expect to see. Sensory reach-out mechanisms are primarily used to probe the environment just enough to reassure ourselves that the outside world is going according to plan.

Great listeners have learned to live with those basic limitations of being human, and still make coherent judgments, by getting time on their side. We cannot evaluate a multitude of variables both accurately and quickly, but we can keep most of them constant and pre-checked, then use acquired reach-out skills to gauge the one or two in question. Active reach-out components of listening include substantial efferent pathways of the auditory nervous system (i.e. nerve fibers that carry signals away from the brain to our bodies) and lead to the brain's ability to tune the middle and inner ears in realtime over a range of 80 dB; and overt behaviours such as head and body movement. Like when learning a language, studies indicate that listening-training should ideally start at a young age. It is never too late, though learning takes longer as we age.

Excellent mastering engineers know their room and equipment intimately, they work at defined listening levels and are aware of taking breaks before fatigue becomes a factor. This creates an ideal setting for evaluating content systematically but, it may well have taken years to get there. A subjective evaluation of equipment also takes either plenty of time; or an introduction to the device under test in a well-known environment, acoustically and electrically.

https://www.genelec.com/-/professional-listening#
 
Last edited:

pozz

Слава Україні
Forum Donor
Editor
Joined
May 21, 2019
Messages
4,036
Likes
6,827
I can see the relevance of designing a room based on "research of living room FR and reverb" and "considering ordinary listening circumstances" for preference / tasting sessions, but I don't see any particular merit in doing so if the goal is to assess speaker performance.
Kind of strange idea if you follow the acoustics. Speakers react wildly differently given room sizes and reverberant conditions. I don't follow why it doesn't make sense to design a room for speaker auditions based on measurements of home living rooms. How else should one do it?
Do you think that this refers to listener training or to familiar listening conditions (system, room, music)?
Listener training and familiarity with the task at hand. Remember that it's not a consumer evaluation, but the engineer's/designer's, the ones who are making the gear. The prevailing idea and conversations about speaker design specifically are still of abstract, decontextualized problems. The gear in itself, without specific consideration of the system or chain. There has to be some research to bridge the gap between what the consumer hears at home, under very different and less-than-ideal circumstances, and what's being done in a factory whose conditions may or may not be lablike (i.e., pristine and organized) depending on budget, and which may only be able to carry out partially blind tests. If the psychoacoustics is well-understood and the problems clearly identified (e.g., through simulation) before the prototype is built and listening tests are required, then directing the course of later practical research and QC will be much cheaper and require less labour.
So you disagree that changing one variable in your system in your (domestic/sitting or studio/control) room with your usual test tracks will make differences more obvious than if you do the same with a completely unfamiliar system in an unfamiliar room with unfamiliar music?
My only issue is that you phrased it very vaguely. Which variable? I'm sure you don't mean that laboratory research into listening is somehow invalid because it doesn't take place at home (e.g., for stuff like localization ability or thresholds for distortion detection, which we like to discuss here). Am I testing a new DAC? If so, I'm less listening to music than making sure there are no weird or obvious glitches or unexpected clipping as I'm running it, since I already looked at the measurements:) Same goes for most other gear other than speakers. I'll just plug it in and go. Obvious problems will present themselves real fast (because I know the sound of clipping and other common stuff like hum, buzz, digital dropouts and so on). With speakers, I'll use tracks I know at home and so forth after doing some basic setup and measurement. I don't think that's a problem or even a point of contention. Rather, the situation is reversed. Think from the perspective of a manufacturer or designer: you are sending a product you know very well into many strangers' homes. They will be the ones to use it. Doubtless a few will make semiformal "assessments", but most will plug that monster in and let it rip. The goal is to ensure that they have an experience with no weird hiccups (electronics) or hear great sound (speakers, headphones). Arguably one of the main problems has been that so much of the engineering work to date has focused on driver designs and the like, which are minutiae in comparison to speaker/room interactions, a deeply unsolved problem.
Alt-objectivists telling them that they are wrong and biased, and that non-ABX'ed differences are hallucinations, doesn't help either.
I really dislike those jerky sorts of posts too. They aren't helpful in the least, for two obvious reasons (at least from my perspective): casual listeners aren't familiar with the techniques of blind testing, which are in no sense trivial, and have no idea what to listen for! It's much easier in the long run to have a discussion and counter the obvious points, after which hearing is magically clarified.
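For anyone wondering what "in no sense trivial" means in practice, here's a quick back-of-the-envelope in Python (the trial counts are my own made-up examples, not from any published protocol): a short ABX run barely separates a decent listener from a coin flip.

# Chance of scoring at least `correct` out of `trials` by pure guessing (p = 0.5 each).
from math import comb

def p_value(correct, trials):
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

for trials, correct in [(10, 8), (10, 9), (16, 12), (20, 15)]:
    print(f"{correct}/{trials} correct -> p = {p_value(correct, trials):.3f} under guessing")

And that's before you've dealt with level matching, switching artifacts or knowing what to listen for.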
Excellent mastering engineers know their room and equipment intimately, they work at defined listening levels and are aware of taking breaks before fatigue becomes a factor. This creates an ideal setting for evaluating content systematically but, it may well have taken years to get there. A subjective evaluation of equipment also takes either plenty of time; or an introduction to the device under test in a well-known environment, acoustically and electrically.
Note that nowhere does Genelec mention assessments of equipment. They are talking about more serious problems like, in the paragraph I quote, listener fatigue (which, by the time you feel it, means that you have worked too much), and specific work-related tasks ("evaluating content", whether recording, mixing or mastering). This is a distinct attempt to bring a new ethos to the industry. Knowing your room and system means not only that you can pick up new differences, but that you are aware of what your room obscures, makes unclear, wrongly emphasizes and so forth. You have to be able to make correct judgments, including recognizing that you aren't in the right circumstance to make a judgment call at all.

You can generalize the Genelec position by saying that they are trying to massage listeners into greater perceptual awareness. That's certainly praiseworthy. There is a hell of a lot of good that can come from understanding your sensory abilities and their limits, as well as the physics of sound.
 

tuga

Major Contributor
Joined
Feb 5, 2020
Messages
3,984
Likes
4,285
Location
Oxford, England
Kind of strange idea if you follow the acoustics. Speakers react wildly differently given room sizes and reverberant conditions. I don't follow why it doesn't make sense to design a room for speaker auditions based on measurements of home living rooms. How else should one do it?

From my understanding, the shuffler room wasn't designed to sound like a listening room, nor are the speakers or, for that matter, the listeners ideally placed. This is, I believe, where blind speaker performance assessment takes place.
There is another room which appears to be better suited to tasting or preference sessions.
But again note that I was referring to Harman's home-like room vs. the listener's own sitting room or control room. Which one do you think that the listener will be more familiar with?

Listener training and familiarity with the task at hand. Remember that it's not a consumer evaluation, but the engineer's/designer's, the ones who are making the gear. The prevailing idea and conversations about speaker design specifically are still of abstract, decontextualized problems. The gear in itself, without specific consideration of the system or chain. There has to be some research to bridge the gap between what the consumer hears at home, under very different and less-than-ideal circumstances, and what's being done in a factory whose conditions may or may not be lablike (i.e., pristine and organized) depending on budget, and which may only be able to carry out partially blind tests. If the psychoacoustics is well-understood and the problems clearly identified (e.g., through simulation) before the prototype is built and listening tests are required, then directing the course of later practical research and QC will be much cheaper and require less labour.

I mentioned a familiar system, room and music (references); you are talking about familiarity with the task at hand. The snippet I posted above seems to be referring to the former.
Why do you assume that he's (only) addressing designers/manufacturers?
Was Toole only addressing designers/manufacturers?

My only issue is that you phrased it very vaguely. Which variable? I'm sure you don't mean that laboratory research into listening is somehow invalid because it doesn't take place at home (e.g., for stuff like localization ability or thresholds for distortion detection, which we like to discuss here). Am I testing a new DAC? If so, I'm less listening to music than making sure there are no weird or obvious glitches or unexpected clipping as I'm running it, since I already looked at the measurements:) Same goes for most other gear other than speakers. I'll just plug it in and go. Obvious problems will present themselves real fast (because I know the sound of clipping and other common stuff like hum, buzz, digital dropouts and so on). With speakers, I'll use tracks I know at home and so forth after doing some basic setup and measurement. I don't think that's a problem or even a point of contention. Rather, the situation is reversed. Think from the perspective of a manufacturer or designer: you are sending a product you know very well into many strangers' homes. They will be the ones to use it. Doubtless a few will make semiformal "assessments", but most will plug that monster in and let it rip. The goal is to ensure that they have an experience with no weird hiccups (electronics) or hear great sound (speakers, headphones). Arguably one of the main problems has been that so much of the engineering work to date has focused on driver designs and the like, which are minutiae in comparison to speaker/room interactions, a deeply unsolved problem.

Let us leave measurements aside for a moment. Not many audiophiles can interpret measurements, and even fewer can perform them or have the gear to do so. Besides, a lot of equipment doesn't get measured, here or elsewhere.

We are discussing listening assessment of performance (accuracy/audible issues).
We agree that training improves the efficacy of the listener.
ABX is the standard. Long-term listening can be a complementary method of evaluation for the reasons stated above.

Note that nowhere does Genelec mention assessments of equipment. They are talking about more serious problems like, in the paragraph I quote

The bit you just quoted ends like this:

A subjective evaluation of equipment also takes either plenty of time; or an introduction to the device under test in a well-known environment, acoustically and electrically.

Note that nowhere does Genelec mention assessments of equipment. They are talking about more serious problems like, in the paragraph I quote, listener fatigue (which, by the time you feel it, means that you have worked too much), and specific work-related tasks ("evaluating content", whether recording, mixing or mastering). This is a distinct attempt to bring a new ethos to the industry. Knowing your room and system means not only that you can pick up new differences, but that you are aware of what your room obscures, makes unclear, wrongly emphasizes and so forth. You have to be able to make correct judgments, including recognizing that you aren't in the right circumstance to make a judgment call at all.

You can generalize the Genelec position by saying that they are trying to massage listeners into greater perceptual awareness. That's certainly praiseworthy. There is a hell of a lot of good that can come from understanding your sensory abilities and their limits, as well as the physics of sound.

Most audiophiles use their ears to select equipment, not measurements. Perhaps there's a chance that better listening-assessment methodology, programme material and training would be more helpful for them than measurements.
ABX is not only complicated to perform adequately but its usefulness is also often categorically dismissed by audiophiles.

I don't think that Genelec are "trying to massage listeners into greater perceptual awareness" so much as trying to elevate the merits of long-term listening as a complementary approach to ABX, one which may potentially reveal issues not readily apparent with the latter method.


P.S. I don't have to tell you that my knowledge of the subject is limited (you can easily find that out for yourself), but I can tell you that I do recognise that limitation
 

ShiZo

Addicted to Fun and Learning
Forum Donor
Joined
Sep 7, 2018
Messages
835
Likes
556
Indeed, there are certainly other people you can purchase from. I think our products stand for themselves. They are well designed, well executed, and currently are some of the best value for money. If you disagree or have zero interest in our products, I recommend you not buy them.

The first piece of audio equipment I ever bought was the Jotunheim. It was so noisy I had to buy a Hum X and iFi iPower to try to deal with it. It still was incredibly noisy and sounded terrible. I could literally hear the noise floor at less than half volume. It wasn't grounded either. If I touched the knob I would hear microphonics. I sent it in for a fix and got it back only slightly less noisy. I tried to give them the benefit of the doubt by sending it in, but since they were unable to fix it, I'll never buy from Schiit again.

After being disappointed I ended up finding this site. I then bought the original rme adi 2 dac and was blown away by how nice it sounded and how good it was at rejecting noise. No matter how loud I turned it up I could not hear the noise floor. All my money spent on the subjective camp had been wasted at that point.

I wish I had found audioscience before Schiit's marketing.

I just don't know how something that badly designed could have made it to market. But I will say the heresy was a good change of pace.
 

pozz

Слава Україні
Forum Donor
Editor
Joined
May 21, 2019
Messages
4,036
Likes
6,827
But again note that I was referring to Harman's home-like room vs. the listener's own sitting room or control room. Which one do you think that the listener will be more familiar with?
I think you're dodging the more pressing question. Obviously the listener will be more familiar with their own home. Why is Harman's approach incorrect? Or why will it produce results that can't be trusted?
I mentioned familiar system, room and music - references -, you are talking about familiarity with the task at hand. The snippet I posted above seems to be referring to the former.
Why do you assume that he's (only) addressing designers/manufacturers?
Was Toole only addressing designers/manufacturers?
If you read the paper or listen to Lund's talks you'll note that the consumer is brought up only a few times, and then only as a sort of goal. What he does address openly and at length is that certain acoustic phenomena become consciously audible once you are familiar with them (like the point about linguistic phonemes). The link I quoted was to another psychoacoustic study altogether. I just wanted you to have a non-Harman source making the general point that listener reliability varies and that subjects have to be assessed that way as part of the research. It's a great insight when research clearly states why a certain subject performs poorly, since that would suggest how to make someone perform better. Familiarity with listening circumstances, which you emphasize heavily (I think for personal reasons rather than reasons found in Lund's paper), is a smaller part of the equation than the recognition of audible phenomena and the work it takes to do that well.

Toole sometimes addressed manufacturers, sometimes consumers. His output is very large. His book specifically is meant both for professionals and anyone interested.
Let us leave measurements aside for a moment. Not many audiophiles can interpret measurements and even less can perform them or have the gear to do so. besides, a lot of equipment doesn't get measured, here or elsewhere.

We are discussing listening assessment of performance (accuracy/audible issues).
We agree that training improves the efficacy of the listener.
ABX is the standard. Long-term listening can be a complementary method of evaluation for the reasons stated above.
I think you're missing his specifics in favour of the general point. The paper is not written to support long listening sessions as methods of assessment in lieu of measurements.
A subjective evaluation of equipment also takes either plenty of time; or an introduction to the device under test in a well-known environment, acoustically and electrically.
Point taken (link here for others). I was wrong there. But, the only way that statement works to your end is if you take it in isolation, where it could refer to any old audiophile. The topic of that paragraph was already excellent mastering engineers and the time it took them to gain that excellence. And I still don't think the paper supports your next point, which looks like your central argument:
Most audiophiles use their ears to select equipment, not measurements. Perhaps there's a chance that better listening-assessment methodology, programme material and training would be more helpful for them than measurements.
This has been the audiophile method for a long time. We'll have to differ here. I don't think excellent hearing comes without a good understanding of the underlying phenomena or gear.

Somewhat related, one of the foundational texts in modern singing was Richard Miller's The Structure of Singing, which combined anatomy with traditional musicology and vocal techniques. The book begins by criticizing traditional approaches which favor a loose vocabulary and unclear or inconsistent methods, and produce poorer singers than if they were informed by a personal knowledge of the actions between throat, lungs, stomach and so on. Adele's technique, self-taught, caused her to have surgery multiple times. By analogy, Lund's example is that professionals who sometimes listen to music 8 hours or more a day undergo fatigue (wherein their decisions become worse) and eventually hearing damage. Some knowledge of the hearing system is sure to prevent it, like how to set levels and why mixing decisions have to be understood in the context of a certain level.

The perceptually self-sufficient approach seems anything but that. It requires other people to figure out how sound works, from physics to biophysics to psychophysics. But as we've seen it's possible to develop whole narratives based on what you don't know without ever feeling the urge to look something up.
 

T.M.Noble

Active Member
Audio Company
Joined
Dec 3, 2019
Messages
277
Likes
1,704
The first piece of audio equipment I ever bought was the Jotunheim. It was so noisy I had to buy a Hum X and iFi iPower to try to deal with it. It still was incredibly noisy and sounded terrible. I could literally hear the noise floor at less than half volume. It wasn't grounded either. If I touched the knob I would hear microphonics. I sent it in for a fix and got it back only slightly less noisy. I tried to give them the benefit of the doubt by sending it in, but since they were unable to fix it, I'll never buy from Schiit again.

After being disappointed I ended up finding this site. I then bought the original rme adi 2 dac and was blown away by how nice it sounded and how good it was at rejecting noise. No matter how loud I turned it up I could not hear the noise floor. All my money spent on the subjective camp had been wasted at that point.

I wish I had found audioscience before Schiit's marketing.

I just don't know how something that badly designed could have made it to market. But I will say the heresy was a good change of pace.
I am sorry to hear that you had a bad experience. We definitely have come a long way since then. We are always trying to improve our products and I think our current line is an example of that progress.

You clearly still have a bad taste in your mouth from your previous purchase, something I completely understand. While we do have affordable products considering this market, the cost is not trivial. Every company has its failures and we are not immune. I can promise that we do learn from those failures and make up for them when we can. We aren't just marketing. We give a shit about this industry.

I will reiterate: I believe our current line of products is the best-designed and best quality for your dollar, and I believe our products speak for themselves.
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,701
Likes
37,441
So is this discussion trending toward what sort of by-ear listening approach can be useful in evaluating speakers? I think the way it is done now by many is near useless, though it doesn't have to be that bad.

Listening to pink noise can provide some insight if you have a little experience. If you've heard it over great flat speakers you'll have a comparative idea. If you have a great speaker on hand to switch to as a reference it can be very useful. It beats just listening to some reference tracks in terms of a quick gauge of frequency response. Combined with some perceptually EQ'd sweeps below 500 Hz you could probably do alright in a general sense getting a handle on a speaker in a given room.
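If anyone wants to roll their own test signals, something along these lines in Python would do it. It's only a rough sketch: levels are arbitrary, and the perceptual weighting I mentioned isn't applied to the sweep, so you'd have to add that yourself.

# Rough sketch: FFT-shaped pink noise plus a bare 20 Hz - 500 Hz log sweep, written to WAV.
import numpy as np
from scipy.io import wavfile

fs = 48000                 # sample rate, Hz
dur = 10.0                 # seconds
n = int(fs * dur)

# Pink noise: shape white noise to 1/f power by scaling each FFT bin by 1/sqrt(f).
white = np.random.randn(n)
spectrum = np.fft.rfft(white)
freqs = np.fft.rfftfreq(n, d=1 / fs)
scale = np.ones_like(freqs)
scale[1:] = 1.0 / np.sqrt(freqs[1:])          # leave the DC bin alone
pink = np.fft.irfft(spectrum * scale, n)
pink /= np.max(np.abs(pink))                  # normalise to full scale

# Exponential (log) sweep from 20 Hz to 500 Hz.
f1, f2 = 20.0, 500.0
t = np.arange(n) / fs
k = np.log(f2 / f1)
sweep = np.sin(2 * np.pi * f1 * dur / k * (np.exp(t / dur * k) - 1))

wavfile.write("pink.wav", fs, (0.5 * pink * 32767).astype(np.int16))
wavfile.write("sweep_20_500.wav", fs, (0.5 * sweep * 32767).astype(np.int16))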

I'm sure there are other techniques one could develop. It is going to rub some people the wrong way who have a stake in how things are done as they currently are.
 