
Schiit Magni Heresy

makmeksam

Member
Joined
Nov 13, 2019
Messages
59
Likes
44
Subjectivists frequently say that amps and other gear that measure extremely well sound analytical/clinical. Is it possible that there is some scientific explanation for feeling something like this? Also, does anyone have an idea of what analytical/clinical sound could mean?
I think dismissing the claim by just calling it placebo or nonsense is not good enough. Can we do better?
 

JohnYang1997

Master Contributor
Technical Expert
Audio Company
Joined
Dec 28, 2018
Messages
7,175
Likes
18,300
Location
China
Subjectivists frequently say that amps and other gear that measure extremely well sound analytical/clinical. Is it possible that there is some scientific explanation for feeling something like this? Also, does anyone have an idea of what analytical/clinical sound could mean?
I think dismissing the claim by just calling it placebo or nonsense is not good enough. Can we do better?
There are a few reasons.
1. Some "good measuring" devices only measure well on some tests, and other tests can reveal issues. For example, high-frequency distortion: a device may have low distortion at low frequencies yet high distortion at high frequencies, which can affect the tonal balance.
2. Some "good measuring" devices sit at the borderline of oscillation. With realistic loads there can be excessive ringing that degrades the device's performance at lower frequencies.
3. Many hi-end or subjectively favored devices have extra distortion in the bass and can roll off at high frequencies. This can contribute to the difference between devices.

Genuinely good-measuring devices should sound clean, with a glossy texture, highs that are soft but not lacking, tight lows, and nothing standing out above the rest. (This applies to all devices, from DACs and amps to headphones.) Compared to other devices they can sound less weighty, less fuzzy, less fluffy, less grungy, less "resolving". But once you are used to that sound, it is much harder to tolerate worse devices.
However, nowadays many devices, especially DACs, measure really well all around, so in general minimal difference should be found. Many are listening with their eyes, not their ears. In a fully controlled blind test they will not hear any difference.
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,657
Likes
240,865
Location
Seattle Area
Subjectivists frequently say that amps and other gear that measure extremely well sound analytical/clinical. Is it possible that there is some scientific explanation for feeling something like this?
There is: they are wrong. :) Don't let them know the identity of the device they are listening to, and they can no longer tell which is analytical and which is not.

Also, does anyone have an idea of what analytical/clinical sound could mean?
It means they read that a device has lower distortion and conclude it must therefore sound analytical. You read this all the time in subjective reviews.

I think dismissing the claim by just calling it placebo or nonsense is not good enough. Can we do better?
That is what it is though. We can't use science to make sense of anti-science.
 

JohnYang1997

Master Contributor
Technical Expert
Audio Company
Joined
Dec 28, 2018
Messages
7,175
Likes
18,300
Location
China
Ah. I just remembered a guy in a thread on diyAudio who claimed tonality is correlated with the model number of the opamps, even the sum of its digits, and argued that the LT1115 sounds different from the LT1028.
 

makmeksam

Member
Joined
Nov 13, 2019
Messages
59
Likes
44
There is: they are wrong. :) Don't let them know the identity of the device they are listening to, and they can no longer tell which is analytical and which is not.
I think this is the first review about this amp.
The reviewer says that the tests were done blind. But he still claims that the Heresy (which measures better) sounds thin/lacks body compared to the Magni 3+ (which measures relatively worse). What do you think is going on here?
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,657
Likes
240,865
Location
Seattle Area
I think this is the first review about this amp.
The reviewer says that the tests were done blind. But he still claims that the Heresy (which measures better) sounds thin/lacks body compared to the Magni 3+ (which measures relatively worse). What do you think is going on here?
His observations about the sound were sighted and based on the same factors I mentioned. His blind test is incomplete. He needs to run 10 trials and get at least 8 right before he can claim he can tell the difference. He is just telling us his conclusion that he could tell the difference. How did he establish that?

I have had occasions where I got the right answer 4 out of 5 times, which makes you think you can hear the difference. But from there on I could not. The differences that I thought I could "clearly" hear were no longer there.
 

Arienne

New Member
Joined
Dec 8, 2019
Messages
3
Likes
2
Subjectivists frequently say that amps and other gear that measure extremely well sound analytical/clinical. Is it possible that there is some scientific explanation for feeling something like this? Also, does anyone have an idea of what analytical/clinical sound could mean?
I think dismissing the claim by just calling it placebo or nonsense is not good enough. Can we do better?
In my experience, some kinds of harmonic distortion can sound better than a clean signal. I personally find that sometimes I'm more emotionally affected by a signal when it's clipping vs when it isn't. I'm not sure why that would be the case, though. If that's true, I think we should just master the music with it, rather than adding it in after the fact with equipment, and that's what I do when I feel a mix calls for it. The signal reproduction should be as perfect as possible and it's the signal we should be altering, but that's just my opinion.
 

Arienne

New Member
Joined
Dec 8, 2019
Messages
3
Likes
2
His observations about the sound were sighted and based on the same factors I mentioned. His blind test is incomplete. He needs to run 10 trials and get at least 8 right before he can claim he can tell the difference. He is just telling us his conclusion that he could tell the difference. How did he establish that?

I have had occasions where I got the right answer 4 out of 5 times, which makes you think you can hear the difference. But from there on I could not. The differences that I thought I could "clearly" hear were no longer there.

An even more fun blind test would be to record the output of a preferred dirty amp with a good enough ADC and play it back on a clean amp and ask them to A/B that.
 

Arienne

New Member
Joined
Dec 8, 2019
Messages
3
Likes
2
My relationship with Schiit instantly changed the day I learned they made use of an AP555 (decisions like that say more about mindset than about the effort of actually doing it). Even better now, considering they're actually putting out a product with the express intent of addressing a segment of the criticism they faced in the past. AND on top of that we have TK here openly talking and taking the initiative to bridge more ties in good faith. He behaves like the refreshing marketing they have on their site - that is to say - like a normal human being, and not like some mouthpiece robot.

Going by TK Noble's presence here and his outlook, only an insane person wouldn't give him due respect.

I've been following Schiit for years and have either owned or loaned a great deal of their products, and they're pretty good in general, but for a long time, I don't think they were customer-centric. They don't design the crazy shit they design for people to buy, in my opinion. They do it because it's fun. They enjoy inventing ridiculous technologies and evaluating their pros and cons. Selling the results seemed almost like a fun bonus for most of their history. It's nice and all when someone does things because they're passionate about them, but for a long time that passion led them to respond to criticisms of their products as artists would respond to criticism of their art, rather than as a business should respond to the criticisms of their customers, and I think that's where they got lost for a while. This new product line, new level of engagement and new focus on transparency reflects a shift a little bit away from doing it for fun towards product focus. The fact that they have a business has finally started setting in for them. I think that's a good thing. I look forward to seeing the new products they introduce now that they've realized what business they're actually in.
 

makmeksam

Member
Joined
Nov 13, 2019
Messages
59
Likes
44
In my experience, some kinds of harmonic distortion can sound better than a clean signal. I personally find that sometimes I'm more emotionally affected by a signal when it's clipping vs when it isn't. I'm not sure why that would be the case, though. If that's true, I think we should just master the music with it, rather than adding it in after the fact with equipment, and that's what I do when I feel a mix calls for it. The signal reproduction should be as perfect as possible and it's the signal we should be altering, but that's just my opinion.
I totally agree with this. If the amp or the DAC is used to color the sound with distortion, it will add that distortion to all tracks alike. This may improve some tracks but ruin others, and likely many of them.

For people who like coloring the sound, why not have a transparent system and use software to tweak it? You can make tweaks with arbitrary precision in the digital domain, and this way the sound can easily be changed any time you like. If you want the so-called tube sound for this evening, go ahead and apply a software filter; if you want transparent, accurate sound on a different day, remove the filter and enjoy. Whether such software filters exist is a different concern!
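As a sketch of how such a software "coloring" could work (the tanh waveshaper below is my own illustration, not a reference to any specific product): a soft-clipping nonlinearity applied to the samples adds low-order harmonics reminiscent of tube saturation, and bypassing it restores the transparent path.

```python
import math

def tube_flavor(samples, drive=2.0, enabled=True):
    """Apply a simple 'tube-style' soft clip (tanh waveshaper) to audio
    samples in [-1, 1]. With enabled=False the signal passes through
    untouched, i.e. the transparent setting."""
    if not enabled:
        return list(samples)
    norm = math.tanh(drive)  # rescale so full-scale input maps to full scale
    return [math.tanh(drive * s) / norm for s in samples]

# 10 ms of a 1 kHz sine at a 48 kHz sample rate, at half scale:
sine = [0.5 * math.sin(2 * math.pi * 1000 * n / 48000) for n in range(480)]

colored = tube_flavor(sine, drive=2.0)          # this evening's "tube" sound
transparent = tube_flavor(sine, enabled=False)  # back to the accurate signal
```

A real saturation effect would also add asymmetry (for even harmonics), oversampling, and filtering; the point here is only that the coloration can live in software, be applied per track, and be switched off at will.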
 
Last edited:

MRC01

Major Contributor
Joined
Feb 5, 2019
Messages
3,484
Likes
4,111
Location
Pacific Northwest
His observations about the sound were sighted and based on the same factors I mentioned. His blind test is incomplete. He needs to run 10 trials and get at least 8 right before he can claim he can tell the difference. He is just telling us his conclusion that he could tell the difference. How did he establish that?
I have had occasions where I got the right answer 4 out of 5 times, which makes you think you can hear the difference. But from there on I could not. The differences that I thought I could "clearly" hear were no longer there.
8 of 10 right is about 94.5% confidence, only 5.5% chance to get at least that many right by guessing.
4 of 5 right is about 81.3% confidence, 18.7% chance to get at least that many right by guessing.

With subtle differences that are hard to hear, near one's threshold of audibility, listener fatigue sets in faster so test performance becomes less reliable. In the latter case (4 of 5) the odds are still in your favor so it's more likely than not that you can hear those differences; it's not that they're "no longer there", but they're harder to detect. With 10 trials, getting at least 6 right still beats the odds at 62.3% confidence. If you can do that consistently, it's more likely than not that you are hearing a real difference.

The standard 95% threshold commonly used in ABX makes a high precision, low sensitivity test. That's appropriate for some cases. But detecting subtle differences calls for a lower threshold to make a low precision, high sensitivity test.
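The confidence figures quoted above are binomial tail probabilities, and can be checked with a few lines of Python (a sketch; the trial counts and the 50% guessing rate come from the post):

```python
from math import comb

def p_by_luck(k, n, p=0.5):
    """Probability of getting at least k of n trials right when each
    trial is an independent guess with success rate p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Chance of reaching each score purely by guessing (p = 0.5):
print(f"8 of 10: {p_by_luck(8, 10):.1%}")   # ~5.5%  -> ~94.5% confidence
print(f"4 of 5:  {p_by_luck(4, 5):.1%}")    # ~18.8% -> ~81.3% confidence
print(f"6 of 10: {p_by_luck(6, 10):.1%}")   # ~37.7% -> ~62.3% confidence
```

(The post's 18.7% is the same 6/32 = 18.75% figure, just truncated rather than rounded.)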
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,657
Likes
240,865
Location
Seattle Area
The standard 95% threshold commonly used in ABX makes a high precision, low sensitivity test. That's appropriate for some cases. But detecting subtle differences calls for a lower threshold to make a low precision, high sensitivity test.
That is a position I have held as well. However, in practice I have found that even after getting 4 out of 5 right, I can go to 20 trials and my score keeps dropping, showing that I had not actually detected the difference I thought I was zooming in on.

Conversely, when I do detect the difference, I am able to get 100% right answers, sans one or two trials where I got distracted, or voted incorrectly.

So as a practical measure, I like to hold myself to the near 100% limit. Values lower than that may pass the typical criteria but don't represent the truth. They certainly don't hold the truth when people say in sighted tests the difference is night and day. If something is sterile, for example, you had better get that right 100% of the time in blind tests. So even 8 out of 10 would be too generous an allowance. The result had better be 100 out of 100.
 

Tks

Major Contributor
Joined
Apr 1, 2019
Messages
3,221
Likes
5,497
With 10 trials, getting at least 6 right still beats the odds at 62.3% confidence.

Still within the margins of luck: 6/10 is essentially a dice roll, in the same way no one would consider 4/10 the opposite of "beating the odds" and call it "failing the odds" or something to that effect.

Given enough trials, the closer you trend toward 50% the more you can say nothing is being heard.

If someone was, for instance, trending toward 0%, as in picking the wrong sound every time, that would actually mean something is being heard. The further your deviation from 50%, the more likely you're demonstrating that more than guesswork dice rolls is occurring. The issue is, the trials need to be run more and more, even if fatigue theoretically grows with every trial. The fact of the matter is, if fatigue is that much of an issue, then that also says the ability to hear the difference lasts only an inappreciably short time. And normal listening would then also default, after a few minutes, to the state of "not being able to tell the differences" anyway, for all people.

If I am unclear, let me give you an example.

Let's say I get out of bed every day and perform one trial within 1 minute of waking up, and in that first trial I am always 100% correct. That trial is listening to 2 sound clips, and being right 100% of the time; theoretically I am testing whether distortion artifacts can be detected at -140 dB. But every single trial after those two sound clips exhibits no pattern, and I eventually run 20, or 30, or 40 trials, all trending toward 50% the more trials are conducted.

So I have now theoretically proven we can hear such distortion -140 dB down. But what value is that for us if no one else is ever able to demonstrate anything much different from the hypothetical conclusion, "after waking up our hearing is mega sensitive, sensitive enough to hear -140 dB down, but only for the first few seconds of being awake"? And after decades we get someone who can hear maybe -145 dB down for maybe 40 seconds, or another person who can hear -135 dB down for maybe 55 seconds. But all of these people then default to hearing as most of us do after those first few seconds of being awake?

Knowing that fatigue hits isn't a detriment to the test; it only further demonstrates the reality of the matter. And if fatigue for hypothetically hearing -140 dB down sets in a few minutes after waking up, who would care when subjectivists say "oh well, I was tired, but I know I can tell the difference"? That's like me saying "well, I know I could tell the difference if you gave me bat ears". So what?

But this is me being charitable, the actual reality of the landscape is these people can't tell these differences EVER. Well maybe they can, but they're all unlucky enough to be always tired, or always under pressure to perform (yet somehow this pressure conveniently never exists when the tests are sighted for some reason eh?).
 

MRC01

Major Contributor
Joined
Feb 5, 2019
Messages
3,484
Likes
4,111
Location
Pacific Northwest
That is a position I have held as well. However, in practice I have found that even after getting 4 out of 5 right, I can go to 20 trials and my score keeps dropping, showing that I had not actually detected the difference I thought I was zooming in on. ...So as a practical measure, I like to hold myself to the near 100% limit
This is a great goal, but it doesn't do your hearing acuity due justice. Subtle differences near the threshold of audibility produce lower test scores. For example: if we start with A and B obviously different, then gradually reduce the differences, people's ABX test scores don't suddenly go from 100% to 50% (random guessing). As the difference between A and B approaches their threshold of audibility, there is a range where their scores drop below 100% yet are still consistently above 50%. They are hearing something real, but it's too subtle to get right every time. Setting the threshold near 100% calls these a fail, which is incorrect.

... Values lower than that [100%] may pass the typical criteria but don't represent the truth. They certainly don't hold the truth when people say in sighted tests the difference is night and day. ...
The definition of "truth" is fuzzy because it's based on statistics, and it can't have both high precision & high sensitivity at the same time. Wherever we set the threshold, it only trades false positives for false negatives. At 99% you will get false negatives. At 51% you will get false positives. Neither is inherently "right" or "wrong", it depends on the goals of your testing.
 

MRC01

Major Contributor
Joined
Feb 5, 2019
Messages
3,484
Likes
4,111
Location
Pacific Northwest
Still within the margins of luck: 6/10 is essentially a dice roll, in the same way no one would consider 4/10 the opposite of "beating the odds" and call it "failing the odds" or something to that effect.
...
If I am unclear, let me give you an example. ...
Your example is about cherry-picking, which is worth discussion.

"trust" can be a reason to require higher test confidence than would normally be needed. If a person does consistently get 6 of 10 right, the odds are he is hearing something real. However, this confidence being only slightly better than guessing makes it easy to cheat. 62.3% confidence means 37.7% chance to score that well by guessing. Even if he isn't hearing a difference, just keep taking the test until he passes. At 37.7% that shouldn't take more than 3 tries. Then he can cherry-pick that one test and say he's hearing a difference. But this cherry-picking is only fooling himself. What is the point of that? It undermines the goal of education & fun.

So using 95% confidence has the benefit of making it harder for people to pass one test by accident and cherry pick it. They'll have to take the test about 20 times, more or less. But the drawback is that this high confidence has false negatives, people who consistently outperform guessing and really are hearing something, but scored as "fail" by the test threshold.
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,657
Likes
240,865
Location
Seattle Area
There is a difference when examining someone else's data based on statistical analysis and my own data. When I am wrong, I know it and won't use statistics to show otherwise. That is what I meant in the context of 4 out of 5 right and then failing left and right. In that case, I thought I 100% knew the difference by the fifth trial. Not being able to "perform" after that meant that what I thought was going on, was not.
 

MRC01

Major Contributor
Joined
Feb 5, 2019
Messages
3,484
Likes
4,111
Location
Pacific Northwest
For me, hearing acuity near the threshold of perception isn't always so clear-cut. Sometimes I know (100% confidence) that I know. Sometimes I know (100% confidence) that I don't know. But in some cases I think I might know, but I'm not sure, so I have to rely on confidence percentiles. Over the years I've found that when this happens, the difference is near the threshold of perception. In these cases, my take-away is: "this difference is potentially audible".
 
  • Like
Reactions: Tks

Tks

Major Contributor
Joined
Apr 1, 2019
Messages
3,221
Likes
5,497
Your example is about cherry-picking, which is worth discussion.

"trust" can be a reason to require higher test confidence than would normally be needed. If a person does consistently get 6 of 10 right, the odds are he is hearing something real. However, this confidence being only slightly better than guessing makes it easy to cheat. 62.3% confidence means 37.7% chance to score that well by guessing. Even if he isn't hearing a difference, just keep taking the test until he passes. At 37.7% that shouldn't take more than 3 tries. Then he can cherry-pick that one test and say he's hearing a difference. But this cherry-picking is only fooling himself. What is the point of that? It undermines the goal of education & fun.

So using 95% confidence has the benefit of making it harder for people to pass one test by accident and cherry pick it. They'll have to take the test about 20 times, more or less. But the drawback is that this high confidence has false negatives, people who consistently outperform guessing and really are hearing something, but scored as "fail" by the test threshold.

I am 100% fine with the side arguing in favor of impossible things cherry-picking their BEST evidence and scores. I want to see what their best-case scenario has to offer. If they can cherry-pick, I don't mind in the slightest, as it still qualifies as some scientifically conducted evidence. But when I say cherry-pick, I don't mean running 100 trials and not publishing 98 of them, for example. I hope people pick studies that are fairly and rigorously conducted, and then we can go over the value of each portion of the study.

I have no problem with tiring people out until they collapse if needed, as long as that is accounted for in the study. There is nothing wrong with this in my book as long as each side can agree on the thresholds at which fatigue has an effect. Having people score 4/5 is also data if they can do it consistently. But then we have to run more specific studies around the parameters that allowed them to achieve 4/5, for example.

The problem with people claiming "I can hear a difference" is that this sort of natural language has no formalized mathematical meaning. If they can specify what "I can hear a difference" actually means, then we can perhaps understand each other better. What actually ends up happening is that we assume "I can hear a difference" means "you can hear a difference in any practically conducted scientific or reality-based home-tested scenario". But then, when the tests are done, the subjectivist has to clarify "the pressure got to me" or "I was hungry" and other such excuses. Which in my book are fine. But they should have told us up front what conflicts exist when they say "I can hear the differences between cables". Don't tell us after the fact, once you have agreed you were fine to take the test, that the test is invalid because you forgot that being hungry would have such an effect on you.

In the majority of cases, when both objectivist and subjectivist agree on terms for testing, it is nearly ALWAYS the subjectivist who has qualms about the event after the fact. It's fine if you want to say "oh, but this can lead to false positives". Well then, why did you agree to these aspects of the test, and why don't you simply propose your own parameters so we can hash out whether they will satisfy both sides?

I am very sorry for going on, and on, and on. But I cannot overstate the massive miscommunication, due to the limits of natural language, that occurs when two parties are debating or testing the validity of their claims. This is why having as clear an understanding as possible of what each side means is far more important than virtually any test I can imagine. This is why many highly educated people prefer formalized ways of conveying ideas rather than just natural language (scientific tests are an extension of trying to remove natural-language interpretation of results, which is why things like numbers are so valuable: all but the insane agree on what they mean, with no room for interpretation).

95% confidence ratings need to be used if a strict deductive affirmation is going to be claimed. If we're going to make inductive statements, then 95% confidence is unneeded, as we're only testing how often something is or isn't the case.

I can think of a test that accounts for fatigue and such really quickly. For people who claim they hear cable differences, I can instantly take away the only excuses subjectivists still have left and invoke the majority of the time (fatigue and pressure): simply swap cables in their own homes any time they're out of the room, without them knowing a swap even happened. All they would have to do is say when they detect that the cables were changed. They wouldn't know when it happens, nor which cable is which (though at the start they get to choose the cables they claim they can hear differences between). So no fatigue, and no claims about being under pressure. Any time they don't report a change, it isn't counted against them. The only thing graded is when they claim the cable was changed since their last listening session. Otherwise they can conduct their daily lives unobstructed.

But even there, they will conjure new excuses and say "I felt pressure whenever I had to second-guess myself" or "my hearing might've gotten worse", totally unaware that we're not testing how much of a difference they can hear, simply whether what they heard was actually due to a cable change or not.

At some point you reach a phase where walls of practicality are hit. The number of excuses simply exhausts both sides, and finally a conclusion is reached because so much probability is stacked against one side that it's pointless to even entertain the topic anymore. (It gets comparably unwieldy, like the length of this post, and by that point only the most OCD, academically inclined are still left wondering about the truth.)
 

MRC01

Major Contributor
Joined
Feb 5, 2019
Messages
3,484
Likes
4,111
Location
Pacific Northwest
... 95% confidence ratings need to be used if a strict deductive affirmation is going to be claimed. If we're going to make inductive statements, then 95% confidence is unneeded, as we're only testing how often something is or isn't the case. ...
It's a common misperception that higher confidence thresholds are somehow more reliable or closer to the truth. Higher confidence levels (like 95%) do not make the test better; they simply reduce false positives at the cost of increasing false negatives. Lower confidence levels (like 55%) do not make the test worse; they simply reduce false negatives at the cost of increasing false positives.

Any confidence threshold higher than 50% is potentially valid, depending on the testing goal. BS.1116 recognizes this on page 22, where it says: "There is of course no 'correct' significance level." This reflects the fact that the ideal significance/confidence threshold varies with the purpose of the test. That purpose tells you which kind of error is worse: false positives or false negatives. Then you choose a threshold that reduces it, at the cost of increasing the other, or strike a balance between them.
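That trade-off can be made concrete with the same binomial arithmetic used earlier in the thread (the 65% per-trial "real but subtle" listener below is my hypothetical, not a figure from BS.1116): for a 10-trial run, each choice of pass threshold fixes both error rates at once.

```python
from math import comb

def p_at_least(k, n, p):
    """P(at least k of n trials correct) for per-trial success rate p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

N = 10
SUBTLE = 0.65  # hypothetical listener who hears a real but subtle difference

print("pass mark | false positive (guesser passes) | false negative (subtle listener fails)")
for k in range(6, 10):
    false_pos = p_at_least(k, N, 0.5)         # pure guesser slips through
    false_neg = 1 - p_at_least(k, N, SUBTLE)  # real listener scored as "fail"
    print(f"  >= {k}/10 | {false_pos:31.1%} | {false_neg:.1%}")
```

Under these assumptions, requiring 8 of 10 lets only about 5.5% of guessers through but fails the 65% listener roughly 74% of the time, while requiring 6 of 10 roughly reverses that balance (about 38% vs 25%): exactly the false-positive/false-negative trade that no threshold can escape.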
 