
ZMF Caldera Headphone Review

Rate this headphone:

  • 1. Poor (headless panther)

    Votes: 48 27.0%
  • 2. Not terrible (postman panther)

    Votes: 84 47.2%
  • 3. Fine (happy panther)

    Votes: 29 16.3%
  • 4. Great (golfing panther)

    Votes: 17 9.6%

  • Total voters
    178

solderdude

Grand Contributor
Joined
Jul 21, 2018
Messages
16,054
Likes
36,441
Location
The Neitherlands
All this would show is that EQing a song is audible, which nobody disputes. The specific issue is whether the higher-frequency dips in the Caldera are caused by driver/cavity modes (Amir suggested this), and how much of that is audible given psychoacoustic filtering, which we know happens for room modes at higher frequencies. The only way to test that would be with an EQ on the Caldera itself.

Of course you can do the exact same procedure with EQ that corrects the response of the Caldera to that particular fixture and use a Caldera.
Unfortunately only owners can do this and they might not be interested in such tests and would rather spend their time listening to some music instead.
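For anyone who wants to try this, such a correction is typically built from a handful of parametric (peaking) filters whose center frequency, gain and Q are read off the difference between the measured response and the target. A minimal sketch of one peaking biquad using the well-known RBJ cookbook formulas (the filter parameters below are made-up placeholders, not an actual Caldera correction):

```python
import numpy as np

def peaking_eq(f0, gain_db, q, fs):
    """One RBJ-cookbook peaking biquad: boost/cut of gain_db at f0 Hz.

    Returns normalized (b, a) coefficients for use with e.g.
    scipy.signal.lfilter.
    """
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2.0 * q)
    b = np.array([1.0 + alpha * A, -2.0 * np.cos(w0), 1.0 - alpha * A])
    a = np.array([1.0 + alpha / A, -2.0 * np.cos(w0), 1.0 - alpha / A])
    return b / a[0], a / a[0]

# Hypothetical example: a +6 dB bell at 1 kHz, Q = 1, 48 kHz sample rate
b, a = peaking_eq(1000.0, 6.0, 1.0, 48000.0)
```

At the center frequency the magnitude response of this biquad equals exactly the requested gain, which is easy to verify with scipy.signal.freqz.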

Given the difference in response, it clearly has to do with the pads. Materials, leakage, thickness, foam, angle, interactions with the fixture, and driver issues all come together.
It appears that Dan and Hifiman have less difficulty getting good performance in that part of the frequency range, and at least one of Zach's pads seems to perform better there. One can theorize all one wants about the cause, but it would have to be investigated.
The thing is... when ZMF customers do not complain and Zach is happy with this result (after 6 years), then it's fine and it is what it is.
No one is forced to buy it and everyone can have an opinion about it.
 

L0rdGwyn

Active Member
Forum Donor
Joined
Jan 15, 2018
Messages
295
Likes
676
I wasn't referring to you specifically. I took tube amplifiers only as an example to illustrate a principle, or rather a principled conflict between different approaches, i.e. between subjective and objective. I thought that was obvious.

Here, let me show you how that might not have been obvious.

I think you are mixing up different things. No one is saying you or anyone else can't like tube amps (or a purple car, to follow your analogy). If you're happy with how your tube amp sounds, no one can say you're wrong. The problem arises when you want to claim that your subjective tube experience trumps measurement data. @amirm's recommendations are based almost entirely on measurement data, which is the only objective way to assess the characteristics of a HiFi device. Everything else becomes subjective and individual. That does not mean it is wrong for that individual, but it is not possible to generalize based on individual subjective experience.

Your argument seems to shift. On the one hand, you claim that you do not attach much importance to the reliability of the Harman curve, or that it is not verified. Then you claim that even though it's solid, some people don't like the curve because...well, because they don't like it.

My argument has not changed. If it is going to be made an industry standard - and yes, that potentially means less choice as more headphone manufacturers adapt to it - then it should be validated. Anyone who argues that less research is better than more before adopting a practice as a standard is going against the ethos of this forum. And you still don't seem to understand a basic analogy even though I distilled it into a single sentence: if you standardize frequency response, you imply less variety in the market and force a standard onto consumers that they didn't necessarily ask for. If you did a poll today on every website that discusses headphones in any regard, showed them the Harman data and asked "do you agree that every headphone in the market should be tuned to this frequency response going forward?" do you think a majority would say yes? Do you think consumers want less choice in the market, not more?

You object to the Harman curve being held up as a standard. But it is only one of several criteria; there are others, such as distortion, comfort and fit. You seem to worry that Harman will lead to a situation where freedom of choice disappears. That is a small probability. Worse is preferring a standardless state, because there is no alternative standard. The non-standard morass is the normality in HiFi.

Deviation from Harman on ASR automatically means a negative review without EQ, even if the rest of the headphone's parameters are acceptable by ASR standards.
 

solderdude

Grand Contributor
Joined
Jul 21, 2018
Messages
16,054
Likes
36,441
Location
The Neitherlands
If you did a poll today on every website that discusses headphones in any regard, showed them the Harman data and asked "do you agree that every headphone in the market should be tuned to this frequency response going forward?" do you think a majority would say yes? Do you think consumers want less choice in the market, not more?



https://www.reddit.com/r/headphones/comments/kjibfz
 

solrage

Member
Joined
Oct 4, 2020
Messages
69
Likes
31
Of course you can do the exact same procedure with EQ that corrects the response of the Caldera to that particular fixture and use a Caldera.
Unfortunately only owners can do this and they might not be interested in such tests and would rather spend their time listening to some music instead.
I do own the Caldera and plan on doing the test myself next time I have a chance. Not sure when that will be, but I'm curious as to the results.
 

Resolve

Active Member
Reviewer
Joined
Jan 20, 2021
Messages
212
Likes
531
Deviation from Harman on ASR automatically means a negative review without EQ, even if the rest of the headphone's parameters are acceptable by ASR standards.
Not necessarily; it depends on the desired narrative. Some can be "recommended with EQ", while others will be "not recommended without EQ", and maybe there are reasons behind that. Ultimately it's a distinction without a difference, but this is one of several areas where Harman checklist reviews could be more consistent. Still, I don't think it really matters, since it's quite clear what the parameters are for a positive outcome there. I would just suggest, as I have done in other places, that it's worth digging into the AES papers, since there's more to the research than just this.
 

Resolve

Active Member
Reviewer
Joined
Jan 20, 2021
Messages
212
Likes
531
Seal for DT770(32) and HD800S

[Image: seal-dt770-32.png]


[Image: seal-hd800s.png]

Yeah, I was surprised by the article, because the DT770 (and other Beyers with a similar design) is actually one of the most difficult headphones to get a consistent seal with. There are other headphones that perform far more consistently across the board, like the HD 800 S. Not sure why they chose closed-back designs in general. Still, the outcome is about what I expected.
 

L0rdGwyn

Active Member
Forum Donor
Joined
Jan 15, 2018
Messages
295
Likes
676
Not necessarily; it depends on the desired narrative. Some can be "recommended with EQ", while others will be "not recommended without EQ", and maybe there are reasons behind that. Ultimately it's a distinction without a difference, but this is one of several areas where Harman checklist reviews could be more consistent. Still, I don't think it really matters, since it's quite clear what the parameters are for a positive outcome there. I would just suggest, as I have done in other places, that it's worth digging into the AES papers, since there's more to the research than just this.

Right, my point is that a headphone that does not adhere to Harman will not be recommended without EQ, even if it is strong in every other area.
 

L0rdGwyn

Active Member
Forum Donor
Joined
Jan 15, 2018
Messages
295
Likes
676
Alright, I cannot argue any more, I will be turning off notifications. Anyone who wants to quote me, I offer you the last word.

Regardless of the different viewpoints, ultimately this is about enjoyment of music, so I hope you all are able to accomplish that in your own way. Adios!
 
OP
amirm


Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,679
Likes
241,092
Location
Seattle Area
Personally, I'm fine with Harman being considered a "standard." What I'm less fine with is assuming that everything that deviates from that standard no matter how or by how much is bad or flawed.
That's not remotely my position in reviews. I allow variations with ease. This is Dan Clark Expanse:

[Image: Dan Clark Expanse frequency response measurement]


There is clear deviation in the 80 to 300 Hz region. This is what I said in the review:

"Listening with stock tuning, I enjoyed the warmth it added to my reference female vocals. Defeating it using EQ caused the vocals to stand out more, with slightly more spatial qualities. But the tonality comparatively could be said to be a bit bright. I preferred it without EQ. Moving to other clips, I occasionally would hear a bit of tubbiness which was made better with EQ. This was in cases where the tubbiness was already in the music and the boost in the response of the headphone exaggerated it a bit. Overall, I would say 70% of the time I preferred no EQ. For the others, if I didn't have EQ, it would still be delightful to listen to this headphone. We are talking small differences here."

As I have repeatedly said, my listening tests are there to verify measurements, and I did precisely that above. Taking it all into account, I gave the Expanse the highest award. So deviations are not at all an automatic negative score. Not remotely so.

I did the same for Caldera:

"First impression out of the box was inoffensive but pretty boring sound. There was almost no deep/sub-bass response, so that was the first low-hanging fruit to fix with equalization:... That alone made a remarkable difference. But the job needed to be completed with the three other filters to fill those holes. Once there, I liked the effect very much but thought the sound was a bit bright, so took down the peaks a bit. With the complete package, the transformation was dramatic as you can imagine. There was impressive bass and excellent detail and quite good spatial effects."

You can even see me tuning beyond the Harman curve a bit by ear. The headphone once again got highly recommended with the EQ.

Notice the part I have bolded. No way I can give a positive review to a headphone when measurements show deficiencies and my listening tests completely confirm that. As research predicts. Are you asking me to lie about what I measure and hear???
 
OP
amirm


Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,679
Likes
241,092
Location
Seattle Area
I don't disagree with you, and I do believe in the scientific method and verifiability, which is why I suggested the Harman research should be verified before being put into practice or measured against as an industry standard.

Here's an analogy that maybe illustrates my feeling better: say Ford over the past five years did a significant amount of research on buyer preference for car color. What they found was that 64% of car buyers preferred their car to be black. Then a popular car-reviewing publication began doling out bad reviews if the car was in any color but black, given the new research.
Headphones have an identical characteristic to cars in that manner: their look/color. The Caldera got rave reviews from me in that respect. More so, I have said that its look has influenced many people to buy it, the actual performance notwithstanding. So your analogy makes no sense.

A proper analogy would be the car's actual performance. That is what the frequency response of a headphone shows. You could argue that a car can be great if it is anemic when you first push the pedal and only then delivers its power. If research shows that most people don't like that (and that is the case here, i.e. turbo lag is not liked), then I as a reviewer get to critique said problem when competitors don't suffer from it.

Putting the analogy aside, the majority of people like the target that I use in my reviews. Only a third as many people prefer otherwise, and that is only with reference to bass/treble ratios. There is no research that backs the response of Caldera with its chewed up lower treble.

Let's say we are just talking about bass then. When three times as many people like it tuned to the target, why should all of us have to use EQ while you don't, when you represent 20% of the listeners?
 
OP
amirm


Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,679
Likes
241,092
Location
Seattle Area
Right, my point is that a headphone that does not adhere to Harman will not be recommended without EQ, even if it is strong in every other area.
Not true. Here is Sennheiser HD650 review:

[Image: Sennheiser HD650 frequency response measurement]


And my subjective remarks:

"The HD650 sounds like it measures. That is, it has a very good sound and tonality out of the box -- better than any other headphone I have tested. No wonder that I am often surprised how good it is when I go back to testing it for a few seconds after using another for a long time. You can however do much, much better with just a bit of equalization:....

There are a lot of sexier headphones out there than this old design but the HD650 not only holds its own still but with equalization produces an excellent high fidelity experience. If you are anti-equalization, you can still enjoy a lot of what this headphone has to offer.

Happy to recommend the Sennheiser HD650 headphone."

Indeed I frequently recommend it to people.
 
OP
amirm

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,679
Likes
241,092
Location
Seattle Area
They did not control for inter-individual variation, which we know is likely to be an issue for the DT 770 Pro: https://www.rtings.com/headphones/1-5/graph/7913/consistency-l/beyerdynamic-dt-770-pro/440
Once again, they tested for such variations. You honestly want them to dismiss that testing and go and look at some review site's findings???

That aside, once again you are a victim of not reading the paper. They start with this:

"We started by measuring the frequency responses of headphones to use as an inspiration for the perceptual space of the stimuli in the listening test. The headphones for this study were selected amongst the most popular closed-back over-the-ear consumer devices on the market at the time of writing, namely:

Apple AirPods Pro Max
Bang & Olufsen HX
Beyerdynamic DT 700 Pro X
EPOS Adapt 600
Jabra Elite 85h
Shure AONIC 50
Sony WH-1000XM4
SoundCore Space Q45"

The fact that they didn't use the same headphone as Harman's is a good thing, not bad. We are seeing that Harman's conclusions didn't rely on that specific surrogate headphone. In addition, Beyerdynamic headphones are used in other headphone research, such as:
The subjective effect of BRIR length on perceived headphone sound externalisation and tonal colouration

Ryan Crawford-Emery, Hyunkook Lee

"2.3. Listening Test Procedure
There were two listening tests conducted: externalisation and colouration. Both tests were held in a quiet, acoustically controlled listening room. The stimuli were presented over a pair of Beyerdynamic DT880 Pro headphones in a random order."

And:
Magnitude and Phase Response Measurement of Headphones at the Eardrum

Anders T. Christensen, Wolfgang Hess, Andreas Silzle and Dorte Hammershøi

"We measured transfer functions of six headphones, shown in Fig. 1. The full model names are:
1. Koss HP/1
2. T.Bone HD800
3. Sennheiser HD600
4. Beyerdynamics DT990 Pro
5. STAX SR-007 MKII with STAX SRM-007tII tube amplifier
6. STAX SR-404 with STAX SRM-007tII tube amplifier"

So your assertion that the use of Beyerdynamic headphones implies something is wrong is without merit.

They further performed statistical analysis:

"These models were compared using ANOVAs. It was found that Site did not have a significant effect on ratings [χ²(1) = 1.09394, p = .296], but the effect of Headphone curve did [χ²(31) = 1128, p < .001]."

If fitment issues were a significant factor, they would not have achieved this level of statistical significance. And certainly would not have arrived at their conclusions regarding bass performance.
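As a side note, the quoted likelihood-ratio statistics can be sanity-checked against their reported p-values with the chi-squared survival function. This is my own quick sketch with SciPy, not something from the paper:

```python
from scipy.stats import chi2

# chi-squared(1) = 1.09394 for Site: should land near the reported p = .296
p_site = chi2.sf(1.09394, df=1)

# chi-squared(31) = 1128 for Headphone curve: vanishingly small, i.e. p < .001
p_curve = chi2.sf(1128.0, df=31)
```

The survival function gives the probability of a statistic at least that large under the null hypothesis, which is exactly the p-value reported in the quote.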

As the French expression says, “c’est l’hôpital qui se fout de la charité” (roughly: the pot calling the kettle black).
When you don't answer a simple question of whether you have read a piece of research, the conclusion in every language is "NO."
 

solderdude

Grand Contributor
Joined
Jul 21, 2018
Messages
16,054
Likes
36,441
Location
The Neitherlands
[Image: frequency response measurement]

and below, measured on a different type of fixture, the $400 Audeze MM100 (Sonarworks):
[Image: Audeze MM100 measurement]

Check out the response above 200 Hz (a different target is used below 200 Hz).
The MM100 (my guess) seems to have some leakage here, judging from the typical behavior in the low bass.
Then again... we can see the double dip (just at different frequencies and depths) in almost all Audeze and Hifiman headphones:
[Image: measurements of other Audeze and Hifiman headphones]
 

next2nothing

Member
Joined
Nov 30, 2023
Messages
22
Likes
9
There is no research that backs the response of Caldera with its chewed up lower treble.

Then make one. Again, it's not the job of the designer to pre-chew things for you or to create a study about the choices made during the design process; the designer's job is to make something people like, which he apparently succeeded at.

As a presumably scientifically thinking guy, you could try to find out how the designer managed to create a headphone that many reviewers, listeners and buyers describe as the most dynamic-driver-like-sounding planar all-rounder with a natural sound signature. Instead of dismissing it as esoteric mumbo jumbo, maybe try to understand what these people mean and why they come to such a conclusion. I would highly appreciate a serious scientific breakdown. There must be more to it than marketing and nice craftsmanship. Just maybe. Out of curiosity, it might be worth looking into. And I believe that even your crowd would appreciate such an attempt by a knowledgeable, experienced engineer. Very likely much more than another "hey folks, it's not tuned to Harman, which we all like so much, yadda yadda" borefest.

But I am sure that you will choose the low-hanging-fruit approach of zero scientific curiosity, claiming someone is spreading "FUD" again and that the holy bible of Harman forbids us to think in that direction. So here we are with a rather senseless 30 pages of nothing.
I am not offended by the review, as someone assumed (I actually would enjoy an intelligent criticism or even a roast lol). The review conclusion is rather positive; people who are in the market for a $3.4k headphone usually know how to EQ, so it won't prevent the potential target audience from buying it. It's the dogmatism, arrogance and complete lack of scientific curiosity that are really remarkable.

Regarding previous posts, yeah, it's very nice of you to dismiss things said as "high school level" instead of actually addressing the points made, like the research data I asked for after the assumptions you made, the very limited size of the Harman research by the standards of sociology research, and the issues with the richness of the presented data and its independent evaluation.
You must be absolutely out there if you think that one singular study with an arguably small sample size and overall low richness of data, from a competing company, done by company employees, with company employees as the focus group of the research, without independent evaluation and cross-studies by other groups, can become an industry standard any time soon (and that is apart from many other factors why this just won't happen). The study is a nice foundation for things to come, and something that many people on the "audiophool" boards like Head-Fi and SBAF happily use to interpret their findings; this board, on the other hand, is the definition of getting high on your own supply.
 

MayaTlab

Addicted to Fun and Learning
Joined
Aug 15, 2020
Messages
956
Likes
1,593
Once again, they tested for such variations.

*sigh* They didn't test for inter-individual variations. At this point I don't think that you have any clue what that means to be honest.

You honestly want them to dismiss that testing and go and look at some review site's findings???

Rtings does test for in situ inter-individual variations at low frequencies.

That aside, once again you are a victim of not reading the paper. They start with this:

"We started by measuring the frequency responses of headphones to use as an inspiration for the perceptual space of the stimuli in the listening test. The headphones for this study were selected amongst the most popular closed-back over-the-ear consumer devices on the market at the time of writing, namely:

Apple AirPods Pro Max
Bang & Olufsen HX
Beyerdynamic DT 700 Pro X
EPOS Adapt 600
Jabra Elite 85h
Shure AONIC 50
Sony WH-1000XM4
SoundCore Space Q45"

I don't see how that has anything to do with the question of selecting the most appropriate pair of replicator headphones?

The fact that they didn't use the same headphone as Harman's is a good thing, not bad. We are seeing that Harman's conclusions didn't rely on that specific surrogate headphone.

If you want to speak in very broad and not particularly precise terms, maybe yes? But I think that it's debatable. These two curves, extracted from the Zenodo link mentioned earlier, were the highest rated in the experiment:
[Image: Two highest rated curves.jpg]

And the difference in rating between them is statistically not significant (i.e. it should be interpreted as "the same").

The mean of the five highest rated curves that they showed in figure 2, if used as a target, results in somewhat different error curves for headphones such as the HD600 or HD800 than the ones you'd obtain on a Gras fixture using the Harman target, which, for reasons developed here and later in that thread, would mean that the findings were in fact somewhat different.
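For anyone wanting to reproduce this kind of comparison, an "error curve" here is simply the measured response minus the target, usually level-matched so a pure volume offset doesn't count as error. A minimal sketch (the arrays are illustrative, not real measurement data):

```python
import numpy as np

def error_curve(measured_db, target_db):
    """Deviation (dB) of a measured response from a target,
    with the mean offset removed so overall level is ignored."""
    err = np.asarray(measured_db, dtype=float) - np.asarray(target_db, dtype=float)
    return err - err.mean()

# Illustrative: a response that is tilted relative to a flat target
tilted = error_curve([1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
```

Swapping the target array (Harman vs. the mean of the five highest rated curves) is exactly what changes the resulting error curve for a given headphone.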

In addition, Beyerdynamic headphones are used in other headphone research, such as:
The subjective effect of BRIR length on perceived headphone sound externalisation and tonal colouration

Ryan Crawford-Emery, Hyunkook Lee

"2.3. Listening Test Procedure
There were two listening tests conducted: externalisation and colouration. Both tests were held in a quiet, acoustically controlled listening room. The stimuli were presented over a pair of Beyerdynamic DT880 Pro headphones in a random order."

DT880 =/= DT 770 Pro. They chose more wisely.

And:
Magnitude and Phase Response Measurement of Headphones at the Eardrum

Anders T. Christensen, Wolfgang Hess, Andreas Silzle and Dorte Hammershøi

"We measured transfer functions of six headphones, shown in Fig. 1. The full model names are:
1. Koss HP/1
2. T.Bone HD800
3. Sennheiser HD600
4. Beyerdynamics DT990 Pro
5. STAX SR-007 MKII with STAX SRM-007tII tube amplifier
6. STAX SR-404 with STAX SRM-007tII tube amplifier"

So your assertion that use of Beyerdynamics headphones means something is wrong is without merit.

That study was only concerned with in-situ measurements, not preferences for target curves using replicator headphones. You're just quickly scanning through the AES library using the keyword "Beyerdynamic", regardless of whether it's relevant to the discussion at hand?
And again, DT 770 Pro =/= DT 990 Pro.

They further performed statistic analysis:

"These models were compared using ANOVAs. It was found that Site did not have a significant effect on ratings [χ²(1) = 1.09394, p = .296], but the effect of Headphone curve did [χ²(31) = 1128, p < .001]."

If fitment issues were a significant factor, they would not have achieved this level of statistical significance.

I'll defer to those more knowledgeable about statistics in that regard, but I am not certain that your assertion here is a necessary conclusion to draw from that sort of data, and to be frank, I don't think that you understand it any more than I do.

And certainly would not have arrived at their conclusions regarding bass performance.

cf above.

When you don't answer a simple question of whether you have read a piece of research, the conclusion in every language is "NO."

Reading doesn't mean "skimming over the lines", Amir. I mean, you shouldn't take it from me but from Harman themselves; but you don't seem to want to actually read the articles, so, as it is difficult to have a rational conversation with you, have a nice day.
 
OP
amirm


Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,679
Likes
241,092
Location
Seattle Area
I'll defer to those more knowledgeable about statistics in that regard, but I am not certain that your assertion here is a necessary conclusion to draw from that sort of data, and to be frank, I don't think that you understand it any more than I do.
Well, if you don't know what an ANOVA is, what p-value is appropriate, etc., you are definitely not qualified to comment on these studies. A considerable amount of the Harman research lies in the statistical analysis; you will be lost without this knowledge. I cover some of this in my YouTube video on the reliability of listener performance, including more advanced topics such as the F-value:


But really, you need to go learn this topic when we are dealing with multiple factors in a study and trying to determine correlation.
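For readers unfamiliar with the terminology, a one-way ANOVA asks whether the differences between group means are larger than the scatter within the groups would explain by chance. A toy sketch with entirely made-up listener ratings (synthetic data, not Harman's):

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)

# Hypothetical 0-10 preference ratings of three target curves
# from 20 simulated listeners each (illustrative only)
curve_a = rng.normal(7.5, 1.0, 20)
curve_b = rng.normal(5.0, 1.0, 20)
curve_c = rng.normal(4.8, 1.0, 20)

f_stat, p_value = f_oneway(curve_a, curve_b, curve_c)
# A small p_value means at least one curve is rated differently
# from the others beyond what chance scatter would explain.
```

The F-value is the ratio of between-group to within-group variance; the larger it is, the less plausible it becomes that the group means are really equal.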
 
OP
amirm


Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,679
Likes
241,092
Location
Seattle Area
Reading doesn't mean "skimming over the lines", Amir. I mean, you shouldn't take it from me but from Harman themselves, but you don't seem to want to actually read the articles, so as it is difficult to have a rational conversation with you, have a nice day.
Both Dr. Toole and Dr. Olive are professional friends whom I have known for some 15 years. I have had the honor and pleasure of meeting them on a number of occasions, listening to their talks, and communicating with them privately. Here is a picture of Dr. Olive from when I was at their place some 10 years ago, talking about their listener training and having us take the test:

[Image: Harman Reference Room photo]


I have not only read the research I post (it is all from my personal library), I have re-read it time and time again. I suggest you get to know who you are arguing with before making comments like this.

You have this dogmatic idea that just because there are variations in headphone testing, little can be trusted. This is absurd and goes against the clear conclusions drawn by the people who conducted the research. Yes, in the context of writing a paper and wanting to be politically correct, they soften their stance and put in a bunch of disclaimers. But meet them in person and you don't see any of that hedging in their stance and convictions.

So yes, you do need to read the papers. You need to read them multiple times. And if you can at all do it, you need to meet the researchers and listen to them personally. While no one can ever do enough, I have done plenty of this, hence the strong convictions I have. I am not just parroting a few graphs without any first-person intuition of what it all means.
 
OP
amirm


Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,679
Likes
241,092
Location
Seattle Area
As a presumably scientific thinking guy you could try to find out how the designer managed to create a headphone that many reviewers, listeners and buyers describe as the most dynamic driver-like sounding planar allrounder, with natural sound signature.
Our "scientific thinking" mind says to ignore such "reviewers" and buyers: https://seanolive.blogspot.com/2008/12/part-2-differences-in-performances-of.html
[Image: ListenerPerformance.jpg]


These people are not even consistent with themselves in controlled testing, as I explained in the video I posted above.

You *have to* understand and internalize this. You can have all the anecdotal evidence you want; it doesn't amount to anything. We exist precisely because we don't follow such folklore. We follow what we can defend. None of those people defend what they say in any kind of controlled testing.

Playing by your rules, my own listening tests do NOT agree with your summary. I described the sound as "boring." If reviewers' opinions matter to you, then you must integrate that into your conclusion, which invalidates the claim you are making.
 