Not really. This is a modern take on a purist version of science, mostly down to a mistaken reading of Karl Popper's work. Whilst I'm a big fan of Popper, he was, like many philosophers of science, mostly thinking about physics, and even there the picture is more nuanced. Science is much wider than falsifiable theory. In the end your theory needs to be falsifiable, and tested, but there is a long road before you get there. Data collection is required, and is part of the scientific process. Experiment and publication of experimental results is absolutely science. And you must collect data without preconceptions about what the resultant theory might be, otherwise you taint the data.
Real science takes an unknown amount of time. Amir is in data collection, and at the same time using established science to create evaluations, all the while noting discrepancies with existing science. The fact that there isn't a new thesis right this very moment does not make this any less science. Indeed, the fact that there isn't any such thesis is what makes it proper science, and not a pseudo-science exercise.

This is a really important point. The last few decades of modern science as performed in universities have been tainted by this. There is a corrosive drive to publish novel, exciting results, and above all to justify your next round of research funding by appearing productive. This has led to the reproducibility crisis. It is clear that science would be well served by less desire for yet another novel thesis, and much more dispassionate data collection and curation. That, and experiments that don't just test a new theory, but test existing theories. The lack of testing of existing theories is the elephant in the room for a huge section of modern science.

Peer review is not supposed to be just getting a few mates to sign off on your latest paper for publication. It is supposed to be testing of these results to ensure that they are reproducible. This is sadly very rare. Almost no journal will publish such "null" results, so there is no incentive to ever perform such a test. That has led to a morass of published work that turns out to be unreproducible if there is ever a need to rely on it. Testing someone else's work to verify reproducibility is exactly science. Indeed, it is now clear it is more important than just coming up with an initial new theory. Ideas are cheap. Truth is priceless.
Right now Amir is absolutely doing science. Ironically, the theory in the gun-sights at the moment is the Olive score. Nobody has ever attempted to verify the Olive score, and IMHO that significantly diminishes its scientific value. The fact that the input data used to generate the score is not freely available to researchers makes it even less defensible by modern standards, to the point where many journals would refuse to publish the paper now. Not to diminish the work; it was done at a different time, to different standards. But by a modern standard of science, ASR is more defensibly science than the Olive score. Fully disclosed methodology and measured data from experiment make for robust science.
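For anyone who hasn't seen it, the score in question is a four-term linear regression over metrics computed from CTA-2034 ("spinorama") measurements. Here's a sketch in Python, hedged because I'm quoting the coefficients from memory of Olive's AES papers rather than having them in front of me; check them against the original before relying on this:

```python
# Hedged sketch of the Olive preference model (coefficients from memory
# of Olive's AES 2004 "Part II" paper -- verify against the original).
def predicted_preference(nbd_on: float, nbd_pir: float,
                         lfx: float, sm_pir: float) -> float:
    """Predicted listener preference rating from four metrics computed
    on CTA-2034 'spinorama' curves:
      nbd_on  -- narrow-band deviation of the on-axis response
      nbd_pir -- narrow-band deviation of the predicted in-room response
      lfx     -- log10 of the low-frequency extension in Hz
      sm_pir  -- smoothness of the predicted in-room response
    """
    return 12.69 - 2.49 * nbd_on - 2.99 * nbd_pir - 4.31 * lfx + 2.32 * sm_pir
```

The point being: the model itself is fully public, but the listening test data it was fitted to is not, which is what blocks independent verification.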
https://www.smbc-comics.com/comic/theory
OK, sure, to the extent that doing science involves measuring and collecting data, Amir is "doing science," and I think my comment reflected this. I certainly didn't mean to dismiss the importance of this work, which is downright heroic.
In your comment you are conflating two issues: the reproducibility crisis in science, and a more fundamental question of what properly constitutes 'science.' It's possible that there is an issue with how science is (mis)conceived that is adding to the reproducibility crisis, but that is itself a thesis that would need support.
(Note for anyone reading who hasn't heard of the reproducibility crisis in science: it comes out of attempts to replicate the results of significant published research, many of which failed to do so. It's scandalous because it has led to false beliefs about important aspects of reality, for example in medicine.)
The reproducibility crisis is a complicated problem that involves incentives, social behavior, statistics, and I'm sure much more. But the core of the issue, that many scientific results are not reproducible, still presupposes that a paper has a central thesis, because that thesis is the thing that fails to be 'reproduced.'
The problems with the incentive structure in science (where a premium on novel results leads to selection bias in which papers get published) strike me as vexing: it's an emergent problem in which complicated social processes cause unintended effects, it's multi-faceted, no individual or group is responsible for creating or fixing it, and the incentive structures probably reflect deeply held human cognitive biases. It's unclear to me how broadening what is acceptable as a 'publishable' scientific result would affect this crisis. Null results are accepted as valid scientific findings; it's just that they are less exciting, and this critical aspect of the 'scientific process' is being shortchanged.
You mention the issue of not coming up with a thesis before collecting data, and this sure does seem to be a necessary part of science. The idea is that a thesis would emerge from collected data.
But there are also issues with crafting a thesis after data has been collected, because you can use statistics to show what looks like a meaningful result but is actually coincidental. There is some movement in science to 'pre-register' the thesis you will be testing, to try and mitigate these types of biases. Since a pre-registered study can be judged on its design rather than its outcome, this also helps with the lack of incentive to publish null results.
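A toy illustration of why post-hoc thesis hunting is risky: below, 100 invented 'hypotheses' (random predictors) are correlated against a pure-noise outcome, and roughly 5 of them come out 'significant' at p < 0.05 anyway. Nothing here is real data; it's just simulated numbers showing the multiple-comparisons effect that pre-registration is meant to guard against.

```python
# Simulated demonstration of the multiple-comparisons trap: correlate
# many random, unrelated predictors against a pure-noise outcome and
# "significant" results appear at roughly the false-positive rate alpha.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_samples, n_hypotheses, alpha = 50, 100, 0.05

outcome = rng.normal(size=n_samples)  # the "result" -- pure noise
false_positives = 0
for _ in range(n_hypotheses):
    predictor = rng.normal(size=n_samples)  # an unrelated candidate variable
    _, p = pearsonr(predictor, outcome)
    if p < alpha:
        false_positives += 1  # looks meaningful, is pure chance

print(f"{false_positives}/{n_hypotheses} spurious 'significant' correlations")
# Expect around 5 of 100: scan enough post-hoc hypotheses and some always pass.
```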
But as regards my comment on ASR and science, I was getting at the idea that, call it the final step of science, there is a convention of presenting not merely collected data but some kind of 'thesis', 'result', 'conclusion', or 'main idea' that comes out of the research.
As far as I can tell Amir is not attempting to present such a thesis at this point, but is instead doing careful measurement and collecting a dataset. Such a dataset could support scientific investigation along multiple directions.
I could see some kind of social science being done:
'How messaging in the hi-fi audio industry obscures and confuses consumers' ability to choose high-performing audio equipment: a qualitative study'
Or
'How published results of measured audio performance affect the satisfaction levels of individuals who own the gear tested'
Or
'Quantifying the economic loss resulting from misinformation presented in the audiophile press'
At this point I'm not seeing how the speaker measurements collected so far either confirm or refute the 'Olive Score', because there is no corresponding preference testing being done. If the original data set were available, you could examine it to see if the statistical correlation really holds for the data. Or you could propose new data to collect from the original speaker set, measure the speakers again, and then, if you also had listening test data, perhaps extend and strengthen the Olive Score approach.
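To make that concrete, here is a hypothetical sketch of what such a verification would look like: correlate the model's predicted preference scores against blind listening test ratings for the same speakers. The arrays below are invented placeholder numbers, not real measurements.

```python
# Hypothetical verification sketch: all numbers below are made-up
# placeholders, standing in for Olive-model predictions and blind
# listening-test ratings for the same set of speakers.
import numpy as np
from scipy.stats import pearsonr

predicted = np.array([6.2, 4.8, 7.1, 5.5, 3.9])  # model scores (made up)
observed = np.array([6.5, 4.1, 6.8, 5.9, 4.2])   # listener ratings (made up)

r, p = pearsonr(predicted, observed)
print(f"r = {r:.2f}, p = {p:.3f}")
# If the model generalizes, r should stay high on data it was never fit to.
```

The hard part, of course, is not this arithmetic but producing the `observed` column: controlled, blind preference testing at scale, which is exactly what the original Harman work had and independent researchers don't.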