
Spotify choked my music enthusiasm (kind of) but.. they're back! :)

pavuol

Major Contributor
Forum Donor
Joined
Oct 2, 2019
Messages
1,567
Likes
3,959
Location
EU next to warzone :.(
Last Christmas I got emotionally weak and paid for Spotify Premium (a limited offer: 3 months at a reduced price). Being a person who wants to actually "really use" something he has paid for (I don't want to say "extort" ;) I must admit I skipped quite a bunch of songs during that quarter (several dozen GB of traffic, when I checked back). I was also in the mood to search for some hifi/reference/test/showcase tracks (call them what you like), so I added several playlists with this orientation. Well, I found many interesting songs and artists, but it also got somewhat frustrating, because the same well-known songs kept reappearing in these playlists.
That said, I missed one particular feature that would have eased my user experience: the possibility to enable some kind of tagging (either automatic or manual) of songs that have already been played. I know, it might also get counterproductive. Imagine you forget to hit pause and leave Spotify playing "unattended" for several hours; it would tag quite a lot of content that you haven't actually listened to. Well, this could be cured, for example, by applying the autotagging only to manually skipped songs. OK, enough of my minority wishes ;) I had to play the cards I had been dealt. So there was only one convenient way to achieve my aim: to manually tag every "once listened" song with the heart button (which remembers your choice within your account and also adds every tagged song to your personal "Liked Songs" playlist). So I was playing music, enjoying a huge music database, and happily "liking" songs, when all of a sudden this message appeared:

[Screenshot: Spotify's message that the library limit had been reached]


I thought, wtf is this? In this age of "infinite" storage and massive datacenters, the crazy Swedes set some strange limit? I checked the numbers, and the barrier seemed to be at 10,000 (cumulative for both liked songs and albums). Well, as my subscription expired, I also needed time for a little rest from this frenetic like/skip, skip/skip marathon ;) So I returned to some laid-back listening of my local music and didn't use Spotify for quite some time. Recently I looked there again for some new content and, voilà, the limitation is gone. I don't know if they only raised the bar or set it to "unlimited", like Gmail.

OK, that's all as to my funny experience with this service.

Oh, for the God of SNR's sake! I just realized I'm writing this in the review department. OK, here is a quick list of pros and cons:

good:
- consistent sound quality throughout the content (my subjective feeling ;). The quality choices are automatic/low/normal/high/very high, with "very high" exclusive to a Premium subscription; as for me, I'm also happy with "high"
- quite a pleasant user interface (black style, good for the eyes and for OLEDs as well); obviously you need some time to get used to its logic
- a reasonable amount of options in settings (crossfade/gapless/automix/explicit content/volume normalization/autoplay/device connection/car mode/cellular download/etc.)
- a number of ways to discover new content (Daily Mix, Release Radar, Discover Weekly, new album releases, "similar" artists, local playlists, best-of-artist playlists, "radio"...). This gets more and more personalized according to your taste; I guess it is quite similar to other streaming services
- a very usable service even in non-premium mode, if you can live with:
/ ads
/ limited skipping (I think six skips per hour or so)
/ listening to full albums/playlists in shuffle mode only (when playing a full album it inserts "random" songs from other artists as well)
- in Premium you can download content and listen offline (limited only by your storage, I guess)

neutral (decide for yourself):
- limited "social" aspect - you can browse and search user playlists, but not contact them (there is a possibility to link your facebook account though which I have not tried as I don't have any - I am already on ASR, I can't be everywhere ;) After FB linking I can imagine it opens further possibilities
- instead of just a heart button I'd like to see a 5-star rating like in iTunes (ok, this is only my subjective deviation against the crowd, I know..)
- reasonable price with regular limited offers/discounts/family plans

bad:
- I don't listen to the local music scene (shame on me), and in non-premium mode it keeps playing me ads for crappy local playlists packed with brand-new mainstream shite..
- during "rush hours" it can get a little laggy (less responsive)

That's all, folks! If you feel the "review" part is not detailed enough, this guy makes some more useful points about it:

PS: If I can't sleep, I may add some more thoughts to this post or edit grammar mistakes (that was a joke, life is too short for that)
 

pozz

Слава Україні
Forum Donor
Editor
Joined
May 21, 2019
Messages
4,036
Likes
6,827
+1 for Žižek.
 

purplerain

New Member
Joined
Jul 22, 2020
Messages
4
Likes
1
Location
US
Spotify removed the 10,000-item limit in May, if I remember right.
BTW, I just want to mention one more pro: collaborative playlists. When you go to a small gathering, a group of friends can make a joint playlist together. That's something quite meaningful for me.
 

Spider

Member
Joined
Aug 22, 2020
Messages
60
Likes
34
Location
Denmark
Good & funny review, but the audio quality on Spotify Premium is less to be desired, I am trying not to swear.:D
 

fieldcar

Addicted to Fun and Learning
Joined
Sep 27, 2019
Messages
826
Likes
1,267
Location
Milwaukee, Wisconsin, USA
the audio quality on Spotify Premium leaves something to be desired
You sure about that? I can never tell the difference between 320 kbps AAC and FLAC.

A study illustrating 192 kbit/s MP3 and AAC vs. WAV:

I still can't get over how Opus at ~100 kbit/s tests incredibly close to transparent. Amazon Music started using Opus for all lossy streams. Here is another study for you.
 

Spider

Member
Joined
Aug 22, 2020
Messages
60
Likes
34
Location
Denmark
Hi @fieldcar, I see what you mean, but there is too much to read for me to take an informed stand right now. I will most definitely read it, but not today; the time is 00:30 here in Denmark.
I have Spotify myself, but I don't use it much, only for podcasts.
If we are talking music, I believe I can hear the difference between 320 kbps MP3 or AAC and 1411 kbps FLAC files.
I have never been put to the test, but your point is very interesting. I will read the material you provided tomorrow, and who knows, maybe I will agree with you, but I doubt it. Thanks.
 

fieldcar

Addicted to Fun and Learning
Joined
Sep 27, 2019
Messages
826
Likes
1,267
Location
Milwaukee, Wisconsin, USA
Hi @fieldcar, I see what you mean, but there is too much to read for me to take an informed stand right now. I will most definitely read it, but not today; the time is 00:30 here in Denmark.
I have Spotify myself, but I don't use it much, only for podcasts.
If we are talking music, I believe I can hear the difference between 320 kbps MP3 or AAC and 1411 kbps FLAC files.
I have never been put to the test, but your point is very interesting. I will read the material you provided tomorrow, and who knows, maybe I will agree with you, but I doubt it. Thanks.
It's great to remain skeptical. I still archive everything in FLAC, but I've been toying around with ez audio converter, HE-AACv2, and Opus with my purchases from Qobuz. It's amazing how far lossy audio has come.

If you get a chance, try doing an ABX test in foobar2000.
 

Spider

Member
Joined
Aug 22, 2020
Messages
60
Likes
34
Location
Denmark
Hi @fieldcar, I am not trying to be funny, but I am an amateur "audiophile" and I don't know what an ABX test is; would you care to elaborate?
I am still in the process of reading & understanding the articles you gave me. :facepalm:
 

Spider

Member
Joined
Aug 22, 2020
Messages
60
Likes
34
Location
Denmark
Hi @fieldcar, I found this; is this what you are talking about?


Many experiments have proven that audible differences that listeners hear between audio sources are sometimes the product of imagination. These illusions can be strong, durable, shared by many listeners, and consistently associated with knowledge of the audio source being listened to.

A double blind listening test (DBT) is a listening setup that allows one to confirm that a given audible difference is indeed caused by the audio sources, and not just by the listener's impressions.

In an ABX double blind listening test, the listener has access to three sources labeled A, B, and X. A and B are the references: the audio source with and without the tweak, for example the WAV file and the MP3 file. X is the mystery source; it can be A or B. The listener must identify it by comparing it to A and B.
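To make the procedure concrete, here is a minimal sketch of a single ABX trial in Python. It is hypothetical: `play_source` stands in for whatever actually presents audio to the listener, and is not part of any real ABX tool.

```python
import random

def abx_trial(play_source):
    """Run one ABX trial. `play_source(label)` is a hypothetical
    callback that plays reference "A", reference "B", or the
    mystery source; X is secretly A or B, chosen at random."""
    x = random.choice(["A", "B"])
    play_source("A")                 # let the listener hear both references
    play_source("B")
    play_source(x)                   # then the mystery source X
    answer = input("X sounds like (A/B)? ").strip().upper()
    return answer == x               # True if correctly identified
```

Running many such trials and counting correct answers is what the statistics below are applied to.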

But if the listener says that X is A, and X actually is A, what does this prove?
Nothing, of course. If you flip a coin behind my back and I state that it's heads, and I'm right, it doesn't prove the existence of para-psychic abilities that let me see behind my back. This is just luck, nothing more!
That's why a statistical analysis is necessary.

Let's imagine that after the listener has given his answer, the test is run again, choosing X at random each time, 15 more times. If the listener gives the correct answer 16 times, what does that prove? Can it be luck?
Yes it can, and we can calculate the probability of it happening. For each trial there is one chance in two of getting the right answer, and 16 independent trials are run. The probability of getting everything correct by chance is then (1/2)^16, that is 1/65536. In other words, if no difference is audible, the listener will get everything correct one time in 65,536 on average.
We can thus choose the number of trials according to the tweak tested, the goal being a chance-success probability lower than the likelihood that the tweak actually has an audible effect.
For example, if we compare two pairs of speakers, it is likely that they won't sound the same, so we can be content with seven trials; there will be 1 chance in 128 of getting a "false success". In statistics, a "false success" is called a "type I error". The more the test is repeated, the less likely type I errors are to happen.
Now, if we put an amulet beside a CD player, there is no reason for it to change the sound, so we might repeat the test 40 times. The probability of success by chance will then be about one in a trillion (2^40). If it ever happens, there is necessarily an explanation: the listener hears the operator moving the amulet, or the operator always takes more time to launch the playback once the amulet is away, or maybe the listener perceives a brightness difference through his eyelids if it is a big dark amulet, or he can smell it when it is close to the player...

Let p be the probability of getting a success by chance. It is generally admitted that a result whose p value is below 0.05 (one in 20) should be taken seriously, and that p < 0.01 (one in 100) is a very positive result. However, this must be weighed against the context. We saw that for very dubious tweaks, like the amulet, it is necessary to get a very small p value, because between the expected probability of the amulet working (say one in a billion) and the probability of the test succeeding by chance (1 in 100 is often chosen), the choice is obvious: it's the test that succeeded by chance!
Here's another example where numbers can fool us. If we test 20 cables, one by one, to find out whether they have an effect on the sound, and we count p < 0.05 as a success, then even if no cable has any actual effect on the sound, since we run 20 tests we should nevertheless expect, on average, one accidental success among the 20! In that case we absolutely cannot say that the cable affects the sound with 95% probability, even though p is below 5%, since this success was expected anyway. The test failed, that's all.
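A quick sanity check of the 20-cables arithmetic, as a Python sketch using nothing but the numbers above:

```python
# 20 independent tests of cables that have no actual effect,
# each declared a "success" when p < 0.05 by chance.
alpha = 0.05
n_tests = 20

expected_false_successes = n_tests * alpha    # 1.0 on average
p_at_least_one = 1 - (1 - alpha) ** n_tests   # ~0.64

print(f"expected accidental successes: {expected_false_successes:.1f}")
print(f"chance of at least one:        {p_at_least_one:.2f}")
```

So a full 64% of the time, at least one "magic cable" will show up even when none exists.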

But statistical analyses are not limited to simple powers of 2. If, for example, we get 14 right answers out of 16, what happens? It is perfectly possible to calculate the probability of that happening, but note that what we need here is not the probability of getting exactly 14/16, but the probability of getting 16/16, plus the probability of getting 15/16, plus the probability of getting 14/16.
An Excel table gives all the needed probabilities: http://www.kikeg.arrakis.es/winabx/bino_dist.zip. It is based on a binomial distribution.
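If you don't have the Excel table handy, the same binomial tail can be computed in a couple of lines (a sketch assuming SciPy is installed; any binomial CDF will do):

```python
from scipy.stats import binom

# p value for 14 right answers out of 16 guessing trials:
# P(X >= 14) = P(14/16) + P(15/16) + P(16/16), with X ~ Binomial(16, 1/2)
n, k = 16, 14
p_value = binom.sf(k - 1, n, 0.5)    # survival function: P(X > k-1)
print(f"p = {p_value:.5f}")          # ~0.00209, about 1 in 479
```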

Now, how do we set up the listening test so that its result, if positive, is really convincing? There are rules to observe if you don't want all your opponents to laugh at you in case of a success.

Rule 1: It is impossible to prove that something doesn't exist. The burden of proof is on the side of the one claiming that a difference can be heard.
If you believe that a codec changes the sound, it is up to you to prove it by passing the test. Someone claiming that a codec is transparent can't prove anything.

Rule 2: The test should be performed under double blind conditions (*).
In hardware tests, this is the most difficult requirement to meet. Single blind means that you can't tell whether X is A or B other than by listening to it. Double blind means that nobody in the room or the immediate surroundings can know whether X is A or B, in order to avoid any influence, even unconscious, on the listener. This complicates the operations for hardware testing: a third person can lead the blindfolded listener out of the room while the hardware is switched. High-quality electronic switches have been made for double blind listening tests (http://sound.westhost.com/abx-tester.htm): a chip chooses X at random, and a remote control allows comparing it to A and B at will.
Fortunately, in order to double-blind test audio files on a computer, some ABX programs are freely available. You can find some in our FAQ.

Rule 3: The p values given in the table linked above are valid only if the two following conditions are fulfilled:
- The listener must not know his results before the end of the test, except if the number of trials is decided before the test. Otherwise, the listener would just have to look at his score after every answer and decide to stop the test when, by chance, the p value drops low enough for him.
- The test is run for the first time. If it is not, all previous results must be pooled in order to get the overall result. Otherwise, one would just have to repeat the series of trials as many times as needed to get, by chance, a small enough p value.
Corollary: only give answers of which you are absolutely certain! If you have the slightest doubt, don't answer anything. Take your time. Take pauses. You can stop the test and go on another day, but never try to guess by "intuition". If you make some mistakes, you will never have the chance to do the test again, because anyone will be able to accuse you of making the numbers tell what you want by "starting again until it works".
Of course you can train yourself as many times as you wish, provided that you firmly decide beforehand that it will be a training session. If you get 50/50 during training and then can't reproduce this result, too bad for you: the results of the training sessions must be thrown away whatever they are, and the results of the real test must be kept whatever they are.
Once again, if you take all the time needed, be it one week of effort for only one answer, in order to get a positive result at the first attempt, your success will be mathematically unquestionable! Only your hifi setup or your blind test conditions may be disputed. If, on the other hand, you rerun a test that once failed, because your hifi setup has since improved or there was too much noise the first time, you can be sure that someone, relying on statistical laws, will come and question your result. You will have done all this work in vain.
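The first condition is worth demonstrating. The Monte Carlo sketch below (hypothetical, assuming NumPy and SciPy) simulates listeners who hear no difference at all but check their score after every answer and stop the moment p dips below 0.05:

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(0)
n_listeners = 10_000   # simulated listeners, all purely guessing
max_trials = 40

false_positives = 0
for _ in range(n_listeners):
    correct = 0
    for trial in range(1, max_trials + 1):
        correct += rng.integers(0, 2)             # coin-flip answer
        p = binom.sf(correct - 1, trial, 0.5)     # P(X >= correct)
        if trial >= 10 and p < 0.05:              # peek, stop on "success"
            false_positives += 1
            break

# The nominal rate is 5%; with peeking it comes out far higher,
# which is why the number of trials must be fixed in advance.
print(f"false positive rate with peeking: {false_positives / n_listeners:.1%}")
```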

Rule 4: The test must be reproducible.
Anyone can post fake results. For example, someone selling thingies that improve the sound, like oil for CD jewel cases or cable sheaths, could very well pretend to have passed a double blind ABX test with p < 0.00001, so as to make people talk about his products.
If someone passes the test, others must check whether this is possible by passing the test in their turn.


We have seen what an ABX test is, with the associated probability calculation, which is perfectly suited to testing the transparency of a codec or the validity of a hifi tweak. But this is only the ABC of statistical testing.
For example, in order to compare the quality of audio codecs like MP3 in larger-scale tests, more sophisticated ABC/HR tests are used (see http://ff123.net/abchr/abchr.html). Each listener has two sliders and three buttons for every audio codec tested. A and B are the original and the encoded file; the listener doesn't know which is which. C is the original, which stands as a reference. Using the sliders, the listener must give a mark between 1 and 5 to A and B, the original getting 5 in theory.
A probability calculation then allows us not only to know whether the tested codec audibly alters the sound, but also to estimate the relative quality of the codecs for the set of listeners involved, still under double blind conditions and with a probability calculation giving the relevance of the result. Depending on the needs of the test, these calculations can be performed with the Friedman method, for example, which gives a ranking for each codec, or with ANOVA, which gives an estimate of the subjective quality perceived by the listeners on the 1-to-5 scale.
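As a toy illustration of the Friedman step (hypothetical ratings, assuming SciPy; this is not the actual ABC/HR software, just the ranking statistic it relies on):

```python
from scipy.stats import friedmanchisquare

# Hypothetical 1-5 ABC/HR quality marks from five listeners,
# each rating the same three codecs on the same samples.
codec_1 = [4.5, 4.0, 4.2, 3.8, 4.4]
codec_2 = [3.1, 3.4, 2.9, 3.3, 3.0]
codec_3 = [4.8, 4.6, 4.9, 4.5, 4.7]

stat, p = friedmanchisquare(codec_1, codec_2, codec_3)
print(f"Friedman chi-square = {stat:.2f}, p = {p:.4f}")
# A small p means the listeners rank the codecs differently
# in a consistent way, beyond what chance alone would produce.
```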

Note that this kind of statistical analysis is mostly used in medicine: to get an authorization, any drug must prove its efficacy in double blind tests against placebo (both the physicians and the patients do not know whether the pill is a placebo or the medication; the drug must not only prove that it works, but that it works better than a placebo, because a placebo alone works too), and the decision is based on mathematical analyses such as the ones we just saw. So these are not hastily made guidelines for hifi tests; they are general testing methods used in scientific research, and they remain entirely valid for audio tests.


(*) The double blind setting may be replaced by a carefully set up single blind setting. I have seen two accounts of single blind listening tests that failed, proving that, when done carefully, a single blind setting is enough to fool the listener.
http://www.hometheaterhifi.com/volume_11_4...ds-12-2004.html
http://www.hydrogenaudio.org/forums/index....f=21&t=7953

What is a blind ABX test?

Reply #1 – 2006-04-14 08:43:06


Interpretation of a blind listening test

Of course, ABX tests are not infallible.
Chaudscisse gave an excellent summary of the drawbacks of ABX testing in a French forum: http://chaud7.forum-gratuit.com/viewtopic....&start=450#5543
However, since the text is almost incomprehensible even for native French speakers, I'll have to make a summary.

Most often, it is accepted that an event whose probability of occurring by chance is smaller than 1/20 is "statistically significant". This p value involves no interpretation: it is the result of a mathematical calculation relying only on what has been observed. Previous results from similar tests, the quality of the test setup, and other statistical considerations are not taken into account, yet they influence the probability that the observed difference is real.
  • Number of testers: studies made with a small number of listeners are more sensitive to mistakes in the test setup (a wrong stimulus presented, errors in copying the results, etc.). For this reason, when the result depends on one or two people, conclusions must be cautious.
  • Predictability level: there is a greater chance of having obtained a success after N tests have been performed than after only one. For example, if we test something that has no effect, the result we get is decided by chance alone. Imagine that 20 people run independent tests. By chance, on average, one of them should get a "false positive" result, since a positive result is by definition something that occurs no more than one time in 20. The p calculation of each individual test does not take this into account.
  • Multiple comparisons: if we split the population into two groups using one criterion, there is less than 1 chance in 20 of getting a "statistical difference" between the two. However, if we consider 20 independent criteria, the probability of getting a significant difference on at least one of them is much higher than 1/20.
    For example, if people are asked to rate the "dynamics", "soundstage", and "coloration" of an encoder, the probability of getting a false positive is about three times as high as with one criterion only, since there are three opportunities for the event to occur (see the sketch after this list). Once again, the p value associated with each comparison is lower than the real probability of getting a false positive.
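Checking the "three times as high" figure, plus the standard fix (a sketch; the Šidák correction shown is a textbook remedy, not something the quoted text proposes):

```python
# Three criteria ("dynamics", "soundstage", "coloration"),
# each tested at alpha = 0.05 on an encoder with no real effect.
alpha, m = 0.05, 3
p_any_false_positive = 1 - (1 - alpha) ** m      # ~0.143, about 3 x 0.05

# Sidak correction: per-criterion threshold that keeps the overall
# false positive probability at 0.05 across all three comparisons.
alpha_per_criterion = 1 - (1 - alpha) ** (1 / m) # ~0.0170

print(f"{p_any_false_positive:.3f}  {alpha_per_criterion:.4f}")
```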

The original text is much longer, with some repetitions and other ideas that I didn't translate because they are not directly related to the reliability of ABX tests.

I would however like to add an important point: the interpretation of the p value.
It is admitted by convention that p < 5% is an interesting result, and p < 1% a very significant one. This does not take into account the tested hypothesis itself.

Suppose we are testing the existence of Superman and get a positive answer, that is, "Superman really exists because the probability of the null hypothesis is less than 5%". Must we accept the existence of Superman? Is this an infallible, scientific proof of his existence?
No, it's just chance. Getting an event whose probability is less than 5% is not uncommon.
However, when a listening test of MP3 at 96 kbps gives a similarly significant result, we accept the opposite conclusion: that it was not chance. Why?
Why should the same scientific result be interpreted in two opposite ways? It is because we always keep the most probable hypothesis. The conclusion of an ABX test is not the p value alone; it is its comparison with the subjective prior probability of the tested hypothesis.

Testing MP3 at 96 kbps, what do we expect? Anything. We start with the assumption that the odds of success are 1/2. The ABX result then tells us that the odds of it being luck are less than 1/20. Conclusion: a real audible difference is the most probable hypothesis.
Testing the existence of Superman, what do we expect? That he does not exist. We start with the assumption that the odds of success are less than one in a million. The ABX result then tells us that the odds of it being luck are less than 1/20. Conclusion: luck is still the most probable hypothesis.

That's why, in addition to all the statistical biases already mentioned above, we should not always take 1/20 or 1/100 as the target final p value. This is correct for tests where we don't expect one result more than another, but for tests where scientific knowledge already gives some information, smaller values may be necessary.
Personally, in order to test the existence of Superman, I'd rather target p < 1/100,000,000.
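The comparison of p with the prior can be made concrete with Bayes' rule. A simplified sketch (my own framing, not from the quoted text), assuming a test that always succeeds when the effect is real:

```python
def posterior_real(prior, p_chance, power=1.0):
    """P(effect is real | test succeeded), by Bayes' rule.
    prior    - subjective probability the effect exists before the test
    p_chance - probability the test succeeds by luck alone
    power    - probability the test succeeds if the effect is real"""
    hit = power * prior
    return hit / (hit + p_chance * (1 - prior))

# MP3 at 96 kbps: no strong expectation either way (prior 1/2).
print(posterior_real(prior=0.5, p_chance=0.05))    # ~0.95, believe it
# Superman: a one-in-a-million prior swamps the same p value.
print(posterior_real(prior=1e-6, p_chance=0.05))   # ~2e-5, still luck
```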

Examples of false positive results:
Regular ABX: 12/13 right answers by chance.
Sequential ABX: many results with p < 0.01.


This topic can be discussed here: http://www.hydrogenaudio.org/forums/index....topic=43516&hl=

What is a blind ABX test?

Reply #2 – 2010-08-08 11:46:52


Addition of two introductory sentences that better define the goal of blind listening tests.


 

fieldcar

Addicted to Fun and Learning
Joined
Sep 27, 2019
Messages
826
Likes
1,267
Location
Milwaukee, Wisconsin, USA
Hi @fieldcar, I am not trying to be funny, but I am an amateur "audiophile" and I don't know what an ABX test is; would you care to elaborate?
I am still in the process of reading & understanding the articles you gave me. :facepalm:
No worries at all. The article you posted sums it up for the purposes of the test.

If you have foobar2000, you can add a plugin that lets you sample and compare the tracks by ear without knowing which is which. There are articles and YouTube videos on how to set it up.

No need to learn it all in one day. The more time you spend on here, the more the good people will help you on that audiophile journey, as they helped me. Have fun. That's what it's all about.
 

Spider

Member
Joined
Aug 22, 2020
Messages
60
Likes
34
Location
Denmark
No worries at all. The article you posted sums it up for the purposes of the test.

If you have foobar2000, you can add a plugin that lets you sample and compare the tracks by ear without knowing which is which. There are articles and YouTube videos on how to set it up.

No need to learn it all in one day. The more time you spend on here, the more the good people will help you on that audiophile journey, as they helped me. Have fun. That's what it's all about.
Thanks a lot, @fieldcar, will do. Everyone is so friendly at ASR.
 