
Limitations of blind testing procedures


tomelex

Addicted to Fun and Learning
Forum Donor
Joined
Feb 29, 2016
Messages
990
Likes
573
Location
So called Midwest, USA
What I was trying to point out was that a black box can be made to do anything to the sound, even transform everything that goes into it into a beautiful chorus of tweeting birds. The listener's preference for what comes out doesn't tell you anything about the box's performance as an amplifier. If there is some stipulation that the black box must meet the "definition of an amplifier" then that is a difficulty, as no box meets the true definition of an amplifier because that would mean perfection.

I reckon it would be possible to make an 'amplifier' that measured normally with sine waves, but did strange things to music, widening the stereo separation, adding 'echo' to stuff, adding 'chorus' effects. Maybe it could Shazam for known audiophile music, and emphasise the things that audiophiles are known to like. At a superficial level it could go all out to elicit preference votes from audiophile listeners. It could randomise what it does, always aiming to stimulate humans' preferences for novelty.

After living with it for a year a listener might grow to despise it, but that wouldn't matter for the two hour preference-based listening test.

A $100 DSP board might be "preferred" to a $100,000 conventional amplifier, but what audiophile would actually admit it, and buy the DSP board? The truth is, the preference is not for the sound, but for the knowledge of what the hardware is. Listening tests are restricted to known selections of hardware so the listeners can't 'get it wrong'.

As a person who rejects the whole idea of listening tests, I could not be 'fooled' in this way. I know that I might prefer the sound of a DSP effect (for a couple of hours, anyway), but I wouldn't base my judgement of the quality of the hardware by listening preference alone - the original claim being that audiophiles are ultimately only interested in "preference".


I thought the whole idea of DBT was basically to see if one could detect a difference. It's simple: if one detects a difference repeatedly, say 8 times in a row, then one can detect a difference, and thus the amps are different. Nothing to do with preference, really.

However, suppose that after living with two black boxes in your house for days, weeks, or months, swapping between them at will to choose one as your personal amp, you could not detect a difference. Then, wait for it, we reveal that one amp was a Pioneer and the other a Mark Levinson. Now, which amp are you gonna buy? Me, the one that was less expensive, because I am weird in that I reward value for money when it comes to stuff like electronics and lawnmowers and handguns* and stuff like that.

*it's a Yank thing
 
Joined
Feb 2, 2017
Messages
68
Likes
6
Certainly there is a reward/risk factor, and moreover it is more difficult to detect small differences.
If you had a fatal disease you would accept a less effective treatment; that is true even if the treatment might kill you.
A 40% cure rate for certain types might be considered a miracle.
 

Jakob1863

Addicted to Fun and Learning
Joined
Jul 21, 2016
Messages
573
Likes
155
Location
Germany
The only way to know with 100% certainty that the coin is fair is to measure it.

100% certainty isn't available (I should say, most likely :) ), because there is still measurement uncertainty; in the past it was called "measurement error" but today's convention is to call it "uncertainty".

The method of testing that with trials this way gives us probabilities of whether it is, or is not, fair. It gives no guarantee.

We can't emphasize that fact too much; that's why I have written, ever so often, that it's all about probabilities....

And to that end, you need to decide what is a good enough probability. It becomes a philosophical question in a hurry. If I am taking a drug, is a 90% chance that it is effective good enough that I should trust my life to it? I would say no. What if it is a bet where the cost is that I lose the coin? Then 90% is fine. What if it is your audio reputation? What should the odds be then?

Of course, and that is one of the reasons why I was surprised when Blumlein88 denied the notion that audio tests are a sort of behavioral experiment, with the results dependent on the experimenter(s) (among other factors).

For five events, you get 0.5 × 0.5 × 0.5 × 0.5 × 0.5 = 0.031 or, put inversely, the chance of this coin being unfair is 97%.

The probability of getting 5 successes in 5 trials by chance is p = 0.031, but that isn't equivalent to a 97% probability that the coin is unfair.

The "standard" in the industry is to assume anything over 95% is good enough. Personally I have learned that it is not. I like to get near 100%.

The SL = 0.05 criterion is a historical convention (introduced by R. A. Fisher) that was later retained partly because calculating probabilities was quite tedious in the pre-computer/pre-calculator era, so the table work had to be restricted to a few choices.
But, as seen above, in the case of 5 trials the actual SL is already lower, at 0.031, and as we always want to do additional experiments it's a nice compromise with respect to the effort. It is only justifiable, though, if the detection ability (and/or the effect size) is already very high; otherwise the probability of committing Beta errors will "skyrocket".
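
For reference, the arithmetic behind these figures can be checked in a few lines of standard-library Python (a minimal sketch for illustration; the loop and the chosen trial counts are mine, not from any post in this thread):

Code:
# Chance of getting n correct answers in a row by pure guessing (p = 0.5 per trial).
for n in (5, 8, 12, 16):
    print(f"{n} in a row by guessing: {0.5 ** n:.5f}")
# 5  -> 0.03125 (the ~0.031 "actual SL" of an all-correct 5-trial run)
# 8  -> 0.00391
# 12 -> 0.00024
# 16 -> 0.00002

Only an all-correct run keeps a 5-trial test under the 0.05 criterion; 4 of 5 already corresponds to a guessing probability of about 0.19, which is one way to see why Beta errors become the dominant risk when the trial count is small.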

Take these results of real ABX testing:

07:27:54 : Test started.
07:28:25 : 00/01 100.0%
07:28:40 : 00/02 100.0%
07:28:52 : 01/03 87.5%
07:29:04 : 02/04 68.8%
07:29:17 : 03/05 50.0%
07:29:38 : 04/06 34.4%
07:29:50 : 05/07 22.7%

07:30:35 : 05/08 36.3%
07:30:46 : 05/09 50.0%

Notice how I got five successes in a row (trials 3 through 7 above) but then went off the rails and could no longer get the correct answer. So I like to see a minimum of 8 trials to get comfort.

Thanks for the opportunity to discuss a common misconception.
The example above is a 9-trial ABX, so the sample space contains 512 elementary events. To get the number of favorable events we have to sum up the elementary events that contain 5 successes in a row (if we allowed 5 successes anywhere overall, the number would be even higher), which is 29, so the probability of getting a result that contains 5 successive hits is already higher, at p = 0.0566.
Furthermore, if you had chosen your example from, let's say, a pool of 10 ABX runs, then we have to calculate the probability of getting at least one result containing 5 successive hits when doing 10 of these ABXs, which is p = 0.44.

That is apparently quite different from the situation where only one 5 trial A/B was done.
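
Taking the per-run figure quoted above as given, the "at least one out of ten ABX runs" number follows from the complement rule; a minimal sketch (the function name and session count are my own illustration):

Code:
def p_at_least_once(p_single: float, sessions: int) -> float:
    """Chance that an event with per-session probability p_single
    occurs in at least one of `sessions` independent sessions."""
    return 1 - (1 - p_single) ** sessions

print(p_at_least_once(0.0566, 10))   # ~0.44, the figure quoted above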

Now you may say we should include the results of the other parties with yours. To do that we need to know that the test conditions were identical. If they used a different setup then we can't do that. They would need to have come to your house and run the test as you did. Otherwise their odds of being right or wrong are just 50%, so we can't do anything with that.

Although I'd agree that a lot of conclusions couldn't be based on our results, I beg to differ on the question of whether inclusion/combination of results is justified. If the null hypothesis that we test against states that it is just random guessing, then inclusion is justified, and the resulting probability that both experimental results were due to chance is p = 0.000977.
As it would be in the example of our coin sent to 5 people: it doesn't matter if they all toss it at high noon in the same place or all at different times in different places.
I'd even argue that the fact that it was 5 people doing it at different places with different systems provides additional evidence that the results were not due to a systematic error.
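
A minimal sketch of the combination argument, assuming (as stated above) that the null hypothesis is pure 50/50 guessing for every single answer:

Code:
# Under the guessing null, each correct answer has probability 0.5,
# regardless of who answers, where, or on which system.
# Combining two results that together contain ten correct answers:
print(0.5 ** 10)    # 0.0009765625 -> the p = 0.000977 quoted above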
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,825
Likes
243,115
Location
Seattle Area
The example above is a 9-trial ABX, so the sample space contains 512 elementary events. To get the number of favorable events we have to sum up the elementary events that contain 5 successes in a row (if we allowed 5 successes anywhere overall, the number would be even higher), which is 29, so the probability of getting a result that contains 5 successive hits is already higher, at p = 0.0566.
Furthermore, if you had chosen your example from, let's say, a pool of 10 ABX runs, then we have to calculate the probability of getting at least one result containing 5 successive hits when doing 10 of these ABXs, which is p = 0.44.

That is apparently quite different from the situation where only one 5 trial A/B was done.
The example I posted is one that I had saved. I have actually done trials where I got the first five correct and then had no success to speak of from there. It is based on that experience that I say that if you want to make your point that there really, really is an audible difference, as you are claiming, then you had better get it all right or almost all right, and with lots of trials. After all, isn't that the statement you made? That you and others heard it, so it "must be there"?

Here are some other failed actual results I have saved on my computer after searching some more right now:

14:50:24 : Test started.
14:51:40 : 01/01 50.0%
14:52:05 : 02/02 25.0%
14:52:33 : 03/03 12.5%

14:52:57 : 03/04 31.3%
14:53:22 : 03/05 50.0%
14:54:11 : 03/06 65.6%
14:54:54 : 04/07 50.0%
14:55:21 : 05/08 36.3%
14:56:02 : 06/09 25.4%
14:56:55 : 07/10 17.2%
14:57:24 : 08/11 11.3%

14:58:17 : 08/12 19.4%
14:59:37 : 09/13 13.3%
15:00:44 : 10/14 9.0%
15:03:01 : 10/15 15.1%
15:04:11 : 11/16 10.5%
15:04:32 : Test finished.

So it is readily possible to get sequences of successes, and while the above is not one where that happened right at the start, I have had that happen. And I consider the above a total failure to detect the difference, rather than 90% confidence in the truth of the hypothesis.

This is what I like to see:

19:04:40 : Test started.
19:05:27 : 01/01 50.0%
19:05:54 : 02/02 25.0%
19:06:19 : 03/03 12.5%
19:06:35 : 04/04 6.3%
19:06:57 : 05/05 3.1%
19:07:16 : 06/06 1.6%
19:07:43 : 07/07 0.8%
19:08:15 : 08/08 0.4%
19:08:37 : 09/09 0.2%
19:09:05 : 10/10 0.1%
19:09:30 : 11/11 0.0%
19:10:05 : 12/12 0.0%
19:10:09 : Test finished.

----------
Total: 12/12 (0.0%)
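
For readers wondering where the running percentages in these logs come from: after each trial the ABX tool reports the chance of having scored that well or better by pure guessing (a one-sided binomial tail). A minimal sketch that reproduces a few of the figures above (standard-library Python, my own code, not the actual comparator):

Code:
from math import comb

def guessing_probability(hits: int, trials: int) -> float:
    """Chance of `hits` or more correct answers out of `trials` by guessing."""
    return sum(comb(trials, k) for k in range(hits, trials + 1)) / 2 ** trials

print(f"{guessing_probability(5, 8):.1%}")    # 36.3%, as in the 05/08 line
print(f"{guessing_probability(11, 16):.1%}")  # 10.5%, as in the 11/16 line
print(f"{guessing_probability(12, 12):.1%}")  # 0.0%,  as in the 12/12 total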

Note that it is important to have such records if you want to be convincing. Events like this can become fish stories where the mind exaggerates/forgets what really went on as far as number of trials, failures, etc.

Net, net, I am not convinced that you did indeed hear a reliable difference. You need to take notes and run the test enough times to give very high confidence in the results, and you simply haven't done that. And I would, again, start with a test of difference when that is what is in dispute, more than preference.
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,825
Likes
243,115
Location
Seattle Area
Although I'd agree that a lot of conclusions couldn't be based on our results, I beg to differ on the question of whether inclusion/combination of results is justified. If the null hypothesis that we test against states that it is just random guessing, then inclusion is justified, and the resulting probability that both experimental results were due to chance is p = 0.000977.
As it would be in the example of our coin sent to 5 people: it doesn't matter if they all toss it at high noon in the same place or all at different times in different places.
But that is not the test you conducted. You may have tossed your coin in the wind outside and they did it inside.

It is a simple rule of statistics that results cannot be combined unless the test is identical. You sending the amp to others to test as they pleased is not that.

In the case of an amplifier they may have presented an entirely different load to it than you did. And that change in loading made the amp act differently than it did for you.

What if they had all come back with a different outcome than yours? Would you have been just as anxious to include their results with yours?
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,825
Likes
243,115
Location
Seattle Area
100% certainty isn't available (I should say, most likely :) ), because there is still measurement uncertainty; in the past it was called "measurement error" but today's convention is to call it "uncertainty".
I addressed this. Namely, you only need to be accurate enough to be below the randomness that other factors present, such as air turbulence, the surface that the coin hits, randomness in how it is flipped, etc. Those are the things that set the upper bound.

It is like a DAC. Thermal noise and such stop us from ever achieving an accurate rightmost bit in a 24-bit DAC. So there is no reason to accurately represent that bit in the core of the DAC.
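
A rough back-of-the-envelope check of that claim (the 2 Vrms full-scale level, 1 kΩ source resistance, room temperature, and 20 kHz bandwidth are illustrative assumptions, not measurements of any particular DAC):

Code:
from math import sqrt

K_B = 1.38e-23       # Boltzmann constant, J/K
T = 300              # kelvin, roughly room temperature
R = 1_000            # ohms, assumed analog source resistance
BW = 20_000          # hertz, audio bandwidth

full_scale_rms = 2.0                        # volts, assumed full-scale sine output
lsb = full_scale_rms * 2 * sqrt(2) / 2**24  # peak-to-peak range divided by 2^24

thermal_noise = sqrt(4 * K_B * T * R * BW)  # Johnson noise of that resistance

print(f"24-bit LSB    : {lsb * 1e6:.2f} uV")            # ~0.34 uV
print(f"thermal noise : {thermal_noise * 1e6:.2f} uV")  # ~0.58 uV

Even a single 1 kΩ resistor produces more noise than the step size of the last bit, which is the sense in which that bit can never be resolved at the analog output.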
 

Jakob1863

Addicted to Fun and Learning
Joined
Jul 21, 2016
Messages
573
Likes
155
Location
Germany
But that is not the test you conducted. You may have tossed your coin in the wind outside and they did it inside.

In this analogy even that wouldn´t be problematic unless it is known that a fair coin isn´t fair anymore if used in the wind outside.

It is a simple rule of statistics that results cannot be combined unless the test is identical.

In general this assertion is simply incorrect; see for example the use of meta-analysis.
To examine specific topics it is sometimes mandatory, but an argument for why it has to be so in our example is lacking.

You sending the amp to others to test as they pleased is not that.

In the case of an amplifier they may have presented an entirely different load to it than you did. And that change in loading made the amp act differently than it did for you.

The participants got the two preamplifiers to evaluate following their usual routine for such tests. None of them used 1000-foot-long interconnects or amplifiers with input impedances low enough to cause any problems.

Otoh, if using preamplifiers in different conditions (within their given operating limitations) leads to different sound qualities, why wouldn't that qualify? Isn't that the issue at stake?

What if they had all come back with a different outcome than yours? Would you have been just as anxious to include their results with yours?

We have to report all results.
 

Jakob1863

Addicted to Fun and Learning
Joined
Jul 21, 2016
Messages
573
Likes
155
Location
Germany
I addressed this. Namely, you only need to be accurate enough to be below the randomness that other factors present, such as air turbulence, the surface that the coin hits, randomness in how it is flipped, etc. Those are the things that set the upper bound.

<snip>

No, you haven't.... :)
Measurement values incorporate uncertainty, which can be expressed (usually as an interval). If "you only need to be accurate enough...", that means you're using a model of reality. That's another source of uncertainty, and combining it with measurement uncertainty doesn't guarantee being correct overall.
Therefore experiments are usually done to provide more evidence, which might or might not confirm the model.....

But, beside this nice thought experiment: if you get a coin, you don't have all the measurement gear at hand that you'd need, but doing a couple of coin-toss trials is always possible. :)
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,825
Likes
243,115
Location
Seattle Area
In general this assertion is simply incorrect; see for example the use of meta-analysis.
To examine specific topics it is sometimes mandatory, but an argument for why it has to be so in our example is lacking.
What if I wired the amp wrong and tested one in mono and the other in stereo? And you wired it right. Still think you can combine results?

As to meta-analysis, I have not seen it used to take two statistically invalid results and create one that is valid.
 

Jakob1863

Addicted to Fun and Learning
Joined
Jul 21, 2016
Messages
573
Likes
155
Location
Germany
What if I wired the amp wrong and tested one in mono and the other in stereo? And you wired it right. Still think you can combine results?

Amirm, I'm a bit puzzled by your arguing, as you imo constantly shift the topic. First it was a different setup, now it is a failed setup.
As both preamplifier samples provide only one pair of RCA input jacks (in the typical vertical arrangement) and one pair of RCA output jacks (also arranged in the typical vertical arrangement), with input and output separated by a few inches, your scenario is quite unlikely.
But of course, as said right from the beginning, control can't be as tight in this sort of test as it can be in other tests.

Otoh, if you want to imagine every possible sort of problem in a test just because it wasn't explicitly mentioned that it didn't happen, then I think most likely no test at all will be good enough, and that would include your tests as well. A foobar protocol in no way ensures that you hadn't done something wrong.

As to meta-analysis, I have not seen it used to take two statistically invalid results and create one that is valid.

Isn't that shifting again?
What do you mean by "statistically invalid results"?
If you mean "statistically not significant" then the opposite is true; as the data are combined in a meta-analysis the power is substantially higher, and therefore the result is quite often highly significant.
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,825
Likes
243,115
Location
Seattle Area
Amirm, I'm a bit puzzled by your arguing, as you imo constantly shift the topic. First it was a different setup, now it is a failed setup.
Your focus in these discussions has been statistics. That is the topic we are still discussing, including what a fair coin is, which has no similarity at all to your amplifier tests. In that vein, but much closer to audio, I was presenting you with a hypothetical which invalidates the notion that the results can be combined.

Isn't that shifting again?
What do you mean by "statistically invalid results"?
If you mean "statistically not significant" then the opposite is true; as the data are combined in a meta-analysis the power is substantially higher, and therefore the result is quite often highly significant.
Your friends only did one test. There was no conclusion you could draw out of that. You can't combine that outcome with anything else and call it similar to meta-analysis. Meta-analysis attempts to draw conclusions from a set of valid tests; otherwise it is itself invalid.

Anyway, I will repeat my original opinion. In situations where the objective data says there should not be a difference in the sound of audio equipment, it is best to test for that hypothesis using a sufficient number of trials. The process should be documented properly, so as to remove any doubt about the memory of the test. Doing a preference test with a few samples is not going to generate valid, defensible outcomes.

If you still have the equipment, I encourage you to do a difference test and run it a dozen times while taking notes. Have a loved one change from A to B. Other than the one amplifier test I have shown, I am not aware of any formal tests where someone has passed such a test. Yours may be one, and that would be something to discuss.
 

Jakob1863

Addicted to Fun and Learning
Joined
Jul 21, 2016
Messages
573
Likes
155
Location
Germany
Your focus in these discussions has been statistics. That is the topic we are still discussing, including what a fair coin is, which has no similarity at all to your amplifier tests. In that vein, but much closer to audio, I was presenting you with a hypothetical which invalidates the notion that the results can be combined.

You brought the coin into our discussion, but unfortunately with an example that really did not resemble what we have done. You stated that sending three coins to three people wouldn't mean anything, even if all three got heads. In reality we used - to stay with the coin analogy - _one_ coin and sent it to _five_ different people.

So, in fact, you argued with a quite distorted version of "the real thing".
Then you argued that 5 hits (meaning SL = 0.031) weren't good enough because you sometimes got 5 successes within 8 to 12 trial ABX runs, which isn't a convincing argument, due to the statistics involved. I've explained why.
In your post you wanted to have a minimum of _8_ trials to have comfort.
Then you asserted that statistical theory prohibits combining experimental results if the experiments weren't identical, which is obviously wrong.
I provided an example (i.e. meta-analysis) to illustrate that it is wrong in the stated general meaning.

Then you introduced the term "statistically invalid", which is at least ambiguous; I asked for an explanation but got no answer, and you raised the needed number of trials to 12.

I hope you understand why I am a bit irritated by our discussion.

Your friends only did one test. There was no conclusion you could draw out of that.
Please supply an argument.
The five participants did one trial each, which sums up to a five-trial test. Statistically it doesn't matter if one listener does 5 trials or five people do one trial each. The latter is even slightly advantageous because the trials are really independent, and as the participants used different systems the generalizability is better too.

You can't combine that outcome with anything else and call it similar to meta-analysis. Meta-analysis attempts to draw conclusions from a set of valid tests; otherwise it is itself invalid.

As I did not combine it with "anything else" I can't follow; again, please provide an argument for why the tests weren't valid.

Anyway, I will repeat my original opinion. In situations where the objective data says there should not be a difference in the sound of audio equipment, it is best to test for that hypothesis using a sufficient number of trials. The process should be documented properly, so as to remove any doubt about the memory of the test. Doing a preference test with a few samples is not going to generate valid, defensible outcomes.

Opinion is fine, but supplying the underlying arguments would be better. If we just exchange opinions we will not gain better insight, and readers most likely won't either.
What we have done were directional paired comparisons, which are _discrimination_ tests.

If you still have the equipment, I encourage you to do a difference test and run it a dozen times while taking notes. Have a loved one change from A to B. Other than the one amplifier test I have shown, I am not aware of any formal tests where someone has passed such a test. Yours may be one, and that would be something to discuss.

I especially like the "have a loved one" piece....... :)

Overall, to demand another test and even another test protocol because one doesn't like the result is of course human, but will it ever end at some point?
Wrt these sorts of tests there were already some where listeners passed, see for example the PCM/DSD comparison test by Blech/Yang (which otoh, as I've written quite often, illustrates that even a large-sample experiment, in most parts well planned and executed, in the end wasn't satisfying).
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,825
Likes
243,115
Location
Seattle Area
Please supply an argument.
How about this one: I am way past my interest level in arguing. :) I said my piece in my last post. I will repeat again: if you are confident you have found differences between those two amps, do another test of difference, do it double blind, document it carefully, and then post the results. That would be something meaningful to chew on.
 

tomelex

Addicted to Fun and Learning
Forum Donor
Joined
Feb 29, 2016
Messages
990
Likes
573
Location
So called Midwest, USA
The primary limitation of audio DBT compared to sighted testing is the inability of the person taking the test to believe how biased they are by brand, cost, and other sighted factors. In other words, their belief system, all that they hold sacred, is challenged.

Most objectivists, on the other hand, know damn well the limitations of our senses. That's why, on another forum, when different tests were put out there to try, all the heavy breathers spouting their hearing skills would never do the online tests, while the objectivists did. You can't argue with that fact.
 

Jakob1863

Addicted to Fun and Learning
Joined
Jul 21, 2016
Messages
573
Likes
155
Location
Germany
How about this one: I am way past my interest level in arguing. :) I said my piece in my last post. I will repeat again: if you are confident you have found differences between those two amps, do another test of difference, do it double blind, document it carefully, and then post the results. That would be something meaningful to chew on.

Taking the emergency exit while in slight argumentative distress is totally dignified ......(just a bit of teasing :) )

But in any case, we have to solve the mystery that one listener's probability of getting 5 hits (by random guessing) when doing 5 trials is supposed to be 0.031, while 5 listeners' chance of getting 5 hits (by random guessing) when doing 1 trial each is allegedly 50%.

I'm afraid I don't know of any argument that could support that assertion, as normal reasoning suggests that each listener has a 50% chance of a success (by random guessing), and therefore the probability that all 5 get a success (by random guessing) is also 0.031 (0.5 to the power of 5).
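
A quick Monte Carlo check of that reasoning, under the null hypothesis of pure guessing (my own sketch; the scenario labels are mine):

Code:
import random

random.seed(1)
RUNS = 200_000

def correct_guess() -> bool:
    # Under the null, every answer is an independent 50/50 guess.
    return random.random() < 0.5

# Scenario A: one listener does five trials in a row.
a = sum(all(correct_guess() for _trial in range(5)) for _ in range(RUNS)) / RUNS

# Scenario B: five different listeners do one trial each.
b = sum(all(correct_guess() for _listener in range(5)) for _ in range(RUNS)) / RUNS

print(a, b, 0.5 ** 5)   # all three come out near 0.031

That the two scenarios end up as literally the same computation is the point: the guessing null does not care who provides the answers.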

The additional question, whether it is appropriate to combine non-identical experiments, is essential for any sort of review of multiple experiments and is therefore addressed, for example, by:

"Studies are rarely identical replications of one another, so including studies
that are diverse in methodology, measures, and sample within your meta-
analysis has the advantage of improving the generalizability of your conclusions
(Rosenthal & DiMatteo, 2001)."
(Noel A.Card; Applied Meta-Analysis for Social Science Research, The Guilford Press, 2012)

The experimental approach we used is called a "directional paired comparison" (forced choice), because it wasn't sufficient to prefer one of the two; it had to be a specific one to count as a success.
And although there might be preference without a real difference (see for example the discussions around time-error/presentation-order effects), a countermeasure is provided by randomizing the labels of the two DUTs between the participants, so such an effect couldn't favor an identical choice by all listeners.

Discussion is always needed, as most likely no flawless experiment exists and a lot of experimental decisions are of a subjective nature or dictated by external factors.
 

Jakob1863

Addicted to Fun and Learning
Joined
Jul 21, 2016
Messages
573
Likes
155
Location
Germany
The primary limitation of audio DBT compared to sighted testing is the inability of the person taking the test to believe how biased they are by brand, cost, and other sighted factors. In other words, their belief system, all that they hold sacred, is challenged.

I don't get why that is a limitation of the "DBT". Do you mean it in the sense that a person has to realize first that he is prone to bias effects before even thinking about taking a controlled listening test?

Most objectivists, on the other hand, know damn well the limitations of our senses.

I would often question the assertion that most of the people demanding "DBTs" are really objectivists, as they are frequently unwilling to acknowledge the limitations of controlled experiments of the usual kind.
And that has a long tradition. As an example, Les Leventhal wrote an article /1/, published in the JAES in 1986, about the risk of committing beta errors (which means _not_ rejecting the null hypothesis although it is wrong) when running the usual 16-trial ABX. The response letter by Shanefield, Nousaine and Clark /2/ did not exactly express appreciation for the chance to improve the test schemes, but rather reluctance. Dan Shanefield wrote imo the most professional part of the answer, asserting that, while Leventhal was correct, a difference would not be of importance if it was not detected under the test conditions. When reading the exchange at the time I wondered about the slight hostility (only slight, because the rules of a journal like the JAES prohibit any heated exchange), especially from Nousaine and Clark, because it didn't really fit the assumption that the goal was to find the truth.

The discussion went further in the letters pages of Stereophile, where the rules weren't as strict. /3/
A couple of years later, knowing a bit more about the basics of experimental work, I realized that Leventhal wasn't introducing a revolutionary new idea but was only reporting something known for ages (i.e. the concept of statistical power analysis), and therefore I was even more surprised by the reactions.

So it seems that it is quite often not about objectivism but about the defense (or promotion) of just another belief system.

/1/ Les Leventhal, Type 1 and Type 2 Errors in the Statistical Analysis of Listening Tests, JAES Volume 34 Issue 6 pp. 437-453; June 1986
/2/ Daniel Shanefield, David Clark, Thomas Nousaine, Les Leventhal; Comments on "Type 1 and Type 2 Errors in the Statistical Analysis of Listening Tests" and Author´s Reply, JAES Volume 35 Issue 7/8 pp. 567-572; July 1987
/3/ https://www.stereophile.com/content/highs-lows-double-blind-testing-page-2


Edit: added references
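
To make Leventhal's Type 2 point concrete, here is a sketch of the power calculation for a 16-trial ABX (standard-library Python; the 0.05 criterion and the example "true" hit rates are illustrative choices of mine, not numbers taken from the paper):

Code:
from math import comb

def tail(hits: int, trials: int, p: float) -> float:
    """P(X >= hits) for X ~ Binomial(trials, p)."""
    return sum(comb(trials, k) * p**k * (1 - p)**(trials - k)
               for k in range(hits, trials + 1))

TRIALS = 16
# Smallest score whose probability under pure guessing is at or below 0.05:
pass_mark = next(h for h in range(TRIALS + 1) if tail(h, TRIALS, 0.5) <= 0.05)
print("pass mark:", pass_mark, "of", TRIALS)        # 12 of 16

# Chance of reaching the pass mark if the listener truly hears the
# difference on a given fraction of trials and guesses the rest:
for true_hit_rate in (0.6, 0.7, 0.8, 0.9):
    power = tail(pass_mark, TRIALS, true_hit_rate)
    print(f"true hit rate {true_hit_rate:.1f}: power {power:.2f}, beta {1 - power:.2f}")

With the conventional 12/16 pass mark, a listener who genuinely hears the difference on 60% of trials still fails more than 80% of the time, and even at a 70% hit rate the miss probability is over one half; that is essentially the risk Leventhal quantified.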
 

Purité Audio

Master Contributor
Industry Insider
Barrowmaster
Forum Donor
Joined
Feb 29, 2016
Messages
9,334
Likes
12,768
Location
London
My belief system states that if I am listening to two components, level matched and unsighted, and I cannot hear a difference between the two, then there isn't any difference and I can move on.
In audio terms I am looking for large, unequivocal improvements in SQ, not just differences.
Keith
 

Jakob1863

Addicted to Fun and Learning
Joined
Jul 21, 2016
Messages
573
Likes
155
Location
Germany
My belief system states that if I am listening to two components, level matched and unsighted, and I cannot hear a difference between the two, then there isn't any difference and I can move on.

As I can't know what you hear, I can't assert the opposite, but you face the same problem as everyone else in listening (unsighted or not).
Do you not hear a difference because there isn't a (perceivable) difference, or are you just biased?

In audio terms I am looking for large, unequivocal improvements in SQ, not just differences.
Keith
That makes two of us; therefore I am not a friend of fast-switching tests with short snippets of music, because their practical relevance is questionable, although there are undoubtedly advantages with respect to test efficiency, and there are research hypotheses for which that approach is well suited.
 

Purité Audio

Master Contributor
Industry Insider
Barrowmaster
Forum Donor
Joined
Feb 29, 2016
Messages
9,334
Likes
12,768
Location
London
You don't have to fast switch; you can listen to each sample for as long as you like, you just can't know which one you are listening to.
Keith
 

March Audio

Master Contributor
Audio Company
Joined
Mar 1, 2016
Messages
6,378
Likes
9,329
Location
Albany Western Australia
As I can't know what you hear, I can't assert the opposite, but you face the same problem as everyone else in listening (unsighted or not).
Do you not hear a difference because there isn't a (perceivable) difference, or are you just biased?



I think someone who accepts the reality of sighted listening, i.e. the massive potential for bias, has less invested in the outcome of an unsighted test. They understand their aural and psychological limitations in a way the typical audiophile doesn't. They don't make up excuses such as "I'm too stressed to tell the difference because I'm being tested" or "it's an unfamiliar environment and I lose the ability to hear". In simple terms, I don't find that the typical blind advocate has anything to prove; they are just realists - unlike many audiophiles I'm acquainted with.

Does that opinion of mine prove blind advocates are unbiased? No, not really. All I know is that every single time I put ardent audiophiles under the most unobtrusive of controls, they totally lose their ability to discern differences they claim to find under sighted conditions.

Even if their claims of stress due to the listening conditions are true (which personally I don't buy into), all it shows is that the differences they suddenly can't hear are totally trivial. Do you really buy into the idea that they go significantly deaf because they are being tested?
 