I am quite familiar with the law of large numbers/CLT, having used it to describe the acoustic response of rooms above transition frequencies (where the number of modes rapidly increases, creating a stochastic response). Keep in mind that it only applies to random events whose probabilities we know, and that known probability is the value the observed average converges to.
Most importantly, the law is called "large numbers" because it must not be applied to a small number of observations. Flipping a coin three times and getting heads doesn't mean the next three will be tails to balance things out toward a 0.5 probability. The gambler's fallacy will bite you in the behind hard here!
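A quick simulation makes the point. This is a minimal sketch (names and flip counts are my own, not from any measurement tool): with 3 flips the observed fraction of heads can be anywhere, while with 100,000 flips it settles near the true 0.5.

```python
import random

def heads_fraction(n_flips, seed=0):
    """Fraction of heads in n_flips fair-coin flips (seeded for repeatability)."""
    rng = random.Random(seed)
    return sum(rng.random() < 0.5 for _ in range(n_flips)) / n_flips

# With only 3 flips the estimate can be wildly off (0.0, 0.33, 0.67, or 1.0);
# with 100,000 flips the law of large numbers pulls it close to 0.5.
small = heads_fraction(3)
large = heads_fraction(100_000)
```

The point of the seed is only reproducibility; the convergence with large `n_flips` holds for any seed, which is exactly what the law promises (and what three flips cannot).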
Mind you, when the underlying issue is noise, we do filter. My measurements are smoothed to 1/12 octave because the noise there is electrical and environmental noise, and we are not interested in that. And we pick 1/12 octave so that we don't filter out what we still want to see. With averaging, you have no control over either the bandwidth of the filter or its strength.
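For anyone curious what fractional-octave smoothing looks like, here is a minimal sketch of the idea (a plain rectangular window in the log-frequency domain; real analyzers use shaped windows, and the function name is my own):

```python
import math

def smooth_fractional_octave(freqs, mags_db, fraction=12):
    """Smooth a magnitude response (in dB) with a 1/fraction-octave window.

    freqs must be sorted ascending. For each frequency f, average all points
    within the band [f / 2^(1/(2*fraction)), f * 2^(1/(2*fraction))].
    Narrower fractions (e.g. 1/12) preserve more detail than wider ones (e.g. 1/3),
    which is the control you give up with blind averaging.
    """
    half = 2 ** (1 / (2 * fraction))  # half-bandwidth factor on each side
    out = []
    for f in freqs:
        lo, hi = f / half, f * half
        band = [m for g, m in zip(freqs, mags_db) if lo <= g <= hi]
        out.append(sum(band) / len(band))  # band always contains f itself
    return out
```

Note that the bandwidth is an explicit parameter here: you choose exactly how much detail survives, unlike averaging multiple curves, where the effective smoothing depends on whatever variance happens to be in the data.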
If you want to do something here, pick the geometric mean. At least there, one of the actual graphs ends up as the center one, rather than being subject to the smoothing that averaging would do. The problems with this scheme are that a) it is more work and b) when I used it, it didn't generate better results. See this measurement I did with the B&K 5128:
Here is the GeoMean:
Notice that it has very high resolution, unlike any averaging (even the little wiggles in the 200 Hz region are preserved). But that still didn't improve understanding of this headphone.
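For reference, a per-frequency geometric mean across measurement runs can be sketched like this (my own minimal illustration of the scheme, not the tool used for the graph above; it assumes all runs share the same frequency grid and have strictly positive linear magnitudes):

```python
import math

def geometric_mean_response(runs):
    """Per-frequency geometric mean across several measurement runs.

    Each run is a list of linear (not dB) magnitudes on a common frequency
    grid. The geometric mean is exp of the average of the logs, so outlier
    runs pull the result less than an arithmetic average would.
    """
    n = len(runs)
    n_points = len(runs[0])
    return [math.exp(sum(math.log(run[i]) for run in runs) / n)
            for i in range(n_points)]
```

For example, two runs measuring 1.0 and 4.0 (linear) at the same frequency give a geometric mean of 2.0, whereas a plain arithmetic average would report 2.5.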
My approach is to arrive at a much more reliable representation of the system by measuring multiple times and comparing left and right channels at two frequencies. There is no pretending that some magic dust in the form of simple averaging will give us good results out of bad measurements.