Interim Report (or Memo)
I have temporarily suspended several aspects of the original plan.
1. Apply XTC to all angles and all heights. And headtrack it all.
:: I realized that this has little significance. The purpose of head tracking for speakers and for IEMs (or headphones) is fundamentally opposite. When speaker-based XTC and head tracking are combined and applied without any deviation, the result is a state very similar to IEM (or headphone) XTC. However, because that state is locked down to an almost unattainably rigid degree, IEMs and headphones instead benefit from introducing an element of instability (such as ITD variations and response deviations), which actually helps enhance spatial perception.
In my case, since I don't turn my head and am already satisfied with a perfect state free of the usual listening deviations, the need for head tracking has diminished somewhat.
(BTW, if you'd like to casually experience head tracking, you could try using the head tracking feature in the Samsung Buds series. Although it applies its own HRTF, one Korean user has successfully enhanced Samsung's spatialization by applying only the early reflection section of the room file, as I suggested. This improved the spatial effect, and head tracking also worked. I anticipate that if you cancel out the built-in HRTF or room response using either the ear microphones or measurement tools and then apply the XTC binaural room impulse response, you should be able to experience head tracking even in that state.)
2. I will test the combination of Atmos and XTC.
:: I'm still debating whether to invest in an Atmos system. However, the most important question is whether I will frequently enjoy Atmos sources and content. And to that, I feel the answer is no.
I don't really play games, and to be honest, I rarely watch movies either. This has made me even more doubtful about the necessity of an Atmos system.
While Atmos wasn't specifically designed for this purpose, I'm still curious about the results of combining Atmos with XTC. As I mentioned before, I had quite an interesting experience with basic multichannel sources (5.1, 7.1), so I think I'll try it at some point in the future. Though, unless it's BRIR-based, it would be difficult to attempt Atmos with XTC.
So, I changed my plan.
Recently, I made a small modification as an experiment. I tried to mimic the On-Axis response, directivity, and Early Reflections based on the Spinorama data of my speakers. Of course, I know we don’t exclusively listen to On-Axis, and this is just a portion of the data.
However, since I can separate direct sound from reflected sound and adjust it as I like, and because the early reflections start after 40–50ms in a large nearfield space, I thought, why not tweak the nuances of the reflections a bit? (Of course, I can't precisely control specific directions, including vertical reflections.) It was just something I did for fun to see what would happen.
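For anyone curious, the split itself is simple; here is a minimal sketch in Python of how it could be done, with placeholder values (the file name, the 40 ms split point, and the 2 ms crossfade are assumptions, not my exact settings):

```python
# Minimal sketch: split a BRIR into direct-sound and reflection parts at ~40 ms,
# with a short crossfade so neither part is cut off abruptly.
import numpy as np
import soundfile as sf  # one possible I/O library

brir, fs = sf.read("brir_left_30deg.wav")  # hypothetical file; mono (N,) or stereo (N, 2)
split_ms, fade_ms = 40.0, 2.0              # assumed split point and crossfade length
split = int(fs * split_ms / 1000)
fade = int(fs * fade_ms / 1000)

# Raised-cosine crossfade starting at the split point
fade_out = 0.5 * (1 + np.cos(np.linspace(0, np.pi, fade)))
fade_in = 1.0 - fade_out

direct = brir.copy()
reflections = brir.copy()
direct[split:split + fade] = (direct[split:split + fade].T * fade_out).T
direct[split + fade:] = 0.0
reflections[:split] = 0.0
reflections[split:split + fade] = (reflections[split:split + fade].T * fade_in).T

# 'direct' + 'reflections' reconstructs the original response exactly,
# so each part can now be EQ'd or scaled independently.
```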
(The following is just my personal experiment, so please don't take it too seriously.)
The blue line represents my On-Axis direct sound, and the red line represents the target, which is the Genelec 8361 (I figured, if I’m doing this, why not aim for something high-end?). While my direct sound already has a consistent tonal balance, I wanted to tweak it a bit further.
I was curious whether slight roll-offs and minor boosts would make a significant perceptual difference in listening. (By the way, I exaggerated the vertical scale on the graph to emphasize the differences.)
And this adjustment would apply to the entire time domain, affecting both the direct sound and the reflections.
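To make the idea concrete, here is a rough sketch of how such a correction could be built; the dB values below are placeholders I made up for illustration, not my real measurement or the real 8361 curve, and the 48 kHz sample rate is likewise an assumption:

```python
# Minimal sketch: derive a gentle correction FIR from the difference between a target
# on-axis magnitude and my own speaker's on-axis magnitude (all values are placeholders).
import numpy as np
from scipy.signal import firwin2

fs = 48000  # assumed sample rate

# Hypothetical magnitudes in dB at matching frequency points (Hz)
freqs     = np.array([20, 50, 100, 200, 500, 1000, 2000, 5000, 10000, 20000], float)
own_db    = np.array([-3.0, -1.0, 0.0, 0.5, 0.0, 0.0, -0.5, 0.5, -1.0, -4.0])
target_db = np.array([-2.0, -0.5, 0.0, 0.0, 0.0, 0.2,  0.0, 0.0, -0.5, -3.0])

gain_lin = 10 ** ((target_db - own_db) / 20)   # linear gain needed to reach the target

# Linear-phase correction FIR; firwin2 needs the grid to start at 0 Hz and end at Nyquist
f_norm = np.concatenate(([0.0], freqs / (fs / 2), [1.0]))
gains = np.concatenate(([gain_lin[0]], gain_lin, [gain_lin[-1]]))
correction = firwin2(2047, f_norm, gains)

# Convolving the whole BRIR with 'correction' shifts the direct sound and the
# reflections by the same amount; applying it is shown in a later sketch.
```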
And this is another graph. In the Spinorama data, we see the DI (Directivity Index) between the listening window and early reflections, but I created a separate On-Axis DI.
The red line represents the assumed On-Axis DI of the speakers I've been using, and the light blue line represents the assumed On-Axis DI of the Genelec 8361. (As expected, the Genelec shows more uniformity, given its high price.)
The correction based on this graph will only be applied to the reflection regions, roughly after 40ms, since there are no early reflections before that point.
And once the corrections are applied, the final result would look like this, with each time domain adjusted separately.
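And a small sketch of how the two regions could be treated separately, continuing the snippets above; `di_correction` here is a hypothetical second FIR that would be built from the assumed On-Axis DI curves in the same way `correction` was built from the on-axis magnitudes:

```python
# Minimal sketch: on-axis correction on everything, DI-based correction on the
# reflections only, then recombine. Linear-phase FIRs delay the signal by
# (len(fir) - 1) // 2 samples, so that delay is trimmed to keep both parts aligned.
import numpy as np
from scipy.signal import fftconvolve

def apply_fir(x, h, comp_delay=0):
    """Convolve a mono or stereo response with an FIR, trimming comp_delay leading samples."""
    if x.ndim == 1:
        y = fftconvolve(x, h)
    else:
        y = np.column_stack([fftconvolve(x[:, ch], h) for ch in range(x.shape[1])])
    return y[comp_delay: comp_delay + len(x)]

# Using 'direct', 'reflections', and 'correction' from the earlier sketches,
# plus a hypothetical 'di_correction' FIR derived from the On-Axis DI graph:
# d = apply_fir(direct, correction, (len(correction) - 1) // 2)
# r = apply_fir(reflections, correction, (len(correction) - 1) // 2)
# r = apply_fir(r, di_correction, (len(di_correction) - 1) // 2)
# result = d + r   # each time region adjusted separately, as described above
```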
I'm curious what kind of changes I would perceive when listening to this setup. Although I have limited experience with "The Ones" series, I wonder whether the speakers I've been using will take on even a hint of that "Genelec" character, and I'm particularly interested in whether I'll be able to detect such subtle differences.
And I was quite surprised. While I had been looking at a graph with a vertically exaggerated scale, the actual EQ adjustments were quite minimal. Despite that, the reflections sounded much smoother and more refined, and the overall sound had a slight "Genelec-like" character, as if about 3% of that signature was added.
Though, who knows—it might be because I was thinking about Genelec while listening, which could have influenced my perception. lol
Still, it was surprising that the change in sound was so noticeable. Breaking it down more precisely, the off-axis response differences affecting the front, back, left, right, and even the crosstalk path (though at my listening distance the angular difference is only about 1–2 degrees, by my calculations) could offer interesting results with more detailed adjustments. It might be fun to experiment further with these nuances. It's a shame that I can't tweak the vertical aspect, though.
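For anyone wondering where a figure like 1–2 degrees comes from, here is a back-of-the-envelope check with placeholder numbers (the 4 m distance and 15 cm ear spacing are illustrative values, not my actual geometry):

```python
# Rough geometry check with placeholder numbers: the angular spread between the
# direct (near-ear) path and the crosstalk (far-ear) path as seen from one speaker.
import numpy as np

distance = 4.0        # assumed listening distance in metres
ear_spacing = 0.15    # typical inter-ear spacing in metres
speaker_angle = 30.0  # speaker placement in degrees off the median plane

# Component of the ear axis perpendicular to the speaker's line of sight
spread = ear_spacing * np.cos(np.radians(speaker_angle))
angle = np.degrees(np.arctan2(spread, distance))
print(f"Near-ear and far-ear paths differ by roughly {angle:.1f} degrees")  # ~1.9 degrees
```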
So, just as I was about to perform XTC again with this response, a different idea came to mind a few days ago.
The first idea was to more boldly adjust the ratio between direct sound and reflected sound.
The space is quite large, and with the direct sound being so dominant, the reflections are barely audible even in the custom IEM BRIR configuration with its extremely low noise floor. Previously, I had only adjusted the reflections by about +3dB.
However, when I tried boosting the reflections by +9 dB, +15 dB, or even more, I found that, while this didn't physically change the distance, it gave a deeper, more immersive effect and let me perceive the space more clearly. It's a subtle concept, much like critical distance, but one I had overlooked.
Whether you increase the reflections or decrease the direct sound, the result is the same, but personally, I found it easier to control by reducing the direct sound.
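In code it is just a gain change on the split parts; here is a minimal sketch reusing the `direct` / `reflections` arrays from earlier (the -9 dB trim is only an example value):

```python
# Minimal sketch: attenuate the direct sound instead of boosting the reflections,
# then renormalise so the overall peak stays comparable for A/B listening.
import numpy as np

def remix(direct, reflections, direct_trim_db=-9.0):
    mix = direct * 10 ** (direct_trim_db / 20) + reflections
    return mix * np.max(np.abs(direct + reflections)) / np.max(np.abs(mix))

# e.g. remix(direct, reflections, -9.0) or remix(direct, reflections, -15.0)
```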
So, with these fun adjustments, the possibilities expanded, and now I have a second idea in mind.
Generally, the listening setup will resemble the one shown in the attached image. What I envisioned, however, was a type of array.
Exactly. While I can’t fully recreate a plane wave, if I simulate it by filling the necessary angles in the front (approximately 0 to 60 degrees) with consistent sound, it would create the perception of a more cohesive sound field. This approach would make the sound feel more like a surface or "plane" rather than a point source.
Of course, since my listening distance is long and the space is large, even a 30-degree stereo image feels like a surface. However, I had a similar experience when I previously manipulated HRTF data to create a comprehensive HRTF set for new reverb experiments. (This is a different topic, but reflections are also influenced by HRTF, so not only do we hear the 30-degree angles in front, but also the reflections from all directions after bouncing off the walls. My goal was to simulate all angles.)
At first, when I only used a few angles, the result felt somewhat incomplete. However, as I started to fill more angles—eventually covering them densely, even vertically—I noticed that the experience transcended simple localization or directional cues. It became less about specific points in space and more about creating a vast, immersive environment. The sound field expanded into something that felt like a large, continuous space, rather than a collection of individual sound sources.
So this time, instead of focusing on reverb, I'm planning to apply this approach to the direct sound region (before 40ms). Based on my previous experience, I anticipate that this will result in a large, unlocalized "surface" of sound, where no single point can be pinpointed.
Additionally, by reducing the ratio of direct sound and perhaps adding a roll-off to the direct sound, I could enhance the sense of depth and distance even further. This approach could create a more immersive and expansive auditory experience.
The downside, however, is that I didn’t record the sound using an array setup like that. Fortunately, beyond a distance of 2 meters, the differences between HRTFs at various distances aren’t significant. Therefore, I expect that if I use the recorded angles as they are, it should be quite similar to how it would sound if I had arranged the array for recording in the first place.
And since the early reflections are significantly delayed and this adjustment will only be applied to the direct sound, I'm even more confident that it won't make a big difference. Aside from air attenuation there shouldn't be any major variation, and the distance differences within the array are too small to make attenuation worth considering. With the early reflections excluded from this step, each angle's direct sound should be perceived as essentially equal, so this approach should work well.
However, this also seems like it’s going to be quite a laborious task.
0 degrees (-10 vertical)
0 degrees (0 vertical)
0 degrees (+10 vertical)
0 degrees (+20 vertical)
…
60 degrees
I will need to combine over 50 different responses, with 5-degree increments horizontally and 10-degree increments vertically.
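The combining step itself should be simple; a rough sketch of it is below, where the file naming, the exact elevation span, and the plain averaging are all assumptions:

```python
# Minimal sketch: average the direct-sound responses over the whole grid of angles
# to approximate the 'surface' of sound (file naming and grid are hypothetical).
import numpy as np
import soundfile as sf

horizontal = range(0, 65, 5)     # 0..60 degrees in 5-degree steps
vertical = range(-10, 30, 10)    # -10..+20 degrees elevation in 10-degree steps

responses, fs = [], None
for az in horizontal:
    for el in vertical:
        ir, fs = sf.read(f"direct_az{az:02d}_el{el:+03d}.wav")  # hypothetical naming
        responses.append(ir)

length = min(len(ir) for ir in responses)                 # trim all to the shortest
surface = sum(ir[:length] for ir in responses) / len(responses)
sf.write("direct_surface.wav", surface, fs)               # 13 x 4 = 52 responses combined
```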
But I am excited because I’m confident that once I integrate everything, it will create an incredibly large, cohesive sound field. And applying XTC to that wave will make it even more interesting.
It feels like after doing all of this, I will have exhausted almost everything that can be done with DSP processing and synthesis. There's nothing more to do.