
Replicating imagined speaker sound on headphones (with Impulcifer)

5.png

4.png


I applied the right delay to the direct sound I had before and finally built a decay suitable for a concert hall.

0.png


It's very rich and clear. "Rich" and "clear" rarely coexist, at least in-room; there, the limits are obvious.

1.png


2.png


3.png


But the moment you experience that impossible combination, your brain seems to melt.
What's more, if you apply the HRIR (with ASW/LEV angles and far-distance reverb) through IEMs rather than headphones, the noise floor is even lower and the dynamic range I hear is incredibly high, because the isolation is so good.
When everyone in the audience goes quiet I find myself holding my breath; it's really that quiet.
But when the song starts, the singer appears directly in front of me in the middle of that large concert hall, the sound blooms right at my ears, and the decay of that big space wraps around me accordingly.

:eek:
 
There are very few Impulcifer users on ASR, but I thought I'd give a short progress report anyway.
I've been correcting and cleaning up my HRIR/BRIR measurements since last year.
Recording with binaural microphones in your own ears is very effective, but it is also quite unreliable, and I've seen other people end up calibrating their measurements when their own body's characteristics weren't properly captured.
So over the past year I've been working to make mine more complete and more accurate, closer to that real-life feeling where every part of the body is engaged.

And this neatly clears the way for full headtracking later this year.

1.png

2.png


The unsmoothed response and the usual 1/3-octave smoothing.
No imperfections from the speakers and room, no coloration from early reflections, etc.

1721452228311.png


That's what a flat response looks like in an anechoic chamber, with the low frequencies boosted toward room gain for normal listening.
I don't apply any roll-off to the highs.
They will roll off naturally with distance, depending on the reflections I synthesize.
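If you want to sketch that kind of bass target yourself, a smooth first-order-style shelf is enough. A minimal example, where the +6 dB gain and 120 Hz corner are illustrative assumptions rather than my exact values:

```python
import numpy as np

def room_gain_target_db(freqs_hz, gain_db=6.0, corner_hz=120.0):
    """Flat anechoic target with low-frequency room gain:
    ~0 dB above the corner, rising smoothly toward +gain_db at DC.
    gain_db and corner_hz are illustrative assumptions."""
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    return gain_db / (1.0 + (freqs_hz / corner_hz) ** 2)
```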


3.png

4.png



And since there are no uncontrolled variables in the impulse and all the responses, it also works perfectly with XTC (manual crosstalk cancellation based on the BACCH algorithms).
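For anyone curious what "manual XTC" involves at its core: per frequency bin, you invert the 2x2 matrix of speaker-to-ear transfer functions. The sketch below is a generic regularized inversion, not the actual BACCH algorithm (which does considerably more); n_fft and beta are assumptions:

```python
import numpy as np

def xtc_filters(h_ll, h_lr, h_rl, h_rr, n_fft=4096, beta=0.005):
    """Crosstalk-cancellation filters from four measured HRIRs.
    h_xy is the impulse response from speaker x to ear y.
    beta is Tikhonov regularization that limits boost at
    ill-conditioned frequency bins. Returns time-domain filters."""
    # Per-bin transfer matrix mapping speaker signals to ear signals:
    # [eL, eR] = H @ [sL, sR], so H = [[H_ll, H_rl], [H_lr, H_rr]].
    n_bins = n_fft // 2 + 1
    H = np.zeros((n_bins, 2, 2), dtype=complex)
    H[:, 0, 0] = np.fft.rfft(h_ll, n_fft)
    H[:, 0, 1] = np.fft.rfft(h_rl, n_fft)
    H[:, 1, 0] = np.fft.rfft(h_lr, n_fft)
    H[:, 1, 1] = np.fft.rfft(h_rr, n_fft)

    # Regularized inverse per bin: X = (H^H H + beta*I)^-1 H^H
    Hh = np.conj(np.transpose(H, (0, 2, 1)))
    X = np.linalg.solve(Hh @ H + beta * np.eye(2), Hh)

    # Back to the time domain, rolled to keep the filters causal
    x = np.fft.irfft(X, n_fft, axis=0)
    return np.roll(x, n_fft // 2, axis=0)  # shape (n_fft, 2, 2)
```

The regularization is what keeps the filters from demanding absurd gain where the two ear signals are nearly identical, as at very low frequencies.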
I looked at the responses I have (which I actually recorded): HRTFs, HRIRs, and BRIRs for various angles and heights. I analyzed their relative correlations, and I also tested whether I could accurately move the speakers by relative differences (for example, from 30 degrees front to 120 degrees rear). All the responses are interwoven, which means you have to correct the ITD, the ILD, and the entire response together. The process is tedious, but it works well. Perfectly.
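As a quick sanity check on any measured pair before correcting it, a rough ITD/ILD estimate can be pulled from the HRIRs directly; this is a simple cross-correlation and energy-ratio sketch, not how Impulcifer computes them internally:

```python
import numpy as np

def itd_ild(left_ir, right_ir, fs):
    """Rough ITD (cross-correlation peak) and broadband ILD
    (energy ratio) from one left/right HRIR pair."""
    # ITD: lag of the cross-correlation maximum, in microseconds.
    # A positive lag means the left ear receives the sound later
    # (i.e. the source sits toward the right).
    corr = np.correlate(left_ir, right_ir, mode="full")
    lag = np.argmax(np.abs(corr)) - (len(right_ir) - 1)
    itd_us = 1e6 * lag / fs
    # ILD: overall left/right energy ratio in dB.
    ild_db = 10 * np.log10(np.sum(left_ir**2) / np.sum(right_ir**2))
    return itd_us, ild_db
```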
I can also manipulate the size of the space, as I synthesize the reflections I want to hear.


So here's my plan for the rest of the year.

1. Apply XTC to all angles and all heights. And headtrack it all.
:
Head tracking with speakers aims to compensate for changes in listening position, while head tracking with headphones aims to fix the 3D space in place, which helps improve reproduction a bit. I don't particularly enjoy VR/AR content, so I don't expect it to make much of a difference (I don't normally turn my head), but I'll do it anyway.

2. I will test the combination of Atmos and XTC.
:
It's already made watching movies in multichannel (7.1) quite an interesting experience.
With a war movie, ordinary 7-channel felt like watching and listening to the movie, but with XTC I felt like I was there.
And I wonder what it would be like to combine this with Atmos, which is object-based, because no one else is doing it.
 
7.png

8.png


Current process for head tracking (all angles).
I'm only halfway through. This is insane. =(
Up to the height layers (-10, 0, +10, +20).
 
I was organizing my HRTF's correct ITD, asymmetric ITD, ILD by angle, and ILD by height, and then I tried reverb again.
As far as I'm concerned, personally, I think these are effective in this order:

Real-space binaural recording of myself > Ambisonic / dummy-head binaural > stereo-recorded reverb > true-stereo digital reverb > stereo digital reverb > mono reverb

But it's hard to record every sound I want in real life, since I've already started chasing unreal sounds.
So, from the impulses I had, I used the Canyon impulse with my HRTF that erases all the characteristics of the speaker.
The main angles are 8-degree and 30-degree stereo.
And to boost ASW/LEV: angles of 45-60 degrees and 110-120 degrees, at heights 0, +10, and +20.


1.png


1721739115487.png



The added impulses look quite strong, but the 20-30 ms region, which could otherwise adversely affect those bands, contributes positively only to spatial formation: the space-defining reflections begin there without significant tonal impact.

Canyon 02 Soft 30m.jpg


This is the image the impulse was promoted with. It's cool.
However, my preference was to cut the decay so that it only clears the way for spatial perception, letting my HRTFs stretch out to approximately 100-150 ms, and then to shape the late reverberation with an additional VST, as below. That is less realistic, but more ideal:
the spatial characteristics of the Canyon are recognized, but at the same time the sound keeps the clear direct sound of the anechoic chamber.
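The decay cut itself is simple: keep roughly the first 100-150 ms and fade it to zero so the splice is inaudible, then let the VST supply the late tail. A minimal sketch, where the keep/fade lengths are assumptions:

```python
import numpy as np

def truncate_decay(ir, fs, keep_ms=150, fade_ms=30):
    """Keep only the early part of a (B)RIR and fade it out smoothly,
    so the early reflections define the space while the long tail is
    handed off to a separate late-reverb stage."""
    keep = int(fs * keep_ms / 1000)
    fade = int(fs * fade_ms / 1000)
    out = ir[:keep].copy()
    # Half-cosine fade to zero over the last `fade` samples
    out[-fade:] *= 0.5 * (1 + np.cos(np.linspace(0, np.pi, fade)))
    return out
```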


1721738297351.png


Like this.
It's very cumbersome to control the DSP manually, but I still feel proud at times like this.
In fact, when we listen to speakers in a room, we can improve things through upmixing, but the limitations of the room are bound to be felt someday.
This is close to ideal, so it's a shame that there aren't that many BRIR users (especially hands-on Impulcifer users) around the world, including on ASR and in the Korean community. =(
 
Interim Report (or Memo)

I have temporarily suspended several aspects of the original plan.

1. Apply XTC to all angles and all heights. And headtrack it all.
:: I realized that this has little significance. The purposes of head tracking for speakers and for IEMs (or headphones) are fundamentally opposite. When speaker-based XTC and head tracking are combined and applied without deviation, the result is a state similar to IEM (or headphone) XTC. But that state is fixed to an almost unattainable degree of rigidity; IEMs and headphones instead benefit from introducing an element of instability (such as ITD variations and response deviations), which actually helps enhance spatial perception.
In any case, since I don't turn my head and am already satisfied with a perfect state free of the usual listening deviations, the need for head tracking has diminished somewhat.
(BTW, if you'd like to casually experience head tracking, you could try using the head tracking feature in the Samsung Buds series. Although it applies its own HRTF, one Korean user has successfully enhanced Samsung's spatialization by applying only the early reflection section of the room file, as I suggested. This improved the spatial effect, and head tracking also worked. I anticipate that if you cancel out the built-in HRTF or room response using either the ear microphones or measurement tools and then apply the XTC binaural room impulse response, you should be able to experience head tracking even in that state.)

2. I will test the combination of Atmos and XTC.
:: I'm still debating whether to invest in an Atmos system. However, the most important question is whether I will frequently enjoy Atmos sources and content. And to that, I feel the answer is no.
I don't really play games, and to be honest, I rarely watch movies either. This has made me even more doubtful about the necessity of an Atmos system.
While Atmos wasn't specifically designed for this purpose, I'm still curious about the results of combining Atmos with XTC. As I mentioned before, I had quite an interesting experience with basic multichannel sources (5.1, 7.1), so I think I'll try it at some point in the future. Though, unless it's BRIR-based, it would be difficult to attempt Atmos with XTC.
So, I changed my plan.

Recently, I made a small modification as an experiment. I tried to mimic the On-Axis response, directivity, and Early Reflections based on the Spinorama data of my speakers. Of course, I know we don’t exclusively listen to On-Axis, and this is just a portion of the data.

However, since I can separate direct sound from reflected sound and adjust it as I like, and because the early reflections start after 40–50ms in a large nearfield space, I thought, why not tweak the nuances of the reflections a bit? (Of course, I can't precisely control specific directions, including vertical reflections.) It was just something I did for fun to see what would happen.
(The following is just my personal experiment, so please don't take it too seriously. :oops: )

1725523322901.png



The blue line represents my On-Axis direct sound, and the red line represents the target, which is the Genelec 8361 (I figured, if I’m doing this, why not aim for something high-end?). While my direct sound already has a consistent tonal balance, I wanted to tweak it a bit further.
I was curious whether slight roll-offs and minor boosts would make a significant perceptual difference in listening. (By the way, I exaggerated the vertical scale on the graph to emphasize the differences.)

And this adjustment would apply to the entire time domain, affecting both the direct sound and the reflections.
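In practice the tweak is just the clamped difference between my curve and the target, which keeps the correction subtle. A minimal sketch, assuming both magnitude curves are pre-smoothed and sampled on the same frequency grid (the 3 dB clamp is my assumption):

```python
import numpy as np

def match_eq_db(own_db, target_db, max_db=3.0):
    """Per-frequency EQ (in dB) nudging a measured response toward a
    target curve, clamped so the correction stays gentle."""
    return np.clip(np.asarray(target_db) - np.asarray(own_db),
                   -max_db, max_db)
```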

1725523796437.png


And this is another graph. In the Spinorama data, we see the DI (Directivity Index) between the listening window and early reflections, but I created a separate On-Axis DI.
The red line represents the assumed On-Axis DI of the speakers I've been using, and the light blue line represents the assumed On-Axis DI of the Genelec 8361. (As expected, the Genelec shows more uniformity, given its high price.)
The correction based on this graph will only be applied to the reflection regions, roughly after 40ms, since there are no early reflections before that point.
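Restricting a correction to the reflection region is easy once the BRIR is split in time: gate it at about 40 ms with a short crossfade, EQ only the tail, and sum the parts again. A rough sketch (split and crossfade lengths are assumptions):

```python
import numpy as np
from scipy.signal import fftconvolve

def eq_reflections_only(brir, fs, eq_fir, split_ms=40, xfade_ms=2):
    """Apply an EQ FIR only to the part of a BRIR after split_ms,
    leaving the direct sound untouched. A short crossfade avoids a
    discontinuity at the split point; the convolution tail beyond
    the original length is dropped for simplicity."""
    split = int(fs * split_ms / 1000)
    xfade = int(fs * xfade_ms / 1000)
    gate = np.zeros(len(brir))
    gate[split:split + xfade] = np.linspace(0.0, 1.0, xfade)
    gate[split + xfade:] = 1.0
    direct = brir * (1.0 - gate)   # direct sound (before ~40 ms)
    tail = brir * gate             # reflections
    tail_eq = fftconvolve(tail, eq_fir)[:len(brir)]
    return direct + tail_eq
```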


1725524016762.png



And once the corrections are applied, the final result would look like this, with each time domain adjusted separately.
I'm curious what kind of changes I would perceive after listening to this setup. Although I have limited experience with the "The Ones" series, I wonder if the sound will have even a hint of that "Genelec" character, similar to the speakers I've been using. I'm particularly interested in whether I’ll be able to detect such subtle differences.

And I was quite surprised. While I had been looking at a graph with a vertically exaggerated scale, the actual EQ adjustments were quite minimal. Despite that, the reflections sounded much smoother and more refined, and the overall sound had a slight "Genelec-like" character, as if about 3% of that signature was added.

Though, who knows—it might be because I was thinking about Genelec while listening, which could have influenced my perception. lol
Despite that, the fact that there was such a noticeable change in the sound was surprising. If I break it down more precisely, the off-axis response differences that could affect the front, back, left, right, and even crosstalk (though with my listening distance, the difference was only about 1–2 degrees based on my calculations) could offer interesting results with more detailed adjustments. It might be fun to experiment further with these nuances. It’s a shame that I can’t tweak the vertical aspect, though.



So, just as I was about to perform XTC again with this response, a different idea came to mind a few days ago.

The first idea was to more boldly adjust the ratio between direct sound and reflected sound.
The space is quite large, and with the direct sound being so dominant, the reflections are barely audible even in the custom IEM BRIR configuration with its extremely low noise floor. Previously, I had only adjusted the reflections by about +3dB.
However, when I tried boosting the reflections by +9dB or even +15dB or even more, I found that, while this didn’t physically change the distance, it gave a deeper, more immersive effect, allowing me to perceive the space more clearly. It’s a subtle concept, much like the critical distance, but it was something I had overlooked.
Whether you increase the reflections or decrease the direct sound, the result is the same, but personally, I found it easier to control by reducing the direct sound.
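Since the direct sound and the reflections occupy separate time regions here, the ratio change is nothing more than a gain on the first ~40 ms. A minimal sketch (the -9 dB value is just an example):

```python
import numpy as np

def set_direct_ratio(brir, fs, direct_gain_db=-9.0, split_ms=40):
    """Lower the direct sound relative to the reflections; up to
    overall level, this is equivalent to boosting the reflections."""
    split = int(fs * split_ms / 1000)
    out = np.asarray(brir, dtype=float).copy()
    out[:split] *= 10 ** (direct_gain_db / 20)
    return out
```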
So, with these fun adjustments, the possibilities expanded, and now I have a second idea in mind.

1725525308268.png



Generally, the listening setup will resemble the one shown in the attached image. What I envisioned, however, was a type of array.


1725525417776.png



Exactly. While I can’t fully recreate a plane wave, if I simulate it by filling the necessary angles in the front (approximately 0 to 60 degrees) with consistent sound, it would create the perception of a more cohesive sound field. This approach would make the sound feel more like a surface or "plane" rather than a point source.

Of course, since my listening distance is long and the space is large, even a 30-degree stereo image feels like a surface. However, I had a similar experience when I previously manipulated HRTF data to create a comprehensive HRTF set for new reverb experiments. (This is a different topic, but reflections are also influenced by HRTF, so not only do we hear the 30-degree angles in front, but also the reflections from all directions after bouncing off the walls. My goal was to simulate all angles.)

At first, when I only used a few angles, the result felt somewhat incomplete. However, as I started to fill more angles—eventually covering them densely, even vertically—I noticed that the experience transcended simple localization or directional cues. It became less about specific points in space and more about creating a vast, immersive environment. The sound field expanded into something that felt like a large, continuous space, rather than a collection of individual sound sources.

So this time, instead of focusing on reverb, I'm planning to apply this approach to the direct sound region (before 40ms). Based on my previous experience, I anticipate that this will result in a large, unlocalized "surface" of sound, where no single point can be pinpointed.
Additionally, by reducing the ratio of direct sound and perhaps adding a roll-off to the direct sound, I could enhance the sense of depth and distance even further. This approach could create a more immersive and expansive auditory experience.


The downside, however, is that I didn’t record the sound using an array setup like that. Fortunately, beyond a distance of 2 meters, the differences between HRTFs at various distances aren’t significant. Therefore, I expect that if I use the recorded angles as they are, it should be quite similar to how it would sound if I had arranged the array for recording in the first place.

1725525866370.png


And since the early reflections are significantly delayed and this adjustment will only be applied to the direct sound, I'm even more confident that it won’t make a big difference. Aside from air attenuation, there shouldn’t be any major variation, and the distance changes within the array won’t be significant enough to warrant considering attenuation. Without early reflections, the sound would be perceived equally, so this approach should work well.


However, this also seems like it’s going to be quite a laborious task.

0 degrees (-10 vertical)
0 degrees (0 vertical)
0 degrees (+10 vertical)
0 degrees (+20 vertical)

60 degrees

I will need to combine over 50 different responses, with 5-degree increments horizontally and 10-degree increments vertically.
But I am excited because I’m confident that once I integrate everything, it will create an incredibly large, cohesive sound field. And applying XTC to that wave will make it even more interesting.
It feels like after doing all of this, I will have exhausted almost everything that can be done with digital DSP processing and synthesis. There's nothing more to do.
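The combination step itself is conceptually simple, just a normalized sum over the whole angle grid; the laborious part is preparing the 50+ responses. A sketch of the summing, assuming the per-angle HRIRs are already time-aligned and level-matched the way I want (the dict layout is hypothetical):

```python
import numpy as np

def synthesize_front_plane(hrirs):
    """Sum per-angle direct-sound HRIRs into one 'plane-like' front
    response. Grid: azimuth 0..60 deg in 5-deg steps, elevation
    -10..+20 deg in 10-deg steps, i.e. 52 responses."""
    total, count = None, 0
    for az in range(0, 65, 5):
        for el in range(-10, 30, 10):
            ir = np.asarray(hrirs[(az, el)], dtype=float)
            total = ir.copy() if total is None else total + ir
            count += 1
    return total / count   # keep the level comparable to one source
```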
 
1725792814161.png

5-degree intervals from 0 to 60 degrees on the horizontal line; 10-degree intervals from -10 to +20 degrees on the vertical line.
52 responses in total, covering every ITD from the smallest, 20.8 µs, up to 479.2 µs, except mono.

001.png

No smoothing. (The early-reflection regions after 40 ms are all identical, so the space doesn't change at all; I only added the direct-sound HRTFs.)

002.png


1/6 smoothing for easy viewing.


And, as I expected, the sense of localization becomes blurred and the whole front arrives all at once.
It's not the same as a plane wave, but it fits the purpose well: implementing something that arrives at equal pressure across the whole frontal angle range I can perceive. And it is enormous (direct sound only, excluding reflections).

Now... here's what I need to add to this.
How to fill in the rear angles. I wouldn't mind just adding direct sounds, but I don't really want to. And since my ASW/LEV is already established in the early- and late-reflection regions anyway, it will be more recognizable if I add these rear angles at a lower level (or with a little delay, as in the Haas effect).
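The mixing itself would look something like this: blend the rear-angle direct sound in at a lower level and with a small Haas-style delay, so it adds envelopment without pulling localization backwards. A sketch with assumed values:

```python
import numpy as np

def add_rear_haas(front_ir, rear_ir, fs, delay_ms=10.0, level_db=-6.0):
    """Mix a rear-angle direct-sound IR into the front response at a
    lower level and with a short Haas-style delay. delay_ms and
    level_db are illustrative assumptions."""
    d = int(fs * delay_ms / 1000)
    g = 10 ** (level_db / 20)
    out = np.asarray(front_ir, dtype=float).copy()
    n = min(len(rear_ir), len(out) - d)
    out[d:d + n] += g * np.asarray(rear_ir[:n], dtype=float)
    return out
```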

And once I've built all of this really perfectly, I think the only thing left will be XTC.
 
The result of carefully calibrating the response.

1726664442725.png



~40 ms pure direct sound, no smoothing.



1726664486554.png


And because pure responses alone can't give you acoustic fun: the spectrum with room gain and roll-off in a huge space (after ~40 ms).


1726664575267.png


The same response, but the full time range at the ear (~1000 ms).


1726664621662.png


Also, the HRIR spectrum before 40 ms and the BRIR spectrum after 40 ms.


1726664669703.png


And finally, an accurate comparison against the pure direct sound as heard by my ears and body.
 
The seminal writer on the subject of human hearing, Blauert, seemed to suggest two modes for distance perception. The one most people use, because it's easy, is the ratio of direct to reflected sound (the Goon Show characters down the well on the BBC being a memorable example), but that isn't the only one possible; you can accurately discern direction and distance in an anechoic chamber. The other mechanism he cited was the temporal relationship of sounds of different frequencies, since they travel through air at different speeds. To me the low frequencies are just as important for distance reproduction, and if anyone has added a sub and suddenly got the impression of increased closeness/immediacy and scale, this is apparent.

Most of the later work we rely on was done by Elizabeth Wenzel et al. at NASA and was less inspired by the physicality of hearing (Blauert cut up the organs of hearing themselves!) IMO, but it introduced the idea of brute-force HRTF measurement and convolution, which is the basis of most of what followed. Mostly, I think, in an effort to make auditory-augmented displays for military aircraft, though it also led to things like the Gravis UltraSound in PC-land. Some 30 years ago I simulated pinna reflections by mixing the outputs of bucket-brigade delays I had calibrated to approximate reflections at the angle of the speakers, plus some crosstalk to approximate the head shadow, and after considerable work I was greeted with a lovely stereo image... across my forehead. o_O That one went back in the drawer, and I watched later developments arrive: impulse-response measurement of acoustic spaces for more natural reverb simulations, more refined treatments, and a major piece of the puzzle, head tracking.
 
Thank you for the interesting story.
I've tried using artificial reverbs, but it was quite difficult to truly replicate the HRTF across the omnidirectional reflection field, which pours in in countless random ways. So the best approach was to refine and use reflections actually recorded in a large space.
 