How to achieve a height sound effect on a desktop 2.1 system without Dolby Atmos? Simulate cathedral-like or concert-hall height.

mel · Jul 16, 2021

mel said:
The argument for a 2.2.2 system over a 5.1.4 or 7.1.4 appears to be strengthened by the following. However, it remains unclear how to improve the sense of height in audio in specific technical terms.

The key points:

The pinnae filter sound in a way that is directionally dependent.

This is particularly useful in determining if a sound comes from above, below, in front, or behind.

Interaural time and level differences (ITD, ILD) play a role in azimuth perception but can't explain vertical localization.

Frequencies above 7 kHz are required for vertical sound resolution.

Frequencies below 1 kHz are particularly important for lateral sound resolution.

Once the brain has analyzed IPD (interaural phase difference), ITD, and ILD, the location of the sound source can be determined with relative accuracy.

The average human has the remarkable ability to locate a sound source with better than 5° accuracy in both azimuth and elevation, even in challenging environments.
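To see why ITD encodes azimuth but not elevation, here is a minimal sketch of the classic Woodworth spherical-head approximation, ITD = (a/c)(θ + sin θ). The 8.75 cm head radius and 343 m/s speed of sound are assumed textbook values, not figures from this thread.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed average head radius (8.75 cm)
SPEED_OF_SOUND = 343.0   # m/s, air at roughly 20 °C

def woodworth_itd(azimuth_deg: float) -> float:
    """Interaural time difference (seconds) for a distant source.

    Woodworth spherical-head approximation: ITD = (a / c) * (theta + sin(theta)),
    with theta the angle from straight ahead toward one ear (0 to 90 degrees).
    """
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth must be between 0 and 90 degrees")
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND) * (theta + math.sin(theta))

for az in (0, 5, 30, 65, 90):
    print(f"{az:2d} deg -> {woodworth_itd(az) * 1e6:6.1f} us")
```

With these assumed values the ITD tops out around 0.66 ms at 90°, and a 5° shift away from straight ahead changes it by only about 45 µs. A source moving straight up the median plane produces no ITD change at all, which is why the pinna filtering above has to carry the vertical cue.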

[Attachment: FR curve of both ears]


Please note the angles are 65 and 104 degrees, not quite 180 degrees apart.
I am unsure whether a super tweeter compensates for HF filter suppression.
I assume the GLM software can easily provide phase inversion and delay characteristics.

[Attachment: Rotating chair for height resolution]

https://en.wikipedia.org/wiki/Dummy_head_recording#Technical

Conventional music recordings are produced for stereo playback, which uses only the left and right channels on speakers and headphones. A dummy head allows the recording artist to make use of three-dimensional sound reproduction. This is because, during playback via headphones, the listener perceives sound as if they were in the position of the dummy. The recording is perceived through the pinnae of the dummy head.

Simulated dummy head recording takes place through digital signal processing (DSP), where the signal is sent through a complex mathematical algorithm that imprints limited HRTF information, creating the binaural effect. This process is called an HRTF-based binaural algorithm.[6]

The dummy head could also be used to imprint positional information on prerecorded sound effects by playing sounds through a loudspeaker in a suitable orientation to the head, for example playing thunder and birdsong above the dummy head.

Through the manipulation of the parameters, sound engineers could take a monophonic recording of a passing car and make it sound as if it were passing behind them in real time. Recording with an actual dummy head for the same outcome would require a recording booth and a moving speaker, or an array of speakers as well as multiple panning or switching devices.
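As a toy version of "manipulating the parameters" of a mono recording, the sketch below pans a mono signal using only ITD (a whole-sample delay to the far ear) and a broadband ILD (a gain cut to the far ear). It is not an HRTF-based algorithm: consistent with the points above, it can move a sound left and right but cannot place it overhead. The 48 kHz sample rate, head radius, and 10 dB maximum ILD are assumed values.

```python
import math

SAMPLE_RATE = 48_000
HEAD_RADIUS_M = 0.0875   # assumed average head radius
SPEED_OF_SOUND = 343.0   # m/s

def pan_binaural(mono, azimuth_deg, max_ild_db=10.0):
    """Crude ITD/ILD-only binaural pan of a mono signal.

    azimuth_deg runs from -90 (hard left) to +90 (hard right).
    The far ear is delayed by the Woodworth ITD, rounded to whole samples,
    and attenuated by an ILD scaled by sin(azimuth). max_ild_db is an assumed
    broadband figure; real ILD depends strongly on frequency.
    Returns (left, right) lists of equal length.
    """
    theta = math.radians(abs(azimuth_deg))
    itd_s = HEAD_RADIUS_M / SPEED_OF_SOUND * (theta + math.sin(theta))
    delay = round(itd_s * SAMPLE_RATE)
    far_gain = 10 ** (-max_ild_db * math.sin(theta) / 20)

    near = list(mono) + [0.0] * delay                  # pad to equal length
    far = [0.0] * delay + [s * far_gain for s in mono]
    if azimuth_deg >= 0:   # source on the right: right ear is the near ear
        return far, near
    return near, far
```

Sweeping `azimuth_deg` over time is the simplest way to make a sound "pass by", though only along the horizontal arc.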

mel · Jul 16, 2021

mel said:
Please note the angles are 65 and 104 degrees, not quite 180 degrees apart.

I am unsure whether a super tweeter compensates for HF filter suppression.

I assume the GLM software can easily provide phase inversion and delay characteristics.

https://en.wikipedia.org/wiki/Binaural_recording

Binaural recording is intended for replay using headphones and will not translate properly over stereo speakers.

Conventional stereo recordings do not factor in natural ear spacing or the "head shadow" of the head and ears, since these things happen naturally as a person listens, generating their own ITDs (interaural time differences) and ILDs (interaural level differences). Because loudspeaker crosstalk with conventional stereo interferes with binaural reproduction (i.e., because the sound from each channel's speaker is heard by both ears rather than only by the ear on the corresponding side, as would be the case with headphones), either headphones are required, or crosstalk cancellation of signals intended for loudspeakers, such as Ambiophonics, is required. For listening using conventional speaker stereo or MP3 players, a pinna-less dummy head, such as the sphere microphone or Ambiophone, may be preferable for quasi-binaural recording. As a general rule, for true binaural results, an audio recording and reproduction system chain, from microphone to the listener's brain, should contain one and only one set of pinnae (preferably the listener's own) and one head shadow.
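The crosstalk cancellation mentioned above can be illustrated with a very simplified recursive canceller in the spirit of Ambiophonics' RACE: each output channel subtracts a delayed, attenuated copy of the opposite output, which is the "phase inversion and delay" idea raised earlier in this thread. The gain and delay values are assumptions for illustration; a real system tunes them to speaker spacing and listening position and usually band-limits the feedback.

```python
def cancel_crosstalk(left, right, gain=0.85, delay=3):
    """Simplified broadband recursive crosstalk cancellation.

    out_L[n] = in_L[n] - gain * out_R[n - delay]
    out_R[n] = in_R[n] - gain * out_L[n - delay]

    Each channel sends an inverted, delayed, attenuated signal to cancel the
    leakage of the opposite speaker into the wrong ear; the recursion also
    cancels the leakage of that cancellation signal. gain < 1 keeps the
    feedback loop stable. delay is in samples (3 samples = 62.5 us at 48 kHz).
    """
    if len(left) != len(right):
        raise ValueError("channels must have equal length")
    if not 0.0 <= gain < 1.0:
        raise ValueError("gain must be in [0, 1)")
    if delay < 1:
        raise ValueError("delay must be at least one sample")
    n = len(left)
    out_l = [0.0] * n
    out_r = [0.0] * n
    for i in range(n):
        fb_l = out_r[i - delay] if i >= delay else 0.0
        fb_r = out_l[i - delay] if i >= delay else 0.0
        out_l[i] = left[i] - gain * fb_l
        out_r[i] = right[i] - gain * fb_r
    return out_l, out_r
```

An impulse in the left channel produces an alternating train of ever-smaller, sign-flipped impulses bouncing between the channels, each one cancelling the previous one's crosstalk.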

Not all cues required for exact localization of the sound sources can be preserved this way, but it also works well for loudspeaker reproduction.

Using space to manipulate a sound and then re-recording it has been done through the use of echo chambers in recording studios for many years. In 1959, an echo chamber was famously used by Irving Townsend during the post-production process of Miles Davis's 1959 album Kind of Blue. "[the effect of the echo chamber on Kind of Blue is] just a bit of sweetening. At 30th Street, a line was run from the mixing console down into a low-ceilinged, concrete basement room - about 12 by 15 feet in size - where we set up a speaker and a good omnidirectional microphone."[3]

Using an MRI scanner, Brüel & Kjær and DTU collected the geometries of a large population of human ears. The full ear canal geometry was captured, including the bony part adjoining the eardrum, and this data was post-processed to determine the average human ear canal geometry. Based on this, the High-frequency Head and Torso Simulator (HATS) Type 5128 creates a very realistic reproduction of the acoustic properties, covering the full audible frequency range (up to 20 kHz).[6]

There are some complications with the playback of binaural recordings through headphones. The sound picked up by a microphone placed in or at the entrance of the ear canal has a frequency spectrum very different from the one a free-standing microphone would pick up. The diffuse-field head-related transfer function (HRTF), that is, the frequency response at the eardrum averaged over sounds coming from all possible directions, is quite grotesque, with peaks and dips exceeding 10 dB. Frequencies from around 2 kHz to 5 kHz in particular are strongly amplified compared with free-field presentation.[8]
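A large part of that 2 to 5 kHz boost comes from the ear canal acting as a quarter-wave resonator: a tube open at the concha and closed at the eardrum, whose first resonance is f = c / (4L). A minimal sketch, assuming a typical adult canal length of about 2.5 cm and 343 m/s:

```python
SPEED_OF_SOUND = 343.0  # m/s

def quarter_wave_resonance(length_m: float) -> float:
    """First resonance (Hz) of a tube open at one end and closed at the other."""
    return SPEED_OF_SOUND / (4.0 * length_m)

# Assumed typical adult ear canal length of about 2.5 cm.
print(round(quarter_wave_resonance(0.025)))  # prints 3430
```

The result lands squarely in the 2 to 5 kHz region described above; a microphone at the eardrum records this resonance, which is one reason binaural recordings need equalization for headphone playback.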

During an interview with Chris Pike from BBC R&D in September 2012, Pike stated that "you may get good spatial impression but timbral coloration is often an issue".[10] Timbral coloration is mentioned in a large amount of spatial enhancement research and is sometimes seen as the outcome of misusing HRTF data or using an insufficient amount of it when reproducing binaural audio, or of the end user simply not responding well to the collected HRTF data. Francis Rumsey states in the 2011 article "Whose head is it anyway?"[11] that "badly implemented HRTFs can give rise to poor timbral quality, poor externalisation, and a host of other unwanted results".[11] Getting the HRTF data right is key to the success of the final product, and making the HRTF data as extensive as possible may leave less room for errors such as timbral issues. The HRTFs used for Private Peaceful[9] were designed by measuring impulse responses in a reverberant room to capture a sense of space, but the result is not very externalised and has obvious timbral issues, as Pike pointed out.[10]

Juha Merimaa of Sennheiser Research Laboratories in California discusses using HRTF filters and EQ to reduce timbral issues in his paper 'Modification of HRTF Filters to Reduce Timbral Effects in Binaural Synthesis, Part 2: Individual HRTFs' (2010).[12] His research found that modifying HRTF filters to reduce timbral issues did not affect the spatial localisation previously achieved with the data when tested on a panel of listeners. This shows that there are ways of reducing timbral issues in audio processed with HRTF data, but it does mean further EQ manipulation of the audio. If this route is explored further, researchers will have to accept that the audio is being heavily manipulated to achieve a greater sense of spatial awareness, and that this further manipulation causes irreversible changes to the audio, something content creators may not be happy with. Consideration must be given to how much manipulation is appropriate and to what extent, if any, it will affect the end user's experience. It is also important to consider the room in which the BRIR (binaural room impulse response) and HRTF data were collected, as different rooms will influence the end results.
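One common way to reduce that kind of timbral coloration is diffuse-field equalization: power-average the HRTF magnitudes over all measured directions and divide every HRTF by that average, so the direction-dependent peaks and notches survive while the coloration shared by all directions is removed. This is a generic technique, not necessarily the method in Merimaa's paper, and weighting every direction equally is an assumption (measured sets are often weighted by solid angle).

```python
import math

def diffuse_field_equalize(hrtf_mag, eps=1e-12):
    """Diffuse-field equalize a set of HRTF magnitude responses.

    hrtf_mag: list of per-direction lists of linear magnitudes, all with the
    same number of frequency bins. Each bin is divided by the RMS (power
    average) across directions, so the equalized set has a flat diffuse-field
    average while relative differences between directions are preserved.
    """
    n_dirs = len(hrtf_mag)
    n_bins = len(hrtf_mag[0])
    diffuse = [
        math.sqrt(sum(row[k] ** 2 for row in hrtf_mag) / n_dirs)
        for k in range(n_bins)
    ]
    return [[row[k] / max(diffuse[k], eps) for k in range(n_bins)]
            for row in hrtf_mag]
```

In practice the magnitudes would come from a measured HRTF set, and the resulting per-bin correction is usually smoothed before use to avoid sharp EQ artifacts.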

When recording a series of HRTF data, only a limited number of measurements can be taken for distribution, and end users will have to find the best results for themselves. Of course, the best HRTF data for any individual would be collected from their own pinnae, which is not something content creators for mobile applications currently do. Because of this, timbral issues may be unavoidable when using non-personal HRTF data or when distributing audio that has already been spatially manipulated. The most feasible route to improving spatial awareness in audio may be to explore head tracking or other methods of collecting HRTF data at the user end.

Many mp3 players and tablets are traditionally supplied with low budget earphones and these can cause problems for spatially enhanced audio.

Ideal listening conditions will most likely be experienced with headphones designed and calibrated to give as flat a frequency response as possible, in order to reduce colouration of the audio the user is listening to.

Microphones (Primo EM172) are placed exactly at the eardrum, with 235 mm taken as the average earlobe-to-earlobe distance. The sigmoid form of the canal in the Kaan makes up for the missing head to a greater extent.

Lastly, the kinds of things that can be recorded do not typically have a high market value. Studio recordings would have little to gain from a binaural setup, beyond natural cross-feed, as the spatial quality of the studio would not be very dynamic or interesting. Recordings that are of interest are live orchestral performances and ambient "environmental" recordings of city sounds, nature, and similar subject matter.

mel · Jul 16, 2021

https://en.wikipedia.org/wiki/Stereophonic_sound#Dolby_Stereo

Some films shot in 35 mm, such as Camelot, featured four-track stereophonic sound and were then "blown up" to 70 mm so that they could be shown on a giant screen with six-track stereophonic sound. Unfortunately, many of these presentations were only pseudo-stereo, using a somewhat artificial six-track panning method. A process known, somewhat derogatorily, as the "Columbia Spread" was often used to synthesize Left Center and Right Center from a combination of Left and Center and of Right and Center, respectively; for effects, the sound could be panned anywhere across the five stage speakers using a one-in/five-out pan pot. Dolby, who did not approve of this practice, which results in loss of separation, instead used the Left Center and Right Center channels for LFE (low-frequency enhancement), using the bass units of the otherwise redundant intermediate front speakers, and later the unused HF capacity of these channels to provide stereo surround in place of mono surround.

Dolby Stereo was succeeded in the cinema by Dolby Digital 5.1, which retained the Dolby Stereo 70 mm 5.1 channel layout, and more recently, with the introduction of digital cinema, by Dolby Surround 7.1 and Dolby Atmos in 2010 and 2012, respectively.

mel · Jul 16, 2021

https://en.wikipedia.org/wiki/Chesky_Records

Chesky Records also offers binaural recordings, which seek to replicate 3-D stereo sound so that the recording sounds as if the listener is in the same room as the musicians.[7] They capture this sound using dummy head recording.[7] For its recordings, Chesky Records uses acoustically vibrant spaces, including the Hirsch Center in Greenpoint, Brooklyn, and St. Paul the Apostle Church in Manhattan.[7]

mel · Jul 16, 2021

mel said:
Try this simple experiment to notice how diffuse or concentrated the directionality of your 2.1 or larger speaker configuration can be.

Rotate 90 degrees to the left and to the right, sweeping through the speakers, and notice the differences.

Lateral sound resolution: below 1 kHz

Vertical sound resolution: above 7 kHz
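The 1 kHz / 7 kHz split follows from wavelength. Below roughly 1 kHz the wavelength is large compared with the head, so phase and time cues between the ears are unambiguous; above roughly 7 kHz it shrinks to the scale of the pinna folds, which can then filter it directionally. A quick check, assuming 343 m/s, a 17.5 cm head width, and a 6 cm pinna height (typical values, not measurements from this thread):

```python
SPEED_OF_SOUND = 343.0   # m/s
HEAD_WIDTH_M = 0.175     # assumed typical head width
PINNA_HEIGHT_M = 0.06    # assumed typical pinna height

def wavelength_m(freq_hz: float) -> float:
    """Wavelength in metres of a sound wave in air."""
    return SPEED_OF_SOUND / freq_hz

for f in (500, 1_000, 7_000, 12_000):
    lam = wavelength_m(f)
    print(f"{f:>6} Hz: wavelength {lam * 100:5.1f} cm, "
          f"{lam / HEAD_WIDTH_M:4.1f}x head width, "
          f"{lam / PINNA_HEIGHT_M:4.1f}x pinna height")
```

At 1 kHz the wavelength (about 34 cm) is roughly twice the head width, while at 7 kHz (about 4.9 cm) it is already smaller than the pinna.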

The 40 dots roughly represent ten degree intervals.

My R channel is two horizontal-dispersion speakers (AudioEngine HD3), 32" away.

The L channel is a 24" Soundcore soundbar, standing upright for vertical dispersion.

On The Doors' "Riders on the Storm" (Dolby Atmos), it seems to me that the kick drums and bass in the R channel can be dialed in from very concentrated to diffuse as my right ear points more directly toward or away from the speaker.

My guess is that the strongest directionality is within a five-to-ten-degree range.

The same seems to be generally true of the organ in the L channel.

A lot seems to pan in and out of focus around the center, especially special effects like thunder and rain.

I see full 7.1 channel control as a big benefit over sound coloration.

With my eyes closed, rotating from one side to the other feels like walking from one side of the stage to the other. Dexter Gordon, "Cheese Cake":

https://music.apple.com/us/album/cheese-cake/1459439436?i=1459439440

Saxophone in L channel

Drums in R channel.


The vertical soundbar has a much more pleasing "smooth sound gradient", which I rate at Better (+2). I rate the horizontal dispersion of the HD3 speakers as OK (0) for sound effects like thunder. Some might find the vertical sound bar too subtle or unintelligible.

I don't know what degree of naturalness The Doors were trying to achieve with sound effects.

Heavy downpours usually come with thunderclaps that crack sharply from nearly directly overhead.

The sound is frightening in Colorado.

It can reach 120 dB at 100 Hz.

It seems they used rain and thunder from different sources.

I think this song ranges from great (at the beginning) to poor (towards the end) examples of natural sounds.

It seems they were having fun with the sound effects at the end.

I evaluate music as layers of triads:

Power
Color
Pitch

Stagesound Height
Stagesound Width
Stagesound Depth

Rhythm
Melody
Harmony

The -3 to +3 scale is:
-3 Worst
-2 Worse
-1 Bad
0 OK
+1 Good
+2 Better
+3 Best

A related topic is sound diffusion, or Genelec's "directivity" control in its waveguide technology.

Color gradients are well understood: unlike with sound, the diffusion in a color gradient is plainly visible.

The first picture is a radial gradient.

The second picture is a conical gradient, representing hearing localization.

https://en.wikipedia.org/wiki/Color_gradient

The tonal quality of the HD3 (horizontal dispersion) is much higher than that of the vertical soundbar. The balance between smooth dispersion and rich tonal quality may be an implicit compromise that we often overlook. Sound qualities must be prioritized to reach the best compromise.

[Figure: Refraction on an aperture, Huygens–Fresnel principle]
Sound from an array spreads less than sound from a point source, by the Huygens–Fresnel principle applied to diffraction.
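That statement can be made concrete with the textbook array factor for N identical, in-phase point sources spaced d apart in a line: |AF(θ)| = |sin(N·k·d·sinθ/2) / (N·sin(k·d·sinθ/2))|, with k = 2π/λ. The sketch compares a single source with an 8-element array; the element count, 3 cm spacing, and 4 kHz test frequency are assumptions for illustration, not the Soundcore bar's actual layout.

```python
import math

def array_factor(theta_deg, n_elements, spacing_m, freq_hz, c=343.0):
    """Normalized far-field magnitude of N equal, in-phase point sources in a line.

    theta_deg is measured from the array's broadside (on-axis) direction,
    in the plane containing the array. Returns 1.0 on axis.
    """
    k = 2 * math.pi * freq_hz / c
    psi = k * spacing_m * math.sin(math.radians(theta_deg))
    denom = n_elements * math.sin(psi / 2)
    if abs(denom) < 1e-12:   # on axis (or a grating lobe): contributions add fully
        return 1.0
    return abs(math.sin(n_elements * psi / 2) / denom)

for angle in (0, 10, 20, 30, 45):
    single = array_factor(angle, 1, 0.03, 4_000)
    eight = array_factor(angle, 8, 0.03, 4_000)
    print(f"{angle:2d} deg: single {single:.2f}, 8-element {eight:.2f}")
```

The single source stays at 1.0 at every angle, while the 8-element line falls off steeply within about 20° of its axis in the plane along which the elements are arranged.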


Cost (on the same -3 to +3 scale):
-3 -
-2 - $10,000
-1 $7500
0 $5000
+1 $2500
+2 $1000
+3 $500
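The cost scale can be read as a simple lookup. A minimal sketch that rates a budget at the cheapest rung it does not exceed; treating anything above $10,000 as -3 is an assumption, since the -3 rung has no amount listed.

```python
# (maximum cost in USD, score), from the cheapest rung to the most expensive
COST_SCALE = [(500, 3), (1_000, 2), (2_500, 1), (5_000, 0), (7_500, -1), (10_000, -2)]

def cost_score(cost_usd: float) -> int:
    """Map a system cost to the -3..+3 scale.

    Assumption: anything above $10,000 scores -3.
    """
    for max_cost, score in COST_SCALE:
        if cost_usd <= max_cost:
            return score
    return -3

print(cost_score(2_500), cost_score(5_000), cost_score(7_500))  # prints 1 0 -1
```

These three values match the ratings given for the $2500 2.1 system, the $5000 combination, and the $7500 5.1 system.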

In a previous post, I mentioned a noticeable compromise between 3D resolution and tonal qualities. I now know this has been a historical impediment to widespread 3D distribution and acceptance. I have plenty of experience compromising tonal for spatial qualities using my RME ADI-2 DAC. I know how to achieve this intuitively, but have never explicitly stated it as numeric parametric EQ values.

Precisely quantifying the spatial/tonal compromise with respect to cost requires careful consideration.
- Dolby Atmos systems are very expensive today.
- I have already constrained myself on cost for Genelec GLM/SAM functionality
  - 2.1 - $2500
    - +1, on -3 to +3 scale
    - Would allow Dolby Atmos sound processor
      - Dolby up mixer for height effects
      - XLR outputs
      - Together, $5000 would be OK (0, on -3 to +3 scale)
  - 5.1 - $7500
    - -1, on -3 to +3 scale
    - Must compromise by giving up the Dolby Atmos sound processor.
  - I need a good pair of headphones to plug into M1 MacBook Pro to judge Atmos benefits
    - If Apple AirPods Max Pro preclude buying any audio equipment
      - Best (+3, on -3 to +3 scale) at $500
      - Bluetooth is a significant compromise
        
        Bad (-1, on a -3 to +3 scale)
      - Equalizer
        
        On the Mac, Music has ten EQ bands; probably OK.
        
        Not parametric, so not Good.
        
        Uncertain about iPhone Music Accommodations
      - Very few Dolby Atmos-specific headphones today.
      - I am a music butterfly. I only listen closely to music on rare occasions.
    - I am not fond of headphones or IEMs.
- The benefit of Dolby Atmos mastering is Best (+3)
  - The recordings are a significant improvement over conventional recordings for 2.1.2 or 5.1.2 systems.

Music compromises
- I have become adept at coloring music or "sound reinforcement" with my RME ADI-2 DAC.
  - As if I were mastering a recording with parametric EQ
    - amplitude, center frequency and bandwidth.
  - I assume Genelec GLM software contains equivalent tools, if not much more elaborate.
  - I deliberately used the RME DAC to enhance spatial qualities, which requires considerable skill.
  - The 1kHz frequency band on my HD3s has always annoyed me.
    - I "domed" or inversely "muted" the "dished sound" in that frequency band to compensate for what I recognized as a deficiency.
      - Adjusting the EQ Q-factor seemed most significant.
  - I had not realized the importance of the 7kHz and higher frequency band.
- I tend to listen to lossless acoustic (jazz, classical) music:
  - In the "ultra near-field" (32 inches or 812mm) at my desk from a Mac M1 laptop
    - Using an RME ADI-2 DAC via a USB-C connection to bypass the Mac's built-in audio hardware.
  - On active, micro speakers (AudioEngine HD3)
    - Technically a "powered" rather than true "active" speaker like Genelec
  - At conversation volumes (e.g., 60 dB)
  - More often casually in the background than with exclusive foreground attention
    - "butterfly listening"
- My AV receiver has Dolby Digital Plus, Dolby True HD, Dolby Pro Logic IIx
  - I could connect to the Genelec SAM subwoofer only via analog XLR from my two-channel RME DAC, so I cannot use my Dolby decoder.
  - I would only use 5.1.0 (omit RS and LS) Dolby
    - Dolby Atmos receiver and Atmos-enabled speakers are clearly not worth the benefit that I can obtain otherwise.
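The parametric EQ described above (amplitude, center frequency, and bandwidth/Q) is usually implemented as a peaking biquad filter. A minimal sketch using the widely published Audio EQ Cookbook formulas by Robert Bristow-Johnson; it is a generic implementation, not the RME ADI-2 or Genelec GLM internals, and the +3 dB "dome" at 1 kHz with Q = 1.4 is a hypothetical setting, not the author's actual values.

```python
import cmath
import math

def peaking_eq(f0_hz, gain_db, q, fs_hz=48_000):
    """Peaking-EQ biquad coefficients (RBJ Audio EQ Cookbook).

    Returns normalized (b0, b1, b2, a1, a2), i.e. with a0 = 1.
    """
    a = 10 ** (gain_db / 40)                 # amplitude (square root of linear gain)
    w0 = 2 * math.pi * f0_hz / fs_hz         # center frequency in radians/sample
    alpha = math.sin(w0) / (2 * q)           # bandwidth term from Q
    cos_w0 = math.cos(w0)
    a0 = 1 + alpha / a
    b0 = (1 + alpha * a) / a0
    b1 = (-2 * cos_w0) / a0
    b2 = (1 - alpha * a) / a0
    a1 = (-2 * cos_w0) / a0
    a2 = (1 - alpha / a) / a0
    return b0, b1, b2, a1, a2

def magnitude_db(coeffs, f_hz, fs_hz=48_000):
    """Magnitude response of the biquad at f_hz, in dB."""
    b0, b1, b2, a1, a2 = coeffs
    z1 = cmath.exp(-1j * 2 * math.pi * f_hz / fs_hz)   # z^-1 on the unit circle
    h = (b0 + b1 * z1 + b2 * z1 ** 2) / (1 + a1 * z1 + a2 * z1 ** 2)
    return 20 * math.log10(abs(h))

dome = peaking_eq(1_000, 3.0, 1.4)   # hypothetical +3 dB "dome" at 1 kHz
for f in (250, 700, 1_000, 1_400, 4_000):
    print(f"{f:>5} Hz: {magnitude_db(dome, f):+.2f} dB")
```

The response peaks at exactly the requested gain at f0 and returns to 0 dB at DC; raising Q narrows the affected band, which fits the observation above that the Q-factor mattered most.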

https://en.wikipedia.org/wiki/Pentagon

https://en.wikipedia.org/wiki/Binaural_recording

AdamG · Jul 16, 2021

This is not a conversation; this is some kind of monologue with oneself, for what purpose I do not know. OP, please send me a PM with a little more explanation of what this thread is really about, and possibly I will reopen it. For now, the thread is closed.
