I had seen the format of the .WAV file, but it makes little sense to me. I "understand" bits, bytes, and ASCII characters at a basic level.
Overview:
The sound data is numeric. For 16-bit audio the sample values range from +32767 through 0 down to -32768.
Sound waves are rapid changes in air pressure over time. They cause the local air pressure to rise above (compression) and fall below (rarefaction) the undisturbed ambient air pressure (silence).
A microphone mechanically "senses" the pressure variations and outputs a varying voltage, which is an analog of the pressure changes. No gaps or steps or numbers yet. More pressure (compression of the air by the musician) gives a higher positive voltage, less pressure gives less voltage, and pressure below ambient (rarefaction) gives a negative voltage.
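To tie this back to bits and bytes, here is a minimal Python sketch of the 16-bit range. It assumes the usual signed two's complement storage, which has one more value on the negative side than on the positive side, and little-endian byte order as used in common PCM WAV files. The variable names are just for illustration.

```python
import struct

BITS = 16
lowest = -(2 ** (BITS - 1))       # most negative 16-bit sample
highest = 2 ** (BITS - 1) - 1     # most positive 16-bit sample

# One sample occupies two bytes. "<h" means little-endian signed 16-bit.
packed = struct.pack("<h", highest)
unpacked = struct.unpack("<h", packed)[0]

print(lowest, highest, packed, unpacked)
```

The two bytes in `packed` are what one sample looks like inside the file; they are raw binary, not ASCII digits.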
This voltage from the microphone could be fed to an amplifier and drive the speaker and you hear the sound. This would be like a live singer with a mic.
The problem is to store the output of the microphone so you can take it home and listen to it later.
That has been done by plowing a wiggly groove in vinyl, or putting a varying amount of magnetism (more or less) onto a magnetic tape.
Digital:
An Analog to Digital Converter (ADC) takes the voltage from the microphone and, at the CD rate for example, measures it 44,100 times per second.
A stream of numbers (digits) comes out of the ADC, 44,100 of them per second. Each number is the instantaneous voltage of the microphone output, scaled to fit within the 16-bit range; how large the numbers get depends on where the gain knob is set.
It is this stream of numbers that goes onto a CD or into a WAV file. (There are complications as to how the data is "encoded" on the CD, but it still decodes to the stream of numbers the ADC output.)
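As a rough sketch of what the ADC is doing, the Python below "measures" a made-up voltage 44,100 times per second and scales each reading into the 16-bit range. The 440 Hz sine wave stands in for the microphone, and `gain` stands in for the gain knob; both are assumptions for illustration, as is the function name `sample_tone`.

```python
import math

SAMPLE_RATE = 44_100   # measurements per second (CD rate)
FULL_SCALE = 32767     # largest positive 16-bit value

def sample_tone(freq_hz, duration_s, gain=0.5):
    """Return 16-bit integer samples of a sine wave (a stand-in for the mic voltage)."""
    count = round(SAMPLE_RATE * duration_s)
    samples = []
    for i in range(count):
        t = i / SAMPLE_RATE                                    # time of this measurement
        voltage = gain * math.sin(2 * math.pi * freq_hz * t)   # between -gain and +gain
        samples.append(round(voltage * FULL_SCALE))            # scale to an integer
    return samples

samples = sample_tone(440.0, 0.1)
print(len(samples), samples[:5])
```

One tenth of a second at this rate comes out as 4,410 numbers, and turning `gain` up or down changes how much of the 16-bit range they use.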
It is this stream of numbers that is read from a CD or a WAV file and sent on to the Digital to Analog Converter.
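To make the file side concrete, here is a small sketch using Python's standard `wave` module. It writes a handful of made-up sample values into a WAV container held in memory (an `io.BytesIO` buffer instead of a file on disk), then reads them back out. The sample values and variable names are arbitrary choices for the example.

```python
import io
import struct
import wave

samples = [0, 1000, 32767, -32768, -1000]   # made-up 16-bit sample values

# Write: one channel, two bytes per sample, 44,100 samples per second.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(44100)
    w.writeframes(struct.pack("<%dh" % len(samples), *samples))

# Read: the header tells us the rate and the count, the rest is the numbers.
buf.seek(0)
with wave.open(buf, "rb") as r:
    rate = r.getframerate()
    count = r.getnframes()
    raw = r.readframes(count)

decoded = list(struct.unpack("<%dh" % count, raw))
header = buf.getvalue()[:4]   # first four bytes of the file

print(header, rate, decoded)
```

The only ASCII in the file is in the header (it starts with the letters `RIFF`); everything after the header is the stream of binary sample numbers.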
The DAC takes the numbers and turns them back into a voltage that changes over time, scaled again, typically within the range of +/-2V, which can be fed to an amplifier, then fed to a speaker, to produce air pressure changes that propagate to your ear, similar to the air pressure changes that tickled the microphone in the beginning.
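The scaling on the way out can be sketched the same way. This assumes an idealized DAC whose full-scale output is exactly +/-2 V, taken from the typical figure above; real hardware varies, and the function name is hypothetical.

```python
FULL_SCALE_VOLTS = 2.0   # assumed full-scale output, from the typical +/-2V figure

def dac_output_volts(sample):
    """Map a 16-bit sample value to an idealized output voltage."""
    return sample / 32768 * FULL_SCALE_VOLTS

for s in (0, 16384, -32768):
    print(s, dac_output_volts(s))
```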
100 milliseconds of air pressure variation (music), one channel, 4,410 samples
The dots represent the numerical samples that make up the "wave".
The numbers on the left are not the numerical values in the file; the plot is scaled to run from 1 to -1. Multiply by 32768 to get an idea of what the actual sample values were.
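A quick sketch of that conversion in both directions, assuming the common convention of dividing by 32768. The function names are made up for the example, and the clamp is there because +1.0 times 32768 would be one step past the largest 16-bit value.

```python
def to_normalized(sample):
    """16-bit sample value -> plot value between -1 and 1."""
    return sample / 32768

def to_sample(value):
    """Plot value between -1 and 1 -> 16-bit sample value, clamped to the legal range."""
    return max(-32768, min(32767, round(value * 32768)))

print(to_sample(0.5), to_sample(1.0), to_sample(-1.0), to_normalized(-32768))
```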