Audio quality: lossless versus mp3, sample rates, and bits

My main music source is now ripped files on my Linux server using Squeezebox as a server, and the Squeezebox Transporter as a high-end digital-to-analog converter. When I started this about six years ago, I did a test between lossless and the highest-quality (320 kbps) mp3 files and decided that I had trouble identifying any difference between them. Hard disks were still expensive, and I decided that the disk space savings would tip the choice in favor of ripping to mp3 format. I now deeply regret this decision.

When I updated my loudspeakers, I was disappointed in the sound I was hearing. Whenever the music became loud and congested (many instruments playing at once), everything merged into one blur. Even worse, the sound was "grainy." The visual analog to this is the pixelation that occurs when you blow up a digital image. It is easy to hear on strings; instead of a smooth sound, it sounds raspy. Of course, what you hear depends upon your music source. Lots of Rock, Heavy Metal, etc. music is already distorted, and because the music comes out of loudspeakers when it is recorded, you are hearing the result of an imperfect chain. The iTunes store's best format is just 256 kbps and is of low quality.

But some music stands up well to compression: things like solo guitars or pianos, or some chamber music. However, most symphonic classical music sounds inferior to the original CD if it is compressed. Not only do you hear the artifacts I discussed above, but there is also a general loss of spatial information and localization. Haziness descends over everything.

I have a huge CD collection, and have almost no room left for any more! For the last few years, I have been ripping any new CDs in lossless format (flac or m4a), and have embarked on the Sisyphean task of re-ripping all of the symphonic CDs in my collection. But lately, I have been mostly downloading new music from eClassical.com. I highly recommend that classical music lovers check the site every day for the daily deal from the BIS catalog. They charge by the minute for music, and the daily deal is half price. BIS is the perfect choice for me because of their eclectic and broad content. Almost always the item on the daily deal is something I have not heard, and for about $5 it is cheap enough to give these a try. eClassical's albums come in lossless 16-bit flac with no DRM. You can also download a free mp3 version for your portable devices. In addition, more recent albums come in 24-bit flac for about 50% more money.

Here is what you need to know about digital formats and whether it is worth paying extra for 24-bits.

Sample rates

Music on a normal CD is 2-channel signed 16-bit Linear PCM sampled at 44,100 Hz. Let's discuss the sample rate first.

It is hard to believe, but sampling a continuous function (taking its height, for example) n times a second can totally reproduce the signal subject to certain conditions. Wikipedia has a pretty good explanation, which I steal here. You can represent a band-limited signal by its spectrum by taking its Fourier Transform. If you plot the magnitude versus frequency, you get something like

Band-limited signal

Note that there are always positive and negative frequency components, which combine to make a sine wave, for example. In this case, the frequency range is limited to -B < f < B.

When you sample in the time domain, it convolves this spectrum with impulses at multiples of the sample frequency (+/- f_s, +/- 2*f_s, and so on). In layman's terms, this replicates the above spectrum around each multiple of the sample frequency:

sampled spectrum

The green spectra are added in addition to the blue signal. You can see that the three identical spectra could each reproduce the original signal. But what is usually done is to introduce a very sharp cutoff ("brick wall") filter H(f) which recovers just the original spectrum. Notice that the sample frequency f_s must be greater than 2*B or trouble will ensue:

Aliased spectrum

In the top picture, the green spectra clobber the tails of the blue spectra, which is called aliasing. This causes noticeable, severe distortion. So, the sampling rate must be more than twice the highest frequency component of the signal. This is called the Nyquist Criterion. For a CD, this means that the highest possible frequency that can be present is about 22 kHz (half of 44.1 kHz). No one I know (other than dogs) can hear this high.
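To make aliasing concrete, here is a minimal sketch in Python (standard library only). It assumes unit-amplitude cosine test tones, and the helper names `sample_cosine` and `alias_frequency` are my own inventions for illustration. It shows that a 30 kHz tone sampled at the CD rate produces the same samples as a 14.1 kHz tone, so once sampled the two cannot be told apart.

```python
import math

FS = 44_100  # CD sample rate in Hz

def sample_cosine(freq_hz, n_samples, fs=FS):
    """Return n_samples of a unit-amplitude cosine at freq_hz, sampled at fs."""
    return [math.cos(2 * math.pi * freq_hz * n / fs) for n in range(n_samples)]

def alias_frequency(freq_hz, fs=FS):
    """Frequency in [0, fs/2] that freq_hz is indistinguishable from after sampling."""
    folded = freq_hz % fs
    return min(folded, fs - folded)

# 30 kHz is above the 22.05 kHz limit, so it folds down to 44.1 - 30 = 14.1 kHz.
too_high = sample_cosine(30_000, 64)
alias = sample_cosine(alias_frequency(30_000), 64)

# The two sample sequences agree to within floating-point rounding.
max_difference = max(abs(a - b) for a, b in zip(too_high, alias))
print(alias_frequency(30_000), max_difference)
```

I used cosines because they fold over without a sign change; a sine above the limit would come back inverted, but at the same aliased frequency.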

But the requirement to construct a sharp-cutoff filter H(f) means that the phase response of the filter near the cutoff will oscillate, also causing some distortion. Backing off from the requirement of a sharp cutoff allows filter designers to have both good amplitude and phase responses, but at the expense of the higher frequencies. So effectively, the CD frequency response might be limited to 17 or 18 kHz. I can no longer hear that high. Another way around this problem is to oversample. If you take twice as many samples, the green spectra move out to 2*f_s, which allows room for much gentler filters to be used. But the underlying audio quality is still the same.

Sample bits

A CD represents the samples using 16 bits. This means that the sample height can only be one of 2¹⁶ = 65536 values, half of which are negative. This inherently causes some distortion if the actual sample height falls between the 65536 quantized levels, and it was especially noticeable in early CDs. This can be ameliorated to some extent by what is called dithering: a higher bit depth is used until the final step in the mastering process, and then each sample is randomly rounded up or down so that, on average, it is at the exact sample height. It works nicely for pure sine waves and at small signal levels.
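As a rough illustration of the quantization error, the sketch below rounds a 1 kHz sine wave to the nearest 16-bit level and measures the worst error. It assumes samples are scaled to the range -1.0 to 1.0, and `quantize` is a hypothetical helper of mine, not anything taken from a mastering tool. The error never exceeds half of one quantization step.

```python
import math

def quantize(x, bits=16):
    """Round x in [-1.0, 1.0) to the nearest signed level of the given bit depth."""
    scale = 2 ** (bits - 1)                      # 32768 for 16 bits
    level = round(x * scale)                     # nearest integer level
    level = max(-scale, min(scale - 1, level))   # clamp to the signed range
    return level / scale

step = 1 / 2 ** 15  # spacing between adjacent 16-bit levels on the -1..1 scale

# 10 ms of a 1 kHz sine at 30% of full scale, sampled at the CD rate.
samples = [0.3 * math.sin(2 * math.pi * 1000 * n / 44_100) for n in range(441)]

worst_error = max(abs(s - quantize(s)) for s in samples)
print(worst_error <= step / 2)
```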

Audio DVDs, for example, can use 24 bits sampled at up to 192 kHz. This high sampling frequency allows a very high frequency response (why is this needed? can we tell?) and the ability to use very gentle filters. The 24 bits give 2²⁴ = 16,777,216 different sample heights, which reduces sample height distortion to totally negligible levels and also increases the dynamic range above the 96 dB of a CD: 24-bit music can have a dynamic range of 144 dB. However, it is very difficult to listen to very high dynamic-range music if there is any background noise, or if the neighbors complain. Note that decibels are on a logarithmic scale: 3 dB is a factor of two in power, while a doubling of perceived loudness takes roughly 10 dB.
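The 96 dB and 144 dB figures come from the ratio between the largest and smallest representable sample heights, expressed in decibels as 20*log10(2^bits). Here is a small sketch of that arithmetic. The function names are mine, and it ignores refinements such as dither noise.

```python
import math

def dynamic_range_db(bits):
    """Ratio of full scale to one quantization step for `bits`-deep samples, in dB."""
    return 20 * math.log10(2 ** bits)

def power_ratio(db):
    """Power ratio corresponding to a level difference in decibels."""
    return 10 ** (db / 10)

cd_range = dynamic_range_db(16)      # about 96 dB
hires_range = dynamic_range_db(24)   # about 144 dB
three_db = power_ratio(3)            # close to 2
print(round(cd_range, 1), round(hires_range, 1), round(three_db, 2))
```

Each extra bit adds about 6 dB, so the 8 extra bits account for the 48 dB gap between the two formats.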

The downside (aside from cost) to getting 24-bit music is that even at a 44.1 kHz sample rate, an album can take up 1 GB of disk space.
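You can estimate the disk space from the raw PCM data rate. The sketch below assumes a 70-minute stereo album (my own round number) and computes the uncompressed size. Flac files will be somewhat smaller, by an amount that depends on the music, so treat these as upper bounds.

```python
def pcm_bytes(minutes, sample_rate_hz=44_100, bits=24, channels=2):
    """Uncompressed PCM size in bytes for an album of the given length."""
    seconds = minutes * 60
    return seconds * sample_rate_hz * channels * bits // 8

album_24bit = pcm_bytes(70)            # 24-bit, 44.1 kHz, stereo
album_16bit = pcm_bytes(70, bits=16)   # same album at CD resolution

print(album_24bit / 1e9, "GB at 24 bits")
print(album_16bit / 1e9, "GB at 16 bits")
```

The 24-bit version is exactly 1.5 times the size of the 16-bit one, since the only thing that changes is the number of bits per sample.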

Is a CD good enough?

Before evaluating your Hi Fi system and/or your music sources, you really need to attend a live classical music concert to get calibrated. Last night I attended a concert put on by the Suchoñ Woodwind Quintet with Michiko Otaki playing the piano. This was sponsored by the Oak Ridge Civic Music Association, and I maintain their Web site. It was a stupendous concert, but attendance was severely limited by the snow, slush, ice, and floods on our streets. I had a front-row center seat. To really listen to music, I advise closing your eyes and concentrating on the sound. They played the Poulenc Sextet, one of my favorite pieces, and it was the best performance of it that I have heard, and the first time I heard it live. No matter how loudly they played, every instrument had a separate sound space, i.e., you could follow just that instrument among the other five players. There was no distortion or granularity. Poulenc often breaks the melody up, tossing it from the horn to the bassoon, to the clarinet, to the oboe, and finally to the flute in ascending notes. With your eyes closed, you can hear this, but it takes a few seconds to realize that there is a seamless transition.

In an orchestral concert, again the sound should never be muddled and the string tone should be smooth. But when I close my eyes, I do not hear a soundstage where things are highly delineated because most of the orchestral sections have multiple members. I get a feeling of weightiness and heft to the sound (when appropriate).

For my home system I am talking about lossless flac or m4a rips played via my Transporter, Pioneer SX-1980, and Magneplanar MG-3.7i speakers and my SVS subwoofer. Whether the sound is "good enough" is a hard question because there are good CDs and bad CDs. It is hard to tell whether the recording was bad, or if it is a CD resolution issue. To try and scope the problem, I listened to the start of a bunch of different versions of Wagner's Das Rheingold and Die Walküre. The live recordings from Bayreuth (Karl Böhm) are definitely not great recordings. Of course the orchestra is hidden behind a curtain, which may cause some of this, and the audience is coughing. Marek Janowski's recording has much better sound. Only when the Rhinemaidens sing at once do I get the feeling that I am hearing granulation. There are some good CDs (ripped losslessly). They seem to be the newer ones, and I think they must be using super-bit mastering, which adds tailored noise to get more "effective" bits. For example, the recordings of Bach's Orchestral Suites and Brandenburg Concertos conducted by Helmut Rilling with the Oregon Bach Festival Chamber Orchestra sound wonderful; so do the remastered recordings of the Mendelssohn symphonies by Wolfgang Sawallisch and the New Philharmonia Orchestra. So the CD format does seem to be viable if done properly.

If you buy the 24-bit version, eClassical also gives you the 16-bit and mp3 versions. I did some critical listening to a recording of Rimsky-Korsakov's Symphony No. 1 conducted by Kees Bakels with the Malaysian Philharmonic Orchestra. The 16-bit version did not sound bad. There were no winceable moments, but it did not sound like a live symphony orchestra. The 24-bit version seemed to remove a layer of haze and made it sound as if the players were better. In the loud parts, the 24-bits were able to maintain the separation of the instruments. It sounded much more like what I hear in person.

From now on, I will never use mp3 on my home system (it is great for mobile use), and when possible, I will buy 24-bit recordings.

Comments

How does super-bit mastering work

If the height of a sample in a 16-bit CD represents the actual sample height (made with more than 16 bits), what happens if the real sample height falls, say, half-way between the quantized 16-bit sample heights? Whether you round up or round down, you create an error of half the distance between the levels.

But if the samples kept repeating over and over at the same amplitude, you could round up half the time and round down half the time. This would sort of double the number of amplitudes that could be represented (another bit). If the actual height was 3/4 of the way between the quantized heights, you could round up 3/4 of the time and round down 1/4 of the time, creating another bit of resolution.

Unfortunately, this does not quite work for a few reasons. The decision about which way to round must be selected randomly from a distribution that yields the same sample probabilities. And also, the same actual sample does not usually occur in a repeated fashion. Nonetheless, CD manufacturers claim that they can get an "effective" 19 to 20 bits of resolution.
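Here is a toy sketch of the random rounding idea. It is only an illustration under my own assumptions (a constant sample height of 1000.75 and a made-up helper called `random_round`), not how any manufacturer actually does it. Each sample is rounded up with a probability equal to its fractional part, so the average of many rounded samples lands close to the true height.

```python
import math
import random

def random_round(x, rng):
    """Round x to an adjacent integer, rounding up with probability equal to its fractional part."""
    low = math.floor(x)
    return low + (1 if rng.random() < x - low else 0)

rng = random.Random(0)   # fixed seed so the run is repeatable
true_height = 1000.75    # falls 3/4 of the way between two quantized levels

rounded = [random_round(true_height, rng) for _ in range(100_000)]
average = sum(rounded) / len(rounded)

# Every stored value is a whole level, yet the average recovers the fraction.
print(sorted(set(rounded)), average)
```

Real music never holds still like this, which is why the extra resolution is only "effective" and comes at the cost of a little added noise.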