
Why is 44.1kHz still the standard for music?

Interesting, thanks a lot!

I think I understand better now what you're doing. I have to admit that the last time I looped single wave cycles was on my Ensoniq Mirage, back in the '80s. That might well be why I never ran into this issue, since I was working on 8-bit samples by editing parameter values in hexadecimal. 8-)

I just saw this from almost a year ago.

Re: the post just above this one that Dietz is talking about - if I understand right, if you're not going to xfade the loop (which I suspect you're going to do anyway), you're going to loop at a zero crossing. Ergo it shouldn't matter what the SR is in that case.

Or are you saying you can use the end of the recording at a more exact loop point at a higher SR? I'd have to hear an example of that making a difference, but I still don't understand how it would - because a sine wave below Nyquist is a sine wave below Nyquist.

Someone mansplain me what I'm missing.
 
With well-designed modern converters, 44.1 kHz objectively provides a more accurate reproduction of the source material than 96 kHz does. With only about one millisecond of extra latency compared to a 96 kHz sample rate, decent modern converters working at a 44.1 kHz sample rate can reduce aliasing-induced distortion to below -100 dB at 20 kHz, whereas the ultrasonic frequencies present in material sampled at 96 kHz produce intermodulation distortion throughout the audible range which can easily be as high as -30 dB.

Here is a detailed illustration of why that is the case: https://www.gearslutz.com/board/mas...ot-high-resolution-quot-audio-processing.html

It should be obvious that -100 dB of inharmonic distortion limited to frequencies that most people can't even hear would be preferable to inharmonic distortion throughout the audible range at amplitudes that can be as much as 70 dB higher. Why then does 96 kHz not sound terrible compared to 44.1 kHz? Keep in mind, I said 44.1 kHz is objectively more accurate than 96 kHz (within the audible range). People may nevertheless prefer the sound of the extra noise floor produced across the audible spectrum by 96 kHz. The more likely explanation is that listeners can't actually hear the differences, despite the fact that said differences are objectively measurable. That's because ultrasonic content tends to be much lower in amplitude than the content within the audible spectrum, so the intermodulation distortion produced by ultrasonic frequencies is largely masked by the intermodulation distortion produced by frequencies within the audible range.

In a 2010 AES study on sample rate discrimination, expert listeners in blind tests could not discriminate between material recorded at 88.2 kHz and 44.1 kHz in most cases. When they could consistently discriminate with some accuracy, they believed 44.1 kHz to be the higher sample rate, presumably because they thought it sounded better.

The moral of the story is that there isn't much to worry about either way. There are reasons to use high sample rates if you're pitching down, if you need a millisecond less latency, or if you're using non-linear plugins that lack good anti-aliasing filters and/or oversampling. In that case, you may benefit from using an ultrasonic lowpass filter before and after every nonlinear process. In my opinion, considering the tradeoffs and the current state of technology, 44.1 kHz and 48 kHz are superior to higher sample rates in most circumstances, but it doesn't really make much of a difference to what people can actually hear. So it's probably best to just work at whatever rate you need to deliver at.
 
With only about one millisecond of extra latency compared to a 96 kHz sample rate, decent modern converters working at a 44.1 kHz sample rate can reduce aliasing-induced distortion to below -100 dB at 20 kHz, whereas the ultrasonic frequencies present in material sampled at 96 kHz produce intermodulation distortion throughout the audible range which can easily be as high as -30 dB.

That's really interesting. So essentially I might be preferring the 96 kHz playback on my particular project even though it's actually more distorted, in a similar way to how some people prefer the sound of vinyl to digital?

Alternatively maybe one of my plugins does not work well (re oversampling) at 44.1. This particular project is very simple, containing a handful of sample libraries (all recorded at 44.1) plus the following FX:

NI Replika delay
Logic Graphic Equalizer (from the Vintage EQs collection)
Logic Channel EQ
Logic Chromaverb reverb
Waves Puigchild 670
Waves Abbey Road Mastering Chain
Boz Digital Labs The Wall limiter

Is one of those likely to be the "culprit"?
 
That's really interesting. So essentially I might be preferring the 96 kHz playback on my particular project even though it's actually more distorted, in a similar way to how some people prefer the sound of vinyl to digital?
Yeah, it could definitely be that, and you wouldn't be alone. If there was already recorded audio that had to be downsampled to match the new rate, there could have been some degradation, although Logic X's sample rate conversion is really good (earlier versions of Logic weren't, nor were many other DAWs back then). Downsampling has the potential to cause more artifacts than simply recording at the lower sample rate in the first place, but with more recent software the process is usually pretty transparent.
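For anyone curious what that downsampling step actually involves, here's a minimal sketch in Python using scipy - the 147/320 ratio is exact for 96 kHz to 44.1 kHz, and the noise signal is just a placeholder for real audio:

```python
# Minimal sketch of rational-ratio downsampling (96 kHz -> 44.1 kHz),
# the same kind of operation a DAW performs when converting a file to
# the project rate. The input here is just placeholder noise.
import numpy as np
from scipy.signal import resample_poly

fs_in, fs_out = 96_000, 44_100      # 44100/96000 reduces to 147/320
x = np.random.randn(fs_in)          # stand-in for one second of audio

# resample_poly applies a polyphase anti-aliasing filter before
# decimation, which is what keeps modern SRC largely transparent.
y = resample_poly(x, up=147, down=320)

print(len(x), "->", len(y))         # 96000 -> 44100 samples
```

The quality differences between converters come down to the design of that anti-aliasing filter, which is why older SRC implementations could audibly degrade the signal.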

Alternatively maybe one of my plugins does not work well (re oversampling) at 44.1. This particular project is very simple, containing a handful of sample libraries (all recorded at 44.1) plus the following FX:

NI Replika delay
Logic Graphic Equalizer (from the Vintage EQs collection)
Logic Channel EQ
Logic Chromaverb reverb
Waves Puigchild 670
Waves Abbey Road Mastering Chain
Boz Digital Labs The Wall limiter

Is one of those likely to be the "culprit"?
I don't know about the Logic plugins, but the non-linear Waves plugins I've tested have a bit of aliasing that improves at a higher sample rate. Depending on the material and settings, it may or may not be audible in any particular situation.

The website tells me that limiter has optional 8x oversampling, so it would be easy to see if it improves by engaging it.
 
Yeah, it could definitely be that, and you wouldn't be alone. If there was already recorded audio that had to be downsampled to match the new rate, there could have been some degradation, although Logic X's sample rate conversion is really good (earlier versions of Logic weren't, nor were many other DAWs back then). Downsampling has the potential to cause more artifacts than simply recording at the lower sample rate in the first place, but with more recent software the process is usually pretty transparent.

I don't know about the Logic plugins, but the non-linear Waves plugins I've tested have a bit of aliasing that improves at a higher sample rate. Depending on the material and settings, it may or may not be audible in any particular situation.

The website tells me that limiter has optional 8x oversampling, so it would be easy to see if it improves by engaging it.
Yep, I already have the oversampling engaged on the limiter, so maybe it's the Waves plugins!
 
the ultrasonic frequencies present in material sampled at 96 kHz produce intermodulation distortion throughout the audible range which can easily be as high as -30 dB.

Shouldn't those freqs - including difference signals - get filtered out before they're recorded?

Don't get me wrong, 96kHz can kiss my ass, to quote a friend who's a forum member here. But I don't quite understand how intermodulation products generated way up there would make it into the human range.

This is Socratic questioning - I'm not saying I'm right you're wrong nah nah nah, I'm trying to understand the argument.
 
Shouldn't those freqs - including difference signals - get filtered out before they're recorded?

Don't get me wrong, 96kHz can kiss my ass, to quote a friend who's a forum member here. But I don't quite understand how intermodulation products generated way up there would make it into the human range.

This is Socratic questioning - I'm not saying I'm right you're wrong nah nah nah, I'm trying to understand the argument.
New intermodulation distortion appears every time there is a nonlinear process such as saturation or compression. So all plugins that emulate analog gear create intermodulation distortion. Even if you don’t use any dynamics or saturation plugins, it is unavoidable, because a lot of it is created by speakers/headphones when your recorded signal is turned back into sound. New frequencies are produced at both the sum and difference of every pair of frequencies in your recorded material. For example, if you have 4 kHz and 10 kHz, they will create distortion at 6 kHz (10 kHz - 4 kHz) and 14 kHz (4 kHz + 10 kHz). Now add 23 kHz as a single ultrasonic frequency and you will also get distortion at 13 kHz (23 kHz - 10 kHz), 19 kHz (23 kHz - 4 kHz), 27 kHz (23 kHz + 4 kHz), and 33 kHz (23 kHz + 10 kHz). So in this very simplified example, we get the following distortions at each sample rate:
  • 44.1 kHz sample rate: distortion at 6 kHz and 14 kHz
  • 48 kHz sample rate: distortion at 6 kHz and 14 kHz, very slight distortion at 13 kHz and 19 kHz
  • 88.2 kHz sample rate: distortion at 6 kHz, 13 kHz, 14 kHz, 19 kHz, 27 kHz, and 33 kHz
  • 96 kHz sample rate: distortion at 6 kHz, 13 kHz, 14 kHz, 19 kHz, 27 kHz, and 33 kHz
The reason the extra distortion at the 48 kHz sample rate is only very slight is that the ultrasonic frequencies are already rolled off most of the way and so can’t interact much with the audible frequencies. Anyway, this is a very simple example; in reality, no one's music is made up of just three sine waves - there will be frequencies across the entire sampled bandwidth interacting with every other frequency across the bandwidth. Every ultrasonic frequency multiplies the amount of distortion. And the above example is also the result of a very light nonlinear process. Normally this will repeat multiple times, so that every new frequency created by a sum or difference becomes a new frequency in the next round, creating yet more new frequencies at the sum and difference between every other preexisting and new frequency. Because the ultrasonic frequencies are so low at the 48 kHz sample rate, the difference between it and 44.1 kHz is probably rarely if ever audible, and may be offset by 48 kHz having more room for its antialiasing filter.

You may think some frequencies that appear in the above example, like 27 kHz and 33 kHz, aren’t relevant since you can’t hear them - until you realize that at the next round of intermodulation distortion, 33 kHz - 27 kHz produces extra distortion at 6 kHz. Oversampling plugins minimize all this intermodulation distortion buildup because they lowpass out ultrasonic frequencies after each process. The buildup at higher sampling rates may not be very significant if you aren’t hitting saturation and dynamics plugins very hard. And as I hypothesized earlier, some people might actually like the buildup, because a lot of broadband content at very low amplitude may sound more like a cushiony noise floor. At higher amplitudes, though, it can sound like harsh/brittle dissonant distortion.
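If it helps, here's a little Python sketch that just reproduces the arithmetic from the three-tone example above. It only enumerates where the sum/difference products land; it doesn't model how loud they are (that depends on the actual nonlinearity):

```python
# Enumerate second-order intermodulation products (sums and differences)
# for the three-tone example above, optionally cascading them through a
# second "round" of nonlinearity. Arithmetic only - amplitudes depend on
# the nonlinear process and aren't modeled here.
from itertools import combinations

def imd_products(freqs_hz, rounds=1):
    """Collect |f1 - f2| and f1 + f2 for every pair of tones."""
    tones = set(freqs_hz)
    products = set()
    for _ in range(rounds):
        new = set()
        for f1, f2 in combinations(sorted(tones), 2):
            new.update({abs(f1 - f2), f1 + f2})
        products |= new - set(freqs_hz)
        tones |= new
    return sorted(products)

tones = [4_000, 10_000, 23_000]
print(imd_products(tones))   # [6000, 13000, 14000, 19000, 27000, 33000]

# After a second nonlinear stage, 33 kHz - 27 kHz lands back at 6 kHz:
# ultrasonic products fold down into the audible range.
print([f for f in imd_products(tones, rounds=2) if f < 20_000])
```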

Here are some examples of what obvious intermodulation distortion sounds like:



 
Thanks. Yes, I do know what IM distortion is, but that still doesn't clear up what I'm asking.

My question was about recording ("Shouldn't those freqs - *including difference signals* - get filtered out before they're recorded?"). But the same thing applies to 96k processing, and you mentioned oversampling filters taking care of that.

At this point I'm becoming less Socratic and more Pyrrhonist. :)
 
Thanks. Yes, I do know what IM distortion is, but that still doesn't clear up what I'm asking.

My question was about recording ("Shouldn't those freqs - *including difference signals* - get filtered out before they're recorded?"). But the same thing applies to 96k processing, and you mentioned oversampling filters taking care of that.

At this point I'm becoming less Socratic and more Pyrrhonist. :)
With present technology, there is no way to selectively filter out only IMD (to my knowledge). You can only band limit to prevent unnecessary frequencies from multiplying the amount of IMD that occurs. More IMD occurs at every stage of nonlinear processing, including playback. You could band limit your converter at the recording stage so that it rolls off everything above the audible spectrum, but then it would still accumulate with processing. At that point you may as well be recording at 44.1k anyway (in most cases). But say you have some plugins you really like other than the fact that they aren't optimized for lower sample rates, you could work at 96k and simply place a plugin that filters out ultrasonic frequencies before and after every nonlinear process, and then you would achieve IMD on par with the lower sample rates. Tokio Dawn Labs actually created a free plugin for this purpose: https://vladgsound.wordpress.com/2014/12/21/tdr-ultrasonic-filter-alpha-version/
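To make the "filter before and after every nonlinear process" idea concrete, here's a rough Python sketch. tanh stands in for whatever saturating plugin you'd use, and the 20 kHz Butterworth is an arbitrary choice of mine, not TDR's actual design:

```python
# Sketch of band-limiting before and after a nonlinear stage at 96 kHz.
# tanh() stands in for any saturating process; the 8th-order 20 kHz
# Butterworth is an arbitrary choice, not any plugin's actual filter.
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 96_000
sos = butter(8, 20_000, btype="lowpass", fs=FS, output="sos")

def ultrasonic_lp(x):
    return sosfiltfilt(sos, x)                    # zero-phase lowpass

def saturate(x, drive=4.0):
    return np.tanh(drive * x) / np.tanh(drive)    # generic soft clipper

# One audible tone plus one ultrasonic tone, as in the earlier example.
t = np.arange(FS) / FS
x = 0.5 * np.sin(2 * np.pi * 10_000 * t) + 0.3 * np.sin(2 * np.pi * 33_000 * t)

# Band-limit going in (removes the 33 kHz tone so it can't intermodulate),
# then again coming out (removes ultrasonic products the clipper made).
y = ultrasonic_lp(saturate(ultrasonic_lp(x)))
```

Comparing the spectrum of saturate(x) with and without the surrounding filters should show the audible third-order product at 13 kHz (33 kHz minus twice 10 kHz) disappearing when the input filter is in place.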

You could additionally (and maybe this is along the lines of what you were thinking) place an analog ultrasonic filter prior to conversion. Seems more trouble than it's worth. Obviously plenty of people work at 88.2k and 96k and get good results without worrying about extra IMD (if they are even aware of it). The empirical evidence suggests even expert listeners with high end monitoring equipment usually can't hear the difference. Although if you use a lot of dynamics processing and/or saturation, you're more likely to have IMD build up to the point it's audible. Really fast attack times on compressors and limiters produce a lot of it. That's probably why the earlier software models of 1176s could never seem to compress as fast as the real ones; I assume developers just hadn't figured out how to do it yet without it getting ugly.

It may also be worth mentioning at this point that some of what I've said wasn't always the case. For example, it used to be that many converters performed better at higher sample rates simply because it was easier to design them that way at a reasonable price point. The technology has advanced to the point that today prosumer (maybe even budget) converters can usually do any common sample rate well.
 
You can only band limit to prevent unnecessary frequencies from multiplying the amount of IMD that occurs

Right, and I've always assumed that's what happens with decent plug-ins, pretty much the same as with oversampling filters.

That might be what I'm missing.
 
I understood the situation to be that ultrasonic frequencies can cause more intermodulation distortion to occur, some of it occurring in lower, audible frequencies. So you are adding audible IM distortion in order to process inaudible ultrasonic high end. Once it’s in the audible range, how do you filter it out?

You could band limit the audio before hitting the non-linear plugin, but then you might as well have been recording at a lower sample rate to begin with.

There will always be some IM present, but using a higher sample rate just multiplies how much of it there is, and some of it will be audible.
 
I understood the situation to be that ultrasonic frequencies can cause more intermodulation distortion to occur, some of it occurring in lower, audible frequencies.

I think you get this, but if you combine - intermodulate - two freqs, you get two more: one at the sum and one at the difference. So two freqs above human hearing can combine and produce difference freqs within the audible range.

The word "distortion" is a little confusing, because it makes it sound like something you never want, when in fact the combination is often want you do want. Normally the difference freqs in the audible range are in the sound already - i.e. they're recorded.

sumskilz says that sum signals from two sounds in the audible range can combine into the supersonic range and produce difference freqs in the audible range that are -30 (minus 30 below what?), but the amount of power in supersonic freqs is really low to start with.

Anyway, good audio algorithm designers know what they're doing, and I doubt this is an issue to lose sleep over.
 
I agree, but I’m also using 48k for the mere fact that I don’t want all the overhead for a marginal, if any, difference.

There are so many other ways to mangle the sound in good and bad ways; for me the sample rate is the least of my concerns. I choose 48k because anything I do is likely to either be in video at 48k or, more likely, encoded into lossy compression anyway before anyone other than me ever hears it. I will almost certainly not be producing a CD at 44.1k. So 48 it is - zero concerns.
 
Or are you saying you can use the end of the recording at a more exact loop point at a higher SR? I'd have to hear an example of that making a difference, but I still don't understand how it would - because a sine wave below Nyquist is a sine wave below Nyquist.

Someone mansplain me what I'm missing.

I'll try to explain this again, to the best of my manly abilities.

First, the situations where this makes a difference for me are when I try to loop the sustain part of a sample.
Two situations come to mind:

1- Long loops (several wave cycles to several minutes):

If the looping creates a "step" in an otherwise smooth waveform, you will hear a click. Nyquist theory doesn't matter, audible range vs ultrasonics doesn't matter - it is not about those. It is about a click (aka "step") in the wave, and a click is a click, period. Sometimes crossfading doesn't do it for me, at least not as well as having loop end and start sample values match exactly to begin with, which you have a better chance of getting right if you split the wave into finer slices.
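For what it's worth, here's a rough Python sketch of that "match the seam" idea - best_loop_end is a made-up helper for illustration, not any sampler's actual feature. The point is that a finer sample grid gives you more candidate end points per wave cycle, so the smallest achievable step at the seam shrinks:

```python
# Rough sketch of seam matching: search near the desired loop end for the
# sample whose value AND slope best match the loop start, so joining the
# end back to the start doesn't create a step (click). best_loop_end is a
# hypothetical helper, not any sampler's API.
import numpy as np

def best_loop_end(x, loop_start, approx_end, search=200):
    """Index near approx_end minimizing the discontinuity at the seam."""
    v0 = x[loop_start]                       # value at loop start
    s0 = x[loop_start + 1] - x[loop_start]   # slope at loop start
    n = np.arange(approx_end - search, approx_end + search)
    step = np.abs(x[n] - v0)                 # amplitude mismatch
    kink = np.abs((x[n + 1] - x[n]) - s0)    # slope mismatch
    return int(n[np.argmin(step + kink)])

fs = 48_000
t = np.arange(2 * fs) / fs
x = np.sin(2 * np.pi * 220 * t)              # toy sustained tone
end = best_loop_end(x, loop_start=fs // 2, approx_end=fs + fs // 2)
```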

2- Short loops (one to a couple of wave cycles):

Loop length is obviously quantized to the sample length (like 1/48000 sec. at 48 kHz). With high-pitched notes, sometimes neither of the two consecutive possible loop lengths will sound in tune: that means the loop length is not close enough to a multiple of the note's period. No amount of crossfading will correct that, since crossfading doesn't affect the loop's length. Some samplers give you the ability to fine-tune the loop relative to the non-looped part, and that works fine. In other cases, a higher sample rate will help, because it quantizes loop length more finely (you get steps of, say, 1/192000 sec. instead of 1/48000 - that's 4x more precise). Here again, it is not a matter of the Nyquist theorem or ultrasonics magic - in fact, it is not a matter of harmonics: it is a matter of precise tuning of the fundamental, to which our ears are very sensitive, especially near the middle of the audible spectrum, where the fundamentals of high-pitched notes fall.
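Here are the actual numbers for a concrete case, in a quick Python sketch (C7 is my arbitrary example note). At 48 kHz the two available single-cycle loop lengths land about +72 and -5 cents off the target pitch; at 192 kHz the choices are about +14 and -5 cents, and the worst-case error shrinks in proportion to the rate:

```python
# Back-of-envelope numbers for loop-length quantization: a single-cycle
# loop must be an integer number of samples, so the pitch it plays back
# at is quantized. C7 (2093 Hz) is an arbitrary example note.
import math

def cents(f_actual, f_target):
    return 1200 * math.log2(f_actual / f_target)

def loop_pitch_options(f0, fs):
    """Pitches offered by the two integer loop lengths around one period."""
    ideal = fs / f0                        # ideal loop length in samples
    for n in (math.floor(ideal), math.ceil(ideal)):
        f = fs / n                         # pitch the quantized loop gives
        print(f"{fs:>6} Hz rate: {n:3d}-sample loop -> "
              f"{f:7.1f} Hz ({cents(f, f0):+5.1f} cents)")

f0 = 2093.0                                # C7
for fs in (48_000, 192_000):
    loop_pitch_options(f0, fs)
# 48 kHz : 22 samples -> +72.0 cents, 23 samples -> -5.0 cents
# 192 kHz: 91 samples -> +13.9 cents, 92 samples -> -5.0 cents
```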

Maybe the reason I run into these issues more than the other guy is that I mainly sample synthesizers. They can (arguably) generate a purer, more stable sound than acoustic instruments, making this kind of imperfection stand out more.

Now, of course, it can be debated whether the increased memory footprint is worth this solution.

But you don't need to run your DAW and converters at a high sampling rate in order to play high-rate samples, in any case. All this happens within the pitching algo of your sampler, which by definition adapts the sample's rate to the DAW/converters' rate anyway (so it's not even more work for it).
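A naive Python sketch of what that pitching algo does, under the crudest possible assumptions (real samplers use much better interpolation filters than linear):

```python
# Naive sketch of how a sampler plays a 192 kHz sample through a 48 kHz
# engine: step through the source at the right increment and interpolate.
# No high-rate DAW or converters needed. Linear interpolation is the
# crudest possible choice; real samplers use better filters.
import numpy as np

def play(sample, src_rate, out_rate, pitch_ratio=1.0, n_out=48_000):
    """Linear-interpolation playback of a sample at another rate."""
    step = pitch_ratio * src_rate / out_rate   # source samples per output sample
    pos = np.arange(n_out) * step
    i = pos.astype(int)
    frac = pos - i
    return sample[i] * (1 - frac) + sample[i + 1] * frac

src_rate, out_rate = 192_000, 48_000
t = np.arange(src_rate) / src_rate
sample = np.sin(2 * np.pi * 440 * t)            # 1 s of A4 at 192 kHz
out = play(sample, src_rate, out_rate)          # same pitch, 48 kHz stream
```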

And BTW, if we stick to the issues I mention here, you don't even need to sample at a high rate. Upsampled samples would work just as well. Because as I said, it's not about ultrasonic content.
 
So... when companies advertise they recorded their violins at 96k... it's all just a load of marketing rubbish and the samples might actually sound worse than if recorded at 48k??

So what about 192k?? What do people even use this for?? Making an album of waltzes for bats??
 
Interesting debate. For sure, one thing: if you have a bunch of 48k or 44.1k samples, there is vanishingly small benefit to running at 96k -- I say 'small' only to acknowledge those who argue that you may have processing taking place at a higher sampling rate. I think it's a total waste of overhead, storage, processing, brain damage, etc. for samples-only productions.

What puzzles me about the posts above that argue 96k is worse, is that nearly every 'A level' film score and album that records live players is running at 96k.

How come?

So what about 192k?? What do people even use this for?? Making an album of waltzes for bats??

Dan Lavry argues that above 96k you not only don't get any benefit, it's actually 'worse.' Check out lavryengineering dot-com
 
What puzzles me about the posts above that argue 96k is worse, is that nearly every 'A level' film score and album that records live players is running at 96k.

How come?

Yes, it's a strange one! Maybe because the sound systems of modern cinemas might benefit from this??
 
Yes, it's a strange one! Maybe because the sound systems of modern cinemas might benefit from this??

IDK. The engineers say you record and mix at 96 and it's better, but I've recorded plenty of material that went on the air and in theatres at 48. Just don't tell anyone....
 
If the looping creates a "step" in an otherwise smooth waveform, you will hear a click

That's not a sample rate issue. You can easily get clicks at zero crossings.

Dan Lavry argues that above 96k you not only don't get any benefit, it's actually 'worse.' Check out lavryengineering dot-com

And now his son has taken up the argument for him. :)

Not directed at John: there aren't "more points" to represent the waveform. It's all sine waves.

The technical argument for higher sample rates is only that it puts the brick wall filter ringing out of the audible spectrum.

Remember, a speaker can only go backward or forward.
 