More on standard deviation. This lesson will cover three more advanced topics concerning standard deviation.
So, right away I'll say, if you're just worried about getting the basics of standard deviation, everything you learned in the last video,
that will really help you with standard deviation. These are relatively specialized rare questions, and
you would only be seeing these if you were really doing very well in math and doing the hardest questions on the test.
So, I wouldn't worry about this. If you're just worried about the basics of standard deviation, you could just skip this video.
These are very advanced topics. Topic number one concerns how the standard deviation would
change when we include new members to a list, making the list longer. This is a tricky issue for a few reasons.
Let's say we have a set with 20 members, and the set has a mean of 50 and a standard deviation of 5.
So what we have here are known as summary statistics, we know the overall mean, the overall standard deviation, we don't have the list of individual data.
Suppose we are gonna include two more numbers to
bring the total number of members to 22.
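Suppose we include 80 and 80 as the 21st and 22nd members of the list. Of course, this would change the mean, and
if we wanted, we could calculate the new mean. The new mean would be slightly above 50.
But all of the deviations have changed, so there's no way to calculate the new standard deviation.
So first of all, this is a subtle distinction: if we have the summary statistics, just the old mean and the old standard deviation, and
then we add these two new points, we have absolutely no way to calculate the new standard deviation.
Now, if we had the list, the original list of 20 values, and then we added the 21st and 22nd values, if we had all the values,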
then we could calculate the standard deviation.
But even then, that's a calculation the test is not gonna ask you to do. So either way, you're not gonna have to worry about this calculation.
All we can say, is that if we include new members that are far away from the mean of the set, the standard deviation of the new set will be larger.
So, that's very clear. The two values of 80 are way far away from all the other numbers.
They're really big outliers, so adding really big outliers, that's gonna increase the standard deviation.
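To see this with actual numbers, here is a minimal Python sketch. The 20-member list is hypothetical (ten 45s and ten 55s, chosen only because it has a mean of 50 and a standard deviation of 5); the lesson itself gives only the summary statistics, not the data. It uses `statistics.pstdev`, which divides by the number of values.

```python
from statistics import mean, pstdev

# Hypothetical data: any 20 values with mean 50 and SD 5 would do.
original = [45] * 10 + [55] * 10

# Add the two outliers as the 21st and 22nd members.
extended = original + [80, 80]

print(mean(original), pstdev(original))   # mean 50, SD 5
print(mean(extended), pstdev(extended))   # the mean rises a little; the SD rises a lot
```

With this particular list the standard deviation nearly doubles, but the only thing you need to take away is the direction: far-away values increase it.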
That you need to know. We can say a little bit more if we include a pair of
numbers that doesn't change the mean. If we include the numbers that are equally spaced around the mean,
so one is K units above the mean, and one is K units below the mean, then we will not change the mean.
So the deviations from the mean of all the numbers in the list will stay the same, and
we can draw some more detailed conclusions about what the standard deviation will do. Again, let's start out with that same set, 20 numbers, mean of 50,
standard deviation of 5. Suppose we include 40 and 60 as our new members, so first thing to note is 40 and
60, one is 10 above the mean, one is 10 below the mean, they're equally spaced around the mean.
So that means, they're not gonna change the mean. So the mean of the new set is also 50.
Well, right away that's good, that means that none of the other deviations change. Now, let's think about this, 40 and 60 each
has a distance of 10 away from the mean while the standard deviation is only 5. So these are much further from the mean,
in fact each one is two standard deviations away from the mean, each one is further from the mean than the standard deviation.
So adding bigger deviations than the standard deviation will increase the standard deviation of the list.
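Now, we're not going to actually calculate the new value of the standard deviation.
It's just enough to realize that if we add 40 and 60 to this set, we're going to increase the standard deviation, because we've
added numbers that are further from the mean than the standard deviation.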
All right. Reset.
We start out with our set of 20. Now we're gonna include 45 and 55.
Once again, we're adding two numbers that are symmetrically spaced, five above, five below, so this is not gonna change the mean,
we're gonna stick with the same mean, the mean of 50, and now we're adding two numbers that are exactly at the standard deviation on either side.
One is one standard deviation above the mean and the other is one standard deviation below the mean.
Each one has a deviation from the mean exactly equal in size to the standard deviation, so that means they're not gonna change the standard deviation at all.
This new set will have exactly the same standard deviation as the old set, so the old standard deviation and the new standard deviation both equal five.
So this is the only time you'd need to know the new standard deviation, know the numerical value because it stays the same,
it's the same as the old numerical value. All right, reset back to the set of 20.
Now suppose we include two numbers that are closer to the mean, say 47 and 53. Well, now we add two numbers,
again, symmetrically spaced, one is three below the mean, three above the mean, so this will not change the mean, the mean will stay the same.
And so now, notice that these have a separation from the mean that is less than the standard deviation.
They're closer to the mean than the value of the standard deviation. That means they will decrease the standard deviation.
Now we don't need to be able to calculate the new standard deviation, but we need to recognize that the standard deviation would decrease if
we added these two numbers.
Now, we might get curious what pair of numbers could we include that would most decrease the standard deviation?
Well, we would have to include a pair with the smallest possible distance from the mean.
Of course, the smallest possible distance from the mean is 0. You can't have a distance less than 0.
If the two new members of the set we include are 50 and 50, these two members each have a distance of 0 from the mean, so
they decrease the standard deviation the most. Of all possible pairs of new members we can include in a set,
including two new members equal to the mean of the set is the pair that would decrease the standard deviation the most.
The test loves to ask about this idea. So here, we can see that these two points are right at the mean.
They're gonna have deviations of 0, and so if you think of the list of deviations, putting two more 0s on that list, that decreases the standard deviation.
And once again, we don't need to be able to calculate this new standard deviation, but we just need to recognize that it has decreased the most.
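The four symmetric pairs discussed above can be checked with a short Python sketch. The starting list is hypothetical (ten 45s and ten 55s, which happen to have a mean of 50 and a standard deviation of 5); the lesson only gives the summary statistics. `statistics.pstdev` divides by the number of values.

```python
from statistics import mean, pstdev

# Hypothetical data with mean 50 and SD 5.
original = [45] * 10 + [55] * 10

# Each pair is symmetric around 50, so the mean stays at 50.
pairs = [(40, 60), (45, 55), (47, 53), (50, 50)]

results = {}
for pair in pairs:
    extended = original + list(pair)
    results[pair] = pstdev(extended)
    print(pair, mean(extended), round(results[pair], 3))
```

For this list the standard deviation comes out above 5 for (40, 60), exactly 5 for (45, 55), below 5 for (47, 53), and lowest of all for (50, 50), which matches the reasoning above.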
The second topic concerns the standard deviation as a unit of measurement.
What do I mean by this? In very large sets, for example, populations of countries or
everyone who takes the SAT, some kind of gigantic set like that, we may want to specify the position of an individual with respect to the population.
If we're told that the mean of a certain set is 50, and somebody's score is 60, what exactly does that mean?
Yes, that score of 60 is above the mean, is this just something kind of mildly above the mean, or is this a really really impressive score far above the mean?
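Well, think about it this way: if the mean is 50 and the standard deviation is 20, then 60 is above the mean, but it's only half a standard deviation above the mean.
But, presumably, many numbers are gonna be higher than that. It would not be unusual in a set to have numbers that
are one standard deviation above the mean or even further. So we certainly expect that there will be some numbers that are one standard deviation
above the mean, and maybe even one and a half standard deviations. So we'd expect scores of 70 and 80, so a 60, yes,
it's above the mean, but it's not one of the highest scores.
So it's on the good side of average, but it's not a super impressive score in this particular set.
Notice what happens if we change the standard deviation. In contrast, if the mean is 50,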
and the standard deviation is 2, then 60 is very, very far above the average. Because it's the standard deviation that tells us how meaningful a certain
separation from the mean is, mathematicians generally use the standard deviation of any set as a unit of measurement within that set.
If this individual is 10 units above the mean and the standard deviation is 2, then this individual is 5 standard deviations above the mean.
Now it's hard to emphasize how extraordinary that would be, 5 standard deviations above the mean.
In terms of musical abilities, that would be the best musician in the world. In terms of athletic abilities, that would be the best athlete in the world.
That would be just someone off the charts, like, once every hundred years kind of talented.
That's how impressive this score would be if it were five standard deviations above the mean.
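Using the standard deviation as a unit just means dividing the distance from the mean by the standard deviation. A minimal Python sketch, where the function name `deviations_from_mean` is made up for illustration:

```python
def deviations_from_mean(value, mean, sd):
    """Return how many standard deviations `value` lies above the mean."""
    return (value - mean) / sd

print(deviations_from_mean(60, 50, 20))  # 0.5
print(deviations_from_mean(60, 50, 2))   # 5.0
```

The same score of 60 is only half a standard deviation above the mean in the first set, and five standard deviations above the mean in the second.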
Here's a practice problem, pause the video and then we'll talk about this.
On a certain test, the scores had a mean of 300 and a standard deviation of 25.
If John scored three standard deviations above the mean, what was John's score? Well, the mean is 300, and he scored 3 25s above 300.
Well, that would be 300 plus 75, which is 375. So, it turns out that a problem like this
actually involves a very, very simple calculation.
You just have to not be intimidated by the problem itself.
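Going in this direction, from a number of standard deviations back to a score, is one multiplication and one addition. A small Python sketch, with `score_at` as a hypothetical helper name:

```python
def score_at(mean, sd, num_sds):
    """Return the value lying `num_sds` standard deviations above the mean."""
    return mean + num_sds * sd

john = score_at(300, 25, 3)
print(john)  # 375
```

A negative `num_sds` would give a score below the mean in the same way.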
The final topic is very advanced, and it concerns the actual calculation of the standard deviation.
The test will not, repeat, not ask you to calculate the standard deviation of the list from scratch, but on a very advanced question, it could ask
about some detail, some concept related to the details of this calculation.
Here are the steps in the calculation.
So, first of all, we start with the list of numbers, we have to find the mean. As we said we subtract the mean and
this creates a second list, the list of deviations. Then we're gonna take that list of deviations, some are positive,
some are negative, we're gonna square that list.
So that will be a list of squared deviations. A third list, and that list will be all positive numbers or zeros because we've squared them.
Now we're gonna take an average of this third list. The average squared deviation, and that number's actually called the variance.
Once we have the variance, we take the square root, and that's the standard deviation.
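The steps just described can be sketched in Python as follows. The function name and the sample list are made up for illustration, and the variance here divides by the number of values, as the lesson describes.

```python
from math import sqrt

def standard_deviation(values):
    mean = sum(values) / len(values)          # find the mean
    deviations = [x - mean for x in values]   # second list: deviations
    squared = [d ** 2 for d in deviations]    # third list: squared deviations
    variance = sum(squared) / len(squared)    # average of the third list
    return sqrt(variance)                     # square root of the variance

print(standard_deviation([2, 4, 4, 4, 5, 5, 7, 9]))  # 2.0
```

In the sample list the mean is 5, the squared deviations add up to 32, the variance is 32 / 8 = 4, and the square root of 4 is 2.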
Okay, so now we'll do a sample calculation. And we'll start with a very simple list, just the integers from 1 to 9.
So this is a nice symmetrical evenly spaced list, of course the number in the middle, 5, that would be both the mean and the median of this list.
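So we have the mean, so it's easy to figure out the deviations. To get the list of deviations, I just take list number one and
subtract five from every number on the list.
So 5, of course, has a deviation of zero. The numbers less than five have negative deviations.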
The numbers bigger than five have positive deviations. That's the second list.
Now to get the third list, we just square everything. So those are the squares, notice 0 squared is 0,
everything else is positive, so now we have the list of the squared deviations.
Now, we average that third list. The average of that third list is something called the variance.
So the variance, the average of that list happens to be 20 over 3. That's the variance, and that is the variance of the first list,
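the first list has a variance of 20 over 3. To find the standard deviation, we take the square root of the variance.
So we could write that in radical form, and we could simplify that radical if we wanted, but usually the standard deviation is just written as a decimal.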
So, we'll write it in the decimal form. So notice that what we found here, 2.582, this is the standard deviation of
the list from 1 to 9, and also because of what we learned in the last video, this should be the standard deviation of any nine consecutive integers.
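So any nine consecutive integers would have a standard deviation of 2.582.
That's the entire calculation of the standard deviation, in all its gory detail.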
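As a check on the arithmetic, here is a Python sketch of the same calculation. It uses `fractions.Fraction` so the variance stays exact; the helper name `variance_of` and the second list (101 to 109) are made up for illustration.

```python
from fractions import Fraction
from math import sqrt

def variance_of(values):
    """Variance as the average squared deviation, kept as an exact fraction."""
    mean = Fraction(sum(values), len(values))
    squared = [(v - mean) ** 2 for v in values]
    return sum(squared) / len(squared)

one_to_nine = list(range(1, 10))
variance = variance_of(one_to_nine)
print(variance)                    # 20/3
print(round(sqrt(variance), 3))    # 2.582

shifted = list(range(101, 110))    # nine other consecutive integers
print(round(sqrt(variance_of(shifted)), 3))  # 2.582
```

The shifted list has a different mean but exactly the same deviations, which is why its standard deviation is the same.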
The test will not ask you to repeat that entire procedure. Conceivably on the very hardest quant problems,
it could present some part of that procedure, it could ask about some detail.
One thing to notice,
incidentally: because we're squaring, numbers that have larger deviations make a larger contribution to the standard deviation.
A much larger contribution. The effect of squaring amplifies
the input of the numbers that have larger deviations.
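One way to see that amplification is to look at the squared deviations from the 1-to-9 example and ask what share of the total each pair supplies. This Python sketch is only an illustration of the idea, not a calculation the test would ask for.

```python
deviations = [-4, -3, -2, -1, 0, 1, 2, 3, 4]   # deviations for the list 1 to 9
squared = [d ** 2 for d in deviations]
total = sum(squared)                            # 60

# Share of the total squared deviation contributed by the two extremes (1 and 9).
extreme_share = (squared[0] + squared[-1]) / total
# Share contributed by the two values right next to the mean (4 and 6).
inner_share = (squared[3] + squared[5]) / total

print(extreme_share, inner_share)
```

The two extreme values supply more than half of the total, while the two values next to the mean supply only a few percent.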
That's an important thing to notice. You may wonder why the standard deviation is defined in this particular way.
This has to do with its primary use. We can find the standard deviation of any list or
set, but in some ways the standard deviation is designed to accompany the normal distribution, which we will discuss in the next video.
Here's a practice problem. Pause the video and then we'll talk about this.
Okay. So let's talk about this problem. A camp has 30 girls whose heights have an average of 130 centimeters and
a standard deviation of four centimeters. Suppose four more girls join the camp, so there will be 34 altogether.
Which set of heights for these four additional girls would most increase the standard deviation of all the girls at the camp?
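Of course, if we wanted to increase the standard deviation the least, if we wanted to decrease the standard deviation the most, we would add girls
who all had heights of 130. They'd all be equal to the mean.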
Well, we don't have any like that but notice with C, two of them are equal to the mean and the other two are very close to the mean, so
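all four of them are closer to the mean than the standard deviation. So that's gonna decrease the standard deviation.
So that's definitely not going to increase it. If we add the ones in B, notice that for all four,
we have deviations of negative three, negative one, one, and three. So those are all going to be deviations that again are less
than the standard deviation, so that's also gonna decrease the standard deviation. So that's not gonna be right.
Very interesting: if we look at A, we have deviations of negative four, negative four, four, and four.
All four of these numbers are exactly one standard deviation away from the mean. So, in fact, adding these four girls would keep the standard deviation at
exactly four, because we're just adding more deviations of the same size as the standard deviation.
So that would stay the same, so that's not an increase. So the only one left is D, and if we look at D,
we're adding four outliers.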
Each one of them is two and a half standard deviations from the mean and that's far away from the mean, and so
that's gonna change the mean of the whole camp, it's gonna upset all the deviations. And the net result is that you're gonna have much larger deviations and
a much larger standard deviation. So D is the answer.
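The reasoning can be checked with a Python sketch, under several assumptions. The answer choices are not reproduced in this transcript, so the heights below are reconstructed: A and B follow the deviations described above, while C and D are assumed values that merely fit the description. The 30 original heights are also hypothetical (fifteen at 126 and fifteen at 134, which gives a mean of 130 and a standard deviation of 4).

```python
from statistics import pstdev

# Hypothetical heights for the 30 girls: mean 130 cm, SD 4 cm.
camp = [126] * 15 + [134] * 15

choices = {
    "A": [126, 126, 134, 134],   # deviations -4, -4, 4, 4
    "B": [127, 129, 131, 133],   # deviations -3, -1, 1, 3
    "C": [129, 130, 130, 131],   # assumed: two at the mean, two very close
    "D": [140, 140, 140, 140],   # assumed: each 2.5 SDs (10 cm) from the mean
}

new_sd = {label: pstdev(camp + added) for label, added in choices.items()}
for label, sd in new_sd.items():
    print(label, round(sd, 3))

print(max(new_sd, key=new_sd.get))  # D
```

With these assumed numbers, A leaves the standard deviation at 4, B and C lower it, and only D raises it.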
In summary, we discussed the effect
on the standard deviation of including a new pair of numbers in the set. And notice that we can talk about that most sensibly for
adding two numbers that don't change the mean. We discussed using the standard deviation as a unit to
indicate the position of an individual in a large population, talking about how many standard deviations above or below the mean.
And we discussed the technical details of the exact calculation for the standard deviation.