Standard Deviation
From DrugPedia: A Wikipedia for Drug discovery
In probability and statistics, the standard deviation is a measure of the dispersion of a collection of values. It can apply to a probability distribution, a random variable, a population or a data set. The standard deviation is usually denoted with the letter σ (lowercase sigma). It is defined as the root-mean-square (RMS) deviation of the values from their mean, or as the square root of the variance. Formulated by Galton in the late 1860s, the standard deviation remains the most common measure of statistical dispersion, measuring how widely spread the values in a data set are. If many data points are close to the mean, then the standard deviation is small; if many data points are far from the mean, then the standard deviation is large. If all data values are equal, then the standard deviation is zero. A useful property of standard deviation is that, unlike variance, it is expressed in the same units as the data.
When only a sample of data from a population is available, the population standard deviation can be estimated by a modified standard deviation of the sample, explained below.
Contents |
[edit] Definition
[edit] Standard deviation of a probability distribution or random variable
The standard deviation of a (univariate) probability distribution is the same as that of a random variable having that distribution.
[edit] Standard deviation of a continuous random variable
Continuous distributions usually give a formula for calculating the standard deviation as a function of the parameters of the distribution.
[edit] Standard deviation of a discrete random variable or data set
The standard deviation of a discrete random variable is the root-mean-square (RMS) deviation of its values from the mean.
If the random variable X takes on N values (which are real numbers) with equal probability, then its standard deviation σ can be calculated as follows:
- Find the mean, , of the values.
- For each value xi calculate its deviation () from the mean.
- Calculate the squares of these deviations.
- Find the mean of the squared deviations. This quantity is the variance σ2.
- Take the square root of the variance.
The standard deviation of a data set is the same as that of a discrete random variable that can assume precisely the values from the data set, where the point mass for each value is proportional to its multiplicity in the data set.
[edit] Interpretation and application
A large standard deviation indicates that the data points are far from the mean and a small standard deviation indicates that they are clustered closely around the mean.
For example, each of the three data sets {0, 0, 14, 14}, {0, 6, 8, 14} and {6, 6, 8, 8} has a mean of 7. Their standard deviations are 7, 5, and 1, respectively. The third set has a much smaller standard deviation than the other two because its values are all close to 7. In a loose sense, the standard deviation tells us how far from the mean the data points tend to be. It will have the same units as the data points themselves. If, for instance, the data set {0, 6, 8, 14} represents the ages of four siblings in years, the standard deviation is 5 years.
As another example, the data set {1000, 1006, 1008, 1014} may represent the distances traveled by four athletes, measured in meters. It has a mean of 1007 meters, and a standard deviation of 5 meters.
Standard deviation may serve as a measure of uncertainty. In physical science for example, the reported standard deviation of a group of repeated measurements should give the precision of those measurements. When deciding whether measurements agree with a theoretical prediction, the standard deviation of those measurements is of crucial importance: if the mean of the measurements is too far away from the prediction (with the distance measured in standard deviations), then the theory being tested probably needs to be revised. This makes sense since they fall outside the range of values that could reasonably be expected to occur if the prediction were correct and the standard deviation appropriately quantified.
[edit] Real-life examples
The practical value of understanding the standard deviation of a set of values is in appreciating how much variation there is from the "average" (mean).
[edit] Weather
As a simple example, consider average temperatures for cities. While two cities may each have an average temperature of 15 °C, it's helpful to understand that the range for cities near the coast is smaller than for cities inland, which clarifies that, while the average is similar, the chance for variation is greater inland than near the coast.
So, an average of 15 occurs for one city with highs of 25 °C and lows of 5 °C, and also occurs for another city with highs of 18 and lows of 12. The standard deviation allows us to recognize that the average for the city with the wider variation, and thus a higher standard deviation, will not offer as reliable a prediction of temperature as the city with the smaller variation and lower standard deviation.
[edit] Sports
Another way of seeing it is to consider sports teams. In any set of categories, there will be teams that rate highly at some things and poorly at others. Chances are, the teams that lead in the standings will not show such disparity, but will be pretty good in most categories. The lower the standard deviation of their ratings in each category, the more balanced and consistent they might be. So, a team that is consistently bad in most categories will have a low standard deviation. A team that is consistently good in most categories will also have a low standard deviation. A team with a high standard deviation might be the type of team that scores a lot (strong offense) but also concedes a lot (weak defense), or, vice versa, that might have a poor offense but compensates by being difficult to score on—teams with a higher standard deviation will be more unpredictable.
Trying to predict which teams, on any given day, will win, may include looking at the standard deviations of the various team "stats" ratings, in which anomalies can match strengths vs. weaknesses to attempt to understand what factors may prevail as stronger indicators of eventual scoring outcomes.
In racing, a driver is timed on successive laps. A driver with a low standard deviation of lap times is more consistent than a driver with a higher standard deviation. This information can be used to help understand where opportunities might be found to reduce lap times.
[edit] Finance
In finance, standard deviation is a representation of the risk associated with a given security (stocks, bonds, property, etc.), or the risk of a portfolio of securities (actively managed mutual funds, index mutual funds, or ETFs). Risk is an important factor in determining how to efficiently manage a portfolio of investments because it determines the variation in returns on the asset and/or portfolio and gives investors a mathematical basis for investment decisions (known as mean-variance optimization). The overall concept of risk is that as it increases, the expected return on the asset will increase as a result of the risk premium earned – in other words, investors should expect a higher return on an investment when said investment carries a higher level of risk, or uncertainty of that return. When evaluating investments, investors should estimate both the expected return and the uncertainty of future returns. Standard deviation provides a quantified estimate of the uncertainty of future returns.
For example, let's assume an investor had to choose between two stocks. Stock A over the last 20 years had an average return of 10%, with a standard deviation of 20% and Stock B, over the same period, had average returns of 12%, but a higher standard deviation of 30%. On the basis of risk and return, an investor may decide that Stock A is the safer choice, because Stock B's additional 2% points of return is not worth the additional 10% standard deviation (greater risk or uncertainty of the expected return). Stock B is likely to fall short of the initial investment (but also to exceed the initial investment) more often than Stock A under the same circumstances, and is estimated to return only 2% more on average. In this example, Stock A is expected to earn about 10%, plus or minus 20% (a range of 30% to -10%), about two-thirds of the future year returns. When considering more extreme possible returns or outcomes in future, an investor should expect results of up to 10% plus or minus 90%, or a range from 100% to -80%, which includes outcomes for three standard deviations from the average return (about 99.7% of probable returns). Actually, neither stock A or B is a good choice because an S&P 500 index fund over the same period would have had about the same 10% average return and a lower standard deviation of about 15%, because of the diversification benefit of 500 stocks versus one stock, making it always the statistically preferred investment.
Calculating the average return (or arithmetic mean) of a security over a given number of periods will generate an expected return on the asset. For each period, subtracting the expected return from the actual return results in the variance. Square the variance in each period to find the effect of the result on the overall risk of the asset. The larger the variance in a period, the greater risk the security carries. Taking the average of the squared variances results in the measurement of overall units of risk associated with the asset. Finding the square root of this variance will result in the standard deviation of the investment tool in question.
[edit] Original Source
This article was originally posted in Wikipedia.