Robust Scale Estimates
A robust scale estimate, also known as a robust measure of scale, attempts to quantify
the statistical dispersion (variability, scatter, spread) in a set of data which may contain outliers.
In such datasets, the usual variance or standard deviation scale estimate can be rendered useless
by even a single outlier.
\(S_n\) Statistic
The \(S_n\) statistic developed by Croux and Rousseeuw is defined as
\[S_n = 1.1926 \times c_n \times \textrm{median}_i \left\{ \textrm{median}_j \left( \left| x_i - x_j \right| \right) \right\}\]
For each sample \(x_i, 1 \le i \le n\), the median of the values \(\left| x_i - x_j \right|\) is computed over all
\(x_j, 1 \le j \le n\). This yields \(n\) inner medians. Their median, multiplied by the scale factors, gives \(S_n\).
The factor \(1.1926\) makes \(S_n\) a consistent estimate of the standard deviation for Gaussian data.
The factor \(c_n\) corrects the bias that remains at small sample sizes. For Gaussian data, \(S_n\) has an asymptotic
efficiency of 58%.
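The definition can be evaluated directly. The following C sketch is a naive \(O(n^2 \log n)\) translation of the formula for the unscaled statistic (without \(1.1926\) and \(c_n\)). It is not the algorithm GSL uses. The names sn0_naive, median_inplace and cmp_double are hypothetical. When a count is even, this sketch averages the two middle values. That choice is an assumption, and other implementations may pick one of the two middle values instead.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* Comparator for qsort(): ascending order of doubles. */
static int cmp_double(const void *a, const void *b)
{
    double da = *(const double *) a, db = *(const double *) b;
    return (da > db) - (da < db);
}

/* Median of v[0..m-1]; sorts v in place. For even m the two middle values
   are averaged (an assumption of this sketch). */
static double median_inplace(double v[], size_t m)
{
    qsort(v, m, sizeof v[0], cmp_double);
    return (m % 2) ? v[m / 2] : 0.5 * (v[m / 2 - 1] + v[m / 2]);
}

/* Hypothetical naive Sn0: median over i of the median over j of |x[i] - x[j]|,
   with j == i included. O(n^2 log n) time; x need not be sorted. */
double sn0_naive(const double x[], size_t n)
{
    assert(n >= 1);
    double *dist = malloc(n * sizeof *dist);   /* distances from x[i] */
    double *inner = malloc(n * sizeof *inner); /* one inner median per i */
    double result = NAN;
    if (dist != NULL && inner != NULL) {
        for (size_t i = 0; i < n; i++) {
            for (size_t j = 0; j < n; j++)
                dist[j] = fabs(x[i] - x[j]);
            inner[i] = median_inplace(dist, n);
        }
        result = median_inplace(inner, n);
    }
    free(dist);
    free(inner);
    return result;
}
```

For the data {100, 3, 1, 4, 2}, the inner medians are 97, 1, 2, 2, 1, and the function returns 2. The outlier 100 therefore has no effect on the result, and \(S_n\) would be \(1.1926 \times c_n \times 2\).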
gsl_stats_Sn0_from_sorted_data(sorted_data)
gsl_stats_Sn_from_sorted_data(sorted_data)
These functions return the \(S_n\) statistic of sorted_data.
The elements of the array must be in ascending numerical order. There are no checks to see
whether the data are sorted, so the data should always be sorted first, for example with gsl_sort().
The Sn0 function calculates
\(\textrm{median}_i \left\{ \textrm{median}_j \left( \left| x_i - x_j \right| \right) \right\}\)
(i.e. the \(S_n\) statistic without the scale factors \(1.1926\) and \(c_n\)).
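Sorted input makes the inner medians cheaper to find. When the data are sorted, the distances from \(x_i\) to the points on its left and the distances to the points on its right each form a non-decreasing run. The inner median can therefore be found by merging the two runs in \(O(n)\) time instead of sorting \(n\) distances. The following C sketch uses that idea to compute the unscaled statistic in \(O(n^2)\) time. It is an illustration of the technique, not GSL's algorithm, and all names in it are hypothetical. In debug builds it also checks that the input is sorted.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* Comparator for qsort(): ascending order of doubles. */
static int cmp_double(const void *a, const void *b)
{
    double da = *(const double *) a, db = *(const double *) b;
    return (da > db) - (da < db);
}

/* Debug-mode check that x[0..n-1] is in ascending order. */
static int is_sorted_ascending(const double x[], size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (x[i] < x[i - 1])
            return 0;
    return 1;
}

/* Value at 0-based rank r among the n distances |x[i] - x[j]|, j = 0..n-1.
   Because x is sorted, the left distances x[i] - x[i - m] (m = 0..i) and the
   right distances x[i + m] - x[i] (m = 1..n-1-i) are each non-decreasing,
   so a merge walk of r + 1 steps finds the answer. */
static double rank_distance_sorted(const double x[], size_t n, size_t i, size_t r)
{
    size_t l = 0;   /* next left offset  */
    size_t rt = 1;  /* next right offset */
    double d = 0.0;
    for (size_t step = 0; step <= r; step++) {
        int left_done = (l > i);
        int right_done = (i + rt >= n);
        if (!left_done && (right_done || x[i] - x[i - l] <= x[i + rt] - x[i])) {
            d = x[i] - x[i - l];
            l++;
        } else {
            d = x[i + rt] - x[i];
            rt++;
        }
    }
    return d;
}

/* Median of the n distances from x[i]; averages the middle pair for even n. */
static double inner_median_sorted(const double x[], size_t n, size_t i)
{
    if (n % 2)
        return rank_distance_sorted(x, n, i, n / 2);
    return 0.5 * (rank_distance_sorted(x, n, i, n / 2 - 1)
                  + rank_distance_sorted(x, n, i, n / 2));
}

/* Hypothetical sketch of the unscaled Sn0 for ascending data, O(n^2) time. */
double sn0_sorted(const double sorted[], size_t n)
{
    assert(n >= 1);
    assert(is_sorted_ascending(sorted, n)); /* the library itself does not check */
    double *inner = malloc(n * sizeof *inner);
    if (inner == NULL)
        return NAN;
    for (size_t i = 0; i < n; i++)
        inner[i] = inner_median_sorted(sorted, n, i);
    qsort(inner, n, sizeof inner[0], cmp_double);
    double result = (n % 2) ? inner[n / 2]
                            : 0.5 * (inner[n / 2 - 1] + inner[n / 2]);
    free(inner);
    return result;
}
```

Unsorted data must be sorted before the call. In plain C this can be done with qsort() and cmp_double, which here plays the role of gsl_sort(). For example, sorting {100, 3, 1, 4, 2} and passing it to sn0_sorted gives 2, the same value as the direct definition.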
\(Q_n\) Statistic
The \(Q_n\) statistic developed by Croux and Rousseeuw is defined as
\[Q_n = 2.21914 \times d_n \times \left\{ \left| x_i - x_j \right|, i < j \right\}_{(k)}\]
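Here \(\{\cdot\}_{(k)}\) denotes the \(k\)-th smallest of the \(n(n-1)/2\) pairwise distances (the \(k\)-th order statistic).
The factor \(2.21914\) makes \(Q_n\) a consistent estimate of the standard deviation for Gaussian data.
The factor \(d_n\) corrects the bias that remains at small sample sizes. The order statistic index
is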
\[k = \binom{\left\lfloor n/2 \right\rfloor + 1}{2}\]
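A worked example shows the size of the index relative to the number of pairs. For \(n = 10\) there are \(\binom{10}{2} = 45\) pairwise distances, and

\[k = \binom{\lfloor 10/2 \rfloor + 1}{2} = \binom{6}{2} = 15,\]

so \(Q_n\) uses the 15th smallest distance. For large \(n\), \(k \approx n^2/8 \approx \frac{1}{4}\binom{n}{2}\), which means the selected distance lies roughly at the first quartile of the pairwise distances.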
\(Q_n\) has an asymptotic efficiency of 82%.
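The definition of the unscaled statistic (without \(2.21914\) and \(d_n\)) can be checked with a direct C sketch. It collects all \(n(n-1)/2\) distances, sorts them, and takes the \(k\)-th smallest. This costs \(O(n^2 \log n)\) time and \(O(n^2)\) memory, so it only serves as a reference for small inputs and is not the algorithm GSL uses. The names qn0_naive and cmp_double are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* Comparator for qsort(): ascending order of doubles. */
static int cmp_double(const void *a, const void *b)
{
    double da = *(const double *) a, db = *(const double *) b;
    return (da > db) - (da < db);
}

/* Hypothetical naive Qn0: the k-th smallest (1-based) of the n(n-1)/2
   distances |x[i] - x[j]|, i < j, with k = C(floor(n/2) + 1, 2).
   O(n^2 log n) time and O(n^2) memory; x need not be sorted. */
double qn0_naive(const double x[], size_t n)
{
    assert(n >= 2);
    size_t h = n / 2 + 1;
    size_t k = h * (h - 1) / 2;          /* binomial coefficient C(h, 2) */
    size_t npairs = n * (n - 1) / 2;
    double *d = malloc(npairs * sizeof *d);
    if (d == NULL)
        return NAN;
    size_t m = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            d[m++] = fabs(x[i] - x[j]);
    qsort(d, npairs, sizeof d[0], cmp_double);
    double result = d[k - 1];            /* k is 1-based */
    free(d);
    return result;
}
```

For {100, 3, 1, 4, 2}, \(n = 5\) gives \(k = 3\). The three smallest distances are all 1, so the function returns 1, and the outlier does not affect the result.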
gsl_stats_Qn0_from_sorted_data(sorted_data)
gsl_stats_Qn_from_sorted_data(sorted_data)
These functions return the \(Q_n\) statistic of sorted_data.
The elements of the array must be in ascending numerical order. There are no checks to see
whether the data are sorted, so the data should always be sorted first, for example with gsl_sort().
The Qn0 function calculates
\(\left\{ \left| x_i - x_j \right|, i < j \right\}_{(k)}\)
(i.e. \(Q_n\) without the scale factors \(2.21914\) and \(d_n\)).
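When the data are sorted, the distances simplify to \(x_j - x_i\) for \(i < j\), and each row of distances is already non-decreasing in \(j\). The \(k\)-th smallest distance can then be found by repeatedly removing the minimum from a heap that holds one candidate per row, without storing all \(n(n-1)/2\) distances. The following C sketch uses this heap technique, chosen here for brevity. It is not GSL's algorithm. It needs \(O(n)\) memory and \(O((n + k)\log n)\) time, and qn0_sorted, pair_t and sift_down are hypothetical names.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* Heap entry: pair (i, j), i < j, with distance d = x[j] - x[i]. */
typedef struct { size_t i, j; double d; } pair_t;

/* Restore the min-heap property below position p. */
static void sift_down(pair_t h[], size_t size, size_t p)
{
    for (;;) {
        size_t c = 2 * p + 1;
        if (c >= size)
            break;
        if (c + 1 < size && h[c + 1].d < h[c].d)
            c++;
        if (h[p].d <= h[c].d)
            break;
        pair_t t = h[p]; h[p] = h[c]; h[c] = t;
        p = c;
    }
}

/* Hypothetical sketch of the unscaled Qn0 for ascending data: the k-th
   smallest x[j] - x[i], i < j, with k = C(floor(n/2) + 1, 2). */
double qn0_sorted(const double sorted[], size_t n)
{
    assert(n >= 2);
    size_t h = n / 2 + 1;
    size_t k = h * (h - 1) / 2;
    size_t size = n - 1;                 /* one row per i = 0..n-2 */
    pair_t *heap = malloc(size * sizeof *heap);
    if (heap == NULL)
        return NAN;
    /* Each row i starts at its smallest distance, sorted[i + 1] - sorted[i]. */
    for (size_t i = 0; i < size; i++)
        heap[i] = (pair_t){ i, i + 1, sorted[i + 1] - sorted[i] };
    for (size_t p = size / 2; p-- > 0; )
        sift_down(heap, size, p);

    double result = NAN;
    for (size_t popped = 1; ; popped++) {
        pair_t top = heap[0];            /* current smallest remaining distance */
        if (popped == k) {
            result = top.d;
            break;
        }
        if (top.j + 1 < n)               /* advance to the next distance in row top.i */
            heap[0] = (pair_t){ top.i, top.j + 1, sorted[top.j + 1] - sorted[top.i] };
        else                             /* row exhausted: shrink the heap */
            heap[0] = heap[--size];
        sift_down(heap, size, 0);
    }
    free(heap);
    return result;
}
```

Because \(k \le n(n-1)/2\), the loop always stops before the heap is emptied. On the same inputs, the function agrees with the direct definition. For example, it returns 3 for {1, 2, 4, 8, 16}.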