# Robust Location Estimates

A location estimate is a typical or central value that best describes a given dataset. The mean and median are both examples of location estimators. However, the mean is highly sensitive to outliers and can give misleading values when even a small number of outliers are present. The median, on the other hand, is highly insensitive to outliers, but because it is not a smooth function of the data it can behave unexpectedly in certain situations. GSL offers the following alternative location estimators, which are robust to the presence of outliers.

## Trimmed Mean

The trimmed mean, or truncated mean, discards a certain number of the smallest and largest samples from the input vector before computing the mean of the remaining samples. The amount of trimming is specified by a factor $$\alpha \in [0,0.5]$$. The number of samples discarded from each end of the input vector is then $$\left\lfloor \alpha n \right\rfloor$$, where $$n$$ is the length of the input. So to discard 25% of the samples from each end, one would set $$\alpha = 0.25$$.

gsl_stats_trmean_from_sorted_data(sorted_data, alpha)

This function returns the trimmed mean of sorted_data. The elements of the array must be in ascending numerical order. The function does not check whether the data are sorted, so gsl_sort() should always be called first. The trimming factor $$\alpha$$ is given in alpha. If $$\alpha \ge 0.5$$, then the median of the input is returned.
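To make the trimming rule concrete, the following is a minimal sketch in plain C of the computation described above. The helper names `median_sorted` and `trimmed_mean_sorted` are hypothetical and are not part of GSL; the sketch follows only the definition given here (discard $$\left\lfloor \alpha n \right\rfloor$$ samples from each end, fall back to the median when $$\alpha \ge 0.5$$) and may differ from the library's implementation in details such as stride handling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a GSL function): median of an ascending-sorted array. */
double median_sorted(const double *sorted, size_t n)
{
    assert(n > 0);
    if (n % 2 == 1)
        return sorted[n / 2];
    return 0.5 * (sorted[n / 2 - 1] + sorted[n / 2]);
}

/* Hypothetical helper (not a GSL function): trimmed mean of an
 * ascending-sorted array, discarding floor(alpha * n) samples per end. */
double trimmed_mean_sorted(const double *sorted, size_t n, double alpha)
{
    assert(n > 0 && alpha >= 0.0);

    /* The product is non-negative, so truncation equals floor(). */
    size_t k = (size_t)(alpha * (double)n);

    /* Fall back to the median when alpha >= 0.5, or when trimming
     * would leave no samples at all. */
    if (alpha >= 0.5 || 2 * k >= n)
        return median_sorted(sorted, n);

    double sum = 0.0;
    for (size_t i = k; i < n - k; ++i)
        sum += sorted[i];

    return sum / (double)(n - 2 * k);
}
```

For the sorted input {1, 2, 3, 4, 5, 6, 7, 100} with $$\alpha = 0.25$$, two samples are dropped from each end, so the result is the mean of {3, 4, 5, 6} and the outlier 100 has no influence.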

## Gastwirth Estimator

Gastwirth’s location estimator is a weighted sum of three order statistics,

$\text{gastwirth} = 0.3 \times Q_{\frac{1}{3}} + 0.4 \times Q_{\frac{1}{2}} + 0.3 \times Q_{\frac{2}{3}}$

where $$Q_{\frac{1}{3}}$$ is the one-third quantile, $$Q_{\frac{1}{2}}$$ is the one-half quantile (i.e. median), and $$Q_{\frac{2}{3}}$$ is the two-thirds quantile.

gsl_stats_gastwirth_from_sorted_data(sorted_data)

This function returns the Gastwirth location estimator of sorted_data. The elements of the array must be in ascending numerical order. The function does not check whether the data are sorted, so gsl_sort() should always be called first.
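As an illustration of the weighted sum above, here is a minimal sketch in plain C. The helpers `quantile_sorted` and `gastwirth_sorted` are hypothetical names, not GSL functions. The quantile rule used here, linear interpolation between the two nearest order statistics at position $$f(n-1)$$, is an assumption made for the sketch; it is one common convention, and the library's quantile definition should be checked before comparing results.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a GSL function): quantile f in [0, 1] of an
 * ascending-sorted array, using linear interpolation between neighbours.
 * The interpolation rule is an assumption of this sketch. */
double quantile_sorted(const double *sorted, size_t n, double f)
{
    assert(n > 0 && f >= 0.0 && f <= 1.0);

    double pos = f * (double)(n - 1);   /* fractional index into the array */
    size_t lo = (size_t)pos;            /* truncation equals floor(), pos >= 0 */
    double frac = pos - (double)lo;

    if (lo + 1 >= n)                    /* f == 1 or a single-element array */
        return sorted[n - 1];

    return (1.0 - frac) * sorted[lo] + frac * sorted[lo + 1];
}

/* Hypothetical helper (not a GSL function): Gastwirth estimator as the
 * weighted sum 0.3 * Q(1/3) + 0.4 * Q(1/2) + 0.3 * Q(2/3). */
double gastwirth_sorted(const double *sorted, size_t n)
{
    return 0.3 * quantile_sorted(sorted, n, 1.0 / 3.0)
         + 0.4 * quantile_sorted(sorted, n, 0.5)
         + 0.3 * quantile_sorted(sorted, n, 2.0 / 3.0);
}
```

Because only the central order statistics enter the sum, replacing the largest value of {1, 2, 3, 4, 5, 6, 7} with 1000 leaves the estimate unchanged under this quantile rule, whereas the ordinary mean would shift substantially.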