Median and Percentiles¶
The median and percentile functions described in this section operate on sorted data in \(O(1)\) time. There is also a routine for computing the median of an unsorted input array in average \(O(n)\) time using the quickselect algorithm. For convenience we use quantiles, measured on a scale of 0 to 1, instead of percentiles (which use a scale of 0 to 100).
-
gsl_stats_median_from_sorted_data(data)¶
This function returns the median value of sorted_data.
The elements of the array
must be in ascending numerical order. There are no checks to see
whether the data are sorted, so the function gsl_sort()
should
always be used first.
When the dataset has an odd number of elements the median is the value of element \((n-1)/2\). When the dataset has an even number of elements the median is the mean of the two nearest middle values, elements \((n-1)/2\) and \(n/2\). Since the algorithm for computing the median involves interpolation this function always returns a floating-point number, even for integer data types.
-
gsl_stats_median(data)¶
This function returns the median value of data, a dataset The median is found using the quickselect algorithm. The input array does not need to be sorted.
-
gsl_stats_quantile_from_sorted_data(data, f)¶
This function returns a quantile value of sorted_data. The elements of the array must be in ascending numerical order. The quantile is determined by the f, a fraction between 0 and 1. For example, to compute the value of the 75th percentile f should have the value 0.75.
There are no checks to see whether the data are sorted, so the function
gsl_sort()
should always be used first.
The quantile is found by interpolation, using the formula
\[\hbox{quantile} = (1 - \delta) x_i + \delta x_{i+1}\]
where \(i\) is floor((n - 1)f)
and \(\delta\) is
\((n-1)f - i\).
Thus the minimum value of the array (data[1]
) is given by
f equal to zero, the maximum value (data[n]
) is
given by f equal to one and the median value is given by f
equal to 0.5. Since the algorithm for computing quantiles involves
interpolation this function always returns a floating-point number, even
for integer data types.