Nonparametric Estimation of Slope:
Sen's Method in Environmental Pollution

by J. Steven Brauner


Introduction
Options for Detection and Estimation of Trends
Sen's Method
Example Problem
References

Introduction

Detection of chemical concentration trends in environmental contaminants is a critical step in assessing the environmental condition of a given system. For example, positive identification of contaminant concentration trends can assist in both proving plume migration and demonstrating evidence of groundwater contaminant degradation. When considering degradation of a contaminant, conclusive demonstration of the movement of a particular contaminant (e.g. petroleum hydrocarbons) into an area, followed by a subsequent decrease in oxygen concentration (and other potential electron acceptors), while showing an increase in reaction by-products, may be needed to prove contaminant degradation. Each of these individual trends alone may not adequately support a claim of biologically based degradation, but the sequential appearance of a biodegradable contaminant, disappearance of an appropriate electron acceptor, and development of reaction end-products is increasingly being accepted as evidence of in situ remediation.

One of the difficulties encountered in the interpretation of environmental field data is the quantification of trends (e.g. calculation of slope) and the demonstration that the estimated trend is statistically different from zero. The focus here is on one nonparametric method for estimating slope, known simply as Sen's Nonparametric Estimator of Slope. Both the methodology and an example (using an artificial groundwater data set) for Sen's method are provided.



Options for Detection and Estimation of Trends

Several tests are available for the detection and/or quantification of trends. The first step in analyzing any data set, however, is to graph the data, usually as a function of time or location. Graphical representation of the data facilitates observation of general trends and cycles, which may assist in the selection of an appropriate statistical test.

Table 1 summarizes several methods for detecting and/or estimating trends. Each technique has advantages and disadvantages, so the type and volume of data collected should be examined carefully before selecting a particular technique.

Table 1. Methods for Detection and Estimation of Trends.

Test Procedure | Applicability | Notes | Reference(s)
Graphical Methods | Visual estimate of trend presence/absence | No quantifiable results |
Linear Regression | Provides an estimate of slope and confidence interval, and quantifies goodness of fit | Allows quantified estimate of influence of multiple independent variables; does not handle missing data; does not handle below-detection (BD) measurements; may be greatly affected by outliers and cyclic data |
Box-Jenkins Model | Test for trends in long-term, regularly spaced data | Requires large data set; requires constant temporal spacing of data | Box and Jenkins (1976)
Mann-Kendall | Yes/No test for existence of slope | Nonparametric test; allows missing data; allows reporting of BD levels; not affected by gross data errors and outliers | Mann (1945); Kendall (1980)
Sen's Method | Estimates value and confidence interval for slope | Allows missing data; makes no assumptions on distribution of data; not affected by gross data errors and outliers | Sen (1968); Theil (1950)
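
The contrast in Table 1 between linear regression ("may be greatly affected by outliers") and Sen's method ("not affected by gross data errors and outliers") can be illustrated with a small sketch. The series below is hypothetical, the function names are illustrative only, and the Sen estimate is taken as the median of all pairwise slopes, as described in the Sen's Method section.

```python
from itertools import combinations
from statistics import median

def least_squares_slope(times, values):
    """Ordinary least-squares slope of values against times."""
    t_mean = sum(times) / len(times)
    x_mean = sum(values) / len(values)
    sxy = sum((t - t_mean) * (x - x_mean) for t, x in zip(times, values))
    sxx = sum((t - t_mean) ** 2 for t in times)
    return sxy / sxx

def median_pairwise_slope(times, values):
    """Median of the slopes between every pair of points."""
    points = list(zip(times, values))
    return median((xj - xi) / (tj - ti)
                  for (ti, xi), (tj, xj) in combinations(points, 2))

# Hypothetical series: a steady rise of 2 units per period, then one gross error.
times = [1, 2, 3, 4, 5, 6, 7]
values = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 100.0]

ols = least_squares_slope(times, values)
sen = median_pairwise_slope(times, values)
print(f"least squares: {ols:.2f}, median pairwise slope: {sen:.2f}")
```

In this made-up series the single erroneous value pulls the least-squares slope to roughly 11, while the median of the pairwise slopes stays at 2, the rate shared by the other six points.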



Sen's Method

Sen's method for the estimation of slope requires a time series of equally spaced data. Sen's method proceeds by calculating the slope as a change in measurement per change in time, as shown here in Equation (1) and Table 2 for the simple case of one data measurement per time spacing.


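$$Q = \frac{X_{i'} - X_i}{i' - i}, \qquad i' > i \tag{1}$$

where X_{i'} and X_i are the measurements at times i' and i, respectively.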

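Table 2. Data manipulation for Sen's method with one data measurement per time period.

Time | 1 | 2 | 3 | ... | T-1 | T
Data | X1 | X2 | X3 | ... | X(T-1) | XT
Slopes | | (X2-X1)/(2-1) | (X3-X1)/(3-1) | ... | (X(T-1)-X1)/(T-2) | (XT-X1)/(T-1)
 | | | (X3-X2)/(3-2) | ... | (X(T-1)-X2)/(T-3) | (XT-X2)/(T-2)
 | | | | | : | (XT-X3)/(T-3)
 | | | | | (X(T-1)-X(T-2))/1 | :
 | | | | | | (XT-X(T-2))/2
 | | | | | | (XT-X(T-1))/1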
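
The layout of Table 2 can be sketched in a few lines of Python. This is a minimal illustration, assuming one measurement per equally spaced time period; the function name and the four sample values are hypothetical.

```python
from itertools import combinations

def pairwise_slopes(values):
    """Slopes (X_j - X_i) / (j - i) for every pair of time periods i < j.

    values[0] is the measurement at time 1, values[1] at time 2, and so on.
    """
    indexed = list(enumerate(values, start=1))
    return [(xj - xi) / (tj - ti)
            for (ti, xi), (tj, xj) in combinations(indexed, 2)]

# Hypothetical measurements at times 1 through 4.
slopes = pairwise_slopes([10.0, 8.0, 9.0, 5.0])
print(slopes)
```

For T time periods this produces T(T-1)/2 slopes, six in this case, one for each cell of the triangular layout in Table 2.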

If multiple measurements are recorded for a given time step, two options exist. The first option is to simply combine the measurements for a given time step into a single measure of central tendency (e.g. mean, median) and proceed as above. The second option is to calculate a slope for each individual measurement, as shown in Table 3 below. Note that the slope between measurements collected at the same time is not calculated.

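Table 3. Data manipulation for Sen's method with multiple data measurements per time period. NC = not calculated (measurements collected at the same time).

Time | 1 | 1 | 2 | 2 | 2 | 3 | ... | T
Data | X1,1 | X1,2 | X2,1 | X2,2 | X2,3 | X3,1 | ... | XT,J
Slopes | | NC | (X2,1-X1,1)/(2-1) | (X2,2-X1,1)/(2-1) | (X2,3-X1,1)/(2-1) | (X3,1-X1,1)/(3-1) | ... | (XT,J-X1,1)/(T-1)
 | | | (X2,1-X1,2)/(2-1) | (X2,2-X1,2)/(2-1) | (X2,3-X1,2)/(2-1) | (X3,1-X1,2)/(3-1) | ... | (XT,J-X1,2)/(T-1)
 | | | | NC | NC | (X3,1-X2,1)/(3-2) | ... | :
 | | | | | NC | (X3,1-X2,2)/(3-2) | ... | :
 | | | | | | (X3,1-X2,3)/(3-2) | ... | :
 | | | | | | | | (XT,J-XT-1,J-1)/1
 | | | | | | | | NC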
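
Both options for handling multiple measurements per time step can be sketched as follows. This is an illustrative sketch only: the function names and sample observations are hypothetical, and the median is used as the measure of central tendency for the first option.

```python
from collections import defaultdict
from itertools import combinations
from statistics import median

def collapse_to_median(observations):
    """Option 1: replace the measurements at each time with their median."""
    by_time = defaultdict(list)
    for t, x in observations:
        by_time[t].append(x)
    return [(t, median(xs)) for t, xs in sorted(by_time.items())]

def slopes_between_times(observations):
    """Option 2: slopes for every pair of measurements taken at different times."""
    slopes = []
    for (ti, xi), (tj, xj) in combinations(sorted(observations), 2):
        if ti == tj:
            continue  # same time step: "NC" (not calculated) in Table 3
        slopes.append((xj - xi) / (tj - ti))
    return slopes

# Hypothetical (time, value) observations with repeated sampling at times 1 and 2.
observations = [(1, 4.0), (1, 6.0), (2, 3.0), (2, 5.0), (3, 2.0)]
collapsed = collapse_to_median(observations)
individual = slopes_between_times(observations)
print(collapsed)
print(len(individual))
```

With five observations there are ten possible pairs; the two same-time pairs are skipped, leaving eight slopes under the second option.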

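Upon calculation of the slopes by either method outlined above, Sen's estimator of slope is simply the median slope, Q':

$$Q' = \begin{cases} Q_{[(N'+1)/2]} & \text{if } N' \text{ is odd} \\ \tfrac{1}{2}\left(Q_{[N'/2]} + Q_{[(N'+2)/2]}\right) & \text{if } N' \text{ is even} \end{cases} \tag{2}$$

where N' is the number of calculated slopes and Q_{[k]} is the k-th slope after ranking the slopes from smallest to largest.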
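
Equation 2 translates directly into code once the slopes are ranked. The sketch below spells out the odd and even cases with 1-based ranks converted to Python's 0-based indices; the function name and sample slopes are hypothetical, and the result agrees with the standard library's `statistics.median`.

```python
import statistics

def sen_slope(slopes):
    """Median slope Q' following the odd/even cases of Equation 2."""
    ranked = sorted(slopes)
    n_slopes = len(ranked)          # N' in Equation 2
    if n_slopes % 2 == 1:
        return ranked[(n_slopes + 1) // 2 - 1]
    lower_middle = ranked[n_slopes // 2 - 1]
    upper_middle = ranked[(n_slopes + 2) // 2 - 1]
    return (lower_middle + upper_middle) / 2

# Hypothetical set of six pairwise slopes (N' is even).
example_slopes = [-2.0, -0.5, -5 / 3, 1.0, -1.5, -4.0]
q_prime = sen_slope(example_slopes)
print(q_prime)
```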

Sen's method also allows determination of whether the median slope is statistically different from zero. A confidence interval is developed by estimating the ranks of the lower and upper confidence limits and using the slopes corresponding to these ranks to define the confidence interval for Q'. For a two-sided confidence interval about the median slope, first find the Z statistic for a two-tailed test on the normal distribution. For example, if a two-sided confidence interval of 95% is desired, find Z(1-0.05/2) = Z0.975 = 1.96. Next, estimate the variance of the Mann-Kendall statistic, VAR(S), as developed by Kendall (1975):

$$\mathrm{VAR}(S) = \frac{1}{18}\left[ n(n-1)(2n+5) - \sum_{p=1}^{g} t_p (t_p - 1)(2 t_p + 5) \right] \tag{3}$$

where n is the number of data points, g is the number of tied groups, and t_p is the number of data points in the p-th tied group.

Gilbert (1987) notes that Equation 3 is valid for all n>40, while Kendall indicates that Equation 3 may be used for n between 10 and 40 as long as there are not many tied data pairs. To estimate the range of ranks for the specified confidence interval, find C using:

$$C = Z_{1-\alpha/2} \sqrt{\mathrm{VAR}(S)} \tag{4}$$

Using the value of C from Equation 4, find the ranks of the lower (M1) and upper (M2 + 1) confidence limits using:

$$M_1 = \frac{N' - C}{2}, \qquad M_2 = \frac{N' + C}{2} \tag{5}$$

Finally, choose the slopes ranked M1 and M2+1 among the N' ranked slopes as the lower and upper confidence limits, respectively. The median slope is then defined as statistically different from zero (for the selected confidence interval) if zero does not lie between the lower and upper confidence limits.
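
The confidence interval procedure can be sketched as below. Several points are assumptions of this sketch rather than statements from the text: VAR(S) is computed with the commonly used tie-corrected form, C is taken as Z times the square root of VAR(S), the ranks are M1 = (N' - C)/2 and M2 = (N' + C)/2, and fractional ranks are resolved by linear interpolation between adjacent ranked slopes. The data values and function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from statistics import NormalDist

def mann_kendall_variance(values):
    """VAR(S) with a correction for groups of tied values (assumed form)."""
    n = len(values)
    ties = sum(t * (t - 1) * (2 * t + 5) for t in Counter(values).values())
    return (n * (n - 1) * (2 * n + 5) - ties) / 18

def value_at_rank(ranked, rank):
    """Slope at a 1-based, possibly fractional rank (linear interpolation)."""
    rank = min(max(rank, 1), len(ranked))  # clamp to the available ranks
    low = math.floor(rank)
    high = math.ceil(rank)
    return ranked[low - 1] + (rank - low) * (ranked[high - 1] - ranked[low - 1])

def sen_confidence_limits(values, confidence=0.95):
    """Lower and upper limits for Sen's slope, one value per equal time step."""
    ranked = sorted((xj - xi) / (j - i)
                    for (i, xi), (j, xj) in combinations(enumerate(values), 2))
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    c = z * math.sqrt(mann_kendall_variance(values))
    m1 = (len(ranked) - c) / 2
    m2 = (len(ranked) + c) / 2
    return value_at_rank(ranked, m1), value_at_rank(ranked, m2 + 1)

# Hypothetical series of ten equally spaced measurements.
values = [10.0, 9.1, 8.7, 7.9, 7.2, 6.8, 6.1, 5.5, 5.2, 4.4]
lower, upper = sen_confidence_limits(values)
significant = not (lower <= 0 <= upper)
print(lower, upper, significant)
```

Clamping the ranks keeps the sketch from failing on very short series, but an interval that hits the smallest or largest slope should be read as unresolved rather than as a true limit.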

Potential Difficulties

Missing Data: For Sen's test, simply do not calculate a slope for the missing data point (making sure not to count missing data points in with the total number of samples, n). If large amounts of data are missing, Sen's method is not recommended.

No Detection (ND) or Trace Data: Sen's method may still be used to estimate a median slope if the number of ND measurements is less than (n-1)/2, but the presence of ND measurements may severely limit the ability to estimate a confidence interval about this estimate. Gilbert (1987) recommends setting ND measurements equal to one half of the detection limit and proceeding with the calculation of individual slopes.
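
The two data-preparation rules above can be sketched as a small preprocessing step. The encoding is an assumption of this sketch: missing values are represented by `None`, non-detects by the string "ND", and the detection limit and sample values are hypothetical.

```python
def prepare_series(raw, detection_limit):
    """Return (time, value) pairs ready for slope calculation.

    Missing values (None) are dropped so they are not counted in n.
    "ND" entries are replaced by half the detection limit.
    """
    prepared = []
    for time, value in enumerate(raw, start=1):
        if value is None:
            continue  # missing: no slope is calculated for this time
        if value == "ND":
            value = detection_limit / 2
        prepared.append((time, float(value)))
    return prepared

def nd_count_acceptable(raw):
    """True if the number of ND values is less than (n - 1) / 2."""
    present = [value for value in raw if value is not None]
    nd_count = sum(1 for value in present if value == "ND")
    return nd_count < (len(present) - 1) / 2

# Hypothetical record with one missing value and two non-detects.
raw = [4.0, None, 3.1, "ND", 2.2, "ND", 1.5]
series = prepare_series(raw, detection_limit=1.0)
print(series)
print(nd_count_acceptable(raw))
```

Keeping the original time index for each retained value means the slope denominators still reflect the true spacing between sampling events.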



Example 1. Application of Sen's Method

For this example, let us assume that we are trying to prove that a toxic subsurface contaminant is degrading in the presence of oxygen. Let us also assume that the biochemical degradation reaction produces a known reaction end product. Given the fictitious field measurements shown below (Figure 1) for the concentration of contaminant, dissolved oxygen, and reaction end product, show that the following trends are statistically significant: (1) the contaminant concentration is decreasing, (2) the dissolved oxygen concentration is decreasing, and (3) the reaction end product concentration is increasing.

Figure 1. Animated graphical representation of concentration data.

Step 1. Calculate the slope for each pair of data points.

For Sen's method, a slope is shown to be statistically different from zero if zero does not lie within a two-sided confidence interval about the median slope estimate. For this example, Equation 1 or the first equation in Table 2 may be used to calculate the slope for the change in contaminant concentration from March 1991 (Time = 1) to June 1991 (Time = 2). The slope is given by:

.

Similarly, the slopes for the December 1991 data set (Column 6 of Table 2) are calculated as:

.

Note that the data reported in Figure 1 are provided on a quarterly basis. The temporal spacing of the data has therefore been translated into equal time periods (each corresponding to one quarter of a year), and the slopes have units of mg/L per quarter-year. If the dates of collection had been provided, individual slopes could have been calculated as change in concentration per day.
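
If collection dates were available, the per-day slope mentioned above could be computed along the following lines. The dates and concentrations here are hypothetical placeholders, not values from Figure 1, and the function name is illustrative.

```python
from datetime import date

def slope_per_day(date_a, value_a, date_b, value_b):
    """Change in concentration per day between two sampling events."""
    days = (date_b - date_a).days
    if days == 0:
        raise ValueError("no slope is calculated for samples taken on the same day")
    return (value_b - value_a) / days

# Hypothetical sampling dates and concentrations (mg/L).
first = date(1991, 3, 15)
second = date(1991, 6, 15)
slope = slope_per_day(first, 12.0, second, 10.0)
print((second - first).days, slope)
```

Using actual dates gives slopes in mg/L per day and removes the need to treat each quarter as exactly one time unit.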

Continued calculation of individual slopes leads to the values compiled for the contaminant in Table 4, dissolved oxygen in Table 5, and the reaction end product in Table 6. Note that the red values in these tables indicate a negative (decreasing) slope, the blue values indicate a positive (increasing) slope, and the black values indicate a slope of zero.

Table 4. Calculation of slopes for contaminant.

Table 5. Calculation of slopes for dissolved oxygen.

Table 6. Calculation of slopes for reaction end product.

Step 2. Rank the calculated slopes (listed in Tables 4, 5, & 6). Determine the median slope, Q', for each compound.


Step 3. Test Sen's Estimate of Slope for statistical difference from zero for a 90% confidence interval.
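
For the 90% interval used in this step, the significance level is 0.10, so the Z value differs from the 95% case used as the earlier illustration. As a short worked form, assuming the same relations for C and for the ranks M1 and M2 + 1 that are used with Equations 4 and 5:

```latex
Z_{1-\alpha/2} = Z_{1-0.10/2} = Z_{0.95} \approx 1.645
\qquad
C = 1.645\,\sqrt{\mathrm{VAR}(S)}
\qquad
M_1 = \frac{N' - C}{2}, \quad M_2 + 1 = \frac{N' + C}{2} + 1
```

Because 1.645 is smaller than 1.96, the 90% interval spans fewer ranks than the 95% interval would for the same data.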

Step 4. Results of Example.

As shown in Table 7, the median slopes for the contaminant and dissolved oxygen are negative (decreasing) and the median slope for the reaction end product is positive (increasing). Furthermore, these slopes are shown to be different from zero for a 90 percent confidence interval. Thus, all three of the original hypotheses have been met, and the conclusion can be made that the contaminant is degrading while dissolved oxygen is decreasing and the reaction end product is increasing.
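
The decision rule applied in this step can be written as a small helper. The intervals below are hypothetical placeholders chosen only to show the three possible outcomes; they are not the values of Table 7.

```python
def classify_trend(lower, upper):
    """Classify a trend from the confidence limits on the median slope."""
    if lower > 0:
        return "increasing"
    if upper < 0:
        return "decreasing"
    return "no significant trend"  # zero lies within the interval

# Hypothetical confidence limits (mg/L per quarter-year).
intervals = {
    "series A": (-1.8, -0.4),
    "series B": (0.2, 0.9),
    "series C": (-0.3, 0.5),
}
results = {name: classify_trend(lo, hi) for name, (lo, hi) in intervals.items()}
print(results)
```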



References

  1. Box, G.E.P. and G.M. Jenkins. 1976. Time Series Analysis: Forecasting and Control, 2nd Edition. Holden-Day, San Francisco.
  2. Gilbert, R.O. 1987. Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold Company, Inc., New York.
  3. Hollander, M. and D.A. Wolfe. 1973. Nonparametric Statistical Methods. John Wiley & Sons, New York.
  4. Sen, P.K. 1968. Estimates of the regression coefficient based on Kendall's tau. Journal of the American Statistical Association. 63:1379-1389.
  5. Theil, H. 1950. A rank-invariant method of linear and polynomial regression analysis, Part 3. Proceedings of the Koninklijke Nederlandse Akademie van Wetenschappen A. 53:1397-1412.



Send comments or suggestions to:
Student Author: J. Steven Brauner, sbrauner@vt.edu
Faculty Advisor: Daniel Gallagher, dang@vt.edu
Copyright © 1997 Daniel Gallagher
Last Modified: 2May1997