Equation 1 describes the expected concentration, C(d), at a distance, d,
away from a source, where C(0) is the concentration at the source.
The hyperparameter, *α _{E}*, is the *exponential decay range*: the
distance at which the concentration from the source has been reduced by approximately 95%.

$$C(d) = C(0)\,\exp\!\left(-\frac{3d}{\alpha_E}\right) \qquad (1)$$
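As a quick numerical check of the decay behaviour described above, the sketch below evaluates a single-source decay curve. It assumes the common exponential form C(d) = C(0)·exp(−3d/α_E), in which the factor 3 makes the concentration fall by about 95% at d = α_E. The function name `edc` and the example distances are illustrative and do not come from the original.

```python
import math

def edc(d, c0, alpha_e):
    """Exponentially decaying contribution at distance d from one source.

    Assumed form: C(d) = C(0) * exp(-3 * d / alpha_e). The factor 3 is an
    assumption chosen so that about 95% of C(0) is gone at d = alpha_e
    (exp(-3) is roughly 0.05).
    """
    return c0 * math.exp(-3.0 * d / alpha_e)

# Fraction of the source concentration remaining at several distances (km)
# for a hypothetical 2 km exponential decay range.
for d in (0.0, 1.0, 2.0, 4.0):
    print(d, round(edc(d, c0=1.0, alpha_e=2.0), 4))
```

Under this assumed form, doubling α_E doubles the distance needed to reach the same fractional reduction.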

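When there are N source nodes, the concentrations observed at M sampling points
(*i*=1,…,M) can be expressed as a sum of exponentially decaying contributions (SEDC),
with one EDC from each *j ^{th}* source with initial concentration *C _{0j}*
located at distance *d _{ij}* from sampling point *i*:

$$C_i = \sum_{j=1}^{N} C_{0j}\,\exp\!\left(-\frac{3d_{ij}}{\alpha_E}\right), \quad i = 1, \dots, M \qquad (2)$$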
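The sum over sources can be sketched in a few lines of Python. This is a minimal illustration that assumes Euclidean distances in kilometers and the exponential form c0·exp(−3d/α_E) for each source. The function name `sedc`, the coordinates, and the unit source strengths are hypothetical.

```python
import math

def sedc(sample_xy, source_xy, c0, alpha_e):
    """Sum of exponentially decaying contributions at each sampling point.

    sample_xy: list of (x, y) sampling locations in km (M points)
    source_xy: list of (x, y) source locations in km (N sources)
    c0:        list of N source concentrations
    Assumes each source contributes c0[j] * exp(-3 * d_ij / alpha_e).
    """
    totals = []
    for sx, sy in sample_xy:
        total = 0.0
        for (qx, qy), c in zip(source_xy, c0):
            d_ij = math.hypot(sx - qx, sy - qy)  # Euclidean distance in km
            total += c * math.exp(-3.0 * d_ij / alpha_e)
        totals.append(total)
    return totals

# Hypothetical layout: two unit-strength sources and three sampling points.
sources = [(0.0, 0.0), (4.0, 0.0)]
samples = [(0.0, 0.0), (2.0, 0.0), (4.0, 3.0)]
result = sedc(samples, sources, c0=[1.0, 1.0], alpha_e=2.0)
print(result)
```

The same function can be evaluated on a regular grid of points instead of the sampling locations to produce a map of the estimated concentration.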

The *exponential decay range* may be obtained with experimental data;
however, this can be costly when the number of source nodes, N, is large.
Therefore, methods for estimating the *exponential decay range* can be useful.
Messier et al. (2014) estimate the *exponential decay range* by using *Q* different
values for the hyperparameter to construct *Q* different predictor variables. Each of these
predictors can be used in a regression against observations from sampling data.
The *q ^{th}* hyperparameter leading to the best-fitting regression is then selected.

However, simulations of sources, observations, and different hyperparameters suggest that maximizing the coefficient associated with the predictor variable is often equivalent to maximizing the fit of the model (as seen in the demonstration below), so long as the constructed predictor variable is standardized (e.g., with the interquartile range method or a Z-score). The coefficient can be left as a standardized coefficient or, for some data, expressed as a risk ratio (RR), an abundance ratio (AR), or a relative abundance ratio (RAR), depending on the measurement and transformation of the response data. For example, if the response y is binary (1 or 0) and is modeled on the log scale, the RR can be obtained by exponentiating the coefficient. If the response is the relative abundance of a species with a log10 transformation, the RAR can be obtained by raising 10 to the power of the coefficient.
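The standardization and back-transformation steps can be illustrated with a short sketch. The helper names below are hypothetical. The interquartile range scaling shown here (centering on the median and dividing by Q3 − Q1) is one common reading of the "interquartile range method" and is an assumption, not necessarily the exact procedure used in the original.

```python
import math
import statistics

def zscore(values):
    """Center on the mean and divide by the sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def iqr_scale(values):
    """Center on the median and divide by the interquartile range (assumed variant)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    median = statistics.median(values)
    return [(v - median) / (q3 - q1) for v in values]

def ratio_from_ln(beta):
    """Ratio per one-unit increase in the predictor for a natural-log response."""
    return math.exp(beta)

def ratio_from_log10(beta):
    """Ratio per one-unit increase in the predictor for a log10 response."""
    return 10.0 ** beta

x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(zscore(x))
print(iqr_scale(x))
print(ratio_from_log10(0.25))  # e.g. an RAR for a coefficient of 0.25
```

With a standardized predictor, the resulting ratio is read as the change per one standard deviation (or per one interquartile range) of the predictor.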

In the demonstration below, the response data, y, were created using the following linear model:

$$y = \beta_0 + \beta_1 x_1 + \varepsilon \qquad (3)$$

Here *β*_{0} is the intercept, which has been set to 0.2, and
*β*_{1} is the coefficient, which has been set to 0.25.
The constructed predictor variable, x_{1}, represents
the concentrations at the sampling locations, modeled using the SEDC from
7 source nodes of the same source type, with a known true *exponential decay range* in each of two
scenarios. In the first scenario, the true *exponential decay range* is 2 km, and in the
second scenario it is 4 km. A random error term, *ε*, has also been simulated.
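A simulation along these lines might look like the sketch below. Only the intercept (0.2), the coefficient (0.25), the 7 sources, the 20 sampling locations, and the true ranges (2 km and 4 km) come from the text. The study-area size, the random locations, the unit source strengths, the Gaussian noise with standard deviation 0.05, and the use of an unstandardized x1 are all assumptions made for illustration.

```python
import math
import random

def sedc(sample_xy, source_xy, alpha_e):
    """SEDC at each sampling point, assuming unit-strength sources and
    contributions of the form exp(-3 * d / alpha_e)."""
    return [
        sum(math.exp(-3.0 * math.hypot(sx - qx, sy - qy) / alpha_e)
            for qx, qy in source_xy)
        for sx, sy in sample_xy
    ]

rng = random.Random(42)   # fixed seed so the sketch is reproducible
AREA_KM = 10.0            # hypothetical square study area, 10 km x 10 km

# 7 source nodes and 20 sampling locations at random (hypothetical) positions.
sources = [(rng.uniform(0, AREA_KM), rng.uniform(0, AREA_KM)) for _ in range(7)]
samples = [(rng.uniform(0, AREA_KM), rng.uniform(0, AREA_KM)) for _ in range(20)]

def simulate_response(true_alpha, beta0=0.2, beta1=0.25, noise_sd=0.05):
    """Return (x1, y) with y = beta0 + beta1 * x1 + Gaussian error."""
    x1 = sedc(samples, sources, true_alpha)
    y = [beta0 + beta1 * x + rng.gauss(0.0, noise_sd) for x in x1]
    return x1, y

x1_2km, y_2km = simulate_response(2.0)  # scenario 1: true range of 2 km
x1_4km, y_4km = simulate_response(4.0)  # scenario 2: true range of 4 km
print(y_2km[:3], y_4km[:3])
```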

GOAL: estimate these *exponential decay range*s for each scenario without a priori
knowledge of the true *exponential decay range* using a linear model.

If helpful, you can imagine that these sources are sprinklers which spray swine manure onto fields. In this example we might be trying to measure the relative abundance of swine fecal matter in the air at 20 different sampling locations. These estimates can be used to build exposure maps that help characterize the potential microbial exposures of nearby residents.

We can estimate the SEDC continuously across our whole study area by using an estimation grid.
We can also construct our predictor variable, x1, by estimating at the sampling locations.
We can then run a regression of our constructed predictor variable, x1, against our measured
observations, or response data, y, at our 20 sampling sites. We can keep track of the fit
by measuring R-squared and we can keep track of the RAR. By doing so, we can find the optimal
hyperparameter value to describe the *exponential decay range*.

Below are two examples using simulated data. In the top example, the exponential
decay range was set to 2 km, and in the bottom example it was set to 4 km. The locations of
the sources are fixed; however, the way we model the contributions from those sources changes
as we vary the hyperparameter *α _{E}*. With our model, we can find the sum of contributions at
any point in our study area and make a map of this estimate by interpolating values from an
estimation grid. Alternatively, we can estimate the contributions at the sampling
locations, rather than at the estimation grid locations, in order to construct our predictor
variable. We can then compare those contributions to our observations at those sampling
locations using regression. Through regression we obtain coefficients and, with a little
extra work, estimates of fit (e.g., R-squared or AIC).
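For a single predictor, the regression step can be sketched with ordinary least squares written out by hand. The function name `ols_fit` and the toy data are illustrative; a statistics package would normally be used in practice.

```python
def ols_fit(x, y):
    """Simple linear regression of y on x.

    Returns (intercept, slope, r_squared) from ordinary least squares.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return intercept, slope, 1.0 - ss_res / ss_tot

# Toy data lying exactly on y = 0.2 + 0.25 * x.
x_demo = [0.0, 1.0, 2.0, 3.0]
y_demo = [0.2, 0.45, 0.7, 0.95]
b0, b1, r2 = ols_fit(x_demo, y_demo)
print(b0, b1, r2)
```

The slope is available before the residuals are computed, which is the "one more step" that separates the coefficient from the R-squared.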

We can compare the R-squared values associated with this regression for different hyperparameter values. The hyperparameter that corresponds to the best fit is the most predictive value, and the hyperparameter corresponding to the largest coefficient is the most physically meaningful value. These are essentially the same; however, the coefficient method can be slightly less computationally expensive because estimating the R-squared is one more step.

Essentially, this is the maximization of an objective function, where the objective function is a function of either the regression error or the coefficient values. This can be done numerically by evaluating a set of candidate values, as shown below, or with less computationally expensive algorithms (e.g., gradient descent).
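The numerical search over candidate hyperparameters can be sketched as a simple grid search. This is an illustration under stated assumptions, not the original implementation: the source and sampling locations, the noise level, the candidate grid, the Z-score standardization, and all function names are hypothetical, and each source is assumed to contribute exp(−3d/α_E).

```python
import math
import random

def sedc(sample_xy, source_xy, alpha_e):
    """SEDC at each sampling point, assuming unit-strength sources."""
    return [
        sum(math.exp(-3.0 * math.hypot(sx - qx, sy - qy) / alpha_e)
            for qx, qy in source_xy)
        for sx, sy in sample_xy
    ]

def zscore(values):
    """Standardize to mean 0 and sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def ols_fit(x, y):
    """Return (intercept, slope, r_squared) for a one-predictor regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return intercept, slope, 1.0 - ss_res / ss_tot

def estimate_decay_range(sample_xy, source_xy, y, candidates):
    """Evaluate every candidate range; return the best by R-squared,
    the best by standardized coefficient, and the full table of results."""
    rows = []
    for alpha in candidates:
        x1 = zscore(sedc(sample_xy, source_xy, alpha))
        _, slope, r2 = ols_fit(x1, y)
        rows.append((alpha, slope, r2))
    best_by_r2 = max(rows, key=lambda row: row[2])[0]
    best_by_coef = max(rows, key=lambda row: row[1])[0]
    return best_by_r2, best_by_coef, rows

# Hypothetical simulated data: 7 sources, 20 sampling sites, true range 2 km.
rng = random.Random(0)
sources = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(7)]
samples = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(20)]
y = [0.2 + 0.25 * x + rng.gauss(0.0, 0.02) for x in sedc(samples, sources, 2.0)]

candidates = [0.5 * k for k in range(1, 17)]  # 0.5 km, 1.0 km, ..., 8.0 km
best_r2, best_coef, rows = estimate_decay_range(samples, sources, y, candidates)
print("best by R-squared:", best_r2, "| best by coefficient:", best_coef)
```

With a Z-scored predictor, the fitted slope equals the correlation between the predictor and the response multiplied by the standard deviation of the response, so ranking candidates by slope matches ranking by R-squared whenever that correlation is positive. This offers one way to see why the two criteria tend to agree.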

Use the toggle located under the figures to see how different values of *α _{E}* change the
figures on the bottom showing the EDC, SEDC, the constructed predictor variable x1, and the resulting R-squared and RAR on the right.

[Interactive figure controls: true *α _{E}* (2 km or 4 km) and estimation *α _{E}* (kilometers).]

Messier, K.P., Kane, E., Bolich, R. and Serre, M.L. 2014. Nitrate variability in groundwater of North Carolina using monitoring and private well data models. Environmental Science & Technology 48(18), pp. 10804–10812.