Although dose-response curves are widely used as efficacy readouts in the life sciences, methods are needed to improve quality control for bioassay dose-response curves. In this report, we propose constructing simultaneous prediction intervals for dose-response curves as a quality control tool: an interval estimate that contains future curves with a predetermined probability. Without fitting curve parameters, we used the sample means, variances, and covariances of the responses at the various doses to construct an ellipsoidal prediction region with a multivariate technique and a prediction band with the Studentized maximum modulus technique. Based on our simulation results, the prediction region is more applicable when the responses are correlated across doses, whereas the prediction band is more applicable in the absence of such correlations.
Keywords: Dose-response curves; Prediction region; Prediction band; Bioassay; Quality control
Dose-response curves are widely used to assess pharmacological, radiological, or toxicological effects in medical research and clinical applications. A dose-response design can provide evidence of causal effects between exposure/treatment and response; however, such conclusions rely heavily on the quality of the dose-response curves. Thus, there is considerable interest in developing analytical tools to control curve quality.
In common practice, dose-response curves are assumed to follow a parametric family, such as linear, exponential, or sigmoid functions. An advantage of this parametric approach is that the curve can be summarized by a single metric, such as the half maximal effective concentration (EC50) or the half maximal inhibitory concentration (IC50), obtained by fitting a linear or nonlinear model [1-5]. The curve-fitting approach is usually suitable for dose-response data whose points connect smoothly and whose curve patterns are homogeneous; however, dose-response curves generated from human specimens often vary dramatically because of data heterogeneity. In such cases, it is not appropriate to summarize dose-response curves by model fitting, and when curves cannot be approximated by a parametric model, empirical summaries of the dose-response data are preferable. A frequently used metric for such dose-response curves is the area under the curve (AUC) [6-8]. AUC is commonly computed with the trapezoid rule: adjacent data points on the dose-response curve are connected with straight line segments, and the area under the resulting polygon approximates the area that would be obtained by integrating the curve. This method has intuitive appeal and is easy to implement; however, it overestimates the area where the curve is concave upward (convex), because each chord lies above the curve, and underestimates it where the curve is concave downward. Furthermore, dose-response curves with different patterns may share the same AUC. Thus, whole dose-response curves, rather than summary metrics, are needed to assess curve quality. Herein, we propose constructing simultaneous prediction intervals from whole dose-response curves as an analytical tool for dose-response quality control.
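A minimal sketch of the trapezoid-rule AUC described above, written in Python with NumPy (our choice; the report does not name any software). The doses and curves are hypothetical: y = x² (concave upward) and y = √x (concave downward) on [0, 1], whose exact areas are 1/3 and 2/3, plus two differently shaped three-dose curves that share the same AUC.

```python
import numpy as np

def auc_trapezoid(x, y):
    """Trapezoid-rule area under the polygon joining the points (x_i, y_i)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Each segment contributes (x_{i+1} - x_i) * (y_i + y_{i+1}) / 2.
    return float(np.sum(np.diff(x) * (y[:-1] + y[1:]) / 2.0))

doses = np.linspace(0.0, 1.0, 5)  # hypothetical doses 0, 0.25, ..., 1

# Concave upward (convex): chords lie above the curve, so the rule overestimates 1/3.
convex_auc = auc_trapezoid(doses, doses**2)
# Concave downward: chords lie below the curve, so the rule underestimates 2/3.
concave_auc = auc_trapezoid(doses, np.sqrt(doses))
print(f"convex:  {convex_auc:.4f} vs exact {1 / 3:.4f}")
print(f"concave: {concave_auc:.4f} vs exact {2 / 3:.4f}")

# Two different hypothetical curve shapes with the same trapezoid AUC.
rising = auc_trapezoid([0, 1, 2], [0.0, 1.0, 2.0])
peaked = auc_trapezoid([0, 1, 2], [0.0, 2.0, 0.0])
print(rising, peaked)
```

The rising and peaked curves both give an AUC of 2.0, which is the kind of information loss that motivates working with whole curves.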
Quality control systems are implemented to control every step that might introduce assay variation. For every new test condition, dose-response curves are generated to calibrate variations, and the test condition is adjusted accordingly to meet predefined standard criteria. A prediction region or band constructed from a group of standardized dose-response curves can then be used to predict whether dose-response curves generated under the new test condition belong to that group. This approach serves as a quality control validation measure for both systematic errors, such as changes in experimental conditions, machine calibration, and protocol modifications, and random errors, such as those arising from sample preparation and technician operations. A commonly used model-free method builds a separate prediction interval for the responses at each individual dose. This approach is simple and easy to implement; however, it ignores the correlation of responses across doses, and because each interval controls the error rate only at its own dose, the probability that an entire new curve falls within all of the intervals is generally lower than the nominal level, which can lead to misinterpretation of the results. To improve on individual prediction intervals, we developed two methods based on simultaneous prediction intervals. Specifically, we took a multivariate approach to construct simultaneous prediction regions, and we extended simultaneous confidence bands to simultaneous prediction bands using the Studentized maximum modulus technique, applying both to dose-response curves [9-10].
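To make the coverage argument concrete, the sketch below assumes a hypothetical setting: k = 8 independent doses, 30 standard curves drawn from N(0, 1) at each dose, and a separate 95% t-based prediction interval at each dose. With independent doses, the probability that a new curve falls inside all k intervals is (1 − α)^k, about 0.66 for k = 8, and a short simulation agrees.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
k, n, alpha, reps = 8, 30, 0.05, 4000  # hypothetical doses, curves, level, replicates

# With independent doses the k per-dose events are independent, so the
# whole-curve coverage is the product of the k marginal coverages.
analytic = (1 - alpha) ** k

t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
hits = 0
for _ in range(reps):
    standard = rng.standard_normal((n, k))  # n standard curves x k doses
    new_curve = rng.standard_normal(k)      # one future curve
    mean = standard.mean(axis=0)
    sd = standard.std(axis=0, ddof=1)
    # Separate (non-simultaneous) 95% prediction interval at each dose.
    half_width = t_crit * sd * np.sqrt(1 + 1 / n)
    hits += int(np.all(np.abs(new_curve - mean) <= half_width))

simulated = hits / reps
print(f"per-dose level {1 - alpha:.2f}; whole-curve coverage "
      f"{analytic:.3f} (analytic), {simulated:.3f} (simulated)")
```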
Generally, the responses ($y$) can be expressed as a function of the doses ($x$), i.e., $y=f(x)$. Let ${y}_{i1},{y}_{i2},\ldots,{y}_{i{n}_{i}}$ be the responses collected at dose ${x}_{i}$, $i=1,2,\ldots,k$, with ${y}_{ij}\ge 0$, where $j=1,2,\ldots,{n}_{i}$ indexes the study subjects. Without reducing the dose-response curve to a single parameter such as EC50 or AUC, interval estimates for predicting dose-response curves can be computed either by treating the responses $y$ across the doses $x$ as correlated $k$-dimensional variables or by treating each response ${y}_{ij}$ given ${x}_{i}$ as a one-dimensional variable. For $k$-dimensional responses, a prediction region is built using a multivariate approach, whereas for one-dimensional responses, a prediction band is built using the Studentized maximum modulus technique [10,11].
Prediction region: A multivariate prediction region is a simultaneous interval estimate, obtained by constructing a region that has a $100(1-\alpha)\%$ probability of containing the next dose-response curve or, more generally, the sample mean vector $M$ of the next $r$ dose-response curves. The next $r$ dose-response curves are assumed to be independent of one another and of the $n$ standard (previous) dose-response curves, each of which is observed at all $k$ doses. Assume that ${Y}_{j}=({y}_{1j},{y}_{2j},\ldots,{y}_{kj})'\sim MVN(\mu,\Sigma)$ with unknown mean vector $\mu=({\mu}_{1},{\mu}_{2},\ldots,{\mu}_{k})'$ and unknown covariance matrix $\Sigma$. In practice, $\mu$ and $\Sigma$ are estimated by the sample means ${\overline{y}}_{i}=\frac{1}{n}\sum_{j=1}^{n}{y}_{ij}$ and the sample covariance matrix $S$, respectively. The $100(1-\alpha)\%$ prediction region for the mean of the next $r$ dose-response curves is the ellipsoid
$\frac{nr}{n+r}{\left[M-\overline{Y}\right]}'{S}^{-1}\left[M-\overline{Y}\right]\le \frac{(n-1)k}{n-k}F\left(1-\alpha,k,n-k\right)$ (1)
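where $M=({m}_{1},{m}_{2},\ldots,{m}_{k})'$, ${m}_{i}$ is the mean response of the testing curves at dose ${x}_{i}$, and $r$ is the number of testing curves; $\overline{Y}=({\overline{y}}_{1},{\overline{y}}_{2},\ldots,{\overline{y}}_{k})'$, where ${\overline{y}}_{i}$ is the mean response of the standard (historical) curves at dose ${x}_{i}$; and $F(1-\alpha,k,n-k)$ is the $1-\alpha$ quantile of the $F$ distribution with $k$ and $n-k$ degrees of freedom. Usually, $r=1$. The left-hand side of equation (1) follows Hotelling's ${T}^{2}$ distribution, and $\frac{n-k}{(n-1)k}{T}^{2}$ follows an $F$ distribution with $k$ and $n-k$ degrees of freedom, which gives the right-hand side [12].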
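A minimal Python sketch of the test implied by equation (1), assuming NumPy and SciPy; the function name `prediction_region_test`, the hypothetical decreasing mean curve, and the AR(1)-type covariance in the example are illustrative assumptions, not taken from the report. It requires $n>k$ so that $S$ is invertible and the $F$ quantile is defined, and it solves a linear system with $S$ instead of forming ${S}^{-1}$ explicitly.

```python
import numpy as np
from scipy import stats

def prediction_region_test(standard, test, alpha=0.05):
    """Test whether the mean of r testing curves lies in the 100(1 - alpha)%
    prediction ellipsoid of equation (1).

    standard: (n, k) array, one row per standard curve, one column per dose.
    test:     (r, k) or (k,) array of testing curves.
    Returns (T2, threshold, inside).
    """
    standard = np.asarray(standard, dtype=float)
    test = np.atleast_2d(np.asarray(test, dtype=float))
    n, k = standard.shape
    r = test.shape[0]
    if n <= k:
        raise ValueError("equation (1) needs more standard curves than doses (n > k)")
    y_bar = standard.mean(axis=0)       # \bar{Y}: mean response at each dose
    S = np.cov(standard, rowvar=False)  # k x k sample covariance (divisor n - 1)
    d = test.mean(axis=0) - y_bar       # M - \bar{Y}
    # Left-hand side of (1); solve S z = d instead of forming S^{-1}.
    T2 = n * r / (n + r) * float(d @ np.linalg.solve(S, d))
    threshold = (n - 1) * k / (n - k) * stats.f.ppf(1 - alpha, k, n - k)
    return T2, threshold, bool(T2 <= threshold)

# Hypothetical example: 30 standard curves at k = 6 doses.
rng = np.random.default_rng(1)
k = 6
mu = np.linspace(100.0, 20.0, k)                # hypothetical decreasing mean curve
lag = np.abs(np.subtract.outer(np.arange(k), np.arange(k)))
cov = 25.0 * 0.7 ** lag                         # hypothetical AR(1)-type covariance, SD 5
standard = rng.multivariate_normal(mu, cov, size=30)

new_curve = rng.multivariate_normal(mu, cov)    # drawn like the standard curves
zigzag = mu + 10.0 * (-1.0) ** np.arange(k)     # +/-2 SD, alternating against the correlation
print(prediction_region_test(standard, new_curve))
print(prediction_region_test(standard, zigzag))
```

The zig-zag curve is only 2 SD from the mean at each dose, which a per-dose check would not flag, but because it alternates direction against the positive correlation between neighbouring doses its ${T}^{2}$ is expected to lie far above the threshold.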
Prediction band: Because no parametric model is hypothesized for the dose-response function $y=f(x)$, the response ${f}_{{n}_{i}}({x}_{i})$ at each dose is estimated by the sample mean ${\overline{y}}_{{n}_{i}}$ of the ${n}_{i}$ responses observed at that dose, for the increasing doses ${x}_{1}<{x}_{2}<\cdots<{x}_{k}$. The simultaneous $1-\alpha$ prediction band for the next response ${y}_{{n}_{i}+1}={f}_{{n}_{i}+1}({x}_{i})$ at every dose ${x}_{i}$, $i=1,\ldots,k$, is
${\overline{y}}_{{n}_{i}}-{t}_{1-\alpha,k,n-k}\,{s}_{n}\sqrt{1+\frac{1}{{n}_{i}}}\le {f}_{{n}_{i}+1}({x}_{i})\le {\overline{y}}_{{n}_{i}}+{t}_{1-\alpha,k,n-k}\,{s}_{n}\sqrt{1+\frac{1}{{n}_{i}}}$ (2)
where ${t}_{1-\alpha,k,n-k}$ is the upper $\alpha$ point of the Studentized maximum modulus distribution with parameters $k$ and $n-k$, $n=\sum_{i=1}^{k}{n}_{i}$ is the total number of responses, and ${s}_{n}^{2}$ is the pooled estimate of the common variance ${\sigma}^{2}$, with $n-k$ degrees of freedom. Because this distribution describes the maximum of the $k$ absolute Studentized deviations, its upper $\alpha$ point already yields a two-sided simultaneous band. For $r>1$, the simultaneous $1-\alpha$ prediction band for the mean ${\overline{f}}_{r}({x}_{i})$ of the next $r$ responses at dose ${x}_{i}$ is
${\overline{y}}_{{n}_{i}}-{t}_{1-\alpha,k,n-k}\,{s}_{n}\sqrt{\frac{1}{r}+\frac{1}{{n}_{i}}}\le {\overline{f}}_{r}({x}_{i})\le {\overline{y}}_{{n}_{i}}+{t}_{1-\alpha,k,n-k}\,{s}_{n}\sqrt{\frac{1}{r}+\frac{1}{{n}_{i}}}$ (3)
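SciPy does not, to our knowledge, provide the Studentized maximum modulus distribution directly, so the sketch below computes its critical value numerically. It assumes $k$ independent standard normal deviations ${Z}_{i}$ and an independent $S=\sqrt{{\chi}_{\nu}^{2}/\nu}$ with $\nu=n-k$, and uses $P(\max_i |{Z}_{i}|/S\le m)=E[(2\Phi(mS)-1)^{k}]$: the expectation is integrated over the distribution of $S$, and a root finder locates the $1-\alpha$ quantile. The function names are hypothetical.

```python
import numpy as np
from scipy import integrate, optimize, stats

def smm_cdf(m, k, nu):
    """P(max_i |Z_i| / S <= m), Z_i iid N(0, 1), S = sqrt(chi2_nu / nu) independent."""
    s_dist = stats.chi(df=nu, scale=1.0 / np.sqrt(nu))   # distribution of S
    lo, hi = s_dist.ppf(1e-12), s_dist.ppf(1.0 - 1e-12)  # drop negligible tails

    def integrand(s):
        # Given S = s, each |Z_i| <= m s independently with probability 2 Phi(m s) - 1.
        return (2.0 * stats.norm.cdf(m * s) - 1.0) ** k * s_dist.pdf(s)

    value, _ = integrate.quad(integrand, lo, hi, limit=200)
    return value

def smm_critical_value(alpha, k, nu):
    """Upper-alpha point of the Studentized maximum modulus distribution."""
    return optimize.brentq(lambda m: smm_cdf(m, k, nu) - (1.0 - alpha), 1e-6, 100.0)

# With k = 1 the statistic is |t_nu|, so the value matches the two-sided t quantile.
print(smm_critical_value(0.05, 1, 20), stats.t.ppf(0.975, 20))
# Hypothetical design: k = 8 doses with 30 responses each, so nu = 8 * 30 - 8.
print(smm_critical_value(0.05, 8, 232))
```

The $k=1$ case provides a convenient self-check, and for $k>1$ the value lies between the ordinary two-sided $t$ quantile and the Bonferroni-adjusted one.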
Under the assumption that ${y}_{ij}\sim N({\mu}_{i},{\sigma}^{2})$, with the mean ${\mu}_{i}$ at dose ${x}_{i}$ and the common variance ${\sigma}^{2}$ both unknown, we estimate ${\overline{y}}_{{n}_{i}}$ and ${s}_{n}^{2}$ as follows:
${\overline{y}}_{{n}_{i}}=\frac{1}{{n}_{i}}\sum_{j=1}^{{n}_{i}}{y}_{ij}$ (4)
${s}_{n}^{2}=\frac{\sum_{i=1}^{k}({n}_{i}-1){s}_{i}^{2}}{\sum_{i=1}^{k}({n}_{i}-1)}$ (5)
where ${s}_{i}^{2}$ is the sample variance of the responses at dose ${x}_{i}$.
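The estimates in (4) and (5) and the band in (2) and (3) reduce to a few array operations. In this sketch the function names are hypothetical, the responses are passed as a list of $k$ arrays so that the ${n}_{i}$ may differ, and the Studentized maximum modulus critical value `crit` is supplied by the caller; the value 3.0 in the example is a placeholder, not a computed quantile.

```python
import numpy as np

def prediction_band(groups, crit, r=1):
    """Simultaneous prediction band following equations (2) to (5).

    groups: list of k 1-D arrays; groups[i] holds the n_i responses at dose x_i.
    crit:   Studentized maximum modulus critical value with parameters k and n - k,
            where n = sum(n_i) (computed elsewhere and passed in).
    r:      number of future responses averaged at each dose (r = 1 gives eq. 2).
    Returns (lower, upper), each of length k.
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    n_i = np.array([g.size for g in groups])
    means = np.array([g.mean() for g in groups])        # eq. (4)
    s2_i = np.array([g.var(ddof=1) for g in groups])    # sample variance at each dose
    s2_n = np.sum((n_i - 1) * s2_i) / np.sum(n_i - 1)   # pooled variance, eq. (5)
    half_width = crit * np.sqrt(s2_n) * np.sqrt(1.0 / r + 1.0 / n_i)
    return means - half_width, means + half_width

def curve_in_band(curve, lower, upper):
    """True when the curve lies inside the band at every dose."""
    curve = np.asarray(curve, dtype=float)
    return bool(np.all((curve >= lower) & (curve <= upper)))

# Hypothetical responses at k = 3 doses with unequal n_i.
groups = [[9.8, 10.4, 10.1, 9.7], [7.9, 8.3, 8.0], [5.2, 4.9, 5.1, 5.0, 4.8]]
lower, upper = prediction_band(groups, crit=3.0)  # placeholder critical value
print(curve_in_band([10.0, 8.1, 5.0], lower, upper))
print(curve_in_band([10.0, 8.1, 7.0], lower, upper))
```

Because the variance is pooled, the half-width differs between doses only through the factor $\sqrt{1/r+1/{n}_{i}}$.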
Simulations: To illustrate the prediction region and prediction band methods, we simulated dose-response curve data from a multivariate normal distribution with parameters estimated from the original data. In each simulation replicate, thirty training dose-response curves and ten testing dose-response curves were generated, and the two prediction methods were used to test whether each testing curve fell within the prediction region or band built from the training curves. Two scenarios were simulated: one with the covariance estimated from the original data, and one with the covariance among the responses across all doses set to zero. The simulation for each method was repeated 1000 times, and the mean and standard deviation of the number of testing curves falling within the prediction band or region are listed in Table 1. With covariance, the prediction region retained nearly all testing curves with little variability (mean 9.992, SD 0.089), compared with the prediction band (mean 9.809, SD 0.461). Without covariance, the prediction band was more efficient than the prediction region, retaining more testing curves (9.792 versus 9.262) with lower variability (SD 0.493 versus 0.995).
| Method | With covariance: mean | With covariance: SD | Without covariance: mean | Without covariance: SD |
|---|---|---|---|---|
| Prediction band | 9.809 | 0.461 | 9.792 | 0.493 |
| Prediction region | 9.992 | 0.089 | 9.262 | 0.995 |
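Table 1: Test results using simulated data (1000 replicates): mean and standard deviation (SD) of the number of the 10 testing curves falling within the prediction band or region.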
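The simulation design can be sketched as follows. Because the original data are not reproduced here, the mean curve (k = 8 decreasing doses), the common standard deviation of 5, and the AR(1) correlation of 0.8 used for the "with covariance" scenario are hypothetical, so the output is not expected to reproduce Table 1. For brevity, the band's Studentized maximum modulus critical value is approximated by Monte Carlo, and equal ${n}_{i}$ are assumed, so the pooled variance in (5) is simply the average of the per-dose variances.

```python
import numpy as np
from scipy import stats

def smm_crit_mc(alpha, k, nu, rng, draws=200_000):
    """Monte Carlo approximation of the upper-alpha Studentized maximum modulus point."""
    z_max = np.abs(rng.standard_normal((draws, k))).max(axis=1)
    s = np.sqrt(rng.chisquare(nu, draws) / nu)
    return np.quantile(z_max / s, 1.0 - alpha)

def simulate(mu, cov, n_train=30, n_test=10, reps=1000, alpha=0.05, seed=0):
    """Count testing curves inside the region (eq. 1, r = 1) and band (eq. 2) per replicate."""
    rng = np.random.default_rng(seed)
    k = len(mu)
    nu = n_train * k - k  # df of the pooled variance, sum(n_i - 1)
    c_band = smm_crit_mc(alpha, k, nu, rng)
    region_thr = (n_train - 1) * k / (n_train - k) * stats.f.ppf(1 - alpha, k, n_train - k)
    in_band, in_region = [], []
    for _ in range(reps):
        train = rng.multivariate_normal(mu, cov, size=n_train)
        test = rng.multivariate_normal(mu, cov, size=n_test)
        d = test - train.mean(axis=0)  # each testing curve minus \bar{Y}
        # Prediction region: T^2 of each testing curve against the training curves.
        s_inv = np.linalg.inv(np.cov(train, rowvar=False))
        t2 = n_train / (n_train + 1) * np.einsum("ij,jk,ik->i", d, s_inv, d)
        in_region.append(int(np.sum(t2 <= region_thr)))
        # Prediction band: pooled SD (equal n_i), same half-width at every dose.
        s_n = np.sqrt(train.var(axis=0, ddof=1).mean())
        half_width = c_band * s_n * np.sqrt(1 + 1 / n_train)
        in_band.append(int(np.sum(np.all(np.abs(d) <= half_width, axis=1))))
    return {"band": (np.mean(in_band), np.std(in_band, ddof=1)),
            "region": (np.mean(in_region), np.std(in_region, ddof=1))}

k = 8
mu = np.linspace(100.0, 10.0, k)                     # hypothetical mean curve
lag = np.abs(np.subtract.outer(np.arange(k), np.arange(k)))
scenarios = {"with covariance": 25.0 * 0.8 ** lag,   # hypothetical AR(1) covariance
             "without covariance": 25.0 * np.eye(k)}
for name, cov in scenarios.items():
    print(name, simulate(mu, cov))
```

Each entry reports the mean and SD, over replicates, of how many of the 10 testing curves fall inside, matching the layout of Table 1.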
Prediction interval estimation is an important statistical tool for dose-response curve quality control [13-15]. Based on standard or previous curves, one can construct an interval estimate for future curves with predefined criteria. These interval estimates can be used to adjust newly produced curve(s) by calibrating experimental conditions to avoid systematic and random errors. Commonly used analytical methods for dose-response data are based either on the parametric EC50 or on the empirical AUC. Both summary metrics are problematic when dose-response curves are irregular and do not follow a known parametric form. In this report, we developed two simple methods, ellipsoidal prediction regions and simultaneous prediction bands, to predict testing dose-response curve(s) as quality control analytical tools for dose-response experimental designs. Both methods construct simultaneous interval estimates from a group of dose-response curves to predict whether an individual testing dose-response curve belongs to that group. These simultaneous prediction intervals can be derived easily and quickly for the decreasing dose-response curves studied here, and they do not rely on parametric modeling. These simple methods offer an alternative to nonlinear regression techniques, which are model dependent and computationally intensive, and they may be more robust for dose-response curves that do not belong to a known family. The prediction region is preferred when the responses are strongly correlated across the series of doses, whereas the prediction band is suitable for dose-response curves with weak or no correlation. Despite these advantages, the methods have restrictions. The prediction region requires the response vectors to follow a multivariate normal distribution [16], whereas the prediction band requires the responses at each dose to be normally distributed with a common variance, because the variance is pooled across doses.
When compared with the prediction band method, the prediction region method is more efficient when there are response correlations among the doses.