ISSN: 2378-315X BBIJ
Biometrics & Biostatistics International Journal
Research Article
Volume 2 Issue 7 - 2015
Confidence Intervals for the Risk Ratio when Analyzing Bioassays in the Presence of Over Dispersion
Frank Schaarschmidt*
Leibniz Universität Hannover, Germany
Received: August 7, 2015  Published: October 26, 2015
*Corresponding author: Frank Schaarschmidt, Institute of Biostatistics, Leibniz Universität Hannover, Herrenhäuser Straße 2, D-30419 Hannover, Germany, Tel: +49 511 762 5821; Fax: +49 511 762 4966; Email:
Citation: Schaarschmidt F (2015) Confidence Intervals for the Risk Ratio when Analyzing Bioassays in the Presence of Over Dispersion. Biom Biostat Int J 2(7): 00050. DOI: 10.15406/bbij.2015.02.00050
Abstract
Many bioassays that assess toxicity or mutagenicity give rise to clustered binomial data with relatively few replicated experimental units per treatment group. Confidence intervals for the risk ratio to control can then be used to interpret the relevance of the effect size or to test hypotheses of superiority, noninferiority or equivalence. A frequently observed property of clustered binomial data is overdispersion. So far, the available large-sample confidence interval methods for ratios of proportions in the presence of overdispersion have been validated for use in epidemiological settings with high numbers of clusters per exposure group.
In this paper, the coverage probability and symmetry of noncoverage of several available methods are investigated in an extensive Monte Carlo simulation study, for the small numbers of replications that are typical for a number of bioassays. An additional method is proposed that combines profile deviance intervals with the method of variance recovery. The confidence intervals available so far have far too low coverage probabilities in the simulated settings. Their performance can be improved by restricting estimators of dispersion not to fall below the binomial variance and by using pooled dispersion estimators. The newly proposed method outperforms the previously available methods by showing coverage probabilities closest to the nominal level. All discussed methods are made available in an add-on package for the R software.
Keywords: Ratio of proportions; Extra-binomial variance; Beta-binomial; Monte Carlo simulation; Coverage probability; Bioassay; Small sample performance
Abbreviations
FN: Fieller-type intervals applied to the observed proportions; BM: binomial MOVER-R Wilson method after summing up observed counts across replications; LOD: asymptotic method on the log scale, accounting for overdispersion via residual variation; FOD: Fieller-Bailey-type interval accounting for overdispersion via residual variance; LBB: delta method on the log scale under the beta-binomial assumption; FBB: Fieller-Bailey-type interval under the beta-binomial assumption; QBM: MOVER-R for quasi-binomial profile deviance intervals
Introduction
Various bioassays give rise to replicated binomial count data. For example, in ecotoxicological assays, fish larvae or daphnids in several tanks are exposed to different dosages of a substance, and the number of dead or immobile animals per tank is used to assess the hazardousness of the substance. Usually, a small number of replicated tanks is used for each dosage under consideration. If some experimental conditions differ between tanks such that the proportion of dead or immobile daphnids is affected, the counted numbers may show higher variance than expected under the binomial assumption, i.e., extra-binomial variability or overdispersion. Similar situations arise in the in vivo micronucleus assay: the number of cells showing micronuclei is counted for a given number of exposed cells for each (randomized) animal, with the aim to assess the substance's potential to cause cytogenetic damage. Here, too, a limited number of replications per dosage is performed, such that differences between the animals may cause overdispersion. In summary, bioassays that lead to binomial data often contain clustered replication and thus make it possible to account for overdispersion in the data. However, the number of replications or clusters per treatment group, which allows one to assess the extent of overdispersion, is rather limited.
In the statistical analysis of such bioassays, major interest is usually in comparisons to the untreated control group. While a test of significance for the overall effect of the dosage of the substance may be of preliminary interest, usually a more detailed interpretation for the single dosages is required: confidence intervals for the effect of given dosages compared to the untreated control or a positive control are required to interpret the toxicological relevance of the observed effect size. Tests of noninferiority (or equivalence) for given dosages compared to an untreated control [1] may be more important than an overall test on a significant change in the event rates: in toxicological assessment, confidence is needed primarily when claiming no effect. Both approaches require an interpretable definition of the change of the rate of the detrimental event (death or immobility, presence of micronuclei, malformations, etc.) compared to the control treatment: for judging the relevance of an effect size or for the definition of a particular noninferiority margin [1].
For this reason, this paper focuses on the ratio of proportions (risk ratio). Compared to the many publications on the construction of confidence intervals for a single binomial proportion, as well as for differences, ratios or odds ratios of binomial proportions, the construction of confidence intervals for risk ratios of overdispersed binomial data has received only little attention [2,3]. The available methods are all asymptotic methods. Their construction and their evaluation are usually motivated by their application to epidemiological studies, where the absence or presence of a disease is counted for a given number of individuals in clusters. In this context, clustering of individuals may arise from humans being clustered in families or locations, from repeated measurements within a given animal, or from repeated animals within a given farm when veterinary epidemiology is concerned. In these settings, there is usually a relatively large number of clusters available (many families or locations, many farms or animals), such that simple asymptotic methods work well and the estimation of the overdispersion parameters is rather precise. This is not the case in the bioassays that are the focus of this paper. In epidemiological settings it is also rather improbable to observe not a single disease case across all clusters in one of the groups to be compared, or, conversely, to observe only disease cases across all clusters in one group. Such outcomes, however, plausibly occur in the untreated control groups of bioassays and may cause problems with Wald-type tests and related confidence intervals [4]. Finally, in epidemiological studies, major interest is usually in estimation, such that the statistical evaluation of confidence interval methods for the risk ratio plausibly focuses on (two-sided) coverage probability and interval width. In the case of bioassays, estimation as well as related hypothesis tests are of interest. In testing, one-sided hypotheses are most relevant and may involve margins of noninferiority.
In consequence, previous recommendations of confidence interval methods for risk ratios of overdispersed data, motivated by epidemiological applications, cannot be directly transferred to bioassays. In this paper, previously proposed confidence intervals are thus investigated with special focus on small to very small numbers of clusters (i.e., replications) and on the possibility that all observations in one group show the same event. Due to the need for estimation as well as one-sided noninferiority tests, two-sided as well as one-sided coverage probabilities are investigated for a wide range of parameter combinations. Further, a new method based on a straightforward combination of profile deviance intervals, e.g. [5,6], and the method of variance recovery [7] is proposed and is shown to clearly outperform existing approaches under the assumption of a common level of overdispersion.
Material and Methods
Parameter and hypotheses of interest
Assume an experimental setup where, for each treatment i, there are Ji replicated experimental units (tanks, cages, animals, petri dishes, etc.) with index j = 1, …, Ji. In each experimental unit there is a number nij of biological units under observation, and the number of events of interest, xij, is counted in unit ij. These counted events may be death (or survival) of animals, mobility (or immobility) of daphnids, presence or absence of micronuclei, etc. Consider the comparison of one dose group, i = 1, to the untreated control, i = 0. Denote the unknown probability of events in the two groups by πi; the parameter of interest is the risk ratio $\rho = \frac{\pi_{1}}{\pi_{0}}$. Beside estimating ρ and displaying the uncertainty of this estimate in terms of a 95% confidence interval, decisions concerning one-sided hypotheses in tests of noninferiority or superiority may be of interest. The particular choice of the noninferiority margin, ρ0, may be fixed by convention; compare [1], suggesting ρ0 = 0.75 or ρ0 = 0.8 for certain applications. In other situations, it might be a matter of debate. Although the general focus is on valid two-sided confidence intervals (i.e., with coverage probability close to the nominal level), it will further be investigated whether confidence intervals also provide valid upper and lower confidence limits and can thus be used to perform (approximate) level α tests for one-sided hypotheses.
Overdispersed binomial data
There are two well-known approaches to model overdispersion in binomial data [8]. The quasi-binomial approach models overdispersion by assuming the variance-mean dependency

$V^{QB}(X_{ij}) = \varphi n_{ij}\pi_{ij}(1-\pi_{ij})$,

where φ is the overdispersion parameter that inflates the binomial variance term by a constant factor, independent of the sample size nij. In this parameterization, the binomial assumption, $V^{B}(X_{ij}) = n_{ij}\pi_{ij}(1-\pi_{ij})$, is met for φ = 1.
The beta-binomial distribution derives from a beta mixture of binomial distributions, i.e.,

$\pi_{ij} \sim Beta(a_{i}, b_{i})$ and $x_{ij} \sim Binomial(n_{ij}, \pi_{ij})$ (1)

where $E(\pi_{ij}) = \frac{a_{i}}{a_{i}+b_{i}}$ and $E(x_{ij}) = \frac{n_{ij}a_{i}}{a_{i}+b_{i}}$ [9]. Denote the sum of the two beta parameters by $a_{i}^{*} = a_{i}+b_{i}$. The variance of beta-binomial counts, $V^{BB}(X_{ij})$, is a function of nij, πij and $a_{i}^{*}$ [5,9]. When $a_{i}^{*}$ approaches $\infty$, the variance of the beta-binomial counts approaches that of binomial counts. Here, the overdispersion relative to the binomial variance is denoted by $\varphi^{BB} = \frac{V^{BB}(X_{ij})}{V^{B}(X_{ij})}$ and is a function of nij and $a_{i}^{*}$. This allows one to choose $a_{i}^{*}$ given nij such that the overdispersion is a constant factor $\varphi^{BB}$.
Main interest here is in the performance of confidence interval methods in highly controlled laboratory settings, where it can be assumed that nij is equal for all experimental units ij. Under this condition, the quasi-binomial assumption on the variance-mean dependency coincides with the variance-mean dependency under the beta-binomial distribution [5,9]. Hence, methods are considered that are explicitly constructed for the beta-binomial distribution, as well as methods that account for overdispersion under the quasi-binomial assumption.
Confidence interval methods
Fieller-type intervals applied to the observed proportions (FN): Naively, the t-test for ratios, with a common variance estimator and the assumption of normally distributed residuals, may be used to test the above hypotheses, treating the observed proportions as the variable of interest, $y_{ij} = \hat{\pi}_{ij}$. The corresponding Fieller-type confidence interval can be obtained by analytically inverting the t-test statistic for ratios [10]. The method assumes normal distribution and variance homogeneity for the observed proportions, which is clearly not the case in this application. This interval is referred to as FN.
Binomial MOVER-R Wilson method after summing up observed counts across replications (BM): It may be tempting to sum up the counts over experimental units within each treatment group, $x_{i.} = \sum_{j=1}^{J_{i}} x_{ij}$, $n_{i.} = \sum_{j=1}^{J_{i}} n_{ij}$, and apply a confidence interval for risk ratios under the assumption of a binomial distribution, i.e., ignoring possible extra-binomial variation. As a placeholder for the many available options, the MOVER-R method proposed by [7] is used here; it is computationally simple and was among the best methods in a recent comparative study under the binomial assumption [11]. It is referred to as BM.
The two above methods are merely included here to illustrate the effects of either ignoring the mean-variance relation and skewness implied by the binomial distribution (FN) or of applying binomial methods (BM) when data are indeed overdispersed binomial.
Asymptotic method on the log scale, accounting for overdispersion via residual variation (LOD): Among other methods, [3] investigate and recommend a method based on the delta method applied to the log risk ratio (called MR1 therein). It can be computed from:

$n_{i.} = \sum_{j=1}^{J_{i}} n_{ij}, \quad x_{i.} = \sum_{j=1}^{J_{i}} x_{ij}, \quad \hat{\pi}_{i} = \frac{x_{i.}+0.5}{n_{i.}+1}, \quad \hat{\rho} = \frac{\hat{\pi}_{1}}{\hat{\pi}_{0}}, \quad \hat{v}_{i} = \left(\frac{J_{i}}{J_{i}-1}\right)\sum_{j=1}^{J_{i}}\frac{\left(x_{ij}-\hat{\pi}_{i}n_{ij}\right)^{2}}{n_{i.}^{2}}$ (2)

where the variance is estimated from the residuals on the scale of the original observations. The interval is then given by:

$\hat{\rho}\exp\left(\pm z_{1-\alpha/2}\sqrt{\sum_{i=0}^{1}\frac{\hat{v}_{i}}{\hat{\pi}_{i}^{2}}}\right)$ (3)
By plugging in the observed residual variance per treatment group i, this method does not assume a particular mean-variance relation and accounts for overdispersion in a more general way. However, if the number of replications per treatment, Ji, is small, these variance estimates might be unstable. Zaihra and Paul [3] additionally consider a closely related method with a sandwich-type variance estimator, which performs worse in their simulation study and is thus ignored here.
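As an illustration, the estimators in (2) and the interval (3) can be sketched in a few lines of Python (the paper's implementation is the R function ODbin.ratio; the function name and toy data below are illustrative only, and the 0.975 normal quantile is hardcoded):

```python
import numpy as np

Z975 = 1.959964  # standard normal quantile z_{1 - 0.05/2}

def lod_ci(x0, n0, x1, n1):
    """Two-sided 95% CI (3) for rho = pi1/pi0, using the estimators of Eq. (2)."""
    def per_group(x, n):
        x, n = np.asarray(x, float), np.asarray(n, float)
        J, ns = len(x), n.sum()
        pi_hat = (x.sum() + 0.5) / (ns + 1.0)      # continuity-corrected proportion
        v_hat = (J / (J - 1.0)) * np.sum((x - pi_hat * n) ** 2) / ns ** 2
        return pi_hat, v_hat
    p0, v0 = per_group(x0, n0)
    p1, v1 = per_group(x1, n1)
    rho = p1 / p0
    half = Z975 * np.sqrt(v1 / p1 ** 2 + v0 / p0 ** 2)
    return rho * np.exp(-half), rho * np.exp(half)

# toy data: 5 replicated tanks of n = 10 animals, control vs. dose group
lo, hi = lod_ci(x0=[1, 0, 2, 1, 0], n0=[10] * 5, x1=[4, 6, 3, 5, 7], n1=[10] * 5)
print(lo, hi)
```

For the toy data the point estimate is about 5.67, and the interval is asymmetric around it, as expected on the log scale.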
Fieller-Bailey-type interval accounting for overdispersion via residual variance (FOD): Using the estimators above, [3] follow the approach of [10] and [12], which accounts for the skewed distribution of the original Fieller statistic

$z = \frac{\hat{\pi}_{1}-\rho\hat{\pi}_{0}}{\sqrt{\hat{v}_{1}+\rho^{2}\hat{v}_{0}}}$

by considering the solutions of a cubic equation, with

$A = \hat{\pi}_{0}^{2/3}-z_{\alpha/2}^{2}\frac{\hat{v}_{0}}{9\hat{\pi}_{0}^{4/3}}, \quad B = \left(\hat{\pi}_{1}\hat{\pi}_{0}\right)^{1/3}, \quad C = \hat{\pi}_{1}^{2/3}-z_{\alpha/2}^{2}\frac{\hat{v}_{1}}{9\hat{\pi}_{1}^{4/3}}$

The interval (referred to as MR4 in [3]) can then be computed as

$\left[\max\left(\left(\frac{B-\sqrt{B^{2}-AC}}{A}\right)^{3},0\right); \left(\frac{B+\sqrt{B^{2}-AC}}{A}\right)^{3}\right]$ (4)

if A > 0 and B² - AC > 0. If these two restrictions are not met, the interval has unbounded or disjoint solutions which do not provide meaningful interpretations of ρ. In the simulation study below, the interval $[0,\infty]$ is returned in such cases [3]. Zaihra and Paul [3] again consider a closely related method that uses a sandwich estimator for the variance of $\hat{\pi}_{i}$ instead; it is not considered here.
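A sketch of the FOD computation, including the fallback to $[0,\infty]$ when the restrictions on A and B² - AC are violated (again Python rather than the paper's R code; toy data and function name are illustrative only):

```python
import numpy as np

Z975 = 1.959964  # standard normal quantile, so Z975**2 = z_{alpha/2}^2

def fod_ci(x0, n0, x1, n1):
    """Fieller-Bailey-type 95% CI (4); returns (0, inf) when A <= 0 or
    B^2 - A*C <= 0, as done in the simulation study."""
    def per_group(x, n):
        x, n = np.asarray(x, float), np.asarray(n, float)
        J, ns = len(x), n.sum()
        pi_hat = (x.sum() + 0.5) / (ns + 1.0)
        v_hat = (J / (J - 1.0)) * np.sum((x - pi_hat * n) ** 2) / ns ** 2
        return pi_hat, v_hat
    p0, v0 = per_group(x0, n0)
    p1, v1 = per_group(x1, n1)
    z2 = Z975 ** 2
    A = p0 ** (2 / 3) - z2 * v0 / (9 * p0 ** (4 / 3))
    B = (p1 * p0) ** (1 / 3)
    C = p1 ** (2 / 3) - z2 * v1 / (9 * p1 ** (4 / 3))
    disc = B ** 2 - A * C
    if A <= 0 or disc <= 0:
        return 0.0, np.inf        # unbounded or disjoint solutions
    lower = max(((B - np.sqrt(disc)) / A) ** 3, 0.0)
    upper = ((B + np.sqrt(disc)) / A) ** 3
    return lower, upper

lo, hi = fod_ci(x0=[1, 0, 2, 1, 0], n0=[10] * 5, x1=[4, 6, 3, 5, 7], n1=[10] * 5)
print(lo, hi)
```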
Delta method on the log scale under the beta-binomial assumption (LBB): Lui et al. [2] propose methods that are constructed under the assumption of the beta-binomial distribution. Theoretically, the variance of $\hat{\pi}_{i}$ under this assumption is $\pi_{i}\left(1-\pi_{i}\right)\varphi^{BB}\left(n_{i},c_{i}\right)/n_{i.}$, where $\varphi^{BB}\left(n_{i},c_{i}\right)$ is the beta-binomial overdispersion factor, expressed as a function of the numbers under risk in the replications of treatment i, nij, and of ci, the intraclass correlation coefficient. Under beta-binomial sampling, the intraclass correlation depends on $a_{i}^{*}$ via $c_{i} = 1/(a_{i}^{*}+1)$ [2].
Lui et al. [2] estimate the intraclass correlation using

$\hat{c}_{i} = \frac{BMS_{i}-WMS_{i}}{BMS_{i}+\left(n_{i}^{*}-1\right)WMS_{i}}, \quad \text{with } n_{i}^{*} = \frac{n_{i.}^{2}-\sum_{j=1}^{J_{i}}n_{ij}^{2}}{\left(J_{i}-1\right)n_{i.}}$

based on the between and within mean squares of the observations,

$BMS_{i} = \frac{\sum_{j=1}^{J_{i}}\frac{x_{ij}^{2}}{n_{ij}}-\frac{x_{i.}^{2}}{n_{i.}}}{J_{i}-1}$ and $WMS_{i} = \frac{x_{i.}-\sum_{j=1}^{J_{i}}\frac{x_{ij}^{2}}{n_{ij}}}{\sum_{j=1}^{J_{i}}\left(n_{ij}-1\right)}$

This leads to estimators for the beta-binomial overdispersion factor, and for the related variance of the proportion estimator under the beta-binomial assumption, for each treatment group i separately [2]:

$\hat{\varphi}^{BB}\left(n_{i},\hat{c}_{i}\right) = \sum_{j=1}^{J_{i}}n_{ij}\left(1+\left(n_{ij}-1\right)\hat{c}_{i}\right)/n_{i.}$ (5)

$\hat{v}_{i}^{BB} = \frac{\left(1-\hat{\pi}_{i}\right)\hat{\varphi}^{BB}\left(n_{i},\hat{c}_{i}\right)}{n_{i.}\hat{\pi}_{i}}$ (6)
The asymptotic interval relying on the delta method applied to the log risk ratio [2] can be constructed as:

$\hat{\rho}\exp\left(\pm z_{1-\alpha/2}\sqrt{\sum_{i=0}^{1}\hat{v}_{i}^{BB}}\right)$ (7)
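The chain from the mean squares to the interval (7) can be sketched as follows (a Python sketch with illustrative toy data and function name; the paper's implementation is the R function Betabin.ratio, and the 0.975 normal quantile is hardcoded):

```python
import numpy as np

Z975 = 1.959964  # standard normal quantile z_{1 - 0.05/2}

def lbb_ci(x0, n0, x1, n1):
    """95% CI (7): delta method on the log scale, beta-binomial assumption."""
    def per_group(x, n):
        x, n = np.asarray(x, float), np.asarray(n, float)
        J, xs, ns = len(x), x.sum(), n.sum()
        bms = (np.sum(x ** 2 / n) - xs ** 2 / ns) / (J - 1)   # between MS
        wms = (xs - np.sum(x ** 2 / n)) / np.sum(n - 1)       # within MS
        n_star = (ns ** 2 - np.sum(n ** 2)) / ((J - 1) * ns)
        c_hat = (bms - wms) / (bms + (n_star - 1) * wms)      # intraclass corr.
        phi_bb = np.sum(n * (1 + (n - 1) * c_hat)) / ns       # Eq. (5)
        pi_hat = xs / ns
        if pi_hat == 0.0 or pi_hat == 1.0:                    # extreme events
            pi_hat = (xs + 0.5) / (ns + 1.0)
        v_log = (1 - pi_hat) * phi_bb / (ns * pi_hat)         # Eq. (6)
        return pi_hat, v_log
    p0, v0 = per_group(x0, n0)
    p1, v1 = per_group(x1, n1)
    rho = p1 / p0
    half = Z975 * np.sqrt(v0 + v1)
    return rho * np.exp(-half), rho * np.exp(half)

lo, hi = lbb_ci(x0=[1, 0, 2, 1, 0], n0=[10] * 5, x1=[4, 6, 3, 5, 7], n1=[10] * 5)
print(lo, hi)
```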
Fieller-Bailey-type interval under the beta-binomial assumption (FBB): Lui et al. [2] consider a Fieller-type interval and its modification according to [12] under the beta-binomial distribution:

$A = \hat{\pi}_{0}^{2/3}-z_{\alpha/2}^{2}\left(1-\hat{\pi}_{0}\right)\hat{\varphi}^{BB}\left(n_{0},\hat{c}_{0}\right)/\left(9n_{0.}\hat{\pi}_{0}^{1/3}\right), \quad B = \left(\hat{\pi}_{1}\hat{\pi}_{0}\right)^{1/3}$

and

$C = \hat{\pi}_{1}^{2/3}-z_{\alpha/2}^{2}\left(1-\hat{\pi}_{1}\right)\hat{\varphi}^{BB}\left(n_{1},\hat{c}_{1}\right)/\left(9n_{1.}\hat{\pi}_{1}^{1/3}\right)$

As in Eq. (4), a meaningful interval can be calculated if A > 0 and B² - AC > 0:

$\left[\max\left(\left(\frac{B-\sqrt{B^{2}-AC}}{A}\right)^{3},0\right); \left(\frac{B+\sqrt{B^{2}-AC}}{A}\right)^{3}\right]$ (8)

In the simulation study below, the interval $[0,\infty]$ is returned if these restrictions are not met. Following [2], in case of the extreme events $\hat{\pi}_{i}=0$ or $\hat{\pi}_{i}=1$, $\hat{\pi}_{i}$ is replaced by $\hat{\pi}_{i} = \frac{x_{i.}+0.5}{n_{i.}+1}$ in the LBB and FBB methods and their subsequent modifications.
Modifications of LOD, FOD, LBB, FBB by pooling and restricting the variance estimates: Lui et al. [2] state, based on theoretical considerations in the context of estimation problems, that the intraclass correlation $c_{i}$ and the overdispersion $\varphi^{BB}\left(n_{i},c_{i}\right)$ cannot fall below 0 and 1, respectively. However, their estimates may fall below the boundaries imposed by the binomial assumption. For example, the event $x_{i1}=x_{i2}=\dots=x_{iJ_{i}}$ may lead to the unreasonable estimates $\hat{c}_{i}<0$ and $\hat{\varphi}^{BB}\left(n_{i},\hat{c}_{i}\right)=0$. Moreover, interest here is in small-sample laboratory experiments, and estimating $c_{i}$ and $\varphi^{BB}\left(n_{i},c_{i}\right)$ separately for each treatment from very few replications Ji may result in overfitting. Then, the assumption of a common intraclass correlation across the treatment groups may lead to a more stable estimation with small sample sizes. Therefore, the methods LBB and FBB as well as LOD and FOD are simulated with the following additional restrictions and pooling of variance estimates:

- LBB1, FBB1 refer to methods LBB, FBB with the beta-binomial overdispersion factor restricted to be at least 1, i.e., using $\max\left(1,\hat{\varphi}^{BB}\left(n_{i},\hat{c}_{i}\right)\right)$ instead of $\hat{\varphi}^{BB}\left(n_{i},\hat{c}_{i}\right)$ in equations (5,6)ff.

- LBBp, FBBp refer to methods LBB, FBB with the intraclass correlation estimated from pooled observations across the groups, $\hat{c}_{.} = \frac{\sum_{i=0}^{1}\hat{c}_{i}n_{i.}}{\sum_{i=0}^{1}n_{i.}}$, using $\hat{\varphi}^{BB}\left(n_{i},\hat{c}_{.}\right)$ instead of $\hat{\varphi}^{BB}\left(n_{i},\hat{c}_{i}\right)$ in equations (5,6)ff.

- LBB1p, FBB1p combine the two approaches by using the group-wise variance estimators with the pooled intraclass correlation estimator and the restriction of the overdispersion parameter to be $\geq 1$, that is, using $\max\left(1,\hat{\varphi}^{BB}\left(n_{i},\hat{c}_{.}\right)\right)$ instead of $\hat{\varphi}^{BB}\left(n_{i},\hat{c}_{i}\right)$ in equations (5,6)ff. This procedure has already been suggested in the example evaluation of [2].

- LOD1, FOD1 refer to methods LOD, FOD, but with the group-wise variance estimators restricted to be greater than or equal to the binomial variance estimate, using $\hat{v}_{i} = \max\left(\frac{\hat{\pi}_{i}\left(1-\hat{\pi}_{i}\right)}{n_{i.}},\left(\frac{J_{i}}{J_{i}-1}\right)\sum_{j=1}^{J_{i}}\frac{\left(x_{ij}-\hat{\pi}_{i}n_{ij}\right)^{2}}{n_{i.}^{2}}\right)$ in Eq. (2).
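The LOD1/FOD1 restriction amounts to a one-line floor on the variance estimator of Eq. (2); a minimal Python sketch (function name illustrative):

```python
import numpy as np

def v_hat_restricted(x, n):
    """LOD1/FOD1 variance: the residual variance of Eq. (2), floored at the
    binomial variance estimate, so apparent underdispersion is ignored."""
    x, n = np.asarray(x, float), np.asarray(n, float)
    J, ns = len(x), n.sum()
    pi_hat = (x.sum() + 0.5) / (ns + 1.0)
    v_resid = (J / (J - 1.0)) * np.sum((x - pi_hat * n) ** 2) / ns ** 2
    v_binom = pi_hat * (1 - pi_hat) / ns
    return max(v_binom, v_resid)

# all replications show the same count, so the residual variance is nearly
# zero and the binomial floor determines the estimate
print(v_hat_restricted([2, 2, 2], [10, 10, 10]))
```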
MOVER-R for quasi-binomial profile deviance intervals (QBM): An alternative option to obtain intervals for the risk ratio would be to fit a generalized linear model (GLM) under the quasi-binomial assumption $V^{QB}\left(X_{ij}\right)=\varphi n_{ij}\pi_{ij}\left(1-\pi_{ij}\right)$, using a log link, $\eta_{ij}=\log\left(\pi_{ij}\right)$, $\eta_{ij}=\beta_{i}$. The risk ratio can then be estimated via $\exp\left(\hat{\beta}_{1}-\hat{\beta}_{0}\right)$, and intervals for the difference $\beta_{1}-\beta_{0}$ can be computed by the signed root profile deviance method [5,6]. However, with the current user-level implementations in R, fitting this model (glm, stats) and obtaining profile deviance intervals (profile, confint, package MASS) suffers from numerical difficulties if at least one of the groups shows estimated success probabilities close or equal to 1 or equal to 0.
As a numerically stable workaround, the QBM method is proposed: A GLM with the quasi-binomial assumption and logit link, $\eta_{ij}=\log\left(\pi_{ij}/\left(1-\pi_{ij}\right)\right)=\beta_{i}$, is fitted. If the estimated dispersion parameter in the model fit falls below 1, $\hat{\varphi}^{(QB)}<1$, a binomial model is used instead (i.e., assuming $\varphi=1$). For the βi, $(1-\alpha)$ signed root profile deviance intervals can be computed, with limits denoted $\left[l_{\hat{\beta}_{0}},u_{\hat{\beta}_{0}}\right],\left[l_{\hat{\beta}_{1}},u_{\hat{\beta}_{1}}\right]$ and estimates denoted $\hat{\beta}_{i}$. In R, these computations are available in several packages, e.g. package MASS [5] or the add-on package mcprofile [13]. Again, in extreme cases, the automatic search for the grid of parameter values for the deviance profile may fail in both packages. The signed root deviance is therefore computed over a pre-specified grid of parameter values,
$\beta_{i}^{*}=\left(-10,-9.5,\dots,-5,-4.8,-4.6,\dots,4.8,5,5.5,\dots,10\right)$

with elements $\beta_{ik}^{*}$, k = 1, …, K. For each i and each k,

$t_{ik}=\mathrm{sign}\left(\beta_{ik}^{*}-\hat{\beta}_{i}\right)\sqrt{\frac{d\left(\beta_{ik}^{*},\hat{\beta}_{i'}\right)-d\left(\hat{\beta}_{i},\hat{\beta}_{i'}\right)}{\hat{\varphi}}}$

is computed, where $\mathrm{sign}\left(\beta_{ik}^{*}-\hat{\beta}_{i}\right)$ retains the sign of the difference $\beta_{ik}^{*}-\hat{\beta}_{i}$, $d\left(\beta_{ik}^{*},\hat{\beta}_{i'}\right)$ is the deviance when replacing $\hat{\beta}_{i}$ by $\beta_{ik}^{*}$ while leaving all other parameters at their ML estimates, $\hat{\beta}_{i'}$, $d\left(\hat{\beta}_{i},\hat{\beta}_{i'}\right)$ is the deviance at the ML estimates, and $\hat{\varphi}$ is the dispersion estimate with all parameters at their ML estimates. For each parameter i, a cubic spline is fitted for $t_{ik}$ depending on $\beta_{ik}^{*}$, and the cut points of the spline with the quantiles of the t-distribution, $t_{\alpha/2,df}$ and $t_{1-\alpha/2,df}$ with $df=\sum_{i=1}^{I}\left(J_{i}-1\right)$, are determined by linear interpolation between fitted values. When the binomial model is used, $t_{ik}$ is replaced by

$Z_{ik}=\mathrm{sign}\left(\beta_{ik}^{*}-\hat{\beta}_{i}\right)\sqrt{d\left(\beta_{ik}^{*},\hat{\beta}_{i'}\right)-d\left(\hat{\beta}_{i},\hat{\beta}_{i'}\right)}$

and the quantiles of the standard normal distribution, $z_{\alpha/2},z_{1-\alpha/2}$, are used instead.
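The inversion of the signed-root statistic at the t-quantiles can be illustrated with a toy quadratic deviance, for which the cut points are known in closed form; this Python sketch uses plain linear interpolation in place of the cubic-spline step, and all numbers (estimate, standard error, degrees of freedom) are illustrative assumptions:

```python
import numpy as np

# Toy quadratic deviance d(beta) = ((beta - beta_hat)/se)^2: the signed-root
# statistic is then exactly (beta - beta_hat)/se and the cut points are
# beta_hat +/- t_q * se, which the grid inversion below must reproduce.
beta_hat, se, phi_hat = -1.2, 0.4, 1.0
t_q = 2.3060041                      # t quantile t_{0.975, df=8}

# the pre-specified grid beta*_ik of the QBM method
grid = np.concatenate([np.arange(-10, -4.9, 0.5),
                       np.arange(-4.8, 4.81, 0.2),
                       np.arange(5, 10.1, 0.5)])
dev_diff = ((grid - beta_hat) / se) ** 2        # d(beta*) - d(beta_hat)
t_ik = np.sign(grid - beta_hat) * np.sqrt(dev_diff / phi_hat)

# invert t_ik (monotone in beta*) at the t quantiles; the paper fits a cubic
# spline first, plain linear interpolation is used in this sketch
lower = np.interp(-t_q, t_ik, grid)
upper = np.interp(t_q, t_ik, grid)
print(lower, upper)                  # ~ beta_hat +/- t_q * se
```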
The interval bounds and ML estimates are transformed to the proportion scale using the inverse link, $\left[l_{i},u_{i}\right]=\left[\frac{\exp\left(l_{\hat{\beta}_{i}}\right)}{1+\exp\left(l_{\hat{\beta}_{i}}\right)},\frac{\exp\left(u_{\hat{\beta}_{i}}\right)}{1+\exp\left(u_{\hat{\beta}_{i}}\right)}\right]$, $\hat{\pi}_{i}=\exp\left(\hat{\beta}_{i}\right)/\left(1+\exp\left(\hat{\beta}_{i}\right)\right)$. These estimates and confidence limits are then used to compute intervals for ρ by the MOVER-R method [7]. Eq. (9) of [7] is recalled here as:

$\left[\frac{\hat{\pi}_{1}\hat{\pi}_{0}-\sqrt{\left(\hat{\pi}_{1}\hat{\pi}_{0}\right)^{2}-l_{1}u_{0}\left(2\hat{\pi}_{1}-l_{1}\right)\left(2\hat{\pi}_{0}-u_{0}\right)}}{u_{0}\left(2\hat{\pi}_{0}-u_{0}\right)};\frac{\hat{\pi}_{1}\hat{\pi}_{0}+\sqrt{\left(\hat{\pi}_{1}\hat{\pi}_{0}\right)^{2}-u_{1}l_{0}\left(2\hat{\pi}_{1}-u_{1}\right)\left(2\hat{\pi}_{0}-l_{0}\right)}}{l_{0}\left(2\hat{\pi}_{0}-l_{0}\right)}\right]$

Like the FOD and the FBB method, this method may yield (partially) unbounded intervals; particularly when $\hat{\pi}_{0}=0$, the upper limit for the risk ratio is naturally $\infty$.
Simulation study
The beta-binomial distribution is chosen to simulate overdispersed data such that the resulting data are in line with the quasi-binomial assumption, i.e., the assumption of the QBM method is met.
In the simulation, the number of experimental units per treatment group, Ji, is chosen balanced, (J0, J1) = (3, 3), (5, 5) and (10, 10). The number of biological units under risk in each unit, nij, is chosen balanced, nij = 10 or 20 for all i, j. Overdispersion is set at the levels $\varphi^{BB}=1.25$ or $\varphi^{BB}=2$; that is, for given nij, $a_{i}^{*}$ is chosen according to Section 4.2 to achieve the specified overdispersion, and for each set of π0, π1, $a_{i}=a_{i}^{*}\pi_{i}$ and $b_{i}=a_{i}^{*}\left(1-\pi_{i}\right)$ in the beta distribution. The distributions of the counts xij, as well as those of the estimators of the proportions, are skewed to different extents, depending on πi, especially if πi is close to the border of the parameter space. Thus, also the distribution of the estimator of ρ or log(ρ) can be skewed if π1 and π0 differ, particularly when they are close to the border of the parameter space. The performance of large-sample methods may therefore severely depend on the particular choices of π0, π1. To investigate these potential dependencies, the simulations have been run for a grid of all combinations of π0 = (0.02, 0.04, …, 0.96, 0.98) and π1 = (0.02, 0.04, …, 0.96, 0.98) that imply odds ratios between 0.1 and 10.
Simulations have been performed with 10000 runs for all methods except the QBM method. Due to high computation times, only 5000 simulation runs are used for QBM, such that the standard error of the estimated coverage probabilities for QBM is by a factor of √2 higher than for the remaining methods. The simulation study has been performed in R 3.1.2 [14]; the implementation of all confidence interval methods is available in the R package pairwiseCI, version > 0.1-25 [15]. FBB, LBB and related methods are implemented in the function Betabin.ratio, FOD, LOD and related methods are implemented in the function ODbin.ratio, and the QBM method is implemented in the function Quasibin.ratio.
Figure 1: Boxplots of simulated coverage probabilities of nominal two-sided 95% intervals, for different numbers of replications (no.rep) and two different levels of overdispersion.
Figure 2: Simulated coverage probabilities (color scale) of nominal two-sided 95% intervals over a grid of true proportions π0, π1, for twofold overdispersion (φ = 2) and nij = 10 biological units in each experimental unit.
Figure 3: Simulated coverage probabilities (color scale) of nominal two-sided 95% intervals over a grid of true proportions π0, π1, for moderate overdispersion (φ = 1.25) and nij = 20 biological units in each experimental unit.
Figure 4: Asymmetry of noncoverage: the color scale shows the proportion of cases where the true parameter is excluded by the lower limit, relative to all cases where the parameter is excluded, for nominal two-sided 95% intervals over a grid of true proportions π0, π1, for clear overdispersion (φ = 2) and nij = 10 biological units in each experimental unit.
Figure 5: Boxplots and observed proportions of surviving fathead minnow larvae per tank, for the untreated control group and 5 concentrations (left), and 95% confidence intervals for the proportions of surviving larvae in the treatment groups relative to the control group (right).
Results and Discussion
Coverage probability of twosided 95% confidence intervals
Figure 1 shows the simulated coverage probabilities of all 13 methods under comparison: the FN method can show either too low or too high coverage probability, irrespective of increasing sample size, because the relation of mean and variance in the binomial data is ignored. However, its average coverage probability is closer to the nominal level than that of some of the methods which explicitly account for overdispersed binomial data (LBB, FBB, LBBp, FBBp, LOD and FOD). Simply ignoring the possibility of overdispersion and assuming the binomial distribution results in too low coverage probability even for moderate (1.25-fold) overdispersion: the BM method has too low coverage probabilities in nearly all settings. For very small numbers of replications (Ji = 3, 5), the far too low coverage probability of the asymptotic methods for overdispersed binomial data (LBB, FBB, LOD, FOD) can be improved slightly by using a pooled variance estimator (LBBp, FBBp) and can be improved substantially by setting a lower limit on their variance estimators: if variance estimates suggesting underdispersion are replaced by the corresponding estimates under the binomial assumption, the coverage probabilities of these methods are much closer to the nominal level. The QBM method is always very close to the nominal coverage probability but can have slightly too high average coverage probability when overdispersion is moderate and the number of replications is small.
Figures 2 and 3 show a detailed view of the coverage probabilities in dependence on the true underlying proportions, π0 and π1. This detailed view is restricted to those 6 methods which have an average coverage probability close to 0.95. Figure 2 shows the more difficult case with substantial overdispersion (ϕ = 2) and only nij = 10 biological units in each replication. The QBM method is close to the nominal confidence level for a wide range of proportions, but can have too low coverage probabilities if at least one of the proportions is close to 0 or 1. The LBB1p and LOD1 methods need more replications to reach coverage probabilities close to 0.95 for a similar range of πi, and are still slightly liberal for almost all πi. LBB1p and LOD1 both have too high coverage probability for πi close to 0 and too low coverage probability when πi is close to 1. That is, test decisions based on these two methods may be conservative if hypotheses are formulated in terms of mortalities, which should be low in the control group, but will be liberal when a similar hypothesis is formulated in terms of the proportion of survivors. In this simulation setting, the Fieller-Bailey-type intervals FBB1p and FOD1 have lower coverage probabilities than the LBB1p and LOD1 methods, respectively, for almost all parameter combinations considered.
Figure 3 shows results for the less problematic case of moderate overdispersion (ϕ = 1.25) and nij = 20 biological units in each experimental unit. The coverage probability of QBM rarely falls below 0.94, but is slightly too large (between 0.96 and 0.97) if there are only 3 or 5 experimental units per treatment. The LBB1p and LOD1 methods again have slightly too low coverage probability if any πi is close to 1, and slightly too high coverage probability if any πi is close to 0. The two Fieller-Bailey-type intervals have slightly lower coverage probabilities than their counterparts based on the log delta method.
Symmetry of noncoverage
Figure 4 shows the simulated proportion of cases where the true parameter was excluded by the lower bound, relative to all cases where the parameter was excluded by the interval. For valid one-sided decisions, methods are preferable that exclude the true parameter equally often by the lower and the upper bound, i.e., with probability α/2 for each limit. For brevity, only the challenging setting with nij = 10 and marked overdispersion (ϕ = 2) is shown; conclusions for the remaining simulation settings are similar. The Fieller-Bailey-type methods (FBB1p and FOD1) show a wider range of parameter settings in which the probability of parameter exclusion is equal between the lower and upper bounds, compared to the corresponding methods based on the delta method on the log scale, LBB1p and LOD1. The QBM method shows asymmetric noncoverage for similar parameter settings as the Fieller-Bailey-type intervals, i.e., when any of the true proportions is close to 0, but is clearly more symmetric than FBB1p and FOD1 for a wide range of parameter settings where at least one πi is close to 1.
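The quantity plotted in Figure 4 can be computed from simulated interval limits as follows; a minimal Python sketch with the hypothetical helper `noncoverage_split` (the paper's simulations themselves were run in R):

```python
import numpy as np

def noncoverage_split(lower, upper, rho):
    """Given arrays of simulated lower and upper interval limits and the
    true ratio rho, return the overall coverage probability and the
    share of non-covering intervals in which rho was excluded by the
    lower limit (i.e. rho < lower)."""
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    excl_low = rho < lower             # true value below the whole interval
    excl_up = rho > upper              # true value above the whole interval
    miss = excl_low | excl_up
    coverage = 1.0 - miss.mean()
    share_low = excl_low.sum() / miss.sum() if miss.any() else float("nan")
    return coverage, share_low
```

A symmetric method would give a share close to 0.5, i.e., roughly α/2 of the noncoverage at each limit.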
Examples
Extreme cases: Table 1 shows four extreme cases: Cases 1 and 2 represent data where proportions are close to 0 in the control treatment, as could result from testing the ratio of mortality or immobility proportions. Cases 3 and 4 show data that could arise from test systems in which proportions close to 1 are expected in the control, for example, survival proportions as in the fathead minnow data below.

Method   Case 1                 Case 2                 Case 3                 Case 4
         x0j = (0,0,0,2)        x0j = (0,0,1,1)        x0j = (8,10,10,10)     x0j = (10,10,10,10)
         x1j = (1,2,5,6)        x1j = (1,2,2,4)        x1j = (6,7,9,9)        x1j = (9,9,9,10)
         π̂1/π̂0 = 0.35/0.05      π̂1/π̂0 = 0.225/0.05     π̂1/π̂0 = 0.775/0.95     π̂1/π̂0 = 0.925/1
                = 7                    = 4.5                  = 0.816                = 0.925
BM       (1.909, 26.2)          (1.15, 17.5)           (0.656, 0.975)         (0.801, 1.028)
FBB      (1.331, 151.9)         (1.45, 19.8)           (0.657, 1.005)         (0.879, 0.997)
FOD      (1.269, 182.6)         (1.41, 21.0)           (0.654, 1.009)         (0.885, 0.990)
FBB1p    (1.290, 190.0)         (1.24, 28.9)           (0.643, 1.022)         (0.851, 1.028)
FOD1     (1.269, 182.6)         (1.14, 36.8)           (0.654, 1.009)         (0.846, 1.033)
QBM      (0.902, 574.1)         (1.24, 27.6)           (0.558, 1.083)         (0.817, 0.998)
Table 1: 95% confidence intervals of selected methods for four extreme cases, assuming all nij = 10 and Ji = 4 for two treatment groups, i = 0, 1.
In cases 1 and 3, the data lead to variance estimates exceeding that of the binomial distribution. Here, the BM method leads to shorter intervals than all other methods. In both cases, the QBM method has at least slightly wider confidence intervals than the FBB1p and FOD1 methods, which might correspond to the observation that these two have too low coverage probability for small samples. In cases 2 and 4, the data show a variance below the binomial variance. Here, the methods without restriction of the variance estimates to that assumed by the binomial distribution (FOD and FBB) yield considerably shorter intervals than methods which treat underdispersion as implausible, such as BM, FBB1p, FOD1 and QBM.
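To illustrate how dispersion factors enter such intervals, the following is a deliberately simplified Python sketch of a delta-method interval on the log scale; it is not the exact LBB/LOD implementation from the accompanying R package, and the name `ratio_ci_log` is hypothetical:

```python
import numpy as np

Z975 = 1.959964  # standard normal 97.5% quantile

def ratio_ci_log(x1, n1, x0, n0, phi1=1.0, phi0=1.0):
    """Simplified two-sided 95% interval for pi1/pi0 via the delta
    method on the log scale.  The dispersion factors phi inflate the
    binomial variance (phi = 1 recovers the plain binomial case).
    Counts are pooled over clusters; the methods compared in the paper
    add further refinements (pooled dispersion estimation, variance
    truncation, Fieller/Bailey forms, MOVER-R)."""
    x1, n1 = float(np.sum(x1)), float(np.sum(n1))
    x0, n0 = float(np.sum(x0)), float(np.sum(n0))
    p1, p0 = x1 / n1, x0 / n0
    # approximate variance of log(p1/p0) under the quasibinomial model
    v = phi1 * (1.0 - p1) / (n1 * p1) + phi0 * (1.0 - p0) / (n0 * p0)
    est = p1 / p0
    half = Z975 * np.sqrt(v)
    return est * np.exp(-half), est * np.exp(half)
```

For the case 2 data with phi fixed at 1, this sketch gives roughly (1.04, 19.5) around the estimate 4.5; larger phi widens the interval multiplicatively on the log scale.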
Fathead minnow data: The toxicity of a compound to fathead minnow larvae was investigated using an untreated control group and 5 concentrations of the compound [16]. The experiment comprised 24 tanks, 4 tanks in each treatment group, and each tank contained 10 larvae. The observed proportions of surviving larvae are shown in Figure 5 (left side). Analyzing the data in a generalized linear model with quasibinomial assumption and logit link shows highly significant differences in the mean proportion of surviving larvae between the treatments (p < 0.0001; F-test in analysis of deviance). An estimated dispersion parameter of 1.082 suggests that the observations are at most slightly overdispersed, i.e., the data are at least roughly in line with the binomial assumption. The right side of Figure 5 shows two-sided 95% confidence intervals for the ratios of the proportion of surviving larvae in the treatment groups relative to that in concentration 0. In this case, confidence limits based on the quasibinomial assumption (QBM), using the full data including all 6 treatment groups, and confidence intervals under the binomial assumption (MOVER-R method for Wilson score intervals, BM) hardly differ.
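For a one-factor design like this, the pooled quasibinomial dispersion estimate can be sketched in Python as below. The counts used in the test are made-up illustration data, not the fathead minnow counts (which are shown only graphically in Figure 5), and `pooled_dispersion` is a hypothetical name:

```python
import numpy as np

def pooled_dispersion(groups, n=10):
    """Pooled Pearson dispersion estimate for a one-factor design with
    clustered binomial counts: with a separate mean per treatment group,
    the fitted value of each cluster is its group mean, so the
    quasibinomial dispersion estimate is the Pearson chi-square summed
    over all clusters, divided by the residual degrees of freedom."""
    chi2, df = 0.0, 0
    for x in groups:                      # one array of counts per treatment
        x = np.asarray(x, dtype=float)
        p = x.mean() / n                  # estimated proportion of the group
        if 0.0 < p < 1.0:                 # groups fitted at exactly 0 or 1 add no chi-square here
            chi2 += np.sum((x - n * p) ** 2 / (n * p * (1.0 - p)))
        df += len(x) - 1                  # ...but still contribute degrees of freedom
    return chi2 / df
```

A value close to 1, as the 1.082 reported above, indicates that the clustered data are roughly consistent with the binomial variance.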
Conclusion
Asymptotic methods based on the delta method applied on the log scale, or Fieller-Bailey-type intervals, have too low coverage probabilities when applied in small-sample settings that are typical for many bioassays, i.e., they cover the true ratio of proportions less often than claimed by their nominal confidence level. Violation of the nominal level is most severe for small numbers of replications, low numbers of biological units in each replication and extreme proportions. Even for as many as 10 replications (i.e., clusters), coverage probabilities are considerably below the nominal level for wide ranges of proportions. When these intervals are used for decisions in hypothesis tests for equivalence or noninferiority, erroneous conclusions of equivalence or noninferiority will occur more often than claimed by the nominal level α of such tests.
For the small number of replications and the small number of biological units per replication that are typical for some bioassays, restricting the variance estimates to be no smaller than the binomial variance (i.e., setting the dispersion parameter to 1 if underdispersion is estimated) leads to major improvements of the coverage probabilities. Further improvement can be achieved by combining the MOVER-R method with a profile deviance approach, which yields intervals with better coverage probabilities for low sample sizes. However, even this approach shows too low coverage probabilities for small sample sizes in cases where one proportion is very close to 0 or 1. Note that the simulation results shown here rely on the simplifying assumption that there is a common overdispersion factor for the treatments in the experiments. Based on single data sets with few replications per treatment, it will be hard to assess whether this assumption is appropriate, or whether different overdispersion factors per treatment group (as are used in the FBB and LBB methods, for example) would be more appropriate. For a given, highly standardized bioassay, however, available collections of historical data sets could be used to assess the plausibility of the different assumptions concerning the distribution, the mean-variance dependency, and homogeneity or heterogeneity of overdispersion factors among treatment groups.
Bioassays usually involve several dosages. Depending on the global hypotheses to be tested, adjustments for multiple comparisons may be needed; see, e.g., [17]. The confidence interval methods discussed here can be extended to construct approximate simultaneous confidence intervals, using approaches as described in [17-20]. However, such extensions require additional investigation, as some approaches involve additional approximations; this is subject to further research.
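As a crude illustration of such an extension (not one of the refined approaches of [17-20]): a Bonferroni adjustment simply replaces the standard normal quantile used in the two-sided intervals, sketched here in Python with the hypothetical helper `bonferroni_z`:

```python
from statistics import NormalDist

def bonferroni_z(alpha=0.05, k=5):
    """Crude route to approximate simultaneous intervals for k
    comparisons to the control: replace the two-sided quantile
    z_{1 - alpha/2} by z_{1 - alpha/(2k)}.  This controls the
    family-wise level conservatively; the approaches in [17-20] are
    less conservative but involve additional approximations."""
    return NormalDist().inv_cdf(1.0 - alpha / (2.0 * k))
```

For k = 1 this recovers the usual 1.96; for k = 5 comparisons to the control it gives about 2.58.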
Acknowledgement
I thank Hothorn et al. [17] for raising the topic of the paper, providing examples and reading an earlier version of this manuscript.
References
1. Denton DL, Diamond J, Zheng L (2011) Test of significant toxicity: a statistical application for assessing whether an effluent or site water is truly toxic. Environ Toxicol Chem 30(5): 1117-1126.
2. Lui KL, Mayer JA, Eckhardt L (2000) Confidence intervals for the risk ratio under cluster sampling based on the beta-binomial model. Stat Med 19(21): 2933-2942.
3. Zaihra T, Paul S (2010) Interval Estimation of Some Epidemiological Measures of Association. Int J Biostat 6(1): 35.
4. Hauck WW, Donner A (1977) Wald's Test as Applied to Hypotheses in Logit Analysis. J Amer Statist Assoc 72(360a): 851-853.
5. Venables WN, Ripley BD (2002) Modern applied statistics with S. (4th edn), Springer-Verlag Inc., New York, USA, pp. 197-210, 220-222.
6. Chen JS, Jennrich RI (1996) The Signed Root Deviance Profile and Confidence Intervals in Maximum Likelihood Analysis. J Amer Statist Assoc 91(435): 993-998.
7. Donner A, Zou GY (2012) Closed-form confidence intervals for functions of the normal mean and standard deviation. Stat Methods Med Res 21(4): 347-359.
8. McCullagh P, Nelder JA (1989) Generalized linear models. (2nd edn), Chapman & Hall/CRC, USA, pp. 124-135.
9. Johnson NL, Kotz S, Kemp AW (1993) Univariate discrete distributions. (2nd edn), John Wiley & Sons, New York, USA, pp. 239-284.
10. Fieller EC (1954) Some problems in interval estimation. J Roy Statist Soc Ser B Stat Methodol 16(2): 175-185.
11. Fagerland MW, Newcombe RG (2013) Confidence intervals for odds ratio and relative risk based on the inverse hyperbolic sine transformation. Stat Med 32(16): 2823-2836.
12. Bailey BJR (1987) Confidence limits to the risk ratio. Biometrics 43(1): 201-205.
13. Gerhard D (2013) mcprofile: Multiple Contrast Profiles. R package version 0.1-5.
14. R Core Team (2014) R: A language and environment for statistical computing. Vienna, Austria.
15. Schaarschmidt F, Gerhard D (2015) pairwiseCI: Confidence Intervals for Two Sample Comparisons. R package version 0.1-25.
16. Anonymous (2002) Methods for Measuring the Acute Toxicity of Effluents and Receiving Waters to Freshwater and Marine Organisms. (5th edn), U.S. Environmental Protection Agency, Washington, DC, USA.
17. Hothorn LA, Hasler M (2008) Proof of hazard and proof of safety in toxicological studies using simultaneous confidence intervals for differences and ratios to control. J Biopharm Statist 18(5): 915-933.
18. Hothorn T, Bretz F, Westfall P (2008) Simultaneous Inference in General Parametric Models. Biom J 50(3): 346-363.
19. Lauzon C, Caffo B (2009) Easy Multiplicity Control in Equivalence Testing Using Two One-sided Tests. Am Stat 63(2): 147-154.
20. Dilba G, Bretz F, Guiard V (2006) Simultaneous confidence sets and confidence intervals for multiple ratios. J Statist Plann Inference 136(8): 2640-2658.

