Review Article
Volume 5 Issue 1 - 2017
Predictive Influence of Variables on the Odds Ratio and in the Logistic Model
S K Bhattacharjee^{1}, Atanu Biswas^{2}, Ganesh Dutta^{3}, S Rao Jammalamadaka^{4}* and M Masoom Ali^{5}
^{1}Indian Statistical Institute, North-East Centre, Tezpur, Assam-784028, India
^{2}Indian Statistical Institute, India
^{3}Basanti Devi College, India
^{4}Department of Statistics and Applied Probability, University of California, USA
^{5}Department of Mathematical Sciences, Ball State University, USA
Received: October 01, 2016 | Published: February 01, 2017
*Corresponding author:
S Rao Jammalamadaka, Department of Statistics and Applied Probability, University of California, USA, Email:
Citation:
Bhattacharjee SK, Biswas A, Dutta G, Jammalamadaka SR, Ali MM (2017) Predictive Influence of Variables on the Odds Ratio and in the Logistic Model. Biom Biostat Int J 5(1): 00125. DOI:
10.15406/bbij.2017.05.00125
Abstract
We study the influence of explanatory variables in prediction by looking at the distribution of the log-odds ratio. We also consider the predictive influence of a subset of unobserved future variables on the distribution of log-odds ratio as well as in a logistic model, via the Bayesian predictive density of a future observation. This problem is considered for dichotomous, as well as continuous explanatory variables.
AMS Subject Classification: Primary 62J12, Secondary 62B10, 62F15
Keywords: Predictive density/probability; Log-odds ratio; Logistic model; Predictive influence; Missing/unobserved variable; Kullback-Leibler divergence
Introduction
The odds ratio (OR) is perhaps the most popular measure of treatment difference for binary outcomes and is extensively used in dealing with 2×2 tables in biomedical studies and clinical trials. The distribution of the log of the sample OR is often approximated by a normal distribution with the true log OR as the mean and with variance estimated by the sum of the reciprocals of the four cell frequencies in the 2×2 table (Breslow [1]). Bohning et al. [2] provide a detailed book-length discussion of the OR. For logistic regression, ORs enable one to examine the effect of explanatory variables in that relationship.
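As a concrete illustration of this normal approximation (not part of the original derivation), the sample log OR and its large-sample standard error can be computed directly from the four cell frequencies of a 2×2 table; the counts used below are hypothetical.

```python
import math

def log_or_normal_approx(a, b, c, d):
    """Sample log odds ratio for a 2x2 table [[a, b], [c, d]] and its
    large-sample standard error sqrt(1/a + 1/b + 1/c + 1/d)."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se

# hypothetical counts: 20 successes / 30 failures under A, 10 / 40 under B
log_or, se = log_or_normal_approx(20, 30, 10, 40)
ci = (log_or - 1.96 * se, log_or + 1.96 * se)  # approximate 95% Wald interval
```

The interval `ci` is the usual Wald interval implied by the normal approximation with the Breslow variance estimate.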
Logistic link is perhaps the most popular way to model the success probabilities of a binary variable. Pregibon [3], Cook and Weisberg [4] and Johnson [5] have considered the problem of the influence of observations for logistic regression models. Several measures have been suggested to identify observations in the data set which are influential relative to the estimation of the vector of regression coefficients, the deviance, and the determination of predictive probabilities and the classification of future observations.
Bhattacharjee & Dunsmore [6] considered the effect on the predictive probability of a future observation of the omission of subsets of the explanatory variables. Mercier et al. [7] used logistic regression to determine whether age and/or gender were a factor influencing severity of injuries suffered in head-on automobile collisions on rural highways. Zellner et al. [8] considered the problem of variable selection in logistic regression to compare the performance of stepwise selection procedures with a bagging method.
In the present paper, our aim is to measure the predictive influence of a subset of explanatory variables in log-odds ratio of a logistic model using a Bayesian approach. We are also interested in studying the effect of missing future explanatory variables on Bayes prediction, on a logistic model as well as on the log-odds ratio.
In Section 2, we derive the predictive densities of a future log-odds ratio for both the full model and a subset-deleted model. In Section 3, we derive the predictive density of the log-odds ratio when a subset of future explanatory variables is missing. To derive the predictive densities we assume that the future explanatory variables $x^f$ are distributed as multivariate normal, both when these $x^f$'s are independent and when they are dependent. In Section 4, we discuss the influence of future missing explanatory variables by considering the predictive probability of a future response in a logistic model. This is done by assuming that the future explanatory variables $x^f$ are multivariate normal in the continuous case; the dichotomous case is also considered. Since the predictive probabilities are not mathematically tractable for the logistic model, we use several approximations.
In Sections 2 and 3 we employ the Kullback-Leibler [9] directed measure of divergence ${D}_{KL}$ to assess the influence of variables, and also the influence of future missing variables, on the log-odds ratio. The form of the Kullback-Leibler [9] measure used here is given by
${D}_{KL}={\displaystyle \int f\left({a}^{\text{'}}{w}^{f}|.\right)\mathrm{log}\left(\frac{f\left({a}^{\text{'}}{w}^{f}|.\right)}{{f}_{\left(r+s\right)}\left({a}^{\text{'}}{w}^{f}|.\right)}\right)d\left({a}^{\text{'}}{w}^{f}\right).}$
To assess the influence of missing future variables or to measure the predictive probability in a logistic model we use the absolute difference of the two predictive probabilities.
Influence of variables in Log-odds Ratio
Consider a phase III clinical trial with two competing treatments, say A and B, having binary responses. Suppose
$n$
patients are randomly allocated with
${n}_{A}$
and
${n}_{B}$
patients to treatments A and B respectively. The patient responses are influenced by a covariate vector
${x}^{p\times 1}$
where one component of
$x$
may be 1 (which covers the constant term). Let $\left({Y}_{i},{Z}_{i},{x}_{i}\right)$ be the data corresponding to the $i$th patient, where ${Y}_{i}$ is the indicator of response (${Y}_{i}=1$ or $0$ for a success or failure), ${Z}_{i}$ is the indicator of the treatment assignment (${Z}_{i}=1$ or $0$ according as treatment A or B is applied to the $i$th patient), and ${x}_{i}$ is the covariate vector. We assume a logit model for the responses:
$\mathrm{Pr}\left({Y}_{i}=1|{Z}_{i},{x}_{i}\right)=\frac{\mathrm{exp}\left(\Delta {Z}_{i}+{x}_{i}\beta \right)}{1+\mathrm{exp}\left(\Delta {Z}_{i}+{x}_{i}\beta \right)},\quad i=1,2,\dots ,n.$
(i)
Then the odds for treatments A and B with covariate vector xi are respectively
${O}_{A}=\frac{\mathrm{Pr}\left({Y}_{i}=1|{Z}_{i}=1,{x}_{i}\right)}{\mathrm{Pr}\left({Y}_{i}=0|{Z}_{i}=1,{x}_{i}\right)}=\mathrm{exp}\left(\Delta +{x}_{i}\beta \right)$
,
${O}_{B}=\frac{\mathrm{Pr}\left({Y}_{i}=1|{Z}_{i}=0,{x}_{i}\right)}{\mathrm{Pr}\left({Y}_{i}=0|{Z}_{i}=0,{x}_{i}\right)}=\mathrm{exp}\left({x}_{i}\beta \right)$
and hence the log-odds ratio is
$\mathrm{log}OR=\mathrm{log}{O}_{A}-\mathrm{log}{O}_{B}=\Delta $
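A quick numerical check of this cancellation (a sketch, with made-up values of $\Delta $, $\beta $ and $x$) confirms that the covariate contribution drops out and the log-odds ratio reduces to $\Delta $:

```python
import math

def success_prob(z, x, delta, beta):
    """Pr(Y=1 | Z=z, x) under the logit model (i); x, beta are sequences."""
    eta = delta * z + sum(xi * bi for xi, bi in zip(x, beta))
    return math.exp(eta) / (1 + math.exp(eta))

def log_odds_ratio(x, delta, beta):
    """log O_A - log O_B; the x*beta terms cancel, leaving delta."""
    p_a = success_prob(1, x, delta, beta)
    p_b = success_prob(0, x, delta, beta)
    return math.log(p_a / (1 - p_a)) - math.log(p_b / (1 - p_b))
```

For any covariate vector `x`, `log_odds_ratio` returns $\Delta $ up to rounding error.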
Let us partition
$x\beta ={x}_{A}{\beta}_{A}+{x}_{B}{\beta}_{B}+{x}_{AB}{\beta}_{AB}$
Where
${x}_{A}$
indicates the variables used in treatment A only,
${x}_{B}$
is for treatment B only, and
${x}_{AB}$
is for both treatments A and B. Then the model can be partitioned for treatments A and B as:
$\mathrm{log}{O}_{A}=u=\Delta +{x}_{A}{\beta}_{A}+{x}_{AB}{\beta}_{AB}={x}_{\left(A\right)}{\beta}_{\left(A\right)}$
(ii)
$\mathrm{log}{O}_{B}=v={x}_{B}{\beta}_{B}+{x}_{AB}{\beta}_{AB}={x}_{\left(B\right)}{\beta}_{\left(B\right)}$
(iii)
The predictive density of the future log-odds for A, ${u}^{f}$, under a non-informative (vague) prior with normal or any spherically symmetric errors is of Student form (Jammalamadaka et al. [10]) and is given by
$f\left({u}^{f}|{x}_{\left(A\right)}^{f},data\right)\equiv St\left(n-k,\;{x}_{\left(A\right)}^{f}{\widehat{\beta}}_{\left(A\right)},\;{s}_{\left(A\right)}^{2}\left(1+{x}_{\left(A\right)}^{f\text{'}}{\left({x}_{\left(A\right)}^{\text{'}}{x}_{\left(A\right)}\right)}^{-1}{x}_{\left(A\right)}^{f}\right)\right)$
where
${\widehat{\beta}}_{\left(A\right)}$
is the MLE of
${\beta}_{\left(A\right)}$
,
${s}_{\left(A\right)}^{2}$
is the MLE of the error variance, and $k$ is the number of parameters in model (ii). See Bhattacharjee et al. [11] in this context. If the sample size is large, this predictive density can be well approximated by its asymptotic normal form
$N\left({x}_{\left(A\right)}^{f}{\widehat{\beta}}_{\left(A\right)},\;{s}_{\left(A\right)}^{2}\left(1+{x}_{\left(A\right)}^{f\text{'}}{\left({x}_{\left(A\right)}^{\text{'}}{x}_{\left(A\right)}\right)}^{-1}{x}_{\left(A\right)}^{f}\right)\left(n-k\right)/\left(n-k-2\right)\right)$
Similarly one can find the same for treatment B,
${v}^{f}$
.
Let us define
${w}^{f}={\left({u}^{f},{v}^{f}\right)}^{\text{'}}$
and
$a={\left(1,-1\right)}^{\text{'}}$
: Then the predictive density of future log odds ratio
${a}^{\text{'}}{w}^{f}$
is given by
$f\left({a}^{\text{'}}{w}^{f}|{x}_{\left(A\right)}^{f},{x}_{\left(B\right)}^{f},data\right)\approx N\left(\theta ,{\delta}^{2}\right)$
(iv)
Where
$\theta ={x}_{\left(A\right)}^{f}{\widehat{\beta}}_{\left(A\right)}-{x}_{\left(B\right)}^{f}{\widehat{\beta}}_{\left(B\right)}$
and
${\delta}^{2}={s}_{\left(A\right)}^{2}\left(1+{x}_{\left(A\right)}^{f\text{'}}{\left({x}_{\left(A\right)}^{\text{'}}{x}_{\left(A\right)}\right)}^{-1}{x}_{\left(A\right)}^{f}\right)\left(n-k\right)/\left(n-k-2\right)+{s}_{\left(B\right)}^{2}\left(\left(1+{x}_{\left(B\right)}^{f\text{'}}{\left({x}_{\left(B\right)}^{\text{'}}{x}_{\left(B\right)}\right)}^{-1}{x}_{\left(B\right)}^{f}\right)\left(n-q\right)/\left(n-q-2\right)\right)$
Our interest is to measure the influence of explanatory variables in the predictive density (iv) for the following cases:
Case 1: Influence of $r$ explanatory variables ${x}_{A}^{r}$ of ${x}_{A}$ in treatment A.
Case 2: Influence of $r$ explanatory variables ${x}_{B}^{r}$ of ${x}_{B}$ in treatment B.
Case 3: Influence of $s$ explanatory variables ${x}_{AB}^{s}$ of ${x}_{AB}$ in treatment A.
Case 4: Influence of $s$ explanatory variables ${x}_{AB}^{s}$ of ${x}_{AB}$ in treatment B.
Case 5: Joint influence of $r$ explanatory variables ${x}_{A}^{r}$ of ${x}_{A}$ and $s$ explanatory variables ${x}_{AB}^{s}$ of ${x}_{AB}$ in treatment A.
Case 6: Joint influence of $r$ explanatory variables ${x}_{B}^{r}$ of ${x}_{B}$ and $s$ explanatory variables ${x}_{AB}^{s}$ of ${x}_{AB}$ in treatment B.
To see the influence of explanatory variables on the log-odds ratio, we construct a reduced log-odds model by deleting a subset of explanatory variables. We then derive the predictive density of the future log-odds ratio for the reduced model and compare it with the predictive density (iv) for the full model. It is enough to consider Case 5 for illustration. We construct the reduced model by deleting variables
${x}_{A}^{r}$
of
${x}_{A}$
and
${x}_{AB}^{s}$
of
${x}_{AB}$
in (ii) as
$u=\Delta +{x}_{A}^{*}{\beta}_{A}^{*}+{x}_{AB}^{*}{\beta}_{AB}^{*}={x}_{\left(A\right)}^{*}{\beta}_{\left(A\right)}^{*}$
Then the predictive density of
${u}^{f}$
is given by
$f\left({u}^{f}|{x}_{\left(A\right)}^{*f},data\right)=St\left(n-k+r+s,\;{x}_{\left(A\right)}^{*f}{\widehat{\beta}}_{\left(A\right)}^{*},\;{s}_{\left(A\right)}^{*2}\left(1+{x}_{\left(A\right)}^{*f\text{'}}{\left({x}_{\left(A\right)}^{*\text{'}}{x}_{\left(A\right)}^{*}\right)}^{-1}{x}_{\left(A\right)}^{*f}\right)\right)$
The normal approximation of the predictive density is
$N\left({x}_{\left(A\right)}^{*f}{\widehat{\beta}}_{\left(A\right)}^{*},\;{s}_{\left(A\right)}^{*2}\left(1+{x}_{\left(A\right)}^{*f\text{'}}{\left({x}_{\left(A\right)}^{*\text{'}}{x}_{\left(A\right)}^{*}\right)}^{-1}{x}_{\left(A\right)}^{*f}\right)\left(n-k+r+s\right)/\left(n-k+r+s-2\right)\right)$
Since no variable is missing in
$\upsilon =\mathrm{log}{O}_{B}$
, the predictive density of
${\upsilon}^{f}$
is unaltered along with its normal approximation. Hence the predictive density of log-odds ratio
${a}^{\text{'}}{w}^{f}$
under Case 5 is given by
${f}_{\left(r+s\right)}\left({a}^{\text{'}}{\omega}^{f}|{x}_{\left(A\right)}^{*f},{x}_{\left(B\right)}^{f},data\right)\approx N\left({\theta}^{*},{\delta}^{*2}\right)$
(v)
Where
${\theta}^{*}={x}_{\left(A\right)}^{*f}{\widehat{\beta}}^{*}{}_{\left(A\right)}-{x}_{\left(B\right)}^{f}{\widehat{\beta}}_{\left(B\right)}$
and
${\delta}^{*2}={s}_{\left(A\right)}^{*2}\left(1+{x}_{\left(A\right)}^{*f\text{'}}{\left({x}_{\left(A\right)}^{*\text{'}}{x}_{\left(A\right)}^{*}\right)}^{-1}{x}_{\left(A\right)}^{*f}\right)\left(n-k+r+s\right)/\left(n-k+r+s-2\right)+{s}_{\left(B\right)}^{2}\left(1+{x}_{\left(B\right)}^{f\text{'}}{\left({x}_{\left(B\right)}^{\text{'}}{x}_{\left(B\right)}\right)}^{-1}{x}_{\left(B\right)}^{f}\right)\left(n-q\right)/\left(n-q-2\right)$
To assess the influence of the deleted variables, we employ the Kullback-Leibler [9] directed measure of divergence
${D}_{KL}$
between the predictive densities of
${a}^{\text{'}}{w}^{f}$
for full model (iv) and reduced model (v). The form of K-L measure used here is given by
${D}_{KL}={\displaystyle \int {f}_{\left(r+s\right)}}\left(a\text{'}{\omega}^{f}|.\right)\mathrm{log}\left(\frac{{f}_{\left(r+s\right)}\left(a\text{'}{\omega}^{f}|.\right)}{f\left(a\text{'}{\omega}^{f}|.\right)}\right)d{a}^{\text{'}}{\omega}^{f}$
The discrepancy measure
${D}_{KL}$
between the predictive densities (iv) and (v) reduces to
${D}_{KL}=\frac{{\left(\theta -\theta *\right)}^{2}}{2{\delta}^{2}}+\frac{1}{2}\left(\frac{{\delta}^{*2}}{{\delta}^{2}}-\mathrm{log}\left(\frac{{\delta}^{*2}}{{\delta}^{2}}\right)-1\right)$
Here
$L=\frac{{\left(\theta -{\theta}^{*}\right)}^{2}}{2{\delta}^{2}}$
is due to the difference of the location parameters, and
$S=\frac{1}{2}\left(\frac{{\delta}^{*2}}{{\delta}^{2}}-\mathrm{log}\left(\frac{{\delta}^{*2}}{{\delta}^{2}}\right)-1\right)$
is due to the difference of the scale parameters of the two predictive densities (iv) and (v).
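This discrepancy is just the Kullback-Leibler divergence between two univariate normal densities; a minimal sketch (parameter values here are hypothetical):

```python
import math

def dkl_normals(theta, delta2, theta_star, delta2_star):
    """D_KL between the reduced-model predictive N(theta*, delta*^2) and the
    full-model predictive N(theta, delta^2), split into the location term L
    and the scale term S as in the text."""
    L = (theta - theta_star) ** 2 / (2 * delta2)
    S = 0.5 * (delta2_star / delta2 - math.log(delta2_star / delta2) - 1)
    return L + S
```

Both terms are nonnegative, and ${D}_{KL}$ vanishes only when the two predictive densities coincide.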
Example 1: Here we consider the flu shot data of Pregibon [3]. A local health clinic sent fliers to its clients to encourage everyone, but especially older persons at high risk of complications, to get a flu shot for protection against an expected flu epidemic. In a pilot follow-up study, 159 clients were randomly selected and asked whether they had actually received a flu shot. A client who received a flu shot was coded $Y=1$, and a client who did not receive a flu shot was coded $Y=0$. In addition, data were collected on their age
$\left({x}_{1}\right)$
and their health awareness
$\left({x}_{2}\right)$
. Also included in the data were client gender
$\left({x}_{3}\right)$
, with males coded
${x}_{3}=1$
and females coded
${x}_{3}=0$
. Here we divide the whole data set into two groups, A and B, on the basis of gender: group A corresponds to the males and group B to the females. We computed
${D}_{KL}$
to measure the influence of the deleted variable
${x}_{1}$
in groups A and B separately, and the discrepancies are drawn in Figure 1. A similar figure can be obtained by deleting ${x}_{2}$. From this figure, the discrepancy is smaller around the mean of the deleted variable.
Example 2: This is a simulation exercise. We drew a sample of size 159 from a bivariate normal distribution, using the means, variances and correlation coefficient of ${x}_{1}$ and ${x}_{2}$ from the above flu shot data to generate the sample. Using these ${x}_{1}$ and ${x}_{2}$, we generated the responses, i.e., the $Y$ values, and then, using the whole generated data set, we computed ${D}_{KL}$. We repeated the whole process 1000 times and computed the means of the ${D}_{KL}$ values. The mean discrepancies are shown in Figure 2, and we reach the same conclusion as in the data example.
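The simulation can be sketched as follows. Every numeric value here (covariate moments, regression coefficients, the future covariate point, the number of replications) is a hypothetical stand-in for the quantities estimated from the flu shot data, and for simplicity the log-odds are generated from a working linear model rather than a full logistic fit.

```python
import numpy as np

rng = np.random.default_rng(0)

def dkl(theta, delta2, theta_s, delta2_s):
    # KL discrepancy between two normal predictive densities, as in the text
    return (theta - theta_s) ** 2 / (2 * delta2) \
        + 0.5 * (delta2_s / delta2 - np.log(delta2_s / delta2) - 1)

def one_replication(n=159):
    # hypothetical stand-ins for moments estimated from the flu shot data
    mean = np.array([50.0, 60.0])
    cov = np.array([[100.0, 24.0], [24.0, 64.0]])   # sd 10 and 8, rho = 0.3
    X = rng.multivariate_normal(mean, cov, size=n)
    Z = np.column_stack([np.ones(n), X])            # design with intercept
    beta = np.array([-1.0, 0.02, 0.03])             # hypothetical coefficients
    u = Z @ beta + rng.normal(size=n)               # working log-odds with noise
    zf = np.array([1.0, 50.0, 60.0])                # a future covariate point

    def predictive_moments(D, d_pt):
        # least-squares fit and normal-approximation predictive mean/variance
        b, *_ = np.linalg.lstsq(D, u, rcond=None)
        k = D.shape[1]
        s2 = np.sum((u - D @ b) ** 2) / (n - k)
        var = s2 * (1 + d_pt @ np.linalg.inv(D.T @ D) @ d_pt) * (n - k) / (n - k - 2)
        return d_pt @ b, var

    theta, delta2 = predictive_moments(Z, zf)                          # full model
    theta_s, delta2_s = predictive_moments(Z[:, [0, 2]], zf[[0, 2]])   # x1 deleted
    return dkl(theta, delta2, theta_s, delta2_s)

mean_dkl = np.mean([one_replication() for _ in range(200)])
```

Averaging the per-replication discrepancies mimics the mean ${D}_{KL}$ surfaces plotted in Figure 2.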
Influence of Missing Future Explanatory Variables in Log-Odds Ratio
Here the aim is to detect the predictive influence of a set of missing future explanatory variables on the log-odds ratio of the logistic model (i). Our interest is in the six cases pointed out in Section 2. In treatment A, let the $r$ future variables missing from ${x}_{A}^{f}$ and the $s$ future variables missing from ${x}_{AB}^{f}$ be denoted by ${x}_{\left(A\right)}^{\left(r+s\right)f}$. Similarly, in treatment B, let the $r$ future variables missing from ${x}_{B}^{f}$ and the $s$ future variables missing from ${x}_{AB}^{f}$ be denoted by ${x}_{\left(B\right)}^{\left(r+s\right)f}$.
. We assume that the errors of models (ii) and (iii) are normally distributed with zero means and variances
${\tau}_{\left(A\right)}^{-1}$
and
${\tau}_{\left(B\right)}^{-1}$
, respectively. We also assume that the conditional density of
${x}_{\left(A\right)}^{\left(r+s\right)f}$
given
${x}_{\left(A\right)}^{*f}$
is independent of
${\beta}_{\left(A\right)}$
and
${\tau}_{\left(A\right)}$
and
${x}_{\left(B\right)}^{\left(r+s\right)f}$
given
${x}_{\left(B\right)}^{*f}$
is independent of
${\beta}_{\left(B\right)}$
and
${\tau}_{\left(B\right)}$
, i.e.,
$f\left({x}_{(.)}^{\left(r+s\right)f}|{x}_{(.)}^{*f},{\beta}_{(.)},{\tau}_{(.)}\right)=f\left({x}_{(.)}^{\left(r+s\right)f}|{x}_{(.)}^{*f}\right)$
where
${x}_{(.)}^{*f}$
denotes the future explanatory variables
${x}_{(.)}^{f}$
without
${x}_{(.)}^{\left(r+s\right)f}$
.
Explanatory variables are continuous
We assume that
${x}_{i}^{f}$
's are dependent and the distribution of
${x}_{\left(A\right)}^{f}$
is
$\left(k-1\right)$
-dimensional multivariate normal, i.e.
$f\left({x}_{\left(A\right)}^{f}\right)\equiv {N}_{k-1}\left(\eta ,\psi \right)$
.
The conditional density of
${x}_{\left(A\right)}^{\left(r+s\right)f}$
given
${x}_{\left(A\right)}^{*f}$
is given by
$f\left({x}_{\left(A\right)}^{\left(r+s\right)f}|{x}_{\left(A\right)}^{*f}\right)\equiv {N}_{r+s}\left({\eta}_{\left(r+s\right)}^{*},{\psi}_{\left(r+s\right)}^{*}\right)$
,
Where
$\eta =\left({\eta}^{*},{\eta}_{r+s}\right),\quad {x}_{\left(A\right)}^{f}=\left({x}_{\left(A\right)}^{*f},{x}_{\left(A\right)}^{\left(r+s\right)f}\right),\quad \psi =\left(\begin{array}{cc}{\psi}_{11}& {\psi}_{12}\\ {\psi}_{21}& {\psi}_{22}\end{array}\right),\quad {\eta}_{\left(r+s\right)}^{*}={\eta}_{r+s}+{\psi}_{21}{\psi}_{11}^{-1}\left({x}_{\left(A\right)}^{*f}-{\eta}^{*}\right)$
and
${\psi}_{\left(r+s\right)}^{*}={\psi}_{22}-{\psi}_{21}{\psi}_{11}^{-1}{\psi}_{12}$
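These conditional moments are the standard Schur-complement formulas for a partitioned multivariate normal; a small self-contained sketch (index choices and numeric values below are illustrative only):

```python
import numpy as np

def conditional_mvn(eta, psi, x_obs, obs_idx, mis_idx):
    """Mean and covariance of the missing block of a multivariate normal given
    the observed block:
      eta* = eta_mis + Psi_21 Psi_11^{-1} (x_obs - eta_obs)
      Psi* = Psi_22 - Psi_21 Psi_11^{-1} Psi_12
    """
    eta = np.asarray(eta, float)
    psi = np.asarray(psi, float)
    p11 = psi[np.ix_(obs_idx, obs_idx)]
    p12 = psi[np.ix_(obs_idx, mis_idx)]
    p21 = psi[np.ix_(mis_idx, obs_idx)]
    p22 = psi[np.ix_(mis_idx, mis_idx)]
    w = p21 @ np.linalg.inv(p11)                       # regression weights
    eta_star = eta[mis_idx] + w @ (np.asarray(x_obs, float) - eta[obs_idx])
    psi_star = p22 - w @ p12                           # Schur complement
    return eta_star, psi_star
```

For a bivariate normal with unit variances and correlation 0.5, observing the first coordinate at 1 gives conditional mean 0.5 and conditional variance 0.75.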
As earlier, it is enough to consider Case 5 to see the joint influence of the $r$ missing future explanatory variables
${x}_{A}^{rf}$
of
${x}_{A}^{f}$
and s missing future explanatory variables
${x}_{AB}^{sf}$
of
${x}_{AB}^{f}$
in treatment A. The density of
${u}^{f}$
when
${x}_{\left(A\right)}^{\left(r+s\right)f}$
is missing is given by
$f\left({u}^{f}|{x}_{\left(A\right)}^{*f},{\beta}_{\left(A\right)},{\tau}_{\left(A\right)}\right)={\displaystyle \int f\left({u}^{f}|{x}_{\left(A\right)}^{f},{\beta}_{\left(A\right)},{\tau}_{\left(A\right)}\right)}f\left({x}_{\left(A\right)}^{\left(r+s\right)f}|{x}_{\left(A\right)}^{*f}\right)d{x}_{\left(A\right)}^{\left(r+s\right)f}\equiv N\left({\displaystyle \sum _{i=0}^{k-r-s-1}{x}_{\left(A\right)i}^{f}{\beta}_{\left(A\right)i}}+{\displaystyle \sum _{i=k-r-s}^{k-1}{\eta}_{i}^{*}{\beta}_{\left(A\right)i}},\;{\displaystyle \sum _{i,j=k-r-s}^{k-1}{\beta}_{\left(A\right)i}{\beta}_{\left(A\right)j}{\psi}_{ij}^{*}}+{\tau}_{\left(A\right)}^{-1}\right)$
Where ${\eta}_{i}^{*}$ is the $i$th component of ${\eta}_{\left(r+s\right)}^{*}$ and ${\psi}_{ij}^{*}$ is the $\left(i,j\right)$th component of ${\psi}_{\left(r+s\right)}^{*}$.
See Bhattacharjee et al. [11] in this context. Using a Taylor expansion and improper prior densities for both
${\beta}_{\left(A\right)}$
and
${\tau}_{\left(A\right)}$
, the approximate predictive density of
${u}^{f}$
when
${x}_{\left(A\right)}^{\left(r+s\right)f}$
is missing is given by
${f}_{\left(r+s\right)}\left({u}^{f}|{x}_{\left(A\right)}^{*f},data\right)\equiv N\left({\displaystyle \sum _{i=0}^{k-r-s-1}{x}_{\left(A\right)i}^{f}{\widehat{\beta}}_{\left(A\right)i}+{\displaystyle \sum _{i=k-r-s}^{k-1}{\eta}_{i}^{*}{\widehat{\beta}}_{\left(A\right)i},{\displaystyle \sum _{i,j=k-r-s}^{k-1}{\widehat{\beta}}_{\left(A\right)i}{\widehat{\beta}}_{\left(A\right)j}}{\psi}_{ij}^{*}+{s}_{\left(A\right)}^{2}{\gamma}^{*}}}\right)$
evaluated at
${\widehat{\beta}}_{\left(A\right)}$
and
${s}_{\left(A\right)}^{2}$
where
${\gamma}^{*}=\left(1+\frac{1}{2}{\displaystyle \sum _{i,j=0}^{k-1}{Q}_{ij}^{*}\left({\beta}_{\left(A\right)},{\tau}_{\left(A\right)}\right)\mathrm{Cov}\left({\beta}_{\left(A\right)i},{\beta}_{\left(A\right)j}\right)}+\frac{1}{2}{Q}_{{\tau}_{\left(A\right)}}^{2}\left({\beta}_{\left(A\right)},{\tau}_{\left(A\right)}\right)\mathrm{Var}\left({\tau}_{\left(A\right)}\right)\right)$
is the multiplicative factor for the second order Taylor's approximation. If
${x}_{\left(A\right)}^{f}$
's are independent the corresponding approximate predictive density of
${u}^{f}$
is
${f}_{\left(r+s\right)}\left({u}^{f}|{x}_{\left(A\right)}^{*f},data\right)\equiv N\left({\displaystyle \sum _{i=0}^{k-r-s-1}{x}_{\left(A\right)i}^{f}{\widehat{\beta}}_{\left(A\right)i}}+{\displaystyle \sum _{i=k-r-s}^{k-1}{\eta}_{i}{\widehat{\beta}}_{\left(A\right)i}},\;{\displaystyle \sum _{i=k-r-s}^{k-1}{\widehat{\beta}}_{\left(A\right)i}^{2}{\psi}_{i}^{2}}+{s}_{\left(A\right)}^{2}\gamma \right)$
evaluated at
${\widehat{\beta}}_{\left(A\right)}$
and
${s}_{\left(A\right)}^{2}$
, where
${\eta}_{i}$
and
${\psi}_{i}^{2}$
are mean and variance of the ith missing variable and
$\gamma =\left(1+\frac{1}{2}{\displaystyle \sum _{i,j=0}^{k-1}{Q}_{ij}\left({\beta}_{\left(A\right)},{\tau}_{\left(A\right)}\right)\mathrm{Cov}\left({\beta}_{\left(A\right)i},{\beta}_{\left(A\right)j}\right)}+\frac{1}{2}{Q}_{{\tau}_{\left(A\right)}}^{2}\left({\beta}_{\left(A\right)},{\tau}_{\left(A\right)}\right)\mathrm{Var}\left({\tau}_{\left(A\right)}\right)\right)$
. Since no future variable is missing in
$\upsilon $
, the approximate predictive density of
${\upsilon}^{f}$
is same as obtained in Section 2. Thus when
${x}_{\left(A\right)}^{f}$
s are dependent the approximate predictive density of log-odds ratio
${a}^{\text{'}}{w}^{f}$
for
${x}_{\left(A\right)}^{\left(r+s\right)f}$
missing is given by
${f}_{\left(r+s\right)}\left({a}^{\text{'}}{w}^{f}|{x}_{\left(A\right)}^{*f},{x}_{\left(B\right)}^{f};data\right)\equiv {\gamma}^{*}N\left(\xi ,{\omega}^{2}\right)$
, (vi)
Where
$\xi ={\displaystyle \sum _{i=0}^{k-r-s-1}{x}_{\left(A\right)i}^{f}}{\widehat{\beta}}_{\left(A\right)i}+{\displaystyle \sum _{i=k-r-s}^{k-1}{\eta}_{i}^{*}}{\widehat{\beta}}_{\left(A\right)i}-{x}_{\left(B\right)}^{f}{\widehat{\beta}}_{\left(B\right)}$
and
${\omega}^{2}=\left({\displaystyle \sum _{i,j=k-r-s}^{k-1}{\widehat{\beta}}_{\left(A\right)i}{\widehat{\beta}}_{\left(A\right)j}{\psi}_{ij}^{*}}+{s}_{\left(A\right)}^{2}\right)+{s}_{\left(B\right)}^{2}\left(1+{x}_{\left(B\right)}^{f\text{'}}{\left({X}_{\left(B\right)}^{\text{'}}{X}_{\left(B\right)}\right)}^{-1}{x}_{\left(B\right)}^{f}\right)\frac{n-q}{n-q-2}$
The K-L directed measure of divergence between the predictive density (iv), when no variable is missing, and the predictive density (vi), when
$r+s$
future variables are missing is given by
${D}_{KL}={\displaystyle \int f\left({a}^{\text{'}}{w}^{f}|{x}_{\left(A\right)}^{f},{x}_{\left(B\right)}^{f},data\right)}\mathrm{log}\left(\frac{f\left({a}^{\text{'}}{w}^{f}|{x}_{\left(A\right)}^{f},{x}_{\left(B\right)}^{f},data\right)}{{f}_{\left(r+s\right)}\left({a}^{\text{'}}{w}^{f}|{x}_{\left(A\right)}^{*f},{x}_{\left(B\right)}^{f},data\right)}\right)d{a}^{\text{'}}{w}^{f}=\frac{1}{2{\omega}^{2}}{\left(\theta -\xi \right)}^{2}+\frac{1}{2}\left(\frac{{\delta}^{2}}{{\omega}^{2}}-\mathrm{log}\left(\frac{{\delta}^{2}}{{\omega}^{2}}\right)-1\right)$
=
$-\frac{1}{2}{\displaystyle \sum _{i,j=0}^{k-1}E\left({Q}_{ij}^{*}\left({\beta}_{\left(A\right)},{\tau}_{\left(A\right)}\right)\mathrm{Cov}\left({\beta}_{\left(A\right)i},{\beta}_{\left(A\right)j}\right)\right)}-\frac{1}{2}E\left({Q}_{{\tau}_{\left(A\right)}}^{2}\left({\beta}_{\left(A\right)},{\tau}_{\left(A\right)}\right)\mathrm{Var}\left({\tau}_{\left(A\right)}\right)\right)$
(vii)
If
${x}_{\left(A\right)}^{f}$
‘s are independent the predictive density of
${a}^{\text{'}}{w}^{f}$
when
$\left(r+s\right)$
future variables are missing is the same as (vi), and the corresponding K-L [9] measure ${D}_{KL}$ is the same as (vii), but with ${\eta}_{i}^{*}$ replaced by ${\eta}_{i}$ in $\xi $, ${\widehat{\beta}}_{\left(A\right)i}{\widehat{\beta}}_{\left(A\right)j}{\psi}_{ij}^{*}$ replaced by ${\widehat{\beta}}_{\left(A\right)i}^{2}{\psi}_{i}^{2}$ in ${\omega}^{2}$, and ${Q}_{ij}^{*}\left({\beta}_{\left(A\right)},{\tau}_{\left(A\right)}\right)$ replaced by ${Q}_{ij}\left({\beta}_{\left(A\right)},{\tau}_{\left(A\right)}\right)$ in ${\gamma}^{*}$, where ${\eta}_{i}$ and ${\psi}_{i}^{2}$ are the mean and variance of the $i$th missing variable.
Explanatory variables are dichotomous
Here we assume that all the explanatory variables are dichotomous and independent. We assume that the errors of models (ii) and (iii) are normally distributed with means zero and variances
${\tau}_{\left(A\right)}^{-1}$
and
${\tau}_{\left(B\right)}^{-1}$
respectively. To assess the influence of the missing variables in treatment A, we consider that
${x}_{\left(A\right)i}^{f}$
is distributed as
$\mathrm{Pr}\left({X}_{\left(A\right)i}^{f}={x}_{\left(A\right)i}^{f}\right)={\theta}_{\left(A\right)i}^{{x}_{\left(A\right)i}^{f}}{\left(1-{\theta}_{\left(A\right)i}\right)}^{1-{x}_{\left(A\right)i}^{f}},{x}_{\left(A\right)i}^{f}=0,1,i=1,2,\mathrm{...},k-1$
The density of a future
${u}^{f}$
is
$f\left({u}^{f}|{x}_{\left(A\right)}^{f},{\beta}_{\left(A\right)},{\tau}_{\left(A\right)}\right)\equiv N\left({\displaystyle \sum _{i=0}^{k-1}{x}_{\left(A\right)i}^{f}{\beta}_{\left(A\right)i},{\tau}_{\left(A\right)}^{-1}}\right)$
If
${x}_{\left(A\right)}^{\left(r\right)f}$
future variables are missing in treatment A, then the density of a future
${u}^{f}$
is given by
$f\left({u}^{f}|{x}_{\left(A\right)}^{*f},{\beta}_{\left(A\right)},{\tau}_{\left(A\right)}\right)={\displaystyle \sum _{{x}_{\left(A\right)k-r}^{f}=0}^{1}\cdots {\displaystyle \sum _{{x}_{\left(A\right)k-1}^{f}=0}^{1}N\left({\displaystyle \sum _{i=0}^{k-1}{x}_{\left(A\right)i}^{f}{\beta}_{\left(A\right)i}},{\tau}_{\left(A\right)}^{-1}\right){\displaystyle \prod _{i=k-r}^{k-1}{\theta}_{\left(A\right)i}^{{x}_{\left(A\right)i}^{f}}{\left(1-{\theta}_{\left(A\right)i}\right)}^{1-{x}_{\left(A\right)i}^{f}}}}}$
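This density is a ${2}^{r}$-component normal mixture, each component weighted by the Bernoulli probability of the corresponding pattern of the missing dichotomous covariates. A minimal sketch (all parameter values hypothetical):

```python
import math
from itertools import product

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def mixture_density(uf, x_obs, beta_obs, beta_mis, theta_mis, tau_inv):
    """Density of a future u^f when r dichotomous covariates are missing:
    sum over the 2^r missing-covariate patterns of
    Bernoulli(pattern) weight x N(observed part + pattern part, tau^{-1})."""
    base = sum(x * b for x, b in zip(x_obs, beta_obs))
    total = 0.0
    for pattern in product((0, 1), repeat=len(beta_mis)):
        weight = 1.0
        for xi, th in zip(pattern, theta_mis):
            weight *= th ** xi * (1 - th) ** (1 - xi)
        mean = base + sum(xi * b for xi, b in zip(pattern, beta_mis))
        total += weight * normal_pdf(uf, mean, tau_inv)
    return total
```

When the missing covariate has a zero coefficient, the mixture collapses to a single normal, as expected.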
The predictive density of
${u}^{f}$
when
${x}_{\left(A\right)}^{\left(r\right)f}$
is missing is given by
$f\left({u}^{f}|{x}_{\left(A\right)}^{*f},data\right)={\displaystyle \int f\left({u}^{f}|{x}_{\left(A\right)}^{*f},{\beta}_{\left(A\right)},{\tau}_{\left(A\right)}\right)f\left({\beta}_{\left(A\right)}|data\right)d{\beta}_{\left(A\right)}}$
which is not mathematically tractable. For vague prior densities for ${\beta}_{\left(A\right)}$ and ${\tau}_{\left(A\right)}$, and using a Taylor expansion, the approximate predictive density (viii) is
$f\left({u}^{f}|{x}_{\left(A\right)}^{*f},data\right)={\displaystyle \sum _{{x}_{\left(A\right)k-r}^{f}=0}^{1}\cdots {\displaystyle \sum _{{x}_{\left(A\right)k-1}^{f}=0}^{1}N\left({\displaystyle \sum _{i=0}^{k-1}{x}_{\left(A\right)i}^{f}{\widehat{\beta}}_{\left(A\right)i}},{s}_{\left(A\right)}^{2}\right)}}{\displaystyle \prod _{i=k-r}^{k-1}{\theta}_{\left(A\right)i}^{{x}_{\left(A\right)i}^{f}}}{\left(1-{\theta}_{\left(A\right)i}\right)}^{1-{x}_{\left(A\right)i}^{f}}\left(1+{\displaystyle \sum _{i,j=0}^{k-1}{Q}_{ij}\left({\widehat{\beta}}_{\left(A\right)},{s}_{\left(A\right)}^{-2}\right)\frac{\mathrm{Cov}\left({\beta}_{\left(A\right)i},{\beta}_{\left(A\right)j}\right)}{2}}+{Q}_{{\tau}_{\left(A\right)}}^{2}\left({\widehat{\beta}}_{\left(A\right)},{s}_{\left(A\right)}^{-2}\right)\frac{\mathrm{Var}\left({\tau}_{\left(A\right)}\right)}{2}\right)$
(viii)
Since there are no missing variables in
${\upsilon}^{f}$
, the density of
${\upsilon}^{f}$
is the same as that obtained in Section 2. Then the predictive density of
${a}^{\text{'}}{w}^{f}$
is given by
$\begin{array}{l}f\left({a}^{\text{'}}{w}^{f}|{x}_{\left(A\right)}^{*f},{x}_{\left(B\right)}^{f},data\right)={\displaystyle \sum _{{x}_{\left(A\right)k-r}^{f}=0}^{1}\cdots {\displaystyle \sum _{{x}_{\left(A\right)k-1}^{f}=0}^{1}N\left({\displaystyle \sum _{i=0}^{k-1}\left({x}_{\left(A\right)i}^{f}{\widehat{\beta}}_{\left(A\right)i}-{x}_{\left(B\right)i}^{f}{\widehat{\beta}}_{\left(B\right)i}\right)},{s}_{\left(A\right)}^{2}+{s}_{\left(B\right)}^{2}\left(1+{x}_{\left(B\right)}^{f\text{'}}{\left({X}_{\left(B\right)}^{\text{'}}{X}_{\left(B\right)}\right)}^{-1}{x}_{\left(B\right)}^{f}\right)\right)}}\\ {\displaystyle \prod _{i=k-r}^{k-1}{\theta}_{\left(A\right)i}^{{x}_{\left(A\right)i}^{f}}}{\left(1-{\theta}_{\left(A\right)i}\right)}^{1-{x}_{\left(A\right)i}^{f}}\left(1+{\displaystyle \sum _{i,j=0}^{k-1}{Q}_{ij}\left({\widehat{\beta}}_{\left(A\right)},{s}_{\left(A\right)}^{-2}\right)\frac{\mathrm{Cov}\left({\beta}_{\left(A\right)i},{\beta}_{\left(A\right)j}\right)}{2}}+{Q}_{{\tau}_{\left(A\right)}}^{2}\left({\widehat{\beta}}_{\left(A\right)},{s}_{\left(A\right)}^{-2}\right)\frac{\mathrm{Var}\left({\tau}_{\left(A\right)}\right)}{2}\right)\end{array}$
(ix)
An analytical solution of ${D}_{KL}$ between the predictive densities (iv) and (ix) is very difficult to obtain, but a numerical solution can be obtained. In some situations, some of the explanatory variables are dichotomous and some are continuous. Among the $k-1$ explanatory variables, without loss of generality we assume that the first $l$ are dichotomous and the remaining $k-l-1$ are continuous. We also assume that, out of the $l$ dichotomous future variables, the last $d$ are missing, and out of the $k-l-1$ continuous future variables, the last $g$ are missing. Then the predictive density of the future log-odds ratio ${a}^{\text{'}}{w}^{f}$ when $d$ dichotomous and $g$ continuous variables are missing is given by
$\begin{array}{l}f\left({u}^{f}|{x}_{\left(A\right)}^{*f},data\right)={\displaystyle \sum _{{x}_{\left(A\right)l-d}^{f}=0}^{1}\cdots {\displaystyle \sum _{{x}_{\left(A\right)l-1}^{f}=0}^{1}N\left({\displaystyle \sum _{i=0}^{k-1}{x}_{\left(A\right)i}^{f}{\widehat{\beta}}_{\left(A\right)i}},{s}_{\left(A\right)}^{2}\right)}}\\ {\displaystyle \prod _{i=l-d}^{l-1}{\theta}_{\left(A\right)i}^{{x}_{\left(A\right)i}^{f}}}{\left(1-{\theta}_{\left(A\right)i}\right)}^{1-{x}_{\left(A\right)i}^{f}}\left(1+{\displaystyle \sum _{i,j=0}^{k-1}{Q}_{ij}\left({\widehat{\beta}}_{\left(A\right)},{s}_{\left(A\right)}^{-2}\right)\frac{\mathrm{Cov}\left({\beta}_{\left(A\right)i},{\beta}_{\left(A\right)j}\right)}{2}}+{Q}_{{\tau}_{\left(A\right)}}^{2}\left({\widehat{\beta}}_{\left(A\right)},{s}_{\left(A\right)}^{-2}\right)\frac{\mathrm{Var}\left({\tau}_{\left(A\right)}\right)}{2}\right)\end{array}$
(x)
Again, analytical solution of
${D}_{KL}$
between the predictive densities (iv) and (x) is very difficult but we can obtain its numerical solution. In similar way we can derive the predictive density of future log-odds ratio when some future variables are missing in treatment B.
Example 1 revisited: This example is based on the flu shot data of Example 1. From Figure 3 we observe, as in Examples 1 and 2, that the discrepancies are smaller around the mean of the missing variables. Moreover, Figures 1 and 3 show that the discrepancies for the missing variables are smaller than those for the deleted variables.
Example 2 revisited: This example is based on the simulation data of Example 2, and we reach the same conclusion as in Example 1 revisited (Figures 2 & 4).
Figure 1: Three-dimensional scatter plots of ${D}_{KL}$ based on real data when ${x}_{1}$ is deleted (Group A and Group B).
Figure 2: Three-dimensional scatter plots of ${D}_{KL}$ based on simulated data when ${x}_{1}$ is deleted (Group A and Group B).
Figure 3: Three-dimensional scatter plots of ${D}_{KL}$ based on real data when ${x}_{1}^{f}$ is missing (Group A and Group B).
Figure 4: Three-dimensional scatter plots of ${D}_{KL}$ based on simulated data when ${x}_{1}^{f}$ is missing (Group A and Group B).
Examples 1 and 2 revisited: In this example, we have used
${D}_{KL}$
values for real data for drawing box plots for each cases (deleted and missing). From Figure 5, we have observed that x2 is more in uential than x1. Moreover the discrepancies are much less in missing case than deleted case. We have got same result in simulation study and are illustrated in Figure 6.
Figure 5: Box plots for $D_{KL}$ based on real data (panels: Treatment A, Treatment B).
Figure 6: Box plots for $D_{KL}$ based on simulated data (panels: Treatment A, Treatment B).
Evaluation of Predictive Probability of a Logistic Model
We consider the logistic model as
$\mathrm{Pr}\left(y=1|x,\beta \right)=\mathrm{exp}\left(x\beta \right)/\left(1+\mathrm{exp}\left(x\beta \right)\right)$
The probability that a future response yf will be a success is given by
$\mathrm{Pr}\left({y}^{f}=1|{x}^{f},\beta \right)=\mathrm{exp}\left({x}^{f}\beta \right)/\left(1+\mathrm{exp}\left({x}^{f}\beta \right)\right)$
We assume that the conditional density of ${x}_{\left(r\right)}^{f}$ given $x{*}^{f}$ is independent of $\beta$, where $x{*}^{f}$ denotes the future explanatory variables without the variables ${x}_{\left(r\right)}^{f}$. Then the predictive probabilities that ${y}^{f}$ will be a success under the two models are given by
$\mathrm{Pr}\left({y}^{f}=1|{x}^{f},data\right)={\displaystyle \int \mathrm{Pr}\left({y}^{f}=1|{x}^{f},\beta \right)}f\left(\beta |data\right)d\beta $
and
$\mathrm{Pr}\left({y}^{f}=1|x{*}^{f},data\right)={\displaystyle \int \mathrm{Pr}\left({y}^{f}=1|x{*}^{f},\beta \right)}f\left(\beta |data\right)d\beta $
respectively. Simple analytically tractable priors are not available here. Numerical integration techniques might be used for some specified priors to approximate
$\mathrm{Pr}\left({y}^{f}=1|x{*}^{f},data\right)$
and
$\mathrm{Pr}\left({y}^{f}=1|{x}^{f},data\right)$
, respectively.
Normal approximation for the posterior density
Let us suppose that the sample size is large. Lindley [12] stated that the posterior density
$f\left(\beta |data\right)$
may then be well approximated by its asymptotic normal form as
$f\left(\beta |data\right)\approx {N}_{p}\left(\widehat{\beta},\sum \right)$
where $\widehat{\beta}$ is the maximum likelihood estimate of $\beta$, $\Sigma ={\left(-H\right)}^{-1}$, and $H$ is the Hessian of $\mathrm{log}L\left(\beta \right)$ evaluated at $\widehat{\beta}$, with elements
${h}_{jl}\left(\widehat{\beta}\right)=-{\displaystyle \sum _{i=1}^{n}\frac{{x}_{ij}{x}_{il}\mathrm{exp}\left({x}_{i}\widehat{\beta}\right)}{{\left(1+\mathrm{exp}\left({x}_{i}\widehat{\beta}\right)\right)}^{2}}},j,l=0,1,\mathrm{...},k,$
where ${x}_{ij}$ is the $j$th component of ${x}_{i}$, with ${x}_{i0}=1$. For given ${x}^{f}$, $z={x}^{f}\beta$ has approximately, a posteriori, a normal distribution with mean ${b}_{{x}^{f}}={x}^{f}\widehat{\beta}$ and variance ${d}_{{x}^{f}}^{2}={x}^{f}\Sigma {x}^{f\prime }$, with probability density function $\varphi \left(z|{b}_{{x}^{f}},{d}_{{x}^{f}}^{2}\right)$. Using this transformation we can approximate the predictive probability by
$\mathrm{Pr}\left({y}^{f}=1|{x}^{f},data\right)\approx {\displaystyle \int \frac{\mathrm{exp}\left(z\right)}{1+\mathrm{exp}\left(z\right)}}\varphi \left(z|{b}_{{x}^{f}},{d}_{{x}^{f}}^{2}\right)dz$
Analytical evaluation of (4.1) is very difficult. We can, however, evaluate it by numerical integration techniques, viz. Gauss-Hermite quadrature (Abramowitz and Stegun [13]), the normal approximation (Cox [14]), and Laplace's approximation (de Bruijn [15]).
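As a concrete illustration of the quadrature route, the one-dimensional integral above can be evaluated with Gauss-Hermite nodes after the substitution $z = b_{x^f} + \sqrt{2d^2_{x^f}}\,t$. A minimal Python sketch follows; the function name is illustrative, not from the paper.

```python
import numpy as np

def predictive_success_prob(b, d2, n_nodes=40):
    """Gauss-Hermite approximation of
    integral of exp(z)/(1+exp(z)) * phi(z | b, d^2) dz,
    via the substitution z = b + sqrt(2*d^2)*t, so the Gaussian
    weight exp(-t^2) is absorbed by the quadrature rule."""
    t, w = np.polynomial.hermite.hermgauss(n_nodes)
    z = b + np.sqrt(2.0 * d2) * t
    logistic = 1.0 / (1.0 + np.exp(-z))  # numerically stable logistic
    return float(w @ logistic / np.sqrt(np.pi))
```

With $b = 0$ the answer is exactly 0.5 by symmetry, and increasing $d^2$ pulls the probability toward 0.5, reflecting posterior uncertainty in $\beta$.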
If the sample size is small, the posterior normality assumption may not be accurate. We therefore consider the flat-prior approximation of Tierney and Kadane [16] as an alternative approach, using Laplace's method for integrals.
Effect of the variables
${x}^{f}$
Here we assume that the future variables
${x}^{f}$
are dependent and the density of
${x}^{f}$
is p-dimensional multivariate normal i.e.
$f\left({x}^{f}\right)\equiv {N}_{p}\left(n,\psi \right)$
The conditional density of
${x}_{\left(r\right)}^{f}$
for given
${x}^{*f}$
is
$f\left({x}_{\left(r\right)}^{f}|x{*}^{f}\right)\equiv {N}_{r}\left(n{*}_{\left(r\right)},\psi {*}_{\left(r\right)}\right)$
The probability of
${y}^{f}$
as a success when
${x}_{\left(r\right)}^{f}$
is missing is given by
$\mathrm{Pr}\left({y}^{f}=1|x{*}^{f},data\right)={\displaystyle \int g\left(\beta \right)f\left(\beta |data\right)}d\beta $
$\approx \varphi \left(\left({\displaystyle \sum _{i=0}^{k-r}{x}_{i}^{f}{\beta}_{i}+{\displaystyle \sum _{i=k-r+1}^{k}{n}_{i}{\beta}_{i}}}\right)/{\left({k}^{2}+{\displaystyle \sum _{i=k-r+1}^{k}{\beta}_{i}^{2}{\Psi}_{i}^{2}}\right)}^{1/2}\right)$
The integral in (ii) can be evaluated as the integral in (i) using Taylor's and Laplace's approximations.
If, instead, the future variables
${x}_{1}^{f}$
,…,
${x}_{k}^{f}$
are independently and normally distributed with means ${n}_{i}$ and variances
${\psi}_{i}^{2}$
$\left(i=1,2,\mathrm{...},k\right)$, then the conditional density of
${x}_{\left(r\right)}^{f}$
is
$f\left({x}_{\left(r\right)}^{f}|{x}^{*f}\right)\equiv f\left({x}_{\left(r\right)}^{f}\right)$
Consequently, we get
$\mathrm{Pr}\left({y}^{f}=1|x{*}^{f},data\right)={\displaystyle \int h\left(\beta \right)f\left(\beta |data\right)}d\beta $
See Aitchison and Begg (1976) in this context.
Variables xf are dichotomous
Here we assume that the variables
${x}^{f}$
are independent and they can take only two values 0 or 1. We also assume that
${x}_{i}^{f}$
is distributed as
$\mathrm{Pr}\left({X}_{i}^{f}={x}_{i}^{f}\right)={\theta}_{i}^{{x}_{i}^{f}}{\left(1-{\theta}_{i}\right)}^{1-{x}_{i}^{f}}$
${x}_{i}^{f}=0,1,i=1,2,\mathrm{....},k$
If
${x}_{\left(r\right)}^{f}$
is missing the probability of
${y}^{f}$
as a success is given by
$\mathrm{Pr}\left({y}^{f}=1|x{*}^{f},\beta \right)={\displaystyle \sum _{{x}_{k-r+1}^{f}=0}^{1}\mathrm{...}}{\displaystyle \sum _{{x}_{k}^{f}=0}^{1}\frac{\mathrm{exp}\left({x}^{f}\beta \right)}{1+\mathrm{exp}\left({x}^{f}\beta \right)}}{\displaystyle \prod _{i=k-r+1}^{k}{\theta}_{i}^{{x}_{i}^{f}}}{\left(1-{\theta}_{i}\right)}^{1-{x}_{i}^{f}}=h\left(\beta \right)$
The predictive probability of
${y}^{f}$
as a success when
${x}_{\left(r\right)}^{f}$
is missing is given by
$\mathrm{Pr}\left({y}^{f}=1|x{*}^{f},data\right)={\displaystyle \int h\left(\beta \right)f\left(\beta |data\right)}d\beta $
If the sample size is large, then under the normality assumption for the posterior density we can approximate (iii) using Taylor's theorem, Laplace's method, and the normal approximation.
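Under that large-sample normal approximation, the computation can be sketched directly: $h(\beta)$ sums the logistic success probability over the $2^r$ patterns of the missing dichotomous covariates, and the outer integral is replaced here by a Monte Carlo average over posterior draws $\beta \sim N(\widehat{\beta}, \Sigma)$. The names and the Monte Carlo step are our illustrative assumptions, not the paper's prescription.

```python
import itertools
import numpy as np

def success_prob_missing_binary(x_obs, beta, theta_missing):
    """h(beta): marginalize the logistic success probability over the
    missing dichotomous covariates, each x_i^f ~ Bernoulli(theta_i).
    x_obs holds the observed covariates (including the intercept term);
    beta has length len(x_obs) + len(theta_missing)."""
    prob = 0.0
    r = len(theta_missing)
    for pattern in itertools.product([0, 1], repeat=r):
        x_full = np.concatenate([x_obs, pattern])
        eta = x_full @ beta
        weight = np.prod([t if x == 1 else 1.0 - t
                          for x, t in zip(pattern, theta_missing)])
        prob += weight / (1.0 + np.exp(-eta))
    return prob

def predictive_prob(x_obs, beta_hat, Sigma, theta_missing, n_draws=5000, seed=0):
    """Integral of h(beta) f(beta|data) d(beta) by Monte Carlo, with the
    large-sample approximation f(beta|data) ~ N(beta_hat, Sigma)."""
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(beta_hat, Sigma, size=n_draws)
    return float(np.mean([success_prob_missing_binary(x_obs, b, theta_missing)
                          for b in draws]))
```

A quadrature or Laplace step could replace the Monte Carlo average; the enumeration of missing patterns is the part specific to the dichotomous case.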
Example: one variable case
Here we consider two different logistic models based on any single variable either
${x}_{1}$
or
${x}_{2}$
. We want to measure the discrepancies between the predictive probability
${\widehat{p}}_{i}$
, based on a single variable
${x}_{i}$
when
${x}_{i}^{f}$
is known, and the predictive probability
${\widehat{p}}_{0}$
, based on
${x}_{i}$
alone when
${x}_{i}^{f}$
is missing, to assess the influence of the missing variable
${x}_{i}^{f}$
, $i=1,2$. The predictive probability
${\widehat{p}}_{i}$
is determined using quadrature approximation and the predictive probability
${\widehat{p}}_{0}$
is determined using second order Taylor's approximation.
We assume that the marginal densities of the future variables
${x}_{1}^{f}$
and
${x}_{2}^{f}$
are normal with means 33.35, 78.24 and variances 65.39, 1827.0 respectively, where means and variances are the estimated sample means and sample variances from the observed data. We employ the absolute difference of probabilities and Kullback-Leibler divergence measure to assess the influence of the missing variable. The discrepancies are drawn in Figure 7. Here we see that the discrepancies due to missing
${x}_{1}^{f}$
in the predictive probability based on
${x}_{1}$
are very large compared to the discrepancies due to missing
${x}_{2}^{f}$
in the predictive probability based on
${x}_{2}$
. The discrepancies are less around the mean of the missing variable.
Figure 7: Kullback-Leibler directed divergence $D_{KL}$ based on simulated data (panels: $x_1^f$ is missing, $x_2^f$ is missing).
Example: two-variable Case
Now we consider that the predictive probability based on two variables
${x}_{1}^{f}$
and
${x}_{2}^{f}$
when both
${x}_{1}^{f}$
and
${x}_{2}^{f}$
are known is denoted by
${\widehat{p}}_{12}$
and the predictive probability
${\widehat{p}}_{ij}$
,
$i=0,1$
,
$j=0,2$
and
$\left(i,j\right)\ne \left(1,2\right)$
based on
${x}_{1}$
and
${x}_{2}$
when any future variable is missing, where the subscript $0$ indicates the missing variable. Here also the predictive probability
${\widehat{p}}_{12}$
is determined using quadrature approximation and predictive probabilities
${\widehat{p}}_{10}$
,
${\widehat{p}}_{02}$
and
${\widehat{p}}_{00}$
are determined using second order Taylor's approximation. Here we assume that the joint density of
${x}_{1}^{f}$
and
${x}_{2}^{f}$
is bivariate normal with correlation coefficient 0.33, which is the estimated sample correlation coefficient from the observed data. The absolute differences of the two predictive probabilities
${\widehat{p}}_{12}$
and
${\widehat{p}}_{02}$
when
${x}_{1}^{f}$
is missing and the absolute differences of the two predictive probabilities
${\widehat{p}}_{12}$
and
${\widehat{p}}_{10}$
when
${x}_{2}^{f}$
is missing are drawn in Figure 8. Kullback-Leibler directed divergence DKL are drawn in Figure 9. The discrepancies when
${x}_{1}^{f}$
is missing and for different given values of the other variable for both the cases are close together since the correlation between
${x}_{1}^{f}$
and
${x}_{2}^{f}$
is very small. The discrepancies due to missing
${x}_{1}^{f}$
are very large compared to missing
${x}_{2}^{f}$
except near the mean of the missing variable. If both
${x}_{1}^{f}$
and
${x}_{2}^{f}$
are missing the discrepancies are drawn in Figure 10. These discrepancies are very similar to the discrepancies due to missing
${x}_{1}^{f}$
alone in the predictive probability based on
${x}_{1}$
and
${x}_{2}$
since the contribution of
${x}_{2}$
is negligible.
Figure 8: Absolute differences $|\widehat{p}_{12}-\widehat{p}_{02}|$ when $x_1^f$ is missing and $|\widehat{p}_{12}-\widehat{p}_{10}|$ when $x_2^f$ is missing.
Figure 9: Kullback-Leibler directed divergence $D_{KL}$ based on simulated data (panels: $x_1^f$ is missing, $x_2^f$ is missing).
Figure 10: Absolute difference $|\widehat{p}_{12}-\widehat{p}_{00}|$ when $x_1^f$ and $x_2^f$ are both missing.
Concluding Remarks
In our present study we have observed that the discrepancies are smallest around the means of the deleted variables as well as the means of the missing future variables, in both the logistic model and the log-odds ratio; that the discrepancies are larger when the deleted or missing variables are more influential; and that the discrepancies in the deleted case are higher than in the missing case.
In the present paper we studied the important problem of the predictive influence of variables on the log-odds ratio under a Bayesian setup. The treatment difference
$\mathrm{Pr}\left({Y}_{i}=1|{Z}_{i}=1,{x}_{i}\right)-\mathrm{Pr}\left({Y}_{i}=1|{Z}_{i}=0,{x}_{i}\right)$
or the risk ratio
$\mathrm{Pr}\left({Y}_{i}=1|{Z}_{i}=1,{x}_{i}\right)/\mathrm{Pr}\left({Y}_{i}=1|{Z}_{i}=0,{x}_{i}\right)$
can also be studied along the same lines.
We have also considered the influence of missing future explanatory variables in a logistic model. The influence of missing future explanatory variables in probit and complementary log-log models can also be studied in a similar fashion.
References
- Breslow N (1981) Odds ratio estimators when the data are sparse. Biometrika 68: 73-84.
- Bohning D, Kuhnert R, Rattanasiri S (2008) Meta-analysis of binary data using profile likelihood. (1st ed), Chapman and Hall/CRC Interdisciplinary Statistics.
- Pregibon D (1981) Logistic regression diagnostics. Annals of Statistics 9: 705-724.
- Cook RD, Weisberg S (1982) Residuals and Influence in Regression. Chapman and Hall, New York, USA.
- Johnson W (1985) Influence measures for logistic regression: Another point of view. Biometrika 72(1): 59-65.
- Bhattacharjee S K, Dunsmore I R (1991) The influence of variables in a logistic model. Biometrika 78(4): 851-856.
- Mercier C, Shelley MC, Rimkus J, Mercier J (1997) Age and Gender as Predictors of Injury Severity in Head-on Highway Vehicular Collisions. The 76th Annual Meeting, Transportation Research Board, Washington, USA.
- Zellner D, Keller F, Zellner GE (2004) Variable selection in logistic regression models. Communications in Statistics 33(3): 787-805.
- Kullback S, Leibler R A (1951) On information and sufficiency. Ann Math Statist 22: 79-86.
- S Rao Jammalamadaka, Tiwari R C, Chib Siddhartha (1987) Bayes prediction in the linear model with spherically symmetric errors. Economics Letters 24: 39-44.
- Bhattacharjee S K, Shamiri A, Sabiruzzaman Md, S Rao Jammalamadaka (2011) Predictive Influence of Unavailable Values of Future Explanatory Variables in a Linear Model. Communications in Statistics - Theory and Methods 40: 4458-4466.
- Lindley D V (1961) The use of prior probability distributions in statistical inference and decisions. Proc. 4th Berkeley Symp 1: 453-468.
- Abramowitz M, Stegun I A (1966) Handbook of Mathematical Functions. National Bureau of Standards, USA.
- Cox D R (1970) Binary regression. Chapman and Hall, London, UK.
- de Bruijn NG (1961) Asymptotic Methods in Analysis. North-Holland, Amsterdam.
- Tierney L, Kadane JB (1986) Accurate approximations for posterior moments and marginal densities. Journal of the American Statistical Association 81(393): 82-86.
- Logistic Regression Example with Grouped Data. Regression FluShots, University of North Florida.
- Bhattacharjee S K, Dunsmore I R (1995) The predictive influence of variables in a normal regression model. J Inform Optimiz Sci 16(2): 327-334.