Document

Transcripción

Document

Curs 2014-15
UPC
APUNTS DE CLASSE: TEMA 3
RESPOSTA BINÀRIA
GRAU D’ESTADÍSTICA – UB- UPC
MLGz
TABLE OF CONTENTS
3-1.
TOPIC 3: BINARY RESPONSE DATA. BINOMIAL MODELS _________________________________________________________________ 3
3-1.1 COMPONENTS OF GENERALIZED LINEAR MODELS _____________________________________________________________________________ 3
3-1.2 CLASSIFICATION OF STATISTICAL LINEAR MODELS ____________________________________________________________________________ 5
3-2.
BINOMIAL MODELS FOR BINARY DATA _______________________________________________________________________________ 12
3-2.1 LINK FUNCTIONS _______________________________________________________________________________________________________ 13
3-3.
INTERPRETING MODEL PARAMETERS ________________________________________________________________________________ 19
3-4.
ESTIMATION OF MODEL PARAMETERS ________________________________________________________________________________ 21
3-5.
GOODNESS OF FIT ____________________________________________________________________________________________________ 23
3-5.1 ROC CURVE AND CONFUSION MATRICES ___________________________________________________________________________________ 30
3-6.
MODEL DIAGNOSTICS ________________________________________________________________________________________________ 33
3-6.1 RESIDUALS IN GLMZ ___________________________________________________________________________________________________ 33
3-6.2 INFLUENCE DATA IN GLMZ ______________________________________________________________________________________________ 36
3-6.3 GRAPHICS FOR DIAGNOSTICS _____________________________________________________________________________________________ 37
3-7.
EJEMPLO 1: ACCIDENTES CON HERIDOS SEGÚN USO DEL CINTURÓN – AGRESTI (2002) __________________________________ 38
3-8.
EJEMPLO 2: PARTICIPACIÓN LABORAL FEMENINA EN CANADA (FOX) __________________________________________________ 67
Prof. Lídia Montero i Josep A. Sánchez
Pàg.
3- 2
Curs 2. 01 4- 2. 01 5
MLGz
3-1. TOPIC 3: BINARY RESPONSE DATA. BINOMIAL MODELS
3-1.1
Components of generalized linear models
Generalized linear models are extensions of the classical multiple regression models.
Let
y T = ( y1 ,  , y n )
be a vector of n components, sample of a random vector
with statistical independent components, identically distributed with means
Y T = (Y1 , , Yn ) ,
µ T = (µ 1 ,  , µ n ) :
• The random component assumes that mutual Independence holds and each random variable in
Y T = (Y1 , , Yn )
belongs to the exponential family with one parameter distributions with
jointly expected values
Ε[Y] = µ .
 At desaggreted leved for each individual observation, the response is dycothomic and we are
concerned with Bernoulli distribution. For grouped data, binomial distributions are suitable.
• The systematic component in the model model specifies a vector
η , the predictor lineal vector is a
linear combination from a limited number of explicative variables
or regressors
β T = (β 1 ,, β p ) to be estimated. In matrix notation, η = X β
and β is px1.
and parameters
nx1,
X is nxp
X = ( X 1 , , X p )
Pàg.
3- 3
where
Curs 2. 01 4- 2. 01 5
η
is
MLGz
THEME 3: BINARY RESPONSE DATA. BINOMIAL MODELS
• For each observation, the expected value
function, notated as g(. ), and thus
Response function is:
µ
is related to the linear predictor
η = g (µ ) .
η , through
the link
µ = g −1 (η )
In ordinary least squares models for normal data, the identity link is used
η=µ.
For binary data, several link functions are commonly used and will be detailed in a forthcoming section.
Since ML estimates
βˆ → ∀i ηî = xiT βˆ → µˆ i = g −1 (ηî )
Pàg.
3- 4
Curs 2. 01 4- 2. 01 5
MLGz
THEME 3: BINARY RESPONSE DATA. BINOMIAL MODELS
3-1.2
Classification of statistical linear models
Explicative
Variables
Dicothomic
Polythomic
Continuous
(covariates)
Factors and
covariates
Random
Effects
Response Variable
Dicothomic or
Binary
Polythomic
Contingency tables
Logistic regression
Log-linear models
Contingency tables
Logistic regression
Log-linear models
Logistic regression
Contingency tables
Log-linear models
Log-linear
models
Contingency tables
Log-linear models
Logistic regression
Mixed models
Counts
(discrete)
Continuous
Normal
Time between
events
Survival
Analysis
Log-linear
models
Tests for 2
subpopulation
means: t.test
ONEWAY,
ANOVA
*
Log-linear
models
Multiple
regression
Survival
Analysis
*
Log-linear
models
Covariance
Analysis
Survival
Analysis
Mixed models
Mixed
models
Pàg.
3- 5
Mixed models
Survival
Analysis
Mixed models
Curs 2. 01 4- 2. 01 5
MLGz
3-2. INTRODUCTION TO BINARY RESPONSE DATA. BINOMIAL MODELS
This variable appears when given a sample each individual holds or does not hold a target characteristic of
the study and this is codified as (Y=1) or not (Y=0).
For example, on mode selection in transportation models, one might be interested in the modal choice
between public (metro, bus, light train, etc) or private modes (car, motorcycle, etc) in the study area
for home to work trips. In those models, the response variable can be defined for a commuter as Y=1
(positive response or success, for example public modes), or Y=0 (negative response or fail, in our case
private modes).
 It is possible the extension to more than 2 levels or categories in the response variable.
 The probability of success is notated by
π
, in such a way that,
P (Yk = 1) = π k :
Probability of positive response (success) for kth individual unit in sample.
P (Yk = 0) = 1 − π k :
Probability of negative response (fail) for kth individual in sample.
Each individual in the sample is characterized by a set of variables some covariates (income, age)
some of them factors (gender, grades, etc) that defines:
Pàg.
3- 6
x Tk = (x1  x p ) .
Curs 2. 01 4- 2. 01 5
MLGz
(
 Explicative variables that will form the linear predictor x k = x1
T
 x p ) might be:
• Quantitative variables or covariates.
• Transformation of original variables .
• Polynomial regressors build from covariates.
• Dummy variables to represent factors.
• Dummy variables to represent interaction between factors and covariates.
Por example, in the public-private binary modal choice model, for each commuter variables as income,
gender, car availability, distance to local public transport, value of time, etc.
 In this subject, the goal relies on studying the relationship between the response variable y and the
explicative variables in order to model the probability of positive response:
π = π (x ) .
 In design of experiment groups of individual units are defined, and each group receives a combination
of experimental conditions in such a way that are shared by all the unit in the group, in general,
factors are considered as explicative variables, and kth experimental condition is modeled by a
common set of values for all the explicative variables of individual units in the group
x Tk = (x1  x p ) and thus apply to mk individual units.
Pàg.
3- 7
Curs 2. 01 4- 2. 01 5
MLGz
 The total number of units in the sample is the sum of the size of the groups and thus, the number of
experimental conditions or groups is defined by
N = m1 +  + mn .
n
and the total number of units is
Each group or combination of experimental conditions defines a covariate class where all individual
units belonging to the covariate class share the same values for explicative variables.
The former difference between individual and covariate class is critical when specifying data to statistical
packages. In general, both representations fully desaggregated (each individual outcome is detailed) or
aggregated at some level according to covariate classes are allowed:
1. Some analysis methods are well suited for aggregated data, and perform badly when applied to
disaggregated data, for example asymptotic approximations to normality.
2. Asymptotic approximations for aggregated data are based either on the asymptotic evolution of the
number of covariate classes (or groups) ( m → ∞ ) or on the total number of individual units (
N → ∞ ).
3. Disaggregated data is suitable for asymptotic approximations based on the total size.
Pàg.
3- 8
Curs 2. 01 4- 2. 01 5
MLGz
 … Let us work with a simple example to see differences in the representation…
Disaggregated data
Aggregated data
Individual unit
Variables
Response
Covariate class
Size of the
class
Positive
responses
1
(male, 1)
0
(1,1)
2
1
2
(male,2)
1
(1,2)
3
2
3
(male,2)
0
(2,1)
1
0
4
(female,1)
0
(2,2)
1
1
5
(female,2)
1
6
(male,2)
1
7
(male,1)
1
Former table shows an experiment consisting on dicothomic factors A and C and thus n=4=2x2 is the
number of covariate classes, but the total number of individuals is N=7 . In our example, factor A can be
gener (two levels, coded as male and female) and factor C is the car availability (1 car or more than 1).
Pàg.
3- 9
Curs 2. 01 4- 2. 01 5
MLGz
… Aggregated versus disaggregated data …
 More efficient and less memory consumption when aggregated data is considered. It simplifies
significant effect detected at a glance.
 Aggregated data implies serial order is lost. If additional variables are present, only average values
can be considered possibly leading to ecological fallacy situations.
 Aggregated data implies a response variable model of the binomial type, since sample observed
y1 m1 , , y n m n , where 0 ≤ y k ≤ mk
positive responses are
responses in kth covariate class which size is
being the number of positive
mk .
 Size of covariate classes in a vector form are called binomial index vector and notated by
m = (m1  mn ) .
For disaggregated data, each individual unit defines a binomial response for a
group of size 1 and thus,
m = (1  1) .
Pàg.
3- 1 0
Curs 2. 01 4- 2. 01 5
MLGz
 When factor levels define covariate classes, as in our example, a contingency table seems a good
representation for aggregated data, our convention for future subject is stated as: Y response (in
columns), factor C (subtable) and factor A (rows):
FACTOR C
C1 =1
FACTOR A
CK=2 =2
FACTOR B – Respuesta Y
FACTOR B – Respuesta Y TOTAL
B1 Y=0
BJ=2 Y=1
SUBTOTAL
B1 Y=0
BJ=2 Y=1
SUBTOTAL
A1 = male
1
1
2
1
2
3
5
AI=2 =female
1
0
1
0
1
1
2
SUBTOTAL
2
1
1
3
TOTAL
3
4
Pàg.
3- 1 1
7
Curs 2. 01 4- 2. 01 5
3-2.
MLGz
BINOMIAL MODELS FOR BINARY DATA
m y
pY ( y ) = P ([Y = y ]) =  π (1 − π) m − y
 y

y <0
0

 y  m  i
m −i
0 ≤ y≤m
FY ( y ) = ∑  π (1 − π )
 i =0  i 

y>m
1

Ε[Y ] = m ⋅ π
V [Y ] = m ⋅ π ⋅ (1 − π )
Pàg.
3- 1 2
 Usually considered and defined
in introductory courses to statistical
analysis or theory of probability:
(
)
Let Y ≈ B m , π
a binomial
variable that models the number of
positive responses in m independent
trials in a Bernoully process and thus
each one with a common probability
π.
Curs 2. 01 4- 2. 01 5
MLGz
3-2. BINOMIAL MODELS FOR BINARY DATA
3-2.1
Link functions
 The goal consists on stablishing a functional relationship between the probability of a positive result
π and the vector of explicative variables (predictors in general, covariates if they are continuous)
x T = (x1  x p ) : π = π (x ) .
• In Generalized Linear Models, the link function relates the linear predictor scale with the expected
value of the probabilistic variable selected to model the random response. In the case of a binomial
model concerned with the probability of positive response for a dicothomic individual response, the
η
linear predictor
might for a given observation any value in the real axis, but the probability of
positive answer belongs to the open interval (0, 1).
 Vector
π
is related to the linear predictor
η = g (π ) , π
is nx1.
 Canonic link for binomial data is the logit function:
η,
through the link function, notated as g(. ),
η = θ = logit(π ) .
Pàg.
3- 1 3
Curs 2. 01 4- 2. 01 5
MLGz
3-2. BINOMIAL MODELS FOR BINARY DATA
 Logit link function is the most frequently used link, but one should understand the role of link
functions and do not act authomatically.
 Some common link function for binary response
data are:
1. The logit link (sometimes bad-called logistic
 π
1− π
link): η = g1 (π ) = logit (π ) = log
and
π 1 (η ) = g1−1 (η ) =

.

exp(η )
1 + exp(η ) ; this is the
distribution function for a standard logistic
variable, whose density function is
(g )' (η ) =
−1
1
exp(η )
(1 + exp(η ))2
with 0 mean (position
parameter) and variance π 3 (scale
parameter 1), this is a continuous and
symmetric variable, quite similar to normal
distribution.
2
Pàg.
3- 1 4
Curs 2. 01 4- 2. 01 5
MLGz
3-2. BINOMIAL MODELS FOR BINOMIAL DATA : LINK FUNCTIONS
 … link function for binary response data:
2. Probit link or standard normal inverse:
Standard normal (mean 0 and variance 1).
3. Log-log complementary link
η = g 2 (π ) = Φ −1 (π ) and π 2 (η ) = g 2−1 (η ) = Φ (η ) .
η = g 3 (π ) = log(log(1 (1 − π ))) . Where the response function is,
π 3 (η ) = g 3−1 (η ) = 1 − exp(− exp(η ))
.
This link function is the inverse of the distribution function for the minimum extreme value law
(Gompertz law), with position parameter 0 and scale parameter 1, giving an expected value of e=0.577216 (first derivative of gamma function evaluated at 1) and variance
4. Log-log
link
η = g 4 (π ) = − log(log(1 π )) ,
where
the
π 2 6.
response
π 4 (η ) = g 4−1 (η ) = 1 − exp(− exp(− η )) . This link function is the
function
is
inverse of distribution of
probability function for the maximum extreme value law (Gumbel law) ), with position parameter 0
and scale parameter 1, giving an expected value of e=0.577216 (-first derivative of gamma function
evaluated at 1) and variance
π 2 6.
 All former link function are related to well-known inverse of distribution functions for continuous
variates.
Pàg.
3- 1 5
Curs 2. 01 4- 2. 01 5
MLGz
 Logit link:
exp(η )
π 1 (η ) = g (η ) =
1 + exp(η )
−1
1
(g )' (η ) =
−1
1
y
(1 + exp(η ))
where
P(.)
indicates
some
distribution function for continuous
variables and transforms real values
for the linear predictor to the
interval.
[0,1]
Transformations should depend on
characteristics of data and not to
be selected in an straight forward
way.
3.74
2.88
2.02
1.16
0.3
-0.56
-1.42
D_PI_1(ETA)
-2.28
= π 1 (η )(1 − π 1 (η ))
π i = Ρ(ηi ) = Ρ(x Ti β ),
PI_1(ETA)
-3.14
2
In general,
1.2
1
0.8
0.6
0.4
0.2
0
-4
Probability
Link Logt
Linear Predictor
exp(η )
Pàg.
3- 1 6
Curs 2. 01 4- 2. 01 5
MLGz
 logit and probit links can be seen related to changes in scales:
Probability
π
0,01
0,05
0,10
0,15
0,20
0,25
0,30
0,50
0,70
0,75
0,80
0,85
0,90
0,95
0,99
Odds
Probit
1− π
Log-odds
 π 
log
 = xβ
1− π 
Φ −1 (π ) = xβ
C_log-log
  1 
  = xβ
log log
 1− π 

 
Log-log
  1 
− log log   = xβ
  π 
  
0,0101
0,0526
0,1111
0,1765
0,2500
0,3333
0,4286
1,0000
2,3333
3,0000
4,0000
5,6667
9,0000
19,0000
99,0000
-4,5951
-2,9444
-2,1972
-1,7346
-1,3863
-1,0986
-0,8473
0,0000
0,8473
1,0986
1,3863
1,7346
2,1972
2,9444
4,5951
-2,3263
-1,6449
-1,2816
-1,0364
-0,8416
-0,6745
-0,5244
0,0000
0,5244
0,6745
0,8416
1,0364
1,2816
1,6449
2,3263
-4,60015
-2,97020
-2,25037
-1,81696
-1,49994
-1,24590
-1,03093
-0,36651
0,18563
0,32663
0,47588
0,64034
0,83403
1,09719
1,52718
-1,52718
-1,09719
-0,83403
-0,64034
-0,47588
-0,32663
-0,18563
0,36651
1,03093
1,24590
1,49994
1,81696
2,25037
2,97020
4,60015
π
Pàg.
3- 1 7
Curs 2. 01 4- 2. 01 5
 Relantionship between log-log and c-log-log link functions:
  1
g 3 (π ) = log log
 1 − π
   1

  = − log log

  1 − π
  
   = − g 4 (1 − π )

 All former link functions are continuous and monotone increasing functions in the (0,1).
 logit and probit function show an almost linear relationship in the 0.1 to 0.9 subinterval.
 For low probabilities logit and complementary log-log functions are quite similar.
 For probabilities closed to 1, complementary log-log trends slowly to 1 than logit function does.
 For probabilities closed to 1, logit and log-log links are quite similar.
Pàg.
3- 1 8
Curs 2. 01 4- 2. 01 5
MLGz
3-3.
MLGz
INTERPRETING MODEL PARAMETERS
Assume a logit link.
 The linear predictor contains 2 covariates X1 and X2, so the log-odd of a positive response would be:
 π
1 − π
η = log

 = [1 x1

β0 
x 2 ] β 1  = x T β
 β 2 
 In the odd scale for a positive response:
π
1 −π
(
)
= exp(η ) = exp x T β = exp(β 0 + β 1 x1 + β 2 x 2 )
 In terms of the probability scale, for a positive response given a logit link, and its response function
π = g1−1 (η ) one has,
(
)
exp(β 0 + β 1 x1 + β 2 x 2 )
exp(η )
exp x T β
=
=
π=
1 + exp(η ) 1 + exp x T β 1 + exp(β 0 + β 1 x1 + β 2 x 2 )
(
)
Pàg.
3- 1 9
Curs 2. 01 4- 2. 01 5
MLGz
INTERPRETING MODEL PARAMETERS
 … and thus, the negative response probability is,
1 −π =
1
1
1
=
=
1 + exp(η ) 1 + exp(Xβ ) 1 + exp(β 0 + β1 x1 + β 2 x2 )
 In the log-odds scale a change from the reference level of factor C to the alternative level modeled
β2 :
through the dummy variable x2 whose parameter is
1. Would have an effect of a change in the log odds scale of a positive response equal to the estimate
for the model parameter
β2
.
2. In the odd scale, the effect would be multiplicative, a changing from the reference level of C to
the alternative level represents to multiply the odd for the reference level by exponential of the
estimated parameter for the dummy variable
leaving all the other variables in the same values.
exp(β 2 ) ,
all else being equal, and this means
3. Interpreting in the probability scale is much more difficult since the effect depends not only of
factor C levels, but on the rest of model variables and thus, only an approximation can be given. We
can use the partial derivative effect on the positive response
means a greater effect when the positive response probability
Pàg.
3- 20
π
π
is
∂π
= π (1 − π )β 2
, that
∂x 2
is closer to 0.5 and not to 0 or 1.
Curs 2. 01 4- 2. 01 5
3-4.
MLGz
ESTIMATION OF MODEL PARAMETERS
 The estimation process relies on an unconstricted maximization of the log likelihood function,
T
Maxβ (β, y ) = ∑ log f ( yi , β p ) β T = (β ,  , β )
= ( y1 ,  , y n ) .
y
,
1
p and
n
i =1
 The iterative process to compute the estimates is called the method of the scores, a second order
method Newton-type, specialized to the properties of the log likelihood function. The method converges
fast, but it is not globally convergent.
 Existence and unicity for estimates under any of the formerly presented link functions, if
0 < yi < mi
for any covariate class/observations.
 The quality of the initialization is not usually very important, since the algorithm shows fast
convergence properties, but it is not globally convergent- so an extreme initial point might lead to
divergence.
Pàg.
3- 21
Curs 2. 01 4- 2. 01 5
MLGz
3-4 ESTIMATION OF MODEL PARAMETERS
There are some examples, easy to build, where convergence of the estimates does not hold.
 When some values for variables perfectly separate positive from negative responses, then observations
have a null probability or full positive response,
Estimates for such parameters
value. For example,
if
β
yi = 0
o
yi = mi .
do not converge, but fitted values
(
and deviance tend to a limiting
)
exp β1 + β 2 xi
→1
xi = 1 ⇒ π i = 1 ⇒ β 2 → ∞ π i =
1 + exp β1 + β 2 xi
Or
if
πˆ
xi = 0 ⇒ π i = 0 ⇒ β 2 → −∞
Pàg.
(
(
)
)
exp β1 + β 2 xi
→0
πi =
1 + exp β1 + β 2 xi
3- 22
(
)
Curs 2. 01 4- 2. 01 5
3-5.
 Let
MLGz
GOODNESS OF FIT
β̂
the estimates of the model parameters, thus a linear predictor for each observation i might be
computed
ηî = x iTβ
and thus through the response function (inverse of the selected link function)
−1
ˆ
(ηî ) .
=
π
g
fitted values can be computed:
i
 The scaled deviance can be computed,
D' (y, µˆ ) = 2 (y, y) − 2 (µˆ , y ) .
 And so, the deviance that under the binomial distribution is identical, since φ = 1
D(y, µˆ ) = D' (y, µˆ )φ = D' (y, µˆ )
 The saturated model
(y, y) implies
if
Yi ≈ B (mi , π i )
fitted probabilities as observed probabilities, notated in the
y
~
i
following as π i =
mi , for all i = 1,…n.
Pàg.
3- 23
Curs 2. 01 4- 2. 01 5
BINOMIAL MODELS FOR BINARY DATA: GOODNESS OF FIT
 Expression for the deviance:

 yi
D (y, µˆ ) = D (y, πˆ ) = 2∑  yi log
i =1 
 miπˆ i
n
 (mi − yi )  

 
 + (mi − yi ) log
 (mi − miπˆ i )  

 Sometimes, the deviance statistic is ,
D=2
∑
n
∑ oi log
postive , negative i =1
oi
ei
where,
1. Observed values for positive response for observation i,
oi = y i .
2. Observed values for negative response for observation i,
o i = mi − y i .
3. Expected positive responses for observation i,
ei = miπ̂ i .
4. Expected negative responses for observation i,
ei = mi − miπ̂ i
Pàg.
3- 24
.
Curs 2. 01 4- 2. 01 5
MLGz
MLGz
 Assimptotic distribution for model (M) with p parameters DM = D(Y, πˆ ) is
confused with
χ N2 − p ).
2
χ n−
p
(not to be
Assymptotic conditions are
Thus, a goodness of fit test can be formulated as H0 “The current model fits properly the data” and
the p value for the test is
(
)
P χ n2− p > DM = p _ value :
 If pvalue <<0.05 then there is evidence to reject H0 and thus, the model (M) does not fit properly
data. There is an statistical evidence of discrepancy between observations and fitted values
provided by model (M).
 If pvalue >> 0.05 then there is not evidence to reject H0 and thus, H0 is accepted leading to the
conclusion that model (M) does fit properly data, since discrepance between observed and fitted
values is not significative in statistical terms.
 AIC (Akaike Information Criteria, 1974) is defined as a trade-off between a goodness of fit
provided by a model (M) and the number of parameters p in the model (as an indicator for model
complexity) kaike. Let M be a model with p parameters
with minimum AIC are preferred.
Pàg.
3- 25
AIC M =2 (− (πˆ M , y ) + p ) . Models
Curs 2. 01 4- 2. 01 5
MLGz
 In order to consider sample size, another statistic named BIC (Bayesian Information Criteria) is
proposed (in SAS©), Schwartz criteria
are preferred (AIC(model,k=log(n)).
BICB =− 2 (πˆ B , y ) + p log n . Minimum BIC models
 AIC and BIC might be used to compare unnested models.
 Following McCullagh, the test for Generalized LM equivalent to F.Test in classical linear
regression, consists on comparing differences on scaled deviance on 2 hierarchical models
(nested models):
Let MA be a model with q parameters nested in model MB with p > q parameters, let πˆ A and πˆ B
fitted probabilities for both models, Such that, the set of parameters for MB are those common to
Τ
MA and those specific; i.e., β Β =
(β
T
1
, β 2T
) and β ΑΤ = (β )
T
1
with dim( β A )=q<p, then
∆D AB = D(y , πˆ A ) − D(y , πˆ B ) = 2 (πˆ B , y ) − 2 (πˆ A , y )
And for testing
Η 0 : β 2 = 0 P (χ
2
p−q
> ∆D AB
)
is asymptotically distributed
2
χ p−
q
 << α Η 0 Rejected
→
<< α Η 0 Accepted
It is a contrast for multiple coefficients! Large values indicate non-equivalence of models
Pàg.
3- 26
Curs 2. 01 4- 2. 01 5
.
MLGz
 Classical t de Student test for linear regression has the equivalence in Wald test for binomial
models:
H0 :
β j = βˆ j
βˆ j − β j
Z0 =
asint . N (0,1)
σˆ βˆ
then
, if H0 is true.
j
 Bilateral asymptotic confidence interval at level 100(1 - α )%
βˆ j ± zα / 2σˆ βˆ
j
, where
z1- α / 2
for a model parameter is
is N(0,1) distributed
 Wald Test is natural in a maximum likelihood estimation framework since asymptotically :
where
[
ℑ = Ε UU T
βˆ − β ≈ N p (0, ℑ −1 ) ,
] is Fisher Expected Information Matrix (score variance), using as an approximation
X T WX
belonging to last iteration (convergence) of Method of the Scores
ˆ
ˆ
Then , (β − β ) ℑ (β − β ) ≈ χ
T
2
p
, Wald statistic is W=
Pàg.
3- 27
(
) (
T
ˆ
β − β ℑ βˆ − β
).
Curs 2. 01 4- 2. 01 5
MLGz
Multiple hypothesis testing is also possible with Wald Test

(
Η0 : β = β 0
 Or β
T
(
) [ ] (βˆ − β )
T
ˆ
β
β
=
−
W
V βˆ
through
0
= β ,β
T
1
T
2
)
−1
0
≈ χ p2 .
[ ]
ˆT ˆ
dim( β 2 )=q<p and Η 0 : β 2 = 0 then W = β 2 V β 2
 If dim( β 2 )=1 then Η 0 : β 2 = 0 : z =
βˆ2
[ ]
V βˆ2
−1
βˆ 2
≈ χ q2 .
≈ N (0, 1) .
 Deviance for a GLM plays a role similar to Residual Sum of Squares in classical regression, thus it
allows to define a Generalized
R2 = 1 −
R2, or pseudo R2
D(y , π A )
G (y , π A )
=
where G (y , π A ) = D(y , π 0 ) − D(y , π A ) ,
D(y , π 0 ) G (y , π A ) + D(y , π A )
0 ≤ R2 ≤ 1
 In R software:
library(lmtest)
waldtest(modelA, modelB, ..., test = c("F", "Chisq")) # Wald Test
anova(modelA, modelB, ...,
test = c("F", "Chisq")) # Deviance Test
Pàg.
3- 28
Curs 2. 01 4- 2. 01 5
MLGz
 Goodness of fit by generalized Pearson X2 statistic, asymptotically distributed as:
2
n
 n mi ( yi − μ̂ i )2  
(
oi − ei ) 

= ∑
  = ∑∑




ei
 i=1 μ̂ i (mi − μ̂ i )   + , − i=1

2
(
yi − mi π̂ i )
=∑
i =1 mi π̂ i (1 − π̂ i )
n
X
2
 When disaggregated data is used then mi = 1 and asymptotic theory does not apply, as a rule of thumb
2
at a fully disaggregated level for data, a model is good if Deviance or Pearson X statistics are similar
to degrees of freedom for the model. Large values indicate a lack of fit of the model.
 The Hosmer-Lemeshow Statistic (1980,1989) is another measure of goodness of fit. Hosmer and
Lemeshow recommend partitioning the observations into 10 equal sized groups according to their
predicted probabilities. Usually G=10 groups are defined for predicted probabilities 0–0.1, 0.1–0.2, …
0.9–1, but fewer than 10 groups depending of data characteristics can be applied. For each group, the
observed number of outcomes (positive, negative) in the sample and the expected number of outcomes
(positive, negative) are compared by calculating the Pearson chi-square statistic from the 2xG table of
observed and expected frequencies. The statistic is distributed as a chi-square distribution with G2 degrees of freedom;
G
2
X HL
=∑
g =1
(o
− mg π̂ g )
2
g
mg π̂ g (1 − π̂ g )
Pàg.
3- 29
Curs 2. 01 4- 2. 01 5
MLGz
3-5.1
ROC Curve and Confusion Matrices
ROC (Receiver Operating Characteristic) curve analysis has been widely accepted as the standard for
describing and comparing the accuracy of predictions.
If the ROC curve rises rapidly towards the upper right-hand corner of the graph, or if the value of area
under the curve is large, we can say the model performs well. If the area is close to 1.0, it indicates that
the model is good. While if the area is to 0.5, it shows that the model is bad.
Confusion matrix for a binary model (M) shows predicted response versus observed response
(positive/negative outcomes)
Let the prediction in response be
yˆ i = 1 if πˆ i > s , where is a threshold
s between 0 and 1. For each s
a confusion matrix can be built for model (M):
• Sensibility is the proportion of observed positive
s
Y=1
Y=0
Total
yˆ i = 1
a
b
a+b
yˆ i = 0
c
d
c+d
a+c
b+d
n
outcomes (Y=1) predicted as positive (
yˆ i = 1 ):
Sn =a/(a+c).
• Specificity is the proportion of observed negative
outcomes (Y=0) predicted as negative ( yˆ i
Sp = d/(b+d).
Pàg.
3- 30
Curs 2. 01 4- 2. 01 5
= 0 ):
MLGz
 ROC Curve shows in X axis for each s, 1-Sp (false negative rate) and Sensibility - Sn (true positive
rate) in Y-Axis.
• The point (0,1) is the perfect classifier: it classifies all positive cases and negative cases correctly.
It is (0,1) because the false positive rate is 0 (none), and the true positive rate is 1 (all).
• The point (0,0) represents a classifier that predicts all cases to be negative.
• The point (1,1) corresponds to a classifier that predicts every case to be positive.
• A good electronic address to understand ROC curves http://gim.unmc.edu/dxtests/ROC1.htm.
A guideline for interpreting ROC curve is:
.90-1 = excellent(A)
.80-.90 = very good (B)
.70-.80 = good (C)
.60-.70 = bad (D)
.50-.60 = very bad (F)
Pàg.
3- 31
Curs 2. 01 4- 2. 01 5
MLGz
BINOMIAL MODELS FOR BINARY DATA: GOODNESS OF FIT – ROC CURVE
How to compute in R, Pearson X2 statistic - sum of squares of Pearson’s residuals:
sum( resid( model, ‘pearson’) ^2 )
As in the case, for deviance residual:
sum( resid( model, ‘deviance’) ^2 )
==
model$deviance
Package rms contains the specific method lrm(.) for logistic regression with additional diagnostics (c,
Naglekerke R2, and so on). NagelkerkeR2 is also in the fmsb package.
To compute ROC curves: Install package ROCR – specific performance plots are available
>
>
>
>
>
>
library("ROCR")
dadesroc<-prediction(predict(lm2_logit,type="response"),ars$resposta)
par(mfrow=c(1,2))
plot(performance(dadesroc,"err"))
plot(performance(dadesroc,"tpr","fpr"))
abline(0,1,lty=2)
Pàg.
3- 32
Curs 2. 01 4- 2. 01 5
3-6.
MODEL DIAGNOSTICS
3-6.1
Residuals in GLMz
MLGz
Extension to Generalized linear Models of Normal regression methods:
 According to Pregibon (1981), Williams (1987) and Fox (2008): a residual can be defined for each
covariate
class
i:
ei = y i − yˆ i = y i − miπ̂ i .The
response
residuals
diagnostics,however, because they ignore the nonconstant variance that is a crucial
are
not
used
in
part of a GLMz.
 Pearson residuals are casewise components of the Pearson goodness of fit statistic for the model,
e =
P
i
( yi − μ̂ i )
V [μ̂ i ] φˆ and
n
X = ∑ rPi
2
2
i =1
These are a basic set of residuals for use with a GLM because of their direct analogy to linear models.
For a model named M, the R command residuals(M, type="pearson") returns the Pearson
residuals.
 Standardized Pearson residuals correct for conditional response variation and for the leverage of the
observations:
eiPS =
( yi − μ̂ i )
V [μ̂ i ](1 − hii )
Pàg.
3- 33
Curs 2. 01 4- 2. 01 5
MLGz
MODEL DIAGNOSTICS: RESIDUALS
To compute the standardized Pearson residuals, we need to define the hat-values hii for GLMs.
 Deviance residuals,
residual deviance
eiD = sign ( yi − μ̂ i ) d i
D (y, µˆ ) = ∑i =1,, n rD2i
are the square roots of the casewise components of the
, attaching the sign of
yi − μ̂ i .
o In the linear model, the deviance residuals reduce to the Pearson residuals.
o The deviance residuals are often the preferred form of residual for GLMs, and are returned by
the command residuals(M, type="deviance") .
φ (1 − hii ) .
 Standardized deviance residuals are
 Studentized residual in generalized linear models are approximations to the scaled difference between
eiDS = eiD
ˆ
the response and the fitted value computed without case i:
( )
eiStu = sign ( yi − μ̂ i ) (1 − hii ) eiDS
2
( )
+ hii eiPS
2
 The following functions, some in standard R and some in the car package, have methods for GLMs:
rstudent, hatvalues, cooks.distance, dfbetas, outlierTest, avPlots, residualPlots,
 marginalModelPlots, crPlots, etc.
Pàg.
3- 34
Curs 2. 01 4- 2. 01 5
MLGz
 Hat matrix for Generalized Linear Models can be defined, although it depends on Y (through W)
and x’s values,
(
H = W X X WX
12
T
)
−1
XT W1 2
H matrix is symmetric with diagonal values between 0 and 1, hii, named leverages and
p/n.
average value
It corresponds to the last iteration (convergence) of the IWLS for estimating model
parameters.
The hii are taken from the final iteration of the Iterative Weighted Least Squares procedure for
fitting the model and have the usual interpretation, except that, unlike in a linear model, the hatvalues in a GLM depend on y as well as on the configuration of the xs.
Pàg.
3- 35
Curs 2. 01 4- 2. 01 5
MLGz
3-6.2
Influence data in GLMz
 Influence data are detected by and adapted Cook’s statitstic derived from Wald statistic for
multiple hypothesis testing: H0:
β = β0 ,
) [](
(
T
ˆ
Z = β − β 0 Vˆ βˆ
2
0
−1
) (
)
(
T
ˆ
ˆ
β − β 0 = β − β 0 X T WX βˆ − β 0
Let Wald statistic be for observation I,
Z (2−i )
for testing H0:
)
β = βˆ(−i ) , distance between
βˆ(−i ) ( d i = βˆ − βˆ(−i ) ).
And thus,
(
)
T
(
Z (2− i ) = βˆ − βˆ(− i ) X T WX βˆ − βˆ(− i )
) = (e ) h
PS 2
i
ii
p(1 − hii )
Pàg.
3- 36
Curs 2. 01 4- 2. 01 5
β̂
and
MLGz
3-6.3
Graphics for diagnostics
 A scatterplot showing Standardized Pearson residuals (Y-axis) and leverage (hii, diagonal of H). Cutoffs can be included at 2p/n.
 A scatterplot showing Pearson residuals versus each of the predictors in turn.
 A scatterplot showing Pearson residuals against fitted values in the linear predictor scale, thus,
residualPlots() plots residuals against the estimated linear predictor, η( x).
 Examine leverage for observations.
 Examine Cook’s distance for observations.
o In binary regression for disaggregated data, the plots of Pearson residuals or deviance
residuals are strongly patterned—particularly the plot against the linear predictor, where the
residuals can take on only two values, depending on whether the response is equal to 0 or 1.
o A correct model requires that the conditional mean function in any residual plot be constant as
we move across the plot, and ‘to see this’ smoothers help in the purpose.
 In R, residualPlots(M), in each panel in the graph by default includes a smooth fit; a lack-of-fit test is
provided only for the numeric predictor
residualPlots(mod.working, layout=c(1, 3))
influenceIndexPlot(mod.working, vars=c("Cook", "hat"), id.n=3)
Pàg.
3- 37
Curs 2. 01 4- 2. 01 5
3-7.
EJEMPLO 1: ACCIDENTES
CINTURÓN – AGRESTI (2002)
CON
HERIDOS
SEGÚN
USO
MLGz
DEL
Datos de 68694 accidentes sucedidos en el estado de Main. Se recoge la gravedad y las variables
explicativas de género, entorno y uso del cinturón. Se estudiará la incidencia en la presencia de heridos de
los factores, por tanto se crea un factor dicotómico: Sin – Con Heridos (ref. Sin)
genero
Mujer
Mujer
Mujer
Mujer
Hombre
Hombre
Hombre
Hombre
Mujer
Mujer
Mujer
Mujer
Hombre
Hombre
Hombre
Hombre
Mujer
Mujer
Mujer
Mujer
entorno
Urbano
Urbano
NoUrbano
NoUrbano
Urbano
Urbano
NoUrbano
NoUrbano
Urbano
Urbano
NoUrbano
NoUrbano
Urbano
Urbano
NoUrbano
NoUrbano
Urbano
Urbano
NoUrbano
NoUrbano
cinturon
No
Si
No
Si
No
Si
No
Si
No
Si
No
Si
No
Si
No
Si
No
Si
No
Si
gravedad
y
SinHeridos
SinHeridos
SinHeridos
SinHeridos
SinHeridos
SinHeridos
SinHeridos
SinHeridos
LeveSinHospital
LeveSinHospital
LeveSinHospital
LeveSinHospital
LeveSinHospital
LeveSinHospital
LeveSinHospital
LeveSinHospital
LeveConHospital
LeveConHospital
LeveConHospital
LeveConHospital
7287
11587
3246
6134
10381
10969
6123
6693
175
126
73
94
136
83
141
74
720
577
710
564
genero
Hombre
Hombre
Hombre
Hombre
Mujer
Mujer
Mujer
Mujer
Hombre
Hombre
Hombre
Hombre
Mujer
Mujer
Mujer
Mujer
Hombre
Hombre
Hombre
Hombre
Pàg.
3- 38
entorno
Urbano
Urbano
NoUrbano
NoUrbano
Urbano
Urbano
NoUrbano
NoUrbano
Urbano
Urbano
NoUrbano
NoUrbano
Urbano
Urbano
NoUrbano
NoUrbano
Urbano
Urbano
NoUrbano
NoUrbano
cinturon
No
Si
No
Si
No
Si
No
Si
No
Si
No
Si
No
Si
No
Si
No
Si
No
Si
gravedad
y
LeveConHospital
LeveConHospital
LeveConHospital
LeveConHospital
Hospitalización
Hospitalización
Hospitalización
Hospitalización
Hospitalización
Hospitalización
Hospitalización
Hospitalización
Mortal
Mortal
Mortal
Mortal
Mortal
Mortal
Mortal
Mortal
Curs 2. 01 4- 2. 01 5
566
259
710
353
91
48
159
82
96
37
188
74
10
8
31
17
14
1
45
12
MLGz
EXEMPLE 1
> summary(acc)
genero
entorno
Hombre:20
NoUrbano:20
Mujer :20
Urbano :20
cinturon
gravedad
Si:20
Hospitalización:8
No:20
LeveConHospital:8
LeveSinHospital:8
Mortal
:8
SinHeridos
:8
y
Min.
:
1.00
1st Qu.:
66.75
Median : 138.50
Mean
: 1717.35
3rd Qu.: 710.00
Max.
:11587.00
f.heridos
Sin: 8
Con:32
> tapply(acc$y,acc$f.heridos,sum);sum(acc$y)
Sin
Con
62420 6274
[1] 68694
 Tomando como variable de respuesta la presencia de heridos (f.heridos), globalmente se observa 6274
accidentes de un total de 68694, con una probabilidad de 0,0913. El odds es 6274/62420 o 0,1005 a 1
i el log-odds es log(0,1005) = -2.297472.
 Se propone comparar inicialmente la presencia de heridos (respuesta) según el Factor Uso del Cinturón
(2 niveles, base-line Si).
Cinturón
Con Heridos
Sin Heridos
m
(respuesta positiva)
Si (ref)
2409
35383
37792
No
3865
27037
30902
6274
62420
68694
P(‘Accidente CON Heridos’)= 0. 091 3= 6274/68694
Pàg.
3- 39
Curs 2. 01 4- 2. 01 5
MLGz
EXEMPLE 1
Sólo hay 2 posibles modelos: el modelo nulo que asume homogeneidad en el Uso en los dos grupos definidos
por el Factor (M1) y el modelo completo (M2) que propone proporciones diferentes en el Uso entre los dos
grupos:
(M1)
 π 
log i  = η
1 − πi 
(M2)
 π
log i
1−πi

 = η + αι i = 1, 2 α1 = 0

> dfc
cinturon
m ypos yneg
Si
Si 37792 2409 35383
No
No 30902 3865 27037
>
> acc.m1 <-glm(cbind(ypos,yneg)~1, family=binomial(link=logit), data=dfc)
> summary(acc.m1)
Call:
glm(formula = cbind(ypos, yneg) ~ 1, family = binomial(link = logit),
data = dfc)
Deviance Residuals:
Si
No
-19.59
19.60
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.29747
0.01324 -173.5
<2e-16 ***
--Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Pàg.
3- 40
Curs 2. 01 4- 2. 01 5
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 768.03 on 1 degrees of freedom
Residual deviance: 768.03 on 1 degrees of freedom
AIC: 789.55
>
> acc.m2 <-glm(cbind(ypos,yneg)~cinturon, family=binomial(link=logit), data=dfc)
> summary(acc.m2)
Call:
glm(formula = cbind(ypos, yneg) ~ cinturon, family = binomial(link = logit),
data = dfc)
Deviance Residuals:
[1] 0 0
Coefficients:
0.02106 -127.61
<2e-16 ***
cinturonNo
0.74178
0.02719
27.29
<2e-16 ***
--Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Null deviance: 7.6803e+02 on 1 degrees of freedom
Residual deviance: -4.3099e-13 on 0 degrees of freedom
AIC: 23.523
> residuals(acc.m1,'pearson')
Si
No
-18.61742 20.58856
> xpea<-sum(residuals(acc.m1,'pearson')^2);xpea
[1] 770.4972
Pàg.
3- 41
Curs 2. 01 4- 2. 01 5
MLGz
MLGz
EXEMPLE 1
El estadístico de Pearson de (M2) es 0 y de (M1) toma por expresión:
m ( y − µˆ i )
= 770.4972 ≈ χ n2− p = 2 −1=1
X = ∑i =1, 2 i i
µˆ i (mi − µˆ i )
2
2
P
La devianza de (M2) es 0 y de (M1) toma por expresión:

y 
 m − yi 
 = 768.3 ≈ χ n2− p = 2 −1=1 .
D = 2∑i =1, 2  yi log i  + (mi − yi )log i
 µˆ i 
 mi − µˆ i 

Ambos estadísticos son altamente significativos, implicando que el modelo no se ajusta bien a los datos.
En (M1) el estimador
ηˆ = −2.29747
, el logit de la proporción muestral.
En (M2), el estimador η̂ , es el logit del nivel de referencia (Si) (logit de la proporción de heridos en grupo
que Usa cinturón, logit(2409/37792)=-2.687) y el efecto del nivel No sobre el logit de la proporción de
heridos (diferencia de logits entre el nivel No y el nivel de referencia Si: logit(3865/30902)logit(2409/37792)=0.742.
 eη
πi
=
1 − π i eη eα
2
ι = 1 Yes
ι = 2 No
odds − ratio NovsYes = eα 2 = 2.1
Los odds de tener heridos entre los accidentes que No usan cinturón es más del doble que el odds de tener
heridos entre los que Si usan cinturón.
Pàg.
3- 42
Curs 2. 01 4- 2. 01 5
MLGz
EXEMPLE 1
 Ahora procedamos a analizar la incidencia de accidentes con heridos según el género del conductor
accidentado (referencia género hombre).
Genero
Con
yi
mi − yi
Sin
mi
Hombre
2789
34166
36955
Mujer
3485
28254
31739
6274
62420
68694
> acc.m2g <-glm(cbind(ypos,yneg)~genero, family=binomial(link=logit), data=dfg)
> summary(acc.m2g)
Call:
glm(formula = cbind(ypos, yneg) ~ genero, family = binomial(link = logit),
data = dfg)
Deviance Residuals:
[1] 0 0
Coefficients:
0.01969 -127.23
<2e-16 ***
generoMujer 0.41278
0.02665
15.49
<2e-16 ***
--Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Null deviance: 2.4172e+02
Residual deviance: -7.0122e-13
on 1
on 0
degrees of freedom
degrees of freedom
Pàg.
3- 43
Curs 2. 01 4- 2. 01 5
MLGz
EXEMPLE 1
AIC: 23.571
Number of Fisher Scoring iterations: 2
>> xpea<-sum(residuals(acc.m1g,'pearson')^2);xpea
[1] 242.4970
> log(2789 /34166);log(3485 /28254);log(3485 /28254)-log(2789 /34166)
[1] -2.505548
[1] -2.092767
[1] 0.4127809
> exp(0.41278)
[1] 1.511013
>
 Sólo hay 2 posibles modelos: el modelo nulo que asume homogeneidad en la presencia de heridos en
accidentes en los 2 grupos definidos por el Factor (M1) y el modelo completo (M2) que propone
proporciones diferentes en los accidentes con heridos entre los 2 grupos:
(M1)
 π 
log i  = η
1 − πi 
(M2)
 π i   η i = 1≡ H
 = 
log
1
π
−
η + α 2 i = 2 ≡ M
i 

El estadístico de Pearson de (M2) es 0 y de (M1) toma por expresión:
m ( y − µˆ i )
X = ∑i =1÷ 2 i i
= 242.497 ≈ χ n2−1= 2 −1=1
µˆ i (mi − µˆ i )
2
2
P
La devianza de (M2) es 0 y de (M1) toma por expresión:
Pàg.
3- 44
Curs 2. 01 4- 2. 01 5
MLGz
EXEMPLE 1

 m − yi 
y 
 = 241.72 ≈ χ n2− p = 2 −1=1
D = 2∑i =1÷ 2  yi log i  + (mi − yi )log i
.
 mi − µˆ i 
 µˆ i 

Ambos estadísticos
son altamente significativos, implicando que el modelo no se ajusta bien a los datos.
 En (M1) el estimador ηˆ = −0,775 , el logit de la proporción muestral.
 En (M2), el estimador
η̂ ,
es el logit del nivel de referencia (Hombres) (logit de la proporción de
heridos en accidentes en hombres a la vista de la tabla, logit(2789/34166)= -2.51) y el efecto del nivel
2 (mujeres) sobre el logit de “H” (diferencia de logits en los grupos: log(3485 /28254)-log(2789
/34166)=0.413.
 eη
πi
= η α
1 − π i e e
i
ι =1 H
ι =2H
Los odds de accidentes con heridos
hombres.
odds − ratio Grupo i vs H = eα i = 1.51
se incrementan en un 51% en las mujeres respecto los
Queda por probar el último modelo univariante según Entorno urbano o no urbano: los odds de accidentes
con heridos se decrementan en un (1-exp(-0.7158))x100%=51% si sucede en entorno urbano. Los odds de
urbano son 0.4887= exp(-0.7158) veces los odds de no urbano.
Pàg.
3- 45
Curs 2. 01 4- 2. 01 5
EXEMPLE 1
> summary(acc.m2e)
Call:
glm(formula = cbind(ypos, yneg) ~ entorno, family = binomial(link = logit),
data = dfe)
Deviance Residuals:
[1] 0 0
Coefficients:
(Intercept)
-1.89784
0.01859 -102.08
<2e-16 ***
entornoUrbano -0.71584
0.02664 -26.87
<2e-16 ***
--Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Null deviance: 7.1961e+02
Residual deviance: 3.9262e-12
AIC: 23.564
on 1
on 0
degrees of freedom
degrees of freedom
> xpea<-sum(residuals(acc.m1e,'pearson')^2);xpea
[1] 745.0957
>
Pàg.
3- 46
Curs 2. 01 4- 2. 01 5
MLGz
MLGz
EXEMPLE 1
Modelos con 2 Predictores: Cinturón y Entorno
Hay 4 grupos o clases de las covariables, sea yij el número de accidentes con heridos en el grupo de
Cinturón i-ésimo y grupo de Entorno j-ésimo, donde los niveles de referencia son ‘Si’ para Cinturón (Factor
A) y ‘NoUrbano’ para el Factor C.
> df2
cinturon entorno
m ypos yneg
1
Si NoUrbano 14097 1270 12827
2
No NoUrbano 11426 2057 9369
3
Si
Urbano 23695 1139 22556
4
No
Urbano 19476 1808 17668
Hay 5 modelos de interés aplicables a la estructura sistemática de los datos anteriores (M1) a (M5), cuyas
devianzas y detalles de la estimación con MINITAB se detallan a continuación.
Modelo
n-p
Devianza
1 1
3
1504.1
2 A
2
736.11
767.99
(M2) vs (M1)
1
3 C
2
784.53
719.57
(M3) vs (M1)
1
4 A+C
1
2.7116
733.4
(M4) vs (M2)
1
781.8
(M4) vs (M3)
1
2.7116
(M5) vs (M4)
1
5 A*C
0
0
∆D
Contraste
g.l.
Constante:
Todos significativos
Pàg.
3- 47
Modelo
η
η +αi
Entorno: η + β j
Cinturón:
Aditivo:
η + αi + β j
Interacción Factores:
η + α i + β j + αβ ij
Curs 2. 01 4- 2. 01 5
EXEMPLE 1
> sum(df2[,3]);sum(df2[,4]);sum(df2[,5])
[1] 68694
[1] 6274
[1] 62420
> acc.m20 <-glm(cbind(ypos,yneg)~1, family=binomial(link=logit), data=df2)
> summary(acc.m20)
Call:
glm(formula = cbind(ypos, yneg) ~ 1, family = binomial(link = logit),
data = df2)
Deviance Residuals:
1
2
3
-0.5131
29.4486 -25.2217
4
0.7247
Coefficients:
0.01324 -173.5
<2e-16 ***
--Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Null deviance: 1504.1
Residual deviance: 1504.1
AIC: 1542.4
on 3
on 3
degrees of freedom
degrees of freedom
> acc.m21 <-glm(cbind(ypos,yneg)~entorno, family=binomial(link=logit), data=df2)
> summary(acc.m21)
Pàg.
3- 48
Curs 2. 01 4- 2. 01 5
MLGz
EXEMPLE 1
Call:
glm(formula = cbind(ypos, yneg) ~ entorno, family = binomial(link = logit),
data = df2)
Deviance Residuals:
1
2
3
-14.92
15.04 -12.97
4
12.94
Coefficients:
(Intercept)
-1.89784
0.01859 -102.08
<2e-16 ***
0.02664 -26.87
<2e-16 ***
--Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
AIC: 824.76
on 3
on 2
degrees of freedom
degrees of freedom
> acc.m22 <-glm(cbind(ypos,yneg)~cinturon, family=binomial(link=logit), data=df2)
> summary(acc.m22)
Call:
glm(formula = cbind(ypos, yneg) ~ cinturon, family = binomial(link = logit),
data = df2)
Pàg.
3- 49
Curs 2. 01 4- 2. 01 5
MLGz
EXEMPLE 1
Deviance Residuals:
1
2
3
12.10
16.82 -10.30
4
-14.17
Coefficients:
0.02106 -127.61
<2e-16 ***
cinturonNo
0.74178
0.02719
27.29
<2e-16 ***
--Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
AIC: 776.34
on 3
on 2
degrees of freedom
degrees of freedom
> acc.m23 <-glm(cbind(ypos,yneg)~cinturon+entorno, family=binomial(link=logit), data=df2)
> summary(acc.m23)
Call:
glm(formula = cbind(ypos, yneg) ~ cinturon + entorno, family = binomial(link = logit),
data = df2)
Deviance Residuals:
1
2
3
-0.8793
0.7358
0.9220
4
-0.7396
Pàg.
3- 50
Curs 2. 01 4- 2. 01 5
MLGz
EXEMPLE 1
Coefficients:
(Intercept)
-2.28676
0.02465 -92.78
<2e-16 ***
cinturonNo
0.75265
0.02734
27.53
<2e-16 ***
0.02682 -27.12
<2e-16 ***
--Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual deviance:
2.7116
AIC: 44.938
on 3
on 1
degrees of freedom
degrees of freedom
[1] 787.0698
[1] 761.8445
[1] 1618.284
[1] 2.712893
> 1-pchisq(xpea,1)
[1] 0.09954032
>
Pàg.
3- 51
Curs 2. 01 4- 2. 01 5
MLGz
MLGz
EXEMPLE 1
El modelo aditivo ajusta bien los datos, vamos a interpretar sus parámetros:
1.
η = −2.28676
es el logit de la probabilidad base: accidentes cuando se usa cinturón en entorno
rural.
2.
α2
3.
β2
muestra un efecto decreciente de la incidencia de accidentados cuando el accidente ocurre en
Entorno urbano.
4.
α2
muestra un efecto creciente de la incidencia de accidentados cuando No se usa el cinturón.
es positivo y el odds de padecer heridos cuando no se usa cinturón es más del doble que entre los
accidentes cuando se usa cinturón dentro del mismo grupo de Entorno (all else being equal o ceteris
paribus).
 La tentativa final consiste en considerar todos las variables explicativas disponibles, es decir,
considerar tres factores A, C y D (Cinturón, Entorno y Género). Los posibles modelos son 12 ¡!! Se va a
cambiar el orden de los niveles del Factor C – Entorno para facilitar la interpretación.
Pàg.
3- 52
Curs 2. 01 4- 2. 01 5
EXEMPLE 1
 El modelo aditivo ajusta bien los datos, pero todavía queda devianza por explicar:
> summary(acc)
genero
entorno
Hombre:20
Urbano :20
Mujer :20
NoUrbano:20
cinturon
gravedad
Si:20
Hospitalización:8
No:20
LeveConHospital:8
LeveSinHospital:8
Mortal
:8
SinHeridos
:8
y
Min.
:
1.00
1st Qu.:
66.75
Median : 138.50
Mean
: 1717.35
3rd Qu.: 710.00
Max.
:11587.00
f.heridos
heridos
Sin: 8
Min.
: 0.0
Con:32
1st Qu.: 9.5
Median : 74.0
Mean
:156.8
3rd Qu.:163.0
Max.
:720.0
>
> df3
cinturon
1
Si
2
No
3
Si
4
No
5
Si
6
No
7
Si
8
No
entorno
Urbano
Urbano
NoUrbano
NoUrbano
Urbano
Urbano
NoUrbano
NoUrbano
genero
m ypos yneg
Hombre 11349 380 10969
Hombre 11193 812 10381
Hombre 7206 513 6693
Hombre 7207 1084 6123
Mujer 12346 759 11587
Mujer 8283 996 7287
Mujer 6891 757 6134
Mujer 4219 973 3246
Pàg.
3- 53
Curs 2. 01 4- 2. 01 5
MLGz
EXEMPLE 1
> summary(acc.m331)
Call:
glm(formula = cbind(ypos, yneg) ~ cinturon + entorno + genero,
family = binomial(link = logit), data = df3)
Deviance Residuals:
1
2
3
-0.5055 -0.7976
0.2133
4
0.9023
5
1.7426
6
-0.4639
7
-1.5365
8
0.3172
Coefficients:
(Intercept)
-3.33639
0.03114 -107.14
<2e-16 ***
cinturonNo
0.81710
0.02765
29.55
<2e-16 ***
entornoNoUrbano 0.75806
0.02697
28.11
<2e-16 ***
generoMujer
0.54483
0.02727
19.98
<2e-16 ***
--Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual deviance:
7.4645
AIC: 82.167
on 7
on 4
degrees of freedom
degrees of freedom
Pàg.
3- 54
Curs 2. 01 4- 2. 01 5
MLGz
MLGz
EXEMPLE 1
 El siguiente paso podría ser añadir una interacción entre 2 de los factores: A*C o A*D o C*D.
Modelo
n-p
Devianza
∆D
Contraste
g.l.
1 A+C+D
4
7.4645
2 A*C+D
3
3 A*D+B
4 C*D+A
Modelo
Aditivo: η + α i + β j + γ k
3.5914
3.8730
(M2) vs (M1)
1
Interacción Cinturón-Entorno :
3
7.3826
0.0818
(M3) vs (M1)
1
Interacción Cinturón-Género:
3
4.4909
2.9736
(M4) vs (M1)
1
Interacción Entorno-Género:
η + α i + β j + γ k + αβij
η + α i + β j + γ k + αγ ik
η + α i + β j + γ k + βγ jk
Estrictamente sólo la interacción entre Cinturón y Entorno es estadísticamente significativa, aunque la
interacción entre Entorno y Género tiene un pvalor del 8% según el contraste de devianza con el modelo
aditivo. Se interpreta el mejor modelo obtenido hasta el momento donde intervienen los 3 factores y una
interacción doble entre el Uso de Cinturón y el Entorno donde sucede el accidente.
glm(formula = cbind(ypos, yneg) ~ cinturon * entorno + genero,
Coefficients:
(Intercept)
-3.30342
0.03509 -94.149
<2e-16
cinturonNo
0.76173
0.03933 19.366
<2e-16
entornoNoUrbano
0.69360
0.04239 16.362
<2e-16
generoMujer
0.54594
0.02729 20.007
<2e-16
cinturonNo:entornoNoUrbano 0.10800
0.05486
1.968
0.049
Pàg.
3- 55
family = binomial, data = df3)
***
***
***
***
*
Curs 2. 01 4- 2. 01 5
MLGz
EXEMPLE 1
La interpretación en la escala lineal de:
• Si el conductor es mujer los log odds se incrementan en 0.55 unidades respecto al grupo de referencia
hombres dentro del mismo grupo del resto de factores.
• No usar el cinturón incrementa la escala lineal en 0.76 unidades en Entorno urbano y 0.76+0.11 en
entorno NoUrbano; dentro del mismo grupo de género.
• Conducir en entorno No Urbano incrementa la escala lineal en 0.69 unidades si se usa cinturón y
0.69+0.11 si no se uso cinturón.
• Tanto el uso del cinturón como el entorno no pueden interpretarse independientemente, ya que hay un
término de interacción.
La interpretación en la escala de los odds seria:
• Si el conductor es mujer los odds de darse heridos en el accidente se incrementan en un 73%
(exp(0.55)=1.73) respecto al grupo de referencia hombres, dentro del mismo grupo del resto de
factores.
• No usar el cinturón incrementa los odds de darse heridos en el accidente en un 113% (exp(0.76)=2.13) en
Entorno urbano y en un 140% (exp(0.76+0.11)=2.387) en entorno NoUrbano; dentro del mismo grupo de
género.
• Conducir en entorno No Urbano incrementa los odds de darse heridos en el accidente en un 100%
(exp(0.69)=1.994) si se usa cinturón y en casi un 125% (exp(0.69+0.11)=2.226) si no se usa cinturón;
dentro del mismo grupo de género.
Pàg.
3- 56
Curs 2. 01 4- 2. 01 5
MLGz
EXEMPLE 1
La interpretación en la escala de las probabilidades son aproximadas y seria en términos absolutos según
una probabilidad marginal de darse heridos en un accidente de P(‘Accidente
CON
Heridos’)= 0. 091 3= 6274/68694: Y de aquí 0. 091 3x(1 - 0. 091 3)= 0. 083.
• Si el conductor es mujer la probabilidad de darse heridos en el accidente sube en 0.046
(0.083x0.55=0.046) respecto al grupo de referencia hombres, dentro del mismo grupo del resto de
factores.
• No usar el cinturón incrementa la probabilidad de darse heridos en el accidente en 0.063
(0.083x0.76=0.063) en Entorno urbano y en un 0.072 (0.083(0.76+0.11)=0.072) en entorno NoUrbano;
• Conducir en entorno No Urbano incrementa la probabilidad de darse heridos en el accidente en 0.057
(0.083x0.69=0.057) si se usa cinturón y en 0.066 (0.083(0.696+0.11)=0.066) si no se usa cinturón;
Pàg.
3- 57
Curs 2. 01 4- 2. 01 5
EXEMPLE 1
> summary(acc.m331)
Call:
glm(formula = cbind(ypos, yneg) ~ cinturon + entorno + genero,
Deviance Residuals:
1
2
3
-0.5055 -0.7976
0.2133
4
0.9023
5
1.7426
6
-0.4639
7
-1.5365
8
0.3172
Coefficients:
(Intercept)
-3.33639
0.03114 -107.14
<2e-16 ***
cinturonNo
0.81710
0.02765
29.55
<2e-16 ***
entornoNoUrbano 0.75806
0.02697
28.11
<2e-16 ***
generoMujer
0.54483
0.02727
19.98
<2e-16 ***
--Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual deviance:
7.4645
AIC: 82.167
on 7
on 4
degrees of freedom
degrees of freedom
> summary(acc.m332)
Call:
glm(formula = cbind(ypos, yneg) ~ cinturon + entorno * genero,
Pàg.
3- 58
Curs 2. 01 4- 2. 01 5
MLGz
EXEMPLE 1
…
Coefficients:
(Intercept)
-3.36383
0.03519 -95.592
<2e-16
cinturonNo
0.81618
0.02765 29.521
<2e-16
entornoNoUrbano
0.80907
0.04010 20.177
<2e-16
generoMujer
0.59306
0.03914 15.152
<2e-16
entornoNoUrbano:generoMujer -0.09345
0.05422 -1.724
0.0848
--Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual deviance:
4.4909
AIC: 81.193
on 7
on 3
***
***
***
***
.
degrees of freedom
degrees of freedom
> summary(acc.m333)
Call:
glm(formula = cbind(ypos, yneg) ~ cinturon * entorno + genero,
…
Coefficients:
(Intercept)
-3.30342
0.03509 -94.149
<2e-16 ***
cinturonNo
0.76173
0.03933 19.366
<2e-16 ***
entornoNoUrbano
0.69360
0.04239 16.362
<2e-16 ***
generoMujer
0.54594
0.02729 20.007
<2e-16 ***
cinturonNo:entornoNoUrbano 0.10800
0.05486
1.968
0.049 *
Pàg.
3- 59
Curs 2. 01 4- 2. 01 5
MLGz
EXEMPLE 1
Residual deviance:
3.5914
AIC: 80.294
on 7
on 3
degrees of freedom
degrees of freedom
> summary(acc.m334)
Call:
glm(formula = cbind(ypos, yneg) ~ cinturon * genero + entorno,
Coefficients:
(Intercept)
-3.34236
0.03755 -89.014
<2e-16 ***
cinturonNo
0.82621
0.04220 19.579
<2e-16 ***
generoMujer
0.55459
0.04370 12.691
<2e-16 ***
entornoNoUrbano
0.75792
0.02698 28.096
<2e-16 ***
cinturonNo:generoMujer -0.01598
0.05586 -0.286
0.775
--Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual deviance:
7.3826
AIC: 84.085
on 7
on 3
degrees of freedom
degrees of freedom
Pàg.
3- 60
Curs 2. 01 4- 2. 01 5
MLGz
EXEMPLE 1
> anova(acc.m331,acc.m332,test="Chisq")
Analysis of Deviance Table
Model 1: cbind(ypos, yneg) ~ cinturon + entorno + genero
Model 2: cbind(ypos, yneg) ~ cinturon + entorno * genero
Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1
4
7.4645
2
3
4.4909 1
2.9736
0.0846
Model 2: cbind(ypos, yneg) ~ cinturon * entorno + genero
1
4
7.4645
2
3
3.5914 1
3.8730
0.0491
Model 2: cbind(ypos, yneg) ~ cinturon * genero + entorno
1
4
7.4645
2
3
7.3826 1
0.0818
0.7748
[1] 4.496567
> 1-pchisq(xpea,3)
[1] 0.2125967
[1] 3.580126
> 1-pchisq(xpea,3)
[1] 0.3105178
Pàg.
3- 61
Curs 2. 01 4- 2. 01 5
MLGz
MLGz
EXEMPLE 1
El siguiente paso consistiría en analizar los modelos con 2 interacciones entre los factores, ya que el
modelo A*C+D ajusta bien los datos, pero todavía deja una devianza de 3.5914 por explicar en 3 grados de
libertad, se podría dar por bueno el modelo.
Modelo
1 A*C+A*D
n-p Devianza
2
3.562410
∆D
Contraste
g.l.
2.2371
(M1) vs (M4)
1
Modelo
Interacción Cinturón-Entorno Y
Cinturón-Género :
η + α i + β j + γ k + αβij + αγ jk
2 A*D+C*D
2
4.371979
3.0467
(M2) vs (M4)
1
Interacción Cinturón-Género Y
Entorno-Género :
η + α i + β j + γ k + αγ ik + βγ jk
3 A*C+C*D
2
1.367022
0.04171
(M3) vs (M4)
1
Entorno-Género :
η + α i + β j + γ k + αβij + βγ jk
4 A*C+C*D+
A*D
1
η + α i + β j + γ k + αβij + αγ ik + βγ jk
1.325317
 El modelo no requiere de más análisis, no hay diferencias significativas entre el modelo con las 3
interacciones dobles y ninguno de los modelos con 2 pares de factores en interacciones.
Pàg.
3- 62
Curs 2. 01 4- 2. 01 5
MLGz
EXEMPLE 1
El siguiente paso consistiría en analizar los modelos con 2 interacciones entre los factores y compararlos
con el modelo aditivo, para ver si son significativas 2 interacciones dobles simultáneamente.
Modelo
1 A*C+A*D
n-p Devianza
2
3.562410
∆D
Contraste
g.l.
3.9021
(M1) vs (M4)
1
Modelo
Cinturón-Género :
η + α i + β j + γ k + αβij + αγ jk
2 A*D+C*D
2
4.371979
3.0925
(M2) vs (M4)
1
Interacción Cinturón-Género Y
Entorno-Género :
η + α i + β j + γ k + αγ ik + βγ jk
3 A*C+C*D
2
1.367022
6.0975
(M3) vs (M4)
1
Entorno-Género :
η + α i + β j + γ k + αβij + βγ jk
4 A+C+D
4
η + αi + β j + γ k
7.4645
 El modelo no requiere de más análisis, ya que simultáneamente son significativas 2 interacciones dobles
Cinturón-Entorno Y Entorno-Género.
Pàg.
3- 63
Curs 2. 01 4- 2. 01 5
MLGz
EXEMPLE 1
Comparando el mejor modelo con 1 interacción doble (Cinturón-Entorno) con el modelo que tiene 2
interacciones dobles (Cinturón-Entorno y Entorno-Genero) se cuantifica el p valor del contraste de la
devianza de la interacción Entorno-Género con un 0.14, por tanto, no significativa una vez que CinturónEntorno está en el modelo, pero con un valor incómodo.
Model 1: cbind(ypos, yneg) ~ cinturon * entorno + genero
Model 2: cbind(ypos, yneg) ~ cinturon * entorno + entorno * genero
1
3
3.5914
2
2
1.3670 1
2.2244
0.1358
>
Se propone para finalizar el análisis valorar el modelo con 2 interacciones dobles y el mejor modelo con 1
interacción doble según el criterio de información de Akaike y el método step() en R. Se prefiere mantener
las 2 interacciones dobles.
Al final se da una tabla resumen con la devianza residual y el AIC para todos los modelos que se han
calculado.
Pàg.
3- 64
Curs 2. 01 4- 2. 01 5
EXEMPLE 1
> acc.res<-step(acc.m34)
Start: AIC=82.7
cbind(ypos, yneg) ~ cinturon * genero * entorno
Df Deviance
AIC
- cinturon:genero:entorno 1
1.325 82.028
<none>
2.411e-12 82.702
Step: AIC=82.03
cbind(ypos, yneg) ~ cinturon + genero + entorno + cinturon:genero +
cinturon:entorno + genero:entorno
Df Deviance
AIC
- cinturon:genero
1
1.367 80.069
<none>
1.325 82.028
- genero:entorno
1
3.562 82.265
- cinturon:entorno 1
4.372 83.074
Step: AIC=80.07
cbind(ypos, yneg) ~ cinturon + genero + entorno + cinturon:entorno +
genero:entorno
Df Deviance
AIC
<none>
1.367 80.069
- genero:entorno
1
3.591 80.294
- cinturon:entorno 1
4.491 81.193
>
Pàg.
3- 65
Curs 2. 01 4- 2. 01 5
MLGz
EXEMPLE 1
Modelos
logit(πijk)
Devianza
n-p
AIC
1
η
1912.5
7
1981.2
Cinturón - A
η+ αi
1144.4
6
1215.1
Entorno - C
η+ βj
1192.8
6
1263.5
Género -D
η+ γk
1670.7
6
1741.4
A+D
η+ αi+ βj
795.82
5
868.52
A+C
η+ αi+ γk
411.02
5
483.73
D+C
η+ βj+ γk
911.01
5
983.71
AD
η+ αi+ βj+ (αβ)ij
795.32
4
870.03
AC
η+ αi+ γk+ (αγ)ik
408.31
4
483.01
A+D+C
η+ αi+ βj+ γk
7.4645
4
82.167
AD+C
η+ αi+ βj+ γk+ (αβ)ij
7.3826
3
84.085
AC+D
η+ αi+ βj+ γk+ (αγ)ik
3.5914
3
80.294
A+DC
η+ αi+ βj+ γk+ (βγ)jk
4.4909
3
81.193
AD+AC
η+ αi+ βj+ γk+ (αβ)ij+ (αγ)ik
3.5624
2
82.265
AD+DC
η+ αi+ βj+ γk+ (αβ)ij+ (βγ)jk
4.372
2
83.074
AC+DC
η+ αi+ βj+ γk+ (αγ)ik+ (βγ)jk
1.3670
2
80.07
AD+AC+DC
η+ αi+ βj+ γk+ (αβ)ij+ (αγ)ik+ (βγ)jk
1.3253
1
82.028
Pàg.
3- 66
Curs 2. 01 4- 2. 01 5
MLGz
3-8.
(FOX)
MLGz
EJEMPLO 2: PARTICIPACIÓN LABORAL FEMENINA EN CANADA
En 1977 se realizó una encuesta sociodemográfica a la población de Canadá. El modelo lineal generalizado
que se plantea investiga el análisis de la relación entre las mujeres jóvenes casadas que trabajan en función
de la existencia de hijos en el hogar, los ingresos de sus maridos y la región del país donde residen.
• La variable de respuesta es dicotómica: trabaja frente a no trabaja (para cada mujer joven casada que
interviene en el modelo). Originariamente en los datos la variable tiene 3 categorías, lo que será
aprovechado en un ejemplo del Tema 5.
• La presencia de hijos en el hogar es el factor A, que tiene 2 categorías (SI, NO). Categoría base: NO (la
constante corresponde al valor medio de la categoría NO).
• La región del Canadá es un factor politómico B, con 5 categorías. Los ingresos del marido (en miles de
dólares) es la covariable X.
• La intuición indica una interacción entre los ingresos de los maridos (X) y la presencia de hijos (A).
Pàg.
3- 67
Curs 2. 01 4- 2. 01 5
EJEMPLO 2 (FOX)
WOMEN'S LABOUR-FORCE PARTICIPATION DATASET, CANADA 1977
[1] OBSERVATION
[2] LABOUR-FORCE PARTICIPATION
fulltime = WORKING FULL-TIME
parttime = WORKING PART-TIME
not_work = NOT WORKING
OUTSIDE THE HOME
[3] HUSBAND'S IINCOME, $1000'S
[4] PRESENCE OF CHILDREN
absent
present
[5] REGION
Atlantic = ATLANTIC CANADA
Quebec
Ontario
Prairie = PRAIRIE PROVINCES
BC
= BRITISH COLUMBIA
Source: Social Change in Canada Project, York Institute for Social Research.
DATA:
1
2
…
253
254
255
256
257
…
263
ENDDATA
not_work
not_work
15
13
present
present
Ontario
Ontario
not_work
parttime
fulltime
not_work
fulltime
13
23
11
9
2
present
present
absent
absent
absent
Quebec
Quebec
Quebec
Quebec
Quebec
not_work
15
present
Quebec
Pàg.
3- 68
Curs 2. 01 4- 2. 01 5
MLGz
MLGz
EJEMPLO 2 (FOX)
La tabla contiene el análisis de la devianza para diversos modelos. El modelo más adecuado contiene X y A,
cuyo coeficiente negativo indican que ante la presencia de niños y mayores ingresos masculinos es menor la
incidencia del trabajo femenino.
Análisis de la Devianza
Modelo
p
Devianza o
∆Devianza
g.l.
LogVerosimilitud
Comentarios
Contraste
H 0 Accept.
0 1
1
¿?
39.609
7
0 vs 8
No
1 A
2
-162.279
4.826
1
1 vs 3
No
2 X
2
-175.528
31.324
1
2 vs 3
No
3 A+X
3
-159.866
2.43
4
3 vs 7
Si
4 A+B
6
-161.213
5.124
1
4 vs 7
No
5 B+X
6
-171.322
25.342
1
5 vs 7
No
6 A*X
4
-159.562
2.582
4
6 vs 8
Si
7 A+B+X
7
-158.651
0.76
1
7 vs 8
Si
8 B+A*X
8
-158.271
χ 12, 0.05 = 3.84
Pàg.
3- 69
Curs 2. 01 4- 2. 01 5
MLGz
EJEMPLO 2 (FOX)
 El contraste de M7 vs M8 indica que las interacciones entre los ingresos masculinos y la presencia de
niños no es estadísticamente significativa (Factor A).
 El contraste de M3 vs M7 indica que la región (Factor B) tampoco es estadísticamente significativa.
> summary(bm3)
Call: glm(formula = bwork ~ sons + income, family = binomial, data = womenlf)
Coefficients:
(Intercept) 1.33583
0.38376
3.481
0.0005 ***
sonspresent -1.57565
0.29226 -5.391
7e-08 ***
income
-0.04231
0.01978 -2.139
0.0324 *
AIC: 325.73
>
on 262
on 260
degrees of freedom
degrees of freedom
 Sin embargo, los efectos principales del Factor A (M1 vs M3) y de la covariable (M2 vs M3) son
estadísticamente significativos (se rechazan las correspondientes hipótesis nulas).
log
donde
πi
= 1.336 − 1.576 Factor Ai − 0.04231xi
1− πi
Factor Ai = 1 si hay presencia de niños y 0 de otro modo.
Pàg.
3- 70
Curs 2. 01 4- 2. 01 5
EJEMPLO 2 (FOX)
 El análisis de los residuos de la devianza frente a las probabilidades estimadas es:
3
absent
present
DRES1
2
1
0
-1
-2
0,1
0,2
0,3
0,4
0,5
0,6
0,7
0,8
EPRO1
Pàg.
3- 71
Curs 2. 01 4- 2. 01 5
MLGz
MLGz
EJEMPLO 2 (FOX)
 Los residuos de la devianza frente al leverage:
3
absent
present
DRES1
2
1
0
El valor medio del
leverage p/n es
0,06522 y el extremo
superior del intervalo a 2
y 3 veces la distancia es
0.16704 y 0.21795,
respectivamente.
-1
-2
0,0
0,1
0,2
HI1
Pàg.
3- 72
Curs 2. 01 4- 2. 01 5
EJEMPLO 2 (FOX)
 Los residuos son difíciles de interpretar en los modelos lineales generalizados!!!
residualPlots(bm3, layout=c(1, 3))
Pàg.
3- 73
Curs 2. 01 4- 2. 01 5
MLGz
MLGz
EJEMPLO 2 (FOX)
 El modelo propuesto no parece demasiado adecuado a los datos: el logit no es lineal a los ingresos!!!
 La distorsión y dificultad de comprensión de los gráficos ante un nivel de máxima desagregación de los
datos.
2
2
absent
present
1
OLOGIT6
OLOGIT6
1
absent
present
0
-1
0
-1
-2
-2
0
10
20
30
40
50
0
Income-X
10
20
30
40
Income-X
Pàg.
3- 74
Curs 2. 01 4- 2. 01 5
50
MLGz
EJEMPLO 2 (FOX)
• Los 2 gráficos muestran en la escala logit, la comparación entre valores empíricos (considerando una
categorización de INCOME-X cada 10 unidades y con etiquetas el número total de observaciones en la
clase de la covariable correspondiente) y ajustados con el modelo INCOME-X sin categorizar: hay un
problema serio de observaciones influyentes y no linealidad.
3
2
1
43
OLOGIT7
1
ELOGIT6
absent
present
absent
present
0
21
0
12
44
109
-1
-1
26
2
3
-2
-2
0
5
10
15
20
25
30
35
40
C_INCOMEX
0
10
20
30
40
50
Income-X
Pàg.
3- 75
Curs 2. 01 4- 2. 01 5
45
EJEMPLO 2 (FOX)
Pàg.
3- 76
Curs 2. 01 4- 2. 01 5
MLGz
MLGz
EJEMPLO 2 (FOX)
> options(contrasts=c("contr.treatment","contr.treatment"))
> bm3 <-glm( bwork~sons+income,family=binomial, data=womenlf )
> bm6 <-glm( bwork~sons*income, family=binomial, data=womenlf )
> anova(bm3,bm6,test='Chisq')
Model 1: bwork ~ sons + income
Model 2: bwork ~ sons * income
1
260
319.73
2
259
319.12 1 0.60831
0.4354
>
> ## Diagnosi
> library(car)
> residualPlots(bm3, layout=c(1, 3))
Test stat Pr(>|t|)
sons
NA
NA
income
1.226
0.268
> influenceIndexPlot(bm3, id.n=10)
> matplot(dfbetas(bm3),type='l')
# Cut-off dfbetas 2/arrel(n)
> abline(h=sqrt(4/(dim(womenlf)[1])),lty=3,col=6)
> abline(h=-sqrt(4/(dim(womenlf)[1])),lty=3,col=6)
> lines(sqrt(cooks.distance(bm3)),lwd=3,col=1)
>legend(locator(n=1),legend=c(names(as.data.frame(dfbetas(bm3))),"Cook
lty=c(3,3,3,1) )
Pàg.
3- 77
D"),
col=c(1:3,1),
Curs 2. 01 4- 2. 01 5
EJEMPLO 2 (FOX)
>
Pàg.
3- 78
Curs 2. 01 4- 2. 01 5
MLGz
EJEMPLO 2 (FOX)
> library(car)
> Anova(bm3)
Analysis of Deviance Table (Type II tests)
Response: bwork
LR Chisq Df Pr(>Chisq)
sons
31.3229 1 2.185e-08 ***
income
4.8264 1
0.02803 *
--Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> library(effects)
> plot(allEffects(bm3),ask=FALSE)
>
Pàg.
3- 79
Curs 2. 01 4- 2. 01 5
MLGz
MLGz
EJEMPLO 2 (FOX)
 El análisis de los residuos de la devianza frente al anclaje (leverage, id.n=x indica que se desea mostrar
las x observaciones más resaltadas de rstudent i hatvalues, x de cada uno):
77
76
1
120
-1
0
Studentized Residuals
2
> llista<-influencePlot(bm3,id.n=3,col=3,pch=19)
> womenlf[row.names(llista),c(8,3:4)]
bwork income
sons
76
work
38 present
77
work
38 present
89 not_work
35 absent
90 not_work
35 absent
91 not_work
35 absent
120
work
30 present
> llista
StudRes
Hat
CookD
76
2.044657 0.02949436 0.2573350
77
2.044657 0.02949436 0.2573350
89 -1.138115 0.05332491 0.1309859
90 -1.138115 0.05332491 0.1309859
91 -1.138115 0.05332491 0.1309859
120 1.871816 0.01866566 0.1709365
>
El valor medio del leverage p/n es 0,06522 y el
extremo superior del intervalo a 2 y 3 veces la
distancia es 0.16704 y 0.21795, respectivamente
0.01
91
90
89
0.02
0.03
0.04
Hat-Values
Pàg.
3- 80
Curs 2. 01 4- 2. 01 5
0.05
EJEMPLO 2 (FOX)
avPlots(bm3,labels=row.names(womenlf),id.method=abs(cooks.distance(bm3)), id.n=5)
marginalModelPlots(bm3,labels=row.names(df),id.method=abs(cooks.distance(bm3)), id.n=5)
Pàg.
3- 81
Curs 2. 01 4- 2. 01 5
MLGz

Document

Transcripción

Documentos relacionados

OFERTA DE EMPLEO PARA 1 INGENIERO SUPERIOR