Winter 2002 - HaPI Database

The
BEHAVIORAL
MEASUREMENT
Letter

Behavioral Measurement Database Services

Vol. 7, No. 2, Winter 2002
Enriching the health and behavioral sciences by broadening instrument access
Introduction to This Issue
This issue of The Behavioral Measurement Letter has three featured articles: one describes a process through which measurement instruments may be improved in various ways and outlines a model for revising instruments; another discusses self-estimated intelligence, gender differences in overestimations and underestimations of intelligence, and possible explanations for and consequences of these differences; the third piece presents ten very practical guidelines -- a list of "do's and don'ts" -- for selecting measurement instruments that will best meet one's needs.
In "Refinements to the Lubben Social Network
Scale: The LSNS-R," James Lubben, Melanie
Gironda, and Alex Lee describe in detail the
process of successive changes and analyses
thereof through which they modified the Lubben
Social Network Scale to improve it in various
ways. The process involved performing principal component analysis to determine
which items in the instrument contribute to
underlying factors being measured by the
instrument, and thus which items best measure
what they want the instrument to measure. This
analytical process resulted in a "cleaner and
meaner" version of the LSNS, the LSNS-R. In
addition, by detailing the process of successive
modifications and mathematical analyses that
produced the LSNS-R, the authors outline a
model for instrument revision that is broadly
applicable.
In a piece titled "Self-Estimated Intelligence,"
Adrian Furnham from London University
reviews literature pertaining to estimation of
intelligence by oneself and others, and then
compares, contrasts, and offers explanations for
the reported findings. The literature he reviewed
shows that there is a gender difference in intelligence estimation (females tend to underestimate their IQ and that of other females, while males tend to overestimate their IQ and that of other males) and that this gender difference exists across cultures and nationalities, across socioeconomic classes, and across age groups. He
then explores possible explanations for the
gender difference and suggests possible life
consequences of over- and under-estimating
one's abilities.
Our regular contributor, Fred Bryant, presents a
set of guidelines for choosing measurement
tools, "Ten Commandments for Selecting Self-Report Instruments." Some of the guidelines may seem to be common sense. Many readers
will be familiar with their basic content. Those
experienced in instrument selection may see
mistakes they've made in the past (and vowed
never to make again). In any case, all of our
readers, from neophytes to seasoned instrument
users, will find value in the piece -- if not
something they hadn't known previously, then,
certainly, practical reminders of the numerous
subtleties, cautions, and pitfalls that should be
attended to while selecting a measurement
instrument.
Address comments and suggestions to The Editor, The Behavioral Measurement Letter, Behavioral Measurement Database Services, PO Box 110287, Pittsburgh, PA 15232-0787. If warranted and as space permits, your communication may appear as a letter to the editor. Whether published or not, your feedback will be attended to and appreciated.

We also accept short manuscripts for The BML. Submit, at any time, a brief article, opinion piece, or book review on a BML-relevant topic to The Editor at the above address. Each submission will be given careful consideration for possible publication.

HaPI reading . . .

Al K. DeRoy, Editor

Refinements to the Lubben Social Network Scale: The LSNS-R

James Lubben, Melanie Gironda, and Alex Lee

Increased interest in social support networks during the past decade spawned the development of many new social network scales. However, the psychometric properties of these instruments are generally inadequately reported, and there are few reported attempts to refine them. The work presented here addresses both of these concerns in examining the psychometric properties of the Lubben Social Network Scale (Lubben, 1988) and creating a revised version, the LSNS-R. In addition, the analytic plan of procedure used in this work is a model that could be employed to evaluate and refine other social and behavioral measures.
O'Reilly (1988) lamented that most social support network assessment instruments have inadequate clarity of definition and also lack reported reliability and validity statistics. Similarly, a common criticism of many studies that examine social support networks is that they employ instruments with unknown or unreported psychometric properties (Winemiller, Mitchell, Sutliff, & Cline, 1993). The work discussed below addresses these concerns by examining psychometric properties of the Lubben Social Network Scale (Lubben, 1988) and putting forth a refinement of the LSNS, the LSNS-R (i.e., the "revised" version of the LSNS). Given growing consensus on the importance of social support networks to health and well-being, developing and using consistent tools to conduct assessments of such networks are becoming ever more crucial to gerontological research and geriatric practice (House, Landis, & Umberson, 1988; Steiner, Raube, Stuck, Aronow, Draper, Rubenstein, & Beck, 1996; Glass, Mendes de Leon, Seeman, & Berkman, 1997).
Staff

Director: Evelyn Perloff, PhD
Newsletter Editor: Al K. DeRoy, PhD
Database Coordinator: Barbara L. Brooks, MLS
Indexer/Reviewer: Jacqueline A. Reynolds, BA
Analyst: Janelle Price-Kelly, BA
Computer Consultant: Alfred A. Cecchetti, MS
Measurement Consultant: Fred B. Bryant, PhD
Information Specialist Consultant: Ellen Detlefson, PhD
Business Services Manager: Diane Cadwell
Copy Assessor: Tracey Handlovic, BA
File Processor: Betty Hubbard

Phone: (412) 687-6850
Fax: (412) 687-5213
E-mail: [email protected]
Since the measuring device has been constructed by the observer ... we have to remember that what we observe is not nature in itself but nature exposed to our method of questioning.

Werner Karl Heisenberg
Lubben Social Network Scale

The Lubben Social Network Scale (LSNS; Lubben, 1988) has been used in a wide array of studies, and in both research and practice settings, since it was first reported more than a decade ago (Lubben, 1988; Lubben, Weiler, & Chi, 1989; Siegel, 1990; Mor-Barak & Miller, 1991; Mor-Barak, Miller, & Syme, 1991; Potts, Hurwicz, Goldstein, & Berkanovic, 1992; Hurwicz & Berkanovic, 1993; Rubenstein, L. et al., 1994; Rubenstein, R., Lubben, & Mintzer, 1994; Dorfman, Walters, Burke, Hardin, & Karanik, 1995; Luggen & Rini, 1995; Lubben & Gironda, 1997; Okwumabua, Baker, Wong, & Pilgrim, 1997; Mor-Barak, 1997; Gironda, Lubben, & Atchison, 1998; Chou & Chi, 1999; Martire, Schulz, Mittelmark, & Newsom, 1999; Mistry, Rosansky, McGuire, McDermott, & Jarvik, 2001). Further, the LSNS has been employed in a variety of ways, including use as a control variable as well as an outcome variable in health and social science studies. It also has been used as a screening tool for health risk appraisals and as a "gold standard" by which to evaluate other social network assessment instruments.

Scores for each LSNS item range from zero to five, with lower scores indicating smaller networks. The scale has been found to have relatively good internal consistency among a widely diverse set of study populations (α = 0.70). Factor analyses on the LSNS suggest that it measures three different types of social networks: family networks, friendship networks, and interdependent relationships (Lubben, 1988; Lubben & Gironda, 1997).
Methodology
The main purpose of the work presented here is
to address deficits that became apparent in the
original LSNS as it was used with diverse
populations over the past decade. But because
the LSNS was found to have relatively stable
reliability and validity across this wide array of
settings, any proposed modifications were not to
jeopardize the relatively strong psychometric
properties of the original LSNS.
The LSNS was developed as an adaptation of the Berkman-Syme Social Network Index (BSSNI; Berkman & Syme, 1979). Whereas the LSNS was developed specifically for use among elderly populations, the BSSNI was initially developed for a study of an adult population that purposefully excluded older persons. The LSNS is based on items borrowed from questionnaires used in the original epidemiological study for which the BSSNI was constructed. However, the LSNS excluded BSSNI items dealing with secondary social relationships (viz., group and church membership) because these organizational participation items showed limited variance when used with older populations, especially those having large numbers of frail elderly persons (Lubben, 1988). In contrast, the LSNS elaborated on an array of items dealing with the nature of relationships with family and friends, in view of the growing body of empirical data suggesting that the structure and functions of kinship and friendship networks are particularly salient to the health and well-being of older persons.
Reliability is a fundamental issue in psychological measurement (Nunnally, 1978). One important type of measurement reliability is internal consistency, i.e., the extent to which items within a scale relate to the latent variable being measured (DeVellis, 1991; Streiner & Norman, 1995). Cronbach's (1951) coefficient alpha was chosen to examine the internal consistency of the LSNS and of the modifications designed to improve upon the original version. The acceptable range of coefficient alpha values employed here was 0.70 to 0.90 (Nunnally; DeVellis) because assessment instruments with reliability scores higher than 0.90 are likely to suffer from excessive redundancy, whereas those with alpha less than 0.70 are likely to be unreliable (Streiner & Norman). A further test of item homogeneity used was the item-total test score correlation (DeVellis; Streiner & Norman). Here acceptable values of the item-total score correlation were 0.20 and greater (Streiner & Norman).
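The two internal-consistency checks just described can be sketched in a few lines. The following is a minimal illustration (not the authors' code), assuming item responses sit in a respondents-by-items NumPy array:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's (1951) coefficient alpha for a respondents-by-items array."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def item_total_correlations(items: np.ndarray) -> np.ndarray:
    """Corrected item-total correlations: each item vs. the sum of the others."""
    totals = items.sum(axis=1)
    return np.array([
        np.corrcoef(items[:, j], totals - items[:, j])[0, 1]
        for j in range(items.shape[1])
    ])
```

Under the criteria above, a scale would be flagged if alpha fell outside 0.70 to 0.90 or if any item-total correlation fell below 0.20.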
Principal component analysis looks for underlying (latent) components that account for most of the variance of a scale (Stevens, 1992). Principal component analysis with varimax rotation was used here to explore the component structure of various versions of the LSNS.
The LSNS total scale score is computed by summing ten equally weighted items that quantify structural and functional aspects of primary social relationships.
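As described, the total score is simply the sum of ten equally weighted items, each scored zero to five, giving a possible range of 0 to 50. A minimal scoring sketch (the item keys are illustrative, not the instrument's actual field names):

```python
def lsns_total(responses: dict) -> int:
    """Sum of ten equally weighted LSNS items, each scored 0-5 (range 0-50).
    Item keys L1-L10 are illustrative placeholders, not official field names."""
    items = ["L1", "L2", "L3", "L4", "L5", "L6", "L7", "L8", "L9", "L10"]
    for key in items:
        if not 0 <= responses[key] <= 5:
            raise ValueError(f"{key} must be scored 0-5")
    return sum(responses[key] for key in items)
```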
Respondents are often not sure as to which aspect of a double-barreled question they should respond (DeVellis, 1991; Streiner & Norman, 1995). Disaggregating double-barreled questions as per the third objective not only helps respondents in answering, it also allows researchers to determine the extent to which each part of the original question helps to define a particular construct.
Principal component analysis was used to see if the modified versions conformed in actuality to the hypothesized structure. Although more sophisticated methods exist to examine factor or latent variable structures, such as maximum likelihood factor analysis and confirmatory factor analysis, many scholars contend that principal component analysis is both adequate and more practical, for it is mathematically easier to manage, easier to interpret, and yields results similar to those from maximum likelihood factor analysis (Nunnally, 1978; Stevens, 1992). The size of the sample used in the analyses discussed below is adequate to conduct principal component analysis according to general sample-size guidelines (Stevens, 1992; Guadagnoli & Velicer, 1988).
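Principal component extraction followed by varimax rotation can be reproduced with standard linear algebra. The sketch below is an illustration (not the authors' code): it extracts loadings from the correlation matrix and applies Kaiser's iterative varimax criterion.

```python
import numpy as np

def pca_loadings(data: np.ndarray, n_components: int) -> np.ndarray:
    """Unrotated principal-component loadings from the correlation matrix
    of a respondents-by-items array."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Scale eigenvectors by sqrt(eigenvalue) to obtain loadings.
    return eigvecs[:, order] * np.sqrt(eigvals[order])

def varimax(loadings: np.ndarray, tol: float = 1e-6, max_iter: int = 100) -> np.ndarray:
    """Kaiser's varimax rotation of a loading matrix (orthogonal rotation)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    var_old = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        tmp = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ tmp)
        rotation = u @ vt
        if s.sum() < var_old * (1 + tol):  # converged: variance no longer grows
            break
        var_old = s.sum()
    return loadings @ rotation
```

Because varimax is an orthogonal rotation, each item's communality (its row sum of squared loadings) is unchanged; only the distribution of loading across factors is simplified.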
Plan and Procedures

Production of the LSNS-R progressed along a series of four analytical steps to address the objectives stated above. In each step, alpha reliability coefficients and the results of a series of principal component factor analyses were examined to determine whether, and the extent to which, items corresponded to the latent structural components of family networks and friendship networks.
Four Objectives Used in Refining the LSNS

The work to refine the LSNS has four principal objectives. One was to distinguish between and better specify the nature of family and friendship social networks. A second was to replace, where feasible, items in the original LSNS that have small statistical variance. A third objective was to disaggregate "double-barreled" questions. The fourth was to produce a parsimonious instrument to encourage and facilitate its use in research and practice settings where time constraints or other issues preclude using longer social support network instruments.

The four steps are summarized in Figure 1. In the first step, reliability statistics were obtained for the original LSNS administered to the sample described above. These values then served as reference points for comparison with values obtained for subsequent modifications.
In the second analytical step, two items from the original LSNS scale, L9 ("Helps others with various tasks") and L10 ("Living arrangements"), were dropped because they demonstrated limited response variation among a number of sample groups, including the present one. Furthermore, neither of these items helps to distinguish between family networks and friendship networks better than the other items in the original LSNS.
With regard to the first two objectives above, it
should be noted that social and behavioral
measures are purposely designed to discriminate
among groups for a certain construct, and so
lack of variation within a given item limits a
scale's ability to identify and discriminate
among variations (DeVellis, 1991; Streiner &
Norman, 1995). Thus, eliminating items with
limited statistical variance generally increases a
scale's overall sensitivity and specificity, and
that in turn improves its effectiveness in
measuring constructs of interest (McDowell &
Newell, 1987; DeVellis).
The L9 item was originally included in the LSNS in part because social exchange theory suggests that a reciprocal social relationship is stronger than one that is unidirectional (Jung, 1990; Burgess & Huston, 1979). Thus, rather than only capturing what others do for the older person being assessed, it is desirable to include items that also assess what the older person does for other people, i.e., items should be included to assess reciprocity of social support within kinship and friendship networks. Moreover, in past studies L9 generally demonstrated insufficient item variance and thus was a good candidate for elimination or replacement.
Double-barreled items are those in which two different questions are contained in one item. Such items often confuse respondents because they are not sure which aspect of the question they should answer.
Confidant relationships with family members may serve different functions than confidant relationships with friends (Keith, Hill, Goudy, & Power, 1984). The final result of the four-step process is a revised version of the LSNS, the LSNS-R.
The L10 item on living arrangements also had not worked out well over time. When the original LSNS was constructed, both living arrangements and marital status were common items included in measures of social support networks. It therefore seemed entirely appropriate to include an item merging these two related constructs. However, the L10 item has been the worst performing item on the LSNS across different settings. Part of the problem has been scoring it, which is constrained by the limited number of response options available as well as by disagreements among scorers in assigning ordinal weights to specific response options. Perhaps most important, marital status and living arrangements are generally neither malleable nor appropriate for intervention, so items concerning them should not be included in any case.
Plan of Analysis

Step 1: Analyze original LSNS

Original LSNS items:
L1 Family: Number seen or heard from per month
L2 Family: Frequency of contact with family member most in contact
L3 Family: Number feel close to, talk about private matters, call on for help
L4 Friends: Number feel close to, talk about private matters, call on for help
L5 Friends: Number seen or heard from per month
L6 Friends: Frequency of contact with friend most in contact
L7 Confidant: Has someone to talk to when have important decision to make
L8 Confidant: Others talk to respondent when they have important decision to make
L9 Helps others
L10 Living arrangements

Step 2: Eliminate items with limited variation
In the third analytical step, two "double-barreled" questions were disaggregated. The two items, L3 (relatives) and L4 (friends), ask, "How many (relatives) (friends) do you feel close to? That is, how many of them do you feel at ease with, can talk to about private matters, or call on for help?" In this step, L3 and L4 were each recast as two distinct questions. One asks, "How many (family members) (friends) do you feel at ease with such that you can talk to them about private matters?" whereas the other asks, "How many (family members) (friends) do you feel close to such that you can call on them for help?" The first of these substitute questions examines somewhat intangible or expressed support, whereas the other taps into more tangible support, such as help with running an errand. Both types of support have been suggested as important aspects of social support networks (Litwak, 1985; Sauer & Coward, 1985).
Items eliminated: L9 and L10
Step 3: Uncouple double-barreled questions
Items modified: L3 and L4 each split into two separate questions
L3A Family: Number feel at ease with whom you can talk about private matters
L3B Family: Number feel close to whom you can call on for help
L4A Friends: Number feel at ease with whom you can talk about private matters
L4B Friends: Number feel close to whom you can call on for help

Step 4: Distinguish between source and target confidant relationships with family and friends
Items modified: L7 and L8 each split into separate questions for family and friends
L7A Family: Respondent functions as confidant to other family members
L7B Friends: Respondent functions as confidant to friends
L8A Family: Respondent has family confidant
L8B Friends: Respondent has friend who is a confidant
In the fourth step, items that identify both the targets and sources of respondents' confidant relationships were constructed and tested. Here the two confidant relationship items in the original LSNS (L7 and L8) were recast to distinguish between confidant relationships with family members and those with friends. These changes recognize that confidant relationships with family members may serve different functions than confidant relationships with friends (Keith, Hill, Goudy, & Power, 1984).
Data Source

The data are from a survey of older white, non-Hispanic Americans in Los Angeles County, California, done between June and November 1993. A self-weighting, multistage probability
Disaggregation of the "double-barreled" questions (items L3 and L4) proved quite beneficial. In Step 4, including items that distinguish between source and target confidant relationships with family members and those with friends further increased internal consistency.
sample was selected from 861 census tracts in the area in which the white, non-Hispanic population exceeded any other single racial or ethnic group. This sampling strategy ensured a high level of homogeneity in the sample.
Table 1
Cronbach's Alpha Value by Product of Each
Step of Analysis
The first three sampling stages were: (a) random selection of tracts, (b) random selection of blocks, and (c) random selection of households within selected blocks. Households were then contacted by telephone to determine the age and ethnicity of household members. All white, non-Hispanic persons aged 65 or over in each household were thus identified, and then potential participants were randomly selected from this pool. Of the 265 older persons thus selected, 76 percent agreed to be interviewed, resulting in a final sample of 201. The sample included 130 women (65%) and 71 men (35%) and had a mean age of 75.3. Additional details on the sample are reported elsewhere (Villa, Wallace, Moon, & Lubben, 1997; Moon, Lubben, & Villa, 1998; Pourat, Lubben, Wallace, & Moon, 1999; Pourat, Lubben, Yu, & Wallace, 2000).
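The three selection stages described above amount to nested simple random sampling. A schematic sketch, with hypothetical data structures and counts (not the survey's actual sampling frame):

```python
import random

def multistage_sample(tracts, n_tracts, n_blocks, n_households, seed=0):
    """Three-stage selection: random tracts, then random blocks within each
    selected tract, then random households within each selected block.
    The nested dict/list structure here is purely illustrative."""
    rng = random.Random(seed)
    chosen = []
    for tract in rng.sample(tracts, n_tracts):
        for block in rng.sample(tract["blocks"], n_blocks):
            chosen.extend(rng.sample(block["households"], n_households))
    return chosen
```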
Step                                                       α
1. Original LSNS; 10-item scale                          .66
2. Items L9 and L10 dropped; 8-item scale results        .67
3. L3 & L4 split; 10-item scale results                  .73
4. L7 & L8 split; 12-item scale (LSNS-R) results         .78
Factor Analysis

Principal component factor analyses were performed for each step of the revision to explore for latent factors and to determine whether the final modified version has latent structural components corresponding to both kinship and friendship networks. The number of factors found in each step of the analysis was determined by considering factors with eigenvalues over one (Kaiser, 1960) and by identifying the elbow in the scree plot (Cattell, 1966). The factor loadings were subjected to varimax rotation.
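The eigenvalues-over-one rule (Kaiser, 1960) is straightforward to compute; the scree-plot elbow, by contrast, is judged by visual inspection. A minimal sketch of the Kaiser criterion (an illustration, not the authors' code), assuming a respondents-by-items array:

```python
import numpy as np

def kaiser_retain(data: np.ndarray) -> int:
    """Number of components with eigenvalues over one (Kaiser, 1960),
    computed from the correlation matrix of a respondents-by-items array."""
    eigvals = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))
    return int((eigvals > 1.0).sum())
```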
Results

Cronbach Alpha Values for the Products of Each Step

Table 1 presents internal consistency values for the products of the four analytical steps. The Cronbach alpha value for the original 10-item LSNS scale administered to the present sample (α = 0.66) is slightly lower than those previously reported (Lubben, 1988; Lubben & Gironda, 1997) and below the desired standard for internal consistency. The Cronbach alpha values increased in each subsequent step in the analysis, indicating that each successive modification contributed to improving the final product's internal consistency. Although the variant produced in Step 2 has two fewer items than the original LSNS, its alpha value is slightly higher than that for the original LSNS, suggesting that dropping items L9 and L10 was appropriate.
Table 2 shows the rotated factor matrix for the original LSNS administered to the present sample. Although previous studies have reported three factors (Lubben, 1988; Lubben & Gironda, 1997), the rotated factor structure for the LSNS here showed a two-factor solution, with one factor consisting largely of family-related items and the other consisting primarily of items concerning friendships. However, the former, i.e., the family factor, also incorporates the items concerning confidant relationships and the "helps others" item. Both confidant items clearly load onto the family factor in this step, but the "living arrangements" item cross-loads on both the family and friend factors.
Further, the greatly improved alpha value obtained for the variant produced in Step 3 indicates that disaggregating the double-barreled questions was quite beneficial.
Table 2
Step 1: Original 10-Item LSNS Factor Matrix

Item                                                                  Family   Friend
                                                                      Factor   Factor
L3  Family: discuss private matters/call on for help                   .7298    .1242
L8  Is confidant                                                       .6743    .0377
L1  Family: number in contact                                          .6633   -.0639
L2  Family: frequency of contact with family member most in contact    .6629   -.0761
L9  Helps others                                                       .5702   -.0269
L7  Has confidant                                                      .5391    .2047
L4  Friends: discuss private matters/call on for help                  .2606    .7807
L5  Friends: number in contact                                         .2072    .7520
L6  Friends: frequency of contact with friend most in contact         -.1082    .5225
L10 Living arrangements                                               -.3490    .4461
Table 3
Step 2: Factor Matrix after Eliminating Items L9 and L10

Item                                                                  Family   Friend
                                                                      Factor   Factor
L3  Family: discuss private matters/call on for help                   .7722    .1255
L2  Family: frequency of contact with family member most in contact    .7393   -.1103
L1  Family: number in contact                                          .6807   -.0218
L8  Is confidant                                                       .6455    .0850
L7  Has confidant                                                      .5622    .2040
L4  Friends: discuss private matters/call on for help                  .2105    .8162
L5  Friends: number in contact                                         .1514    .7810
L6  Friends: frequency of contact with friend most in contact         -.1230    .5558
In the second step, items L9 and L10 were eliminated as planned due to their generally poor performance in previously discussed studies. Similar problems are demonstrated in the current study by the heavy cross-loading found for L10 in Step 1. As in Step 1, a two-factor structure was found (Table 3), and the "confidant" items were found to clearly load on the family factor. No heavy cross-loading was found for the scale variant produced in this step.

Table 4
Step 3: Factor Matrix after Disaggregating Items L3 and L4

Item                                          Family   Friend  Confidant
                                              Factor   Factor   Factor
L1  Family: number in contact                  .8133    .0029    .0604
L3A Family: discuss private matters            .8020    .0959    .2442
L3B Family: call on for help                   .7775    .1226    .2545
L4B Friends: call on for help                  .0522    .7984    .1707
L5  Friends: number in contact                 .1873    .7653   -.0319
L4A Friends: discuss private matters           .1632    .7504    .0523
L6  Friends: frequency of contact             -.1817    .4877    .0685
L7  Has confidant                              .0425    .1524    .8276
L8  Is confidant                               .2365    .1647    .6689
L2  Family: frequency of contact               .4253   -.1524    .6166
Table 4 shows the factor structure found in Step 3. In this step, the double-barreled items L3 and L4 ("talk about private matters" and "call on for help") were disaggregated.
Table 5
Step 4: LSNS-R Factor Matrix

Item                                                                  Family   Friend   Friend
                                                                      Factor  Factor A Factor B
L3B Family: call on for help                                           .7600    .2327    .0487
L8A Family: has confidant                                              .7402   -.0024    .1907
L7A Family: is confidant                                               .7358   -.0384   -.1428
L3A Family: discuss private matters                                    .7345    .2320    .0915
L2  Family: frequency of contact with family member most in contact    .7134   -.1438    .0839
L1  Family: number in contact                                          .6712    .1565    .0922
L8B Friends: has confidant                                            -.0352    .8800    .0594
L7B Friends: is confidant                                              .0848    .8488    .2711
L6  Friends: frequency of contact with friend most in contact          .2490    .5493    .4920
L5  Friends: number in contact                                        -.0292    .1279    .8467
L4B Friends: call on for help                                          .0514    .1140    .7663
L4A Friends: discuss private matters                                  -.1576    .1856    .6028
Conclusion
As gerontologists and geriatricians begin to identify means to increase active life expectancy rather than mere life expectancy, it is likely that older persons' social support networks will be shown to be necessary to healthy aging. This means that there is increasing need for a variety of reliable and valid social support scales for use in research and practice settings. The work discussed here should be viewed as part of an ongoing pursuit of such well-constructed social integration scales, for improved measures of social support networks are essential to understanding better the reported link between social integration and health. Such improved knowledge thus will enhance future gerontological research, geriatric care, and the quality of life of the elderly.
Step 4 involved distinguishing family and friends as both possible sources and possible targets of confidant relationships, and resulted in the LSNS-R. Principal component factor analysis in this step (Table 5) revealed a single, clean family factor and two friendship factors. The friendship confidant items (L7B, L8B) and the frequency of contact with a friend item (L6) constitute one of the friendship factors, while the remaining friendship items make up a second friendship factor. The item on being able to talk to a friend about private matters (L4A) loads on both friendship factors.
From applied research and clinical perspectives,
there is growing pressure to develop short and
efficient scales. Some elderly populations are
unable to complete long questionnaires, and
time constraints in most clinical practice settings
necessitate use of efficient and effective
screening tools. Shorter scales require less time
and energy of both the administrator and
respondent. Thus, parsimonious and effective
screening tools are needed that are acceptable to
elders, researchers, and health care providers as
well.
Item-Total Scale Correlations

Item-total scale correlational analysis yielded coefficients ranging from 0.27 to 0.75, indicating that LSNS-R items are sufficiently homogeneous and without excessive redundancy. All internal reliability coefficients fell within the acceptable range suggested by Streiner and Norman (1995). The correlation coefficient between the original LSNS and the LSNS-R was 0.68.
Principal component factor analysis performed in Step 3 indicated a three-factor solution, with items loading on a family factor, a friendship factor, and a confidant factor. Generally the factors are clean (i.e., for each item, there is predominant loading on one factor and little loading on the others). However, the family "frequency of contact" item (L2) cross-loads on both the family and confidant factors.
In summary, development and validation of social support network instruments are cumulative and ongoing processes. They require testing and retesting with diverse populations, in various research and practice settings, and using both psychometric and practical standards to assess their actual utility. In addition, future analyses of these scales should include assessment of their sensitivity to various differences within and between groups, for example, cultural and socio-demographic differences, or differences in levels of health and functional status that might affect response patterns.
For more basic researchers, somewhat longer research instruments are desirable, because having a larger number of items as well as better clarity of concepts generally contributes to a scale's reliability and facilitates the analysis and appraisal of subtle differences. But rather than attempting to design a single social support network scale for use with all elderly populations and for all research purposes, it seems far more practical to design measurement instruments for specific populations along with clear indications of how they should be used (Mitchell & Trickett, 1980). Use of such well-targeted social support network assessment instruments will yield more valid research results than use of less well-targeted measurement tools.
The four-step process described above continues in the tradition of instrumentation refinement, tailoring new measurement tools to address special needs as well as offering a revised version of the LSNS incorporating modifications that greatly enhance its psychometric properties. This work also offers a paradigm that could be employed to evaluate and refine other social support network measurement tools.
The original LSNS was designed specifically for
an elderly population. Although it has proven
adaptable to a variety of settings, some
deficiencies have been noted over the past ten
years of use. The LSNS-R addresses these
problems, resulting in an improved measure of
social support networks. The refinements are
theory driven and involved reworking items in
the original LSNS so that the revised scale can
better measure the distinct aspects of kinship
and friendship networks (Lubben & Gironda, in
press). An abbreviated version of the LSNS-R
has been developed and is reported elsewhere
(Lubben & Gironda, 2000). This six-item scale
(the LSNS-6) can be especially suitable in
practice settings as a screening tool for social
isolation or for more general use in those
research settings where longer social support
network scales cannot be accommodated. For
those social and behavioral researchers desiring
more extensive inquiry into the nature of social
relationships of the elderly, an 18-item version
of the LSNS has also been developed (Pourat et al., 1999; Pourat et al., 2000). The major
advantage of the LSNS-18 over the LSNS-R is
that the former distinguishes friendship ties with
neighbors from those with friends who do not
live in close proximity to the respondent. Such
distinctions are desirable for exploring a
growing number of social and behavioral
research questions regarding the functioning of
social support networks.
Vol. 7, No.2, Winter 2002
References
Berkman, L.F., & Syme, S.L. (1979). Social networks, host resistance, and mortality: A nine-year follow-up study of Alameda County residents. American Journal of Epidemiology, 109, 186-204.

Burgess, R.L., & Huston, T.L. (Eds.) (1979). Social exchange in developing relationships. New York: Academic Press.

Cattell, R.B. (1966). The meaning and strategic use of factor analysis. In R.B. Cattell (Ed.), Handbook of multivariate experimental psychology. Chicago: Rand McNally.

Chou, K.L., & Chi, I. (1999). Determinants of life satisfaction in Hong Kong Chinese elderly: A longitudinal study. Aging and Mental Health, 3, 328-335.

Cronbach, L.J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297-334.

DeVellis, R.F. (1991). Scale development: Theory and applications. Newbury Park, CA: Sage.

Dorfman, R., Lubben, J.E., Mayer-Oakes, A., Atchison, K.A., Schweitzer, S.O., Dejong, P., et al. (1995). Screening for depression among the well elderly. Social Work, 40, 295-304.
The Behavioral Measurement Letter
Refinements to the Lubben Social Network Scale: The
LSNS-R (continued)
Gironda, M.W., Lubben, J.E., & Atchison, K.A. (1998). Social support networks of elders without children. Journal of Gerontological Social Work, 27, 63-84.

Glass, T.A., Mendes de Leon, C.F., Seeman, T.E., & Berkman, L.F. (1997). Beyond single indicators of social networks: A LISREL analysis of social ties among the elderly. Social Science and Medicine, 44, 1503-1507.

Guadagnoli, E., & Velicer, W.F. (1988). Relation of sample size to the stability of component patterns. Psychological Bulletin, 103, 265-275.

House, J.S., Landis, K.R., & Umberson, D. (1988). Social relationships and health. Science, 241, 540-545.

Hurwicz, M.L., & Berkanovic, E. (1993). The stress process in rheumatoid arthritis. Journal of Rheumatology, 20, 11.

Jung, J. (1990). The role of reciprocity in social support. Basic and Applied Social Psychology, 11, 243-253.

Kaiser, H.F. (1960). The application of electronic computers to factor analysis. Educational and Psychological Measurement, 20, 141-151.

Keith, P.M., Hill, K., Goudy, W.J., & Power, E.A. (1984). Confidants and well-being: A note on male friendship in old age. Gerontologist, 24, 318-320.

Litwak, E. (1985). Helping the elderly: The complementary role of informal networks and formal systems. New York: Guilford.

Lubben, J.E. (1988). Assessing social networks among elderly populations. Family Community Health, 11(3), 42-52.

Lubben, J.E., & Gironda, M. (1997). Social support networks among older people in the United States. In H. Litwin (Ed.), The social networks of older people. Westport, CT: Praeger.

Lubben, J.E., & Gironda, M.W. (2000). Social support networks. In D. Osterweil, J. Beck, & K. Brummel-Smith (Eds.), Comprehensive geriatric assessment: A guide for healthcare providers. New York: McGraw-Hill.

Lubben, J.E., & Gironda, M.W. (in press). Centrality of social ties to the health and well-being of older adults. In B. Berkman & L. Harootyan (Eds.), Gerontological social work in the emerging health care world. New York: Springer.

Lubben, J.E., Weiler, P.G., & Chi, I. (1989). Gender and ethnic differences in the health practices of the elderly poor. Journal of Clinical Epidemiology, 42, 725-733.

Luggen, A.S., & Rini, A.G. (1995). Assessment of social networks and isolation in community-based elderly men and women. Geriatric Nursing, 16, 179-181.

Martire, L.M., Schulz, R., Mittelmark, M.B., & Newsom, J.T. (1999). Stability and change in older adults' social contact and social support: The Cardiovascular Health Study. Journals of Gerontology: Series B: Psychological and Social Sciences, 54B, S302-S311.

McDowell, I., & Newell, C. (1987). Measuring health: A guide to rating scales and questionnaires. New York: Oxford University Press.

Mistry, R., Rosansky, J., McGuire, J., McDermott, C., Jarvik, L., and the UPBEAT Collaborative Group (2001). Social isolation predicts re-hospitalization in a group of older American veterans enrolled in the UPBEAT Program. International Journal of Geriatric Psychiatry, 16, 950-959.

Mitchell, R., & Trickett, E. (1980). Social networks as mediators of social support: An analysis of the effects and determinants of social networks. Community Mental Health Journal, 16, 27-44.

Moon, A., Lubben, J.E., & Villa, V.M. (1998). Awareness and utilization of community long-term care services by elderly Korean and non-Hispanic White Americans. Gerontologist, 38, 309-316.

Mor-Barak, M.E. (1997). Major determinants of social networks in frail elderly community residents. Home Health Care Services Quarterly, 16, 121-137.

Mor-Barak, M.E., & Miller, L.S. (1991). A longitudinal study of the causal relationship between social networks and health of the poor frail elderly. Journal of Applied Gerontology, 10, 293-310.

Mor-Barak, M.E., Miller, L.S., & Syme, L.S. (1991). Social networks, life events, and health of the poor frail elderly: A longitudinal study of the buffering versus the direct effect. Family Community Health, 14(2), 1-13.

Nunnally, J.C. (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill.

Okwumabua, J.O., Baker, P.M., Wong, S.P., & Pilgrim, B.O. (1997). Characteristics of depressive symptoms in elderly urban and rural African Americans. Journals of Gerontology: Biological Sciences and Medical Sciences, 52A, M241-M246.

O'Reilly, P. (1988). Methodological issues in social support and social network research. Social Science and Medicine, 26, 863-873.

Potts, M.K., Hurwicz, M.L., Goldstein, M.S., & Berkanovic, E. (1992). Social support, health-promotive beliefs, and preventive health behaviors among the elderly. Journal of Applied Gerontology, 11, 425-440.
Pourat, N., Lubben, J., Wallace, S., & Moon, A. (1999). Predictors of use of traditional Korean healers among elderly Koreans in Los Angeles. Gerontologist, 39, 711-719.

Pourat, N., Lubben, J., Yu, H., & Wallace, S. (2000). Perceptions of health and use of ambulatory care: Differences between Korean and White elderly. Journal of Aging and Health, 12(1), 112-134.

Rubenstein, L.Z., Aronow, H.U., Schloe, M., Steiner, A., Alessi, C.A., Yuhas, K.E., Gold, M., Kemp, K., Nisenbaum, R., Stuck, A., & Beck, J.C. (1994). A home-based geriatric assessment, follow-up and health promotion program: Design, methods, and baseline findings from a 3-year randomized clinical trial. Aging, 6, 105-120.

Rubenstein, R.L., Lubben, J.E., & Mintzer, J.E. (1994). Social isolation and social support: An applied perspective. Journal of Applied Gerontology, 13, 58-72.

Sauer, W.J., & Coward, R.T. (1985). Social support networks and the care of the elderly: Theory, research and practice. New York: Springer.

Siegel, J.M. (1990). Stressful life events and use of physician services among the elderly: The moderating role of pet ownership. Journal of Personality and Social Psychology, 58, 1081-1086.

Steiner, A., Raube, K., Stuck, A.E., Aronow, H.U., Draper, D., Rubenstein, L.Z., & Beck, J. (1996). Measuring psychosocial aspects of well-being in older community residents: Performance of four short scales. Gerontologist, 36, 54-62.

Stevens, J. (1992). Applied multivariate statistics for the social sciences (2nd ed.). Hillsdale, NJ: Erlbaum.

Streiner, D.L., & Norman, G.R. (1995). Health measurement scales: A practical guide to their development and use (2nd ed.). New York: Oxford University Press.

Villa, V.M., Wallace, S.P., Moon, A., & Lubben, J.E. (1997). A comparative analysis of chronic disease prevalence among older Koreans and nonHispanic Whites. Journal of Family and Community Health, 20, 1-12.

Winemiller, D.R., Mitchell, M.E., Sutliff, J., & Cline, D.J. (1993). Measurement strategies in social support: A descriptive review of the literature. Journal of Clinical Psychology, 49, 638-648.

James Lubben, DSW, MPH is Professor of Social Welfare and Urban Planning at the University of California, Los Angeles (UCLA). Both his DSW and MPH are from the University of California, Berkeley. He is Principal Investigator for the Hartford Doctoral Fellows Program in Geriatric Social Work, a program administered by the Gerontological Society of America, and a consultant to the World Health Organization-Kobe Centre on health and welfare systems development for aging societies. Dr. Lubben's research examines social behavioral determinants of vitality in old age, with a particular focus on the roles of social support networks.

Melanie Gironda, PhD, MSW is a Lecturer in the Department of Social Welfare at UCLA where she teaches courses on social gerontology and research on aging. She received both her PhD and MSW from UCLA. Dr. Gironda is Deputy Program Director of the Hartford Doctoral Fellows Program in Geriatric Social Work. Her research examines loneliness in various populations of the elderly, with a special focus on the nature of social support networks of older adults without children.

Alex E. Y. Lee, PhD, MSW is Assistant Professor in the Department of Social Work and Psychology at the National University of Singapore where he teaches courses on social work and gerontology. He received both his PhD and MSW from UCLA. His current research concerns social service delivery systems for the elderly and family-based gerontological counseling for Asian elders.

In small proportions, we just beauties see,
And in short measures life may perfect be.

Ben Jonson
Self-Estimated Intelligence

Adrian Furnham

Do beliefs about one's ability affect scores on an ability test? Is it important to believe that you are bright to do well on an ability/intelligence test? What do people think about their own intelligence? How do they estimate their own intelligence and that of their relatives? What determines their self-estimates?

Studies of self-estimated intelligence date back nearly a quarter of a century (Hogan, 1978). More recently, Beloff (1992) provoked a great deal of interest in gender differences in self-estimated intelligence. Among her Scottish undergraduates, she found a six-point difference, with males estimating their score significantly higher than females. Additional work led her to conclude that "The younger women see themselves as intellectually inferior compared to young men.... Women see equality with their mothers, men with their fathers and men superior to their mothers. Mothers therefore come out as inferior to fathers. This pattern has been consistent each year." Beloff argued that the modesty training girls receive in socialization accounts for these data.

Beloff's research stimulated others to replicate these gender-related differences in self-estimated intelligence (Bennett, 1996, 1997, 2000; Byrd & Stacey, 1993; Furnham & Rawles, 1995, 1999). These studies fall into four related research areas: (a) studies to replicate and explain these differences in various countries and relate national/cultural variables to them; (b) research to see if these gender-related differences can be replicated not only for overall (general) intelligence, but also for more specific and multiple intelligences; (c) work to examine whether these differences occur in estimations of the intelligence of others, notably male and female members of one's immediate family
Table I
Results of Studies Where Participants Made an Overall IQ (g) Rating on Themselves and Others

Study                                    Women       Men         Difference
Beloff (1992) - Scotland                 (N = 502)   (N = 265)
  Self                                   120.5       126.9        6.4
  Mother                                 119.9       118.7       -1.2
  Father                                 127.7       125.2       -2.5
Byrd & Stacey (1993) - New Zealand       (N = 105)   (N = 112)
  Self                                   121.9       121.5       -0.4
  Mother                                 114.5       106.5       -9.0
  Father                                 127.9       122.3       -5.6
  Sister                                 118.2       110.5       -7.7
  Brother                                114.1       116.0        1.9
Bennett (1996) - Scotland                (N = 96)    (N = 48)
  Self                                   109.4       117.1        7.7
Reilly & Mulhern (1995) - Ireland        (N = 80)    (N = 45)
  Self                                   105.3       113.9        8.6
  Measured                               106.9       106.1       -0.8
Furnham & Rawles (1995) - England        (N = 161)   (N = 84)
  Self                                   118.48      123.31       6.17
  Mother                                 108.70      109.12       0.72
  Father                                 114.18      116.09       1.91
Furnham & Rawles (1996) - England        (N = 140)   (N = 53)
  Self                                   116.64      120.50       3.9
Furnham & Gasson (1998) - England        (N = 112)   (N = 72)
  Self                                   103.84      107.99       4.15
  Male child (1st child)                 107.69      109.70       2.01
  Female child (1st child)               102.57      102.36      -0.21
Furnham & Reeves (1998) - England        (N = 84)    (N = 72)
  Self                                   104.84      110.15       5.31
  Male child (1st son)                   116.09      114.32      -1.77
Self-Estimated Intelligence (continued)
(grandparents, parents, siblings, children); (d) research to examine the relationship between estimated intelligence and "actual" intelligence scores. Furnham (2001) reported on various such studies, all but one of which showed significant gender differences in self-rated overall IQ ranging from 3.9 to 8.6 points.

Beginning with Beloff (1992), a number of these studies looked at estimates of relations' IQs (Furnham, Fong, & Martin, 1999). These studies showed that people believe there are clear generational effects in IQ - they believe that they are a little brighter than their mothers and certainly much brighter than their grandparents, while parents tend to believe their children are brighter than they are. In general, a half standard deviation (6-8 points) difference was found in estimated IQ between generations. Gender differences were also found for estimated IQ of relatives (Flynn, 1987). Thus people believe their grandfathers are brighter than their grandmothers, their fathers brighter than their mothers, their brothers brighter than their sisters, and their sons brighter than their daughters. Interestingly, for parents estimating their children, the results were stronger for first-borns compared to those born later, indicating the possible working of the principle of primogeniture.

These results have been shown to be cross-culturally robust, as the gender difference effect has been demonstrated in Asia (Japan, Hong Kong), Africa (Uganda, South Africa), Europe (Belgium, Britain, but not Slovakia) and America (Furnham et al., 2002), as well as New Zealand (Furnham & Ward, 2001) and Iran (Furnham, Shahidi, & Baluch, 2002). Thus there seems to be a robust and cross-culturally valid finding that there is a clear, consistent gender difference in self-rating of overall intelligence, with males rating themselves and their male relations higher than females rate themselves and their female relations.

It is also interesting to note that it is very rare for people to rate their scores, or indeed those of relations, as below average (<100 IQ points). For example, it was found that students tended to believe their IQ to be about one-and-a-half standard deviations above the mean (Furnham, Clark, & Bailey, 1999) - around 120 - whereas nonstudent British adults believed their IQ to be around a half standard deviation above the norm (Furnham, 2000).

Multiple Intelligence

Over the past decade there have been many attempts to redefine intelligence and define different types of intelligence. Thus we have emotional intelligence as well as practical intelligence, for example, but perhaps the idea that has appealed most to lay people (not academics, however) is Gardner's (1999) multiple intelligence theory. Gardner (1983) initially argued that there were seven types of intelligence:

• Verbal or linguistic intelligence (the ability to use words),
• Logical or mathematical intelligence (the ability to reason logically, to solve number problems),
• Spatial intelligence (the ability to find one's way around one's environment and form mental images),
• Musical intelligence (the ability to perceive and produce pitch and rhythm),
• Body-kinetic intelligence (the ability to carry out motor movements, for example, in performing surgery, ballet dancing, or playing sports),
• Interpersonal intelligence (the ability to understand people),
• Intrapersonal intelligence (the ability to understand oneself and develop a sense of one's identity).

He suggested that the first two are the types valued at school, the next three are valuable in the arts, and the last two constitute a sort of "emotional intelligence."

At least eight studies have examined gender differences in estimates of Gardner's (1983, 1999) seven multiple intelligences. The results have been consistent and give an important clue into gender differences in overall performance. Most people rate their own interpersonal and intrapersonal intelligence as very high, about 1 SD (standard deviation) above the mean, and their musical and body-kinetic intelligence as strictly average, around 100. Of the remaining three types of intelligence in the Gardner model
- verbal, mathematical, and spatial - gender differences were found only in mathematical and spatial intelligence, particularly in spatial intelligence, where females typically rate themselves six to ten points lower than do males. No gender differences were found in self-estimations of verbal intelligence (Furnham, 2001).

Reilly and Mulhern (1995) note that IQ-estimates research should not be based on the "assumption that gender differences at group level represent a generalized tendency on the part of either gender to either overconfidence or lack of confidence with regard to their own intelligence" (Reilly & Mulhern). Studies do show, however, a tendency for males to overestimate and females to underestimate their score [?reference(s)], but this is related, in part, to the IQ test used.

Some studies have asked participants to estimate their overall (g, general intelligence) score first and then their scores on the seven specific multiple intelligences. This has made it possible to regress simultaneously the seven multiple intelligence estimates on the overall intelligence estimate. These regression studies show that people believe that mathematical (logical), then spatial, and then verbal intelligence are the primary contributors to overall IQ. This, in turn, allowed for testing the hypothesis that lay conceptions of intelligence are male normative, in the sense that those abilities that men tend to be better at are those that most people consider to be the essence of intelligence. These studies showed that lay people tend to confuse mathematical/spatial and overall intelligence, thereby explaining the consistent gender differences in estimates of overall score. This means that quite possibly the often-observed and debated gender difference in spatial IQ accounts for the difference in overall IQ.
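The regression approach just described can be sketched with synthetic data. The weights below (0.5 mathematical, 0.3 spatial, 0.2 verbal) are arbitrary stand-ins chosen only to mimic the reported ordering, not estimates from any of the studies:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300

# Synthetic self-estimates for three of Gardner's intelligences
# (IQ-style scores with mean 100, SD 15).
math_est = rng.normal(100, 15, n)
spatial_est = rng.normal(100, 15, n)
verbal_est = rng.normal(100, 15, n)

# Suppose overall (g) estimates are driven mostly by the mathematical
# and spatial estimates, as the regression studies suggest.
overall = 0.5 * math_est + 0.3 * spatial_est + 0.2 * verbal_est

# Regress the overall estimate on the specific estimates.
X = np.column_stack([math_est, spatial_est, verbal_est])
coef, *_ = np.linalg.lstsq(X, overall, rcond=None)
print(np.round(coef, 3))
```

Because the synthetic overall score is an exact linear combination, least squares recovers the assumed weights; with real ratings the coefficients would instead simply rank the perceived contributors to overall IQ.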
Correlations Between Self-Estimated and Test-Measured IQ

Are self-estimates accurate? In other words, is the correlation between (valid) IQ test scores and self-estimated scores high, low, or "on the mark"? This is an important question because some psychologists have suggested that, if these scores correlate highly, self-estimates may serve as useful measures at a fraction of the time, effort, and cost of administering and scoring IQ tests.

Various studies have found that correlations of actual and self-estimated IQ are around r = 0.30 and that therefore self-estimates cannot serve as proxy measures of IQ (Paulus, Lysy, & Yik, 1998). One study, however, examined the effect of outliers and concluded that, if these outliers are removed, self-estimated and actual IQ correlate highly (r > 0.90).

Some researchers have tried to understand and then increase the correlation between self-estimates and test scores by using more tests on bigger populations, yet the correlations remained the same, as noted above, around r = 0.30 (Borkenau & Liebler, 1993). Perhaps the explanation should be sought in motivational factors that may be involved in the self-estimation of intelligence and that may lead to serious distortions in estimated scores. Thus a close examination of the conditions and instructions under which participants self-estimate their intelligence may provide a clue as to how they make their self-estimates. For example, if social norms and conventions in part dictate how people respond, then anonymity in responding may reduce the distortions in self-estimates.

From Whence the Differences?

Gender differences in self-estimated IQ need to be explained because the issue is so frequently discussed, there has been an academic consensus for many years to the effect that gender differences are minimal, and, since before the Second World War, test constructors have been careful to produce IQ tests that minimize gender differences.

Essentially there are three positions on the gender differences in estimates issue (Furnham, 2001). The "feminist" position was clearly articulated by Beloff (1992) in her first study, where she suggested that these differences are erroneous and simply due to attribution errors. One explanation she offers is a gender difference in modesty and humility. "Modesty-training is given to girls. Modesty and humility are likely to be connected to the over estimates of women and for women" (Beloff). Another explanation
she proposes is that, because IQ is correlated with occupational grade and men tend to occupy certain higher status occupations more than women (for political rather than ability reasons, in her view), females tend to believe that they are less intelligent than males. There seem to be no direct data to test Beloff's assumptions, and they look a little outdated, particularly in the case of America, where students still show the same gender difference pattern despite their more equal socialization and occupational choice. However, unusual data from Slovakia may be evidence for this position. Slovakian females awarded themselves higher overall (g) and verbal scores than did equivalent Belgian and British student groups (Furnham, Rakow, Samany-Schuller, & De Fruyt, 1999). The authors offered the following possible explanation for this unique group of confident females: It may well be that, under the pressure of the socialist governments of Eastern Europe, the role of females in society was somewhat different from that of women in capitalist Western Europe - the former took a more active role in the economy and were socialized differently in school. In fact, Slovakian society attached high prestige to education, and the government made a consistent effort to improve the position of women in society by encouraging them to obtain educational qualifications, facilitating their employment in traditionally male-dominated occupations, and even mandating a given percentage of women in its parliament. Another explanation they present is that, among the various nationalities studied, the Slovakian women had the most experience in taking intelligence tests and were therefore presumably more likely to recognize that gender differences in intelligence are actually very small (Furnham et al.).

Consequences of Gender Differences in Ability Estimates

In a series of studies, Beyer (1990) demonstrated gender differences in self-expectations, self-evaluations, and performance on ability-related tasks. Her results support the "male hubris, female humility" thesis of Beloff. Further, she argued that gender differences in self-evaluations affect expectancies of success and failure, and, ultimately, performance on ability-related tasks. Thus the importance of studies of self-estimated intelligence may lie not only in exploring lay theories of intelligence, but also in understanding the self-fulfilling nature of self-evaluation of ability. "Because of the serious implications of under-estimations for self-confidence and psychological health, more attention should be devoted to the investigation of gender differences in the accuracy of self-evaluations. Such research will not only elucidate the underlying processes of self-evaluation biases and therefore be of theoretical interest but will also be of practical value by suggesting ways of eliminating women's under-estimation of their performance" (Beyer).

Beyer (1998) reviewed studies concluding that individuals make poor self-evaluators. In one such study, the correlation between medical students' self-rated knowledge and exam grades was found to be almost exactly zero (r = -0.01). In another study Beyer reviewed, the correlation between self-perceptions of one's own physical attractiveness and judges' ratings of that attractiveness was 0.22. Another study she considered found a correlation of 0.32 between how one judged one's own performance on a test of managerial skill and how independent experts rated it. "Interestingly, outside evaluators seem to be better assessors of a target's performance than the target her/himself" (Beyer). Thus it seems that while self-estimates of intelligence may not be useful as proxy IQ tests, various studies (Paulus et al., 1998) have shown that they could be very useful in explaining some of the variability in actual test results through the self-expectation effect that Beyer (1999) discussed.

The work of Dweck and colleagues is particularly salient here (Dweck & Bempechat, 1983; Mueller & Dweck, 1998). Dweck argued that lay persons tend either to believe that intelligence is malleable (incremental lay theorists) or that it is fixed (entity lay theorists). These beliefs logically relate to goal setting, motivation, and attribution for success and failure. Further, they relate to one's perceptions of others. While these beliefs are not necessarily related to actual ability, they can have powerful, even paradoxical, behavioral impacts. Thus Mueller and Dweck pointed out that if a young
person believes him/herself to be intelligent, entity theory predicts that they are likely to be less motivated to work hard to achieve their goals.

The interaction between beliefs about one's level of intelligence on the one hand and the changeability of intelligence on the other is potentially very important. Thus entity lay theorists who believe themselves to be below average on overall or specific intelligences may shun intellectual tasks, set themselves low goals, and become depressed and ineffectual. In contrast, entity lay theorists who rate themselves way above average may be complacent and lazy, believing their "natural wit" sufficient to let them succeed at most academic and work assignments. However, lay incrementalists who are low self-raters may, if so moved, be prepared to undertake tasks or training that they believe will increase their intelligence. High self-rating incrementalists may likewise believe that they need to work hard to maintain their levels of intelligence.

Conclusion

The measurement of intelligence has always been a controversial topic. So, too, have the topics of lay people's beliefs about their own intelligence and the extent to which it can be changed. Research relating to these topics has yielded evidence of a "self-Pygmalion effect" in that beliefs about the nature of intelligence and one's intelligence level can profoundly influence not only how one performs on a test, but one's personal motivations in academic and work settings throughout life (Spitz, 1999).

References

Beloff, H. (1992). Mother, father and me: Our IQ. The Psychologist, 5, 309-311.

Bennett, M. (1996). Men's and women's self-estimates of intelligence. Journal of Social Psychology, 136, 411-412.

Bennett, M. (1997). Self-estimates of ability in men and women. Journal of Social Psychology, 137, 540-541.

Bennett, M. (2000). Gender differences in the self-estimation of ability. Australian Journal of Psychology, 52, 23-28.

Beyer, S. (1990). Gender differences in the accuracy of self-evaluation of performance. Journal of Personality and Social Psychology, 59, 960-970.

Beyer, S. (1998). Gender differences in self-perception and negative recall bias. Sex Roles, 38, 103-133.

Beyer, S. (1999). Gender differences in the accuracy of grade expectations and evaluations. Sex Roles, 41, 279-296.

Borkenau, P., & Liebler, A. (1993). Convergence of stranger ratings of personality and intelligence with self-ratings, partner-ratings, and measured intelligence. Journal of Personality and Social Psychology, 65, 546-553.

Byrd, M., & Stacey, B. (1993). Bias in IQ perception. The Psychologist, 6, 16.

Dweck, C., & Bempechat, J. (1983). Children's theories of intelligence: Consequences for learning. In S. Paris, G. Olson, and H. Stevenson (Eds.), Learning and motivation in the classroom. Hillsdale, NJ: Erlbaum.

Flynn, J. (1987). Massive IQ gains in 14 nations: What IQ tests really measure. Psychological Bulletin, 101, 171-191.

Furnham, A. (2001). Self-estimates of intelligence: Culture and gender differences in self and other estimates of general (g) and multiple intelligences. Personality and Individual Differences, 31, 1381-1405.

Furnham, A., Clark, K., & Bailey, K. (1999). Sex differences in estimates of multiple intelligences. European Journal of Personality, 13, 247-259.

Furnham, A., Fong, G., & Martin, N. (1999). Sex and cross-cultural differences in the estimated multi-faceted intelligence quotient score for self, parents, and siblings. Personality and Individual Differences, 26, 1025-1034.

Furnham, A., Rakow, T., Samany-Schuller, I., & De Fruyt, F. (1999). European differences in self-perceived multiple intelligences. European Psychologist, 4, 131-138.

Furnham, A., & Rawles, R. (1995). Sex differences in the estimation of intelligence. Journal of Behavior and Personality, 10, 741-748.

Furnham, A., & Rawles, R. (1999). Correlations between self-estimated and psychometrically measured IQ. Journal of Social Psychology, 139, 405-410.

Furnham, A., Shahidi, S., & Baluch, B. (2002, in press). Sex and culture differences in self-perceived and family estimated multiple intelligence: A British-Iranian comparison. Journal of Cross-Cultural Psychology.
Furnham, A., & Ward, C. (2001, in press). Sex differences, test experience and the self-estimation of multiple intelligence. New Zealand Journal of Psychology.

Gardner, H. (1983). Frames of mind: A theory of multiple intelligences. New York: Basic Books.

Gardner, H. (1999). Intelligence reframed. New York: Basic Books.

Hogan, H. (1978). IQ self-estimates of males and females. Journal of Social Psychology, 106, 137-138.

Mueller, C., & Dweck, C. (1998). Praise for intelligence can undermine children's motivation and performance. Journal of Personality and Social Psychology, 75, 33-52.

Paulus, D., Lysy, D., & Yik, M. (1998). Self-report measures of intelligence: Are they useful as proxy IQ tests? Journal of Personality, 66, 523-555.

Reilly, J., & Mulhern, G. (1995). Gender difference in self-estimated IQ: The need for care in interpreting group data. Personality and Individual Differences, 18, 189-192.

Spitz, H. (1999). Beleaguered Pygmalion: A history of the controversy over claims that teacher expectancy raises intelligence. Intelligence, 27, 199-234.

Dr. Adrian Furnham is Professor of Psychology at London University (UCL) where he has taught for 20 years. He holds doctoral degrees from both Oxford University and UCL. He is the author of 36 books and 450 peer-reviewed papers. Dr. Furnham is currently working on a book about managerial incompetence.

HaPI Advisory Board

Aaron T. Beck, MD
University of Pennsylvania School of Medicine

Timothy C. Brock, PhD
Ohio State University, Psychology

William C. Byham, PhD
Development Dimensions International

Donald Egolf, PhD
University of Pittsburgh, Communication

Sandra J. Frawley, PhD
Yale University School of Medicine, Medical Informatics

David F. Gillespie, PhD
Washington University, George Warren Brown School of Social Work

Robert C. Like, MD, MS
University of Medicine and Dentistry of New Jersey, Robert Wood Johnson Medical School

Joseph D. Matarazzo, PhD
Oregon Health Sciences University

Vickie M. Mays, PhD
University of California at Los Angeles, Psychology

Michael S. Pallak, PhD
Behavioral Health Foundation

Kay Pool, President
Pool, Heller & Milne, Inc.

Ora Lea Strickland, PhD
Emory University Woodruff School of Nursing

Gerald Zaltman, PhD
Harvard University Graduate School of Business Administration

Stephen J. Zyzanski, PhD
Case Western Reserve University School of Medicine

Every tool carries with it the spirit by which it has been created.

Werner Karl Heisenberg
Ten Commandments for Selecting Self-Report Instruments

Fred B. Bryant

Selecting appropriate measurement instruments is among the tasks researchers most frequently face. Yet surprisingly little has been written about how best to go about the process of instrument selection. Given the prevalence of self-report methods of measurement in the social sciences, the task of selecting an instrument most often involves choosing from among a set of seemingly relevant questionnaires, surveys, inventories, checklists, and scales.

Although it is difficult to devise a universally applicable set of rules for selecting measurement instruments, it is possible to suggest some general guidelines that researchers can use in choosing appropriate self-report measures. What follows, then, is a set of precepts and principles for selecting instruments for research purposes, along with concrete examples illustrating each. Note that the order of presentation is not intended to reflect the guidelines' relative importance in the process of instrument selection. In fact, each principle is essential in selecting the right measurement tool for the job.

For example, a researcher who wishes to measure depression in college students might locate dozens of potentially useful self-report instruments designed to assess this construct. Indeed, the September 2001 release of the Health and Psychosocial Instruments (HaPI) database contains 105 primary records of self-report instruments with the term "depression" in the title, excluding measures developed for use with children or the elderly, those written in foreign languages, and those assessing attitudes toward, knowledge of, or reactions to depression rather than depression per se. The seemingly appropriate instruments thus identified include the Beck Depression Inventory (BDI; Beck, 1987), Hamilton Rating Scale for Depression (HAM-D; Hamilton, 1960), Center for Epidemiologic Studies Depression Scale (CES-D; National Institute of Mental Health, 1977), and Self-Rating Depression Scale (SDS; Zung, 1965), to name just a few. How should the researcher decide which to use?

1. Before choosing an instrument, define the construct you want to measure as precisely as feasible. Ideally, researchers should not rely merely on a label or descriptive term to represent the construct they wish to assess, but should instead define the construct of interest clearly and precisely in theory-relevant terms at the outset. Being unable to specify beforehand what it is you want to measure makes it hard to know whether or not a particular instrument is an appropriate measure of the target construct (Bryant, 2000). Potentially useful strategies for defining research constructs are to draw on published literature reviews that synthesize available theories concerning a particular construct, or to review the published literature on one's own in search of alternative theoretical definitions. If you can find alternative conceptual definitions of the target construct, then you can choose from among them a particular conception that resonates with your own thinking. The process of explicitly conceptualizing the construct that you wish to measure is known as "preoperational explication" (Cook & Campbell, 1979).

One strategy for selecting instruments is to employ only those most commonly used in published studies. Not only is this strategy simple and straightforward, but some researchers follow it in the hope of increasing the likelihood that their research will be published. However, it limits conceptual definitions to those created within the theoretical frameworks of commonly used instruments. Over time, this effectively constricts the generalizability of research on these constructs. Further, all measurement instruments tap irrelevancies that have nothing to do with the constructs they are intended to assess. Using only a single measure of a particular research construct makes it impossible to know the degree to which the irrelevancies in the measure affect the obtained results. Moreover, diversity in operationalization helps us better understand not only what we're measuring, but also what we're not measuring. Thus, in the long run, employing only the most commonly used instruments limits and weakens the body of scientific knowledge.

Imagine, for example, that a clinical researcher wants to use a self-report measure of shyness. A
wide variety of potentially relevant measures can be found in the Health and Psychosocial Instruments database, depending on how the researcher conceptually defines shyness. Is shyness (for which there are 30 primary records in the database) synonymous with introversion (30 primary records), timidity (17 primary records), emotional insecurity (2 primary records), social anxiety (63 primary records), social fear (1 primary record), social phobia (29 primary records), social avoidance (4 primary records), or social isolation (76 primary records)? The clearer and more precise the initial conceptual definition, the easier it will be to find appropriate measurement tools. An added benefit of precisely specifying target constructs at the outset is that it helps to focus the research.

Although precise preoperational explication is the ideal when selecting measures, in actual practice it is often difficult to specify a clear conceptual definition of the target construct beforehand. Many times the published literature does not provide explicit alternatives, and this forces researchers to explicate constructs on their own -- the equivalent of trying to define an unknown word without having a dictionary. In actual practice, researchers often begin by selecting a particular instrument that appears useful, thus adopting by default the conceptual definition of the target construct that underlies the chosen instrument. Truly, therefore, an available instrument often dictates one's conceptual definition post hoc.

2. Choose the instrument that is designed to tap an underlying construct whose conceptual definition most closely matches your own. Carefully consider the theoretical framework on which the originators based their instrument. Select an instrument that stems from a theory that defines the construct the same way you do, or at least in a way that does not contradict your conceptual orientation, for the theoretical underpinnings of the instrument should be compatible with your own conceptual framework.

Consider a sociologist and a psychologist, each of whom wants to measure guilt. The most appropriate self-report instrument in each case will be the one whose underlying conceptual definition most closely corresponds to that of the researcher. The sociologist, on the one hand, might be studying people's reactions to homeless adults from a macro-level, sociocultural perspective. If so, then she might begin by defining guilt to be a prosocial emotion experienced when one perceives oneself as being better off than another person who is disadvantaged. Consistent with this conceptualization is Montada and Schneider's (1989) three-item measure of "existential guilt," conceived as prosocial feelings in reaction to the socially disadvantaged. The psychologist, on the other hand, might be studying personality from a micro-level, individual perspective. If so, then she might begin by defining guilt to be a dispositional feeling of regret or culpability in reaction to perceived personal or moral transgressions. Consistent with this conceptual definition is Kugler and Jones' (1992) 98-item Guilt Inventory, which includes a separate subscale specifically tapping personal guilt as an unpleasant emotional trait. Clearly, instruments must have underlying conceptual definitions that match your own conceptual framework (Brockway & Bryant, 1998).

3. Never decide to use a particular instrument based solely on its title. Just because the name of an instrument includes the target construct does not guarantee that it either defines this construct the same way you do or even measures the construct at all. Don't let the title lead you to select an inappropriate instrument.

As a case in point, consider a developmental psychologist who wants to measure parents' psychological involvement in their families. Based on its promising title, the researcher decides to use the Parent/Family Involvement Index (PFII; Cone, DeLawyer, & Wolfe, 1985). After obtaining the instrument and inspecting its constituent items, the researcher realizes to his chagrin that the PFII requires a knowledgeable informant (e.g., a teacher or teacher's aide) to indicate whether or not the particular parent of a school-aged child has engaged in each of 63 specific educational activities. Based on an underlying conceptualization of family involvement in psychological terms, a more appropriate instrument would be a measure
developed by Yogev and Brett (1985) that assesses parental involvement in terms of the degree to which parents identify psychologically with and are committed to family roles. This example shows clearly that you can't judge an instrument by its title any more than you can judge a book by its cover (Brockway & Bryant, 1997).

4. Choose an instrument for which there is evidence of reliability and validity. Reliability is measurement that is accurate and consistent. Good reliability in measurement strengthens observed statistical relationships -- the more reliable the instrument, the smaller will be the error in measurement, and the closer observed results will be to actual results. For example, imagine a medical researcher who wants to determine whether an experimental antipyretic agent reduces fever more rapidly than available antipyretics, but who is using an unreliable thermometer that gives different readings over time, even when body temperature is stable. These inconsistencies in measurement make it nearly impossible to assess temperature accurately, and greatly decrease the likelihood of finding any experimental effects.

Data supporting the validity of an instrument increase one's confidence that it really measures what it is designed to measure. For example, the medical researcher referred to above can be more confident that the thermometer actually measures temperature if its readings correlate highly with those of infrared-telemetry or body-scanning devices. Although instrument developers sometimes report reliability and validity data, such empirical evidence is often available only in published studies that have used the given measure. As a rule, avoid judging the validity of an instrument by the content of its constituent items. What an instrument appears to measure "on its face" (i.e., face validity) is not necessarily what it actually measures. As in the case of an instrument's title, what you see is not necessarily what you get.

Judging the quality of research evidence concerning measurement reliability and validity can be difficult and confusing. There are various types of reliability (e.g., internal consistency, split-half, parallel-forms, interrater, test-retest) and of reliability coefficients (e.g., Cronbach's alpha, coefficient kappa, intraclass correlation, KR-20). Similarly, there are numerous types of validity (e.g., construct, concurrent, criterion-referenced, convergent, discriminant). Thus a host of specialized statistical tools has been developed to quantify both reliability (Strube, 2000) and validity (Bryant, 2000). Given the numerous types of reliability and reliability coefficients, validity, and tools used to assess reliability and validity, instrument selection requires at least a basic understanding of psychometrics and of principles of scale validation in order to make informed judgments of instrument quality. When there is no evidence concerning an instrument's reliability or validity, measurement becomes a "shot in the dark."

5. Given a choice among alternatives, select an instrument whose breadth of coverage matches your own conceptual scope. If you define your target construct as having a wide range of requisite constituent components, then choose an instrument whose items tap a broad spectrum of content relating to those components. On the other hand, if you define your target construct in a way that specifies a narrower set of conceptual components, then choose an instrument that has a more restrictive and specific content.

Breadth of coverage varies widely across available instruments. For example, to measure coronary-prone Type A behavior, alternatives include: the Student Jenkins Activity Survey (Glass, 1977), which taps the Type A components of hard-driving competitiveness and speed-impatience; the Type A Self-Rating Inventory (Blumenthal, Herman, O'Toole, Haney, Williams, & Barefoot, 1985), which taps hard-drivingness and extraversion; the Type A/Type B Questionnaire (Eysenck & Fulker, 1983), which taps tension, ambition, and activity; the Time Urgency and Perpetual Activation Scale (Wright, 1988), which taps activity and time urgency; and the Self-Assessment Questionnaire (Thoresen, Friedman, Powell, Gill, & Ulmer, 1985), which taps hostility, time urgency, and competitiveness. Clearly, your choice of
instrument depends on the specific components of Type A behavior that you want to investigate.

The choice between general versus specific measures of a given construct may also depend on your particular research question. Consider the process of coping, for example. Numerous instruments exist for measuring people's general style of coping in response to stress. However, if you want to study coping in relation to a specific problem or stressor, a host of other measures have been developed to assess individual differences in coping with such specific concerns as arthritis, asthma, cancer, chronic pain, diabetes, heart disease, hypertension, stroke, multiple sclerosis, spinal cord injury, HIV/AIDS, rape, sexual abuse, sexual harassment, pregnancy, miscarriage, childbirth, economic pressure, job stress, unemployment, depression, bereavement, natural disasters, prison confinement, test anxiety, and war trauma, to name just a few. Compared to a broad-band measure of general coping, a narrow-band measure of coping that is specific to a particular stressor would be expected to show stronger relationships with reactions to that specific stressor.

6. Select an instrument that provides a level of measurement appropriate to your research goals. Some instruments are based on a theoretical model in which the underlying construct is assumed to be "unitary." Such instruments provide only a general, global "total score" that summarizes the overall level of responses. Other instruments are based on a theoretical framework in which the latent construct is considered to be "multidimensional." Such instruments provide multiple "subscale scores," each of which taps a separate dimension of the underlying construct. Thus, if you want to gather global summary information about a target construct, then use a unitary instrument. If you want to gather information about multiple facets of a target construct, then use a multidimensional instrument.

Imagine two nursing researchers, each of whom wishes to measure patients' life satisfaction. One seeks a global summary of patients' overall life satisfaction, whereas the other wants to compare levels of satisfaction across important aspects of patients' lives. The first researcher could use the five-item Satisfaction with Life Scale (Diener, Emmons, Larsen, & Griffin, 1985) to obtain a global total score. The second researcher could use the 66-item Quality of Life Index (Ferrans & Powell, 1985) to obtain individual scores for four separate satisfaction subscales: Health/Functioning, Socioeconomic, Psychological/Spiritual, and Family.

7. Choose an instrument with a time frame and response format that meet your needs. Don't use a "trait" measure (i.e., an instrument that defines the underlying construct as a stable disposition) to assess a transient, situational "state" that you expect to change over time. And don't modify the labels on a response scale (e.g., from "rarely" to "never") or the time frame of measurement (e.g., from "in general" or "on average" to "during the past day" or "in the past hour") unless you have no other recourse. Substantial changes in an instrument's response format or time frame can compromise its construct validity and therefore require revalidation.

In measuring hostility, for example, the choice of an appropriate instrument would depend on whether you conceive of hostility as a transitory, variable state or a stable, dispositional trait. An appropriate tool for measuring "state" hostility might be the 35-item Current Mood Scale (Anderson, Deuser, & DeNeve, 1995), which is designed to assess situational hostility, whereas an appropriate tool for measuring "trait" hostility might be the 50-item Cook-Medley Hostility Scale (Cook & Medley, 1954), which is based on the Minnesota Multiphasic Personality Inventory (MMPI; Hathaway & McKinley, 1989) and conceptualizes hostility as a personality trait. Clearly, when selecting instruments you need to distinguish carefully between states and traits.

8. Match the reading level required to understand the items in the instruments you select to the age of the intended respondents. In studying children or adolescents, for example, avoid using an instrument designed for use with adults. When in doubt, use word-processing or
linguistic software to determine the reading ability level required to understand an instrument's constituent items.

Imagine a researcher who wants to measure depression in children. Depending on the average age of the subjects, the researcher could choose from a variety of different self-report instruments specifically designed to assess depression in children of various ages, including those with a first-grade reading level (Children's Depression Inventory; Kovacs, 1985), children age 7-13 (Negative Affect Self-Statement Questionnaire; Ronan, Kendall, & Rowe, 1994), children age 8-12 (Childhood Depression Assessment Tool; Brady, Nelms, Albright, & Murphy, 1984), or children age 8-13 (Depression Self-Rating Scale; Asarnow, Carlson, & Guthrie, 1987). In any case, the researcher studying childhood depression should avoid adopting an instrument designed to tap depression in adults.

9. Never use an instrument unless you know how you'll score it and how you'll analyze it. This rule may seem self-evident, but it is sometimes violated unintentionally. No matter how interesting or important an instrument seems, it is useless unless you can convert responses to it into meaningful data. Sometimes the scoring rules are difficult to obtain or hard to follow, particularly when an instrument consists of multiple composite subscales, reverse-scored items, or item-specific scoring weights. This suggests that researchers should first make sure they know how to score an instrument before they administer it.

Consider the SF-12 (Ware, Kosinski, & Keller, 1996), a 12-item self-report instrument designed to measure functional health status. At first glance, it might appear simple enough to score this instrument by simply summing or averaging responses to its constituent items. But the test manual for the SF-12 (Ware, 1993) specifies a complex set of mathematical computations designed to weight and combine the 12 item scores to produce composite scores reflecting mental, physical, and total functioning. Users cannot score the SF-12 correctly unless they have access to the detailed scoring instructions contained in the test manual. Clearly, then, administering an instrument is one thing, but scoring it can be an entirely different matter.

10. Rather than choosing only one measure, when feasible use multiple measures of the construct you wish to assess. A central tenet of classical measurement theory is that any single way of measuring a construct has unavoidable idiosyncrasies that are unique to the measure and have nothing to do with the underlying conceptual variable. By studying what multiple measures of the same construct have in common, researchers can converge or triangulate on the referent construct. Using multiple measures also allows an assessment of the generalizability of results across alternative operational or conceptual definitions, to probe the generality versus specificity of effects. And in the long run, using multiple instruments will advance our understanding of the targeted construct much further than simply using a single instrument.

Even when following the ten guidelines for instrument selection discussed above, you will still sometimes face difficult, highly subjective decisions in choosing appropriate measures. For example, which mood measure is more appropriate: one that uses a seven-point response scale, or one that uses a four-point response scale; one that includes a specific label for each individual point on its response scale, or one that has labels only at its endpoints; one that assesses the frequency with which respondents experience certain feelings, or one that taps the percentage of time respondents experience certain feelings? Given the subjectivity of such decisions, it makes all the more sense to use multiple measures whenever possible so as to evaluate the generalizability of results across alternative operational definitions of the same underlying construct.

To recap, I have suggested "Ten Commandments" for selecting self-report instruments:

1. Before choosing an instrument, define the construct you want to measure as precisely as feasible.
2. Choose the instrument that is designed to tap an underlying construct whose conceptual definition most closely matches your own.

3. Never decide to use a particular instrument based solely on its title.

4. Choose an instrument for which there is evidence of reliability and validity.

5. Given a choice among alternatives, select an instrument whose breadth of coverage matches your own conceptual scope.

6. Select an instrument that provides a level of measurement appropriate to your research goals.

7. Choose an instrument with a time frame and response format that meet your needs.

8. Match the reading level required to understand the items in the instruments you select to the age of the intended respondents.

9. Never use an instrument unless you know how you'll score it and how you'll analyze it.

10. Rather than choosing only one measure, when feasible use multiple measures of the construct you wish to assess.

Following these guidelines will help you to select instruments wisely.

Addendum
Obtaining Instruments Once Identified

Adhering to these ten guiding principles can help you identify appropriate measurement instruments, but then you need to obtain them and permission to use them. Indeed, physical availability may ultimately dictate instrument choice. An instrument may be unavailable for a variety of reasons, including copyright restrictions, being out of print, or death of the primary author. Given such obstacles, rather than trying to contact the original developer to obtain copies of an instrument, it may be best to contact BMDS, Behavioral Measurement Database Services, creator of the HaPI database, to secure permission to use an instrument, and to obtain a hardcopy of it along with any scoring instructions. For a reasonable fee, BMDS will perform these services.

References

Anderson, C.A., Deuser, W.E., & DeNeve, K.M. (1995). Hot temperatures, hostile affect, hostile cognition, and arousal: Tests of a general model of affective aggression. Personality and Social Psychology Bulletin, 21, 434-448.

Asarnow, J.R., Carlson, G.A., & Guthrie, D. (1987). Coping strategies, self-perceptions, hopelessness, and perceived family environments in depressed and suicidal children. Journal of Consulting and Clinical Psychology, 55, 361-366.

Beck, A.T. (1987). Beck Depression Inventory (BDI). San Antonio, TX: Psychological Corporation.

Blumenthal, J.A., Herman, S., O'Toole, L.C., Haney, T.L., Williams, R.B., Jr., & Barefoot, J.C. (1985). Development of a brief self-report measure of the Type A (coronary prone) behavior pattern. Journal of Psychosomatic Research, 29, 265-274.

Brady, M.A., Nelms, B.C., Albright, A.V., & Murphy, C.M. (1984). Childhood depression: Development of a screening tool. Pediatric Nursing, 10, 222-225, 227.

Brockway, J.H., & Bryant, F.B. (1997). Teaching the process of instrument selection in family research. Family Science Review, 10, 182-194.

Brockway, J.H., & Bryant, F.B. (1998). You can't judge a measure by its label: Teaching students how to locate, evaluate, and select appropriate instruments. Teaching of Psychology, 25, 121-123.

Bryant, F.B. (2000). Assessing the validity of measurement. In L.G. Grimm & P.R. Yarnold (Eds.), Reading and understanding more multivariate statistics (pp. 99-146). Washington, DC: American Psychological Association.

Cone, J.D., DeLawyer, D.D., & Wolfe, V.V. (1985). Assessing parent participation: The Parent/Family Involvement Index. Exceptional Children, 51, 417-424.

Cook, T.D., & Campbell, D.T. (1979). Quasi-experimentation: Design & analysis issues for field settings. Chicago: Rand McNally.

Cook, W.W., & Medley, D.M. (1954). Proposed hostility and pharisaic-virtue scales for the MMPI. Journal of Applied Psychology, 38, 414-419.
Diener, E., Emmons, R.A., Larsen, R.J., & Griffin, S. (1985). The Satisfaction with Life Scale. Journal of Personality Assessment, 49, 71-75.

Eysenck, H., & Fulker, D. (1983). The components of Type A behavior and its genetic determinants. Personality and Individual Differences, 4, 499-505.

Ferrans, C.E., & Powell, M.J. (1985). Quality of Life Index: Development and psychometric properties. Advances in Nursing Sciences, 8, 15-24.

Glass, D.C. (1977). Behavior patterns, stress, and coronary disease. Hillsdale, NJ: Lawrence Erlbaum.

Hamilton, M. (1960). A rating scale for depression. Journal of Neurology, Neurosurgery, and Psychiatry, 23, 56-62.

Hathaway, S.R., & McKinley, J.C. (1989). Minnesota Multiphasic Personality Inventory (MMPI). Minneapolis, MN: NCS Assessments.

Kovacs, M. (1985). The Children's Depression Inventory (CDI). Psychopharmacology Bulletin, 21, 995-998.

Kugler, K., & Jones, W.H. (1992). On conceptualizing and assessing guilt. Journal of Personality and Social Psychology, 62, 318-327.

Montada, L., & Schneider, A. (1989). Justice and emotional reactions to the disadvantaged. Social Justice Research, 3, 313-344.

National Institute of Mental Health (1977). Center for Epidemiologic Studies Depression Scale (CES-D). Rockville, MD: Author.

Ronan, K.R., Kendall, P.C., & Rowe, M. (1994). Negative affectivity in children: Development and validation of a self-statement questionnaire. Cognitive Therapy and Research, 18, 509-528.

Strube, M.J. (2000). Reliability and generalizability theory. In L.G. Grimm & P.R. Yarnold (Eds.), Reading and understanding more multivariate statistics (pp. 23-66). Washington, DC: American Psychological Association.

Thoresen, C.E., Friedman, M., Powell, L.H., Gill, J.J., & Ulmer, D. (1985). Altering the Type A behavior pattern in postinfarction patients. Journal of Cardiopulmonary Rehabilitation, 5, 258-266.

Ware, J.E., Jr. (1993). SF-12 Health Survey (SF-12). Boston, MA: Medical Outcomes Trust.

Ware, J., Jr., Kosinski, M., & Keller, S.D. (1996). A 12-Item Short-Form Health Survey: Construction of scales and preliminary tests of reliability and validity. Medical Care, 34, 220-233.

Wright, L. (1988). The Type A behavior pattern and coronary artery disease: Quest for the active ingredients and the elusive mechanism. American Psychologist, 43, 2-14.

Yogev, S., & Brett, J. (1985). Patterns of work and family involvement among single- and dual-earner couples. Journal of Applied Psychology, 70, 754-768.

Zung, W.W.K. (1965). A Self-Rating Depression Scale. Archives of General Psychiatry, 12, 371-379.

Fred Bryant is Professor of Psychology at Loyola University, Chicago. He has roughly 90 professional publications in the areas of social psychology, personality psychology, measurement, and behavioral medicine. In addition, he has coedited 6 books, including Methodological Issues in Applied Social Psychology (1993; New York: Plenum). Dr. Bryant has extensive consulting experience in a wide variety of applied settings, including work as a research consultant for numerous marketing firms, medical schools, and public school systems; a methodological expert for the U.S. Government Accounting Office; and an expert witness in several federal court cases involving social science research evidence. He is currently on the Editorial Board of the journal Basic and Applied Social Psychology. His current research interests include happiness, the measurement of cognition and emotion, and structural equation modeling.
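The reliability coefficients named under Commandment 4 can be made concrete with a small computational sketch. The function below computes Cronbach's alpha, the internal-consistency coefficient mentioned in that discussion, directly from its textbook formula; the item scores are invented for illustration and do not come from any instrument discussed in the article.

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a set of scale items.

    items: a list of k lists, each holding one item's scores for the
    same n respondents. Alpha is computed as
        k / (k - 1) * (1 - sum(item variances) / variance(total scores)),
    using population variances, as is conventional for this coefficient.
    """
    k = len(items)
    n = len(items[0])

    def var(xs):
        # Population variance (divide by N, not N - 1).
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    # Each respondent's total score across all k items.
    totals = [sum(item[i] for item in items) for i in range(n)]
    return k / (k - 1) * (1 - sum(var(item) for item in items) / var(totals))


# Hypothetical 4-item scale answered by 5 respondents (invented data):
scores = [
    [3, 4, 2, 5, 4],
    [3, 5, 2, 4, 4],
    [2, 4, 1, 5, 3],
    [3, 4, 2, 4, 5],
]
alpha = cronbach_alpha(scores)
```

By convention, values above roughly .70 are treated as adequate internal consistency for research use, but that cutoff is a rule of thumb, not part of the statistic itself.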
HaPI Thoughts

Search for measurement instruments in the Health and Psychosocial Instruments (HaPI) database, with over 105,000 records of measurement instruments, online or on CD-ROM! Produced by BMDS, Behavioral Measurement Database Services.

SUBSCRIBE TO HaPI!
Call Ovid Technologies at 800-950-2035 ext 6472 for pricing and to order today!

Obtain copies of measurement instruments from BMDS Instrument Delivery!
Call BMDS at 412-687-6850, fax 412-687-5213, or e-mail [email protected] ($20/$30 processing/handling fee)

Good News! Ovid Technologies now comprises both the Ovid and SilverPlatter platforms. Health and Psychosocial Instruments (HaPI) continues on the Ovid platform and can be found in the categories of Medicine & Allied Health, and Social Sciences.

Measurement Assistance? Yes indeed! It's just a phone call away, and with a smile you get searching tips and suggestions for quick retrieval of the instruments you need. Call Evelyn Perloff, PhD, for Measurement Assistance from BMDS at 412-687-6850.
In This Issue:

• Introduction to This Issue
  Al K. DeRoy ... 1

• Refinements to the Lubben Social Network Scale: The LSNS-R
  James Lubben, Melanie Gironda, and Alex Lee ... 2

• Self-Estimated Intelligence
  Adrian Furnham ... 12

• Ten Commandments for Selecting Self-Report Instruments
  Fred B. Bryant ... 18

• HaPI Thoughts ... 25
The Behavioral Measurement Letter
Behavioral Measurement Database Services
PO Box 110287
Pittsburgh, PA 15232-0787

DATED MATERIAL