Revised April 18, 2001

Problems with the WAIS Intelligence Test 1938 - 1997

Judith M. Collins and John E. Hunter

Michigan State University


Citation: Collins, J. M., & Hunter, J. E. (2001). Problems with the WAIS Intelligence Test 1938 - 1997. Symposium Presentation at the Annual Conference of the Society for Industrial and Organizational Psychologists, San Diego, April, 2001.

Problems with the WAIS Intelligence Test 1938 - 1997

For over six decades, researchers and clinical practitioners have relied on the Wechsler Adult Intelligence Scale (WAIS). The WAIS is the most widely used test of intelligence for assessing and evaluating patients, students, employees, criminals and other population subgroups (Gregory, 1999). Currently, the WAIS dominates the practice of intelligence testing (Flanagan, McGrew, & Ortiz, 2000). One reason for the longevity and widespread use of the WAIS may be literature reviews (e.g., Matarazzo, 1985) showing its reliability. However, for years there has been controversy about the theoretical and empirical validity of the WAIS (Flanagan, McGrew, & Ortiz, 2000).

For example, based on a comprehensive examination of the psychometric characteristics and factorial content of the WAIS tests, Frank (1983) concluded that the WAIS was inadequate for use by clinicians. Others also find the WAIS limited and lacking in reliability and validity (Kaufman, 1992; Truch, 1993). In particular, the evidence reported for WAIS validity is actually closer to alternate-forms reliability (Kaufman, 1985, 2000). Of perhaps greatest concern is the factorial structure of the WAIS (Kaufman, 2000), which has prompted hundreds of factor analytic studies over the years. At least 18 confirmatory factor analyses (Flanagan et al., 2000) have been performed on WAIS data, with no two producing the same results, although most find either a three- or four-factor structure rather than the two-factor structure that has characterized the WAIS for several decades.

From the earliest version (Wechsler, 1939) to its most recent (Psychological Corporation, 1997), the WAIS consists of two subscales--Verbal and Performance (although since then other subtests have been deleted or added). The Verbal subscale comprises five subtests, and the Performance subscale comprises six. However, depending on the particular WAIS version, the numbers of subtests have varied, although, in general, the WAIS has changed very little over the years.

Scores on the Verbal and Performance subscales represent an individual’s intelligence in verbal and analytic reasoning, controlling for age, and the totaled subscale scores are converted to a single Full scale IQ score representing general intelligence, or "g." However, the interpretation of these scores as legitimate measures of verbal and analytical abilities and as a measure of g relies on evidence for their construct validity, which, as mentioned above, has been challenged.

Specific examples include Ward, Ryan, & Axelrod (2000), who failed to replicate the two-factor (verbal and performance) model for the WAIS-III; O’Grady (1983), who reported evidence for a one-factor model but not for a two-factor Performance and Verbal model; Waller & Waldman (1990), who found evidence for a three-factor model; and others who report a four-factor structure (see Flanagan et al., 2000, for a review). The failure to replicate WAIS research on factorial structures is troubling because of the stability of cognitive abilities across gender, race and diverse situations throughout the lifespan (Anastasi, 1983; Carroll, 1993).

These reports of the lack of construct validity for the WAIS are problematic because critical decisions about people’s lives are based on WAIS intelligence (IQ) scores. Also, because of its lengthy and widespread use, evidence for the lack of validity would be troublesome for science because the WAIS has now been used to generate an enormous body of literature.

Fortunately, however, the validity of the WAIS can now be either confirmed or disconfirmed using psychometric methods that were not available in its early development. For example, one way is to examine the WAIS validity by estimating the variability surrounding a meta-analyzed population effect size and then comparing this value with those from other IQ tests. In this study, therefore, we use data from one sub-population to conduct a meta-analysis of WAIS and other tests’ IQ scores free of sampling error to answer the question, "What is the population effect size and the variability of that effect for the WAIS and in comparison with other IQ tests?" This meta-analysis will reveal the variabilities surrounding the "population" effect sizes for the various tests and the credibility ranges for those values.

A second way to examine the WAIS is to use confirmatory factor analysis (CFA) to statistically model the relationships between WAIS scores and the latent construct called general intelligence, or "g." The WAIS is built on exploratory procedures that do not take into account measurement errors; however, most measures of latent constructs contain sizeable measurement errors which CFA can estimate independent of the latent construct (Jöreskog & Sörbom, 1993). When controlling for measurement error, better tests can be made of the relationships among variables in a priori specified models. Therefore, in addition to the meta-analysis of IQ tests, we also use this second CFA approach to examine WAIS tests from the original (1938) to the present (1997) versions, to answer the question, "What is the factorial structure of the WAIS when controlling for measurement error?"

METHOD

Part I: Meta-Analysis of IQ Tests

Sample and Procedure

The data consisted of the means and standard deviations of IQ scores for the sub-population of criminals from 232 independent studies with a total sample size of N = 53,242. The primary meta-analysis for any given question begins with a pair of meta-analyses on the means and standard deviations of the relevant studies. Specific formulas are given in Hunter and Collins (2000) and a computer program to compute them is available from the authors. Normative statistics show that for the general population, the mean level of intelligence is 100 and the standard deviation is 15. These normative data can be used to compute d-values comparing the mean for a criminal population to the general population. These computations are also presented in Hunter and Collins (2000).
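The d-value computation described above can be illustrated with a minimal sketch. It assumes the simplest form of the comparison, in which the group mean is standardized against the normative mean of 100 and standard deviation of 15; the full formulas are in Hunter and Collins (2000) and are not reproduced here. The function name `d_value` is hypothetical, and the means passed to it are those reported in the Results.

```python
# Hypothetical helper: standardized difference between a group mean and the norm.
NORM_MEAN = 100.0  # general-population mean IQ, as stated in the text
NORM_SD = 15.0     # general-population standard deviation, as stated in the text


def d_value(group_mean, norm_mean=NORM_MEAN, norm_sd=NORM_SD):
    """Return (group_mean - norm_mean) / norm_sd."""
    return (group_mean - norm_mean) / norm_sd


if __name__ == "__main__":
    # Mean IQ values reported in the Results section of this paper.
    reported_means = [("All IQ tests", 91.18), ("Juveniles", 87.15), ("Adults", 92.69)]
    for label, mean_iq in reported_means:
        print(f"{label}: d = {d_value(mean_iq):.2f}")
```

Under this assumption the sketch gives d-values of about -.59, -.86 and -.49 for the three groups, which agree with the effect sizes reported for them.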

RESULTS

Part I: Meta-Analysis of IQ Tests

Interpreting SD

The interpretation of the results in Table 1 begins with the standard deviations for the δ statistics. If the studies were homogeneous, the standard deviation for a test would be small relative to its d-value. Larger standard deviations indicate the presence of moderators in the database. In Table 1, the mean IQ score for all IQ tests in the meta-analysis was 91.18 (SD = 7.44) and the population effect size was δ = -.59 (SD = .50). The standard deviation is quite large in relation to the effect size, indicating the presence of one or more moderator variables. The standard deviations for the WAIS alone are also relatively large.



The WAIS produces estimates of "Performance IQ" and "Verbal IQ" as well as an estimate of general intelligence called "Full IQ." The following mean IQ scores and population effect sizes (δ) were found for the criminal sub-population relative to the general population: WAIS Full: 93.68 (SD = 6.47) and δ = -.42 (SD = .44); WAIS Performance: 95.15 (SD = 6.68) and δ = -.32 (SD = .45); WAIS Verbal: 92.59 (SD = 20.81) and δ = -.49 (SD = 1.29) (Table 1).

Free of sampling error, these population effect sizes ranged from -.32 (Performance) to -.49 (Verbal). The standard deviations are large in relation to the effect size values, indicating that the samples are not homogeneous. That is, the "overall" meta-analytic results comparing the general population IQ scores and special (criminal) population IQ scores suggest moderators in the database. Two potential moderators are age of the subjects and the type of intelligence test used in the studies.

Interpreting Moderator Effects

Age. The first explanation for the variability in the criminal population would be age. Statistics show that more crimes are committed at relatively younger ages and also that IQ differs for different age groups. We therefore conducted a subset meta-analysis comparing juveniles versus adults. We used only studies that specifically identified the samples as either juvenile or adult (Table 1).

In comparison to the overall meta-analysis, the mean for juveniles was 87.15 (SD = 7.26) and the population effect size was δ = -.86 (SD = .49), revealing a large difference between juvenile delinquents and the general population (Table 1). For the adults in Table 1, the mean is 92.69 (SD = 5.56) and δ = -.49 (SD = .38). These differences are comparable to the relative differences in Table 1 between the Wechsler Intelligence Scale for Children (WISC; 89.61, SD = 8.10, and δ = -.69, SD = .55) and the Wechsler Adult Intelligence Scale (WAIS; 93.68, SD = 6.47, and δ = -.42, SD = .44).

One interpretation is that the large difference between the juveniles (-.86) and the adults (-.49), or between the WISC (-.69) and the WAIS (-.42), is due to age. One explanation would be that lower IQ juveniles commit crimes for modest stakes, so there are larger differences between the juveniles and the general population. In contrast, the mean differences would be smaller for brighter people who wait for better opportunities with higher stakes that present themselves in adulthood (e.g., white-collar crimes of embezzlement versus theft of merchandise from a store). Nonetheless, large standard deviations and credibility ranges remained, indicating additional moderator influences. A second likely moderator would be the type of intelligence test, because the tests vary in design.

Type of Test. Although intelligence tests vary in the numbers of subtests and what specific type of intelligence each subtest measures, all intelligence tests measure the same latent construct, g. We therefore used the entire sub-population of criminals to compare population effect sizes on different IQ tests. For these analyses, we compared the "Full IQ" results with results from other tests that make no reference to "Performance IQ."

Table 1 lists the meta-analyzed population effect sizes for the following intelligence tests: WAIS (δ = -.42, SD = .44); WISC (δ = -.69, SD = .55); WAIS/WISC combined within samples (δ = -.53, SD = .42); Otis (δ = -.59, SD = .26); Stanford-Binet (δ = -.71, SD = .39); other less well known tests (δ = -.34, SD = .65); and tests that were not identified by name (δ = -.57, SD = 1.11). The effect size for the WAIS differs markedly from those for the other tests.

SUMMARY DISCUSSION

Part I: Meta-Analysis of IQ Tests

The d-values for the Stanford Binet (-.71) and the Otis (-.59) are very different from that for the WAIS Full (-.42). Also, in comparison with the Full WAIS (-.42), the Verbal d-value (-.49) is larger and the Performance d-value (-.32) is smaller. The d-value for the Full is about the average of the d-values for the Performance and Verbal subtests; however, the Full d would be expected to be larger because it is the aggregate of Verbal and Performance.

It is clear that the problem is with the Performance scale. The low d-value for the Performance scale is attenuating the d-value for the Full WAIS, which would be expected to be closer to the larger d-values for the other full IQ scales. However, comparing the WAIS d statistics with the Stanford Binet (-.71) and the Otis (-.59), the results for the Full WAIS are -.42; the Verbal is somewhat closer in magnitude (-.49); and the d-value for Performance is small (-.32). If the Performance scale were a strong measure of intelligence, then the d-value (i.e., validity if converted to r) would be larger.

There are at least two interpretations of these results for Performance. Suppose that intelligence is correlated with criminal behavior, as has been widely reported (Gordon, 1976; 1986; 1987). First, if intelligence tests differed only in random error or in irrelevant trivial content, the correlation between each test and criminality would depend on how well that test measures intelligence--the higher the validity, the higher the correlation. In these data, we find large differences between the WAIS Performance and other measures of intelligence, and the WAIS Verbal is in between the WAIS Performance and the other measures. If the above supposition is true, then these data suggest that the WAIS Performance has severe problems and the WAIS Verbal test has minor problems.

A second, alternative interpretation is that the Performance scale is the best true measure of intelligence, in which case all other scales must then have a contaminating factor that is correlated with criminality. That is, the higher correlation for other scales would not be due to the fact that they are better measures of intelligence but rather that they measure a contaminating variable that is correlated with criminality.

This would mean that there are severe problems with the Stanford Binet, the Otis, and the WAIS Verbal. However, there is little empirical evidence for the lack of validity for the Stanford Binet and the Otis intelligence tests, whereas there is cumulative evidence for the lack of validity of the WAIS (Frank, 1983; O’Grady, 1983; Kaufman, 1985; 1992; 2000; Waller & Waldman, 1990; Ward, Ryan, & Axelrod, 2000).

In summary, the meta-analytic results suggest serious problems with the WAIS, particularly with the WAIS Performance scale. In the next section we use confirmatory factor analysis to examine the validity of the subtests of the Performance and Verbal scales.

METHOD

Part II: Confirmatory Factor Analysis of the WAIS

WAIS Subtests

In the above meta-analysis we examined the mean scores on the WAIS Full scale and the Performance and Verbal subscales. Now we examine the following subtests of the subscales.

Wechsler (1944) developed the Verbal scale based on the theory that five subtests explain verbal ability: Information, Comprehension, Digit Span, Arithmetic, and Similarities. In addition, a Vocabulary subtest was developed for use as an "alternate" (p. 77). Only later did Vocabulary become a standard subtest of the Verbal WAIS.

Similarly, the Performance IQ Scale was also composed of five subtests: Picture Arrangement, Picture Completion, Block Design, Object Assembly, and Digit Symbol.

Appendix A lists these subtests and their descriptions taken from Wechsler (1939) and reported in subsequent editions and WAIS manuals.

The Procedure

We used LISREL 8.0, confirmatory factor analysis (Jöreskog & Sörbom, 1993), and a meta-analyzed correlation matrix to estimate a WAIS measurement model. We obtained from the literature four primary correlation matrices from the first (1938) version to the recent 1997 version that each contained the same numbers and types of subtests.

The subtests are Comprehension, Information, Digit Span, and Arithmetic (from the Verbal subscale), and Picture Arrangement, Picture Completion, Block Design, Object Assembly and Digit Symbol (from the Performance subscale). For the meta-analysis, we computed a frequency-weighted average of the four primary matrices composed of these correlated variables. Appendix B-B4 lists the four primary matrices, their sources and sample sizes, and the meta-analyzed matrix. The total sample size for the meta-analyzed matrix was 1,171.
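A frequency-weighted average of correlation matrices can be sketched as follows. This is an illustration only: it assumes that each element of the pooled matrix is the average of the corresponding elements of the primary matrices, weighted by sample size, with no further corrections. The function name `weighted_average_matrix` and the small 2 x 2 inputs are hypothetical and are not the matrices in Appendix B.

```python
def weighted_average_matrix(matrices, sample_sizes):
    """Average correlation matrices element by element, weighting each by its sample size."""
    if not matrices or len(matrices) != len(sample_sizes):
        raise ValueError("need exactly one sample size per matrix")
    total_n = sum(sample_sizes)
    size = len(matrices[0])
    pooled = [[0.0] * size for _ in range(size)]
    for matrix, n in zip(matrices, sample_sizes):
        for i in range(size):
            for j in range(size):
                pooled[i][j] += n * matrix[i][j] / total_n
    return pooled, total_n


# Illustrative inputs only; these are not the study's correlation matrices.
study_a = [[1.0, 0.60], [0.60, 1.0]]
study_b = [[1.0, 0.40], [0.40, 1.0]]
pooled, total_n = weighted_average_matrix([study_a, study_b], [300, 100])
print(round(pooled[0][1], 2), total_n)
```

With these made-up inputs the larger study dominates, so the pooled correlation lies closer to .60 than to .40, and the pooled matrix carries the combined sample size.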

For the measurement model, we used the maximum likelihood method of parameter estimation and the following fit indices to judge the model fit.

Fit Indices. Bollen & Long (1992) recommend using several fit indices, not just one or two, to determine the fit of models. LISREL estimates report a chi-square statistic, which is sample size dependent. It is therefore especially important with the present large sample to use several fit indices, in addition to the chi-square. We used four: Root Mean Square Error of Approximation (RMSEA; Steiger & Lind, 1980); Adjusted Goodness of Fit Index (AGFI; Jöreskog & Sörbom, 1986); the Normed Fit Index (NFI; Bentler & Bonett, 1980); and the Parsimony Normed Fit Index (PNFI; James, Mulaik, & Brett, 1982).

Values indicating model fit are RMSEA, .05 - .07; AGFI and NFI, .90 or greater; and PNFI, approximately .50 when goodness-of-fit indices are in the range of .90 (Bollen, 1989; Mulaik, James, Van Alstine, Bennett, Lind, & Stilwell, 1989).
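These cutoffs can be expressed as a simple screening rule. The sketch below is an illustration under two assumptions: the RMSEA range is read as "no higher than .07," and the PNFI guideline, which is stated only approximately, is left out. The function name `meets_cutoffs` is hypothetical; the index values in the example are those reported in the Results for model one and model three (a).

```python
def meets_cutoffs(rmsea, agfi, nfi):
    """Screen a model against the stated cutoffs.

    Assumption: RMSEA of .07 or lower, and AGFI and NFI of .90 or greater.
    The approximate PNFI guideline is not checked.
    """
    return rmsea <= 0.07 and agfi >= 0.90 and nfi >= 0.90


# Index values as reported in the Results section.
print("Model One:", meets_cutoffs(rmsea=0.11, agfi=0.87, nfi=0.87))
print("Model Three (a):", meets_cutoffs(rmsea=0.06, agfi=0.95, nfi=0.96))
```

A rule of this kind only summarizes the indices; it does not replace inspection of residuals or modification indices.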

Model Specification

We estimated two models, both testing the Wechsler (1938) theory. In model one, the Verbal subtests were specified to load on the Verbal factor, and the Performance subtests were specified to load on the Performance factor. In model two, all subtests were allowed to load on a single factor, general mental ability (GMA).
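The degrees of freedom implied by these two specifications can be checked with a short sketch. It assumes a standard parameterization that the text does not spell out: each subtest loads on exactly one factor, factor variances are fixed at 1, error variances are free, and factor correlations are free. The function name `cfa_degrees_of_freedom` is hypothetical.

```python
def cfa_degrees_of_freedom(n_indicators, n_factors, n_correlated_errors=0):
    """Return unique observed variances/covariances minus free parameters.

    Assumes simple structure (one loading per indicator), factor variances
    fixed at 1, and all factor correlations freely estimated.
    """
    observed = n_indicators * (n_indicators + 1) // 2
    loadings = n_indicators
    error_variances = n_indicators
    factor_correlations = n_factors * (n_factors - 1) // 2
    free = loadings + error_variances + factor_correlations + n_correlated_errors
    return observed - free


print(cfa_degrees_of_freedom(9, 2))  # model one: two correlated factors
print(cfa_degrees_of_freedom(9, 1))  # model two: single GMA factor
```

With nine subtests, this parameterization gives 26 degrees of freedom for the two-factor model and 27 for the single-factor model, matching the values reported for models one and two.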

RESULTS

Part II: Confirmatory Factor Analysis of the WAIS

Model One

The chi-square with 26 degrees of freedom = 392.638 (p = .0); RMSEA = .11; AGFI = .872; NFI = .87; and PNFI = .63. The chi-square is expectedly large, due to the large sample size. However, the magnitudes of the other indices indicate a lack of fit for the Verbal and Performance model (Figure 1; Table 2).
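The link between the chi-square and the RMSEA can be illustrated with the usual point-estimate formula, the square root of (chi-square - df) / (df x (N - 1)). This is a sketch under the assumption that the formula with N - 1 in the denominator applies; the exact variant used by the LISREL 8 output is not stated in the text. The function name `rmsea` is hypothetical, and N = 1,171 is the total sample size given for the meta-analyzed matrix.

```python
import math


def rmsea(chi_square, df, n):
    """Point estimate of RMSEA: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Chi-square and df as reported for model one; N from the Procedure section.
print(round(rmsea(392.638, 26, 1171), 2))
```

Under this assumption the reported chi-square and degrees of freedom for model one reproduce the reported RMSEA of about .11.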



Model Two

The chi-square with 27 degrees of freedom = 541.071 (p = .0); RMSEA = .13; AGFI = .83; NFI = .82; and the PNFI = .61. Thus, the GMA model with the nine subtests also does not fit the data well (Figure 2; Table 2).

These results do not support the two-factor model of intelligence proposed by Wechsler (1944), which has been the foundation for subsequent WAIS versions. The failure of the hypothesized models to fit is consistent with the literature reporting problems with the WAIS and also with the above meta-analytic results.

The single major problem with the WAIS from its very beginning was the lack of theoretical grounding (Flanagan et al., 2000). Research over the years has attempted to fit the WAIS into some theoretical structure, based primarily on exploratory factor analytic results. But confirmatory research is based on a priori specific theoretical models. We therefore went to the literature beginning with Wechsler (1944) in an attempt to find common theoretical interpretations and methods of administration of the WAIS subtests. Using the information we found, we respecified and reestimated the WAIS factor structure, in model three.

Model Three

One common characteristic of the Performance subtests is the requirement of psychomotor manipulations requiring finger dexterity, hand-eye coordination, and writing speed. Three Performance subtests require psychomotor abilities: Block Design and Object Assembly require manipulating blocks and objects, and Digit Symbol requires coding--substituting symbols for numbers on a worksheet. These tasks are considerably different from the other Performance subtests, Picture Completion and Picture Arrangement which both call for visual manipulation of information (although, to a lesser extent, some psychomotor skills are also involved in the Picture Arrangement task).

However, because of their highly "visual" component, we grouped together Picture Completion and Picture Arrangement, and considered as another hypothesized factor the other three Performance subtests: Block Design, Object Assembly, and Digit Symbol.

There is one consistency in the literature: Comprehension and Information come together in a single factor. Although, again to some extent, these tasks require psychomotor skills (speed of articulation, requiring muscular ability), they both involve knowledge of facts, events, and other information. Flanagan et al. (2000) and others associate these two WAIS subtests with crystallized intelligence (Cattell, 1941). We therefore considered Comprehension and Information as a factor independent of the others.

But perhaps the greatest uncertainty as to factorial structure and interpretation concerns the Digit Span and Arithmetic tests. McGrew (1999) pointed out that the ever-changing interpretation of these two tests, most recently renamed in WAIS manuals as the "Freedom from Distractibility Factor," is a cause for concern. There is considerable ambiguity in the literature that has given rise to a wide range of interpretations of these two subtests (Flanagan et al., 2000; Kamphaus, 1993; Kaufman, 1994). Their salient common feature, however, is that both involve mental manipulation of numerical values. In this way, these two subtests differ from all the others. We therefore grouped these two together.

In summary, the nine subtests can be theoretically classified into four groups:

Group A: Comprehension and Information; Group B: Digit Span and Arithmetic; Group C: Picture Arrangement and Picture Completion; and Group D: Block Design, Object Assembly, and Digit Symbol. These factors and their intercorrelations are presented in Table 3.


Results for Model Three

Model three fit the data fairly well. The chi-square with 21 degrees of freedom = 143.367, (p = .0); RMSEA = .07; AGFI = .94; NFI = .93; and the PNFI = .56. The large and significant chi-square is due to the large sample size, and the RMSEA is within the range that indicates a model fit, although it is in the upper end of that range. However, the AGFI, NFI and the PNFI indices all point to a good model fit.

However, the Lagrangian test statistic (Bollen, 1989) indicated a better model fit were the error terms for Picture Completion and Object Design allowed to intercorrelate. We therefore made this one change, allowing those correlated errors, and reestimated the model. When we did this, model 3 (a) fit the data extremely well.

For model 3 (a), the chi-square with 20 degrees of freedom = 112.263, (p = .0); RMSEA = .06; AGFI = .95; NFI = .96; and the PNFI = .53. Although the chi-square is still large and significant, as can be expected with the large sample, the remaining indices indicate a good model fit (Figure 3; Table 2). The factor loadings in Figure 3 are as follows. For factor A: λ = .74, Comprehension; λ = .81, Information. For factor B: λ = .59, Digit Span; λ = .63, Arithmetic. For factor C: λ = .70, Picture Arrangement; λ = .64, Picture Completion. For factor D: λ = .87, Block Design; λ = .59, Object Assembly; λ = .50, Digit Symbol.

OVERALL DISCUSSION

Previous exploratory factor analyses have shown inconsistent results for the WAIS. This stems from two facts. First, the WAIS has a very complicated structure with only a small number of subtests, which were not systematically put together. Second, the sample size for most test correlation matrices has been small, fewer than N = 200. Thus the factor structure is unstable in the face of sampling error.

The present combined matrix has a sample size of 1,171 and contains most of the WAIS subtests. The one factor model fits these data very poorly. The residuals for the one factor model clearly show two contrasting clusters: "Comprehension and Information" vs. "Object Assembly and Digit Symbol." The other tests form a gradient between these two. The four cluster model in Table 3 fit the data well.

Referring to Table 3, the first cluster is Comprehension and Information. These are both excellent markers for verbal aptitude. Both are power tests.

The fourth cluster consists of Object Assembly, Block Design, and Digit Symbol. Digit Symbol and Object Assembly are classic tests of psychomotor ability. The correlation between this factor and the verbal aptitude factor is .47, which is very close to the correlation between general cognitive ability and general psychomotor ability as found in the U.S. Job Service Database for the GATB. Block Design fits better with this cluster than with the third cluster, though it falls between the two.

The second cluster is made up of Arithmetic and Digit Span. Both correlate with verbal aptitude but are also highly correlated with psychomotor ability. The Arithmetic test matches results for highly speeded tests of numerical operations. It does not match the results for arithmetic reasoning which is the better marker for intelligence. Arithmetic and Digit Span are statistically perfectly parallel and form a good cluster. It is not clear why this is true for Digit Span.

The third cluster is made up of Picture Arrangement and Picture Completion. Both are highly speeded tests, but they also involve more thinking than is true of the psychomotor ability tests in the fourth cluster.

The four factors almost fit the pattern for a Guttman scale. As we go from the first factor to the fourth, each factor requires less thinking and more speed.

The split between the first two clusters shows why the Verbal model failed. Wechsler’s theory assumes that both measure the same "verbal" factor. Instead, Arithmetic and Digit Span are highly speeded. The split between the third and fourth clusters shows why the Performance model failed. According to Wechsler’s theory, these tests all measure the same undefined "Performance" factor. The fourth cluster is a relatively pure measure of psychomotor ability. But the third cluster has tests that require a modest amount of thinking, in addition to speed.

The factor analysis offers one possible explanation for the results comparing criminals to the general population. The differences detected by the Performance scale have been diluted by the contaminating factor of speed. The results suggest that intelligence plays a major role in criminal behavior while psychological speed plays no role at all.

This would also explain why the Verbal IQ measure did more poorly than conventional measures such as the Otis and the Stanford Binet. It is highly contaminated with speed. The U.S. military has consistently found speeded tests to have lower predictive validity for job performance in all forms of work.

It is highly questionable whether psychomotor abilities should be regarded as cognitive abilities in the same light as verbal subtests (Carroll, 1993). In fact, technically, even though the physical movements for some subtests differ from the general processing speed and cognitive reaction-time task requirements for the Verbal subtests, "normal reading speed is to some extent governed by a psychomotor component" (Carroll, 1993, p. 536). The question raised, therefore, is "how much of the WAIS actually measures 'cognitive' ability?"

This is a serious question with potentially grave implications. Hundreds if not thousands of studies over several decades have been conducted using WAIS tests for clinical research. Furthermore, hundreds of people’s lives have been "directed" by the results of the WAIS--adults and children alike are classified and categorized and institutionalized according to intelligence test scores. Carroll (1993) discusses at length the difficulties in sorting out statistical variance in intelligence tests that is attributed to speed and reaction-time and level of ability. Less well researched is the variance attributable to psychomotor and not cognitive ability.
 

Table 1. Effect Size and Standard Deviation Values for Differences in IQ: Criminals vs. General Population

Variable        K     N        Mean    SD      Mean SD   δ      SD δ   80% Credibility Range
All IQ Tests    232   53,424   91.18    7.44   14.76     -.59    .50   -1.23 to .05
Juveniles       123   16,540   87.15    7.26   14.25     -.86    .49   -1.49 to -.23
Adults           67   21,850   92.69    5.56   15.99     -.49    .38   -0.97 to -.01
WAIS             62    6,350   93.68    6.47   13.09     -.42    .44   -0.99 to .14
Perform         107   10,152   95.15    6.68   13.17     -.32    .45   -0.91 to .26
Verbal          118   10,786   92.59   20.81   15.46     -.49   1.29   -2.28 to 1.29
WISC             47    3,315   89.61    8.10   12.19     -.69    .55   -1.40 to .01
WAIS/WISC        13      810   92.03    6.02   13.53     -.53    .42   -1.07 to .00
Otis             13   10,313   91.13    3.85   13.67     -.59    .26   -0.92 to -.26
S.Binet          33   20,746   89.32    5.86   17.02     -.71    .39   -1.22 to -.21
Others           33    8,634   94.95    9.65   13.47     -.34    .65   -1.16 to .49
No Name          31    3,256   91.45   16.10   13.16     -.57   1.11   -1.99 to .85

Note: K = number of means in the meta-analysis; N = total sample size across all means; Mean = sample size weighted mean; SD = standard deviation of Mean; Mean SD = the mean of the standard deviations for all the studies in the meta-analysis; δ = true effect size; SD δ = true variation after accounting for sampling error; 80% credibility range, computed using SD δ = the 10% worst case and 10% best case δ values; WAIS = Wechsler Adult Intelligence Scale; WISC = Wechsler Intelligence Scale-Children; Others includes tests for which there were insufficient numbers to compute separate meta-analysis.
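The 80% credibility range described in the note can be sketched as the true effect size plus and minus about 1.28 standard deviations. This is an illustration that assumes the true effect sizes are normally distributed, so that the 10th and 90th percentiles mark the worst and best cases; the function name `credibility_range` is hypothetical, and the inputs are the values reported for all IQ tests.

```python
from statistics import NormalDist


def credibility_range(delta, sd_delta, coverage=0.80):
    """Return the lower and upper bounds that cut off equal tails.

    Assumes normally distributed true effect sizes; for 80% coverage the
    multiplier is the 90th percentile of the standard normal, about 1.28.
    """
    tail = (1.0 - coverage) / 2.0
    z = NormalDist().inv_cdf(1.0 - tail)
    return delta - z * sd_delta, delta + z * sd_delta


# Effect size and standard deviation reported for all IQ tests.
low, high = credibility_range(-0.59, 0.50)
print(f"{low:.2f} to {high:.2f}")
```

Under this assumption the bounds for all IQ tests come out at about -1.23 and .05, in line with the range reported for that row.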


Table 2. Fit Indices for the Confirmatory Factor Analysis Models

Model             df   χ²                 RMSEA   AGFI   NFI   PNFI
Model One         26   392.638, p = .0    .11     .87    .87   .63
Model Two         27   541.071, p = .0    .13     .83    .82   .61
Model Three       21   143.367, p = .0    .07     .94    .95   .56
Model Three (a)   20   112.263, p = .0    .06     .95    .96   .53

Note: RMSEA = Root Mean Square Error of Approximation; AGFI = Adjusted Goodness of Fit Index; NFI = Normed Fit Index; PNFI = Parsimony Normed Fit Index.


Table 3. The Four Factors and the Factor Correlations

Factor   A      B      C      D
A        1.00
B         .80   1.00
C         .80    .87   1.00
D         .46    .65    .70   1.00

Note: Factor A = Comprehension and Information; Factor B = Arithmetic and Digit Span; Factor C = Picture Arrangement and Picture Completion; Factor D = Object Assembly, Block Design, and Digit Symbol.


References

Anastasi, A. (1983). Traits, states, and situations: A comprehensive view. In H. Wainer & S. Messick (Eds.),
    Principals of modern psychological measurement: A Festschrift for Frederic M. Lord (pp. 345-356).
    Hillsdale, NJ: Erlbaum.

Bentler, P.M., & Bonett, D.G. (1980). Significance tests and goodness-of-fit in the analysis of covariance
    structures. Psychological Bulletin, 88, 588-600.

Bollen, K.A. (1989). Structural equations with latent variables. New York, NY: John Wiley & Sons.

Bollen, K. A., & Long, J. S. (1992). Testing structural equation models: Introduction.
    Manuscript used in course on structural equation modeling at the University of Michigan, Ann Arbor, July
   1992. The manuscript later appeared as an introductory chapter to Testing Structural Equation Models,
    edited by Kenneth A. Bollen and J. Scott Long, Sage University Press.

Browne, M.W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K.A.
    Bollen & J.S. Long (Eds.), Testing structural equation models (pp. 445-455). Newbury Park, CA: Sage.

Byrne, B.M. (1998). Structural equation modeling with LISREL, PRELIS, and SIMPLIS: Basic concepts,
    applications, and programming. Mahwah, NJ: Lawrence Erlbaum Associates.

Carroll, J.B. (1993). Human cognitive abilities: A survey of factor-analytic studies. New York, NY: Cambridge
    University Press.

Cattell, R.B. (1941). Some theoretical issues in adult intelligence testing. Psychological Bulletin, 38, 592.

Cattell, R. B. (1963). Theory of fluid and crystallized intelligence: A critical experiment. Journal of Educational
    Psychology, 54, 1-22.

Flanagan, D.P., McGrew, K.S., & Ortiz, S.O. (2000). The Wechsler Intelligence Scales and Gf-Gc theory.
    Boston, MA: Allyn and Bacon.

Frank, G. (1983). The Wechsler enterprise: An assessment of the development, structure, and use of the
   Wechsler tests of intelligence. New York: Pergamon.

Gordon, R.A. (1976). Prevalence: The rare datum in delinquency measurement and its implications for the
    theory of delinquency. In M.W. Klein (ed.), The Juvenile Justice System. Beverly Hills, CA: Sage
    Publications, pp. 201-284.

Gordon, R.A. (1986). Scientific justification and the race-IQ-delinquency model. In Timothy F. Hartnagel and
    Robert A. Silverman (eds.). Critique and Explanation: Essays in Honor of Gwynne Nettler. New Brunswick,
    NJ: Transaction Books, pp. 91-131.

Gordon, R.A. (1987). SES versus IQ in the race-IQ-delinquency model. International Journal of Sociology and
    Social Policy, 7, 30-96.

Gregory, R.J. (1999). Foundations of intellectual assessment. Needham Heights, MA:
    Allyn & Bacon.

Herrnstein, R. J., & Murray, C. (1994). The bell curve. New York, NY: The Free Press.

Hunter, J.E., & Collins, J.M. (2000). Meta-Analysis for a Special Population. Michigan State University, East
    Lansing, MI. Manuscript in progress.

James, L.R., Mulaik, S.A., & Brett, J.M. (1982). Causal analysis: Assumptions, models, and data. Beverly Hills, CA: Sage.

Jöreskog, K.G., & Sörbom, D. (1986). LISREL VI: Analysis of linear structural relationships by
    maximum likelihood and least square methods. Mooresville, IN: Scientific Software, Inc.

Kamphaus, R.W. (1993). Clinical assessment of children’s intelligence. Boston: Allyn
    and Bacon.

Kaufman, A. S. (1985). Review of the Wechsler Adult Intelligence Scale-Revised. The ninth mental
    measurements yearbook, 1699-1703. Lincoln, NE: The Buros Institute of Mental Measurements.

Kaufman, A.S. (1994). Intelligence testing with the WISC-III. New York: Wiley.

Kaufman, A.S. (2000). Tests of intelligence. In Robert J. Sternberg (ed.), Handbook of intelligence.
    Cambridge, UK: Cambridge University Press.

Matarazzo, J. D. (1985). Review of the Wechsler Adult Intelligence Scale-Revised. The ninth mental
    measurements yearbook, 1703-1705. Lincoln, NE: The Buros Institute of Mental Measurements.

Matarazzo, J.D. (1972). Wechsler’s Measurement and Appraisal of Adult Intelligence.
    Baltimore, MD: The Williams & Wilkins Company.

McGrew, K.S. (1999). The Wechsler freedom-from-distractibility index: A tale of three subtests.
    Communiqué, 27(8), 24.

Mulaik, S.A., James, L.R., Van Alstine, J., Bennett, N., Lind, S., & Stilwell, C.D. (1989).
    Evaluation of goodness-of-fit indices for structural equation models. Psychological Bulletin, 105, 430-445.

O’Grady, K. E. (1983). A confirmatory maximum likelihood factor analysis of the WAIS-R. Journal of
    Consulting and Clinical Psychology, 51, 826-831.

Psychological Corporation. (1997). WAIS-III WMS-III technical manual. San Antonio, TX: Author.

Plake, B.S., Gutkin, T.B., Wise, S.L., & Kroeten, R. (1987). Confirmatory factor analysis of the WAIS-R:
    Competition of models. Journal of Psychoeducational Assessment, 3, 267-272.

Steiger, J.H., & Lind, J.C. (1980, June). Statistically based tests for the number of common factors. Paper
    presented at the Psychometric Society Annual Meeting, Iowa City, IA.

Waller, N.G., & Waldman, I.D. (1990). A reexamination of the WAIS-R factor structure. Psychological
    Assessment: A Journal of Consulting and Clinical Psychology, 2, 139-144.

Ward, L. C., Ryan, J. J., & Axelrod, B. N. (2000). Confirmatory factor analyses of the WAIS-III standardization
    data. Psychological Assessment, 12, 341-345.

Wechsler, D. (1939). The measurement of adult intelligence (1st ed.). Baltimore, MD: Waverly Press, Inc.

Wechsler, D. (1944). The measurement of adult intelligence (3rd ed.). Baltimore, MD: The Williams &
    Wilkins Company.


APPENDIX A

The WAIS Subtests and Summary Descriptions


Subtest               What It Measures
Comprehension         Degree of social acculturation
Information           Factual knowledge: persons, places, common phenomena
Digit Span            Auditory memory for numbers
Arithmetic            Basic computational skills
Picture Arrangement   Sequential thinking; ability to see relationships
Picture Completion    Recognition of missing objects in a picture
Block Design          Reasoning, problem solving, spatial visualization
Object Assembly       Ability to form visual concepts quickly and then translate them into rapid hand responses
Digit Symbol          Learning of new task; visual-motor dexterity; clerical speed

Note: Summary descriptions from Gregory (1999).



APPENDIX B

Correlation Matrices for the Confirmatory Factor Analysis

Correlation Matrix from Wechsler (1944)

Subtest                   CO     IN     DS     AR     PA     PC     BK     OB     SM
Comprehension (CO)        1.00
Information (IN)          .705   1.00
Digit Span (DS)           .594   .534   1.00
Arithmetic (AR)           .438   .372   .470   1.00
Picture Arrangement (PA)  .477   .451   .459   .341   1.00
Picture Completion (PC)   .492   .465   .420   .288   .482   1.00
Block Design (BK)         .416   .357   .352   .274   .359   .467   1.00
Object Assembly (OB)      .597   .516   .519   .416   .365   .534   .506   1.00
Digit Symbol (SM)         .563   .516   .552   .523   .516   .433   .377   .613   1.00

Note: N = 235; 35-49 Age Group; Correlations from Wechsler (1944).



APPENDIX B1

Correlation Matrix from Wechsler (1944)

Subtest                   CO     IN     DS     AR     PA     PC     BK     OB     SM
Comprehension (CO)        1.00
Information (IN)          .668   1.00
Digit Span (DS)           .444   .484   1.00
Arithmetic (AR)           .517   .596   .443   1.00
Picture Arrangement (PA)  .391   .384   .264   .366   1.00
Picture Completion (PC)   .456   .465   .297   .403   .389   1.00
Block Design (BK)         .465   .488   .399   .514   .484   .566   1.00
Object Assembly (OB)      .286   .224   .155   .233   .272   .439   .536   1.00
Digit Symbol (SM)         .478   .561   .539   .429   .444   .400   .538   .319   1.00

Note: N = 235; 20-34 Age Group; Correlations from Wechsler (1944).


APPENDIX B2

WAIS-III Correlation Matrix from Ward, Ryan, and Axelrod (2000)

Subtest                   CO     IN     DS     AR     PA     PC     BK     OB     SM
Comprehension (CO)        1.00
Information (IN)          .72    1.00
Digit Span (DS)           .41    .41    1.00
Arithmetic (AR)           .57    .62    .50    1.00
Picture Arrangement (PA)  .50    .54    .34    .44    1.00
Picture Completion (PC)   .51    .50    .29    .39    .52    1.00
Block Design (BK)         .49    .45    .30    .45    .49    .53    1.00
Object Assembly (OB)      .45    .37    .22    .34    .46    .53    .57    1.00
Digit Symbol (SM)         .38    .42    .37    .43    .43    .45    .46    .36    1.00

Note: N = 175; 55-90 Age Group.


APPENDIX B3

WAIS-R Covariance Matrix from Smith, Kokmen, Tangalos, & Kurland (1992)

Subtest                   CO     IN     DS     AR     PA     PC     BK     OB     SM
Comprehension (CO)        7.89
Information (IN)          4.74   7.99
Digit Span (DS)           2.79   2.90   7.96
Arithmetic (AR)           2.76   3.43   2.88   7.49
Picture Arrangement (PA)  2.77   2.99   3.32   2.42   6.46
Picture Completion (PC)   3.37   3.31   2.14   2.44   3.17   8.13
Block Design (BK)         2.05   2.67   2.28   2.79   2.87   2.98   7.23
Object Assembly (OB)      1.21   1.41   1.27   1.62   2.08   2.91   4.00   8.21
Digit Symbol (SM)         1.54   1.87   1.93   1.79   1.56   1.63   2.87   2.10   7.43

Note: N = 526; 55-97 Age Group. For consistency with the other three matrices in the meta-analysis, we standardized this covariance matrix to a correlation matrix before including it.
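
The standardization described in the note above can be sketched as follows. This is a minimal illustration, assuming the usual conversion in which each covariance is divided by the product of the two standard deviations; the function name cov_to_corr is hypothetical, and only the first three rows (CO, IN, DS) of the covariance matrix are used for brevity.

```python
import math


def cov_to_corr(lower_cov):
    """Convert a lower-triangular covariance matrix to correlations.

    lower_cov[j][k] is the covariance of variables j and k (k <= j);
    the diagonal entries lower_cov[j][j] are the variances.
    """
    # Standard deviation of each variable, taken from the diagonal.
    sd = [math.sqrt(row[j]) for j, row in enumerate(lower_cov)]
    return [
        [row[k] / (sd[j] * sd[k]) for k in range(j + 1)]
        for j, row in enumerate(lower_cov)
    ]


# First three rows (CO, IN, DS) of the covariance matrix in Appendix B3.
cov = [
    [7.89],
    [4.74, 7.99],
    [2.79, 2.90, 7.96],
]

corr = cov_to_corr(cov)
for label, row in zip(["CO", "IN", "DS"], corr):
    print(label, " ".join(f"{r:.2f}" for r in row))
```

Applying the same function to all nine rows would give the full standardized matrix; the diagonal becomes 1.00 by construction.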


APPENDIX B4

Meta-Analyzed Correlation Matrix
 
Subtest                   CO     IN     DS     AR     PA     PC     BK     OB     SM
Comprehension (CO)        1.00
Information (IN)          .60    1.00
Digit Span (DS)           .35    .36    1.00
Arithmetic (AR)           .36    .44    .37    1.00
Picture Arrangement (PA)  .39    .42    .46    .35    1.00
Picture Completion (PC)   .42    .41    .27    .31    .44    1.00
Block Design (BK)         .27    .35    .30    .38    .42    .39    1.00
Object Assembly (OB)      .15    .17    .16    .21    .29    .36    .52    1.00
Digit Symbol (SM)         .20    .24    .25    .24    .23    .21    .39    .27    1.00

Note: N = 1,171; Frequency-weighted mean correlation matrix computed from the data reported in Appendices B through B3.
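
The frequency weighting named in the note above can be sketched for a single cell of the matrix. This is only an illustration of the arithmetic, assuming that "frequency-weighted" means each study's correlation is weighted by its sample size. The function name weighted_mean_r and the choice of the Comprehension-Information cell are illustrative, and the authors' exact procedure (for example, rounding or any corrections) is not described here and may differ.

```python
def weighted_mean_r(correlations, sample_sizes):
    """Sample-size-weighted mean of correlations from several studies."""
    if len(correlations) != len(sample_sizes):
        raise ValueError("need one sample size per correlation")
    total_n = sum(sample_sizes)
    return sum(n * r for r, n in zip(correlations, sample_sizes)) / total_n


# Sample sizes from the notes to Appendices B, B1, B2, and B3.
sample_sizes = [235, 235, 175, 526]

# Comprehension-Information entries from the same four appendices.
# The B3 entry is a covariance, so it is first rescaled to a
# correlation: 4.74 / sqrt(7.89 * 7.99).
co_in = [0.705, 0.668, 0.72, 4.74 / (7.89 * 7.99) ** 0.5]

print(sum(sample_sizes))  # total N across the four samples
print(round(weighted_mean_r(co_in, sample_sizes), 3))
```

Repeating the calculation for each of the 36 off-diagonal cells would produce a pooled matrix under this assumption.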


Figure 1

Model One


Figure 2

Model Two


Figure 3

Model 3 (a)