A takedown of Visual Puzzles

Jan 19, 2025

In the WAIS 4, which came out in 2008, they used the 4-factor model - Verbal, Working Memory, Processing Speed and Perceptual Reasoning. In the WAIS 5, which came out late 2024, they have switched to the 5-factor model - Verbal, Working Memory, Processing Speed, Visuospatial (VSI) and Fluid Reasoning (FRI). The only difference was to split Perceptual Reasoning into two factors: Visuospatial and Fluid Reasoning.

I agree with this change and I think most people would. It seems obvious that spatial ability is quite different from logical reasoning and that the two should not be combined as if they were the same. But why did the WAIS 4, which was the gold standard IQ test for over a decade, treat them as the same?

I believe the answer lies in their two subtests: Block Design (BD) and Visual Puzzles (VP).

It has been my belief for some years now that these subtests, which previously purported to measure Perceptual Reasoning and now purportedly measure Visuospatial, are in fact poor indicators of one’s true Visuospatial ability. And because of this, the factor analysis became muddied and no clear spatial factor was discovered.

Two classes of approximation are forced when the number of indicators are limited. In the four-factor models of WAIS-IV scores, reported in prior clinical studies of the 10 subtest data sets, the statistical discrimination between the five factors is diminished (Bowden et al., 2011; Wechsler, 2008b). Estimation of the five-factor models in a clinical 10-subtest data set preserves the distinction between the five broad abilities, but requires freeing up parameters of secondary interest (like cross-loadings) or using constituent indicators in lieu of composite scores. The fit of a more complex model is likely to be better but more susceptible to “improper solutions” during estimation (Brown, 2006; McDonald, 1989). The empirically observed consequence of decreased degrees of freedom is an increase in estimation failures. [1]

I’m not a stats guy, so I can’t be sure exactly what they are saying here. Therefore, I am speculating - I can’t know precisely why they went with 4 factors instead of 5. However, I am not speculating when I say that BD and VP are poor spatial tests, because I can prove it.

According to the same 2023 study I just referenced, BD and VP have a Gv loading of 0.8. (Gv is the broad factor for Visuospatial ability.) This is a strong loading, so how can they be bad tests?

Visual Puzzles: select 3 answers to build the image at the top

Let's start with Visual Puzzles:

1. It is a 2D test

Do I need to explain this point? 2D will almost always be inferior to 3D tests when it comes to Gv. I know of one good 2D test that I might put in the elite spatial test category, and it is not VP. Block Design is also mostly 2D. There is some rotation, but it's mostly incidental.

2. VP is not a “pure” measure of Visuospatial ability:

Visual Puzzles correlated significantly with measures of visuospatial reasoning, verbal learning and recall, mental flexibility, processing speed, and naming, which accounted for 50% of the variance in Visual Puzzles performance. The results indicate that Visual Puzzles is not a pure measure of visuoperceptual reasoning, at least in a mixed clinical sample, because memory, mental flexibility, processing speed, and language abilities also contribute to successful performance of the task. Thus it may be important to consider other aspects of cognitive functioning when interpreting Visual Puzzles performance. [2]

That was from a 2011 study, and I take it to mean that VP's index loading is not as strong as it could be because it's shared among other indexes. It's not really possible to have a subtest with a very strong Gv loading and also strong loadings on PSI, WMI and FRI.

3. The sex difference is small

This is the strongest argument. Sex differences in spatial ability load on Gv, meaning that items with a larger sex difference also have larger Gv loadings. The same goes for subtests: those with a larger sex difference also have larger Gv loadings (unless it’s a sports quiz or something).

The sex difference in Visual Puzzles (and Block Design) is about 3-4 IQ points, which is quite negligible. The sex difference in serious spatial tests involving mental rotation or mentally changing perspective is 9-12 IQ points. [3] That’s really all you need to know.
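For anyone checking the arithmetic, effect sizes in SD units and gaps in IQ points convert by a factor of 15, the standard deviation of the IQ scale. Here is a minimal sketch of that conversion; the function names are just illustrative.

```python
# IQ scales such as the WAIS use a mean of 100 and a standard deviation of 15.
IQ_SD = 15.0


def sd_to_iq_points(effect_size_sd):
    """Convert a group difference in SD units into IQ points."""
    return effect_size_sd * IQ_SD


def iq_points_to_sd(iq_points):
    """Convert a group difference in IQ points into SD units."""
    return iq_points / IQ_SD


# The gaps discussed in this post, expressed both ways.
print(round(iq_points_to_sd(3), 2), round(iq_points_to_sd(4), 2))  # VP / BD gap in SD
print(round(sd_to_iq_points(0.73), 2))  # meta-analytic mental rotation gap in IQ points
print(round(sd_to_iq_points(0.70), 2))
```

So a 3-4 point gap is roughly 0.2 to 0.27 SD, while a 0.73 SD gap is about 11 IQ points.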

But it doesn’t matter, right? I can hear you saying it. None of that matters, because VP has a Gv loading of 0.8, case closed.

WRONG

VP and BD have Gv loadings of 0.8 compared to other tests in the WAIS. Factor analysis is relative; it is not an absolute measure of Gv loading. VP measures spatial ability a lot more than Vocabulary, General Knowledge, Arithmetic, Digit Span, etc. This is not impressive, as these subtests are not designed to measure Gv at all. VP only has a high Gv loading because there is nothing good to compare it to.

I’m not a stats guy, I’ve never done a factor analysis and I hope I never have to, but I know a spatial test when I see one. And VP and BD do not pass the smell test. If you threw in a serious, hard-hitting spatial test like Mental Rotation, I guarantee you the loading of Visual Puzzles would drop a lot.
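The mechanism can at least be sketched with a toy simulation. To be clear, this is not an analysis of WAIS data. Every number below is made up, and the model assumes the very thing being argued: that VP and BD each load only 0.6 on true Gv and also share a hypothetical nuisance factor (called S here, think "2D puzzle-specific variance"). It shows only that a loading of 0.8 in one battery is compatible with a lower loading once stronger spatial tests are added. The one-factor fit uses iterated principal-axis factoring, written with plain NumPy.

```python
import numpy as np


def implied_correlations(loadings):
    """Correlation matrix implied by orthogonal factor loadings (rows = tests)."""
    lam = np.asarray(loadings, dtype=float)
    corr = lam @ lam.T
    np.fill_diagonal(corr, 1.0)
    return corr


def one_factor_loadings(corr, iters=500):
    """Fit a single common factor by iterated principal-axis factoring."""
    corr = np.asarray(corr, dtype=float)
    communalities = np.full(corr.shape[0], 0.5)  # arbitrary starting values
    for _ in range(iters):
        reduced = corr.copy()
        np.fill_diagonal(reduced, communalities)
        eigvals, eigvecs = np.linalg.eigh(reduced)  # eigenvalues in ascending order
        loadings = eigvecs[:, -1] * np.sqrt(max(eigvals[-1], 0.0))
        communalities = loadings ** 2
    # Eigenvector sign is arbitrary; orient the factor positively.
    return loadings if loadings.sum() >= 0 else -loadings


# Hypothetical true loadings, columns = [Gv, S]. All values are assumptions.
vp = [0.6, np.sqrt(0.28)]       # VP and BD correlate 0.36 + 0.28 = 0.64
bd = [0.6, np.sqrt(0.28)]
weak_test = [0.2, 0.0]          # a test with very little Gv in it
strong_test = [0.85, 0.0]       # a serious spatial test, e.g. mental rotation

battery_without = implied_correlations([vp, bd, weak_test, weak_test])
battery_with = implied_correlations([vp, bd, strong_test, strong_test])

vp_before = one_factor_loadings(battery_without)[0]
vp_after = one_factor_loadings(battery_with)[0]

print("VP loading, no strong spatial tests:", round(vp_before, 2))
print("VP loading, with strong spatial tests:", round(vp_after, 2))
```

Under these assumptions the first battery gives VP a loading near 0.8, because the factor is defined almost entirely by what VP and BD share with each other. Adding the two strong tests pulls the factor toward true Gv and the VP estimate comes out lower. How far it drops depends entirely on the made-up numbers, so treat this as an illustration of the logic rather than evidence for any particular value.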

Given the evidence presented here, I think it is safe to assume that VP and BD have Gv loadings of about 0.6.

When making an IQ test, index loading (aka group factor loading) is more important than g-loading. The test's overall g-loading comes from having many subtests and a breadth of group factors. VP has a solid g-loading, but a poor index loading. You could include it in an IQ test, but you also need to put in a proper spatial test or two, to make sure you are measuring Gv well enough.

A 1985 meta-analysis also found a difference of 0.73 SD

This is a 1995 meta-analysis - they found exactly the same thing, a difference of about 0.73 SD

Just showing that sex differences in spatial ability are found in all ages

Block Design, 2nd from the bottom, was clearly chosen for the WAIS because it has almost no sex difference

The 40-item Mental Rotation Test should be more reliable, so the true sex gap in spatial ability might be 0.70 SD or 10.5 IQ points

Sudarshan, N. J., & Bowden, S. C. (2023). Common Factor Structure of the Ten Subtest Wechsler Adult Intelligence Scale-Fourth Edition in a Clinical Sample and 15 Subtest Version in the Standardization Sample. Archives of Clinical Neuropsychology, 38(8), 1646–1658. doi:10.1093/arclin/acad035. PMID: 37222085; PMCID: PMC10681435.

Fallows, R., Pella, R., McCoy, K., O'Rourke, J., & Hilsabeck, R. (2011). What Does WAIS-IV Visual Puzzles Measure? Archives of Clinical Neuropsychology, 26, 552–553.

Voyer, D., Voyer, S., & Bryden, M. P. (1995). Magnitude of sex differences in spatial abilities: A meta-analysis and consideration of critical variables. Psychological Bulletin, 117(2), 250–270. doi:10.1037/0033-2909.117.2.250

Yoon, So. (2012). Psychometric properties of the Revised Purdue Spatial Visualization Tests: Visualization of Rotations (the Revised PSVT:R).

I have also seen first-hand data from my CASA test: the updated Guay’s Visualization of Views test had a sex gap of 10-15 IQ points, although it had a high floor.

