Paul Maxim's "Renorming Ron Hoeflin's Mega Test"
Published with permission from the author
1. Ron Hoeflin's "Sixth Norming of the Mega Test."
2. Fred Vaughan's "Intelligence Filters." (appears in the online version of Gift of Fire)
3. Kevin Langdon's Reply to Paul Maxim. (Gift of Fire no. 81, republished and revised here)
4. Darryl Miyaguchi's Comments on Paul Maxim's renorming.
5. Darryl Miyaguchi's Generic I.Q. Chart
6. SAT percentile ranks table
RENORMING RON HOEFLIN'S MEGA TEST
Copyright (C) 1996 by PAUL MAXIM

Reference Data Used:

1. Data sheets provided by Dr. Hoeflin, showing Mega test scores for 531 testees, plus prior test scores they reported, including the following: 220 on SAT, 106 on GRE, 76 on LAIT, 80 on Cattell, 75 on CTMM, 46 on Stanford-Binet, 34 on WAIS, 28 on AGCT, and 28 on MAT.
2. Score breakdowns for 5,157,642 SAT testees during the period 1984-1989.
3. GRE norming data from ETS.
4. The IQ scale chart ("Selectivity by IQ") ascribed to Kjeld Hvatum.
5. A description by Dr. Hoeflin of his Mega test norming process, as contained in his letter to Mr. Maxim of December 19, 1995.

My renorming does not attempt to theoretically analyze that norming process, but rather replicates it, using the same data Dr. Hoeflin used; hence, it constitutes an audit of his results.

NOTE: This article has not been reviewed by, or approved by, Dr. Hoeflin.

Description. The method Dr. Hoeflin used to norm his Mega test, based on 693 reports he received of prior test scores, can be illustrated in terms of his largest prior score sample, consisting of 220 SAT scores. (All statements herein pertain to SAT before its recent recentering.)

a) In order to norm the Mega test at the "4-sigma" level (IQ 164), it is first necessary to determine which score level on SAT corresponds to "4-sigma" in the general population, equivalent to the top .00333% of all individuals.

b) Since SAT testees are slightly more intelligent than the population as a whole, by a factor of about 4/3, their "4-sigma" percentile is raised to .00444%, representing the top 1/22,500th of all SAT testees.

c) This percentage is then applied to the total number of scores for the period tabulated, indicating that 229 scores were at the "4-sigma" level or above.

d) Beginning with the top SAT score level of 1600 on Math plus Verbal (a "perfect" score), one can then count downward 229 scores until the "4-sigma" level is reached for this data aggregate.

e) Dr.
Hoeflin's norming assumed that 1550 on SAT represented its "4-sigma" level. However, his own data reveals the following numbers:

   1591-1600 -  35
   1581-1590 -   8
   1571-1580 - 149
   1561-1570 -  71

From this data, it can be interpolated that a score of 1565 on SAT represents the "4-sigma" level; that is, all scores at 1565 or above number .00444% of the total. (The top three bands hold 192 scores, so the remaining 37 of the 229 fall about halfway down the 71 scores in the 1561-1570 band.)

f) It is now necessary, from inspection of the prior SAT scores reported by Mega testees, to determine how many of them were at 1565 or above. Dr. Hoeflin presumed that his top ten SAT scorers were in the "4-sigma" category; their scores were as follows: 1595, 1586, 1582, 1580, 1570, 1565, 1560, 1560, 1556, and 1555. However, only six of these scores are at or above 1565, and so it is they which represent the true "4-sigma" group.

g) Once it is determined that the top six SAT scores represent the "4-sigma" category on this test, the next step is to pick out, among those 220 testees reporting SAT scores, their top six Mega scores: these are 44, 44, 43, 43, 40, and 39. In other words, an implicit assumption is made that the SAT-reporting group will attain the same number of "4-sigma" scores on the Mega test as they did on SAT; hence, the lowest of those six top Mega scores represents the assumed "4-sigma" threshold on Mega, for the SAT-reporting group. This score was 39.

h) This putative "4-sigma" level on Mega will now have to be combined and correlated with the "4-sigma" level derived from a similar procedure applied to testees reporting GRE scores, to testees reporting CTMM scores, etc. Then, when all applicable "4-sigma" levels have been determined, a weighted average can integrate them in terms of sample size, so as to arrive at one "4-sigma" score level for the Mega test as a whole.

GRE Sample. Dr. Hoeflin did not report to me the GRE (combined Math plus Verbal) score level he equated with "4-sigma," but data emanating from ETS indicates that this should be "1620."
The problem in attempting to norm a high-IQ test on GRE scores at the 4-sigma level is that, around the late 1970's, ETS reduced the ceiling on each GRE from 900 to 800, making it impossible for any testee, from that point on, to record a "4-sigma" score. A number of Dr. Hoeflin's Mega testees apparently took the GRE before its ceiling restrictions were imposed, since three of them reported scores of 1620 or above. However, since the Mega test was normed around 1986, it is possible that one or more of the 106 testees reporting GRE scores was prevented, by the ceiling, from attaining a "4-sigma" score. In order to "adjust" for this possibility, I have increased the number of GRE scores at the "4-sigma" level from three to four.

The next step (as in the case of the SAT sample) is to count off the top four Mega test scores attained by testees in the "GRE group"; these were as follows: 43, 43, 41, and 40. This indicates that a raw score of 40 on the Mega test corresponds to "4-sigma" (1620) in the GRE sample.

LAIT Sample. This represented the next most numerous group, accounting for 11% of all prior score reports used by Dr. Hoeflin. In his letter to me dated December 19, 1995, Dr. Hoeflin wrote: "Of 77 (Mega testees) who reported LAIT scores, 16 scored IQ 164 or higher on LAIT. For this same sample of 77 people, the top 16 scored 33 or higher on my Mega test. This puts the four sigma level (for the Mega test at) 33..."

But there is a problem here, that Dr. Hoeflin has failed to examine critically. To begin with, the incidence of putative "4-sigma" scores in the LAIT sample (16 of 77) is far higher than that of any other sample, and in percentage terms (20.78%) is over six thousand times higher than the incidence of "4-sigma" in the general population (.00333%). This stems directly from the fact that LAIT is an inflationary test, and generated numerous invalid "4-sigma" scores.
This distortion may also be noted in the fact that, in order for 20.78% of any sample to score at the "4-sigma" level or above, the entire sample would require a mean IQ over 3.2 sigma -- that is, over IQ 152, which is about ten points higher than the general profile for MEGA testees, and for LAIT testees as well.

Dr. Hoeflin's Mega test data indicates that LAIT scores were, on average, 4 IQ points higher than Mega scores, which in turn were about one point higher than CTMM scores, even though the CTMM was taken, on average, about a decade earlier than Mega. This means that LAIT scores were at least five points higher, on average, than scores emanating from a professional test such as CTMM, which was normed on about 60,000 members of the general population, not on prior, anecdotally-reported test scores. Furthermore, Dr. Hoeflin's data indicates that 17 LAIT scores at the 4-sigma level and above were, on average, eight IQ points higher than Mega test scores attained by the same testees, with a mean time interval between tests of about six years.

Now, when scores from an inflated test (LAIT) are used uncritically to norm a subsequent test (Mega), the second test tends to acquire some of the inflationary characteristics of its predecessor, producing what might be called the "house of cards" syndrome -- that is, one inaccuracy piled atop another. To compensate for this factor, I deducted four points from each reported LAIT IQ score used by Dr. Hoeflin, which reduced the number of putative "4-sigma" LAIT scores from 16 to six. (This is probably still too high, since six 4-sigma scores out of 77 is still 2300 times greater than the incidence of 4-sigma in the population as a whole.) Following this adjustment, the six highest Mega scores attained by the LAIT testees were as follows: 44, 43, 42, 41, 40, and 39, which accords very well with the "4-sigma" level on the Mega test suggested by the SAT and GRE samples.
Hence, the 4-sigma level for the LAIT sample on the Mega test is pegged at "39."

CTMM Sample. Dr. Hoeflin's data notes only two Mega testees who reported prior CTMM scores over "4-sigma," which on this test is 164 IQ; these scores were 179 and 174. Similarly, among the 75 Mega testees who reported CTMM scores, the two highest Mega scores were both "43," indicating that this score should be presumed to represent the "4-sigma" level on Mega. But for some reason unknown to me, Dr. Hoeflin did not use this figure verbatim; he instead "adjusted" it downward to "40," which seems to have no statistical justification. Hence, I have used the original CTMM-generated Mega score of 43 in my renorming of Dr. Hoeflin's test.

Stanford-Binet Sample. 46 Mega testees reported prior scores on the Stanford-Binet, including a hefty score of "230" attained by Marilyn vos Savant, who also scored "46" on Mega. For some reason, Dr. Hoeflin did not use Marilyn's score in his Mega test norming -- perhaps because it is identifiably a youthful score, but I have included it in my renorming. With Marilyn's score included, the S-B testees manifested a mean IQ of 146.9, about 7 IQ points higher than the CTMM testees, which suggests that there may have been other youthful scores (aside from Marilyn's) included in this sample. There are five Stanford-Binet scores over 4-sigma, and the corresponding top five Mega test scores (for the S-B group) are 46, 40, 34, 32, and 29. This points toward 29 as the raw score on Mega which best corresponds to "4-sigma" in the norming sample, but is ten points below the other indications.

WAIS Sample. 34 Mega testees reported prior WAIS scores, representing 4.9% of the total scores used by Dr. Hoeflin. Two of these scores were "4-sigma" (164 and 162; since the standard deviation on WAIS is 15, its 4-sigma level is IQ 160), and the two top Mega scores attained by the WAIS testees were 34 and 33.

Cattell, AGCT, and MAT Samples.
Since there is no indication that any of the testees in these samples reported a "4-sigma" score, they cannot be used for norming the Mega test at the 4-sigma level.

Combining Sample Results. The various "4-sigma" levels (on the Mega test) generated by each prior test sample cited above may now be combined into one, by means of a "weighting" technique, so that the effect of each sample will be proportional to its size; this is done by computing a weighted average, as follows:

Prior Test     Per Cent of the    4-Sigma Level on Mega
Utilized       Norming Sample     Test for this Sample      Product

SAT                 31.7%                  39               1,236.3
GRE                 15.3                   40                 612.0
LAIT                11.0                   39                 429.0
CTMM                10.8                   43                 464.4
Stanford-B.          6.7                   29                 194.3
WAIS                 4.9                   33                 161.7

                    80.4%                                   3,097.7

The next step is to divide the sum of products (3,097.7) by the sum of weights (80.4), so as to arrive at a quotient of 38.53, representing the putative "4-sigma" level on the Mega test for this group of 531 testees; another group might yield slightly different results. It would also be helpful to repeat the above norming process at the 3-sigma and 2-sigma levels, and then draw a smooth curve through all points; however, it should be fairly obvious that it is the norming at the 4-sigma level which most directly affects societies such as Prometheus and Mega.

At this point, before equating "38.53" raw points on Mega with 164 IQ on Stanford-Binet, a slight adjustment must be made to account for "regression to the mean." Since we are dealing with the very top scores, and since a decade or more elapsed, on average, between taking the Mega test and its predecessors, this group of testees should expect to score slightly lower on the Mega than on their prior tests, all other factors being equal. This means that the "4-sigma" level we computed for the Mega test corresponds to slightly less than "164 IQ" on the Stanford-Binet, and the simplest way to incorporate this adjustment is to set the "4-sigma" level on the Mega test at 39.
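As a check on the arithmetic, the weighted average described under "Combining Sample Results" can be recomputed with a few lines of Python. This is only a sketch: the weights and per-sample levels are copied from the table, while the dictionary layout and the function name `weighted_four_sigma` are illustrative choices, not part of the original norming. Recomputing the products directly from the listed weights and levels gives a quotient of roughly 38.5, which rounds to the same raw score of 39.

```python
# Weighted average of the per-sample "4-sigma" raw scores on the Mega test.
# Each entry: (per cent of the norming sample, 4-sigma Mega level for that sample).
# Values are taken from the table in the text; the layout is an assumption.
samples = {
    "SAT": (31.7, 39),
    "GRE": (15.3, 40),
    "LAIT": (11.0, 39),
    "CTMM": (10.8, 43),
    "Stanford-Binet": (6.7, 29),
    "WAIS": (4.9, 33),
}


def weighted_four_sigma(samples):
    """Return (sum of weights, sum of products, weighted average level)."""
    total_weight = sum(weight for weight, _ in samples.values())
    total_product = sum(weight * level for weight, level in samples.values())
    return total_weight, total_product, total_product / total_weight


total_weight, total_product, level = weighted_four_sigma(samples)
print(f"sum of weights  = {total_weight:.1f}")
print(f"sum of products = {total_product:.1f}")
print(f"weighted level  = {level:.2f}")
```

Because the weights cover only 80.4% of the prior score reports, dividing by their sum (rather than by 100) renormalizes them over the six samples that actually contributed a "4-sigma" level.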
If the "4-sigma" level on Mega is moved upwards by three raw points, from Ron Hoeflin's value of "36" to "39," this also implies that the raw score corresponding to the 4.75 sigma level must be moved up by a similar amount, from its former setting of "43," to a new value of "46."

Another important factor to be considered in appraising the accuracy of the Mega test norming is whether or not it is generating too many IQ's at the 4.75 sigma level. In 1992, the Wall Street Journal reported that 12 such scores had resulted from 4,000 Mega test administrations, for a percentage incidence of .3% -- but this is 3,000 times greater than the incidence of 4.75 sigma in the general population. It can be shown that, in order for .3% of the sample to score at a mean (not a threshold) IQ of "176," the entire sample would need a mean IQ of about 155. However, Hoeflin's norming sample of 531 testees (82% of which consisted of OMNI readers, and 18% of high-IQ society members) had a mean IQ somewhere in the low 140's, while his overall sample of 4,000 (over 90% of whom were OMNI readers) probably had a mean IQ below 140. At a 141 IQ level (5 in 1,000), five thousand tests would have to be administered to yield the expectation of only one "176 IQ" score; this again leads me to believe that only one, or possibly two, valid "Mega-level" IQ's have ever been identified through the Mega testing program.

Here is a tabular summary showing (by sample type) the incidence of "4-sigma" scores used in norming -- and in renorming -- the Mega test:

Name of         Size of Norming    ---Incidence of "4-Sigma" Scores---
Prior Test      Sample             As Used by Hoeflin   -As Corrected--
                                   No.    Per Cent      No.    Per Cent

SAT                  220            10       4.5%         6       2.7%
GRE                  106             ?        ?           4       3.8
LAIT                  77            16      20.8          6       7.8
CTMM                  75             2       2.7          2       2.7
Stanford-B.           46             4       8.7          5      10.9
WAIS                  34             2       5.9          2       5.9

Cattell       )
AGCT          )  These tests were not used for norming at the 4-sigma
MAT           )  level, since no "4-sigma" scores were reported.
Miscellaneous )
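The incidence figures in the summary table, and the comparisons with the general population made earlier for the LAIT sample, can be reproduced from the raw counts. The following is a minimal sketch: the counts and sample sizes are taken from the table, the population incidence of 1 in 30,000 is the ".00333%" figure used in the text, and the names `POPULATION_INCIDENCE`, `table`, and `incidence` are illustrative only.

```python
# Incidence of "4-sigma" scores in each norming sample, compared with the
# general-population incidence of .00333% (1 in 30,000) used in the text.
# Each entry: (sample size, count as used by Hoeflin, count as corrected).
# None marks the GRE count, which Hoeflin did not report.
POPULATION_INCIDENCE = 1 / 30_000

table = {
    "SAT": (220, 10, 6),
    "GRE": (106, None, 4),
    "LAIT": (77, 16, 6),
    "CTMM": (75, 2, 2),
    "Stanford-Binet": (46, 4, 5),
    "WAIS": (34, 2, 2),
}


def incidence(count, size):
    """Return (per cent of sample, multiple of population incidence), or None."""
    if count is None:
        return None
    share = count / size
    return 100 * share, share / POPULATION_INCIDENCE


for name, (size, used, corrected) in table.items():
    for label, count in (("as used", used), ("as corrected", corrected)):
        result = incidence(count, size)
        if result is None:
            print(f"{name:15s} {label:13s} not reported")
        else:
            pct, multiple = result
            print(f"{name:15s} {label:13s} {pct:5.1f}%  {multiple:7.0f}x population")
```

For the LAIT sample this gives about 20.78% (over six thousand times the population incidence) for the 16 scores Hoeflin used, and a multiple of roughly 2,300 for the six scores remaining after the four-point deduction.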
In the above Table, the one glaring anomaly which stands out is the extremely high incidence of "4-sigma" scores (20.8%) attributed to the LAIT norming sample. But in his letter to me of December 19, 1995, Dr. Hoeflin stated as follows: "Of 25,000 (testees) who took the LAIT, let's say 625 scored at or above the four sigma level. Of 4,000 who took my Mega test, almost exactly 100 scored at or above the four sigma level. If we compare these ratios, we find that they are identical, namely: 625/25,000 = 100/4,000 = 1/40." (This equals 2.5%)

Herein lies another conundrum: If Dr. Hoeflin believes it reasonable for 2.5% of all LAIT testees to score "4-sigma," why did he uncritically use (for purposes of norming the Mega test) a LAIT sample whose incidence of "4-sigma" scores was eight times greater, and did he not realize that this would inflate his Mega test?

In the March 1996 issue of Dr. Hoeflin's magazine, The Puzzler (p. 16), Mr. Langdon noted that only one out of three of the LAIT testees he originally "qualified" for admission into the Four Sigma Society had true 164 IQ's. In addition, Dr. Hoeflin noted, "Langdon's second norming actually lowered everyone's IQ at the upper levels by 5 IQ points compared with his first norming." In other words, both Langdon and Hoeflin were well aware of the inflationary effects of LAIT on testee IQ assessments.

Going back even further in time (1986), Dr. Hoeflin stated, in Gift of Fire: "I do not trust the norming of the Langdon test, and would prefer that we (i.e., the Prometheus Society) adopt a more stringent norming procedure...Inflated IQ standards that are not in harmony with real-world facts strike me as dishonest...The Mega Society likewise has far more members than its purported one-in-a-million standard warrants." But if Dr.
Hoeflin advocated "more stringent norming procedures" for the LAIT in 1986, one wonders why he didn't use "more stringent procedures" to norm his own Mega test around the same time, particularly as regards his casual acceptance of anomalous "4-sigma" levels in his LAIT-derived sample?

To answer my own question, the only reason I can see why Dr. Hoeflin did not utilize the "more stringent" norming standards he recommended to Mr. Langdon is that this would have reduced the membership in Dr. Hoeflin's Mega Society to two persons. In other words, Dr. Hoeflin's 1986 statement in Gift of Fire was completely correct -- too bad he didn't stick to it!