1. With the percentages fixed at the lowest 0.5% as presumably deficient and the next 1.0% doubtful, these borderlines for tested deficiency have the advantage of being more conservative than those at present advocated. On the basis of our empirical knowledge this is an important reason for urging borderlines on the scales at least as low as those suggested herein. Disregarding the extremely high borderlines which have fallen into disuse, we still find that social deficiency is often presumed for those testing above the lowest 1%. With the new Stanford scale, Terman presumes “definite feeble-mindedness” below an Intelligence Quotient of .70, below which he finds that 1% of 1000 unselected children fell. I Q's from .70 to .80 would include his uncertain group, which he describes as “border-line deficiency, sometimes classified as dullness, often as feeble-mindedness” (57, p. 79). His tables show 5% below an I Q of .78. We have no results with a random group of adults by which to judge how many would be below these borders. When the I Q has been applied to scores with other scales a larger percentage has often been found to be excluded. Fernald has shown that Haines' suggestion of a coefficient of .75 with the Point scale would exclude 16% of 100 Cincinnati girls selected at random from among those who left school at 14 years to go to work (16).
Unless the examiner wishes to assume that social inefficiency is more frequent than has been demonstrated by the practical tests of life, the success of those who have low quotients should make him exceedingly cautious about accepting the various borderlines which have been suggested by those who have not tested their criteria by the percentage method. It is not merely that the borderlines should be lowered, but that they should be lowered under some consistent plan so that we should know as much as is possible about their significance in the prediction of ultimate social inefficiency, and that we should be able to readjust them on the basis of new data or to new scales.
With the Point scale Yerkes and Wood say regarding “the coefficient of intelligence .70, which we accept as the upper limit of intellectual inadequacy or inferiority”: “Our data indicate that grades of intellectual ability measured by the coefficient .70 or less are socially burdensome, ineffective, and usually a menace to racial welfare” (226). With the most reliable part of their data, that for children from 8-13, this coefficient excludes the lowest 8.39%. Moreover, the lowest group for which they suggest a borderline, the dependents, falls at .50 or below and includes 1.05%.
2. A second practical advantage of the percentage borderlines on the scale is that they make no assumption as to the uniformity of the norms for the different ages. Except for the Stanford and the Jaederholm scales, there is little evidence that the age norms exclude equivalent portions of the children at the different life ages.
Goddard's Table I gives the data from which the following percentages of those who pass the norm are calculated, not counting those above 11 years, since the older groups are clearly affected by selection:—5 yrs., 88%; 6 yrs., 79%; 7 yrs., 81%; 8 yrs., 51%; 9 yrs., 60%; 10 yrs., 73%; 11 yrs., 44%. Kuhlmann's figures when using his own revised scale with public school children including the seventh grade, are:—6 yrs., 100%; 7 yrs., 95%; 8 yrs., 90%; 9 yrs., 87%; 10 yrs., 81%; 11 yrs., 80%; 12 yrs., 57%. It is clear that any change in the test norm from age to age must disturb the quotient which is based on these norms, although it would not affect the intelligence coefficient with the Point scale.
3. A third advantage of the percentage method arises from the fact that we cannot presume that the same ratio in terms of the scale units will exclude the same degrees of ability at different ages even when the norms for these ages are properly adjusted. The earlier results with the Stanford revision show a large variation as to the percentage excluded by the same I Q at different ages. For example, an I Q of .76 would have shut out 1% of 117 non-selected 6-year-olds, 2% of 113 9-year-olds and 7% of 98 13-year-olds. The lowest 1% of the last group was below a borderline of .66 (197).
With widely varying norms of the other scales, the I Q borderlines show much greater variation. In a recent review of the evidence, including Descoudres' report (96) on retesting the same children for several years, Stern recognizes that an I Q index is not constant after 12 years (187). Doll records decided changes in quotients for the same individual at different ages (99). So far as the 1908 scale is concerned, using Goddard's data, our Table V shows that at five years of age the lowest 1.8% would fall at or below a quotient of .40, at eight years the lowest 1.9% would show a quotient of .62 or less, and at 15 years the lowest 2.8% would fall below a quotient of .75. The rough tentative approximation of scale limits which I have suggested for the lowest 1.5% shows that a series of quotients for children from 5 to 15 years of age would be below .75 at every age and below .65 for half of these ages. For the presumably deficient group the quotients would be still lower in order to be as conservative as the borderlines that I have suggested with the Binet scale as at present standardized.
With the coefficient of intelligence and the Point scale, the Yerkes and Wood data show that their borderline of .70 excluded 13% of 196 children 8 and 9 years of age, while it excluded only 5% of each of the next two groups of double ages. With the group of 237 18-year-old Cincinnati working girls it excluded only 3% (226).
The data at present available thus indicate that we should not expect to find the same ratio at different ages excluding similar percentages. If the ratios have a value for comparing individuals of different ages, they seem to fluctuate so decidedly from age to age that they can hardly be trusted for stating the borderlines of deficiency without empirical confirmation for each age.
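The comparison made in the preceding paragraphs, namely what percentage of each age group a single fixed ratio shuts out, can be sketched in a few lines. The function name `percent_below` and the two small groups of quotients are hypothetical and invented purely for illustration; they are not taken from any of the studies cited.

```python
def percent_below(quotients, cutoff):
    """Return the percentage of a group whose quotient falls below `cutoff`."""
    if not quotients:
        raise ValueError("group must not be empty")
    below = sum(1 for q in quotients if q < cutoff)
    return 100.0 * below / len(quotients)


# Hypothetical quotients for two age groups (illustrative values only).
group_young = [0.68, 0.82, 0.90, 0.95, 1.00, 1.02, 1.05, 1.10, 1.15, 1.20]
group_old = [0.60, 0.66, 0.72, 0.74, 0.85, 0.92, 0.98, 1.00, 1.04, 1.08]

# The same fixed borderline applied to both groups.
for label, group in (("younger", group_young), ("older", group_old)):
    print(label, percent_below(group, 0.76))
```

With these invented figures the same borderline of .76 excludes a very different share of each group, which is the kind of fluctuation the data above suggest.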
Pearson found that the children of the older ages in the special classes were more and more deficient, measured in terms of the standard deviation of the normal group. This shift on the average was four months of mental age downward for each year of life during the period 7-14 which he studied. It makes uncertain the definition of the borderline in terms of a constant multiple of the deviation or of a constant quotient, unless this shift is shown to be due to imperfections of the tests which can be corrected, or to changes in the selection of the tested groups at advanced ages.
Pearson's suggestion of -4 S. D. as a borderline with the Jaederholm data gives some very curious results with the group of children in the special schools at Stockholm. Under his interpretation, at life-ages 8-11 from 0 to 5.2% of the pupils in these classes would be regarded as deficient, while for life-ages 12-14, 15.2% to 44.4% are beyond -4 S. D. In passing it is to be noted that, if one accepted Pearson's suggestion that the borderline should be fixed at -4 S. D. and the distribution of mental capacity were strictly normal, only about three children in 100,000 would be found deficient, according to the probability tables.
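The proportion of a strictly normal distribution lying beyond -4 S. D. need not be read from printed tables; it can be computed directly. The following is a minimal sketch using the normal-curve functions of the Python standard library. The helper name `expected_below` is an assumption made for illustration.

```python
from statistics import NormalDist


def expected_below(sd_multiple, population):
    """Expected number of cases below `sd_multiple` S. D. in a normal population."""
    return NormalDist().cdf(sd_multiple) * population


# Expected number of children beyond -4 S. D. per 100,000, if ability were normal.
print(round(expected_below(-4.0, 100_000), 2))  # roughly 3.17
```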
With the method of the standard deviation it would be necessary either to show that the deviation was constant in terms of the year units or else to restate the borderline for different ages in terms of the scale units. The irregularity of the norms with the Binet scale could also be allowed for, of course, by stating different quotients for the different ages, but when this readjustment is required for either the ratio or the deviation in terms of the scale units, these methods lose all their advantage of simplicity. Instead of one ratio or one multiple of the years of deviation, we might have a different statement for each life-age. With the percentage method there would be only one statement of the borderline for all ages in terms of percentage, although the scale positions change which shut out the same lowest percentage.
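The percentage statement described above, one percentage for all ages with a scale position that changes from age to age, may be sketched as follows. The function `percentage_borderline` is a hypothetical helper, and the two score lists are invented stand-ins for random samples at two ages, not data from this study.

```python
import math


def percentage_borderline(scores, lowest_percent):
    """Return the lowest score NOT excluded when the lowest `lowest_percent`
    of the sample is shut out; anyone scoring below it is in that lowest group.

    With tied scores fewer than the nominal percentage may fall strictly
    below the returned value, which errs on the conservative side.
    """
    if not scores or not 0 <= lowest_percent < 100:
        raise ValueError("need a non-empty sample and 0 <= percent < 100")
    ordered = sorted(scores)
    n_excluded = math.floor(len(ordered) * lowest_percent / 100.0)
    return ordered[n_excluded]


# Hypothetical samples of 200 scores at two ages (illustrative only).
scores_by_age = {8: list(range(40, 240)), 12: list(range(70, 270))}

# One percentage for every age; the scale position differs from age to age.
for age, scores in scores_by_age.items():
    print(age, percentage_borderline(scores, 1.5))
```

The design keeps the percentage fixed and lets the scale position be found empirically for each age, which is the readjustment the ratio and deviation methods would otherwise require age by age.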
4. All the quotient methods of defining the borderline encounter a serious practical difficulty in fixing the borderline for the mature, so that it will be equivalent to that for the immature. With the Stanford scale in calculating the quotient for adults, no divisor is used over 16 years. Yerkes and Bridges also think that this is about the time that the development of capacity ceases. Kuhlmann and others use 15 as the highest divisor. Wallin objects to either of these ages being used as the age of arrest of mental development (15, p. 67). Both the methods of the standard deviation and percentage have a similar difficulty, in that the borderline for the mature has to be empirically determined on a test scale. In this dilemma, however, the data collected with the random group of 15-year-olds in Minneapolis and published in the present study, places the borderline for the mature on either the 1908 or 1911 Binet scale in a much safer position, so far as empirical data is concerned, than the borderline for the mature for any other scale. This is true whether that borderline be then stated in terms of either the quotient or percentage methods. Translated into terms of the quotient, our percentage borderlines for the mature with these scales, below X for presumably deficient and below XI for the uncertain, would amount to quotients .60 and .66 on the basis of our findings with this random group of children who have presumably about reached adult development. Pearson does not attempt to define any borderline for the adults on the basis of the deviation, since Jaederholm tested only children. Moreover, this is not possible empirically with our group of 15-year-olds, since we tested only the lower extreme of this group.
Unfortunately, the borderlines of the mature for the Stanford and other scales depend upon empirical results obtained not with random groups, but upon a composite of selected groups of adults built up by the investigator on an estimate that this combined group represents a random selection among those with a typical advance in development, an almost superhuman task. Fortunately the empirical determination of this borderline for the mature might be improved later by obtaining data on less selected groups. The clearer significance of the empirical data for the borderline for the mature which I have presented for the Binet 1908 and 1911 scales from a random group of 15-year-olds seems to be an important practical advantage. It provides an empirical basis for judging the implication of test results with adults. It gives adults the benefit of the doubt if they improve after 15 years of age.
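The translation of the borderlines for the mature into quotients is simple arithmetic and can be checked as below. This is a sketch under two stated assumptions: that "below X" is read as a highest whole test age of IX and "below XI" as a highest whole test age of X, and that 15 is taken as the highest divisor. The function name `quotient` and its parameter `highest_divisor` are illustrative.

```python
def quotient(test_age, life_age, highest_divisor=15):
    """Test age divided by life age, with the divisor capped for the mature."""
    if test_age < 0 or life_age <= 0:
        raise ValueError("test age must be >= 0 and life age must be > 0")
    return test_age / min(life_age, highest_divisor)


# Assumption: "below X" -> test age IX; "below XI" -> test age X.
for test_age in (9, 10):
    print(test_age, f"{quotient(test_age, 15):.3f}")
# 9/15 = .600; 10/15 = .667, which the text gives as .66
```

Passing `highest_divisor=16` gives the corresponding figure under the Stanford practice of using no divisor over 16 years.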
5. Compared as to their popular significance, there is no doubt that the lowest 0.5% of the individuals of a particular age has very much more significance to those not familiar with detailed statistical practise than a coefficient or a multiple of the standard deviation. A statement that an adult has only the tested ability of a child of 7 years is certainly much more impressive than his score in other quantitative terms. It will probably always be desirable, therefore, to supplement any other method of scoring by a statement of the individual's test age.
With our present series of tests, the percentage method will best provide a concept of the equivalence of the borderlines at different ages provided the form of the distribution does not remain uniform. I discussed this question briefly in connection with units of measurement. In considering curves of development, I assembled some of the evidence which makes the assumption of normal distribution or even of a constant skewness at least uncertain. In my opinion the weight of the evidence is against the hypothesis that the distributions retain a constant form during the period of development. If this were clearly demonstrated, both the ratio methods and deviation would fail to express equivalent borderlines for the different ages with the Binet scales. A fixed multiple of the standard deviation or a fixed quotient would exclude different percentages of the population at each age when the skewness varied. By reference to Figures 3 and 5, it can be seen that, if our physical units in which we expressed the measurement were uniform and ability always extended to the same absolute zero point, it is true that .01 of the physical units reached by the best at each age would be the same relative amount of ability of the best at each age, stated in physical units, regardless of the form of the distributions. Such a concept, however, has an unknown biological or social significance so far as I can see, except for a constant form of distribution. The same relative physical score compared with the highest at each age, theoretically might exclude the lowest 40% of one age group, for example, and only 10% of another group provided the distribution varied enough in form. The concept of the same relative amount of ability measured in physical units, so soon as the form of distribution varies from age to age, thus loses significance in terms of the struggle for existence. 
In that struggle, a vital question is—do the individuals at different ages have to struggle to overcome the same relative number of opponents of better ability at their age? If they do, the individuals might properly be regarded as in equivalent positions in the struggle for social survival, disregarding how far the next better individual is above them on the objective scale. This is the concept accepted by the percentage definition of the borderline as the best available under uncertain forms of distribution.
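The claim that a fixed multiple of the standard deviation excludes different percentages when the form of distribution varies can be illustrated with two textbook distributions. Both are chosen here only for convenience and are not proposed as the actual form of the distribution of ability: a normal curve, and a mirrored exponential curve with a long lower tail, for which the proportion beyond a given multiple has a closed form.

```python
import math
from statistics import NormalDist


def normal_fraction_below(k):
    """Fraction of a normal distribution more than k S. D. below the mean."""
    return NormalDist().cdf(-k)


def left_skewed_fraction_below(k):
    """Same fraction for a mirrored exponential distribution (long lower tail).

    For X = -E with E ~ Exponential(1): mean = -1 and S. D. = 1, so
    P(X < mean - k * S.D.) = P(E > 1 + k) = exp(-(1 + k)).
    """
    return math.exp(-(1.0 + k))


# The same borderline of -2 S. D. under the two forms of distribution.
print(f"normal:      {100 * normal_fraction_below(2):.2f}%")
print(f"left-skewed: {100 * left_skewed_fraction_below(2):.2f}%")
```

Under these assumed forms the same -2 S. D. borderline shuts out roughly 2.3% in the one case and roughly 5.0% in the other, whereas a percentage borderline would exclude the same share in both.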
The recent rapid perfection of objective scales to measure educational products, like ability in handwriting, etc., in equal units running to an absolute zero of ability, suggests that it might be possible ultimately to state the borderline of deficiency in terms of the same relative objective distance between the best and zero ability at each age on a scale of general ability. This ideal could be approached, for example, with the Sylvester form-board test in which the units are seconds required to complete the same task, if we could agree upon a maximum number of seconds without success which should mean no ability, and if this zero should remain the same at each age. It would only be necessary to take, for example, the best position or the median or the upper quartile at each age as the other point of reference. We could then say that a borderline in physical units was always, for example, .01 of the median record at each age above zero. Such a method would provide relatively equal objective borderlines at each age and it would afford a measure which would take into account the ability of the individuals to be competed against instead of merely counting them as the percentage method must. It would be better than a description in units of the standard deviation in that its significance would be more easily understood if the form of distribution varied with age.
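One possible reading of this relative objective borderline, for a timed test such as the form-board, is sketched below. It assumes that ability is measured as the number of seconds by which a record beats the agreed zero, and that the borderline is a fixed fraction of the median ability at each age. The function name and all numerical values are hypothetical.

```python
def relative_borderline_seconds(zero_seconds, median_seconds, fraction):
    """Time (in seconds) marking the borderline on a timed test.

    Assumed reading: ability = zero_seconds - time taken, and the borderline
    ability is `fraction` of the median ability at that age.
    """
    if not 0 < median_seconds < zero_seconds:
        raise ValueError("median must lie between 0 and the agreed zero point")
    median_ability = zero_seconds - median_seconds
    return zero_seconds - fraction * median_ability


# Hypothetical agreed zero of 300 seconds and median records at two ages.
for age, median in ((8, 30.0), (12, 20.0)):
    print(age, relative_borderline_seconds(300.0, median, 0.01))
```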
To demonstrate its worth, however, this method of defining the borderline in terms of the same proportion of the physical difference between zero and the median at each age, would also have to provide a better prediction of ultimate social failure. It would have to be shown that individuals below the relative objective borderline at maturity were below the same relative objective borderline during immaturity. Moreover, it would have to be shown that this relationship was closer than it would be with percentile records. It is a form of this relative objective measurement which Otis advocates in his “absolute intelligence quotient,” which he proposes as logically the best measure of ability. It consists of the score of the individual, measured in equal absolute units of intelligence, divided by his age (163).
While a relative objective borderline might under certain circumstances afford a better criterion than the same lowest percentage of individuals, there are two very serious practical difficulties which at present make it impossible. In the first place, with the exception of a few motor tests, there are no test results with children of different ages measured in terms of equal objective units for the same task. Even if the Binet year units are equal, as applied to the same task, there is no accurate means of dividing the year units into smaller physical units on the basis of scores with the tests. This makes the use of the Binet scale impossible and we should be forced back upon such tests as the form-board, the ergograph, etc., for which we should have to agree upon an absolute zero of ability. Moreover, mental tests do not lend themselves to measurement in terms merely of rapidity in doing the same task or in terms of other equal physical units since the quality of the work also has to be evaluated and this is usually done in units assumed arbitrarily to measure equivalent degrees of perfection.
The second practical difficulty which at present makes a relative objective borderline impossible is that we know nothing as to the prediction of social failure and success from relative positions on the objective scale used even with the few isolated tests that might be made available. Until we have data on this question, as well as scales of tests for native ability that are measurable to zero ability in objective terms, the percentage method affords the only available way of stating equivalent borderlines when the form of distribution changes.
If the age of arrest of development shifts either earlier or later with different degrees of capacity, then there seems to be no logical escape from a change in the form of distribution. Stern recognized this when he concluded that idiots reach an arrest of development earlier than those better endowed, so he stated that his quotient would not hold for them. He said:
“The feeble-minded child, it must be remembered, not only has a slower rate of development than the normal child, but also reaches a stage of arrest at an age when the normal child's intelligence is still pushing forward in its development. At this time, then, the cleft between the two will be markedly widened.
“From this consideration it follows that the mental quotient can hold good as an index of feeble-mindedness only during that period when the development of the feeble-minded individual is still in progress. It is for this reason that there is no use in calculating the quotient for idiots, because, in their case the stage of arrested development has been entered upon long before the ages at which they are being subjected to examination” (188).
Perhaps the most interesting characteristic of the percentage method is that it automatically adjusts itself to any form of distribution. In case the distributions of ability turn out to be normal for each age and the arrests of development for different degrees of ability distribute alike, then the borderline fixed by the percentage method becomes identical with the corresponding borderlines by the quotient, deviation, or relative objective distance. It can be directly translated into a quotient or a multiple of the standard deviation. This fact affords a good check upon the empirical borderlines fixed by the percentage method for different ages. If the distribution is normal, the lowest 1.5% and 0.5% would be identical with -2.17 S. D. and -2.575 S. D. in samples of 10,000 cases. We may check these percentage borderlines by Goddard's results for ages 5-11 tested with the 1908 Binet scale. I have given the standard deviation for the ages 5-11 with this data in Chap. XIII a, 2. Applying the criterion of 2.575 S. D. to these deviations, we find that to be in the lowest 0.5%, if the distribution were normal, would be about a year less of deficiency than we have suggested, while Pearson's borderline of -4 S. D. would be close to that we suggest. The empirical data thus suggest that the assumption of a normal distribution is faulty at the borderline or else Goddard's data is incorrect for fixing the limits on the scales. I have already given the evidence for supposing that the distribution is skewed during the years of growth.
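The translation of a percentage borderline into a multiple of the standard deviation, valid only on the assumption of a normal distribution, can be checked with the inverse normal function of the Python standard library. The helper name `sd_multiple_for_lowest` is an assumption made for illustration.

```python
from statistics import NormalDist


def sd_multiple_for_lowest(percent):
    """S. D. multiple below which the lowest `percent` of a normal curve lies."""
    return NormalDist().inv_cdf(percent / 100.0)


# The two percentage borderlines used in this study.
for percent in (1.5, 0.5):
    print(percent, round(sd_multiple_for_lowest(percent), 3))
# approximately -2.170 and -2.576, agreeing with the multiples quoted above
```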
When approximately random samples are not available, a multiple of the deviation of an efficient group such as -4 S. D. at the particular age seems to afford a practical way of discovering a tentative borderline until a random sample can be measured. The serious theoretical objections to such a procedure as a regular method are that the efficient group would be selected by the subjective standard of somebody's opinion and that the form of distribution of ability may vary from age to age.
Recalling the practical advantages of the percentage method which we enumerated in the preceding section, we can now better understand the value of a method that is not disturbed by the form of distribution of mental capacity which may ultimately be found to prevail at different ages. It is safer at present to assume that the distributions do change enough in form at the lower end seriously to affect the borderlines of deficiency as defined by other methods. Even if the form of distribution should prove to remain uniform, however, it would first be necessary for those advocating the use of any of the other quantitative definitions to show that the units of their scales are equal under some reasonable hypothesis. A ratio or a deviation statable only in scale units which are not demonstrably equal is a hazard, with the chances badly weighted against its reliability. So far as both the Binet and the Point scales are concerned we have found that the units are not equal. A quotient or coefficient arrived at by assuming their equality is sure to mean seriously erroneous fluctuations in the borderlines.
Referring to the percentage method, Yerkes and Wood say: “Frequency of occurrence is unquestionably a useful datum, which should be presented, if not instead of, then in addition to, certain other statistical indices which possess greater scientific value” (226). These other indices require both equal scale units and uniform distributions from age to age. The ratio and deviation methods fail at present in both of these particulars, so that it seems necessary to depend upon the percentage definition of tested deficiency, incomplete as that may be.
This leaves us in the unfortunate situation that the borderline positions on the scale will have to be stated separately for each age and will have to be found empirically. Moreover, we shall need to determine more accurately in what lowest percentage an individual must test in order reasonably to predict that he will require social care for the good of himself and society.
As soon as anybody can discover a means of defining the borderline, which is equally accurate and significant, and which, in addition to counting the proportion of better individuals to be met in the competition of life, will also evaluate the distance they are above the borderline, we all shall be eager to accept this better criterion of deficiency. A form which it might take is that of relative objective distance between zero and median ability. If measurable in equal objective units, this would be independent of the form of distribution and would improve the quantitative description of equivalent deficiency, provided that it also forecasted future social failure as well as the percentage method does.
Whatever form of stating the borderline of tested deficiency may ultimately meet with approval, a verbal definition of feeble-mindedness will never become an ideal scientific statement until it finds expression in quantitative terms.