The measurement of intelligence

FOOTNOTES:

[7] See p. 13 ff.

[8] See p. 169 ff. of reference 2, at end of this book.

[9] See p. 182 ff. of reference 2 at end of this book.

CHAPTER III
DESCRIPTION OF THE BINET-SIMON METHOD

Essential nature of the scale.

The Binet scale is made up of an extended series of tests in the nature of “stunts,” or problems, success in which demands the exercise of intelligence. As left by Binet, the scale consists of 54 tests, so graded in difficulty that the easiest lie well within the range of normal 3-year-old children, while the hardest tax the intelligence of the average adult. The problems are designed primarily to test native intelligence, not school knowledge or home training. They try to answer the question “How intelligent is this child?” How much the child has learned is of significance only in so far as it throws light on his ability to learn more.

Binet fully appreciated the fact that intelligence is not homogeneous, that it has many aspects, and that no one kind of test will display it adequately. He therefore assembled for his intelligence scale tests of many different types, some of them designed to display differences of memory, others differences in power to reason, ability to compare, power of comprehension, time orientation, facility in the use of number concepts, power to combine ideas into a meaningful whole, the maturity of apperception, wealth of ideas, knowledge of common objects, etc.

How the scale was derived.

The tests were arranged in order of difficulty, as found by trying them upon some 200 normal children of different ages from 3 to 15 years. It was found, for illustration, that a certain test was passed by only a very small proportion of the younger children, say the 5-year-olds, and that the number passing this test increased rapidly in the succeeding years until by the age of 7 or 8 years, let us say, practically all the children were successful. If, in our supposed case, the test was passed by about two thirds to three fourths of the normal children aged 7 years, it was considered by Binet a test of 7-year intelligence. In like manner, a test passed by 65 to 75 per cent of the normal 9-year-olds was considered a test of 9-year intelligence, and so on. By trying out many different tests in this way it was possible to secure five tests to represent each age from 3 to 10 years (excepting age 4, which has only four tests), five for age 12, five for 15, and five for adults, making 54 tests in all.
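The placement rule just described can be sketched in a few lines of Python. This is an illustration only, not Binet's own procedure: the function name binet_age_level and the pass rates are hypothetical, and the rule is simplified to "the youngest age at which between two thirds and three fourths of the normal children pass." The second part merely checks the count of tests stated above.

```python
def binet_age_level(pass_rates, low=2 / 3, high=3 / 4):
    """Return the youngest age whose pass rate lies in [low, high], else None.

    pass_rates maps age in years to the fraction of normal children passing.
    The band of two thirds to three fourths is taken from the text; treating
    it as a hard cut-off is a simplifying assumption.
    """
    for age in sorted(pass_rates):
        if low <= pass_rates[age] <= high:
            return age
    return None


# Hypothetical pass rates for a single test (not Binet's data).
example_rates = {5: 0.10, 6: 0.40, 7: 0.70, 8: 0.95}
print(binet_age_level(example_rates))  # 7

# Tests per level as stated in the text: five per level,
# except age 4, which has only four.
tests_per_level = {3: 5, 4: 4, 5: 5, 6: 5, 7: 5, 8: 5, 9: 5, 10: 5,
                   12: 5, 15: 5, "adult": 5}
print(sum(tests_per_level.values()))  # 54
```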

List of tests.

The following is the list of tests as arranged by Binet in 1911, shortly before his untimely death:—

Age 3:

Points to nose, eyes, and mouth.
Repeats two digits.
Enumerates objects in a picture.
Gives family name.
Repeats a sentence of six syllables.

Age 4:

Gives his sex.
Names key, knife, and penny.
Repeats three digits.
Compares two lines.

Age 5:

Compares two weights.
Copies a square.
Repeats a sentence of ten syllables.
Counts four pennies.
Unites the halves of a divided rectangle.

Age 6:

Distinguishes between morning and afternoon.
Defines familiar words in terms of use.
Copies a diamond.
Counts thirteen pennies.
Distinguishes pictures of ugly and pretty faces.

Age 7:

Shows right hand and left ear.
Describes a picture.
Executes three commissions, given simultaneously.
Counts the value of six sous, three of which are double.
Names four cardinal colors.

Age 8:

Compares two objects from memory.
Counts from 20 to 0.
Notes omissions from pictures.
Gives day and date.
Repeats five digits.

Age 9:

Gives change from twenty sous.
Defines familiar words in terms superior to use.
Recognizes all the pieces of money.
Names the months of the year, in order.
Answers easy “comprehension questions.”

Age 10:

Arranges five blocks in order of weight.
Copies drawings from memory.
Criticizes absurd statements.
Answers difficult “comprehension questions.”
Uses three given words in not more than two sentences.

Age 12:

Resists suggestion.
Composes one sentence containing three given words.
Names sixty words in three minutes.
Defines certain abstract words.
Discovers the sense of a disarranged sentence.

Age 15:

Repeats seven digits.
Finds three rhymes for a given word.
Repeats a sentence of twenty-six syllables.
Interprets pictures.
Interprets given facts.

Adult:

Solves the paper-cutting test.
Rearranges a triangle in imagination.
Gives differences between pairs of abstract terms.
Gives three differences between a president and a king.
Gives the main thought of a selection which he has heard read.

It should be emphasized that merely to name the tests in this way gives little idea of their nature and meaning, and tells nothing about Binet’s method of conducting the 54 experiments. In order to use the tests intelligently it is necessary to acquaint one’s self thoroughly with the purpose of each test, its correct procedure, and the psychological interpretation of different types of response.[10]

In fairness to Binet, it should also be borne in mind that the scale of tests was only a rough approximation to the ideal which the author had set himself to realize. Had his life been spared a few years longer, he would doubtless have carried the method much nearer perfection.

How the scale is used.

By means of the Binet tests we can judge the intelligence of a given individual by comparison with standards of intellectual performance for normal children of different ages. In order to make the comparison it is only necessary to begin the examination of the subject at a point in the scale where all the tests are passed successfully, and to continue up the scale until no more successes are possible. Then we compare our subject’s performances with the standard for normal children of the same age, and note the amount of acceleration or retardation.

Let us suppose the subject being tested is 9 years of age. If he goes as far in the tests as normal 9-year-old children ordinarily go, we can say that the child has a “mental age” of 9 years, which in this case is normal (our child being 9 years of age). If he goes only as far as normal 8-year-old children ordinarily go, we say that his “mental age” is 8 years. In like manner, a mentally defective child of 9 years may have a “mental age” of only 4 years, or a young genius of 9 years may have a mental age of 12 or 13 years.
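The comparison in the preceding paragraph amounts to a simple subtraction. The sketch below is illustrative; the function names are hypothetical, and it assumes the mental age has already been determined from the examination (how credit for individual tests is converted into a mental age is not covered here).

```python
def acceleration_or_retardation(mental_age, chronological_age):
    """Return mental age minus chronological age, in years.

    Positive values mean acceleration, negative values retardation.
    """
    return mental_age - chronological_age


def describe(mental_age, chronological_age):
    """Put the difference into words."""
    diff = acceleration_or_retardation(mental_age, chronological_age)
    if diff > 0:
        return f"accelerated by {diff} year(s)"
    if diff < 0:
        return f"retarded by {-diff} year(s)"
    return "at age"


# The 9-year-old subjects mentioned in the text.
for mental_age in (9, 8, 4, 12):
    print(mental_age, describe(mental_age, 9))
```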

Special characteristics of the Binet-Simon method.

Psychologists had experimented with intelligence tests for at least twenty years before the Binet scale made its appearance. The question naturally suggests itself why Binet should have been successful in a field where previous efforts had been for the most part futile. The answer to this question is found in three essential differences between Binet’s method and those formerly employed.

1. The use of age standards.

Binet was the first to utilize the idea of age standards, or norms, in the measurement of intelligence. It will be understood, of course, that Binet did not set out to invent tests of 10-year intelligence, 6-year intelligence, etc. Instead, as already explained, he began with a series of tests ranging from very easy to very difficult, and by trying these tests on children of different ages and noting the percentages of successes in the various years, he was able to locate them (approximately) in the years where they belonged.

This plan has the great advantage of giving us standards which are easily grasped. To say, for illustration, that a given subject has a grade of intelligence equal to that of the average child of 8 years is a statement whose general import does not need to be explained. Previous investigators had worked with subjects the degree of whose intelligence was unknown, and with tests the difficulty of which was equally unknown. An immense amount of ingenuity was spent in devising tests which were used in such a way as to preclude any very meaningful interpretation of the responses.

The Binet method enables us to characterize the intelligence of a child in a far more definite way than had hitherto been possible. Current descriptive terms like “bright,” “moderately bright,” “dull,” “very dull,” “feeble-minded,” etc., have had no universally accepted meaning. A child who is designated by one person as “moderately bright” may be called “very bright” by another person. The degree of intelligence which one calls “moderate dullness,” another may call “extreme dullness,” etc. But every one knows what is meant by the term 8-year mentality, 4-year mentality, etc., even if he is not able to define these grades of intelligence in psychological terms; and by ascertaining experimentally what intellectual tasks children of different ages can perform, we are, of course, able to make our age standards as definite as we please.

Why should a device so simple have waited so long for a discoverer? We do not know. It is of a class with many other unaccountable mysteries in the development of scientific method. Apparently the idea of an age-grade method, as this is called, did not come to Binet himself until he had experimented with intelligence tests for some fifteen years. At least his first provisional scale, published in 1905, was not made up according to the age-grade plan. It consisted merely of 30 tests, arranged roughly in order of difficulty. Although Binet nowhere gives any account of the steps by which this crude and ungraded scale was transformed into the relatively complete age-grade scale of 1908, we can infer that the original and ingenious idea of utilizing age norms was suggested by the data collected with the 1905 scale. However the discovery was made, it ranks, perhaps, from the practical point of view, as the most important in all the history of psychology.

2. The kind of mental functions brought into play.

In the second place, the Binet tests differ from most of the earlier attempts in that they are designed to test the higher and more complex mental processes, instead of the simpler and more elementary ones. Hence they set problems for the reasoning powers and ingenuity, provoke judgments about abstract matters, etc., instead of attempting to measure sensory discrimination, mere retentiveness, rapidity of reaction, and the like. Psychologists had generally considered the higher processes too complex to be measured directly, and accordingly sought to get at them indirectly by correlating supposed intelligence with simpler processes which could readily be measured, such as reaction time, rapidity of tapping, discrimination of tones and colors, etc. While they were disputing over their contradictory findings in this line of exploration, Binet went directly to the point and succeeded where they had failed.

It is now generally admitted by psychologists that higher intelligence is little concerned in such elementary processes as those mentioned above. Many of the animals have keen sensory discrimination. Feeble-minded children, unless of very low grade, do not differ very markedly from normal children in sensitivity of the skin, visual acuity, simple reaction time, type of imagery, etc. But in power of comprehension, abstraction, and ability to direct thought, in the nature of the associative processes, in amount of information possessed, and in spontaneity of attention, they differ enormously.

3. Binet would test “general intelligence.”

Finally, Binet’s success was largely due to his abandonment of the older “faculty psychology” which, far from being defunct, had really given direction to most of the earlier work with mental tests. Where others had attempted to measure memory, attention, sense discrimination, etc., as separate faculties or functions, Binet undertook to ascertain the general level of intelligence. Others had thought the task easier of accomplishment by measuring each division or aspect of intelligence separately, and summating the results. Binet, too, began in this way, and it was only after years of experimentation by the usual methods that he finally broke away from them and undertook, so to speak, to triangulate the height of his tower without first getting the dimensions of the individual stones which made it up.

The assumption that it is easier to measure a part, or one aspect, of intelligence than all of it, is fallacious in that the parts are not separate parts and cannot be separated by any refinement of experiment. They are interwoven and intertwined. Each ramifies everywhere and appears in all other functions. The analogy of the stones of the tower does not really apply. Memory, for example, cannot be tested separately from attention, or sense-discrimination separately from the associative processes. After many vain attempts to disentangle the various intellective functions, Binet decided to test their combined functional capacity without any pretense of measuring the exact contribution of each to the total product. It is hardly too much to say that intelligence tests have been successful just to the extent to which they have been guided by this aim.

Memory, attention, imagination, etc., are terms of “structural psychology.” Binet’s psychology is dynamic. He conceives intelligence as the sum total of those thought processes which consist in mental adaptation. This adaptation is not explicable in terms of the old mental “faculties.” No one of these can explain a single thought process, for such process always involves the participation of many functions whose separate rôles are impossible to distinguish accurately. Instead of measuring the intensity of various mental states (psycho-physics), it is more enlightening to measure their combined effect on adaptation. Using a biological comparison, Binet says the old “faculties” correspond to the separate tissues of an animal or plant, while his own “scheme of thought” corresponds to the functioning organ itself. For Binet, psychology is the science of behavior.

Binet’s conception of general intelligence.

In devising tests of intelligence it is, of course, necessary to be guided by some assumption, or assumptions, regarding the nature of intelligence. To adopt any other course is to depend for success upon happy chance.

However, it is impossible to arrive at a final definition of intelligence on the basis of a-priori considerations alone. To demand, as critics of the Binet method have sometimes done, that one who would measure intelligence should first present a complete definition of it, is quite unreasonable. As Stern points out, electrical currents were measured long before their nature was well understood. Similar illustrations could be drawn from the processes involved in chemistry, physiology, and other sciences. In the case of intelligence it may be truthfully said that no adequate definition can possibly be framed which is not based primarily on the symptoms empirically brought to light by the test method. The best that can be done in advance of such data is to make tentative assumptions as to the probable nature of intelligence, and then to subject these assumptions to tests which will show their correctness or incorrectness. New hypotheses can then be framed for further trial, and thus gradually we shall be led to a conception of intelligence which will be meaningful and in harmony with all the ascertainable facts.

Such was the method of Binet. Only those unacquainted with Binet’s more than fifteen years of labor preceding the publication of his intelligence scale would think of accusing him of making no effort to analyze the mental processes which his tests bring into play. It is true that many of Binet’s earlier assumptions proved untenable, and in this event he was always ready, with exceptional candor and intellectual plasticity, to acknowledge his error and to plan a new line of attack.

Binet’s conception of intelligence emphasizes three characteristics of the thought process: (1) Its tendency to take and maintain a definite direction; (2) the capacity to make adaptations for the purpose of attaining a desired end; and (3) the power of auto-criticism.[11]

How these three aspects of intelligence enter into the performances with various tests of the scale is set forth from time to time in our directions for giving and interpreting the individual tests.[12] An illustration which may be given here is that of the “patience test,” or uniting the disarranged parts of a divided rectangle. As described by Binet, this operation has the following elements: “(1) to keep in mind the end to be attained, that is to say, the figure to be formed; (2) to try different combinations under the influence of this directing idea, which guides the efforts of the subject even though he may not be conscious of the fact; and (3) to judge the combination which has been made, to compare it with the model, and to decide whether it is the correct one.”

Much the same processes are called for in many other of the Binet tests, particularly those of arranging weights, rearranging dissected sentences, drawing a diamond or square from copy, finding a sentence containing three given words, counting backwards, etc.

However, an examination of the scale will show that the choice of tests was not guided entirely by any single formula as to the nature of intelligence. Binet’s approach was a many-sided one. The scale includes tests of time orientation, of three or four kinds of memory, of apperception, of language comprehension, of knowledge about common objects, of free association, of number mastery, of constructive imagination, and of ability to compare concepts, to see contradictions, to combine fragments into a unitary whole, to comprehend abstract terms, and to meet novel situations.

Other conceptions of intelligence.

It is interesting to compare Binet’s conception of intelligence with the definitions which have been offered by other psychologists. According to Ebbinghaus, for example, the essence of intelligence lies in comprehending together in a unitary, meaningful whole, impressions and associations which are more or less independent, heterogeneous, or even partly contradictory. “Intellectual ability consists in the elaboration of a whole into its worth and meaning by means of many-sided combination, correction, and completion of numerous kindred associations.... It is a combination activity.”

Meumann offers a twofold definition. From the psychological point of view, intelligence is the power of independent and creative elaboration of new products out of the material given by memory and the senses. From the practical point of view, it involves the ability to avoid errors, to surmount difficulties, and to adjust to environment.

Stern defines intelligence as “the general capacity of an individual consciously to adjust his thinking to new requirements: it is general adaptability to new problems and conditions of life.”

Spearman, Hart, and others of the English school define intelligence as a “common central factor” which participates in all sorts of special mental activities. This factor is explained in terms of a psycho-physiological hypothesis of “cortex energy,” “cerebral plasticity,” etc.

The above definitions are only to a slight extent contradictory or inharmonious. They differ mainly in point of view or in the location of the emphasis. Each expresses a part of the truth, and none all of it. It will be evident that the conception of Binet is broad enough to include the most important elements in each of the other definitions quoted.

Guiding principles in choice and arrangement of tests.

In choosing his tests Binet was guided by the conception of intelligence which we have set forth above. Tests were devised which would presumably bring into play the various mental processes thought to be concerned in intelligence, and then these tests were tried out on normal children of different ages. If the percentage of passes for a given test increased but little or not at all in going from younger to older children this test was discarded. On the other hand, if the proportion of passes increased rapidly with age, and if children of a given age, who on other grounds were known to be bright, passed more frequently than children of the same age who were known to be dull, then the test was judged a satisfactory test of intelligence. As we have shown elsewhere,[13] practically all of Binet’s tests fulfill these requirements reasonably well, a fact which bears eloquent testimony to the keen psychological insight of their author.
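The two requirements named above (passes must increase with age, and bright children must pass more often than dull children of the same age) can be expressed as a small check. This is a sketch under stated assumptions: the function names and data are hypothetical, and the threshold min_gain is an arbitrary stand-in for "increased rapidly," which the text does not quantify.

```python
def rises_with_age(pass_rates_by_age, min_gain=0.30):
    """True if the pass rate gains at least min_gain from youngest to oldest age.

    min_gain is an assumed threshold; the text gives no figure.
    """
    ages = sorted(pass_rates_by_age)
    gain = pass_rates_by_age[ages[-1]] - pass_rates_by_age[ages[0]]
    return gain >= min_gain


def favors_bright(bright_pass_rate, dull_pass_rate):
    """True if children judged bright pass more often than those judged dull."""
    return bright_pass_rate > dull_pass_rate


def is_satisfactory_test(pass_rates_by_age, bright_pass_rate, dull_pass_rate):
    """Apply both requirements; a test failing either would be discarded."""
    return (rises_with_age(pass_rates_by_age)
            and favors_bright(bright_pass_rate, dull_pass_rate))


# Hypothetical data for two candidate tests.
steep = {6: 0.20, 7: 0.50, 8: 0.85}
flat = {6: 0.50, 7: 0.52, 8: 0.55}
print(is_satisfactory_test(steep, bright_pass_rate=0.80, dull_pass_rate=0.30))
print(is_satisfactory_test(flat, bright_pass_rate=0.60, dull_pass_rate=0.50))
```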

In arranging the tests into a system Binet’s guiding principle was to find an arrangement of the tests which would cause an average child of any given age to test “at age”; that is, the average 5-year-old must show a mental age of 5 years, the average 8-year-old a mental age of 8 years, etc. In order to secure this result Binet found that his data seemed to require the location of an individual test in that year where it was passed by about two thirds to three fourths of unselected children.

It was in the assembling of the tests that the most serious faults of the scale had their origin. Further investigation has shown that a great many of the tests were misplaced as much as one year, and several of them two years. On the whole, the scale as Binet left it was decidedly too easy in the lower ranges, and too difficult in the upper. As a result, the average child of 5 years was caused to test at not far from 6 years, the average child of 12 years not far from 11. In the Stanford revision an effort has been made to correct this fault, along with certain other generally recognized imperfections.

Some avowed limitations of the Binet tests.

The Binet tests have often been criticized for their unfitness to perform certain services which in reality they were never meant to render. This is unfair. We cannot make a just evaluation of the scale without bearing in mind its avowed limitations.

For example, the scale does not pretend to measure the entire mentality of the subject, but only general intelligence. There is no pretense of testing the emotions or the will beyond the extent to which these naturally display themselves in the tests of intelligence. The scale was not designed as a tool for the analysis of those emotional or volitional aberrations which are concerned in such mental disorders as hysteria, insanity, etc. These conditions do not present a progressive reduction of intelligence to the infantile level, and in most of them other factors besides intelligence play an important rôle. Moreover, even in the normal individual the fruitfulness of intelligence, the direction in which it shall be applied, and its methods of work are to a certain extent determined by the extraneous factors of emotion and volition.

It should, nevertheless, be pointed out that defects of intelligence, in a large majority of cases, also involve disturbances of the emotional and volitional functions. We do not expect to find perfectly normal emotions or will power of average strength coupled with marked intellectual deficiency, and as a matter of fact such a combination is rare indeed. In the course of an examination with the Binet tests, the experienced clinical psychologist is able to gain considerable insight into the subject’s emotional and volitional equipment, even though the method was designed primarily for another purpose.

A second misunderstanding can be avoided by remembering that the Binet scale does not pretend to bring to light the idiosyncrasies of special talent, but only to measure the general level of intelligence. It cannot be used for the discovery of exceptional ability in drawing, painting, music, mathematics, oratory, salesmanship, etc., because no effort is made to explore the processes underlying these abilities. It can, therefore, never serve as a detailed chart for the vocational guidance of children, telling us which will succeed in business, which in art, which in medicine, etc. It is not a new kind of phrenology. At the same time, as we have already pointed out, it is capable of bounding roughly the vocational territory in which an individual’s intelligence will probably permit success, nothing else preventing.[14]

In the third place, it must not be supposed that the scale can be used as a complete pedagogical guide. Although intelligence tests furnish data of the greatest significance for pedagogical procedure, they do not suggest the appropriate educational methods in detail. These will have to be worked out in a practical way for the various grades of intelligence, and at great cost of labor and patience.

Finally, in arriving at an estimate of a subject’s grade of intelligence and his susceptibility to training, it would be a mistake to ignore the data obtainable from other sources. No competent psychologist, however ardent a supporter of the Binet method he might be, would recommend such a policy. Those who accept the method as all-sufficient are as much in error as those who consider it as no more important than any one of a dozen other approaches. Standardized tests have already become and will remain by far the most reliable single method for grading intelligence, but the results they furnish will always need to be interpreted in the light of supplementary information regarding the subject’s personal history, including medical record, accidents, play habits, industrial efficiency, social and moral traits, school success, home environment, etc. Without question, however, the improved Binet tests will contribute more than all other data combined to the end of enabling us to forecast a child’s possibilities of future improvement, and this is the information which will aid most in the proper direction of his education.

FOOTNOTES:

[10] See Part II of this volume, and References 1 and 29, for discussion and interpretation of the individual tests.

[11] See Binet and Simon: “L’intelligence des imbeciles,” in L’Année Psychologique (1909), pp. 1–147. The last division of this article is devoted to a discussion of the essential nature of the higher thought processes, and is a wonderful example of that keen psychological analysis in which Binet was so gifted.

[12] See especially pages 162 and 238.

[13] See p. 55.

[14] See p. 17.

CHAPTER IV
NATURE OF THE STANFORD REVISION AND EXTENSION

Although the Binet scale quickly demonstrated its value as an instrument for the classification of mentally-retarded and otherwise exceptional children, it had, nevertheless, several imperfections which greatly limited its usefulness. There was a dearth of tests at the higher mental levels, the procedure was so inadequately defined that needless disagreement came about in the interpretation of data, and so many of the tests were misplaced as to make the results of an examination more or less misleading, particularly in the case of very young subjects and those near the adult level. It was for the purpose of correcting these and certain other faults that the Stanford investigation was planned.[15]

Sources of data.

Our revision is the result of several years of work, and involved the examination of approximately 2300 subjects, including 1700 normal children, 200 defective and superior children, and more than 400 adults.

Tests of 400 of the 1700 normal children had been made by Childs and Terman in 1910–11, and of 300 children by Trost, Waddle, and Terman in 1911–12. For various reasons, however, the results of these tests did not furnish satisfactory data for a thoroughgoing revision of the scale. Accordingly a new investigation was undertaken, somewhat more extensive than the others, and more carefully planned. Its main features may be described as follows:—

1. The first step was to assemble as nearly as possible all the results which had been secured for each test of the scale by all the workers of all countries. The result was a large sheet of tabulated data for each individual test, including percentages passing the test at various ages, conditions under which the results were secured, method of procedure, etc. After a comparative study of these data, and in the light of results we had ourselves secured, a provisional arrangement of the tests was prepared for try-out.

2. In addition to the tests of the original Binet scale, 40 additional tests were included for try-out. This, it was expected, would make possible the elimination of some of the least satisfactory tests, and at the same time permit the addition of enough new ones to give at least six tests, instead of five, for each age group.

3. A plan was then devised for securing subjects who should be as nearly as possible representative of the several ages. The method was to select a school in a community of average social status, a school attended by all or practically all the children in the district where it was located. In order to get clear pictures of age differences the tests were confined to children who were within two months of a birthday. To avoid accidental selection, all the children within two months of a birthday were tested, in whatever grade enrolled. Tests of foreign-born children, however, were eliminated in the treatment of results. There remained tests of approximately 1000 children, of whom 905 were between 5 and 14 years of age.

4. The children’s responses were, for the most part, recorded verbatim. This made it possible to re-score the records according to any desired standard, and thus to fit a test more perfectly to the age level assigned it.

5. Much attention was given to securing uniformity of procedure. A half-year was devoted to training the examiners and another half-year to the supervision of the testing. In the further interests of uniformity all the records were scored by one person (the writer).
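The selection rule in step 3 can be illustrated as follows. The sketch assumes ages are recorded in whole months and that "within two months of a birthday" includes exactly two months; both are assumptions, as are the function names. It also folds the later elimination of foreign-born children into the same check, whereas in the investigation those children were tested and only their results were set aside.

```python
def months_from_nearest_birthday(age_in_months):
    """Distance in months from the nearest birthday, past or coming."""
    remainder = age_in_months % 12
    return min(remainder, 12 - remainder)


def eligible(age_in_months, foreign_born, window_months=2):
    """True if the child's record would be kept under the rule in step 3."""
    near_birthday = months_from_nearest_birthday(age_in_months) <= window_months
    return near_birthday and not foreign_born


# Hypothetical children: 7 years 1 month, 7 years 6 months, 6 years 11 months.
print(eligible(7 * 12 + 1, foreign_born=False))   # True
print(eligible(7 * 12 + 6, foreign_born=False))   # False
print(eligible(6 * 12 + 11, foreign_born=False))  # True
```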

Method of arriving at a revision.

The revision of the scale below the 14-year level was based almost entirely on the tests of the above-mentioned 1000 unselected children. The guiding principle was to secure an arrangement of the tests and a standard of scoring which would cause the median mental age of the unselected children of each age group to coincide with the median chronological age. That is, a correct scale must cause the average child of 5 years to test exactly at 5, the average child of 6 to test exactly at 6, etc. Or, to express the same fact in terms of intelligence quotient,[16] a correct scale must give a median intelligence quotient of unity, or 100 per cent, for unselected children of each age.
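The criterion can be made concrete with a short computation. The sketch takes the intelligence quotient as the ratio of mental age to chronological age, expressed as a percentage; the function names and the sample records are hypothetical.

```python
from statistics import median


def intelligence_quotient(mental_age, chronological_age):
    """Mental age divided by chronological age, as a percentage."""
    return 100.0 * mental_age / chronological_age


def median_iq(records):
    """Median I Q of a group; records holds (mental_age, chronological_age) pairs."""
    return median(intelligence_quotient(m, c) for m, c in records)


# Three hypothetical 6-year-olds with mental ages of 5, 6, and 7.5 years.
six_year_olds = [(5.0, 6), (6.0, 6), (7.5, 6)]
print(median_iq(six_year_olds))  # 100.0
```

On a correct scale, in the sense defined above, the same computation would give a value close to 100 for the unselected children of every age group.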

If the median mental age resulting at any point from the provisional arrangement of tests was too high or too low, it was only necessary to change the location of certain of the tests, or to change the standard of scoring, until an order of arrangement and a standard of passing were found which would throw the median mental age where it belonged. We had already become convinced, for reasons too involved for presentation here, that no satisfactory revision of the Binet scale was possible on any theoretical considerations as to the percentage of passes which an individual test ought to show in a given year in order to be considered standard for that year.

As was to be expected, the first draft of the revision did not prove satisfactory. The scale was still too hard at some points, and too easy at others. In fact, three successive revisions were necessary, involving three separate scorings of the data and as many tabulations of the mental ages, before the desired degree of accuracy was secured. As finally revised, the scale gives a median intelligence quotient closely approximating 100 for the unselected children of each age from 4 to 14.
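A check of the kind each revision had to pass might look like the sketch below. It compares the median mental age of each age group with the chronological age and reports where the scale is too easy or too hard. The function names, the tolerance, and the data are all hypothetical; the actual relocation of tests and re-scoring is not modeled.

```python
from statistics import median


def scale_bias_by_age(mental_ages_by_age):
    """Median mental age minus chronological age, for each age group."""
    return {age: median(mental_ages) - age
            for age, mental_ages in sorted(mental_ages_by_age.items())}


def verdict(bias, tolerance=0.1):
    """Classify a bias in years; the tolerance is an assumed value."""
    if bias > tolerance:
        return "too easy"      # children test above their age
    if bias < -tolerance:
        return "too hard"      # children test below their age
    return "satisfactory"


# Hypothetical mental ages for unselected children of 5, 8, and 12 years.
sample = {5: [5.5, 6.0, 6.2], 8: [7.5, 8.0, 8.4], 12: [10.5, 11.0, 11.8]}
for age, bias in scale_bias_by_age(sample).items():
    print(age, bias, verdict(bias))
```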

Since our school children who were above 14 years and still in the grades were retarded left-overs, it was necessary to base the revision above this level on the tests of adults. These included 30 business men and 150 “migrating” unemployed men tested by Mr. H. E. Knollin, 150 adolescent delinquents tested by Mr. J. Harold Williams, and 50 high-school students tested by the writer.

The extension of the scale in the upper range is such that ordinarily intelligent adults, little educated, test up to what is called the “average adult” level. Adults whose intelligence is known from other sources to be superior are found to test well up toward the “superior adult” level, and this holds whether the subjects in question are well educated or practically unschooled. The almost entirely unschooled business men, in fact, tested fully as well as high-school juniors and seniors.

Figure 1 shows the distribution of mental ages for 62 adults, including the 30 business men and the 32 high-school pupils who were over 16 years of age. It will be noted that the middle section of the graph represents the “mental ages” falling between 15 and 17. This is the range which we have designated as the “average adult” level. Those above 17 are called “superior adults,” those between 13 and 15, “inferior adults.” Subjects much over 15 years of age who test in the neighborhood of 12 years may ordinarily be considered border-line cases.

Fig. 1. DISTRIBUTION OF MENTAL AGES OF 62 NORMAL ADULTS

Mental age (years and months)    Per cent of adults
13 to 13 11                      1.6%
14 to 14 11                      17.7%
15 to 15 11                      59.7%
17 to 17 11                      16.2%
18 to 18 11                      4.8%
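The adult levels named above can be written as a simple lookup. This is only a sketch: the text gives the ranges (13 to 15, 15 to 17, above 17) but not how an age falling exactly on a boundary is treated, so the boundary handling below is an assumption, as are the function name and the label returned for ages under 13.

```python
def adult_level(years, months=0):
    """Classify an adult's mental age using the ranges given in the text.

    Assumption: each range includes its lower bound, so 15-0 counts as
    "average adult" and 17-0 as "superior adult".
    """
    age_in_months = years * 12 + months
    if age_in_months >= 17 * 12:
        return "superior adult"
    if age_in_months >= 15 * 12:
        return "average adult"
    if age_in_months >= 13 * 12:
        return "inferior adult"
    # The text says adults testing near 12 may ordinarily be considered border-line.
    return "below inferior adult range"


print(adult_level(16, 6))   # average adult
print(adult_level(14, 11))  # inferior adult
```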

The following method was employed for determining the validity of a test. The children of each age level were divided into three groups according to intelligence quotient, those testing below 90, those between 90 and 109, and those with an intelligence quotient of 110 or above. The percentages of passes on each individual test at or near that age level were then ascertained separately for these three groups. If a test fails to show a decidedly higher proportion of passes in the superior I Q group than in the inferior I Q group, it cannot be regarded as a satisfactory test of intelligence. On the other hand, a test which satisfies this criterion must be accepted as valid or the entire scale must be rejected. Henceforth it stands or falls with the scale as a whole.
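The validity check described above can be sketched as follows. The grouping limits (below 90, 90 to 109, 110 or above) come from the text. The function names, the sample records, and the `min_gap` threshold standing in for "decidedly higher" are assumptions made for illustration; the text does not state a numerical cut-off.

```python
def iq_group(iq):
    """Assign one child to an I Q group using the limits given in the text."""
    if iq < 90:
        return "inferior"
    if iq < 110:
        return "average"
    return "superior"


def pass_rates(records):
    """records: iterable of (iq, passed) pairs for one test at one age level.

    Returns the percentage of passes in each I Q group that has any children.
    """
    totals = {"inferior": 0, "average": 0, "superior": 0}
    passes = {"inferior": 0, "average": 0, "superior": 0}
    for iq, passed in records:
        group = iq_group(iq)
        totals[group] += 1
        if passed:
            passes[group] += 1
    return {g: 100.0 * passes[g] / totals[g] for g in totals if totals[g] > 0}


def discriminates(rates, min_gap=20.0):
    """True if the superior group passes 'decidedly' more often than the inferior.

    min_gap (in percentage points) is a hypothetical threshold, not the author's.
    """
    if "inferior" not in rates or "superior" not in rates:
        raise ValueError("need children in both the inferior and superior groups")
    return rates["superior"] - rates["inferior"] >= min_gap


# Invented records for a single test: (intelligence quotient, passed?)
records = [
    (75, False), (80, False), (85, False), (88, True),
    (95, True), (100, False), (105, True), (108, True),
    (112, True), (115, True), (120, True), (130, True),
]
rates = pass_rates(records)
print(rates, discriminates(rates))
```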

When tried out by this method, some of the tests which have been most criticized showed a high degree of reliability; certain others which have been considered excellent proved to be so little correlated with intelligence that they had to be discarded.

After making a few necessary eliminations, 90 tests remained, or 36 more than the number included in the Binet 1911 scale. There are 6 at each age level from 3 to 10, 8 at 12, 6 at 14, 6 at “average adult,” 6 at “superior adult,” and 16 alternative tests. The alternative tests, which are distributed among the different groups, are intended to be used only as substitutes when one or more of the regular tests have been rendered, by coaching or otherwise, undesirable.[17]

Of the 36 new tests, 27 were added and standardized in the various Stanford investigations. Two tests were borrowed from the Healy-Fernald series, one from Kuhlmann, one was adapted from Bonser, and the remaining five were amplifications or adaptations of some of the earlier Binet tests.
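The counts in the two preceding paragraphs can be checked with a short tally. The figures are those given in the text; the variable names are introduced here only for illustration.

```python
# Regular tests per level, as stated in the text.
regular_tests = {
    "III": 6, "IV": 6, "V": 6, "VI": 6, "VII": 6, "VIII": 6, "IX": 6, "X": 6,
    "XII": 8, "XIV": 6, "Average Adult": 6, "Superior Adult": 6,
}
alternative_tests = 16
binet_1911_tests = 54

total_tests = sum(regular_tests.values()) + alternative_tests
new_tests = total_tests - binet_1911_tests

# Sources of the new tests, as stated in the text.
new_test_sources = {
    "Stanford investigations": 27,
    "Healy-Fernald series": 2,
    "Kuhlmann": 1,
    "Bonser (adapted)": 1,
    "adapted from earlier Binet tests": 5,
}

print(total_tests, new_tests, sum(new_test_sources.values()))
```

Both totals agree with the text: 74 regular tests plus 16 alternatives make 90, and the five sources together account for the 36 new tests.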

Following is a complete list of the tests of the Stanford revision. Those designated al. are alternative tests. The guide for giving and scoring the tests is presented at length in Part II of this volume.

The Stanford revision and extension

Year III. (6 tests, 2 months each.)
1. Points to parts of body. (3 of 4.)
  Nose; eyes; mouth; hair.
2. Names familiar objects. (3 of 5.)
  Key, penny, closed knife, watch, pencil.
3. Pictures, enumeration or better. (At least 3 objects enumerated in one picture.)
  (a) Dutch Home; (b) River Scene; (c) Post-Office.
4. Gives sex.
5. Gives last name.
6. Repeats 6 to 7 syllables. (1 of 3.)

Year IV. (6 tests, 2 months each.)
1. Compares lines. (3 trials, no error.)
2. Discrimination of forms. (Kuhlmann.) (Not over 3 errors.)
3. Counts 4 pennies. (No error.)
4. Copies square. (Pencil. 1 of 3.)
5. Comprehension, 1st degree. (2 of 3.) (Stanford addition.)
  “What must you do”: “When you are sleepy?” “Cold?” “Hungry?”
6. Repeats 4 digits. (1 of 3. Order correct.) (Stanford addition.)
7. Al. Repeats 12 to 13 syllables. (1 of 3 absolutely correct, or 2 with 1 error each.)

Year V. (6 tests, 2 months each.)
1. Comparison of weights. (2 of 3.)
  3–15; 15–3; 3–15.
2. Colors. (No error.)
  Red; yellow; blue; green.
3. Æsthetic comparison. (No error.)
4. Definitions, use or better. (4 of 6.)
  Chair; horse; fork; doll; pencil; table.
5. Patience, or divided rectangle. (2 of 3 trials. 1 minute each.)
6. Three commissions. (No error. Order correct.)
7. Al. Age.

Year VI. (6 tests, 2 months each.)
1. Right and left. (No error.)
  Right hand; left ear; right eye.
2. Mutilated pictures. (3 of 4 correct.)
3. Counts 13 pennies. (1 of 2 trials, without error.)
4. Comprehension, 2d degree. (2 of 3.) “What’s the thing for you to do”:
  (a) “If it is raining when you start to school?”
  (b) “If you find that your house is on fire?”
  (c) “If you are going some place and miss your car?”
5. Coins. (3 of 4.)
  Nickel; penny; quarter; dime.
6. Repeats 16 to 18 syllables. (1 of 3 absolutely correct, or 2 with 1 error each.)
7. Al. Morning or afternoon.

Year VII. (6 tests, 2 months each.)
1. Fingers. (No error.) Right; left; both.
2. Pictures, description or better. (Over half of performance description.)
  Dutch Home; River Scene; Post-Office.
3. Repeats 5 digits. (1 of 3. Order correct.)
4. Ties bow-knot. (Model shown. 1 minute.) (Stanford addition.)
5. Gives differences. (2 of 3.)
  Fly and butterfly; stone and egg; wood and glass.
6. Copies diamond. (Pen. 2 of 3.)
7. Al. 1. Names days of week. (Order correct. 2 of 3 checks correct.)
8. Al. 2. Repeats 3 digits backwards. (1 of 3.)

Year VIII. (6 tests, 2 months each.)
1. Ball and field. (Inferior plan or better.) (Stanford addition.)
2. Counts 20 to 1. (40 seconds. 1 error allowed.)
3. Comprehension, 3d degree. (2 of 3.) “What’s the thing for you to do”:
  (a) “When you have broken something which belongs to some one else?”
  (b) “When you are on your way to school and notice that you are in danger of being tardy?”
  (c) “If a playmate hits you without meaning to do it?”
4. Gives similarities, two things. (2 of 4.) (Stanford addition.)
  Wood and coal; apple and peach; iron and silver; ship and automobile.
5. Definitions superior to use. (2 of 4.)
  Balloon; tiger; football; soldier.
6. Vocabulary, 20 words. (Stanford addition. For list of words used, see record booklet.)
7. Al. 1. First six coins. (No error.)
8. Al. 2. Dictation. (“See the little boy.” Easily legible. Pen. 1 minute.)

Year IX. (6 tests, 2 months each.)
1. Date. (Allow error of 3 days in c, no error in a, b, or d.)
  (a) day of week;
  (b) month;
  (c) day of month;
  (d) year.
2. Weights. (3, 6, 9, 12, 15. Procedure not illustrated. 2 of 3.)
3. Makes change. (2 of 3. No coins, paper, or pencil.)
  10 − 4; 15 − 12; 25 − 4.
4. Repeats 4 digits backwards. (1 of 3.) (Stanford addition.)

Year X. (6 tests, 2 months each.)
1. Vocabulary, 30 words. (Stanford addition.)
2. Absurdities. (4 of 5. Warn. Spontaneous correction allowed.) (Four of Binet’s, one Stanford.)
3. Designs. (1 correct, 1 half correct. Expose 10 seconds.)
4. Reading and report. (8 memories. 35 seconds and 2 mistakes in reading.) (Binet’s selection.)
5. Comprehension, 4th degree. (2 of 3. Question may be repeated.)
  (a) “What ought you to say when some one asks your opinion about a person you don’t know very well?”
  (b) “What ought you to do before undertaking (beginning) something very important?”
  (c) “Why should we judge a person more by his actions than by his words?”
6. Names 60 words. (Illustrate with clouds, dog, chair, happy.)
7. Al. 1. Repeats 6 digits. (1 of 2. Order correct.) (Stanford addition.)
8. Al. 2. Repeats 20 to 22 syllables. (1 of 3 correct, or 2 with 1 error each.)
9. Al. 3. Form board. (Healy-Fernald Puzzle A. 3 times in 5 minutes.)

Year XII. (8 tests, 3 months each.)
1. Vocabulary, 40 words. (Stanford addition.)
2. Abstract words. (3 of 5.)
  Pity; revenge; charity; envy; justice.
3. Ball and field. (Superior plan.) (Stanford addition.)
4. Dissected sentences. (2 of 3. 1 minute each.)
5. Fables. (Score 4; i.e., two correct or the equivalent in half credits.) (Stanford addition.)
  Hercules and Wagoner; Maid and Eggs; Fox and Crow; Farmer and Stork; Miller, Son, and Donkey.

Year XIV. (6 tests, 4 months each.)
1. Vocabulary, 50 words. (Stanford addition.)
2. Induction test. (Gets rule by 6th folding.) (Stanford addition.)
3. President and king. (Power; accession; tenure. 2 of 3.)
4. Problems of fact. (2 of 3.) (Binet’s two and one Stanford addition.)
5. Arithmetical reasoning. (1 minute each. 2 of 3.) (Adapted from Bonser.)
6. Clock. (2 of 3. Error must not exceed 3 or 4 minutes.)
  6.22. 8.10. 2.46.
7. Al. Repeats 7 digits. (1 of 2. Order correct.)

“Average Adult.” (6 tests, 5 months each.)
1. Vocabulary, 65 words. (Stanford addition.)
2. Interpretation of fables. (Score 8.) (Stanford addition.)
3. Difference between abstract words. (3 real contrasts out of 4.)
  Laziness and idleness; evolution and revolution; poverty and misery; character and reputation.
4. Problem of the enclosed boxes. (3 of 4.) (Stanford addition.)
5. Repeats 6 digits backwards. (1 of 3.) (Stanford addition.)
6. Code, writes “Come quickly.” (2 errors. Omission of dot counts half error. Illustrate with “war” and “spy.”) (From Healy and Fernald.)
7. Al. 1. Repeats 28 syllables. (1 of 2 absolutely correct.)
8. Al. 2. Comprehension of physical relations. (2 of 3.) (Stanford addition.)
  Path of cannon ball; weight of fish in water; hitting distant mark.

“Superior Adult.” (6 tests, 6 months each.)
1. Vocabulary, 75 words. (Stanford addition.)
2. Binet’s paper-cutting test. (Draws, folds, and locates holes.)
3. Repeats 8 digits. (1 of 3. Order correct.) (Stanford addition.)
4. Repeats thought of passage heard. (1 of 2.) (Binet’s and Wissler’s selections adapted.)
5. Repeats 7 digits backwards. (1 of 3.) (Stanford addition.)
6. Ingenuity test. (2 of 3. 5 minutes each.) (Stanford addition.)
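The month values in the group headings of the list show how much mental-age credit each test carries. The sketch below assumes the usual way of combining them: the subject is credited with all the years up to a basal level at which every test is passed, and the months for each test passed above that level are added. The full procedure is the one given in Part II; the function name, the argument names, and the example subject are hypothetical.

```python
# Months of mental-age credit per test, from the group headings in the list.
MONTHS_PER_TEST = {
    "III": 2, "IV": 2, "V": 2, "VI": 2, "VII": 2, "VIII": 2, "IX": 2, "X": 2,
    "XII": 3, "XIV": 4, "Average Adult": 5, "Superior Adult": 6,
}


def mental_age_months(basal_years, passes_above):
    """Mental age in months under the assumed basal-plus-credits rule.

    basal_years:  year level at which all tests were passed (assumed credited in full).
    passes_above: dict mapping a higher level to the number of tests passed there.
    """
    total = basal_years * 12
    for level, tests_passed in passes_above.items():
        total += MONTHS_PER_TEST[level] * tests_passed
    return total


# Hypothetical child: passes everything at year VI, 4 tests at VII, 2 at VIII.
months = mental_age_months(6, {"VII": 4, "VIII": 2})
years, remainder = divmod(months, 12)
print(f"{years} years {remainder} months")
```

The weights are consistent with the spacing of the levels: six tests at 2 months each span one year, while the 8 tests at 3 months in year XII and the 6 tests at 4 months in year XIV each span two years.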

Summary of changes.

A comparison of the above list with either the Binet 1908 or 1911 series will reveal many changes. On the whole, it differs somewhat more from the Binet 1911 scale than from that of 1908. Thus, of the 49 tests below the “adult” group in the 1911 scale, 2 are eliminated and 29 are relocated. Of these, 25 are moved downward and 4 upward. The shifts are as follows:—
