Replication Study of the Comparison of IMP Students with Students Enrolled in Traditional Courses on Probability, Statistics, Problem Solving, and Reasoning

Norman L. Webb and Maritza Dowling
Wisconsin Center for Education Research
School of Education
University of Wisconsin-Madison
December 31, 1997

The Interactive Mathematics Program Evaluation Project operates under the auspices of the Wisconsin Center for Education Research and is funded under a contract with the San Francisco State University Foundation, Inc., with resources that are provided by the National Science Foundation (award number ESI-9255262). The opinions, findings, and conclusions that are expressed in this paper do not necessarily reflect those of the supporting agencies.

Table of Contents
Statistics Requirements in University Courses
Changing Needs in the Work Place
IMP Curriculum Statistics and Problem Solving
Appendix: Grade 9 Instrument

The Interactive Mathematics Program (IMP) is a four-year high school mathematics curriculum designed to be aligned with the core curriculum recommended in the National Council of Teachers of Mathematics Curriculum and Evaluation Standards. The curriculum is problem-based and incorporates traditional topics including algebra, geometry, and trigonometry with topics given less emphasis in traditional high school programs, especially statistics and probability. Initiated in 1989, IMP is one of the first complete high school mathematics curricula to be developed that puts into practice what is recommended in the NCTM Standards. This report is one of a series attempting to produce evaluative information about IMP and, as a consequence, about assumptions and recommendations in the NCTM Standards. This study is a replication of one conducted in the 1995-96 school year to judge whether students enrolled in IMP learn mathematics beyond what students learn in traditional college-preparatory mathematics classes.
The findings from the original study, which used students from only one grade at each of two schools, indicated that IMP students scored higher than students in the traditional mathematics course, controlling for prior achievement, on a test of statistical concepts and procedures and on a performance assessment. The second study, in general, replicates the findings from the original study. Grade 9 students at two high schools, different from the ones in the first study, enrolled in IMP Year 1 scored significantly higher on a five-item statistical test than grade 9 students at the same high schools enrolled in the traditional algebra I, geometry, and algebra II course sequence. Grade 10 students at the same two high schools enrolled in IMP Year 2 scored significantly higher on a two-activity performance assessment requiring reasoning, problem solving, conjecturing, and generalizing. At a third high school, grade 9 IMP students scored significantly lower on the statistical test than grade 9 students whose traditional algebra I course had been supplemented with an instructional unit directly related to what was tested. Grade 10 IMP students at this school scored higher, but not significantly so, than grade 10 students whose traditional geometry course had been supplemented with instruction directly related to content of one of the two performance assessment activities. Statistical significance was judged by analysis of covariance, using grade 8 standardized mathematics achievement scores as the covariable. The findings of the original study and its replication support the claim that students in IMP learn more statistics and are better able to solve complex problems, on the average, than students enrolled in a traditional college-preparatory mathematics class. Findings are attributable to different content coverage and instructional experiences in IMP, supporting claims by the developers of the curriculum.
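The analysis of covariance named above compares group means on the outcome after adjusting for a prior-achievement score. The following is a minimal sketch of that adjustment for two groups and one covariate, not the authors' actual analysis: the function name and all scores are hypothetical, and it computes only the covariate-adjusted mean difference, not the F test used to judge significance.

```python
from statistics import mean


def ancova_adjusted_difference(pre_a, post_a, pre_b, post_b):
    """Return (pooled within-group slope, adjusted mean difference A - B).

    pre_*  : covariate scores (e.g., a grade 8 test), one per student
    post_* : outcome scores for the same students, in the same order
    """
    def sum_cross(xs, ys):
        mx, my = mean(xs), mean(ys)
        return sum((x - mx) * (y - my) for x, y in zip(xs, ys))

    def sum_squares(xs):
        mx = mean(xs)
        return sum((x - mx) ** 2 for x in xs)

    # Pooled within-group regression slope of outcome on covariate.
    slope = (sum_cross(pre_a, post_a) + sum_cross(pre_b, post_b)) / (
        sum_squares(pre_a) + sum_squares(pre_b)
    )
    # Raw difference in outcome means, corrected for the difference
    # in covariate means between the two groups.
    adjusted = (mean(post_a) - mean(post_b)) - slope * (mean(pre_a) - mean(pre_b))
    return slope, adjusted


# Hypothetical scores for illustration only (not data from the study).
pre_imp, post_imp = [40, 50, 60, 70], [4.0, 4.5, 5.0, 5.5]
pre_trad, post_trad = [45, 55, 65, 75], [3.25, 3.75, 4.25, 4.75]

slope, diff = ancova_adjusted_difference(pre_imp, post_imp, pre_trad, post_trad)
print(f"pooled slope: {slope:.3f}, adjusted difference: {diff:.3f}")
```

In this made-up example the raw difference in outcome means understates the gap because the first group starts with lower covariate scores; the adjustment removes that head start before the groups are compared.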
Mathematics is an ever expanding field. New mathematics is constantly being created. What mathematics is useful in daily life, different fields of study, and the work place changes over time. This changing nature of mathematics has important implications for what should be incorporated in school mathematics curricula. Over a two-year period and with input of literally thousands of people, the National Council of Teachers of Mathematics published in 1989 the Curriculum and Evaluation Standards for School Mathematics. This document specified a shift in what should be included in the K-12 mathematics curriculum, taking into consideration changing societal goals related to the increased importance of technology and communication. The origins of the NCTM Standards have been traced back to the new mathematics era of the 1960s and to a number of subsequent documents and reports (McLeod, Stake, Schappelle, Mellissinos, & Gierl, 1996).

The NCTM Standards is not a total departure from the traditional curriculum, but more of a shift in emphasis in what mathematics should be taught and how mathematics should be taught. Traditional topics of algebra, geometry, trigonometry, and functions remain important components for students to know. However, greater emphasis is sought for conceptual understandings, multiple representations and connections, mathematical modeling, and mathematical problem solving, with less emphasis on memorization of isolated facts and procedures (NCTM, 1989, p. 125). Topics from statistics, probability, and discrete mathematics are recommended to be more central to the 9-12 mathematics curriculum for all students.
An important concern of those who wrote the NCTM Standards was to specify general goals of a rigorous mathematics school curriculum for all students. In 1987, at the time the Standards were being drafted, 77% of high school graduates had earned credits in algebra I, 61% had earned credits in geometry, and 46% had earned credits in algebra II (National Science Board, 1989). Only one-half of one percent of the 1987 high school graduates had earned any credits in statistics or probability (p. 201). These data indicated at the time a sharp attrition rate in the more advanced mathematics courses and nearly a void in any statistics courses. There also was a large difference by ethnicity in mathematics courses taken by high school graduates in 1987. Among those who graduated in 1987, about 20% more white students than Hispanic or black students had credits in geometry and algebra II (National Science Board, 1996, p. A14).

Since 1987, there has been some increase in students taking the more advanced mathematics courses and some reduction of the gap between whites and those from other ethnic groups. Of the 1994 high school graduates, 66% had taken algebra I while in high school (93% at some point before graduation), 70% had taken geometry, and 59% had taken algebra II. For algebra I credit, the gap by ethnicity had narrowed to nearly zero among whites, blacks, and Hispanics. For geometry credit, the gap had narrowed to nearly zero between whites and Hispanics and to 14% between whites and blacks. By 1994 there had been a reduction in the difference in percentages of those with algebra II credit among the ethnic groups, but not to the same degree as for algebra I and geometry. A gap of about 10% between whites and Hispanics and 18% between whites and blacks still remained (Smith, Young, Bae, Choy, & Alsalam, 1997, p. 269).
Over seven years, the total percentage of high school graduates with credit in algebra II increased by about 10%. The gap between whites and Hispanic students narrowed, but between whites and blacks it remained about the same. A contributing factor to more students taking up to three years of college-qualifying mathematics was the increase in the number of colleges and universities requiring three years of mathematics for admission. By 1994, there were some indications of progress toward the NCTM Standards vision of all students achieving rigorous academic standards, but over 40% of the high school graduates were leaving without three years of college-qualifying mathematics. Less than 3% of the 1994 graduates had any credit in statistics/probability.

Development of the Interactive Mathematics Program began in 1989 with funding from the California Postsecondary Education Commission and then continued in 1992 with major funding from the National Science Foundation (Schoen, 1993). The curriculum is designed to be aligned with the core curriculum recommended in the NCTM Standards. Students who complete the four years of IMP are to be prepared both for continued study of mathematics in college and for the world of work (Fendel, Resek, Alper, & Fraser, 1997). The problem-based curriculum incorporates traditional branches of mathematics including algebra, geometry, and trigonometry as well as topics that have been given little attention in the traditional high school program, especially statistics and probability. Students are to experiment, investigate, ask questions, make and test conjectures, reflect, and accurately communicate their ideas and conclusions.

Statistics Requirements in University Courses

Statistics is becoming a more important field of study with greater applications in a number of other academic fields and in the work place. Enrollment in statistics courses at four-year colleges and universities has increased over the period from 1990 to 1995.
Enrollment in undergraduate statistics courses from 1989-90 to 1994-95, as measured by fall enrollment in departments of mathematics and departments of statistics, increased by 23% from 169,000 to 208,000 (Loftsgaarden, Rung, & Watkins, 1997, p. 1). Over this same period, even though overall college and university enrollment remained nearly constant, calculus-level enrollment declined by 18%, to 539,000 students, down from 648,000 students (numbers rounded to 1,000); precalculus enrollment increased by 4%, from 592,000 to 614,000; and remedial-level mathematics course enrollment declined by 15%, from 261,000 to 222,000. From 1990 to 1993, total enrollment in institutions of higher education increased by nearly 4% from 13,819,000 to 14,306,000 (Snyder & Hoffman, 1995, p. 176). The increased enrollment in statistics courses was due to greater numbers of students taking elementary statistics and not a change in the enrollment in upper-level statistics courses. Fall enrollment in statistics courses in two-year colleges’ mathematics programs also increased by 33% from 1990 to 1995, from 54,000 to 72,000.

Knowledge of some statistics and quantitative reasoning is valued in fields other than mathematics and statistics. From 1990 to 1996, Steve Bauman, a professor in the department of mathematics at the University of Wisconsin-Madison, and others conducted a study of the mathematics expectations for courses other than mathematics and statistics (Bauman, 1997). Professors of upper-level courses, those most often taken by students in their third or fourth year at the university, were interviewed, generally over two or more occasions, and asked to identify examples of mathematical activities they expected their students to know at the beginning of their courses. Analyses were done for three types of courses: those that did not require calculus, those that required one semester of calculus, and technical courses that required more than one semester of calculus.
Nine natural science courses (e.g., astronomy, botany, and genetics) and eight social sciences courses (e.g., education policy, journalism, and psychology) were included in the analysis for the first-level courses, those that did not require calculus. For only one of these courses, a political science course, did a professor identify that students needed to use the quadratic formula to solve an equation or to solve a polynomial equation by factoring. For 8 of the 17 courses, students were expected to know how to evaluate a linear expression and to solve for a variable. For all but one of the 8 social sciences courses, students were expected to know how to calculate proportions as percentages. For 7 of the 17 courses, students were expected to be able to compute the mean from a data set and describe the change in the mean if a data value increased.

Eight level 2 courses, those requiring one semester of calculus, were included in the study: three science courses (biology, food science, physics) and five business and economics courses (e.g., agricultural economics, economics, and business). In all of the courses, students were expected to know some mathematics, but the nature of the mathematical activity varied. For the largest number of courses, six of the eight, professors expected students to sketch a system of two linear equations and find a solution. For half of the courses, mainly business and economics courses, students were expected to estimate a value, slope, and y-intercept from a linear graph and use a graph of profit functions to estimate an interval where profit is positive and the value where profit is maximum. For five of the eight courses, students were expected to do some statistics.
For three of the courses, not always the same three, students were expected to compare the mean and standard deviation of monthly growth rates for a seven-element data set; compute the sample mean of a five-element data set and describe the change in mean if a value is increased; and compute a standard deviation (two courses when given the formula and one course without the formula).

This study was done at only one institution of higher education, but a very large university with a large number of course offerings. The study also was more exploratory than rigorous, without a systematic selection of what courses to study. What the findings do indicate is that professors of a range of courses in fields other than mathematics expect students to have knowledge of statistics. These results help support the view that knowledge of statistics is a prerequisite for study in a range of fields, including agriculture, business, social science, and field sciences. This study, along with the increasing number of students who take statistics courses in higher education, indicates the growing importance of statistics as required knowledge in higher education.

Changing Needs in the Work Place

Statistics, problem solving, and reasoning are important competencies in the work place, which is strongly influenced by the globalization of commerce and industry and the growth of technology on the job. The SCANS report (United States Department of Labor, 1991) identified five competencies and a three-part foundation of skills needed for solid job performance. One explanatory example for the competency on information is the ability to analyze statistical control charts to monitor error rates. A number of thinking skills were identified as a foundation for all skills including thinking creatively, making decisions, solving problems, and reasoning.
The report went on to recommend compatible principles for learning, including the need for students to learn basic skills and problem-solving skills, and the need for practice in the applications of skills. Since the mid-1980s, large corporations such as Xerox and Motorola have adopted a total quality approach to operation. This approach is based, in part, on employees being trained to use statistical quality control analysis and statistical tools (Kearns & Nadler, 1992, p. 210; Smith, 1995, p. 334).

The knowledge of statistics and the ability to solve problems are becoming more important in both higher education and the work place. However, finding curriculum materials aligned with the Standards remains a challenge (Ferrini-Mundy & Schram, 1997). This study was designed to ascertain whether students enrolled in IMP were gaining knowledge in these academic areas increasing in prominence, a necessary condition for fulfilling the recommendations in the NCTM Standards.

IMP Incorporation of Statistics and Problem Solving

The IMP curriculum is a four-year college-preparatory sequence of courses designed for grades 9 through 12. The IMP curriculum integrates traditional areas of mathematics such as algebra, geometry, and trigonometry with probability, statistics, discrete mathematics, and matrix algebra. Students are challenged to actively explore open-ended situations in a way that closely resembles the inquiry methods used by mathematicians and scientists. IMP calls on students to experiment with examples, look for and articulate patterns, and make, test, and prove conjectures. The problem-based curriculum is organized into five- to eight-week units centered on a problem or theme. Students engage in solving both routine and nonroutine problems, use graphing calculators, and are encouraged to work cooperatively.
Students learn how to calculate simple probabilities, the distinction between theoretical and experimental probabilities, the meaning of a best-fitting line for a set of data, properties of normal and other distributions, the calculation and use of standard deviation, and the notion of testing a null hypothesis and methods to do so. Teaching techniques used in IMP are designed to help students gain deep understanding of mathematical ideas, reason mathematically, and apply mathematics to solve problems. For example, one night's homework involves four sets of data, about which students are to answer a number of questions including spread from the mean, standard deviation, and the similarity among the data sets (Fendel, Resek, Alper, & Fraser, 1997, p. 343). In addition to daily homework, students are assigned Problems of the Week (POW) to work on for five or more days.

Students' ability to reason is enhanced in many ways. For instance, students are asked to design experiments, to state their conclusions based on evidence and analyses, to compare mathematical ideas, and to form generalizations from specific situations. Classroom experiences such as presentations, written explanations, and small-group activities are structured for students to verbalize their thinking. This verbalization is designed to increase students' understanding and to improve their communication of mathematics. Students are to become more independent learners by using multiple sources of information including their teachers, the textbook, classmates, and reference materials.

The purpose of this study was to replicate findings from a study conducted at the end of the 1995-96 school year (Webb & Dowling, 1997). The intent of the original study and of this replication was to evaluate the claim that IMP students achieve higher on statistics and complex problems than students in traditional college-preparatory mathematics courses. There are multiple reasons for this claim.
One main reason is that IMP emphasizes statistics, probability, and solving extended problems more than do most traditional algebra I, geometry, and algebra II curricula. Other reasons are that students in IMP classes will have the opportunity to use statistics and probability in the context of realistic situations and will understand the mathematics better by having to apply it to solve problems.

The results from both the original study and its replication are reported here. Both studies specifically targeted areas of mathematics with emerging importance and recommended by national education groups to be incorporated into the high school mathematics curriculum. They were not designed to be a comprehensive study of all that students learn from IMP, but to confirm evidence that students in IMP are developing an understanding of mathematics that includes knowledge of statistics and solving complex problems, two topics normally not emphasized in the traditional mathematics curriculum.

An increasing number of studies have compared the mathematical knowledge of students enrolled in the Interactive Mathematics Program (IMP) with that of students enrolled in the traditional algebra I, geometry, algebra II sequence. Accumulating evidence shows that IMP students perform as well as, if not better than, students taking the traditional curriculum, as measured by standardized norm-referenced tests such as the Scholastic Assessment Test (SAT), Pre-Scholastic Assessment Test (PSAT), and the Comprehensive Tests of Basic Skills (IMP, 1997; Webb & Dowling, 1995a, 1995b, 1995c, 1996). These traditional instruments measure students' knowledge of very general mathematics skills and reasoning in mathematics, but give little or no attention to probability, statistics, quantitative reasoning, and problem solving.

A quasi-experimental study (Campbell & Stanley, 1963) was conducted that employed experimental and control groups, with a pretest and posttest but without random assignment.
The study was replicated one year later. Grade 8 standardized norm-referenced mathematics test scores were used as a premeasure representing students’ mathematics achievement upon entry into high school. Tests were administered to students enrolled in IMP classes and students enrolled in the on-grade-level traditional college-preparatory mathematics course. Teachers of IMP and the traditional mathematics classes volunteered to have their students tested. Students were not randomly assigned between IMP and traditional classes.

A small sample design was used in the original study (Table 1). This was done to impose the necessary controls that required each school to provide grade 8 standardized norm-referenced mathematics test scores for both IMP and traditional students; to have an adequate number of students (at least two classes each) enrolled in both IMP and the traditional course; to have both IMP and the traditional teachers willing to cooperate; and to have someone at the school willing to oversee the administration of the testing to assure this was successfully completed.

A replication of the small sample study was conducted one year later by testing students in three high schools (Table 2). Replications can help to establish the conditions under which results hold and can contribute to the generalizability of findings (Shaver & Norton, 1980). Important to conducting replications and drawing meaning from the results is a description of the population tested and the target population. Replication studies have been advanced as important for research in the natural sciences (Campbell & Jackson, 1979) and are particularly viable for evaluation studies. Evaluators infrequently are afforded the opportunity to control conditions sufficiently to establish attested results.
Repeating a study of the effects attributable to a curriculum at a different time, in different schools, in different areas of the country, and with different teachers helps to validate the findings and extend these to a larger population.

Certain conditions were imposed on the selection of outcome measures. The tests had to come from a source independent of IMP and had to be easily administered under the same conditions to classes of students in both course sequences. Each participating school had to provide grade 8 standardized norm-referenced mathematics test scores on a significant number of IMP and traditional mathematics students who were tested. The inconvenience to teachers and students had to be kept to a minimum. Students' knowledge was sought on as wide a range of content as possible. Regional and school factors had to be reduced as much as possible. Effects had to be attributed to the curriculum rather than to other factors such as teacher, school, or region.

The grade 9 test (Appendix) was composed of all statistics items (a total of four) released from the Second International Mathematics Study (SIMS) (Crosswhite et al., 1986). These items were administered to grade 12 students in the 1981-82 school year. The items were modified from the multiple-choice format used in SIMS to an open-response format. At the time this study was conducted, the grade 12 items used in the Third International Mathematics and Science Study had not been released. If these had been available, they would have been used in order to have more current data. The SIMS items used are basic data analysis and statistics items that do not include any information that would date the items. Item 1 required students to determine the approximate average weekly rainfall from a bar graph. Item 2 required the computation of a weighted average. Item 3 (scored in two parts) required students to analyze a linear transformation on the mean and standard deviation of a distribution.
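The kind of computation behind Items 2 and 3 can be sketched briefly. The example below is an illustration under assumed values, not a reproduction of the SIMS items: the function names and all numbers are hypothetical. It computes a weighted average and shows that a linear transformation y = a*x + b changes the mean to a*mean + b and the standard deviation to |a| times the original.

```python
from statistics import mean, pstdev


def weighted_average(values, weights):
    """Average of values where each value counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def linear_transform(data, a, b):
    """Apply y = a*x + b to every element of data."""
    return [a * x + b for x in data]


# Hypothetical class averages and class sizes (illustration only).
class_means = [80, 90]
class_sizes = [3, 1]
overall = weighted_average(class_means, class_sizes)

# Hypothetical distribution and transformation y = 3x + 10.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
a, b = 3, 10
transformed = linear_transform(scores, a, b)

print("weighted average:", overall)
print("mean before/after:", mean(scores), mean(transformed))
print("SD before/after:", pstdev(scores), pstdev(transformed))
```

The population standard deviation (`pstdev`) is used here for simplicity; the same scaling by |a| holds for the sample standard deviation, and the added constant b shifts the mean without affecting the spread.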
Item 4 required the application of properties of the normal curve (identifying the proportion of the area under the curve related to +/- one standard deviation). Because item 3 was scored in two parts, the total test was considered to have five items.

At grade 10, students completed two performance assessment activities (Connecting Nodes and New Cubes) prepared for the Wisconsin Student Assessment Systems. Each performance assessment activity required students to construct a response by solving a multistep mathematics problem, generalizing the results obtained, and writing an explanation of the reasoning process and any procedures used. The estimated time required to work each activity was 20 minutes. Connecting Nodes required students to determine the number of connections between pairs of nodes in a network given the number of nodes. A high score on this activity required demonstration of some skills in solving problems, reasoning, developing and testing conjectures, writing clear and correct explanations, computing, and extending thinking to a general case. New Cubes required some understanding of probability and its application to a new situation. A high score on this activity required verifying that a probability is the highest among all possible for the situation, computing all possibilities for an invented set of dice, determining the expected value for a number of trials, and describing in writing the reasoning for the given result.

A six-level holistic scoring rubric was used to rate student responses to each of the two performance assessment activities. The highest rating, "Advanced" (5), indicated that a student demonstrated a highly developed understanding of the mathematics required by the activity and was able to communicate this understanding very clearly. The second highest rating, "Proficient" (4), indicated that the student demonstrated an acceptable understanding of the mathematical requirements of the activity with a coherent description of work.
The other ratings, "Nearly Proficient" (3) to "Attempted" (1), indicated progressively more serious misconceptions of the requirements of the problem, from minor to seriously flawed reasoning. Any paper left blank, or which had a response demonstrating a total lack of engagement in the activity, was rated "not scorable" and assigned a value of 0. In the replication study, only three responses on Connecting Nodes and 12 responses on New Cubes were given a value of 0. Each student response was scored independently by two or more raters trained in using the general rubric. A third rater adjudicated all pairs of ratings not within an acceptable range. In the first study, the first two raters reached acceptable agreement for 86% of the ratings. In the replication study, raters attained exact agreement on 80% of the papers in scoring Connecting Nodes and on 85% of the papers in scoring New Cubes. An acceptable agreement between raters was achieved on over 97% of the papers. The total possible score for the two performance assessment activities was 10.

A “proficient” or “advanced” score should not always be interpreted to mean that the student applied problem-solving skills to a novel situation. These ratings can be attained by a student demonstrating full understanding of the mathematics, such as giving the formula for the generalized case with a sufficient explanation (e.g., giving the formula for computing n nodes taken two at a time for Connecting Nodes).

The IMP coordinator at each school oversaw the details of gaining teachers’ participation and of administering the tests, all of which were administered by the classroom teachers. Students were given one class period to complete all of the items, well within the time requirements for each instrument. Teachers reported some variation in how seriously students engaged in taking the tests.
School 3 had a higher proportion of both IMP students and the traditional students who did not exert any effort to take the voluntary tests. At the other schools, teachers did not report any irregularities in the administration of the test. The IMP coordinator at School 2 (grade 10 testing) reported, "I believe the students tried their best. They were told they were taking an ‘IMP-like’ test for comparison because the IMP students had taken many traditional tests. Some may have viewed it as a competition between IMP and Geometry."

Schools participating in the original study and the replication study were located in four different states: two western, one midwestern, and one eastern. Two of the five schools (Schools 2 and 5) are in large urban areas. Two of the other three schools are located in suburban areas in medium size communities located close to a large urban area. These communities are composed of mixed social and economic populations, but predominantly are middle class. The fifth school (School 1) is in a medium size community. All five schools are public high schools.

School 1 is located in a city in the western United States and serves a diverse group of students. Three Year 1 IMP classes were taught in School 1, all by the same teacher. All the students in these three classes were tested along with four algebra I classes taught by two different teachers. The grade 8 Comprehensive Tests of Basic Skills (CTBS) scores were used as the measure of prior knowledge of mathematics.

School 2 is a select public high school located in a large midwestern city. The school, which had an enrollment of over 2,100 in the 1996-97 school year, serves ethnically diverse students who must meet minimal requirements to enroll in the school, including scoring at least in the 6th stanine on grade 8 Iowa Tests of Basic Skills (ITBS). The racial distribution in the student body included 52% black, 19% white, 15% Asian, 13% Hispanic, and 0.2% Native American.
Nearly all of the students will continue their education at universities or colleges. Students tested were in four IMP Year 2 classes, taught by two teachers, and in six geometry classes, three taught by each of two teachers. The grade 8 ITBS scores were used as the measure of prior knowledge of mathematics.

School 3 is a large high school with an enrollment of over 3,100 students. The high school serves a middle income, suburban area on the edge of a medium size urban area in a western state. The student body by ethnicity was 64% white, 20% Hispanic, 12% Asian/Pacific Islander/Filipino, 3% black, and 1% American Indian. About 90% of the 1996 graduating class planned to attend some form of higher education.

School 4 serves a suburban area in a western state. The school enrollment was over 2,100 students, most of whom were from middle class families. The racial composition for the school was 83% white, 7% black, 6% Hispanic, 4% Asian/Pacific Islander, and less than 1% Native American.

School 5 is in an urban area in the east and considers itself urban/suburban. The school enrollment was approximately 1,700 students and included students from 68 different countries. The school served predominantly middle class students. About 80% of its graduates enroll in higher education. Nearly 14% of the student body came from low-income families. The racial composition of the school was 70% white, 14% Asian/Pacific Islander, 11% black, 5% Hispanic, and a few students of other racial backgrounds.

In the first study (Webb & Dowling, 1997), three grade levels of students were tested, grades 9, 10, and 11. Because permission was not received to administer the grade 11 outcome measure a second time, the replication study included only grades 9 and 10; therefore, this report will pertain only to grades 9 and 10. At the two schools in the 1995-96 study, students from a range of grades were enrolled in both the IMP classes and the traditional classes.
Only data for students at the targeted grade levels were used in the analyses. A total of 115 grade 9 students were used in the analysis: 60 IMP students (57% female) from three classes taught by the same teacher and 55 students enrolled in algebra I (51% female) from four classes taught by two teachers (Tables 1 and 3). The demographics of the IMP group differed somewhat from those of the algebra I group (Table 4). The IMP group had a higher percentage of white students (63% compared to 31% of the grade 9 algebra I students). A higher percentage of the algebra I students were Asian or Pacific Islander (29% compared to 17% of the IMP students).

In the first study (1995-96), a total of 184 grade 10 students were used in the analysis: 87 IMP students (61% female) from four classes taught by two teachers and 97 geometry students (65% female) from six classes taught by two teachers (Tables 1 and 5). Black students constituted the highest percentage of students in both the IMP group (45%) and the traditional group (54%; Table 6). Only 18% of the IMP students and 8% of the traditional students were white.

In the replication study, at the end of the 1996-97 school year, IMP and traditional mathematics students from both grade 9 and grade 10 were tested in three high schools not included in the original study. Students in grade 9 were tested on knowledge of some statistics, and grade 10 students were tested on problem solving, using the same instruments as those used in the original study. Grade 8 scores on standardized norm-referenced tests were obtained for nearly all of the students. For each grade level and at each school, the design was to test students of two or more teachers in each curriculum. This condition was met only in School 5 (Table 2). At the other two schools, teachers chose not to participate at the last minute for a number of reasons, including end-of-school-year pressures. As in the original study, only data for students at the targeted grade levels were used in the analyses.
At School 4 in the replication study, the traditional algebra I course was enhanced by including a unit on some of the statistical concepts tested (e.g., properties of the normal distribution). At this same school, the traditional geometry course taken by grade 10 students was enhanced by including instruction on combinations, in which students studied the number of ways n objects can be combined two at a time. This instruction was directly related to one of the extended problem-solving activities (Connecting Nodes). The non-IMP program at School 4 was identified as a supplemented traditional program (labeled “traditional+”). The tables report the results for students enrolled in the two traditional course sequences at School 4 separately because of the nature of the instruction these students received.

In the replication study, the final population for the analyses for IMP grade 9 consisted of 105 students (46% female) from seven different classes taught by six teachers (one teacher taught two of the classes); the final population for traditional grade 9 consisted of 63 students (57% female) from five classes taught by four teachers; and for the supplemented traditional grade 9 the final population consisted of 40 students (55% female) from three classes all taught by the same teacher (Tables 2 and 3). For the grade 9 replication study, white students were the largest percentage of students in all groups: 75% of the IMP students, 44% of the traditional students, and 72% of the supplemented traditional students (Table 4). All three groups included students from a number of different ethnic groups; however, each nonwhite ethnic group made up less than 20% of the total group.
The final population for the analyses for IMP grade 10 consisted of 132 students (48% female) from eight classes taught by five teachers; for traditional grade 10 the final population consisted of 77 students (56% female) from eight classes taught by four teachers; and for supplemented traditional grade 10 the final population consisted of 45 students (71% female) from three classes all taught by the same teacher (Tables 2 and 5). For grade 10, the highest percentage of students were white: 73% of the IMP students, 60% of the traditional students, and 80% of the supplemented traditional students (Table 6).

The replication study conducted at the end of the 1996-97 school year supports the findings of the study conducted at the end of the previous school year. This was true for both grade 9 and grade 10. Grade 9 students enrolled in IMP Year 1 performed higher on a set of statistics items than students enrolled in the traditional college preparatory mathematics. At each of three high schools, IMP Year 1 students achieved a higher mean score than students in the traditional courses (Table 7). In the replication study, when results for Schools 3 and 5 were aggregated, IMP students performed significantly higher when accounting for prior achievement as measured by grade 8 norm-referenced standardized tests using an analysis of covariance (Table 8). The mean score of 1.72 attained by grade 9 IMP students at Schools 3 and 5 combined was significantly higher (p < .01) than the mean score of .87 attained by students in the traditional class (Tables 8 and 9). When the traditional course was supplemented by instructional experiences directly related to the statistical ideas tested, as was the case at School 4, students in the traditional course performed higher than the IMP students at grade 9.
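The covariance adjustment behind these comparisons can be illustrated with a minimal sketch. It is not the authors' analysis: the data are hypothetical, the function names are illustrative, and only the covariate-adjusted group means are computed (no F test or p value). It assumes a single pooled within-group slope relating the outcome to the grade 8 score.

```python
from statistics import mean

def pooled_within_slope(groups):
    """Pooled within-group regression slope of outcome on covariate.

    groups maps a group label to a list of (covariate, outcome) pairs.
    """
    sxy = 0.0
    sxx = 0.0
    for pairs in groups.values():
        x_bar = mean(x for x, _ in pairs)
        y_bar = mean(y for _, y in pairs)
        sxy += sum((x - x_bar) * (y - y_bar) for x, y in pairs)
        sxx += sum((x - x_bar) ** 2 for x, _ in pairs)
    return sxy / sxx

def adjusted_means(groups):
    """Group outcome means adjusted to the grand mean of the covariate."""
    slope = pooled_within_slope(groups)
    grand_x = mean(x for pairs in groups.values() for x, _ in pairs)
    adjusted = {}
    for label, pairs in groups.items():
        x_bar = mean(x for x, _ in pairs)
        y_bar = mean(y for _, y in pairs)
        adjusted[label] = y_bar - slope * (x_bar - grand_x)
    return adjusted

# Hypothetical (grade 8 percentile, statistics test score) pairs.
# These are NOT data from the study.
hypothetical_scores = {
    "IMP": [(60, 2), (70, 2), (80, 3), (90, 3)],
    "traditional": [(65, 1), (75, 1), (85, 2), (95, 2)],
}

print(adjusted_means(hypothetical_scores))
```

In this made-up example the traditional group has the higher grade 8 average, so the adjustment widens the gap between the groups slightly relative to the raw means.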
Information that the traditional course had been supplemented with instruction directly related to what was tested (e.g., properties of the normal distribution) was revealed only after the testing, during the analysis of data. Because of the difference in instruction, the data from the traditional classes at School 4 were analyzed as a second alternative curriculum program, a supplemented traditional program, rather than being aggregated with data from the other two schools. Grade 9 students in the supplemented traditional course at School 4 outperformed the students enrolled in IMP Year 1 at the same school. Students in the supplemented traditional course attained a mean of 2.68 compared to a mean of 1.78 by the IMP students (Table 7).

The five-item statistics test had similar standard errors of measurement (.82 and .84) each year (Table 9). The test had a higher reliability, indicating more consistency in item responses, in the first year (.73) than in the second (.47). For an instrument with only five items, a reliability of .73 is high. Students in both years and for both curricula attained the full range of possible scores. Some students answered correctly all five of the open-response questions, and some students did not answer any of the items correctly.

The pattern of results by individual items on the statistics test (Table 10) helps to interpret some of the findings. The pattern of item results for the traditional students is very consistent across the initial study and its replication, even though the students who were tested came from three different schools, one located in a different part of the country from the other two. The average item response by grade 9 students enrolled in the traditional courses each year on each item was lower than the average item response by grade 9 IMP students for each year.
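The standard error of measurement and the reliability reported above are linked, in classical test theory, by SEM = SD × sqrt(1 − reliability). The sketch below applies that textbook relation to the reported values. Whether the report computed its standard errors this way is an assumption, and the implied standard deviations it prints are derived here, not taken from Table 9.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

def implied_sd(sem, reliability):
    """Invert the relation to recover the score SD implied by SEM and reliability."""
    return sem / math.sqrt(1.0 - reliability)

# Values reported in the text: (SEM, reliability) for each year.
reported = {"original study": (0.82, 0.73), "replication": (0.84, 0.47)}

for label, (sem, rel) in reported.items():
    print(label, round(implied_sd(sem, rel), 2))
```

Under this relation, similar standard errors combined with a lower reliability imply a smaller spread of total scores in the second year.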
About one-half of the traditional students each year were able to interpret information presented in graphical form (item 1), compared to over 70% of the IMP students. One-third of the traditional students, compared to over 50% of the IMP students, could determine the effect of a linear transformation on the mean of a distribution (item 3a). A low percentage of traditional students, about one out of ten or fewer, were able to answer correctly each of the other three items: computation of a weighted average (item 2), effect of a linear transformation on the standard deviation (item 3b), and application of properties of a normal curve (item 4).

Comparing results of the IMP students in the two years, the IMP students performed nearly the same on item 1 (interpretation of information in graphical form) both years. About three-quarters of the IMP students in both years answered this item correctly, a higher percentage than the traditional students but lower than the students who had experienced a supplemented traditional program. Over half of the IMP students both years were able to correctly identify the effect of a linear transformation on the mean of a distribution (item 3a). Again, this was a higher percentage than students enrolled in the traditional program, but only IMP students tested in the first year scored higher than students enrolled in the supplemented traditional program (87% compared to 75%). On the other three items, IMP students from School 1 scored higher than IMP students tested in the replication study. This suggests some variation in the degree to which students are gaining knowledge of these ideas within IMP. The IMP students from School 1 scored higher on three of the five items than all of the other groups, including those receiving the supplemented traditional program (items 2, 3a, and 3b). This was not the case for IMP students from Schools 3, 4, and 5.
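The ideas behind items 2, 3a, and 3b can be sketched in a few lines. The actual test items are not reproduced in this section, so the numbers and function names below are hypothetical. The sketch shows a weighted average, and the fact that a linear transformation y = a·x + b changes the mean to a·mean + b and the standard deviation to |a|·SD.

```python
from statistics import mean, pstdev

def weighted_average(values, weights):
    """Average of values where each value counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def linear_transform(scores, a, b):
    """Apply y = a * x + b to every score."""
    return [a * x + b for x in scores]

# Hypothetical scores, not taken from the test.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
a, b = 3, 10
new_scores = linear_transform(scores, a, b)

# The mean follows the full transformation; the SD is only scaled by |a|.
print(mean(scores), pstdev(scores))
print(mean(new_scores), pstdev(new_scores))

# Two hypothetical class averages weighted by class size.
print(weighted_average([80, 90], [1, 3]))
```

The additive constant b shifts every score equally, which is why it moves the mean but leaves the spread unchanged.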
Grade 10 students enrolled in IMP Year 2 performed higher on a test of two extended problem-solving activities than grade 10 students enrolled in the traditional college preparatory mathematics (Table 11). Accounting for prior achievement as measured by grade 8 norm-referenced standardized tests, an analysis of covariance showed the difference to be significant in the original study and in the replication study for the IMP/traditional comparison (Table 12). At School 4, the traditional geometry course was enhanced by including instruction on computing the combinations of n objects taken two at a time. The grade 10 students in the supplemented traditional course at School 4 outperformed the IMP students on the one problem that required computing combinations (Connecting Nodes), but had a lower mean total score based on the results on both problem-solving activities, 5.74 for the IMP students compared to 5.56 (Table 11). The difference in means was not significant using an analysis of covariance accounting for grade 8 achievement.

For grade 10 in both years, the mean total score of IMP students at each of the four schools was higher than the mean total score of students enrolled in the traditional course at each of three schools (Table 11). At each of the three schools, the mean total score of IMP students was higher than that of students enrolled in the supplemented traditional course. With the exception of School 4, both IMP students and the traditional students scored higher on New Cubes, which required students to identify the probability of rolling a specific value using two dice and to estimate the expected value for a large number of trials, than on Connecting Nodes. At School 4, both IMP students and those enrolled in the supplemented traditional classes attained a higher mean score on Connecting Nodes than on New Cubes. School 4 students in the traditional supplemented classes had the highest mean score on Connecting Nodes of any group, IMP or traditional.
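The two-dice reasoning described above can be sketched as follows. The activity itself is not reproduced in this section, so the sketch assumes two standard fair six-sided dice; the actual task may have used differently labeled cubes. It enumerates all outcomes, finds the exact probability of a given sum, and scales it to an expected count over many rolls.

```python
from fractions import Fraction
from itertools import product

STANDARD_FACES = (1, 2, 3, 4, 5, 6)  # assumption: ordinary dice

def two_dice_probability(target, faces=STANDARD_FACES):
    """Exact probability that two fair dice with the given faces sum to target."""
    outcomes = list(product(faces, repeat=2))
    hits = sum(1 for first, second in outcomes if first + second == target)
    return Fraction(hits, len(outcomes))

def expected_count(target, trials, faces=STANDARD_FACES):
    """Expected number of times the target sum appears in the given number of rolls."""
    return two_dice_probability(target, faces) * trials

print(two_dice_probability(7))
print(expected_count(7, 600))
```

Using exact fractions rather than floating-point numbers keeps the probabilities in the form a student would write them by hand.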
A high percentage of the School 4 students in the supplemented traditional classes gave the generalized formula for n objects taken two at a time (n[n-1]/2). Both problems required specific knowledge of mathematical concepts and general reasoning. Higher performance on both items (a value of 4 or 5) required demonstrating the understanding of the mathematical ideas without any major conceptual errors and attaining logical conclusions. A higher percentage of IMP students at School 4 achieved a proficient (score of 4) or an advanced rating (score of 5) on both problems than those enrolled in the supplemented traditional course, 31% compared to 13%. Whereas School 4 students in the supplemented traditional classes performed higher on one of the problems than IMP students at any of the four schools, these students performed lower on the other problem than IMP students at any of the schools. This suggests the students had achieved specific knowledge related to one problem, but had not achieved either the specific or general knowledge required to perform well on both problems.

Findings from School 4 help confirm the sensitivity to instruction of the measures used in this study. Students performed better when given curriculum materials, either IMP or supplementary direct instruction, related to statistics in grade 9 and combinatorics in grade 10. The mean scores by students in the unenhanced traditional courses in both years varied little, only by .25 (about one-third of a standard deviation) from a high of 1.07, School 3, and a low of .82, School 5 (Table 7). In the grade 10 replication study, the maximum difference in mean total scores for the traditional groups between School 2, School 3, and School 5 was .84, a little more than half of the overall standard deviation for the traditional mathematics (Table 11). At grade 10 each group of IMP students at each of the four schools scored higher than any group of students enrolled in the traditional course.
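The generalized formula for n objects taken two at a time, n(n-1)/2, can be checked against a direct enumeration of pairs. The Connecting Nodes task is not reproduced in this section; the sketch assumes it amounts to counting the distinct pairs among n nodes, and the function names are illustrative.

```python
from itertools import combinations
from math import comb

def pairs_by_formula(n):
    """Number of ways to choose 2 of n objects, using n(n-1)/2."""
    return n * (n - 1) // 2

def pairs_by_enumeration(n):
    """Count the pairs by listing every unordered pair of n labeled nodes."""
    return len(list(combinations(range(n), 2)))

for n in range(2, 8):
    # The closed form, the enumeration, and math.comb should all agree.
    print(n, pairs_by_formula(n), pairs_by_enumeration(n), comb(n, 2))
```

The formula follows because each of the n objects can pair with n-1 others, and that count includes every pair twice.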
The consistency in results adds to the validity of the findings. When the traditional curriculum was supplemented by instruction directly related to the knowledge tested, students performed better. However, three of the four groups of IMP students at grade 10 still scored higher than the students in the supplemented traditional course.

The results of this study, as a replication of a previous study, increase the confidence that students enrolled in IMP are learning more about statistical ideas and solving complex problems than students in a traditional college preparatory curriculum. The findings in the replication study are consistent with those of the original study, at both grade 9 and grade 10. Statistics and problem solving are being advanced as increasingly important in higher education and in the work place. This study indicates that students in IMP are learning mathematics beyond what students learn in the traditional college preparatory mathematics curriculum. This finding, along with mounting evidence that IMP students do as well as students enrolled in the traditional college-preparatory mathematics curriculum on standardized norm-referenced tests such as the SAT, provides evidence that IMP maintains students’ general understanding of mathematics while increasing their knowledge of areas of emerging importance that are given less attention in the traditional mathematics curriculum.

This study was one of many evaluating the impact of IMP. Although the study is insufficient to explain all of the reasons for the results, the most obvious reason is that the IMP curriculum includes instruction and content on statistics and problem solving that are not included in the traditional curriculum. The results from School 4, where students in a supplemented traditional curriculum achieved as well if not better on some of the outcome measures, indicate that when students are given instruction on statistics and other mathematics tested, they perform better.
This is not surprising. When students have already solved the same problem as the one being tested, or a similar one, the test measures recall of information more than students’ knowledge of how to resolve a nonroutine situation. The strong consistency in how School 4 students in the supplemented traditional classes responded to the questions provides some indication that many of the test items and activities had become routine for them. There was more variation in the form of responses by the IMP students. As a group, these IMP students received instruction of a different nature than the more direct instruction in the supplemented traditional classes. A greater level of analysis than the scope of this study allows would be needed to fully explain the interaction between the form of instruction and the mathematics that students learn.

IMP does not claim that it is the only or the best curriculum for having students learn statistics and how to solve problems. It does claim to integrate traditional mathematics content usually taught as individual subjects, giving more emphasis to statistics and problem solving and providing students with the mathematics they will need after graduating from high school. The findings from this study support the claim that students in IMP do learn more statistics and are better able to solve complex problems, on the average, than students in traditional college-preparatory mathematics classes.

There are limitations to this study. The small number of items on both the grade 9 statistics test (five) and the grade 10 problem-solving test resulted in low instrument reliability. This means that the instruments were very susceptible to minor influences such as supplementing the traditional curriculum with one unit of instruction. All of the items used in the study were open-response items, thus eliminating guessing.
The desired criterion of testing classes of students of two or more teachers for each curriculum program at each school was not met. This increases the likelihood of teacher-by-program-by-school effects. The criterion was met fully only by School 5, and partially by School 3 for grade 9. Combining data across Schools 3 and 5 helps to eliminate this issue. However, having to analyze the data from School 4 separately means that the results for School 4 could be due to the teacher as much as to the curriculum. The findings for the IMP classes at School 4 are consistent with those at the other two schools in the replication study, which adds evidence that there was a curriculum effect. Without any confirming evidence on the supplemented traditional curriculum at School 4, it is difficult to form any generalizations about this program beyond the school or the teacher at each grade.

The same measure of grade 8 achievement was not available for each of the schools. Students at School 3 had taken the Comprehensive Tests of Basic Skills (CTBS) as eighth graders. Students at Schools 4 and 5 had taken the Iowa Tests of Basic Skills (ITBS). These two widely used norm-referenced standardized tests are not equivalent. In this study the norm percentile ranking score was used as a covariate. Since the scores were used only as a covariate, an assumption was made that the percentile rankings would be sufficiently close, with any variations normally distributed. Thus, for the analyses of covariance, the grade 8 norm-referenced scores from the two tests were used as if produced by the same test.

This evaluation study was conducted without strong controls such as random assignment or close monitoring of test administration. The tests were administered in the last month of the school year, at a time when not all students are motivated to do their best on an additional test; the performance on the tests most likely indicates a lower limit to achievement rather than best possible results.
Even with these limitations, the consistency of findings between the two years and between the schools increased our confidence in their validity and in the conclusion that students in the IMP curriculum are achieving at a higher level on some statistical ideas and in solving problems than those in the traditional curriculum.

Bauman, S. (1997). [Assessment test problems in level 1, 2, and 3 courses 1990-96]. Unpublished raw data.
Campbell, K. E., & Jackson, T. T. (1979). The role of and need for replication research in social psychology. Replications in Social Psychology, 1, 3-14.
Campbell, D. T., & Stanley, J. C. (1963). Experimental and quasi-experimental designs for research. Chicago: Rand McNally.
Crosswhite, F. J., Dossey, J. A., Swafford, J. O., McKnight, C. C., Cooney, T. J., Downs, F. L., Grouws, D. A., & Weinzweig, A. I. (1986). Second International Mathematics Study, detailed report for the United States. Champaign, IL: Stipes.
Fendel, D., Resek, D., Alper, L., & Fraser, S. (1997). Interactive Mathematics Program: Integrated high school mathematics. Berkeley, CA: Key Curriculum Press.
Ferrini-Mundy, J., & Schram, T. (Eds.). (1997). The Recognizing and Recording Reform in Mathematics Education Project: Insights, issues, and implications. Monograph Number 8. Journal for Research in Mathematics Education. Reston, VA: National Council of Teachers of Mathematics.
IMP students demonstrate high achievement. (1997, Fall). Evaluation Update, 3, 14.
Kearns, D. T., & Nadler, D. A. (1992). Prophets in the dark. New York: HarperBusiness.
Loftsgaarden, D. O., Rung, D. C., & Watkins, A. E. (1997). Statistical abstract of undergraduate programs in the mathematical sciences in the United States: Fall 1995 CBMS survey. Washington, DC: Mathematical Association of America.
McLeod, D. B., Stake, R. E., Schappelle, B. P., Mellissinos, M., & Gierl, M. J. (1996). Setting the standards: NCTM’s role in the reform of mathematics education. In S. A. Raizen & E. D. Britton (Eds.), Bold ventures: Vol. 3. Case studies of U.S. innovations in mathematics education. Dordrecht, The Netherlands: Kluwer.
National Council of Teachers of Mathematics. (1989). Curriculum and evaluation standards for school mathematics. Reston, VA: Author.
National Science Board. (1989). Science & engineering indicators - 1989 (NSB 89-1). Washington, DC: U.S. Government Printing Office.
National Science Board. (1996). Science & engineering indicators - 1996 (NSB 96-21). Washington, DC: U.S. Government Printing Office.
Schoen, H. L. (1993). Report to the National Science Foundation on the impact of the Interactive Mathematics Project (IMP). In N. L. Webb, H. Schoen, & S. D. Whitehurst (Eds.), Dissemination of nine precollege mathematics instructional materials projects funded by the National Science Foundation, 1981-91. Madison: University of Wisconsin-Madison, Wisconsin Center for Education Research.
Shaver, J. P., & Norton, R. S. (1980). Randomness and replication in ten years of the American Educational Research Journal. Educational Researcher, 9(1), 9-15.
Smith, H. (1995). Rethinking America. A new game plan from the American innovators: Schools, business, people, work. New York: Random House.
Smith, T. M., Young, B. A., Bae, Y., Choy, S. P., & Alsalam, N. (1997). The condition of education 1997 (NCES 97-388). Washington, DC: U.S. Government Printing Office.
Snyder, T. D., & Hoffman, C. M. (1995). Digest of education statistics 1995 (NCES 95-029). Washington, DC: U.S. Government Printing Office.
United States Department of Labor, The Secretary’s Commission on Achieving Necessary Skills. (1991, June). What work requires of schools. A SCANS report for America 2000. Washington, DC: U.S. Government Printing Office.
Webb, N. L., & Dowling, M. (1995a). Impact of the Interactive Mathematics Program on the retention of underrepresented students: Class of 1993 transcript report for school 1: Brooks High School (Project Report 95-3 from the Interactive Mathematics Program Evaluation Project). Madison: University of Wisconsin-Madison, Wisconsin Center for Education Research.
Webb, N. L., & Dowling, M. (1995b). Impact of the Interactive Mathematics Program on the retention of underrepresented students: Class of 1993 transcript report for school 2: Hill High School (Project Report 95-4 from the Interactive Mathematics Program Evaluation Project). Madison: University of Wisconsin-Madison, Wisconsin Center for Education Research.
Webb, N. L., & Dowling, M. (1995c). Impact of the Interactive Mathematics Program on the retention of underrepresented students: Class of 1993 transcript report for school 3: Valley High School (Project Report 95-5 from the Interactive Mathematics Program Evaluation Project). Madison: University of Wisconsin-Madison, Wisconsin Center for Education Research.
Webb, N. L., & Dowling, M. (1996). Impact of the Interactive Mathematics Program on the retention of underrepresented students: Cross-school analysis of transcripts for the class of 1993 for three high schools (Project Report 96-2 from the Interactive Mathematics Program Evaluation Project). Madison: University of Wisconsin-Madison, Wisconsin Center for Education Research.
Webb, N. L., & Dowling, M. (1997). Comparison of IMP students with students enrolled in traditional courses on probability, statistics, problem solving, and reasoning (Project Report 97-1 from the Interactive Mathematics Program Evaluation Project). Madison: University of Wisconsin-Madison, Wisconsin Center for Education Research.
Webb, N. L., Schoen, H., & Whitehurst, S. D. (1993). Dissemination of nine precollege mathematics instructional materials projects funded by the National Science Foundation, 1981-91. Madison: University of Wisconsin-Madison, Wisconsin Center for Education Research.
Number of Teachers, Classes, and Students by Curriculum, Grade, and School for the Original Study (1995-96)
Number of Teachers, Classes, and Students by Curriculum, Grade, and School for the Replication Study (1996-97)
Number and Percentage of Students by Gender and Course for Grade 9
Number and Percent of Students by Ethnicity and Course for Grade 9
Number and Percent of Students by Ethnicity and Course for Grade 10
Summary Statistics on the Statistics Test by Course and by School (Grade 9)
^{a} Traditional grade 9 math course supplemented with statistical concepts.
^{b} p < .01
^{c} p < .01 (Data from Schools 3 and 5 were combined for this analysis.)
Analysis of Covariance: Test Score by Grade 9 Course
Summary Statistics on the Statistics Test by Course for Grade 9
^{a} Note: The expected mean value or score for the test was 2.50. This value is estimated using the following formula: M_{t} = np, where n is the total number of items in the test (n = 5) and p is the optimum item difficulty level for the items in the test. The optimum difficulty level for an open-response item is 0.50.
^{b} Items were not positively related with each other, resulting in a negative reliability.
Item Difficulty on the Statistics Test by Course for Grade 9
Summary Statistics on the Performance Assessment Test by Course and by School (Grade 10)
Analysis of Covariance: Test Score by Grade 10 Course
Analysis of Covariance: Score Obtained on Connecting Nodes by Grade 10 Course
Analysis of Covariance: Score Obtained on New Cubes by Grade 10 Course
SAMPLE COVER SHEET ONLY
Student's Name:
Class Period:
Mathematics Course Title:
Optional Ethnicity (circle one):
(1) American Indian or Alaskan Native
(2) Asian or Pacific Islander
(3) Black (not of Hispanic origin)
(4) Hispanic
(5) White (not of Hispanic origin)
(6) Other

This test consists of four questions. Write your name at the top of each page. For each question, find the answer and then write your answer on the line provided. Each question will be scored as right or wrong. The total score for the four problems will be the total number of questions you answered correctly. You are free to show your work and write on the pages even though only your answer will be scored. You can use a calculator.

You may answer the questions in any order. If you have difficulty answering one question, go to the next question. Go back to any unanswered question and do your best to determine an answer. Please check all of your answers before returning the test booklet to your teacher. Please complete the information above before beginning.

Name
In the graph, rainfall (in centimeters) is plotted for 13 weeks. What was the
approximate average weekly rainfall during the period?
Answer 1:
Name
Answer 3:
New mean:
New standard deviation:
Answer 4:
Scoring Key
Five Modified SIMS Statistics Items 
