Notes from the Field

Notes from the Field features updates about assessment, standards, and accountability implementation from the field of K–12 education. These short posts reflect what CSAI staff have observed or encountered in their work with states and educators.

Recent Post: Collaborative Expertise - Sandy Chang - 10/24/2016



Collaborative Expertise - Sandy Chang - 10/24/2016

How can we engage in and achieve transformational learning needed for the 21st century workforce? How do we create and utilize effective personalized and adaptive learning environments? How will we continue to innovate how we teach, learn, assess, and evaluate learning and teaching? These were just some of the issues the CRESST conference tackled during a two-day meeting that brought together educators, researchers, policymakers, ed tech executives, and other stakeholders interested in learning about ways the educational community can address its biggest challenges.


One potential answer to these complex yet thought-provoking questions? Collaborative expertise. John Hattie, in his keynote presentation, offered “ten big ideas in education” for improving student achievement (see the list below for all ten).1 Number 6: It is all about collective expertise in “knowing thy impact.” Collaborative expertise occurs when all sectors of the education system – from teachers and school and district leaders, to students and parents, to state leaders and policymakers – work together to form a community to share effective practices and policies. A critical aspect of collaborative expertise is ensuring that the practices and policies enacted are coordinated with strong research, development, and evaluation; that is, how we know that what we do has impact.


This idea of collaborative expertise weaves its way into every single one of Hattie’s ten big ideas.2

  1. Ask the relative question (not what works but what works best)
  2. Ask the upscaling question
  3. Every child deserves at least a year’s growth for a year’s input
  4. The major learning is between 0-2
  5. It is about how educators think
  6. It is about collective expertise in “knowing thy impact”
  7. Develop evaluative capacity
  8. Assessments are feedback to teachers about their impact
  9. Focus on learning not teaching
  10. We need to educate parents about learning


In particular, developing evaluative capacity (#7) and providing feedback to teachers about their impact (#8) struck a chord with me, given the work we do at CSAI. Hattie said that building the expertise of educators as evaluators involves teachers working together to assess their impact, moving from what students know now towards explicit success criteria, and maximizing feedback about teachers’ impact. These ideas coalesce into an important point, that assessments are feedback to teachers: once teachers and students understand their expectations of learning, we need to provide teachers with valid and reliable assessments and evaluative processes and tools to help them set, measure, and gauge these expectations. Collaborative expertise – among teachers, students, parents, test developers, and education leaders, to name a few – is needed to accomplish such goals.


Collaborative expertise was on display at the 2016 CRESST conference, where a confluence of experts in research, policy, and technology met with school, district, state, and national leaders in education to share their collective knowledge in solving educational problems and creating solutions. Collaborative expertise is also at work within the comprehensive center network. Recently, in speaking with program officers from the U.S. Department of Education, this concept came to my mind as we discussed ways that the network collaborates to assist states with their educational needs. CSAI, as a content center, helps teachers – and those who support teachers from the regional comprehensive centers (RCCs) and state education agencies (SEAs) – evaluate how effective their teaching strategies are by providing tools, training, and resources that are aligned to state college and career ready standards, which are aligned to valid and reliable assessment practices. With the passage of the Every Student Succeeds Act, CSAI is working with RCCs and the states in their regions to provide technical assistance and research support to help inform decisions about assessment and accountability systems. CSAI is part of a network of centers that is implementing collaborative expertise: working together toward a goal, knowing our impact, and reacting accordingly.


1Hattie, the Director of the Melbourne Education Research Institute at the University of Melbourne, is probably best known for his book Visible Learning, in which he synthesized over 800 meta-analyses to examine the influences on student achievement. The synthesis revealed that learning and teaching strategies and teacher and student self-efficacy matter more to achievement than structural and programmatic factors.

2To read a paper Hattie wrote on collaborative expertise, visit


What ESSA Did on Summer Vacation - Augustus Mays - 9/15/2016

Augustus Mays is the Director of Government Relations for the Office of Policy and Communications at WestEd and serves as WestEd’s liaison to Washington, D.C.–based professional associations, education policy think tanks, and related groups. 

With the atypical presidential campaign and the Olympics grabbing everyone’s attention this summer, you may have missed that the U.S. Department of Education (USED) keeps chugging along in its effort to release guidance and regulations on various aspects of the Every Student Succeeds Act (ESSA) by year’s end. This post summarizes the most significant features of the draft regulations and highlights proposals of special interest to CSAI.

USED Issues Notices of Proposed Rulemaking

In July, USED issued two notices of proposed rulemaking (NPRMs) to implement provisions of ESSA regarding assessments for Title I, Part A, and Title I, Part B:

  • Title I, Part A. ESSA required USED to go through the negotiated rulemaking process primarily on two sections of the law — Title I, Part A, assessments and “supplement, not supplant.” Through that negotiated rulemaking process, negotiators were only able to reach agreement on the assessment issues, which provided clarity and guidance on issues such as computer-adaptive tests, how tests for English language learners and students in special education should work, how high schools can offer local exams in lieu of state exams, and how advanced math testing for eighth graders should work.
  • Title I, Part B. The second NPRM touches on the ESSA “innovative assessment demonstration authority” under Title I, Part B. The law allows a small handful of states (initially up to seven) to permit districts to use local tests, such as a system of competency-based assessments, in lieu of the state exam, as long as those districts are trying out a system that will eventually go statewide. ESSA also places a number of stringent conditions on the pilot — what the law’s architects like to call “guardrails.” These guardrails are aimed at ensuring that the new kinds of tests that states develop are of high quality, and that all kinds of students — including English language learners and students in special education — have access to them.

Response to USED’s Draft Rules on Accountability

In addition to the two regulations on assessments, USED also released proposed regulations on accountability systems, school improvement systems, data reporting, and state plans. The agency gave educators, advocates, and others until August 1 to comment on the proposed rules.

What was the education community’s response? USED received nearly 21,000 comments in response to the proposed regulations. The following are some general themes to the education community’s response to USED’s proposed accountability rules:

  • Many groups, such as CCSSO and NASBE, don’t like that the proposed rules require schools to be identified for improvement based on data from the 2016–17 school year. These groups are asking USED to allow schools to first be identified for improvement in 2018–19, based on 2017–18 data.
  • A number of groups (e.g., AASA) are not too happy with the requirement for a single summative rating for schools, which many argue would lack nuance in describing the overall performance of a school.
  • Some civil rights groups are concerned that the department’s proposed menu of ideas for states on how to pinpoint which schools have groups of students that are “consistently underperforming” is not prescriptive enough. For example, the Education Trust wants the final regulations changed to require that “consistently underperforming” subgroups of students be defined by their progress toward state goals, not by how they perform compared to the statewide student average.
  • Many groups want to see the proposed rules take a lighter touch in relation to how states report test-participation numbers (95% rule) for accountability purposes.

So What’s Next?

USED will now consider public comments on the proposed regulations for assessment and accountability, and will potentially make changes based on this feedback sometime in the fall. As new developments emerge, CSAI will update the “Communications from the U.S. Department of Education” collection. 



Next-Generation Assessments for Next-Generation Standards - Andrew Latham - 7/15/2016

At the National Conference on Student Assessment in Philadelphia this June, I was struck by the number of presentations that focused on how we teach and assess K–12 science. Much of the conversation focused on the Next Generation Science Standards (NGSS),1 released in 2013, which present a comprehensive, integrated vision for science education. Specifically, each NGSS standard weaves together three dimensions—Disciplinary Core Ideas, Scientific and Engineering Practices, and Crosscutting Concepts—to produce Performance Expectations for what students should know and be able to do.2 If we are to realize a rich vision for how students learn science, as expressed in A Framework for K–12 Science Education: Practices, Crosscutting Concepts and Core Ideas (NRC, 2012),3 we will need to figure out how to assess students’ learning across these integrated dimensions in valid and reliable ways, at a time when there is tremendous pressure to reduce the amount of time and resources spent assessing students.4

Many of the conference attendees were eager to learn from their peers in states that have already begun to develop and implement NGSS assessment solutions. The conference included presentations by Kentucky, Maryland, Oregon, and Washington,5 the latter two on behalf of a 14-state collaborative that has developed two prototype NGSS assessment tasks, referred to as “item clusters.”6 After attending these sessions as well as an invitational focus group of a half-dozen states interested in next-generation science assessment, I was struck by the following observations:

  1. It is hard to imagine any scenario in which a traditional assessment of independent multiple-choice questions can measure three-dimensional standards such as the NGSS. Instead, the NGSS would seem to require some type of integrated scenarios or tasks requiring students to connect their thinking across nuanced ideas and observations, as called for in Developing Assessments for the Next Generation Science Standards (NRC, 2014).7
  2. While integrated tasks tend to elicit more authentic scientific thinking than traditional assessment questions do, they also tend to be more expensive to develop and more time-consuming to administer. Additionally, if they require open-ended responses, they typically must be scored by hand, so next-generation assessments of this type will likely pose significant pragmatic challenges if they are to be administered on a large scale within a state. However, when I asked the Oregon and Washington presenters whether they felt that they could implement this type of assessment on a large scale, they both emphatically answered that they could, and that they are currently exploring a variety of approaches. One possibility would be to use a matrix sampling approach so that each individual student is tested on a subset of the standards but, collectively, all of the students in the state are tested on most of the standards. This approach provides a comprehensive statewide view at a fraction of the cost and testing time that would be required if every student were to be tested on every standard.
  3. No single NGSS assessment will work across all states, because science is inextricably influenced by local cultural mores. For example, one state representative said that, although his state was a strong advocate of the NGSS, it would likely reject any questions related to evolution and climate change. Presumably, additional scientific topics will be politically sensitive in other states. Moreover, states vary in which grades are tested, their interpretations of reporting expectations, and opinions on how to bundle Performance Expectations, all of which will pose significant technical hurdles for any collaborative that is trying to develop a single assessment that will work across states. So, while the relative expense of developing three-dimensional science tasks may provide states with an incentive to pool resources to jointly develop content, each state will likely need some freedom and leeway to adapt or refine some percentage of the content that appears on its state-specific assessment. This will make comparing results across states more challenging.
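
The matrix sampling approach mentioned in the second observation can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not a description of any state's design: the function name `assign_forms`, the nine placeholder standards, the twelve students, and the choice of three standards per form are all hypothetical. The idea is simply that the standards are divided into short forms, and the forms are rotated across students.

```python
import random


def assign_forms(student_ids, standards, standards_per_form, seed=0):
    """Split standards into disjoint forms and rotate the forms across students."""
    rng = random.Random(seed)
    shuffled = list(standards)
    rng.shuffle(shuffled)
    # Each form holds a disjoint slice of the full set of standards.
    forms = [
        shuffled[i:i + standards_per_form]
        for i in range(0, len(shuffled), standards_per_form)
    ]
    # Deal the forms out in rotation so each form is used about equally often.
    return {sid: forms[i % len(forms)] for i, sid in enumerate(student_ids)}


# Hypothetical data: 12 students and 9 placeholder standards.
students = [f"student_{n}" for n in range(12)]
standards = [f"PE-{n}" for n in range(1, 10)]
assignment = assign_forms(students, standards, standards_per_form=3)

# Standards seen by at least one student across the whole group.
covered = {pe for form in assignment.values() for pe in form}
print(len(assignment["student_0"]), "standards per student")
print(len(covered), "of", len(standards), "standards covered overall")
```

In this toy case each student answers questions on only three standards, yet the group as a whole covers all nine. A real design would also have to balance forms for difficulty and decide which results can be reported at the student level, which this sketch does not attempt.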

Interest in how to implement and assess the NGSS is not new. In September 2013, the K–12 Center at ETS hosted an invitational research symposium on next-generation science assessments, which brought national and international researchers together with state science staff to discuss the latest relevant research and its implications for future science assessment. Over the past three years, a number of states have made considerable strides in moving from research to initial implementation, but substantial challenges remain. How well these states succeed, and the lessons that they learn along the way, should help illuminate and inform the future, not only for science assessment, but for assessment of all content areas that seek to integrate the complex ways in which students learn content and apply their knowledge.


1NGSS Lead States. (2013). Next generation science standards: For states, by states. Washington, DC: Achieve, Inc.

2For more background and detail on the NGSS, see

3National Research Council (NRC). (2012). A framework for K–12 science education: Practices, crosscutting concepts, and core ideas. Committee on Conceptual Framework for the New K–12 Science Education Standards. Board on Science Education. Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press.

4See, for example, President Obama’s testing action plan, released in October 2015.

5The following are links to the states’ NCSA presentations: KY, MD, and OR & WA.

6The interactive demo of the Grade 5 item cluster prototype can be accessed here.

7NRC. (2014). Developing assessments for the Next Generation Science Standards. Committee on Developing Assessments of Science Proficiency in K–12. Board on Testing and Assessment and Board on Science Education, J. W. Pellegrino, M. R. Wilson, J. A. Koenig, & A. S. Beatty (Eds.). Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press.



The Feds Want to Know - Joan Herman - 6/15/2016

Unless you have been living under a rock for the last several weeks, you know that the Department of Education has issued, for public comment, draft rules for implementing the Every Student Succeeds Act’s (ESSA) accountability and state planning provisions (see the Proposed Rule from the Federal Register). You also probably have read about the major requirements these rules encompass with regard to:

  • Establishing a single, state-wide accountability system, including the indicators and measures that the state will use to differentiate (in a single metric) the quality of all public schools and to identify schools for comprehensive or targeted support and improvement;
  • Developing and implementing evidence-based improvement plans for these identified schools;
  • Developing – with parent involvement – and making available and accessible state and local report cards, with specified content elements and timeliness requirements; and
  • Creating consolidated state plans, including specification of content, format, and deadlines for submission (March or July, 2017).


See, for example, ESSA’s general rules on accountability and specific pieces of ESSA’s accountability draft rules.


But you might have missed that in addition to requesting your comments on the proposed rules, the Department also is asking for your help in refining and creating additional rules in five specific areas.

  1. Do the suggested options for identifying “consistently underperforming subgroups of students,” as proposed in §200.19, result in meaningful identification? If not, which should be deleted and/or what additional options should be considered?
  2. Should additional or different options be proposed for dealing with low participation rates in schools that do not assess at least 95% of their students, so that parents and teachers will have the information they need to evaluate and support students’ academic progress (§200.15)?
  3. Would states be better able to support English learners if proposed regulations include a maximum state-determined timeline for English learners achieving English language proficiency? If so, what should that timeline be, based on available research (§200.13)?
  4. Should the Department keep, modify, or eliminate the draft provision that allows a student who was previously identified as a student with a disability under IDEA, but who no longer receives special education services, to be included in subgroup calculations for school academic achievement indicators? If the provision is kept, should such students be permitted to be counted in the subgroup for up to two years or for a shorter period of time (§200.16)?
  5. Should there be standardized criteria for including children with disabilities, English learners, homeless children, and children who are in foster care in their corresponding subgroups within the adjusted cohort graduation rate? If so, what are ways to standardize these criteria (§200.34)?


Have ideas or comments on any of these issues? Submit your comments by August 1, 2016 to


Getting It Right This Time - Deb Sigman - 5/12/2016

Earlier this year, I had the pleasure of attending the Accountability Systems and Reporting (ASR) State Collaborative on Assessment and Student Standards (SCASS). This group, one of several sponsored by the Council of Chief State School Officers (CCSSO), supports the work of state education agencies (SEAs) as they navigate policies related to educational standards, assessment, and accountability. As a former state testing director, I always found the SCASS meetings to be extraordinarily helpful and to be a safe, collegial place where SEA staff could celebrate our state successes and commiserate about the ever-changing landscape of education policy.


It had been about three years since I had attended one of these meetings. Much has happened in the area of education standards, assessment, and accountability during that time. The assessment consortia delivered their first operational assessments, every student was to achieve proficiency by 2014, and states with federal accountability waivers were continuing into uncharted territory with varying degrees of success. Most notable was the December 2015 reauthorization of the Elementary and Secondary Education Act, meaning the end of No Child Left Behind (NCLB) and the beginning of the Every Student Succeeds Act (ESSA). As I listened to the ASR participants, I noticed two things. First, out of the 100 or so SEA assessment/accountability personnel in attendance, the vast majority had been in these roles for fewer than three years. Second, those in attendance, no matter how seasoned, seemed concerned, but hopeful, about ESSA.


These SEA personnel, who are at the front line of implementing ESSA, generally seemed grateful for the new education law, which lifts a very tightly constrained set of regulations around standards, assessment, and accountability. The few veteran staff in the audience agreed that the reauthorization was long overdue, and were eager to move beyond NCLB, but along with this renewed sense of vigor came a sense of apprehension and hesitancy. Why is it that, after a decade of hoping for more flexibility and autonomy, states are slow to fully embrace the flexibility and autonomy now offered?


Let me offer this analogy. Most of us have experienced the relief of finally getting rid of an electronic device that we’ve held onto for far too long. It’s old, out-of-date, and no longer functional. But once it’s gone, a sense of nostalgia, safety, and comfort lingers. We knew the old device, we knew how to make it work (even if it was not very functional), we knew its good qualities and its idiosyncrasies. We could become frustrated by its dysfunction, but we knew the work-arounds. Then we experience the excitement of getting a new device. The new device is great and we’re excited to see what it can do. But there’s also a hesitancy. We don’t know this device, and we’re not sure about all its functions and how to make it work seamlessly. We may be hesitant to take full advantage of what the device has to offer, particularly before we become very familiar with the explicit instructions.


And so it goes with ESSA. For a number of years, states, districts, and schools recognized that NCLB was out-of-date, had ceased to be a positive influence, and was essentially not functioning in the way that had been intended. As a state testing director, I often heard from district personnel, “NCLB is too compliance-driven; schools can’t possibly educate all of their students under the constraints of NCLB. Schools and districts would be able to do things better and more efficiently if school and district personnel had more control over the system and the money.” But now that NCLB is gone and ESSA is here, just like with new devices, there is a hesitancy to take full advantage of what this law offers until we know the exact instructions (regulations).


During the NCLB era, states and districts could point to the constraints and dysfunction of NCLB as the “enemy,” responsible for keeping states from making effective education policy decisions. Now states will have to own the decisions they make about standards, assessment, and accountability. While the provisions around standards, assessment, accountability, and interventions are still very present in ESSA, the balance of power has shifted; states will be able to maximize the flexibility offered in the law to increase access and equity for all students. It will be difficult to argue that federal law is at fault if the education policy decisions made by states and districts do not lead to educational successes.


With great autonomy and flexibility comes great responsibility. Assessments and the accountability systems that utilize them have the potential to be change agents that advocate for children and families who may not be able to advocate for themselves. I’m hopeful, now that ESSA is here, that the education community will be able to strike a balance between accountability and fundamental principles of equity and fairness, to learn from NCLB and its predecessors, and to make good on the promise that all children will learn. Let’s make sure we get it right this time.


Maximizing Assessment Value While Minimizing Administrative Burden - Andrew Latham - 4/11/2016

Last October, President Obama announced the Testing Action Plan, an initiative encouraging states to evaluate and refine their statewide assessment accountability systems. The plan outlines many principles for creating and implementing high-quality state assessments, including the recommendation that states “place a cap on the percentage of instructional time students spend taking required statewide standardized assessments to ensure that no child spends more than 2 percent of her classroom time taking these tests.”


Who could argue against such a laudable goal? The ultimate intent of assessment and accountability systems is to ensure that students learn more, so the less time they take away from classroom instruction, the better. Nonetheless, test reduction initiatives can pose significant challenges for states. The Center on Standards and Assessment Implementation (CSAI) has begun helping these states conduct audits of their assessment programs to see where they can effectively be scaled back.


The core challenge of reducing testing time is that newer standards, such as the Common Core State Standards (CCSS) and Next Generation Science Standards (NGSS), seek to establish deeper, more complex learning goals than previous standards. But measuring such learning typically requires more testing time, not less. A traditional multiple-choice vocabulary question might provide a word and then list five possible answer options, one of which is a synonym. In tests striving to measure the CCSS in English language arts, which look for students to derive meaning from text, that vocabulary question might consist of a relatively brief reading passage containing the vocabulary word in question, and ask the student to define that word based on the contextual clues from the passage. Because the latter question requires students to read a passage, it will take longer to administer.


So how are states supposed to reduce their required testing time while measuring student learning more deeply? Since the Testing Action Plan was released, CSAI has met with more than half a dozen states to discuss their options. The most interesting theme to emerge from these conversations is that most states know relatively little about what additional assessments are being required locally at the district and school levels. While the Testing Action Plan specifically refers to state-required assessments, any thoughtful reduction in testing time must also take into account local requirements, which often far exceed state mandates.


As an initial step, CSAI helps states collect data in two waves. First, we work with states to conduct an inventory of the assessments currently in use at both the state and local levels. This is trickier and more time-consuming than it sounds. In some of the larger districts, local assessment requirements can vary widely from school to school within the district, so accurately capturing all the tests in use at schools throughout the state requires considerable effort. To counter this problem, one state we are working with (which has a small number of districts) has elected to meet with the assessment director in each district individually to fill out the inventory together. This method is not only likely to yield more accurate information but also gives the state an opportunity to demonstrate how serious it is about reducing test burden on students. It is also a very time-consuming data collection method, however, and thus would likely not be viable in states with larger numbers of districts.


Once CSAI inventories the assessments being used across the state, we then survey the schools to see how the assessment results are being used, and to capture the teachers’ impressions about how useful the results are in helping them tailor their instruction to maximize student learning. After collecting and analyzing this usage and utility data, we then help the state conduct focus groups to delve more deeply into the use and benefit of each assessment. During these conversations, we look to discover redundant assessment systems, where the purposes of and information provided by assessments overlap unduly. Such instances are not uncommon.
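
As a rough illustration of how an inventory might be screened for possible redundancy before the focus groups, the sketch below groups assessments that share a subject, grade, and stated purpose. The inventory rows, field names, and the function `find_overlaps` are all hypothetical and are not CSAI's actual audit instrument; a shared purpose is only a prompt for discussion, not proof that two assessments are redundant.

```python
from collections import defaultdict

# Hypothetical inventory rows; a real audit would use state- and district-reported data.
inventory = [
    {"name": "State summative", "subject": "math", "grade": 5,
     "purpose": "accountability"},
    {"name": "District benchmark A", "subject": "math", "grade": 5,
     "purpose": "progress monitoring"},
    {"name": "School interim B", "subject": "math", "grade": 5,
     "purpose": "progress monitoring"},
    {"name": "District reading screener", "subject": "reading", "grade": 5,
     "purpose": "screening"},
]


def find_overlaps(rows):
    """Group assessments by (subject, grade, purpose); keep groups with more than one."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["subject"], row["grade"], row["purpose"])
        groups[key].append(row["name"])
    return {key: names for key, names in groups.items() if len(names) > 1}


overlaps = find_overlaps(inventory)
for key, names in overlaps.items():
    print(key, "->", names)
```

Here the two Grade 5 math progress-monitoring assessments are flagged together. Whether they truly overlap would still depend on the usage and utility information gathered from schools and teachers.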


Our first two state audits are well underway, and we anticipate completing the initial data collection and analysis this spring. From these first two efforts, we plan to distill themes and guidelines for other states to consider as they evaluate their own programs. Every assessment will impose some level of administrative burden, and each should also provide some value for student learning. Our goal in conducting these assessment audits is to help states identify the point at which they can maximize value while minimizing burden in terms of lost instructional time.


To learn more about the results of our first two assessment audits in response to the Testing Action Plan, please join us at the National Conference on Student Assessment in Philadelphia on Monday, June 20, at 4:00 pm. CSAI Director Andrew Latham will be joined by staff from the West Comprehensive Center and the U.S. Department of Education in a session titled The Two Percent Solution: Weighing Value Against Burden in Statewide Standardized Assessments.


Support for the Common Core: The Silent Majority - Joan Herman - 8/5/2015

Controversy over the Common Core! Parents opting out in growing numbers! Congress clashing over standards and testing in the reauthorization of No Child Left Behind! Turmoil, discord, and conflict abound. Browse recent headlines and media accounts, and the situation indeed is looking dire for current initiatives to support students’ achievement of college and career readiness.


But wait! Last week, two interesting news pieces on standards and assessment provide quite a different perspective.


First, a very interesting infographic from Mission: Readiness: Military Leaders for Kids summarized state reactions to the Common Core. Mission: Readiness, as its name implies, brings together retired generals, admirals, and other senior military leaders to advocate for policies that support children and the military. Five Years After the Common Core: The State of the States makes clear that despite the fireworks of the past year, the Common Core is secure as of the end of the 2015 state legislative season. Of the 19 states in which legislation to repeal the Common Core was introduced or attempted, none succeeded, and little meaningful change is expected in the three states where the standards are under review.1 Sure, a few states may have nominally created their own standards, but look at what they created, and the Common Core rings through. Mission: Readiness declares the Common Core debate “firmly settled.”


A second item worth underscoring comes from advance results of Education Next’s 9th national survey on public attitudes toward testing and the opt-out movement. The survey’s nationally representative sample of 700 teachers and 3,300 adult members of the general public was asked, “Do you support or oppose the federal government continuing to require that all students be tested in math and reading each year in grades 3-8 and once in high school?” Two-thirds of the public and a similar percentage of parents responded affirmatively, with only 21% opposed to annual testing. Teachers’ views were more mixed: 46% opposed annual testing, perhaps understandably if test results are likely to be used against them.


The opt-out movement also garnered relatively little support when survey respondents were posed the question:

Some people say that ALL students should take state tests in math and reading. Others say that parents should decide whether or not their children take these tests. Do you support or oppose letting parents decide whether to have their children take state math and reading tests?

A quarter of the public supported the idea, 59% opposed it, and the remainder were neutral. Parents and teachers were slightly more positive, with 32% favoring the idea, but the clear majority were opposed.


I’m not sure the debate on these issues is fully over, as Mission: Readiness concluded about the Common Core. Surely some skirmishes remain. However, I do hope that with the close of the legislative season and its controversies and with the start of a new school year, states and districts can focus their full attention on strategies that will support their schools’, teachers’, and students’ success. Let’s be sure we build a strong bridge from “debate over” to “mission accomplished” for kids and help teachers help students be prepared for college and career success.


1The infographic is based on analysis from the Collaborative for Student Success.

A Look at the EdReports Reviews - Glory Tobiason - 6/2/2015

Measuring the moving parts of our educational system is hard. I spend my days knee-deep in the subject, sleeves rolled up, hands dirty in nitty-gritty methodological details of education research, and things are just as complex from this vantage point as they are from the classroom. A lot of research in education boils down to some form of the question “Is this going to work for students?” and recently this question, as it pertains to curriculum, has gotten quite a bit of attention.


EdReports: New Kid on the Block

As states have transitioned to the Common Core State Standards (CCSS), several organizations have positioned themselves to examine existing educational resources and ask, “How well does this resource support students in achieving the CCSS?” EdReports is one of the newest players on this field, having recently released a collection of reviews that analyze popular instructional materials.

The organization bills itself as a “Consumer Reports” for educational materials, one that aims to “provide free, web-based reviews of instructional materials focused on alignment to the Common Core and other indicators of high quality as recommended by educators, including usability, teacher support, and differentiation.” The initial suite of reviews focused on K-8 mathematics curriculum; additional suites that focus on secondary mathematics and ELA are in the works.

There’s a lot to like in the work of the EdReports team. They made teachers a central part of their evaluation process, which lends the reviews an unusual credibility. They made their evaluation criteria clear. And they showed remarkable integrity by posting “Publisher Responses,” letters in which many curriculum publishers commented on the review results and provided a rich description of the design, intent, and logic of their instructional materials. In general, there is a refreshing sense of enthusiasm and transparency in EdReports’ work, which focuses on critical issues for districts and schools who are faced with purchasing new resources or reviewing existing ones. But to understand what the evaluations do and do not tell us, we need to unpack a bit of the underlying methodology.


Embedded Value Judgments

Details of the rubric and process used to guide the review process are available here, but – in a nutshell – three guiding questions were used:

  1. Does the instructional material focus on the CCSS major work of the grade (on the order of 65-85% of class time over the year) and is it coherent?
  2. Does the instructional material meet the CCSS expectations for rigor and mathematical practices?
  3. Are the materials consistent with effective practices for use and design, teacher planning and learning, assessment, differentiated instruction and effective technology use?

What strikes me about these questions is that, taken together, they do a remarkably good job of bite-sizing a very complex concept: “quality” in instructional materials. In order to bound the scope of their work, however, EdReports did not weight all three questions equally, but rather decided that the first question was more important than the second, and that both of these were more important than the third. This ordering is wholly defensible (it mirrors the official publisher’s criteria for the CCSSM), but it’s certainly not the only option. If asked to prioritize the questions, different stakeholders would likely answer differently, depending on their beliefs, ideals, and interests. For example, one could argue that an integration of the math practice standards is just as important as a focus on the CCSS major work of the grade. Or that support for differentiated instruction and coherence across topics are equally critical.


Constraints in Available Information

What happened next is quite interesting and quite consequential: EdReports’ value judgment resonated methodologically in the use of sequential “gateways” for review. If materials didn’t pass muster at question 1, questions 2 and 3 were never asked. (The open letter published by NCTM and NCSM does a nice job of exploring this methodological issue in more detail.)

What this means is that this study of “the alignment and quality of instructional material programs” actually tells us very little about a program that has failed to pass Gateway 1. It could have failed at question 1, but done a stellar job at questions 2 and 3. Or it could have failed relative to all three questions.

In other words, the inferences we can make about a program that fails at Gateway 1 are limited. Such a program could have important strengths. It could do an unusually good job of integrating the content and practice standards, for example, or of incorporating effective designs to support teacher use. If this is the case, it may provide good resources for addressing at least some of the CCSS — but we can’t differentiate such a program from one that fails on all three counts.  
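
The information loss described above can be made concrete with a minimal sketch. It assumes a simplified pass/fail check per guiding question. The function name, the dictionary keys, and the two programs are hypothetical illustrations, not EdReports’ actual rubric or data.

```python
from typing import Callable, Optional

# Hypothetical sketch of sequential "gateway" review. Each check stands in
# for one guiding question; the real rubric is not reproduced here.
def gateway_review(
    material: dict, gateways: list[Callable[[dict], bool]]
) -> list[Optional[bool]]:
    """Return one entry per gateway: True/False if asked, None if never asked."""
    results: list[Optional[bool]] = []
    blocked = False
    for check in gateways:
        if blocked:
            results.append(None)  # question skipped after an earlier failure
            continue
        passed = check(material)
        results.append(passed)
        if not passed:
            blocked = True  # later gateways will not be evaluated
    return results

# Made-up checks and programs, for illustration only.
checks = [
    lambda m: m["focus"],      # question 1: focus and coherence
    lambda m: m["rigor"],      # question 2: rigor and practices
    lambda m: m["usability"],  # question 3: use and design
]
program_a = {"focus": False, "rigor": True, "usability": True}
program_b = {"focus": False, "rigor": False, "usability": False}

report_a = gateway_review(program_a, checks)
report_b = gateway_review(program_b, checks)
print(report_a)              # [False, None, None]
print(report_a == report_b)  # True: the two reports are indistinguishable
```

Under these assumptions, a program that is strong on questions 2 and 3 produces exactly the same report as one that fails all three, which is the limitation on inference noted above.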



Things are changing fast in our education system, and the adoption of the CCSS is a major driver of this change. Several weeks ago, my colleague Andrew Latham wrote about how this change is playing out for student assessments [see below: “Assessing the New Tests - 3/10/2015”], and the EdReports reviews call our attention to this change, relative to instructional materials. Regardless of whether we’re talking about tests or curriculum, pedagogy or policy, the essential question remains: “Is this going to work for students?”

The EdReports reviews bring a particular perspective to this question; other reviews will bring different ones. The way I see it, the real value of EdReports’ work lies in the conversations it has sparked: it has forced us to think long and hard about what constitutes “quality instructional materials” and how to prioritize different dimensions of this multifaceted concept.  Moving forward, I suggest we follow the lead of Eric Hirsch, executive director of EdReports, and “value the diversity of expertise in the space.”


Glory Tobiason is a UCLA doctoral student in social research methodology and a member of the CSAI team. She has a background in formative assessment and mathematics curriculum, as well as nine years of experience as a teacher of ESL, EFL, and math. Her current research focuses on strategies to effectively communicate technical education research to parents, teachers, administrators, and policymakers.


Good News about Higher Standards - Joan Herman - 4/27/2015

The media is full of news of the controversies surrounding the Common Core State Standards. Polemics and partisanship run high. Less prominent in the public dialogue is recent evidence about what happens when states and schools adopt higher expectations and implement higher standards for college and career readiness.  Consider these recent findings.

  • Positive relationship between Common Core implementation and NAEP performance. The Brown Center used NAEP data for 2009-2013 and two indices of Common Core implementation to examine how student performance in states with strong implementation compared to that in states that did not adopt the Common Core. Results for both fourth grade reading and eighth grade math showed small, statistical differences favoring states with strong Common Core implementation. (See Tom Loveless, March 24, 2015, Measuring the Effects of the Common Core.)
  • Positive relationship between Common Core and ACT scores. AIR researchers analyzed data from Kentucky, the first state to adopt the Common Core, to examine students’ ACT performance prior and subsequent to implementation. Results indicated that, after Common Core implementation, students in both high- and low-poverty schools showed greater improvement than students who were not engaged with the standards. (See Zeyu Xu & Kennan Cepa, March 2015, Getting College and Career Ready During State Transition Toward Common Core Standards.)
  • Positive relationship between deeper learning and student learning. College and career ready standards share with deeper learning a focus on students’ content mastery in English language arts and mathematics, as well as students’ ability to use their knowledge and skills to communicate effectively, reason with evidence, and solve complex problems. Researchers compared the performance of students who attended schools that were implementing deeper learning well with that of demographically similar students in comparison schools. Results showed that students in deeper learning schools surpassed their counterparts in learning, based on both a PISA-based measure and state mandated tests, and in inter- and intrapersonal skills and on-time high school graduation. (See AIR, Does Deeper Learning Improve Student Outcomes? Results from the Study of Deeper Learning: Opportunities and Outcomes.)
  • College expectations and access to rigorous coursework positively related to college-ready performance. Advanced Placement (AP) classes represent college-level expectations, and students who score 3 and above on AP exams can earn college credit. Over the last 10 years, Florida has made concerted efforts to increase its students’ participation in AP and has a far higher proportion of students - including low-income students - participating in AP, compared to the national average. Thirty percent of Florida’s 2014 graduates earned a score of three or more on at least one AP exam, ranking Florida third among states. Surpassing Massachusetts, Florida’s AP performance was only slightly bested by that of Maryland and Connecticut, states with considerably higher SES and lower diversity than Florida. Florida’s improvement in AP performance, from 16.3% of students with passing scores in 2004 to 2014’s 30% rate, was second only to Connecticut. (See Leslie Postal, Orlando Sentinel April 8, 2015, Florida Ranks Third for AP Success.)

Although none of these studies can prove that higher standards and expectations increase student learning (and in the main, they reveal only small effects), they are results that are highly promising. The consistency of positive results across multiple quasi-experimental studies using a variety of outcome measures is particularly heartening. Minimally, these study results suggest that implementing new, more rigorous standards and engaging students in rigorous curriculum and instruction does not disrupt or impede student learning, as some have feared. Future research clearly is needed to reach greater clarity on the effects of current initiatives, and as we move forward, let’s keep our eye on the evidence.


Assessing the New Tests - Andrew Latham - 3/10/2015

The era of Common Core testing is upon us. Many Smarter Balanced Assessment Consortium (Smarter Balanced) and Partnership for Assessment of Readiness for College and Careers (PARCC) states are preparing to launch their operational assessments either this spring or next. Both development efforts were ushered into existence with great fanfare and a staggering $330M of federal funding in 2010, with the promise that these consortia would produce “next-generation assessments” that would dramatically improve how we measure students’ readiness for college and careers. The closer we get to implementation, the more charged the rhetoric has become, with strident battle lines being drawn between supporters and detractors, and very few ambivalent people left in the middle. This article looks at some of the more contentious points being debated, and analyzes a few released Smarter and PARCC questions, to get a sense of how different and effective the new tests may be.


But first, a disclaimer. My company, WestEd, has completed project management work for Smarter, and test development work under subcontract to PARCC, so I am not an objective observer. In general I support the Common Core State Standards, and I support the notion that pooling resources and working towards a single set of challenging standards has the potential to lead to much more innovative and accurate measures of student learning than have been available in the past. This does not make me a blind adherent, however; clearly both programs face considerable challenges, and not even the most staunch advocate would predict a smooth launch and seamless transition to the new assessments.


On to the tests themselves. To begin with, let’s debunk the tired next generation moniker, because it implies a degree of homogeneity in the current generation of assessments. Prior to the introduction of the Common Core, when almost all states were developing their own state standards and assessments, the rigor and quality of both were all over the map. One 2014 study estimated that state standards varied by as much as three to four grade levels between the least and most rigorous states. And the tests developed to measure these standards ranged from traditional paper-and-pencil, multiple-choice tests, to computer-adaptive instruments with significant percentages of open-ended questions that required students to construct their responses. For this reason, any comparison of old with new glosses over the significant variability among the old tests, and must be interpreted with caution.


Both Smarter Balanced and PARCC are designed for computer delivery, significantly expanding the types of items that can be administered and scored automatically in real time. Consider the following Grade 3 Math item from PARCC:



On a paper-and-pencil test, a comparable item most likely would contain a number line with 5 dots at various plausible points and ask the student to identify which dot represents a specific fractional value. By guessing alone, the student would have a 20 percent chance of answering correctly. To be sure, the traditional test might also ask the student to draw three dots representing the same three fractional values shown in the PARCC item, but if it did, that task would need to be hand-scored, increasing the expense of administration, the possibility of errors in scoring, and the time required to report scores. Moreover, note that the PARCC question probes the student’s depth of understanding in a manner consistent with the intended richness of the Common Core State Standards. Instead of presenting a single fraction, it presents three. All three have the same denominator, so students are encouraged to recognize fraction patterns and relationships (i.e., if they all share a denominator, the value increases as the numerator increases). One of the fractions is also a whole number. In this way, one relatively straightforward PARCC question provides a much more nuanced measure of student understanding than a simpler, more traditional paper-and-pencil variant might.
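
To put rough numbers on the guessing argument, here is a small sketch. The 5-option figure comes from the paragraph above. The count of candidate positions on the number line (`positions = 9`) is purely an assumption for illustration, since the PARCC item itself is not reproduced here, and the sketch also assumes each fraction is placed by an independent uniform guess.

```python
from fractions import Fraction

# Chance of guessing a single multiple-choice item with 5 options.
mc_options = 5
p_multiple_choice = Fraction(1, mc_options)

# Hypothetical: three fractions, each dropped on one of `positions`
# candidate points by an independent random guess. The value 9 is an
# assumption, not a property of the actual PARCC item.
positions = 9
fractions_to_place = 3
p_placement = Fraction(1, positions) ** fractions_to_place

print(p_multiple_choice)         # 1/5
print(float(p_multiple_choice))  # 0.2
print(p_placement)               # 1/729
```

Whatever the real number of positions, the chance of placing all three fractions correctly by luck shrinks with the cube of that number, which is why the multi-part item is so much harder to guess than the 5-option variant.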


Now let’s look at a fourth-grade reading task from Smarter Balanced:


Read the sentences from the passage. Then answer the question.

“My grandma pulled the ball out, unwrapped it, and held it out for us to see. The ball was scarred almost beyond recognition. It had dog bite marks, dirt scuffs, and fraying seams. Right in the middle was a big signature in black ink that I had somehow overlooked. It was smudged now and faded, but it still clearly said ‘Babe Ruth.’ I began to shake inside.”

Click on two phrases from the paragraph that help you understand the meaning of scarred.




The Common Core State Standards for English language arts (ELA) emphasize the ability to derive meaning from texts as essential for eventual college and career success. A more traditional question might simply ask the student to select the correct definition of “scarred” from five options, again opening the door to guessing, but also measuring a student’s vocabulary more than his or her ability to interpret context clues. In this example, the student isn’t even asked to define the word, but instead must identify those specific areas of text that shed light on its definition. Guessing is largely eliminated, and the construct measured is textual interpretation, not vocabulary knowledge.


Speaking and listening skills also have a place in the Common Core. While the new tests have not yet made a concerted effort to address the speaking component, they do include listening tasks. One set of sample tasks from Smarter Balanced involves students watching a brief video on their computer about astronauts and gravity, and then answering a few informational questions from the video. At first, I was skeptical. Is pulling information out of a video really a necessary college and career skill? On further reflection, I realized I was probably being a Luddite. When my two high school children conduct research for a project, they don’t go to the library, they go to the Internet. For better or worse, video appears to be the ubiquitous medium through which the current generation of students will access and interpret information. In addition, recent research on the use of video in the classroom has shown that watching short, dynamic videos provides a sensory experience that produces a greater depth of understanding of new concepts and ideas. Using videos and other media in assessments allows for a deeper measure of the comprehension, through a more immersive experience.


Among the many detractors of the new tests, one popular refrain is that they still contain lots of multiple-choice questions, so by definition they aren’t really measuring higher-order thinking skills. But while multiple-choice questions are clearly more limited than open-ended ones, it’s a mistake to think they can’t measure complex reasoning or skills, particularly when coupled with open-ended questions. This type of coupling is common among the Smarter Balanced- and PARCC-released questions. A typical example item might ask the student to select the main idea of a passage from among five options, then follow it up with an open-ended question requiring the student to identify the sections of the text that illustrate the main idea. To receive full credit, the student must provide a reasoned, text-based rationale in support of the multiple-choice selection.


It’s too easy to be dismissive of multiple-choice items. To be sure, relying solely on multiple-choice would limit the depth of knowledge that can be assessed. But the new tests don’t rely on multiple-choice questions alone; they integrate multiple-choice questions with healthy portions of constructed-response questions that take advantage of various technological innovations. And there are many pragmatic reasons why multiple-choice questions have dominated the first century or so of standardized assessments: they are far less expensive to develop and validate; they require much less student testing time to complete; they can be scored automatically and quickly; and they tend to produce more reliable scores than open-ended items. Moreover, multiple-choice items provide the most efficient and valid way to measure student understanding at the foundational learning levels. The trick, then, is to provide an integrated mix of multiple-choice and open-ended tasks that try to balance all the pressures of providing rich, authentic measures of student learning with the need to hold rising test costs and student testing time in relative check.


Perhaps the most interesting tasks in the new assessments are the performance tasks in both math and ELA, which can take anywhere from a half hour to two hours to complete. These tasks typically require students to interpret multiple resources, sometimes using multiple media, to address a series of questions that require them to demonstrate strategic and extended thinking by researching, hypothesizing, modeling, comparing, and/or problem solving. When most people think of next-generation assessments, they probably think of these types of tasks, which can and should be used to great effect when integrated seamlessly with instruction. But as promising as they are, when used in standardized accountability assessments performance tasks can rampantly increase testing time and be exceedingly expensive and difficult to score accurately and reliably. So the new tests must strike a balance. While both assessments include extended performance tasks for math and ELA, they do so in limited numbers—Smarter Balanced includes just one performance task for each content area.


Will the Smarter Balanced and PARCC tests justify their $330M price tag? We’ll find out soon, as the tests are administered for the first time, the data are analyzed, and teachers and schools have a chance to evaluate what the results tell them about their students’ learning. It’s certainly possible to produce rigorous standards and innovative computer-delivered tests at the state level; some states have already done just that. But few would argue that this high quality cuts across all state programs. The consortia tests represent an intriguing and expensive gamble, founded on the belief that by pooling resources and developing to a single set of high standards, states can disruptively innovate the depth and breadth with which we measure our students’ learning. It will be fascinating to see what the data tell us this spring.