Student achievement: Improving it through assessment

The following remarks about the importance of assessment in improving student achievement were delivered by Texas 2036 President and CEO Margaret Spellings during her testimony to the Texas House Committee on Public Education on Aug. 9, 2022. 

Good morning, Mr. Chairman and members. Thank you for the opportunity to appear before you this morning, especially following Commissioner Morath, who in my view is the strongest chief state school officer in the country because of his ability to stay focused on our urgent need to dramatically improve student achievement and his recognition of the critical role assessment and accountability play in achieving our shared goals. I am sorry I cannot be with you in person.

During my decades-long career, I have had the privilege of working in service to many critical players in public education, including in the Texas Legislature and Governor’s Office, on behalf of Texas school boards, and at the federal level leading the U.S. Department of Education. It has been fascinating to be party to the calibration and recalibration of the roles these various actors play as we work to serve our students, taxpayers, and communities.

What I have learned — sometimes at the school of hard knocks — could fill volumes, but I want to focus on a few key points about assessment today.

Our economy relies on out-of-state talent to meet its workforce needs. We need to develop our own Texas students so that they can be part of our economy for decades into the future. That requires knowing how our students and schools are doing so we can make smart investments. As I like to say, we need to “care enough to find out” how all students are doing. With this information we, as state policymakers, can and must set priorities, allocate resources, and convey needs and challenges to parents and taxpayers.

For those who want to “measure what matters,” whatever that means, I ask: What matters more than a student reading or doing math on grade level? How will we make the case for further and righteous investments when we cannot demonstrate clearly where we lead and where we lag? Why would taxpayers support funds for reading academies, for example, without the data that demonstrates a need? As I like to say, “if there is no problem, we don’t need a solution.” Without assessment, we lack the ability to make the case for parent empowerment through charters, or for rewarding teachers who take on the hardest work in the schools that need them most.

At this juncture, it’s probably helpful to do a little table setting with a history lesson about Texas and its assessment system. Our work, like that of many states, was spurred in the aftermath of the Nation at Risk report, which sadly remains quite relevant today, nearly 40 years later.

Prior to that report, states had nearly meaningless testing in reading and math, and in Texas that was the TABS test, or Texas Assessment of Basic Skills. It measured quite low levels of educational attainment and aggregated the data of all students together. We could tell ourselves that those in the middle of the pack were doing okay, when in truth, many children were being left behind. TABS didn’t tell us much since the test was only given in the 3rd, 5th, and 9th grades. In 1984, in response to the Nation at Risk report and with the work of the Perot Commission, Texas expanded to annual assessment in reading and math and began to disaggregate data so we could have a clearer picture, better apply resources and solutions, and understand the needs of every child and every unique population of students. But that test, called TEAMS, was ultimately too easy, and over time it became less useful. Essentially, we tapped out.

Imagine you are training for a 10k; you know you will ultimately need to run about 6 miles, but you don’t start there. You start with a mile, get good at it, and then continue to add distance. I know this is a silly and simple analogy, but, when you think about it, what we are collectively trying to do is move a large and complex enterprise forward over time while keeping political, taxpayer, parent, and educator support. A system that is too hard or too easy undermines everyone’s confidence.

Okay, back to history. With each new test, first TAAS, then TAKS, then STAAR, we raised the bar: asking more of our students, capturing the needs of our economy and workforce more accurately, and responding to improvements in testing itself, all to create a system that was understood to be achievable and supportable. Until STAAR, Texas had a history of pretty low standards. With STAAR, we are asking more of students and teachers to truly ready our young people for the world they will face.

The truth is, adjusting and modifying our system of measurements has happened many times over the years. In fact, you all have allocated $70 million to upgrade these tests to help guide our $70 billion public education system, and you have just heard from the Commissioner about improvements that can be made now. That is appropriate and natural. But we must stay true to the need for, and centrality of, these measures, with the twin goals of stretching the system to better meet the needs of students and the economy and keeping the political oars in the water so we can row the boat forward with the needed taxpayer and institutional support.

As you know, in 2001, No Child Left Behind was enacted at the federal level. This law was in no small part based on the improvements, especially for our neediest students, that Texas was showing using these sound principles.

Here’s a look at the data from that period in Texas, with all due credit given to those in our classrooms at that time. These were days when leaders around the nation were looking to us to show the way for enhanced student achievement. The federal law, which I was privileged to help implement, required states to develop, often for the first time, annual assessment in reading and math in grades 3 through 8 and once in high school, and to report that data in a disaggregated fashion. It also required that states develop, and the feds approve, systems that had alignment between the curriculum standards, that is, what we wanted students to know and do, and the measures against those standards. It required all states to participate in the NAEP to serve as a check on the quality of the state systems overall in exchange for the billions invested at the federal level. I mention this because strong alignment between curriculum and measurement helps illustrate my next point, and that is that good design of tests and the accountability systems that flow from them matters a lot.

As people who care about public education, we care about every student’s learning. The only way we can understand how much students learn is through assessments. The Legislature also needs information that is valid, reliable, fair, and comparable to make good policy decisions for Texas students.

The STAAR exams meet all these criteria: they are valid, reliable, fair, and comparable. They are also an approved part of our plan under ESSA.

For example, the STAAR given one year is comparable to a test given in the prior year. And if that isn’t enough, in Texas we release the test items so parents, teachers, and others have faith that the test is measuring what we want students to know. That increases our costs, since new test items have to be created every year. Test items are typically field tested before they can be incorporated into the exam that will be used. None of this happens without the work of hundreds of teachers and thousands of hours to design the standards and to develop the tests.

It’s also worth noting what does not pass for good design. First, these are terms only psychometric experts can discern and endorse. Not me, not you, not your average superintendent. In addition, things like parent surveys or extracurricular participation do not constitute valid or reliable measures, though they may be useful for other purposes. Tech-based curriculum offerings that monitor progress along the way can also be useful, of course, but they are not a substitute for valid, reliable, and sound measures.

Great care should be taken with through-year or “rolling assessments.” While potentially useful, we must ensure these meet the same standards of validity, reliability, fairness, and comparability. Florida, for example, is beginning to implement a new law that would add this type of assessment to its existing summative system, but know that this is more, not less, testing.

The STAAR exam is a strong assessment in large part because of the incredible amount of work Texas teachers put into reviewing the exam every year. Each STAAR question is reviewed by 16-20 teachers, meaning that every year over 3,000 hours are spent reviewing and editing STAAR questions. The redesign has also had significant teacher involvement, with over 5,000 teacher hours spent on making the STAAR more closely mirror the classroom experience.

This slide is an example of that principle. The standards, on the left of the slide, are written by a team of teachers, parents, and business leaders, then approved by the SBOE. These standards are then tested on the STAAR exam. If a test is well designed, teaching to it, that is, covering the material that will be on the test, is not educationally unsound, but, of course, kill and drill can overdo it. That’s why curricular rigor helps teachers stay focused on student achievement over test-taking strategies.

When Texas has held true to a philosophy that embraces “what gets measured gets done,” coupled with resources for reforms that work, we move the needle for students. This approach bore fruit in the early 2000s, as you can see from this slide. Sadly, around the time of the financial crisis, our progress slowed dramatically. There was more flex and less muscle in the accountability system, and more fine print that was authorized by the feds and could be manipulated to paint a rosier picture of our schools, one that masked the underperformance of students. During my days as Secretary, Texas had some of the highest special education exemption rates in the country.

In short, we took our foot off the gas. But happily, with HB 3, motivated by the significant declines in Texas student achievement that preceded that new law, coupled with a realization that other states were pulling ahead and making faster progress, Texas has begun to get back on track. I think we are all pleased with the recent STAAR results, especially in light of COVID, but we certainly have much more to do.

Like you, I believe our state’s number one asset is our people. Today, we are falling woefully short of meeting the needs of students and families and, ultimately, the needs of our state and its growing economy. Texas has enjoyed economic growth because we have attracted talented people from other places, but we must do better by our own students. Over the past several sessions, this body has taken significant steps to strengthen our assessment and accountability system. To aid this work, you also invested heavily in our public schools through House Bill 3. I urge you to hold the line on these reforms and continue the work you all have already started.

Without the measurement and data that assessment brings, coupled with the motivation, incentive, and, yes, enforcement of strong accountability, we can fall behind, and we have. If ever there was a time to stay true to the principle of caring enough to find out all we can through strong assessment and accountability systems, this is it.

Thank you.

For more, check out a recording of the testimony on our YouTube channel or download the slide deck here.