Know Your Options: Phonics Interventions



While a comprehensive reading program has many critical components, we know that phonics and phonemic awareness are foundational. A kiddo cannot comprehend what they cannot read. Unfortunately, one of the major gaps in our school system today is a lack of sufficient, explicit instruction in the foundational reading skills: Phonics, Phonemic Awareness, and Decoding.

If your child is struggling in these areas, you’ll want to look into intervention. If you have a formal diagnosis and an IEP, this can happen in a school setting. Alternatively, you may opt to hire a private tutor or even DIY. These are all valid options. 


But…


The intervention has to be GOOD. 


So how do we know which phonics interventions are better than others? Over the years, a few key elements have emerged as essential to a foundational phonics program. Phonics, phonemic awareness, and decoding skills need to be taught explicitly and systematically, following a scope and sequence that focuses on the individual sound ("phoneme") level, with the teacher modelling each concept, a high amount of student practice, and immediate corrective feedback.


Here's the problem.


Theoretically, many phonics programs have all these components. But in practice? Some of them are far more effective than others.


So what does the real-world evidence say?



REALLY Know Your Options:

What Does the Scientific Research Say?


Just because a phonics program has a strong theoretical basis, and is “aligned” to the science of reading, does not mean that it has evidence of effectiveness. Things that might sound great in theory, don’t always work out the way we’d hoped once we test them out in the real world.


When my own kiddos struggled to learn to read, I went on a deep dive into the research on the real-world effectiveness of phonics programs. And what I found wasn’t always pretty. 


What did I find? I’ll walk you through it all below.

 

I did the deep dive so you can do the snorkel.

(Okay, let's be honest: it's more like an extended snorkel tour.) NOTE: This blog post is still in draft form as I continue to review the research, but I'm making it available for those interested.

Understanding the Research

If you’re well-versed in reading academic journal articles, are familiar with scientific jargon, and already know all about the importance of statistical significance, effect sizes, and control groups, then feel free to skip this next part and hop down to my summary of the research in the Evidence of Effectiveness section, or the phonics program Table of Contents.


If you’re not familiar with these terms, you’ll definitely want to stick around for a minute, because these concepts are critical to understanding the research on the effectiveness of phonics programs. I’ll take you through each of these concepts. If you have some extra time, you might also find this video on Reading Program Effectiveness: Science vs. Snake Oil and this Teacher's Guide to Meta-Analyses quite helpful. 




Scientific Research Crash Course


Okay, there are a few terms that are important to know before diving into the scientific research on reading programs and my summaries below: Control Groups, "Significance," and "Effect Size." Here comes a crash course.



Control Groups: What Are They, and Why Do They Matter?


When diving into the research on the evidence-base for any program, it is very very very important to check if the researchers compared their data to that of a “control group” or control population. Here’s why:


A Control Group is basically a group that did not receive the intervention being studied. The group who received the intervention (usually called the “treatment” group) is compared to the control group. Why is this important? 


If a study does not compare the progress of the students who DID use the program to the progress of similar students who did NOT use the program, there is no way to tell whether the program had an impact, or whether the students would have improved without it. Sure, students who received the intervention might have improved, but did they improve more than students who used something else? They might even have made less progress than kids who received regular instruction. Without a control or comparison group, we quite simply don't know.


Let’s put this another way:


Say we ran a reading intervention study and didn’t use a control group. During the course of the study, we measured reading outcomes and also children’s height. At the end of the intervention study, we noted that there were gains in reading and also growth in height. Would we say that the children’s increase in height was due to the reading program? No, of course not, that would be ridiculous. 


If, in that reading intervention & height study, we had included a Control Group (a group not receiving reading intervention), after the study, we would have compared the data between the two groups. It would have been crystal clear that no, the reading intervention had nothing to do with increase in height - we would quickly see that the Control Group children increased in height just as much as the Treatment Group who received reading intervention. 


The same goes for any increase in reading ability. 


Without a Control Group, we can measure reading progress, but we simply have no way of knowing whether the intervention had anything to do with it. We cannot definitively attribute any growth we see to the intervention program itself. It could be due to some other factor, such as ‘natural’ progress due to self-teaching, or business-as-usual classroom teaching.
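For the code-inclined, here is a minimal sketch of that logic as a toy simulation. Every number, name, and function below is made up purely for illustration - this is not data from any real study:

```python
# A toy simulation (all numbers made up, not from any real study).
# Both groups improve over the year; only the comparison between them
# reveals what the intervention itself added.
import random

random.seed(42)

def simulate_gains(natural_growth, intervention_boost, n_students=30):
    """One year of reading-score gains for a group of students."""
    return [random.gauss(natural_growth + intervention_boost, 5)
            for _ in range(n_students)]

# Suppose every student gains ~10 points from regular classroom teaching
# alone, and the intervention adds ~3 points on top of that.
treatment = simulate_gains(natural_growth=10, intervention_boost=3)
control = simulate_gains(natural_growth=10, intervention_boost=0)

def mean(scores):
    return sum(scores) / len(scores)

print(f"Treatment group average gain:  {mean(treatment):.1f}")
print(f"Control group average gain:    {mean(control):.1f}")
print(f"Estimated intervention effect: {mean(treatment) - mean(control):.1f}")

# Looking at the treatment group alone, a ~13-point gain looks impressive -
# but ~10 of those points would have happened anyway. Only the control
# group lets us see the intervention's true ~3-point contribution.
```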


Sidenote: Studies that use Control Groups are called experimental or quasi-experimental studies. You will often see this mentioned in the “Abstract” section of the journal article or dissertation that summarizes the research. These are the studies that are most valuable to us if we want to look at the effectiveness of a reading program. And within these, the gold standard is what is called a Randomized Controlled Trial (RCT).


I will only be including studies with control or comparison groups in my analysis.



Scientific Jargon: “Significance” vs “Effect Size”


Beware the word “significant” in academic research. It may not mean what you think it means.


Usually when we see the word "Significant" we think "large" or "substantial." So if we read the phrase “The program had a significant impact on reading scores,” we might intuitively think that this means: “The program had a LARGE impact on children’s reading scores.”


This is not how the term "significance" is used in academic research.


Significant does not mean large.


Instead, there is a second meaning that is much much more common in academic literature: Statistical Significance. In these cases …


Significant = The result you got would not likely occur unless the treatment itself was having a REAL impact.


Yeah, it’s a mouthful. And THIS is the way that the term “significant” is used most often in academic literature with an experimental design. It refers to statistical significance. So in these instances, the phrase “The program had a significant impact on reading scores” does not necessarily mean a large impact; instead, it means: “The program DID, ITSELF, APPEAR TO HAVE A REAL impact on reading scores.”


In this usage, the phrase indicates NOTHING as to whether the impact the reading program had was a large effect, a medium/moderate effect, or a small effect … or even if the program had a positive or negative effect! “Significant” only indicates that, statistically speaking, the reading program probably had a real impact, and the result probably wouldn't occur due to chance or an external factor.


IMPORTANT NOTE: I will only use the word “significant” in this second, statistical way on this page. Whenever I use it, just think: REAL RESULT or LIKELY NOT DUE TO CHANCE.


EXTRA INFO FOR THE CURIOUS: In my research summaries below, I haven’t bothered to list the numbers that researchers use to report statistical significance. I may add them in later, but for now, to keep things as simple as possible, I have just summarized the findings as significant or not. Typically, significance is measured by Confidence Intervals or a P-Value. P-Values are more commonly reported, but this is shifting. If you are interested in looking at the actual research studies yourself, know that significance is reported on a continuum. A p-value of 0.05 basically means that there is a 5% chance that results this strong could have occurred even if the treatment (i.e. reading intervention) had no real effect. Basically, it’s the chance of a false positive (technically a “spurious” result): concluding our treatment had a real impact when it didn't. A p-value of 0.01 means that there is a 1% chance of this error. For p-values, smaller is better. Typically, researchers only accept a maximum of 5% chance of error, i.e. a maximum p-value of 0.05. Anything greater than this and the result is dubbed "not significant," and we can't really use it as evidence.
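If seeing it computed helps, here is a minimal sketch of how a p-value typically gets calculated in a two-group study - an independent-samples t-test on score gains. All the scores below are invented for illustration:

```python
# A minimal sketch: comparing treatment vs. control gains with an
# independent-samples t-test. All scores are invented for illustration.
from scipy import stats

treatment_gains = [14, 11, 16, 9, 13, 15, 12, 17, 10, 14]
control_gains = [10, 8, 12, 9, 11, 7, 10, 13, 9, 11]

t_stat, p_value = stats.ttest_ind(treatment_gains, control_gains)
print(f"p-value: {p_value:.3f}")

# Conventional cutoff: p <= 0.05
if p_value <= 0.05:
    print("Significant: a difference this large is unlikely to be chance.")
else:
    print("Not significant: the difference could plausibly be chance.")
```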


If a result is significant, it means we can probably rely on it to be real evidence.

If a result is not significant, it means we probably can't rely on it as evidence.


Typically, it is difficult to reach statistical significance in small studies (i.e. studies with few students).



Effect Size = Amount of Impact


Okay, so significant means real evidence, and not significant means we can't really use the results as solid evidence.

But what we really want to know is: of the results that were real/significant, how much impact did the reading intervention have? When academic research discusses the amount of impact that a treatment (i.e. a reading intervention) had, rather than using the term “significant,” researchers will usually refer to the “effect size.”


Effect sizes are calculated in many different ways, depending on the statistical analysis that is performed. All you really need to know is that effect sizes are usually described in three categories: (1) Large/Strong, (2) Medium/Moderate, and (3) Small/Weak. You should also know that in education research, large effect sizes are rare, and the teacher effect size is right at the small/medium threshold (see also this video). In other words, what we want to see is an effect size beyond the “teacher effect” - that is, we want to see at least a Medium effect size, and we should keep in mind that it is rare to see Large effects.


Effect sizes are not always calculated in the research I have reviewed. At this time I have opted not to calculate them myself, but I may do so at a later date, because this information would be valuable to have.


EXTRA INFO FOR THE CURIOUS: In my research summaries below, I have not included the exact numbers calculated for effect size. I may add them in later, but for now, to keep things as simple as possible, I just refer to the effect sizes by category: [Large/Strong], [Medium/Moderate], and [Small/Weak]. The reason I don't provide numbers is that effect sizes are calculated differently depending on the statistical analysis required by the study, and they are reported on different scales.


For example, Cohen's d (a common effect size calculation) uses these thresholds: at least 0.2 = Small effect, at least 0.5 = Medium effect, and at least 0.8 = Large effect. But for Pearson's r (another common effect size calculation), the thresholds are: at least 0.1 = Small effect, at least 0.3 = Medium effect, and at least 0.5 = Large effect.


Comparing numbers across studies can clearly be deceiving. Which is why I don't include them here.
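For the curious, here's a minimal sketch of one of those calculations, Cohen's d: the difference between group means divided by the pooled standard deviation. The scores are invented for illustration:

```python
# A minimal sketch of Cohen's d: the standardized mean difference.
# All scores below are invented for illustration.
import statistics

def cohens_d(group1, group2):
    """Difference in means divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * statistics.variance(group1) +
                  (n2 - 1) * statistics.variance(group2)) / (n1 + n2 - 2)
    return (statistics.mean(group1) - statistics.mean(group2)) / pooled_var ** 0.5

treatment_gains = [14, 11, 16, 9, 13, 15, 12, 17, 10, 14]
control_gains = [10, 8, 12, 9, 11, 7, 10, 13, 9, 11]

print(f"Cohen's d = {cohens_d(treatment_gains, control_gains):.2f}")
# Rough benchmarks for d: 0.2 = small, 0.5 = medium, 0.8 = large.
# The same number on Pearson's r scale would mean something quite
# different - which is why raw values can't be compared across measures.
```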


Whew. Okay.


As a recap:



Control Groups are very important. 

“Significant” = Real Evidence of Impact


“Effect Size” = Amount of Impact



Enough with the scientific jargon and onward to the research summaries!



Evidence of Effectiveness: Broad Research


So, how do we choose a phonics intervention program? How well does each type of program help dyslexic and otherwise struggling readers? What does the evidence say? 


Which programs have a Good Evidence Base?


While we don’t have many experimental studies comparing one phonics program to another directly, we do have experimental studies which examine phonics interventions compared to control groups. And we have Meta-Analyses of these studies. A Meta-Analysis pools the results of many individual studies, converting them to a standardized measure of effectiveness (such as effect size) so that the effectiveness of a program tested in one study can be compared to that of a program tested in another. This sort of research can be extremely helpful in sorting through all the noise. 
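To make the idea concrete, here is a minimal sketch of the core arithmetic behind pooling results across studies - a weighted average of effect sizes. Real meta-analyses weight each study by the inverse of its variance; sample size is used here as a rough stand-in, and all numbers are invented:

```python
# A minimal sketch of pooling effect sizes across studies.
# Real meta-analyses weight by inverse variance; sample size is a
# rough stand-in here. All numbers are invented for illustration.

studies = [
    # (effect size d, number of students)
    (0.55, 120),
    (0.30, 45),
    (0.48, 200),
]

total_n = sum(n for _, n in studies)
pooled_d = sum(d * n for d, n in studies) / total_n
print(f"Pooled effect size across {len(studies)} studies: d = {pooled_d:.2f}")
```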


So, What Do the Meta-Analyses and Literature Reviews say?


In 2001, the National Reading Panel conducted a meta-analysis of phonics interventions and found that, overall, the phonics interventions studied had positive effects compared to controls, with moderate effect sizes on average (d = 0.41) - but Orton-Gillingham approaches had the lowest effect sizes by far (d = 0.22).


Similarly, in 2006, Ritchey & Goeke looked at studies examining Orton-Gillingham based interventions, and their results showed “inconclusive findings of the effectiveness of OG programs” due to mixed results and a lack of well-designed research. A subsequent meta-analysis of OG research was carried out a decade later by Stevens et al. (2021). Unfortunately, they also found that “Orton-Gillingham reading interventions do not statistically significantly improve foundational skill outcomes (i.e., phonological awareness, phonics, fluency, spelling … vocabulary and comprehension outcomes)” when compared to other interventions or business-as-usual controls. 


For more in-depth summaries of and commentary on this body of research, see Dr. Ginsberg’s “Is OG The Gold Standard?”, Solari et al.’s “What does science say about OG?” and “Which aspects are supported?”, as well as evidence that Syllable Division rules are not very useful, and Nathaniel Hansford’s summaries of Orton-Gillingham research, including Multi-Sensory research, Wilson: Fundations research, and Sonday research. 


It all amounts to the same thing. Mixed, underwhelming results for Orton-Gillingham methods. As Dr. Tim Shanahan puts it, “so much for being the gold standard.”


Whoa, whoa, whoa. Okay, if you are anything like me when I dove into the academic research on phonics interventions, you might be starting to freak out. I mean, Orton-Gillingham is THE thing, right? I won’t lie, I started to panic a little. I mean, where does this leave us?


Hang On. Don’t Panic.


We’re just getting started. Keep reading.


Here’s the thing. While a meta-analytic study can give us a good sense as to how well a general approach to intervention is working (or NOT working), the issue is that it groups programs together which may have quite different approaches in practice. So I decided to investigate a bit further. 



What DOES Work, and What Doesn’t?


It’s important to keep in mind that with just a few exceptions, the vast majority of phonics programs examined in research studies DO result in positive outcomes for students. It’s just that some of them are more effective than others, and some of the most popular pre-packaged programs don’t appear to be the best ones.


I dug up the original research studies that were summarized by Stevens et al. (2021) in the most recent meta-analysis of phonics research, grouped them by the specific phonics program used, and summarized the findings of each one below. I also included research studies examining several other popular phonics programs. This is ongoing work, so as I find more studies that meet the criteria outlined below, I will review and add them.


Below, I have organized the program summaries into four categories based on the evidence of effectiveness for the program: (1) Poor, (2) Mixed/Uncertain, (3) Good, and (4) Excellent.


This organization gives us a clearer picture as to what the research says about an individual program, rather than a generic approach.


Criteria for Inclusion


On this page, I’ve ONLY included research studies which used a Control Group. Why? Remember our Reading Intervention & Child Height example from earlier? Without a control group to compare to, we have no idea whether it is the program which led to reading growth, or some other factor (such as standard classroom instruction, outside tutoring, the passage of time, etc.). With one exception, I've only included research that has been peer-reviewed ("peer-review" basically means that independent experts in the field scrutinized the researchers' work before publication). I have only included research studies that looked at how interventions specifically performed with dyslexic or otherwise struggling readers, not the broader student population. I love our non-struggling readers, but quite frankly, they aren't who I'm worried about.


Below, I will summarize my findings, but if you want to dig deeper, search for a phonics program on Google Scholar, DOAJ, or Core UK and see if you can find an experimental or quasi-experimental study that used a control group. Some academic journals are open access now, so even if you don’t have a university login, you might find what you need. Reading the Abstract (which is usually open access) and Discussion sections of a paper can often give you a fair idea of the results of the study without wading into complex methods and tables of confusing data.


For an even easier overview of some of the reading program research, visit Pedagogy Non Grata’s research summaries. They cover many more programs than I do here, including HMH, SPIRE, SIPPS, Logic of English (sorta), and Read 180, to name a few. They include programs designed for classroom use, not just intervention. Other sources of information on the effectiveness of reading interventions include the What Works Clearinghouse and Evidence for ESSA. The Reading League also provides commentary on a program’s theoretical base. Be very careful how you interpret the results from WWC and ESSA, however. They give two ratings for each program: one rates the quality of the research, and the other measures the average effect size of the program. So a program could, for example, have a rating of "Strong 0.02," which actually means that there is strong evidence of a weak effect (i.e., probably not a great program). They also tend not to include many research studies, so it can be difficult to draw generalized conclusions from them.


ONE FINAL NOTE: Please note that though I hold a PhD, and have read many many research articles in my day, my degree is not in Education. I am not an educational researcher myself. I have summarized the research findings below to the best of my ability, but it is always wise to cross-check this information by digging into the original sources, as well as peer-reviewed articles and meta-analyses written by educational researchers who have examined these studies.


All right, without further ado, here is the summary of the scientific research on Phonics Interventions. Below each program summary, I include an even more concise tl;dr section: "My Takeaway."



Evidence of Effectiveness: Phonics Interventions Program by Program

Click on the links below to read a summary of the research on each program. 

Keep in mind that these are currently draft summaries of the research, with some commentary from me at the end of each entry. Many studies did not include effect sizes. I may calculate these myself at a later time. I am also continually looking for more research studies to add to this compilation.

At this time, I have only included research that meets these criteria: (1) peer-reviewed studies with (2) a control/comparison group which studied the effectiveness of an (3) intervention for (4) struggling readers.

Phonics Intervention Evidence of Effectiveness: Table of Contents

Poor:
Wilson

Excellent:
Reading Simplified / TRI ("Speech-to-Print")

If the program you are interested in is not listed, you may find it useful to read the summaries from those that are the closest fit. For example, for All About Reading or Logic of English, you may glean insights from the Orton-Gillingham (Other) and Barton pages.


Is there a program you'd like me to look into? 

Let me know in the Comments!

I can't promise that I'll be able to find solid research on it, but I will try my best.

