Know Your Options: Phonics Interventions



While a comprehensive reading program has many critical components, we know that phonics and phonemic awareness are foundational. A kiddo cannot comprehend what they cannot read. Unfortunately, one of the major gaps in our school system today is a lack of sufficient, explicit instruction in the foundational reading skills: Phonics, Phonemic Awareness, and Decoding.

If your child is struggling in these areas, you’ll want to look into intervention. If you have a formal diagnosis and an IEP, this can happen in a school setting. Alternatively, you may opt to hire a private tutor or even DIY. These are all valid options. 


But…


The intervention has to be GOOD. 


So how do we know which phonics interventions are better than others? Over the years, a few key elements have emerged as essential to a foundational phonics program. Phonics, phonemic awareness, and decoding skills need to be taught explicitly and systematically, following a scope and sequence that focuses on the individual sound ("phoneme") level, with the teacher modelling each concept, a high amount of student practice, and immediate corrective feedback.


Here's the problem.


Theoretically, many phonics programs have all these components. But in practice? Some of them are far more effective than others.


So what does the real-world evidence say?



REALLY Know Your Options:

What Does the Scientific Research Say?


Just because a phonics program has a strong theoretical basis and is “aligned” to the science of reading does not mean that it has evidence of effectiveness. Things that might sound great in theory don’t always work out the way we’d hoped once we test them out in the real world.


When my own kiddos struggled to learn to read, I went on a deep dive into the research on the real-world effectiveness of phonics programs. And what I found wasn’t always pretty. 


What did I find? I’ll walk you through it all below.

 

I did the deep dive so you can do the snorkel.

(.... okay, let's be honest. It's more like a snorkel tour; it's an extended snorkel). NOTE: This blog post is still in draft form as I continue to review the research, but I'm making it available for those interested.

Understanding the Research

If you’re well-versed in reading academic journal articles, are familiar with scientific jargon, and already know all about the importance of statistical significance, effect sizes, and control groups, then feel free to skip this next part and hop down to my summary of the research in the Evidence of Effectiveness section, or the phonics program Table of Contents.


If you’re not familiar with these terms, you’ll definitely want to stick around for a minute, because these concepts are critical to understanding the research on the effectiveness of phonics programs. I’ll take you through each of these concepts. If you have some extra time, you might also find this video on Reading Program Effectiveness: Science vs. Snake Oil and this Teacher's Guide to Meta-Analyses quite helpful. 




Scientific Research Crash Course


Okay, there are a few terms that are important to know before diving into the scientific research on reading programs and my summaries below: Control Groups, "Significance," and "Effect Size." Here comes a crash course.



Control Groups: What Are They, and Why Do They Matter?


When diving into the research on the evidence-base for any program, it is very very very important to check if the researchers compared their data to that of a “control group” or control population. Here’s why:


A Control Group is basically a group that did not receive the intervention being studied. The group who received the intervention (usually called the “treatment” group) is compared to the control group. Why is this important? 


If a study does not compare the progress of the students who DID use the program to the progress of similar students who did NOT use the program, there is no way to tell whether or not the program had an impact, or if the students would have improved without it. Sure, students who received the intervention might have improved, but did they improve more than students who used something else? Without a control or comparison group, we quite simply don’t know the answer to that question. Intervention students might have had less progress than kids who received regular instruction. Without a control, we just don’t know.


Let’s put this another way:


Say we ran a reading intervention study and didn’t use a control group. During the course of the study, we measured reading outcomes and also children’s height. At the end of the intervention study, we noted that there were gains in reading and also growth in height. Would we say that the children’s increase in height was due to the reading program? No, of course not, that would be ridiculous. 


If, in that reading intervention & height study, we had included a Control Group (a group not receiving reading intervention), we would have compared the data between the two groups after the study. It would have been crystal clear that no, the reading intervention had nothing to do with the increase in height - we would quickly see that the Control Group children grew just as much as the Treatment Group who received the reading intervention. 


The same goes for any increase in reading ability. 


Without a Control Group, we can measure reading progress, but we simply have no way of knowing whether the intervention had anything to do with it. We cannot definitively attribute any growth we see to the intervention program itself. It could be due to some other factor, such as ‘natural’ progress due to self-teaching, or business-as-usual classroom teaching.
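If it helps to see that logic with numbers, here is a tiny, purely illustrative sketch in Python. The gains below are invented, not from any real study; the point is only that a program can be credited with the DIFFERENCE between groups, not with all of the growth.

```python
# Purely illustrative, made-up numbers (not from any real study).
# Both groups of kids improve over the year; the question is how much MORE
# the intervention group improved than it likely would have anyway.

treatment_gain = 12.0   # average reading-score gain for kids who got the intervention
control_gain = 10.0     # average gain for similar kids who did not

print("Gain with the program:   ", treatment_gain)
print("Gain without the program:", control_gain)
print("Gain we can credit to the program itself:", treatment_gain - control_gain)
# Without a control group, we would be tempted to credit the program with all 12 points.
```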


Sidenote: Studies that use Control Groups are called experimental or quasi-experimental studies. You will often see this mentioned in the “Abstract” section of the journal article or dissertation that summarizes the research. These are the studies that are most valuable to us if we want to look at the effectiveness of a reading program. And within these, the gold standard is what is called a Randomized Controlled Trial (RCT).


I will only be including studies with control or comparison groups in my analysis.



Scientific Jargon: “Significance” vs “Effect Size”


Beware the word “significant” in academic research. It may not mean what you think it means.


Usually when we see the word "Significant," we think "large" or "substantial." So if we read the phrase “The program had a significant impact on reading scores,” we might intuitively think this means: “The program had a LARGE impact on children’s reading scores.”


This is not how the term "significance" is used in academic research.


Significant does not mean large.


Instead, there is a second meaning that is much much more common in academic literature: Statistical Significance. In these cases …


Significant = The result you got would not likely occur unless the treatment itself was having a REAL impact.


Yeah, it’s a mouthful. And THIS is the way the term “significant” is used most often in academic literature with an experimental design. It refers to statistical significance. So in these instances, the phrase “The program had a significant impact on reading scores” does not necessarily mean a large impact; instead it means: “The program DID, ITSELF, APPEAR TO HAVE A REAL impact on reading scores.” 


In this usage, the phrase indicates NOTHING as to whether the impact the reading program had was a large effect, a medium/moderate effect, or a small effect … or even if the program had a positive or negative effect! “Significant” only indicates that, statistically speaking, the reading program probably had a real impact, and the result probably wouldn't occur due to chance or an external factor.


IMPORTANT NOTE: I will only use the word “significant” in this second, statistical way on this page. Whenever I use it, just think: REAL RESULT or LIKELY NOT DUE TO CHANCE.


EXTRA INFO FOR THE CURIOUS: In my research summaries below, I haven’t bothered to list the numbers that researchers use to report statistical significance. I may add them in later, but for now, to keep things as simple as possible, I have just summarized the findings as significant or not. Typically, significance is reported as a Confidence Interval or a P-Value. P-Values are more commonly reported, but this is shifting. If you are interested in looking at the actual research studies yourself, know that significance is reported on a continuum. A p-value of 0.05 basically means that there is a 5% chance that results like these could have occurred even if the treatment (i.e. the reading intervention) had no real effect. Basically, it’s the chance of concluding our treatment had a real impact when it actually didn’t (technically a “spurious” result / false positive). A p-value of 0.01 means that there is a 1% chance of this error. For p-values, smaller is better. Typically, researchers only accept a maximum 5% chance of error, i.e. a maximum p-value of 0.05. Anything greater than this and the result is dubbed "not significant," and we can't really use it as evidence.
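If you’re curious what this looks like in practice, here’s a minimal sketch using Python’s scipy library and completely invented scores. Nothing here comes from a real study; it just shows where a p-value comes from and how the 0.05 convention gets applied.

```python
# A minimal sketch of a significance test on invented data (not from any real study).
from scipy import stats

# Post-test reading scores for a hypothetical treatment group and a hypothetical control group
treatment = [88, 92, 85, 90, 95, 87, 91, 89]
control = [82, 86, 84, 80, 88, 83, 85, 81]

t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"p-value: {p_value:.3f}")

# The usual convention: p < 0.05 gets reported as "statistically significant,"
# meaning a difference this large would be unlikely if the program had no real effect.
if p_value < 0.05:
    print("This result would be reported as statistically significant.")
else:
    print("This result would be reported as not significant.")
```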


If a result is significant, it means we can probably rely on it to be real evidence.

If a result is not significant, it means we probably can't rely on it as evidence.


Typically, it is difficult to reach statistical significance levels in small studies (i.e. with few students).



Effect Size = Amount of Impact


Okay, so significant means real evidence, and not significant means we can't really use the results as solid evidence.

But what we really want to know is, of the results that were real/significant, how much impact did the reading intervention have? When academic research discusses the amount of impact a treatment (i.e. a reading intervention) had, it will usually refer to this as the “effect size,” rather than using the term “significant.”


Effect sizes are calculated in many different ways, depending on the statistical analysis that is performed. All you really need to know is that effect sizes are usually described in three categories: (1) Large/Strong, (2) Medium/Moderate, and (3) Small/Weak. You should also know that in education research, large effect sizes are rare, and the teacher effect size is right at the small/medium threshold (see also this video). In other words, what we want to see is an effect size beyond the “teacher effect” - that is, we want to see at least a Medium effect size, and we should keep in mind that it is rare to see Large effects.


Effect sizes are not always calculated in the research I have reviewed. At this time I have opted not to calculate them myself, but I may do so at a later date, because this information would be valuable to have.


EXTRA INFO FOR THE CURIOUS: In my research summaries below, I have not included the exact numbers calculated for effect size. I may add them in later, but for now, to keep things as simple as possible, I just refer to the effect sizes by category: [Large/Strong], [Medium/Moderate], and [Small/Weak]. The reason I don't provide numbers is that effect sizes are calculated differently depending on the statistical analysis required by the study and these are reported on different scales.


For example: Cohen's d (a common effect size calculation) reports Effect Sizes in this way: at least 0.2 = Small effect, at least 0.5 = Medium effect, and at least 0.8 = Large effect .... but for Pearson's r (another common effect size calculation): at least 0.1 = Small effect, at least 0.3 = Medium effect and at least 0.5 = Large effect.
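To make Cohen's d a little more concrete, here’s a minimal sketch (again with invented scores, not data from any real study) of how it is usually computed: the difference between the two group means, divided by the pooled standard deviation.

```python
# Cohen's d on invented data: difference in group means divided by the pooled standard deviation.
import numpy as np

treatment = np.array([88, 92, 85, 90, 95, 87, 91, 89], dtype=float)
control = np.array([82, 86, 84, 80, 88, 83, 85, 81], dtype=float)

n1, n2 = len(treatment), len(control)
pooled_sd = np.sqrt(
    ((n1 - 1) * treatment.var(ddof=1) + (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2)
)

d = (treatment.mean() - control.mean()) / pooled_sd
print(f"Cohen's d = {d:.2f}")  # rough rule of thumb: ~0.2 small, ~0.5 medium, ~0.8 large
```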


Comparing numbers across studies can clearly be deceiving, which is why I don't include them here.


Whew. Okay.


As a recap:



Control Groups are very important. 

“Significant” = Real Evidence of Impact


“Effect Size” = Amount of Impact



Enough with the scientific jargon and onward to the research summaries!



Evidence of Effectiveness: Broad Research


So, how do we choose a phonics intervention program? How well does each type of program help dyslexic and otherwise struggling readers? What does the evidence say? 


Which programs have a Good Evidence Base?


While we don’t have many experimental studies comparing one phonics program to another directly, we do have experimental studies which examine phonics interventions compared to control groups. And we have Meta-Analyses of these studies. Meta-Analyses are studies which pool and compare the results of many individual studies using standardized measures of effectiveness (for example, significance and effect sizes), so that a program tested in one study can be weighed against a program tested in another. This sort of research can be extremely helpful in sorting through all the noise. 
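Very roughly, a meta-analysis boils down to combining the effect sizes from many individual studies into one overall estimate, giving larger studies more weight. Here’s a deliberately over-simplified sketch with invented numbers; real meta-analyses weight each study by the precision of its estimate and do much more (checking study quality, publication bias, and so on).

```python
# A deliberately over-simplified "meta-analysis": a sample-size-weighted average
# of effect sizes (Cohen's d) from several invented studies.
studies = [
    {"name": "Study A", "d": 0.45, "n": 60},
    {"name": "Study B", "d": 0.20, "n": 120},
    {"name": "Study C", "d": 0.60, "n": 40},
]

pooled_d = sum(s["d"] * s["n"] for s in studies) / sum(s["n"] for s in studies)
print(f"Pooled effect size across studies: {pooled_d:.2f}")
# Real meta-analyses use inverse-variance weights and examine how consistent the studies are.
```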


So, What Do the Meta-Analyses and Literature Reviews say?


In 2001, the National Reading Panel conducted a meta-analysis of phonics interventions and found that while, overall, the phonics interventions studied had positive effects compared to controls and moderate effect sizes on average (d = 0.41), Orton-Gillingham approaches had the lowest effect sizes by far (d = 0.22).


Similarly, in 2006, Ritchey & Goeke looked at studies examining Orton-Gillingham based interventions, and their results showed “inconclusive findings of the effectiveness of OG programs" due to mixed results and a lack of well-designed research. A subsequent meta-analysis of OG research was carried out a decade later by Stevens et al. (2021). Unfortunately, they also found that “Orton-Gillingham reading interventions do not statistically significantly improve foundational skill outcomes (i.e., phonological awareness, phonics, fluency, spelling … vocabulary and comprehension outcomes)” when compared to other interventions or business-as-usual controls. 


For more in-depth summaries of and commentary on this body of research, see Dr. Ginsberg’s “Is OG The Gold Standard?”, Solari et al.’s “What does science say about OG?” and “Which aspects are supported?”, as well as evidence that Syllable Division rules are not very useful and Nathaniel Hansford’s summaries of Orton-Gillingham research, including Multi-Sensory research, Wilson: Fundations research, and Sonday research. 


It all amounts to the same thing. Mixed, underwhelming results for Orton-Gillingham methods. As Dr. Tim Shanahan puts it, “so much for being the gold standard.”


Whoa, whoa, whoa. Okay, if you are anything like me when I dove into the academic research on phonics interventions, you might be starting to freak out. I mean, Orton-Gillingham is THE thing, right? I won’t lie, I started to panic a little. I mean, where does this leave us?


Hang On. Don’t Panic.


We’re just getting started. Keep reading.


Here’s the thing. While a meta-analytic study can give us a good sense as to how well a general approach to intervention is working (or NOT working), the issue is that it groups programs together which may have quite different approaches in practice. So I decided to investigate a bit further. 



What DOES Work, and What Doesn’t?


It’s important to keep in mind that with just a few exceptions, the vast majority of phonics programs examined in research studies DO result in positive outcomes for students. It’s just that some of them are more effective than others, and some of the most popular pre-packaged programs don’t appear to be the best ones.


I dug up the original research studies that were summarized by Stevens et al. (2021) in the most recent meta-analysis of phonics research, grouped them by the specific phonics program used, and summarized the findings of each one below. I also included research studies examining several other popular phonics programs. This is ongoing work, so as I find more that meet the criteria outlined below, I will review and add those.


Below, I have organized the program summaries into four categories based on the evidence of effectiveness for the program: (1) Poor, (2) Mixed/Uncertain, (3) Good or (4) Excellent.


This organization gives us a clearer picture as to what the research says about an individual program, rather than a generic approach.


Criteria for Inclusion


On this page, I’ve ONLY included research studies which used a Control Group. Why? Remember our Reading Intervention & Child Height example from earlier? Without a control group to compare to, we have no idea whether it was the program that led to reading growth, or some other factor (such as standard classroom instruction, outside tutoring, the passage of time, etc.). With one exception, I've only included research that has been peer-reviewed ("peer review" basically means that independent experts in the field scrutinized the researchers' methods and findings before publication). I have only included research studies that looked at how interventions specifically performed with dyslexic or otherwise struggling readers, not the broader student population. I love our non-struggling readers, but quite frankly, they aren't who I'm worried about.


Below, I will summarize my findings, but if you want to dig deeper, search for a phonics program on Google Scholar, DOAJ, or Core UK and see if you can find an experimental or quasi-experimental study that used a control group. Some academic journals are open access now, so even if you don’t have a university login, you might find what you need. Reading the Abstract (which is usually open access) and Discussion sections of a paper can often give you a fair idea of the results of the study without wading into complex methods and tables of confusing data.


For an even easier overview of some of the reading program research, visit Pedagogy Non Grata’s research summaries. They cover many more programs than I do here, including HMH, SPIRE, SIPPS, Logic of English (sorta), and Read 180, to name a few. They include programs designed for classroom use, not just intervention. Other sources of information on the effectiveness of reading interventions include the What Works Clearinghouse and Evidence for ESSA. The Reading League also provides commentary on a program’s theoretical base. Be very careful how you interpret the results from WWC and ESSA, however. They have two ratings for each program: one rates the quality of the research, and one measures the average effect size of the program. So a program could, for example, have a rating of "Strong 0.02," which actually means that there is strong evidence of a weak effect (i.e., probably not a great program). They also tend not to include many research studies, so it can be difficult to draw generalized conclusions from them.


ONE FINAL NOTE: Please note that though I hold a PhD, and have read many many research articles in my day, my degree is not in Education. I am not an educational researcher myself. I have summarized the research findings below to the best of my ability, but it is always wise to cross-check this information by digging into the original sources, as well as peer-reviewed articles and meta-analyses written by educational researchers who have examined these studies.


All right, without further ado, here is the summary of the scientific research on Phonics Interventions. Below each program summary, I include an even more concise tl;dr section: "My Takeaway."



Evidence of Effectiveness: Phonics Interventions Program by Program

Click on the links below to read a summary of the research on each program. 

Keep in mind that these are currently draft summaries of the research, with some commentary from me at the end of each entry. Many studies did not include effect sizes. I may calculate these myself at a later time. I am also continually looking for more research studies to add to this compilation.

At this time, I have only included research that meets these criteria: (1) peer-reviewed studies with (2) a control/comparison group which studied the effectiveness of an (3) intervention for (4) struggling readers.

Phonics Intervention Evidence of Effectiveness:
Table of Contents

Poor

Wilson

Excellent

Reading Simplified / TRI ("Speech-to-Print")


If the program you are interested in is not listed, you may find it useful to read the summaries from those that are the closest fit. For example, for All About Reading or Logic of English, you may glean insights from the Orton-Gillingham (Other) and Barton pages.


Is there a program you'd like me to look into? 

Let me know in the Comments!

I can't promise that I'll be able to find solid research on it, but I will try my best.


7 comments:

  1. Hi Carissa, thanks for a great, objective analysis! I assume the outcomes you analysed were for reading rather than spelling? I'd love you to look into the research on EBLI too! I have several major problems with the research on literacy approaches, in order of importance:
    1. Research almost always looks at short-term outcomes, which can mean outcomes are highly inflated due to kids learning partial decoding skills which makes them better at guessing. This is especially true for some measures like the San Diego Quick Test (often used by Reading Simplified and EBLI), which can highly inflate apparent reading ability (which I have seen when comparing it to other tests with struggling readers). We know that true skilled reading takes a long time to learn and requires a threshold of decoding skills.

    2. The control group is often given Business As Usual instruction. BAU is often Balanced Literacy instruction and/ or NO intervention time, compared to extra teaching time with the target approach. This means that the outcomes of the approach may be due to extra phonics teaching time or sometimes ANY phonics teaching, compared to NONE.

    3. The outcomes measured are often the average of all kids lumped together, so it is not capturing the outcomes for the kids with the biggest deficits.

    4. By measuring reading but not spelling, it can look like an approach is very effective, because it leads to improved reading scores, but not whether the kids reach grade-level. So, it may not be targeting the underlying skills required for orthographic mapping and may even sabotage these. If spelling were measured as well, this would be a better measure of whether orthographic mapping is improving.

    The above factors are likely involved in the initial perception that Reading Recovery had an excellent research base. This is still perpetuated by WWC and ESSA (and you wisely point out cautions in interpreting their ratings). The reason for the high-quality research was that it was well-funded by governments, not that it was a superior program. This eventually yielded a long-term study that showed Reading Recovery actually made reading worse in the long term. Earlier, the research also showed no progress for kids with the biggest deficits. I think this gives us an important lesson about what we really need to consider with research on outcomes. That said, careful analyses like yours can show broad findings about which approaches may have major flaws that lead to lower outcomes. You can then use research on specific strategies, as you have done, to try to tease out why this might be the case (e.g. limited effectiveness of teaching syllable types). Thanks again!

    1. Hi Michelle,

      Thanks for your comment! Because I am only summarizing the research, not running a meta-analysis, the outcomes I analyzed drew from whatever data was available, be it reading or spelling data. I included a reference list with links at the bottom of each program page, so if you are looking to dig deeper, hopefully you'll be able to find what you need there!

      I'll try to provide some further information to each of your questions below:

      1. Short-Term Research & Assessment Measures:
      Yes, many educational research studies are not longitudinal, and do not include follow-up assessments. It would be much better if they did (though of course, the reason that they don't is that it is very hard to do). Regarding assessment measures, many different measures were used by the researchers. For brevity's sake, I did not list them in my summaries, however you can find that information in the original studies linked in the references section at the end of each page.

      Reading Simplified does use the San Diego Quick as a screener, but it was not used in these studies. Rather, the Reading Simplified studies used a variety of assessment batteries, including various Woodcock Johnson subtests and the CTOPP, among others.

      In general, the studies listed within these pages do not assess using informal screening measures. The only studies I include are experimental and quasi-experimental in nature, and as such, they use more rigorous assessment measures.

      2. Business-As-Usual control groups - Yes, many of the studies listed in these pages utilize business-as-usual control groups. Some also use comparison groups, wherein they compare two phonics interventions, but these are rarer. I did try to mention these distinctions within the summary pages. It would certainly be fabulous if we had more studies that compared two interventions. On the upside, because so many studies use BAU as the control, we can more easily compare across studies, whereas this would be difficult to do if each study used a different comparison program.


      3. Outcomes for Children with Biggest Deficits - One of my criteria for review here was that the study must focus on dyslexic or otherwise struggling readers. I don't include any studies that only measured effectiveness of the broader student population. That said, these studies have varying thresholds for which students are identified as struggling, and most of the studies do not tease out whether or not the intervention had varying effects for different subsets of students. If I recall, some of the LiPS and Reading Simplified studies had a bit of this sort of analysis, as do the Torgesen studies. It would be wonderful if we had more studies that did this, because our students obviously have diverse and varying needs.

      4. Reading vs. Spelling Measures - The studies varied widely in terms of whether or not they measured reading vs. spelling outcomes. Some did, and some did not. I tried to list these outcomes where the data was available. It would be fabulous if more studies assessed spelling.

      That said, the most effective reading interventions, and the most effective spelling interventions may look a bit different. The interventions summarized here are all phonics interventions, and therefore I would expect them to have a bigger impact on reading than spelling, especially in the shorter timeframes used in the studies. While certain interventions may be effective at getting kids decoding well and reading fluently, other interventions (and additional practice for orthographic mapping) may provide additional benefits for spelling, given that English spelling is heavily influenced by morphology and etymology. I'd love to do another series on spelling interventions, but that will be a bit down the road :0)

      I have a number of phonics programs on my To-Review list, but haven't had the opportunity to read the research studies yet. I would like to include EBLI on this list, but haven't been able to because I have not found any studies that meet my criteria. Reading Simplified, which I have reviewed, would be the most closely related program to EBLI (though obviously not the same). There are also some links to research on a few related linguistic phonics programs on the Reading Simplified page.

    3. Wonderful! Thanks for the great insights, Carissa. I love that you specifically looked at the outcomes for struggling readers. I'm also glad the studies used rigorous measures. I've seen the San Diego Quick assessment used to "show" that kids have much higher reading ability than was thought, and also to measure progress - and I think that's a big mistake. I look forward to digging deeper into your summaries!

  2. Thanks for putting this together, it's really helpful. Have you looked into Sounds Write at all? Or know which program it aligns most closely with?

    1. Hi Kim,

      The program that aligns most closely with Sounds-Write that I have reviewed is Reading Simplified. Like Sounds-Write, it is a Linguistic Phonics (aka "Speech-To-Print") program derived from Phono-graphix and Dr. Diane McGuinness' work. You can find my summary of the Reading Simplified research here:

      https://seasideliteracy.blogspot.com/p/research-on-reading-simplified.html

      Reading Simplified has the best, most consistent evidence of effectiveness of any of the programs I have reviewed.

      Toward the bottom of the entry, you will see a list of the other Linguistic Phonics programs, with links to experimental and quasi-experimental research studies of those programs, where available. I was only able to find one such study for Sounds-Write. That study found the program to be effective, albeit with a similar impact to a comparison phonics program.

      Hope this helps!

    2. That's great, thank you so much!

