This turned up in my alerts today. It is an undergraduate thesis from Fort Lewis College¹:
The Acute Effect of Heel to Toe Drop on Running Economy.
Brown, Harrison and Silva, Robert (2013). Undergraduate thesis, Fort Lewis College.
The purpose of this study was to assess how the running economy of experienced runners was affected when wearing 4mm and 0mm heel to toe drop shoes as opposed to regular running shoes. Previous studies have shown that barefoot running and running in lower heel to toe drop shoes increases running economy (Squadrone & Galozzi, 2009). The participants (n=23; 18 male and 8 female) were subjected to 3 separate tests that were each 20 minutes. The tests were performed within 90 minute, the order randomized. During the first test, the subject ran for 20 minutes at a speed they would run at for 1 hour. During the second and third test, the subject ran at the same speed in their randomly chosen shoes. Gas analysis was used to measure VO2 in kilograms and measurements were taken one time per breath for 20 minutes with a Vacumed mini-CPX. Using one way repeated ANOVA, results were not significant (p>.05). The results of this study show that there was not a significant difference in running economy between running with 4mm or 0mm heel to toe drop shoes and running with regular running shoes. This study was examining acute changes in running economy, however it is recommended that further research examine long term changes in running economy
They compared college runners in their usual shoes with the 0mm and 4mm drop New Balance Minimus running shoes, measured oxygen uptake, and found no differences between the 3 conditions.
The study does suffer from the usual constraints that go with an undergraduate project, but it was still a pretty impressive effort. There are issues in this and other running economy studies about acute versus habituated interventions. This just adds to the pot of mixed results in studies on running shoe conditions and running economy. We are no further down the track on resolving which is the most economical way to run, but I suspect it is going to be very subject specific.
As always, I go where the evidence takes me until convinced otherwise.
¹This is the second undergraduate project from Fort Lewis College that I have commented on. The other one was this: Vertical Ground Reaction Forces Produced in Shod Running vs. Barefoot Running

Some comments, having read the full paper:
1. 18 + 8 = 26, not 23. Two subjects were excluded due to “extraneous results”? That would be 24, not 23. Define “extraneous” so that we don’t have to worry about the authors cherry-picking the data.
2. The “Delimitations” section indicates that “1. This study will contain 25 college athletes, 10 men and 10 women of college age”. 10 + 10 = 20, not 25. Was this paper not reviewed for typos? In conjunction with my number 1 above, this raises some serious accuracy questions in my mind over the entire paper content.
3. There is no discussion as to why the study planned to enroll either 20 or 25 subjects, presumably evenly split between males and females, yet the actual sample studied (26) did not conform to the a priori specifications. What happened?
4. There was no a priori statement of a formal null and alternative hypothesis and no power/sample size calculation. How much of a difference between the shoes used would have been clinically significant and how many subjects would have been required to detect that difference? This study suffers from no control over the probability of either Type I or Type II errors. Had this study actually found a significant difference, it would have been just as invalid a conclusion as not having found a difference.
5. While a repeated measures ANOVA was used for the overall inter-group test, the Calculations and Statistics section indicates that a paired t-test would be used to conduct comparisons between pairs of shoes on the same subjects. However, there is no indication of a planned correction for multiple comparisons, nor is there a presentation of these results.
6. Figure 1, the presentation of VO2 per group, suggests a trend of declining VO2 from the control to the 4mm to the 0mm shoes. In conjunction with the lack of an a priori power/sample size calculation, this suggests that there might be real differences that could not be detected due to a very small sample size.
7. Table 2 presents a summary of shoe weights for each type of shoe, with what appears to be a meaningful reduction in weight from the control to the 4mm to the 0mm shoes. However, there is no formal statistical test of the differences, to know if the difference was significant.
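Points 4 and 5 can be made concrete with a rough sketch. The Python below uses the standard normal approximation for the number of subjects needed in a paired (within-subject) comparison, first at a two-sided alpha of 0.05 and then at a Bonferroni-adjusted alpha for three pairwise shoe comparisons. The standardized effect size of 0.5 and the 80% power target are assumptions for illustration only, not values from the thesis, and an exact t-based calculation would give slightly larger numbers.

```python
import math
from statistics import NormalDist


def paired_sample_size(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects needed for a two-sided paired comparison.

    effect_size is the mean within-subject difference divided by the
    standard deviation of those differences (a hypothetical input here).
    Uses the normal approximation n = ((z_(1-alpha/2) + z_power) / d)^2.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / effect_size) ** 2)


# Assumed values for illustration; none of these come from the thesis.
ASSUMED_EFFECT = 0.5
N_PAIRWISE_TESTS = 3  # control vs 4mm, control vs 0mm, 4mm vs 0mm

n_unadjusted = paired_sample_size(ASSUMED_EFFECT)
# Bonferroni: split the overall alpha across the three pairwise tests.
n_bonferroni = paired_sample_size(ASSUMED_EFFECT, alpha=0.05 / N_PAIRWISE_TESTS)

print(n_unadjusted, n_bonferroni)
```

Under these assumptions, a moderate effect already calls for about 32 subjects before any correction for multiple comparisons, and more once the correction is applied. A smaller true effect would push the requirement much higher.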
What kind of appropriate and experienced oversight did this undergrad have in terms of the study design and conduct? I hate to be hard-nosed, but it was seriously lacking.
I hate to say it Craig, but this paper suffers from serious methodology and content accuracy issues and is worthless in terms of adding to the debate. Indeed, it is worse, because uninformed people will use it as a demonstrative example of the lack of benefit of lower weight shoes vis-a-vis running economy.
Thanks and regards,
Marc
Thanks Marc! Good analysis! I was not trying to be too hard on it as it was an undergraduate project. But, you are right:
The same can be said about most of the barefoot vs running shoe economy studies, which is why the water is still muddy on this topic.
Thanks Craig. Honestly, I am not sure that I would be too easy on them. It was a senior thesis and they knew enough to use a repeated measures ANOVA rather than an independent groups ANOVA, and also to randomize the sequence of the 3 conditions for each subject to avoid bias, which is the right way to do what is essentially a single sample cross-over study. That’s well beyond statistics 101. They got some things right.
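To illustrate why the repeated measures choice matters, here is a minimal sketch in plain Python that computes the F statistic both ways on the same made-up numbers. The data are hypothetical (three subjects, three conditions, arbitrary units) and are not from the thesis. The sketch stops at the F statistic, since a p-value would need an F distribution from a statistics library.

```python
def anova_f_statistics(data):
    """Return (repeated_measures_F, independent_groups_F).

    data is a list of rows, one per subject, each row holding that
    subject's value under every condition (same order in each row).
    """
    n_subjects = len(data)
    n_conditions = len(data[0])
    grand_mean = sum(sum(row) for row in data) / (n_subjects * n_conditions)

    condition_means = [
        sum(row[j] for row in data) / n_subjects for j in range(n_conditions)
    ]
    subject_means = [sum(row) / n_conditions for row in data]

    ss_total = sum((x - grand_mean) ** 2 for row in data for x in row)
    ss_conditions = n_subjects * sum(
        (m - grand_mean) ** 2 for m in condition_means
    )
    ss_subjects = n_conditions * sum(
        (m - grand_mean) ** 2 for m in subject_means
    )

    df_conditions = n_conditions - 1
    ms_conditions = ss_conditions / df_conditions

    # Repeated measures: subject-to-subject variation is removed
    # from the error term before forming the F ratio.
    ss_error_rm = ss_total - ss_conditions - ss_subjects
    df_error_rm = (n_subjects - 1) * (n_conditions - 1)
    f_rm = ms_conditions / (ss_error_rm / df_error_rm)

    # Independent groups: subject variation stays in the error term.
    ss_error_ind = ss_total - ss_conditions
    df_error_ind = n_subjects * n_conditions - n_conditions
    f_ind = ms_conditions / (ss_error_ind / df_error_ind)

    return f_rm, f_ind


# Hypothetical scores: rows are subjects, columns are shoe conditions.
hypothetical_scores = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 5.0],
    [3.0, 4.0, 4.0],
]
f_rm, f_ind = anova_f_statistics(hypothetical_scores)
print(round(f_rm, 2), round(f_ind, 2))
```

On these numbers the repeated measures F is 9 and the independent groups F is 3, because taking the between-subject variation out of the error term makes the same condition effect easier to see.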
A key problem that I have with this particular study and which caught my eye was the lack of quality in the thesis itself, which suggests a certain level of carelessness, which then has to make you wonder about the integrity of the rest of the design and conduct of the study. I am not an academic so not sure how their professor might grade this, but if they were to come to me with this study as an example of their research work, with those kinds of typos and errors, they would not have gotten an interview.
That all being said, a big problem with many of these studies, including this one, is that they are terribly underpowered. That leads to problematic conclusions that cannot be replicated by others. That in turn leads to conflicting results and the confusion and muddying of the waters that you mention, because of the high probability of Type I and Type II errors. In the case of the former, if a significant finding is observed, the likelihood is that the effect size is substantially overestimated. These studies do nothing to advance our knowledge, albeit they may serve the interests of the authors in meeting their professional needs to get published, more often than not, in journals that have weak peer review processes.
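The overestimation point can be shown with a small simulation, sketched here in plain Python. The true standardized effect (0.3), the sample size (24), and the number of simulated studies are assumptions chosen for illustration. The test is a simple two-sided z-test with a known standard deviation of 1, not the analysis used in the thesis.

```python
import math
import random
from statistics import NormalDist


def simulate_studies(true_effect, n_subjects, n_studies, alpha=0.05, seed=1):
    """Simulate many small studies; return (power, mean significant estimate).

    Each study draws n_subjects paired differences from a normal
    distribution with mean true_effect and standard deviation 1, then
    applies a two-sided z-test to the sample mean.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    significant_estimates = []
    for _ in range(n_studies):
        draws = [rng.gauss(true_effect, 1.0) for _ in range(n_subjects)]
        estimate = sum(draws) / n_subjects
        z_stat = estimate * math.sqrt(n_subjects)
        if abs(z_stat) > z_crit:
            significant_estimates.append(estimate)
    power = len(significant_estimates) / n_studies
    if not significant_estimates:
        return power, float("nan")
    return power, sum(significant_estimates) / len(significant_estimates)


# Assumed values for illustration; none of these come from the thesis.
TRUE_EFFECT = 0.3
power, mean_significant = simulate_studies(TRUE_EFFECT, n_subjects=24, n_studies=5000)
print(f"estimated power: {power:.2f}")
print(f"mean estimate among significant studies: {mean_significant:.2f}")
```

With power this low, only a minority of the simulated studies reach significance, and the ones that do report an average effect well above the true 0.3. That is the selection effect behind inflated estimates from small studies.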
There was a recent article in Nature that covers the issue of underpowered studies that you might find interesting:
Power failure: why small sample size undermines the reliability of neuroscience
http://www.nature.com/nrn/journal/vaop/ncurrent/full/nrn3475.html
along with a commentary by one of the authors:
Unreliable neuroscience? Why power matters
http://www.guardian.co.uk/science/sifting-the-evidence/2013/apr/10/unreliable-neuroscience-power-matters
There was also some general media coverage of the Nature paper here:
http://www.theregister.co.uk/2013/04/12/brain_science_low_power_junk/
Thanks again Craig!
I’m assuming this was not published in a peer-reviewed journal, and if that’s the case, I would cut them a lot of slack. Having supervised many, many undergraduate research projects over the past ten years, I can say the goal is typically not academic perfection, but rather giving completely inexperienced students some idea of what conducting a research project is like. In some sense the results are secondary to the process, which is learning how to do independent research.
Occasionally you get a group of students who do a good enough job that you can publish a study in a peer reviewed journal, but that’s only happened once in ten years for me. As such, I wouldn’t make much of this study unless it is in a journal, if for no other reason than the research advisor might well know that it’s not of high enough quality to go through that process. Peer review has its problems, but it’s a hurdle that blocks some stuff at the door.
More and more institutions are putting undergraduate projects online for greater availability, despite the issues with that level of research. I am getting a lot more in my alerts. They have to be interpreted for what they are. Obviously there are dangers involved in the reading, interpretation and appraisal of research (of whatever level) by those who are not familiar with the reading, interpretation and appraisal of research.