Dr. Dean Ornish is a physician famed for his very low-fat diet and lifestyle changes as a way to halt and even reverse heart disease. His first study was sufficient to convince many insurance companies to pick up the tab for his therapies. Although his studies have often been small, there is good evidence that he has helped many people, including Bill Clinton, to deal better with their heart problems.

So, should we all go on the Ornish diet? Probably not, because there is little evidence that it will help ordinary people, and even less evidence that most people can tolerate it. The amount of fat allowed – 10% of calories – is so small that you have to eliminate all kinds of food, starting with meat but also including egg yolks, butter and cream. Pastries are banished to the ninth circle of hell, which must be killing Mr. Clinton.

Elimination diets like this are notoriously hard to maintain for any length of time. The instant you tell me I can't have a doughnut, I can think of nothing else. Nevertheless, if you have a bad ticker or clogged arteries, you might give it a whirl. Imminent death can really concentrate the mind.

The problem some people have with Ornish is that his studies lack statistical rigor. His first study, published in *The Lancet* in 1990, claimed that lifestyle changes could reverse heart disease. He compared 22 coronary patients on his lifestyle program to 19 controls. He showed that, after a year, coronary arteries opened up in the lifestyle group (presumably reversing heart disease) but closed up in the controls.

It was a good result, indicating that heart disease could be reversed without drugs. But the study was small. Why does that matter? Well, a main job of statistics is to separate real effects from the noise of the messy real world, and that takes big numbers to smooth things out. To take an extreme example, suppose you pick just one person at random and put him or her on an experimental protocol, such as eating Twinkies all day in order to prove that Twinkies cause pink-eye. How do you know you didn't just have the bad luck to pick someone who is prone to pink-eye? So researchers use more people to reduce the odds of accidentally picking an unrepresentative group. The more people you enlist in your study, the less likely it is that your results are caused by random chance.
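You can put numbers on that intuition with a few lines of code. Here is a quick sketch (the 20% pink-eye base rate is invented purely for illustration): sample one person and your observed rate is either 0% or 100%; sample thousands and it settles near the truth.

```python
import random

random.seed(1)
BASE_RATE = 0.20  # invented base rate of pink-eye, for illustration only

def observed_rate(n):
    """Sample n Twinkie-eaters at random and report the pink-eye rate we see."""
    return sum(random.random() < BASE_RATE for _ in range(n)) / n

for n in (1, 10, 100, 10000):
    print(f"n = {n:>5}: observed rate = {observed_rate(n):.1%}")
```

With n = 1 the "study" reports 0% or 100% – pure luck of the draw. As n grows, the observed rate converges toward the 20% base rate, which is exactly why big studies are more trustworthy.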

**Everything you need to know about p values in one paragraph:** In statistics, to analyze your data, you start with the perverse assumption that your treatment won't work. If it doesn't work, then at the end of your study everything will look just like it did at the beginning, and you will be totally bummed. This depressing expectation corresponds to a probability of 100% ordinary, or equivalently, a p value of 1. You will not be publishing your paper if all of your patients fail to respond to your treatment. If you see anything different from ordinary, however, perhaps your treatment is actually doing something. The p value is the probability that chance alone would produce results at least as striking as yours: the less expected your results, the lower the p value, and the harder it is to blame chance. If that probability is less than 5% (p < .05), you can get your paper published, because most scientists will be satisfied. Even more will be satisfied if you can get p < .01. Low p values mean your results are *significant*. That doesn't mean the effect is large, and it doesn't mean it's dramatic. It just means your finding is less likely to be an accident.
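The logic of that paragraph fits in a short simulation. Assume the treatment does nothing (every number here – the 30% spontaneous-improvement rate, the 11-of-20 observed result – is invented for illustration), then ask: how often does pure chance alone look at least as impressive as what we saw? That fraction is the p value.

```python
import random

random.seed(0)
SPONTANEOUS_RATE = 0.30   # invented: 30% of patients improve on their own
N_PATIENTS = 20
OBSERVED_IMPROVED = 11    # invented: our trial saw 11 of 20 improve

def null_trial():
    """One imaginary trial in which the treatment does nothing at all."""
    return sum(random.random() < SPONTANEOUS_RATE for _ in range(N_PATIENTS))

# Run 100,000 do-nothing trials; the p value is the fraction that look
# at least as good as the result we actually observed.
trials = [null_trial() for _ in range(100_000)]
p = sum(t >= OBSERVED_IMPROVED for t in trials) / len(trials)
print(f"p = {p:.3f}")
```

With these made-up numbers, chance alone produces 11 or more improvers only about 2% of the time, so the (fictional) result would clear the p < .05 bar.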

How do you get the p value down? One way is to add more people to your study. For instance, let's say you want to see if a coin is fair. When you flip a fair coin, you expect an equal number of heads and tails. So if you get 5 heads in a row, which seems unlikely, you might have a rigged coin, right? But the p value for that sequence is only .06 – not enough to go to press. Just one more head – 6 heads in a row – cuts your p value *in half*, to .03. Now you're golden. Another two flips, for 8 heads in a row, gives you a p of .008, which is pretty convincing evidence that your coin isn't fair. It makes sense: the more heads you get in a row, the less likely it is that your coin is fair. Similarly, every patient you add who responds to your treatment lowers the p value and increases the significance of your findings. At some point (p < .05 or p < .01), everyone is convinced that the success of your treatment is unlikely to be a fluke and no more proof is needed. Until then, the more the merrier!
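Those coin numbers are easy to verify. For a run of n identical flips, the two-sided p value – the chance that a fair coin gives all heads *or* all tails – is 2 × (1/2)^n:

```python
def p_value_for_run(n):
    """Two-sided p value for n identical flips of a fair coin:
    the chance of all heads, plus the chance of all tails."""
    return 2 * (0.5 ** n)

for n in (5, 6, 8):
    print(f"{n} heads in a row: p = {p_value_for_run(n):.4f}")
```

Running it gives .0625 for 5 heads, .0313 for 6, and .0078 for 8 – the .06, .03, and .008 in the text, and each extra head halves the p value.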

Dr. Ornish’s study is plagued with high p values. HDL cholesterol levels have p > .8. Triglyceride levels have p > .24. Apolipoprotein levels have p > .46. Blood pressure has p > .7 (systolic) and p > .8 (diastolic). These are all important measures of heart health, but with p values that high the readings have no statistical significance. The study just wasn’t large enough, and Ornish may as well have skipped them.

Although none of the controls in the study died, one of the patients on the Ornish protocol did. Since there were only 22 patients in the experimental group, that single death represents a 4.5% death rate for the year. For normal 50-somethings (the age of this group), the death rate is more like 1% in a year, plus or minus .1%. That makes this death a big, nasty outlier. A large dataset can absorb outliers like this more easily than a small one. Here it looks quite bad, but Dr. Ornish says that it was due to a patient who was too exuberant with his exercise. Interestingly, in other parts of the study, Ornish shows that the more exuberant the patients were about the diet, the better their healing. Apparently, there is an exuberance cut-off, but it isn’t delineated in the study. At any rate, because he died before a final measure could be taken, he was left out of the study.
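A back-of-the-envelope calculation shows how much one death distorts a small group (the 2,200-patient comparison group is hypothetical, chosen so the 1% background rate predicts about 22 deaths):

```python
BACKGROUND_RATE = 0.01  # roughly 1% annual death rate for 50-somethings, per the text

# One death in Ornish's experimental group of 22:
small = 1 / 22
print(f"small study: {small:.1%}")   # one event swamps the group

# In a hypothetical group of 2,200, the background rate predicts ~22 deaths;
# one extra death barely moves the needle:
large = 23 / 2200
print(f"large study: {large:.1%}")   # the outlier is absorbed
```

One death pushes the 22-patient group to 4.5% – four times the background rate – while the same extra death in a group of 2,200 leaves the rate at about 1.0%.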

Another problem with the size of the study has to do with women – specifically, there was only one woman in the experimental group. This part of the study is about as significant as my Twinkie study above. To his credit, Ornish recognizes this serious limitation. We are still waiting to find out about the other half of the population.

So, in our crash course in statistics, we’ve learned why studies should be big. The Framingham Heart Study, with 5,200 people, is big. The Nurses’ Health Study, with 238,000 people, is big. The Ornish study, with 22 patients, is not. And maybe Ornish is just tired of hearing about how small it is, because in a recent article for Medline, he says:

“It is a common belief that the larger the number of patients, the more valid a study is. However, the number of patients is only one of many factors that determine the quality of a study. Judging a study by the number of patients is like judging a book by the number of pages.”

That’s a nice metaphor, but it is wrong. Significance will always be bound up with size, and no magic can release us from those statistical shackles.

Ornish quotes Dr. Attilio Maseri, an Italian cardiologist, to buttress this claim:

“Very large trials with broad inclusion criteria raise grounds for concern for practicing physicians and for the economics of healthcare. The first is the fact that the larger the number of patients that have to be included in a trial in order to prove a statistically significant benefit, the greater the uncertainty about the reason why the beneficial effects of the treatment cannot be detected in a smaller trial.”

To which I say, “What?”

This may have suffered a little in translation from Italian, but it’s fairly unintelligible as is. You have to go to Maseri’s original article to realize that he is really saying that large studies are expensive, and that you can get more bang for your research buck by selecting a smaller but more representative sample. That is a more defensible stance, but it is not the same as Dr. Ornish’s strangely unscientific statement.

Big studies are better, because they offer more reliable conclusions. Don’t let Dr. Ornish’s fame and clout convince you otherwise.

Don’t miss my pithy diet tweets! Follow me @NotchByNotch