HTML transcript of

Interpreting Statistics

by Dan Wartenberg
[ Originally published in 1986 when Wartenberg was
a research fellow at the Harvard School of Public Health ]

 I am going to talk about three problems that are important for journalists and scientists to know about. The first is the concept of multiple testing: how many times do you test something, and how does that affect the interpretation of statistics? There is also a question of data standardization: what did you really expect to occur, and how does that influence the interpretation of the results? Finally, there’s the question of statistical power: what does it mean if you get negative results?

 The first question relates to some of the work that I do, and it has to do with clusters. Just what is a cluster, and when do you think you have a cluster? A lot of times there will be a health outcome, like cancer or birth defects. And people will say, ‘Gee, that looks pretty unusual.’ So we think that it’s a cluster, since it’s unusual, it seems unlikely to have happened by chance, and there are a large number of cases, either close together or aggregated in some other way.

 What do we mean by an unlikely chance? That it shouldn’t happen 1 in 10 times? Or that you wouldn’t expect it to happen 1 in 100 times unless something unusual were going on? Then it becomes important to ask, ‘Well, how often did we look?’ If we were looking at a cluster in a neighborhood of a few blocks, then we could look at all such areas in the United States too, and there might be millions of them. If we looked a million times, then even a pattern with only a 1-in-1,000 chance of occurring would be expected to show up about 1,000 times, just by chance alone. So we really have to ask about the context in which we were thinking about this cluster, and how often we asked the question, ‘Is this unusual?’
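The arithmetic behind ‘how often did we look?’ can be sketched in a few lines of Python. This is only an illustration: the thresholds and the number of looks are the round figures from the talk, the function names are made up for this sketch, and the second function assumes that the areas examined are independent of one another.

```python
def expected_chance_clusters(n_looks: int, p_chance: float) -> float:
    """Expected number of areas that look 'unusual' purely by chance."""
    return n_looks * p_chance


def prob_at_least_one(n_looks: int, p_chance: float) -> float:
    """Chance that at least one area looks 'unusual' when nothing is going on.

    Assumes the areas are independent, which real neighborhoods may not be.
    """
    return 1.0 - (1.0 - p_chance) ** n_looks


# A million neighborhoods examined under three different 'unusual' thresholds.
for p in (1 / 10, 1 / 100, 1 / 1000):
    count = expected_chance_clusters(1_000_000, p)
    print(f"threshold 1 in {round(1 / p)}: about {count:,.0f} chance clusters")

# Even a modest number of looks makes a chance 'cluster' likely.
print(f"{prob_at_least_one(100, 1 / 100):.2f}")
```

With a 1-in-100 threshold and only 100 looks, the chance of at least one false alarm is already about 0.63.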

[Cartoon poking fun at our attempts to scientifically map cancer clusters]

^ Original graphic accompanying Wartenberg’s article, as printed in 1986.

 A similar sort of thing happens when one looks at cancer rates and then ranks them. You might see in the newspaper that some county or town on Cape Cod has the highest lung cancer rate in Massachusetts. And people say, ‘Well, there must really be some problem there. I wonder what’s causing that.’ Again, it really depends upon how you ask the question. In fact, it may be true that there’s a problem. There may be something that people are being exposed to.

 On the other hand, we can take the cancer rates for the 351 towns in Massachusetts and order them, and there is always going to be one that’s the highest. That’s the consequence of ordering them. So if we then say that it happens to be this town out on Cape Cod, there might not be a problem. That’s just a natural variation in numbers, and finding the highest one doesn’t tell us there’s a problem. What we really want to know is, ‘How high is it?’ How different is it from the other rates? How unusual is it?

 On the other hand, what if someone had come and said, ‘I think there is a real problem in this town. Would you look at the lung cancer rates?’ If you look at those rates and find that, in fact, that town has the highest lung cancer rate, then we’ve gone about the problem in a different way. We’ve asked: if we pick one town before looking at the data, how likely is it that this town will have the highest cancer rate? By chance alone, that would be 1 out of 351. The fact that we hit it is pretty unusual, so that suggests that there might be a problem.
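The difference between the two ways of asking the question can be illustrated with a small simulation. This is a sketch under invented assumptions: every town is given the same population and the same true rate (both numbers are hypothetical), so any differences between towns are pure chance. Only the figure of 351 towns comes from the talk.

```python
import random

N_TOWNS = 351        # number of Massachusetts towns, from the talk
POPULATION = 1_000   # hypothetical: identical population in every town
TRUE_RATE = 0.01     # hypothetical: identical underlying rate in every town

rng = random.Random(1986)  # fixed seed so the sketch is repeatable


def simulate_town_rate(rng: random.Random) -> float:
    """Observed rate in one town when each resident is a case with probability TRUE_RATE."""
    cases = sum(rng.random() < TRUE_RATE for _ in range(POPULATION))
    return cases / POPULATION


observed_rates = [simulate_town_rate(rng) for _ in range(N_TOWNS)]
highest_rate = max(observed_rates)

# Ranking always produces a "highest" town, even though no town is truly different.
print(f"true rate everywhere:  {TRUE_RATE:.3f}")
print(f"highest observed rate: {highest_rate:.3f}")

# If one town is named before anyone looks at the data, the chance that it
# ranks first under pure chance (ignoring ties) is 1 in N_TOWNS.
p_named_town_is_highest = 1 / N_TOWNS
print(f"chance a pre-named town ranks first: {p_named_town_is_highest:.4f}")
```

The top-ranked town's observed rate sits well above the true rate even though nothing distinguishes it, which is why ‘How high is it?’ is a better question than ‘Which is highest?’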

 Looking for Causes

 What I’m getting at is that one has to look for causes. You can’t just say that something’s unusual or that it’s a cluster, and therefore we have a major public health problem. When we find some sort of data that suggests an unusual situation, that should prompt us to ask, ‘What’s causing it?’ You shouldn’t just accept the statistics as showing that there’s an unusual situation.

 There is even a question about how one asks, ‘Is it unusual?’ What do you say if someone says, ‘We just found five new leukemias in Woburn’? How unusual is that, over the past two years? Or what if we didn’t find any over the next five years? Is that unusual? What they’re not telling you is very important, which is the number that’s expected. What is the expected value of the number of cases of cancer or the number of cases of leukemia? I think that’s a really important question that journalists have to ask.

 When someone comes out with a number and tells you, ‘We just found this rate that’s very high,’ we have to ask, ‘What did you expect? And how different is it from what you expected?’ There are a variety of ways that one can do that. In the case of leukemia rates it depends, for one, on the number of people considered. How many children are there in Woburn that we might want to consider in deciding that there is an unusual number of leukemia cases? Often, people will report data in terms of rates, like standardized mortality ratios or some relative risk.
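One common way to put ‘What did you expect?’ into numbers is to compute an expected count from a population and a baseline rate, then compare the observed count with it. The sketch below does that with a Poisson model for the chance variation. All of the inputs are hypothetical placeholders, not Woburn data, and the Poisson model is an assumption of this sketch rather than something the talk specifies.

```python
import math


def expected_cases(population: int, baseline_rate_per_year: float, years: float) -> float:
    """Cases expected if this population experienced the baseline rate."""
    return population * baseline_rate_per_year * years


def poisson_tail(observed: int, expected: float) -> float:
    """P(X >= observed) when X is Poisson with mean `expected`."""
    below = sum(
        math.exp(-expected) * expected**k / math.factorial(k)
        for k in range(observed)
    )
    return 1.0 - below


# Hypothetical inputs for illustration only.
children = 10_000          # children in the area under study
baseline = 4 / 100_000     # baseline cases per child per year
years = 2
observed = 5

expected = expected_cases(children, baseline, years)
ratio = observed / expected                    # observed-to-expected ratio
p_chance = poisson_tail(observed, expected)    # chance of seeing this many or more

print(f"expected {expected:.1f}, observed {observed}, ratio {ratio:.2f}")
print(f"chance of {observed} or more cases if nothing unusual is going on: {p_chance:.4f}")
```

With these made-up inputs the expected count is 0.8, so five cases is 6.25 times what was expected. A small tail probability like this one still has to be read alongside the earlier question of how many places and time periods were looked at.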

 Getting back to clusters, one of the things that people often forget to adjust for is population. In an article published about a particular type of cancer, there’s some data that looks like clusters. But look: one’s in New York, and one’s in Buffalo, and one’s in Syracuse. So what’s [happening] is that there are more cancer cases, but there are more cases because there are more people in those cities. The rates might not be high. And that’s the point of the article: that we have to adjust for population, and we have to talk about rates and proportions, not raw numbers, or else they can be very deceptive.

 Standardizing Data

 In a similar incident, I worked with people in a county health department looking at how groundwater contamination had spread from a certain source. We looked at how many wells were contaminated as we went away from the source. They published a report saying that most of the wells that were contaminated, over 80%, were within 500 feet of the source. And once one got out to about 1200 or 1300 feet, they said there was no chance of contamination.

 Well, I went back and looked at the data and asked, ‘How many wells are there that are greater than 1500 feet away?’ And it turned out that there weren’t many. So when you normalize for the number of wells out there, you have a 10% chance of having a contaminated well, even if you live over 2000 feet away. They were drawing the wrong conclusion because they weren’t standardizing their data to what was expected. And that turned out to be a very important problem. They were telling people that if they lived more than 1500 feet away from this source, they didn’t have a problem. That wasn’t true. It just meant that you probably didn’t have a well that you were drawing drinking water from. But if you did, you had better worry.
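The two ways of reading the well data can be put side by side. The counts below are hypothetical, chosen only to mirror the pattern described in the talk (most contaminated wells close to the source, few wells of any kind far away). They are not the county's data.

```python
# Hypothetical counts for illustration; not the county health department's data.
wells_by_distance = [
    # (distance band, total wells in band, contaminated wells in band)
    ("0-500 ft", 100, 85),
    ("500-1500 ft", 60, 14),
    ("over 1500 ft", 10, 1),
]

total_contaminated = sum(contaminated for _, _, contaminated in wells_by_distance)

summary = {}
for band, total, contaminated in wells_by_distance:
    # What the report quoted: this band's share of all contaminated wells.
    share_of_contaminated = contaminated / total_contaminated
    # What a resident needs: the chance that a well in this band is contaminated.
    chance_for_a_well_here = contaminated / total
    summary[band] = (share_of_contaminated, chance_for_a_well_here)
    print(f"{band:>12}: {share_of_contaminated:.0%} of all contaminated wells, "
          f"{chance_for_a_well_here:.0%} of wells in this band contaminated")
```

In this made-up table the farthest band holds only 1% of the contaminated wells, yet a well in that band still has a 10% chance of being contaminated, because there are so few wells out there to begin with.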

 Also, people often forget to consider confounding variables: other factors that differ between the groups being compared and that can make the comparison misleading. One factor that’s often ignored is age. You can look at general cancer rates and it may turn out that there is a community that has a very high cancer rate relative to another community. Again, you have to ask, ‘What is expected?’ If it turns out that the community that has the high rate has many very old people, that may not be surprising. In fact, once age is taken into account, it may turn out to be a low rate. So it’s very important to consider the other factors that could contribute to the outcome that you’re looking at. Have those factors been taken into account?
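A minimal sketch of adjusting for age follows, using what epidemiologists call indirect standardization: apply reference age-specific rates to each community's age structure to get an expected count, then compare observed with expected. The age bands, rates, populations, and case counts are all hypothetical, and this is one standard approach rather than necessarily the one the speaker had in mind.

```python
# Hypothetical reference rates (cases per person per year) by age band.
reference_rates = {"under 40": 0.001, "40-64": 0.005, "65 and over": 0.020}

# Hypothetical communities of equal size but very different age structure.
communities = {
    "younger town": {
        "population": {"under 40": 6_000, "40-64": 3_000, "65 and over": 1_000},
        "observed": 41,
    },
    "older town": {
        "population": {"under 40": 2_000, "40-64": 3_000, "65 and over": 5_000},
        "observed": 105,
    },
}


def expected_cases(population_by_age: dict, rates_by_age: dict) -> float:
    """Cases expected if each age band experienced the reference rate."""
    return sum(population_by_age[age] * rates_by_age[age] for age in population_by_age)


results = {}
for name, data in communities.items():
    total_population = sum(data["population"].values())
    crude_rate = data["observed"] / total_population
    expected = expected_cases(data["population"], reference_rates)
    ratio = data["observed"] / expected  # observed-to-expected ratio, adjusted for age
    results[name] = (crude_rate, expected, ratio)
    print(f"{name}: crude rate {crude_rate:.4f}, expected {expected:.0f}, "
          f"observed/expected {ratio:.2f}")
```

In this invented example the older town's crude rate is more than double the younger town's, yet its observed count is below what its age structure predicts.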

 Finally, I want to mention the topic of negative results. People put a lot of credence in them. A common example is dioxin — it seems to be pretty popular today [c.July 1986] — where there are reports that scientists looked for epidemiological effects from dioxin and couldn’t find them, so it must not be dangerous. A recent Scientific American article states, ‘Concern that this material is harmful to health and the environment may be misplaced. Although it is toxic to certain animals, evidence is lacking that it has any serious, longterm effects on human beings.’

 Well, if evidence is lacking, does that mean it’s safe? I’m not saying whether it is or isn’t. It just seems that conclusion is completely unjustified. We don’t have sufficient information. So what we have to ask is, ‘Why didn’t we find it if we looked? What was the problem?’

 Finding an Effect

 That gets into the issue of statistical power: if there is an effect, how likely are we to find it? This is a very complicated concept, and lots of scientists don’t utilize it in designing their studies. Epidemiology is a very difficult science — a lot of people I know say it’s like using a blunt instrument to try and find something.

 So the fact that we can’t find an effect doesn’t mean that it’s not there. It may mean that our methods are just not quite sensitive enough to pick it up. It might be that we have to see a five-fold increase in a particular outcome, such as a type of birth defect, to even begin to suggest that it was unusual. Well, that doesn’t mean that if it’s less than that, there’s nothing going on. It means that our methods are not very effective at picking this up.

 In looking at results, it seems important to ask the question, ‘How big an effect would you have to have before you found it?’ Could you have picked up a doubling of the rates? Or even one-and-a-half times the rate? If you’re looking at a disorder that’s very common in the population, it’s a very small fluctuation. If it’s a disorder that’s very rare, it’s much more difficult to find in the population. That doesn’t say anything about the effect; that just says whether or not we can find it. You also have to ask how big the study was. If you looked at 10 people, it’s going to be a lot tougher to pick up an effect than if you looked at 10,000 people.
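The question ‘How big an effect would you have to have before you found it?’ can be made concrete with a rough power calculation. The sketch below assumes a simple design that is not taken from the talk: a single case count is compared with a known baseline using a one-sided Poisson test at the 5% level. The baseline rate is hypothetical; the study sizes of 10 and 10,000 are the ones mentioned above.

```python
import math


def poisson_tail(threshold: int, mean: float) -> float:
    """P(X >= threshold) when X is Poisson with the given mean."""
    below = sum(math.exp(-mean) * mean**k / math.factorial(k) for k in range(threshold))
    return 1.0 - below


def critical_count(expected_null: float, alpha: float = 0.05) -> int:
    """Smallest count that would be called 'unusual' if there were no effect."""
    count = 0
    while poisson_tail(count, expected_null) > alpha:
        count += 1
    return count


def power(study_size: int, baseline_rate: float, rate_ratio: float,
          alpha: float = 0.05) -> float:
    """Chance the study flags an effect that multiplies the rate by rate_ratio."""
    expected_null = study_size * baseline_rate
    threshold = critical_count(expected_null, alpha)
    return poisson_tail(threshold, expected_null * rate_ratio)


BASELINE_RATE = 0.001  # hypothetical: 1 case per 1,000 people

for size in (10, 10_000):
    for ratio in (1.5, 2.0, 5.0):
        print(f"study of {size:>6} people, {ratio}x the rate: "
              f"power {power(size, BASELINE_RATE, ratio):.2f}")
```

Under these assumptions a study of 10 people has almost no chance of detecting even a doubling of the rate, while a study of 10,000 would detect a doubling most of the time. A negative result from the small study says very little.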

 These are issues related to statistical power that are very important. People should be very cautious about taking negative results and assuming that we have, in fact, proved something. We haven’t. When we’re trying to disprove a null hypothesis (for example, that dioxin has no effect on health) and we can’t, the conclusion that dioxin is safe just doesn’t follow. What it shows is that, at this point in time, we have not been able to demonstrate that it is harmful. But maybe we haven’t looked at it the right way or asked the right questions.

—  Dan Wartenberg, “Interpreting Statistics.” Science for the People, July–August 1986, vol. 18, no. 4, pp. 14–15.