Data Underdetermines Our Hypotheses.

One of the big ideas to come out of Philosophy of Science is that of Underdetermination. The folk-science view of things is that if you have a collection of facts, then those facts (all neatly tied together) will point at a conclusion.

It may be objected that this is an extremely simplistic viewpoint, and that objection is fair. A slightly more complex version would be that if you have a collection of facts, then those facts (all neatly tied together with a hypothesis) will point at a conclusion. This is also problematic.

It’s problematic because the hypothesis often comes from that collection of facts. We often also have ‘the conclusion’ already, and we’re often stuck trying to figure out how Event A caused Event B. So we look at our facts, we create a hypothesis that accounts for (ideally) all of them, and then we spin out a conclusion (or explanation, or whatever it is we were looking to do), all nice and neat.

But, alas, it’s not nice and neat at all. Because the facts underdetermine the hypothesis.

Take a quick look at the picture below.

[Figure: Hypotheses as circles]

The collection of dots represents the various points of data that we wish to explain by way of a hypothesis. So we construct Hypothesis A (the smaller brown circle) to explain those facts. But someone else, competing with us for funding, has a competing hypothesis: the larger blue circle, Hypothesis B. With C, we have yet another competing hypothesis for the same data.

How do we know which one is correct? All of them adequately explain the data in question. On the face of it, focusing entirely on the data we have, there is no way to determine which one is correct. The takeaway point here is that there is no upper limit to the number of hypotheses that can be generated from these data points.

Additionally, Hypotheses B and C are not only competing, but mutually exclusive: one predicts that X will be found (whatever it is) and the other asserts that not-X will be found. It’s not possible for both of these hypotheses to be true. Does this mean that it’s a wash, and that science knows nothing? No, it means that there is more than one conclusion that can be reasonably drawn from the data. Further testing would eliminate at least one of these hypotheses.
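The circle picture can be turned into a small toy model. What follows is only a sketch under invented assumptions: the coordinates, the three circles, and the helper names `explains` and `surviving` are all made up for illustration and come from no real dataset. A hypothesis "explains" a data point if the point falls inside its circle.

```python
import math

# Hypothetical data points (the "dots"); the values are invented.
data = [(0.0, 0.0), (1.0, 0.5), (0.5, 1.0), (-0.5, 0.5)]

# Each hypothesis is modelled as a circle: (centre_x, centre_y, radius).
# These three circles are assumptions chosen for the illustration.
hypotheses = {
    "A": (0.25, 0.5, 1.0),  # small circle hugging the data
    "B": (0.0, 0.0, 3.0),   # larger circle
    "C": (2.0, 2.0, 3.5),   # another large circle, centred elsewhere
}

def explains(circle, point):
    """True if the point lies inside (or on) the circle."""
    cx, cy, r = circle
    return math.hypot(point[0] - cx, point[1] - cy) <= r

def surviving(candidates, points):
    """Names of the hypotheses that explain every point."""
    return sorted(
        name for name, circle in candidates.items()
        if all(explains(circle, p) for p in points)
    )

# All three hypotheses account for the current data.
print(surviving(hypotheses, data))

# No upper limit: any larger circle around A's centre also fits.
print(all(explains((0.25, 0.5, r), p) for r in range(1, 1000) for p in data))

# A new observation (further testing) eliminates at least one hypothesis.
new_point = (2.0, 2.0)
print(surviving(hypotheses, data + [new_point]))
```

In this toy setup the new observation rules out A but leaves B and C standing, so even after further testing the remaining data can still underdetermine the choice between them.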

Things become more complex when we are dealing with physical data and the historical record: some data points are missing. When it comes to the fossil record, vast quantities of the data are missing. As such, there may be data out there that would eliminate certain hypotheses, but it’s simply unavailable to us (none of the relevant fossils survived, for example, or we simply haven’t dug in the relevant location). In this situation, whether X is true or not-X is true may simply be unknowable.

This doesn’t mean that “we don’t know anything!”, or that the data isn’t “sending a definitive historical signal about what happened in the past” (Meyer, Darwin’s Doubt, p. 117): the data is. Our inability to interpret it correctly doesn’t mean that no interpretation is possible, nor does it mean that there is no signal.

This post is a precursor to my review of Darwin’s Doubt.

Follow Brian on Twitter!


