Chapter 7 – The Second Attack on Bayesianism and a Response to It
The Bayesian way of explaining how we think about, test, and then adopt a new model
of reality has been given a number of mathematical formulations. They look
complicated, but they really aren’t that hard. I have chosen one of the more
intuitive ones below to discuss the theoretical criticism of Bayesianism.
The Bayesian model of how a human being’s thinking evolves can be broken down into
a few basic components. When I, as a typical human, am examining a new way of
explaining what I see going on in the world, I am considering a new hypothesis,
and as I try to judge how true—and therefore how useful—a picture of the world
this new hypothesis may give me, I look for ways of testing it that will show
decisively whether it and the model of reality it is based on really work. I am
trying to determine whether this hypothesis will help me to understand,
anticipate, and respond effectively to events in my world.
When I encounter a test situation that fits within the range of events that the
hypothesis is supposed to be able to explain and make predictions about, I tend
to become more convinced the hypothesis is a true one if it enables me to make accurate
predictions. (And I tend to be more likely to discard the hypothesis if the
predictions it leads me to make keep failing to be realized.) I am especially
more inclined to accept the hypothesis and the model of reality it is based on
if it enables me to make reliable predictions about the outcomes of these test
situations and if all my other theories and models are silent or inaccurate
when it comes to explaining my observations of these same test situations.
In short, I tend to believe a new idea more and more if it fits the things I’m
seeing. This is especially true when none of my old ideas fit the events I’m
seeing at all. All Bayes’ Theorem does is try to express this simple truth
mathematically.
It is worth noting again that this same process can also occur in an entire nation
when increasing numbers of citizens become convinced that a new way of doing
things is more effective than the status-quo practices. Of the popular ideas, the few
that really work are the ones that last. In other words, both individuals and whole
societies really do learn, grow, and change by the Bayesian model.
In the case of a whole society, the clusters of ideas an individual sorts through
and shapes into a larger idea system become clusters of citizens forming
factions within society, each faction arguing for the way of thinking it favours.
The leaders of each faction search for reasoning and evidence to support their
positions in ways that are closely analogous to the ways in which the various
biases in an individual mind struggle to become the idea system that the
individual follows. The difference is that the individual usually does not
settle heated internal debates by blinding his right eye with his left hand. That
is, we usually choose to set aside unresolvable internal disputes rather than
letting them make us crazy. Societies, on the other hand, have revolutions or
wars.
In societies, factions sometimes work out their differences, reach consensus, and
move on without violence. But sometimes, as noted in the previous chapter, they
seem to have to fight it out. Then violence settles the matter—whether between
factions within a society or between a given society and one of its neighbouring societies that is perceived as
being the carrier of the threatening new ideas. But Bayesian calculations are
always in play in the minds of the participants, and these same calculations
almost always eventually dictate the outcome: one side gives in and learns the
new ways. The most extreme alternative, one tribe’s complete, genocidal
extermination of the other, is only rarely the final outcome.
But let’s get back to the so-called flaw in the formula for Bayesian decision making.
Suppose I am considering a new way of explaining how some part of the world around me
works. The new way is usually called a hypothesis.
Then suppose I decide to do some research and I come up with a new bit of
evidence that definitely relates to the matter I’m researching. What kind of
process is going on in my mind as I try to decide whether this new bit of
evidence is making me more likely to believe this new hypothesis is true or
less likely to do so? This thoughtful, decision-making time of curiosity and
investigation, for Bayesians, is at the core of how human knowledge forms and
grows.
Mathematically, the Bayesian situation can be represented if we set the following terms: let Pr(H/B) be
the degree to which I trust the hypothesis just based on the background
knowledge I had before I observed any bit of new evidence. If the hypothesis
seems like a fairly radical one to me, then this term is going to be pretty
small. Maybe less than 1%. This new hypothesis may sound pretty crazy to me.
Then let Pr(E/B) be
the degree to which I expected to see this new evidence occur based only on my
old familiar background models of how reality works. This term will be quite small if, for
example, I’m seeing some evidence that at first I can’t quite believe is real because
none of my background knowledge had prepared me for it.
These terms are not fractions in the normal sense. The slash is not a division sign. The
term Pr(H/B), for example, is called my “prior probability”. The term refers to my
estimate of the probability (Pr) that
the hypothesis (H) is correct if I
base that estimate only on how well the hypothesis fits together with my old, personal,
already established, familiar set of background assumptions about the world (B).
The term Pr(E/H&B) means my estimate of the probability that the
evidence will happen if I assume just for the sake of this term that my
background assumptions and this
new hypothesis are both true.
The most important part of the equation is Pr(H/E&B). It represents how much I now am
inclined to believe that the hypothesis gives a correct picture of reality
after I’ve seen this new bit of evidence, while assuming that the evidence is
as I saw it and not a trick or illusion of some kind, and that the rest of my
background beliefs are still in place.
Thus, the whole probability formula that describes this relationship can be expressed
in the following form:
in the following form:
Pr(H/E&B) = [Pr(E/H&B) × Pr(H/B)] / Pr(E/B)
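To make the arithmetic concrete, here is a small sketch in Python. The three probability values are invented for illustration only: a radical-sounding hypothesis (prior trust of 1%), evidence my background models barely predicted (2%), and a hypothesis that predicts that evidence strongly (90%).

```python
# A worked example of the Bayesian update described above.
# All probability values are hypothetical, chosen only for illustration.

def bayesian_update(pr_h_given_b, pr_e_given_hb, pr_e_given_b):
    """Return Pr(H/E&B): my belief in the hypothesis after seeing the evidence."""
    return (pr_e_given_hb * pr_h_given_b) / pr_e_given_b

prior = 0.01       # Pr(H/B): the hypothesis sounds radical, so my prior trust is low
likelihood = 0.90  # Pr(E/H&B): if the hypothesis is true, this evidence is very likely
marginal = 0.02    # Pr(E/B): my old background models barely predicted this evidence

posterior = bayesian_update(prior, likelihood, marginal)
print(posterior)  # 0.45
```

One surprising, well-predicted observation is enough to lift my trust in the hypothesis from 1% to 45%.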
While this formula looks daunting, it actually says something fairly simple. A new
hypothesis that I am thinking about and trying to understand seems to me increasingly
likely to be correct the more I keep encountering new evidence that the hypothesis
can explain and that I can’t explain using any of the models of reality I
already have in my background stock of ideas. When I set the values of these
terms, I will assume, at least for the time being, that the evidence I saw (E) was as I saw it, not some mistake
or trick or delusion, and that the rest of my background ideas/beliefs about
reality (B) are valid.
Increasingly, then, I tend to believe that a hypothesis is a true one the bigger Pr(E/H&B) gets and the smaller Pr(E/B) gets.
In other words, I increasingly tend to believe that a new way of explaining the
world is true, the more it can be used to explain the evidence that I keep
encountering in this world, and the less I can explain that evidence if I don’t
accept this new hypothesis into my set of ways of understanding the world.
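The same kind of sketch, again with invented numbers, shows this two-way pull directly: holding the prior fixed, the posterior climbs as Pr(E/H&B) grows and Pr(E/B) shrinks.

```python
# Comparing two evidence scenarios for the same radical hypothesis.
# All numbers are hypothetical, chosen only to show the direction of the effect.

def posterior(pr_h_given_b, pr_e_given_hb, pr_e_given_b):
    """Pr(H/E&B) by Bayes' Theorem."""
    return (pr_e_given_hb * pr_h_given_b) / pr_e_given_b

prior = 0.01  # Pr(H/B): the hypothesis still sounds unlikely at first

# Weak case: the hypothesis only loosely predicts the evidence,
# and my background models explain that evidence fairly well on their own.
weak = posterior(prior, 0.30, 0.10)

# Strong case: the hypothesis strongly predicts the evidence,
# and almost nothing in my background knowledge does.
strong = posterior(prior, 0.90, 0.02)

print(weak, strong)  # the strong case yields a much larger posterior
```

The weak case barely moves my trust (about 3%), while the strong case multiplies it forty-five-fold, exactly the pattern the paragraph above describes in words.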