Chapter 10 The Second Attack On Bayesianism And A
Response To It
A number of mathematical formulas
have been devised to try to represent the Bayesian way of explaining how we learn
a new theory of reality. They look complicated, but they really aren’t that
hard. I have chosen one of the more intuitive ones to use in my discussion of
the main theoretical flaw that critics think they see in Bayesian Confirmation
Theory.
The Bayesian model of
how a human’s way of thinking evolves can be broken down into a few basic
parts. When I, a typical human, examine a new way of explaining what I see
going on in the world, I’m considering a new theory. As I try to judge how true
and useful a picture of the world this new theory may give me, I look for ways
of testing it, ways that will show me decisively whether this new model/theory
of reality helps me to get good results, i.e. whether or not it works. I’m
trying to decide whether the theory will help me to understand events in my world
better and respond to them more effectively.
For Bayesians,
theories worth investigating always enable us to form more specific hypotheses
that can be tested in the real world. I can’t test the Theory of Gravitation by
manipulating Mars or the moon, but I can drop objects from towers here on Earth
to see whether they take as long to fall as the theory’s hypothesis for each
case leads me to expect. I can’t observe the evolution of the living world in
all its amazing detail for three billion years, but I can observe a particular
insect species that is being sprayed with a new pesticide every week and
predict, based on the Theory of Evolution and this species’ mutation rate, that
it will be immune to the pesticide by the end of this summer.
When I encounter a real-world
situation that will let me formulate a specific hypothesis based on the theory,
and then test that hypothesis, I tend to lean more toward believing the hypothesis
and the theory underlying it if it enables me to make accurate predictions. I lean
toward discarding it if the predictions it leads me to make keep failing to
turn out right. I am especially inclined to believe the hypothesis, and the theory
of reality it’s based on, if all my other theories are silent or inaccurate
when it comes to explaining the test results.
In short, I tend to
believe a new idea more and more if it explains what I see. This idea about how
we come to believe new ideas is Bayes’ Theorem, and it can be expressed in a mathematical formula.
It is worth noting
again that this same process can occur in a whole nation when increasing
numbers of citizens become convinced that a new way of doing things is more
effective than the status-quo way. Popular ideas that work get followers who use
the idea to get more effective work done faster. Then, such ideas and their followers multiply. Thus,
both individuals and societies learn and change by the Bayesian model.
In the case of a society,
the clusters of memories and theories about them that an individual sorts
through and shapes into his/her whole idea system are analogous to clusters of
citizens forming factions within society, each faction arguing for the way of
thinking it favors. The leaders of each faction search for reasoning and
evidence to support their positions. They do this in ways that are closely
analogous to the ways in which the varied ideas in one person’s mind struggle
to become the ones that the individual will use to handle life. The difference
is that a normal individual does not settle his internal debates by blinding
his right eye with his left hand. As individuals, we usually choose to set
aside unresolvable internal debates rather than let them make us crazy. On the
other hand, societies sometimes do harm themselves.
In societies,
factions sometimes work out their differences, reach consensus, and move on
without violence. But sometimes, as noted in the previous chapter, on core
value matters, they fight it out. Then violence settles the matter - whether
between factions in a society or between societies. But Bayesian calculations
are always in play in the minds of the participants, and these same
calculations almost always eventually dictate the outcome: one side wins and
the other side loses, gives in, and accepts large parts of the other’s culture.
The most extreme option, one tribe’s extermination of the other, is only rarely
the final outcome.
But let’s get back to
the flaw the critics see in Bayesian Confirmation Theory.
Suppose I am
considering a new way of explaining how some part of the world around me works.
The new way is usually called a theory. Then, suppose I see a way
to form a specific hypothesis based on the theory and I decide to do some
research to see whether real world results will provide me with evidence that
definitely relates to the matter I’m studying. What kind of process goes on in
my mind as I try to decide whether this new bit of evidence is making me more
likely to believe my hypothesis or less likely? This time of curiosity and testing,
for Bayesians, is the core of their model of how human knowledge grows.
Mathematically, Bayesian Theory can be represented if we set the following terms: let Pr(H/B) be the degree to which I trust the hypothesis H based just on the background beliefs that I had before I began to consider this new theory and the hypothesis it has led me to formulate. If the hypothesis and its theory seem fairly radical to me, then this term is going to be small. Maybe less than 1%. Given my background beliefs, the new theory and its hypothesis may seem pretty far-fetched to me.
Then let Pr(E/B) be the degree to which I
expected to see this new evidence E
based only on my old familiar background models B of how reality works. This term will be quite small if, for
example, I see some evidence that at first, due to my old set of beliefs still
dominating my thinking, I can’t believe is real. None of my old background
knowledge B had prepared me for
seeing this evidence.
These terms are not
fractions in the normal sense. The forward slash in the way they’re written is
not working in its usual sense here. For example, the term Pr(H/B) is called my prior expectation. The term refers
to my estimate of the probability Pr
that the hypothesis H is correct if I
base that estimate only on how well the hypothesis fits my familiar old
set of background assumptions, B,
about reality. It doesn’t say anything like “hypothesis divided by background”.
The term Pr(E/H&B) means my estimate of the
probability that the evidence will happen if I assume just for the sake
of this term that my background assumptions and this new hypothesis are both true, i.e. if for a short while, I
try to think as if the hypothesis and its base theory are true.
The most important
part of the equation is Pr(H/E&B).
It represents how much I am starting to believe that the hypothesis H must be right, now that I’ve seen this
new evidence, all the while assuming that the evidence E is as I saw it, not an
illusion of some kind, and that the rest of my old beliefs B are still in
place.
Thus, the whole
probability formula that describes this relationship can be expressed in the
following way:
Pr(H/E&B) = [Pr(E/H&B) × Pr(H/B)] / Pr(E/B)
While this formula
looks daunting, it actually says something fairly simple. A new hypothesis that
I am trying to understand seems more likely to be correct the more I keep
encountering new evidence that the hypothesis can explain and that my old
models of reality can’t explain. When I set the values of these terms – probabilities
that we’d normally express as percentages – I will assume, for the time being,
that the evidence E is as I saw it, not a mistake or trick, and that I still
accept the rest of my background ideas, B,
about reality as being valid so that I can think and try to make sense of what
I’m seeing at all.
I more and more tend
to believe that a hypothesis is a true one the bigger Pr(E/H&B) gets and
the smaller Pr(E/B) gets.
In other words, I
increasingly tend to believe that a new way of explaining the world is true the
more it works to explain evidence I keep encountering in the world, and the
less I can explain that evidence if I don’t accept the new hypothesis and its
base theory.
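To make the arithmetic concrete, here is a minimal sketch in Python. The helper function and every number in it are invented for illustration only; the one thing taken from the discussion above is the formula itself, Pr(H/E&B) = Pr(E/H&B) × Pr(H/B) / Pr(E/B).

```python
# A minimal sketch of the update described above. The numbers are
# invented for illustration; only the formula comes from the chapter.

def posterior(pr_H_given_B, pr_E_given_HB, pr_E_given_B):
    """Pr(H/E&B) = Pr(E/H&B) * Pr(H/B) / Pr(E/B)."""
    return pr_E_given_HB * pr_H_given_B / pr_E_given_B

# The new hypothesis seems far-fetched at first: Pr(H/B) = 0.01.
# The evidence would surprise me on my old beliefs alone: Pr(E/B) = 0.05.
# But it is just what the hypothesis predicts: Pr(E/H&B) = 0.95.
print(posterior(0.01, 0.95, 0.05))   # 0.19: my belief jumps from 1% to 19%

# If the evidence were unsurprising anyway (Pr(E/B) = 0.50), the jump shrinks.
print(posterior(0.01, 0.95, 0.50))   # 0.019: my belief barely moves
```

The bigger Pr(E/H&B) and the smaller Pr(E/B), the larger the jump in my confidence in H.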
So far, so
good.
Perihelion precession of Mercury (credit: Wikimedia Commons)
Now, all of this may
begin to seem intuitive, but once we have a formula set down it also is open to
attack, and the critics of Bayesianism see a flaw in it that they consider
fatal. The flaw they see is called “the problem of old evidence.”
One of the ways a new
hypothesis gets more respect among experts in the field the hypothesis covers
is by its ability to explain old evidence that old theories in the field have
been unable to explain. For example, physicists all over the world felt that
the probability that Einstein’s theory of relativity was right took a huge jump
upward when he used his theory to account for the regular changes in the orbit
of the planet Mercury – changes that were familiar to physicists, but that had
long defied explanation by the old Newtonian model of the universe.
The constant shift in
Mercury’s orbit had baffled astronomers since they had first acquired telescopes
that enabled them to detect that shift. The shift could not be explained by
pre-relativity models. But Relativity Theory could describe the gradual shift
and make predictions about it that were extremely accurate.
Other examples of
theories that worked to explain old evidence in many other branches of Science
could easily be listed. Kuhn gives lots of them.1
What is wrong with
Bayesianism, according to its critics, is that it can’t explain why we give
more credence to a theory when we realize it can be used to explain old
evidence that had long defied explanation by the established theories in the
field. When the formula above is applied in this situation, critics say Pr(E/B) has to be considered equal to
100 percent, or absolute certainty, since the old evidence E has been accepted
as real for a long time.
For the same reasons,
Pr(E/H&B) has to be thought of as
equal to 100 percent because the evidence has been reliably observed and
recorded many times – since long before we ever had this new theory to
consider.
When these two 100%
probabilities are put into the equation, it looks like this:
Pr(H/E&B) = Pr(H/B)
This new version of
the formula emerges because Pr(E/B) and Pr(E/H&B) are now both equal to 100
percent, or a probability of 1.0, and thus they can be cancelled out of the
equation. But that means that when I realize this new theory that I’m
considering adding to my mental programming can be used to explain some nagging
old problems in my field, my confidence in the new theory does not rise at all.
Or to put the matter another way, after seeing the new theory explain some
troubling old evidence, I trust the theory not one jot more than I did before I
realized it might explain that old evidence.
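A short check in the same illustrative Python style shows the collapse the critics describe; the prior value used here is, again, made up.

```python
# The critics' "old evidence" case: if the evidence is treated as certain,
# Pr(E/B) = Pr(E/H&B) = 1.0 and the update collapses to the prior.
pr_H_given_B = 0.01                          # prior confidence in the hypothesis
pr_H_given_EB = (1.0 * pr_H_given_B) / 1.0   # the two certainties cancel out
print(pr_H_given_EB)                         # 0.01: no confirmation at all
```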
This is simply not
what happens in real life. When we suddenly realize that a new theory or model
can be used to solve some old problems that previously had been not solvable,
we are impressed and definitely more inclined to believe that this new theory
or model of reality is true.
Pasteur in his laboratory (artist: Edelfelt) (credit: Wikimedia Commons)
An indifferent
reaction to a new theory’s being able to explain confusing old evidence is
simply not what happens in real life. When physicists around the world realized
that the Theory of Relativity could be used to explain the shift in the orbit
of Mercury, their confidence that the theory was correct shot up. Most humans
are not just persuaded but exhilarated when a new theory they are beginning to
understand gives them solutions to unsolved old problems.
Hence, the critics
say, Bayesianism is obviously not adequate as a way of describing human
thinking. It can’t account for some of the ways of thinking that we’re certain
we use. We do indeed test new theories against old, puzzling evidence all the
time, and we do feel much more impressed with a new theory if it can account
for that same evidence when all the old theories can’t.
The response in
defense of Bayesianism is complex, but not that complex. What the critics seem
not to grasp is the spirit of Bayesianism. In the deeply
Bayesian way of seeing reality and our relationship to it, everything in the
human mind is morphing and floating. The Bayesian picture of the mind sees us
as testing, reassessing, and restructuring all our ways of understanding reality
all the time.
In the formula above,
the term for my degree of confidence in the evidence, when I take only my
background beliefs as true – i.e. Pr(E/B)
– is never 100%. Not even for very familiar old evidence. Nor is
the term for my degree of confidence in the evidence if I include the hypothesis
in my set of mental assumptions – i.e. Pr(E/H&B)
– ever equal to 100%. I am never perfectly certain of anything, not of my
background assumptions and not even any physical evidence I have seen
repeatedly with my own eyes.
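If we redo the same toy calculation with the evidence held at something just short of certainty (the numbers below are, once more, only illustrative), the confirmation the critics said was impossible reappears.

```python
# The Bayesian reply: even very familiar old evidence is never held at
# exactly 100%. Illustrative numbers only.
pr_H_given_B = 0.01      # the hypothesis still looks unlikely on background alone
pr_E_given_B = 0.90      # old evidence: very familiar, but not quite certain
pr_E_given_HB = 0.999    # with the hypothesis added, the evidence is all but expected

print(pr_E_given_HB * pr_H_given_B / pr_E_given_B)   # ~0.0111: a real rise above 0.01
```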
To closely consider
this situation in which a hypothesis is used to try to explain old evidence, we
need to examine the kinds of things that occur in the mind of a researcher in
both the situation in which the new hypothesis does fit the old evidence and
the one in which it doesn’t.
When a hypothesis
successfully explains some old evidence, what the researcher is affirming is
that, in the term Pr(E/H&B), the
evidence fits the hypothesis, the hypothesis fits the evidence, and the
background assumptions can be integrated with the hypothesis in a comprehensive
way. She is delighted to see that, if she commits to this hypothesis and the
theory underlying it, that will mean she can feel reassured that the old
evidence did happen in the way in which she and her colleagues observed it. In
short, she can feel reassured that they did the work well. She did not
make any mistakes. She did see what she thought she saw.
Sloppy observing is a
haunting fear for all scientists. It's nice to learn that you didn't mess up.
All these logical and
psychological factors raise her confidence that this new hypothesis and the
theory behind it must be right.
This insight into the
workings of Bayesian confirmation theory becomes even clearer when we consider
what the researcher does when she finds that a hypothesis does not successfully
account for the old evidence. In research, only rarely does a researcher in
this situation simply drop the new hypothesis. Instead, she examines the
hypothesis, the old evidence, and her background assumptions to see whether any
or all of them may be adjusted, using new concepts or new calculations
involving newly proposed variables or closer observations of the old evidence,
so that all the elements in the Bayesian equation may be brought into harmony
again. She gives the hypothesis really careful consideration, every chance to
prove itself.
When the old evidence
is examined in light of the new hypothesis, if the hypothesis does successfully
explain that old evidence, the scientist’s confidence in the hypothesis and her
confidence in that old evidence both go up. Even if her prior confidence in
that old evidence was really high, she can now feel more confident that she and
her colleagues – even ones in the distant past – did observe that old evidence
correctly and did record their observations well.
The value of this
successful application of the new hypothesis to the old evidence may be small.
Perhaps it raises the value of the term Pr(E/H&B) by only a
fraction of 1 percent. But that is still a positive increase in the value of
the whole term and therefore a kind of proof of the explanatory value, rather
than the predictive value, of the hypothesis being considered.
Meanwhile, Pr(H/E&B), i.e. the scientist’s
degree of confidence in the new hypothesis, also goes up another notch as a
result of the increase in her confidence in the evidence. A scientist, like all
of us, finds reassurance in the feeling of mental harmony she gets when more of
her perceptions, memories, and concepts about the world are brought into consonance with
each other. (She feels relieved as her cognitive dissonance now goes down.)
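A last illustrative calculation shows why even that fraction-of-a-percent edge matters: the posterior scales with the ratio Pr(E/H&B)/Pr(E/B), so any edge, however small, pushes confidence in H up rather than leaving it flat. (The numbers are invented for illustration.)

```python
# Even a tiny edge counts: the posterior scales with Pr(E/H&B) / Pr(E/B).
pr_H_given_B = 0.30
pr_E_given_B = 0.990     # the old evidence was already almost fully expected
pr_E_given_HB = 0.995    # the hypothesis explains it only a little better

print(pr_H_given_B * pr_E_given_HB / pr_E_given_B)   # ~0.3015: up, not flat
```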
A human mind
experiences much cognitive dissonance when it keeps observing evidence that
does not fit any of its models. The person attempting to explain evidence that
is inconsistent with his world view clings to his background beliefs and shuts
out the new theory his colleagues are discussing. He keeps insisting that this new
evidence can’t be correct. Some systemic error must be leading those other
researchers to think they have observed E, but they must be wrong. E is not what they say it is. “That can’t be right,” he says.
In the meantime, his
more subversive colleague down the hall, even if only in her own mind, is
arguing “I know what I saw. I know how careful I’ve been. E is right. Thus, the probability
of H, at least in my mind,
has grown. It’s such a relief to see a way out of all the cognitive dissonance
I’ve been experiencing for the last few months. I get it now. Wow, this feels
good!” Settling a score with a stubborn bit of old evidence that refused to fit
into any of a scientist’s models of reality is a bit like finally whipping a
bully who picked on her in elementary school – not really logical, but still
very satisfying.
Normally, testing a
new hypothesis involves performing an experiment that will generate new
evidence. If the experiment delivers evidence that was predicted by the
hypothesis, but not by our background concepts, then the hypothesis, as a way
of explaining the real world, seems more likely or probable to us. The new
evidence confirms the hypothesis. That’s Bayesianism and it
fits us exactly.
But I may also decide
to try to use a hypothesis and the theory it is based on to explain some
problematic old evidence. I’ll be studying whether the hypothesis and its
predictions do in fact fit the old evidence situations. If I find that the
hypothesis and the theory it is based on do successfully explain that
problematic old evidence, what I’m confirming is not just the hypothesis and the
theory it is based on, but also a new consistency between the evidence, the
hypothesis, and all, or nearly all, of my background set of concepts. (I likely
will have to drop a few of my old ways of thinking to make room for the new
theory.)
Levitation (Is it real?) (credit: Wikimedia Commons)
And no, it is not
obvious that evidence seen with my own eyes is 100 percent reliable, not even
if I’ve seen a particular phenomenon repeated many times. Neither my
longest-held, familiar background concepts nor the ordinary sense data I see in
everyday experiences are trusted that much. If they were, then I and anyone who
trusts gravity, light and human anatomy would be unable to watch a good magic
show without having a nervous breakdown. Elephants disappear, men float, and
women get sawn in half. By pure logic, if my most basic concepts were believed
at the 100 percent level, then either I would have to gouge my eyes out or go
mad.
But I know the magic
is all a trick of some kind. And I choose, for the duration of the show, to
suspend my desire to harmonize all my sense data with my set of background
concepts. It is supposed to be a performance of fun and wonder. If I explain
how the trick is done, I ruin my grandkids’ fun … and my own.
It’s important to
point out here that the idea behind H&B,
the set of the new hypothesis/theory plus my background concepts, is more
complex than the equation can capture. This part of the formula should be read:
“If I integrate the hypothesis into my whole background concept set.” The
formula attempts to capture in symbols something that is almost not capturable.
This is because the point of positing a hypothesis, H, is that it doesn’t fit neatly into my background set of
beliefs. It is built around a new way of comprehending reality, and thus, it
will only be fully integrated into my old background set of concepts and
beliefs if some of those concepts are adjusted, by careful, gradual tinkering,
and then, some are removed entirely.
Similarly, in the
term Pr(H/E&B), the E&B part is trying to capture
something no math term can capture. E&B
is trying to say: “If I take both the evidence and my set of background beliefs
to be 100% reliable.”
But that way of
stating the E&B part of the term merely highlights the issue of
problematic old evidence. This evidence is problematic because I can’t make
it consistent with my set of background concepts and beliefs, no matter how I
tinker with them.
All the whole formula
really does is try to capture the gist of human thinking and learning. It is a
useful portrayal, a kind of metaphor,
but we can’t become complacent about this formula for the Bayesian model of
human thinking and learning any more than we can become complacent about any of
our concepts. And that thought is consistent with the spirit of Bayesianism. It
tells us not to become too blindly attached to any of our concepts; any of them
may have to be radically updated and revised at any time.
Thus, on closer
examination, the criticism which says the Bayesian model can’t explain why we
find it reassuring when a hypothesis fits some problematic old evidence turns
out not to be fatal. It is more a useful tool, one that we may use to deepen
our understanding of the Bayesian model of human thinking.
We can hold onto the
Bayesian model if we accept that all the concepts, thought patterns, and
patterns of neuron firings in the brain – hypotheses, evidence, and assumed
background concepts – are forming, reforming, aligning, realigning, and
floating in and out of one another all the time – even concepts as basic as the
ones we have about gravity, matter, space, and time. This whole view of the
scary idea called “Bayesianism” arises if we simply apply Bayesianism to
itself.
In short, Bayesianism
says we keep adjusting our thinking until we die.
The Bayesian way of
thinking about our own thinking requires us to be willing to float all our
concepts, even our most deeply held ones. Some are more central, and we use
them more often with more confidence. A few we may believe almost absolutely.
But in the end, none of our concepts is irreplaceable.
For humans, the mind
is our means of surviving. It will adapt to almost anything. Let war, famine,
plague, economics, technology and so on do what they may, rattle living styles
and ways to their foundations. We choose to go on.
We gamble heavily on
the concepts we routinely use to organize our sense data and memories of sense
data. I use my concepts to organize the memories already stored in my brain and
the new sense data that are flooding into my brain all the time. I keep trying
to acquire more concepts – including concepts for organizing other concepts –
that will enable me to utilize my memories more efficiently to make faster and
better decisions and to act increasingly effectively. In this constant,
restless, searching mental life of mine, I never trust anything absolutely. If
I did, a simple magic show would mesmerize and paralyze me. Or reduce me to
catatonia.
But I choose to stand
by my concepts in almost every such case, not because I am certain they’re
perfect, but because they’ve been tested and found effective over so many
trials and for so long that I’m willing to keep gambling on them. At least
until someone proposes something even more promising to me. I don’t know for
certain that the theories of the real world that my culture has gained via the
scientific method are sure bets; they just seem very likely to be the most
promising options available to me now. And I need at least some theories about
reality in place every day. I have to see, recognize, and act. I can’t sit catatonic.
Harry Houdini with his “disappearing” elephant, Jennie (credit: Wikimedia Commons)
Life is constantly
making demands on me to move and keep moving. I have to gamble on some models
of reality just to live my life; I go with my best horses, my most successful
and trusted concepts. And sometimes, I change my mind.
This flexibility on
my part is not weakness or lack of discipline; it is just life. Bayesianism tells
us what Kuhn tells us in The Structure of Scientific Revolutions: we are
constantly adjusting all our concepts as we try to make our ways of dealing
with reality more effective.
And when a researcher
begins to grasp a new hypothesis and the theory it is based on, the resulting
experience is like a religious “awakening” – profound, even life-altering. Everything
changes when we accept a new model or theory because we change. How we perceive
and think changes. In order to “get it”, we have to change. We have to
eliminate some old beliefs from our familiar background belief set and
literally see in a new way.
And what of the
shifting nature of our view of reality and the gambling spirit that is implicit
in the Bayesian model? The general tone of all our mental experiences tells us
that this overall view of our world and ourselves – though it may seem scary or
maybe, for confident individuals, challenging – is just life.
We have now arrived
at a point where we can feel confident that Bayesianism gives us a good base on
which to build further reasoning, one solid enough to use and so to get on
with all the other thinking that must be done. It can answer its
critics – both those who attack it with real-world counterexamples and those
who attack it with pure logic. And it outperforms Rationalism and Empiricism
every time.
Bayesianism is not
logically unshakable. But in a sensible view of our world and ourselves,
Bayesianism serves well. First, because it makes sense when it is applied to our
real problem-solving behavior; second, because it works even when it is applied
to itself; third, because we must have a foundational belief of some kind in
place in order to get on with building a universal moral code; and, finally,
because – as was shown earlier – we have to build that code. That task is
required of us. Without it, we aren’t going to …anything.
We are now at a good
place to pause to summarize our case so far. The next chapter is devoted to
that summing up.
Notes
1. Thomas Kuhn, The Structure of Scientific Revolutions, 3rd ed. (Chicago: The University of Chicago Press, 1996).