Saturday 30 November 2019


Chapter 10       The Second Attack On Bayesianism And A Response To It


A number of mathematical formulas have been devised to try to represent the Bayesian way of explaining how we learn a new theory of reality. They look complicated, but they really aren’t that hard. I have chosen one of the more intuitive ones to use in my discussion of the main theoretical flaw that critics think they see in Bayesian Confirmation Theory.

The Bayesian model of how a human’s way of thinking evolves can be broken down into a few basic parts. When I, a typical human, examine a new way of explaining what I see going on in the world, I’m considering a new theory. As I try to judge how true and useful a picture of the world this new theory may give me, I look for ways of testing it, ways that will show me decisively whether this new model/theory of reality helps me to get good results, i.e. whether or not it works. I’m trying to decide whether the theory will help me to understand events in my world better and respond to them more effectively.

For Bayesians, theories worth investigating always enable us to form more specific hypotheses that can be tested in the real world. I can’t test the Theory of Gravitation by manipulating Mars or the moon, but I can drop objects from towers here on Earth to see whether they take as long to fall as the theory’s hypothesis for each case leads me to expect. I can’t observe the evolution of the living world in all its amazing detail for three billion years, but I can observe a particular insect species that is being sprayed with a new pesticide every week and predict, based on the Theory of Evolution and this species’ mutation rate, that it will be immune to the pesticide by the end of this summer.

When I encounter a real-world situation that will let me formulate a specific hypothesis based on the theory, and then test that hypothesis, I tend to lean more toward believing the hypothesis and the theory underlying it if it enables me to make accurate predictions. I lean toward discarding it if the predictions it leads me to make keep failing to turn out right. I am especially inclined to believe the hypothesis, and the theory of reality it’s based on, if all my other theories are silent or inaccurate when it comes to explaining the test results. 

In short, I tend to believe a new idea more and more if it explains what I see. This idea about how we form new ideas, which is built on Bayes’ Theorem, can be expressed in a mathematical formula.

It is worth noting again that this same process can occur in a whole nation when increasing numbers of citizens become convinced that a new way of doing things is more effective than the status-quo way. Popular ideas that work attract followers, who use them to get work done faster and more effectively. Then those followers multiply. Thus, both individuals and societies learn and change by the Bayesian model.

In the case of a society, the clusters of memories and theories about them that an individual sorts through and shapes into his/her whole idea system are analogous to clusters of citizens forming factions within society, each faction arguing for the way of thinking it favors. The leaders of each faction search for reasoning and evidence to support their positions. They do this in ways that are closely analogous to the ways in which the varied ideas in one person’s mind struggle to become the ones that the individual will use to handle life. The difference is that a normal individual does not settle his internal debates by blinding his right eye with his left hand. As individuals, we usually choose to set aside unresolvable internal debates rather than let them make us crazy. On the other hand, societies sometimes do harm themselves.

In societies, factions sometimes work out their differences, reach consensus, and move on without violence. But sometimes, as noted in the previous chapter, on core value matters, they fight it out. Then violence settles the matter - whether between factions in a society or between societies. But Bayesian calculations are always in play in the minds of the participants, and these same calculations almost always eventually dictate the outcome: one side wins and the other side loses, gives in, and accepts large parts of the other’s culture. The most extreme option, one tribe’s extermination of the other, is only rarely the final outcome.

But let’s get back to the flaw the critics see in Bayesian Confirmation Theory.

Suppose I am considering a new way of explaining how some part of the world around me works. The new way is usually called a theory. Then, suppose I see a way to form a specific hypothesis based on the theory and I decide to do some research to see whether real world results will provide me with evidence that definitely relates to the matter I’m studying. What kind of process goes on in my mind as I try to decide whether this new bit of evidence is making me more likely to believe my hypothesis or less likely? This time of curiosity and testing, for Bayesians, is the core of their model of how human knowledge grows. 

Mathematically, Bayesian Theory can be represented if we set the following terms: let Pr(H/B) be the degree to which I trust the hypothesis H based just on the background beliefs that I had before I began to consider this new theory and the hypothesis it has led me to formulate. If the hypothesis and its theory seem fairly radical to me, then this term is going to be small, maybe less than 1%. Given my background beliefs, the new theory and its hypothesis may seem pretty far-fetched to me.

Then let Pr(E/B) be the degree to which I expected to see this new evidence E based only on my old familiar background models B of how reality works. This term will be quite small if, for example, I see some evidence that at first, due to my old set of beliefs still dominating my thinking, I can’t believe is real. None of my old background knowledge B had prepared me for seeing this evidence.

These terms are not fractions in the normal sense. The forward slash in the way they’re written is not working in its usual sense here; it is read as “given”. For example, the term Pr(H/B) is called my prior expectation. The term refers to my estimate of the probability Pr that the hypothesis H is correct if I base that estimate only on how well the hypothesis fits my familiar old set of background assumptions, B, about reality. It doesn’t say anything like “hypothesis divided by background”.

The term Pr(E/H&B) means my estimate of the probability that the evidence will happen if I assume just for the sake of this term that my background assumptions and this new hypothesis are both true, i.e. if for a short while, I try to think as if the hypothesis and its base theory are true.

The most important part of the equation is Pr(H/E&B). It represents how much I am starting to believe that the hypothesis H must be right, now that I’ve seen this new evidence, all the while assuming that the evidence E is as I saw it, not an illusion of some kind, and that the rest of my old beliefs B are still in place.

Thus, the whole probability formula that describes this relationship can be expressed in the following way:             


                        
$$\Pr(H/E\&B) = \frac{\Pr(E/H\&B) \times \Pr(H/B)}{\Pr(E/B)}$$
          


While this formula looks daunting, it actually says something fairly simple. A new hypothesis that I am trying to understand seems more likely to be correct the more I keep encountering new evidence that the hypothesis can explain and that my old models of reality can’t explain. When I set the values of these terms – probabilities that we’d normally express as percentages – I will assume, for the time being, that the evidence E is as I saw it, not a mistake or trick, and that I still accept the rest of my background ideas, B, about reality as being valid so that I can think and try to make sense of what I’m seeing at all.

The bigger Pr(E/H&B) gets and the smaller Pr(E/B) gets, the more I tend to believe that the hypothesis is true.

In other words, I increasingly tend to believe that a new way of explaining the world is true the more it works to explain evidence I keep encountering in the world, and the less I can explain that evidence if I don’t accept the new hypothesis and its base theory.
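To make the formula concrete, here is a minimal Python sketch. The function name `posterior` and all of the example numbers are illustrative assumptions, not figures from any real study; only the formula itself comes from the discussion above.

```python
def posterior(pr_e_given_hb: float, pr_h_given_b: float, pr_e_given_b: float) -> float:
    """Return Pr(H/E&B) = Pr(E/H&B) x Pr(H/B) / Pr(E/B)."""
    for p in (pr_e_given_hb, pr_h_given_b, pr_e_given_b):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie between 0 and 1")
    if pr_e_given_b == 0.0:
        raise ValueError("Pr(E/B) must be greater than 0")
    result = pr_e_given_hb * pr_h_given_b / pr_e_given_b
    if result > 1.0 + 1e-12:
        # The three inputs cannot all be right if the answer exceeds certainty.
        raise ValueError("inputs are inconsistent: posterior would exceed 1")
    return min(result, 1.0)


# Hypothetical numbers: a far-fetched hypothesis (prior of 1%) that strongly
# predicts evidence my background beliefs led me to expect only 2% of the time.
surprising = posterior(pr_e_given_hb=0.90, pr_h_given_b=0.01, pr_e_given_b=0.02)

# The same hypothesis, but evidence I already half expected (60%).
unsurprising = posterior(pr_e_given_hb=0.90, pr_h_given_b=0.01, pr_e_given_b=0.60)

print(f"surprising evidence:   {surprising:.3f}")
print(f"unsurprising evidence: {unsurprising:.3f}")
```

With these made-up inputs, the evidence my old beliefs could not account for lifts the hypothesis far more than the evidence I already expected, which is the pattern the formula is meant to describe.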

So far, so good. 




             Perihelion precession of Mercury: a bound orbit in the Schwarzschild space-time surrounding the sun, with the precession of the perihelion clearly visible (credit: Wikimedia Commons)

Now, all of this may begin to seem intuitive, but once we have a formula set down it also is open to attack, and the critics of Bayesianism see a flaw in it that they consider fatal. The flaw they see is called “the problem of old evidence.”

One of the ways a new hypothesis gets more respect among experts in the field the hypothesis covers is by its ability to explain old evidence that old theories in the field have been unable to explain. For example, physicists all over the world felt that the probability that Einstein’s theory of relativity was right took a huge jump upward when he used his theory to account for the regular changes in the orbit of the planet Mercury – changes that were familiar to physicists, but that had long defied explanation by the old Newtonian model of the universe.

The constant shift in Mercury’s orbit had baffled astronomers since they had first acquired telescopes that enabled them to detect that shift. The shift could not be fully explained by pre-relativity models. But Relativity Theory could describe the gradual shift and make predictions about it that were extremely accurate.

Other examples of theories that worked to explain old evidence in many other branches of Science could easily be listed. Kuhn gives lots of them.[1]

What is wrong with Bayesianism, according to its critics, is that it can’t explain why we give more credence to a theory when we realize it can be used to explain old evidence that had long defied explanation by the established theories in the field. When the formula above is applied in this situation, critics say Pr(E/B) has to be considered equal to 100 percent, or absolute certainty, since the old evidence E has been accepted as real for a long time.

For the same reasons, Pr(E/H&B) has to be thought of as equal to 100 percent because the evidence has been reliably observed and recorded many times – since long before we ever had this new theory to consider.

When these two 100% probabilities are put into the equation, it looks like this:


                                             Pr(H/E&B) = Pr(H/B)


This new version of the formula emerges because Pr(E/B) and Pr(E/H&B) are now both equal to 100 percent, or a probability of 1.0, and thus they can be cancelled out of the equation. But that means that when I realize this new theory that I’m considering adding to my mental programming can be used to explain some nagging old problems in my field, my confidence in the new theory does not rise at all. Or to put the matter another way, after seeing the new theory explain some troubling old evidence, I trust the theory not one jot more than I did before I realized it might explain that old evidence.
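The critics’ point can be checked with a few lines of Python. This is only a sketch: the function name `posterior` and the 5% prior are assumptions chosen for illustration.

```python
def posterior(pr_e_given_hb: float, pr_h_given_b: float, pr_e_given_b: float) -> float:
    """Return Pr(H/E&B) = Pr(E/H&B) x Pr(H/B) / Pr(E/B)."""
    return pr_e_given_hb * pr_h_given_b / pr_e_given_b


prior = 0.05  # hypothetical Pr(H/B) for a new theory

# The critics' assumption: old evidence is certain with or without H,
# so Pr(E/H&B) = Pr(E/B) = 1.0.
after_old_evidence = posterior(pr_e_given_hb=1.0, pr_h_given_b=prior, pr_e_given_b=1.0)

print(f"prior:     {prior}")
print(f"posterior: {after_old_evidence}")
print("confidence unchanged:", after_old_evidence == prior)
```

Whatever prior is chosen, multiplying and dividing by 1.0 leaves it untouched, so on the critics’ reading the old evidence can never move my confidence in the theory.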

This is simply not what happens in real life. When we suddenly realize that a new theory or model can be used to solve some old problems that previously had been not solvable, we are impressed and definitely more inclined to believe that this new theory or model of reality is true. 


                      

        Pasteur in his laboratory (artist: Edelfelt) (credit: Wikimedia Commons)



An indifferent reaction to a new theory’s being able to explain confusing old evidence is simply not what happens in real life. When physicists around the world realized that the Theory of Relativity could be used to explain the shift in the orbit of Mercury, their confidence that the theory was correct shot up. Most humans are not just persuaded but exhilarated when a new theory they are beginning to understand gives them solutions to unsolved old problems. 

Hence, the critics say, Bayesianism is obviously not adequate as a way of describing human thinking. It can’t account for some of the ways of thinking that we’re certain we use. We do indeed test new theories against old, puzzling evidence all the time, and we do feel much more impressed with a new theory if it can account for that same evidence when all the old theories can’t.

The response in defense of Bayesianism is complex, but not that complex. What the critics seem not to grasp is the spirit of Bayesianism. In the deeply Bayesian way of seeing reality and our relationship to it, everything in the human mind is morphing and floating. The Bayesian picture of the mind sees us as testing, reassessing, and restructuring all our ways of understanding reality all the time.

In the formula above, the term for my degree of confidence in the evidence, when I take only my background beliefs as true – i.e. Pr(E/B) – is never 100%. Not even for very familiar old evidence. Nor is the term for my degree of confidence in the evidence if I include the hypothesis in my set of mental assumptions – i.e. Pr(E/H&B) – ever equal to 100%. I am never perfectly certain of anything, not of my background assumptions and not even of any physical evidence I have seen repeatedly with my own eyes.

To closely consider this situation in which a hypothesis is used to try to explain old evidence, we need to examine the kinds of things that occur in the mind of a researcher in both the situation in which the new hypothesis does fit the old evidence and the one in which it doesn’t.

When a hypothesis successfully explains some old evidence, what the researcher is affirming is that, in the term Pr(E/H&B), the evidence fits the hypothesis, the hypothesis fits the evidence, and the background assumptions can be integrated with the hypothesis in a comprehensive way. She is delighted to see that, if she commits to this hypothesis and the theory underlying it, that will mean she can feel reassured that the old evidence did happen in the way in which she and her colleagues observed it. In short, she can feel reassured that they did the work well. She did not make any mistakes. She did see what she thought she saw.

Sloppy observing is a haunting fear for all scientists. It's nice to learn that you didn't mess up.  

All these logical and psychological factors raise her confidence that this new hypothesis and the theory behind it must be right.

This insight into the workings of Bayesian confirmation theory becomes even clearer when we consider what the researcher does when she finds that a hypothesis does not successfully account for the old evidence. In research, only rarely does a researcher in this situation simply drop the new hypothesis. Instead, she examines the hypothesis, the old evidence, and her background assumptions to see whether any or all of them may be adjusted, using new concepts or new calculations involving newly proposed variables or closer observations of the old evidence, so that all the elements in the Bayesian equation may be brought into harmony again. She gives the hypothesis really careful consideration and every chance to prove itself.

When the old evidence is examined in light of the new hypothesis, if the hypothesis does successfully explain that old evidence, the scientist’s confidence in the hypothesis and her confidence in that old evidence both go up. Even if her prior confidence in that old evidence was really high, she can now feel more confident that she and her colleagues – even ones in the distant past – did observe that old evidence correctly and did record their observations well.

The value of this successful application of the new hypothesis to the old evidence may be small. Perhaps it raises the value of the term Pr(E/H&B) by only a fraction of 1 percent. But that is still a positive increase in the value of the whole term and therefore a kind of proof of the explanatory value, rather than the predictive value, of the hypothesis being considered.

Meanwhile, Pr(H/E&B), i.e. the scientist’s degree of confidence in the new hypothesis, also goes up another notch as a result of the increase in her confidence in the evidence. A scientist, like all of us, finds reassurance in the feeling of mental harmony she gets when more of her perceptions, memories, and concepts about the world are brought into consonance with each other. (She feels relieved as her cognitive dissonance now goes down.)
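One way to read this reply in numbers is sketched below in Python. The function name `posterior` and every value are assumptions picked for illustration: old evidence that I trust at 99.0% on my background beliefs alone, and at 99.9% once the hypothesis is added.

```python
def posterior(pr_e_given_hb: float, pr_h_given_b: float, pr_e_given_b: float) -> float:
    """Return Pr(H/E&B) = Pr(E/H&B) x Pr(H/B) / Pr(E/B)."""
    return pr_e_given_hb * pr_h_given_b / pr_e_given_b


prior = 0.05           # hypothetical Pr(H/B)
pr_e_given_b = 0.990   # old evidence: very familiar, but never quite certain
pr_e_given_hb = 0.999  # slightly more secure once the hypothesis explains it

updated = posterior(pr_e_given_hb, prior, pr_e_given_b)

print(f"prior:     {prior:.4f}")
print(f"posterior: {updated:.4f}")
print(f"ratio:     {updated / prior:.4f}")
```

As long as Pr(E/H&B) sits even slightly above Pr(E/B), the ratio is greater than 1 and confidence in the hypothesis rises, though only modestly when both terms are close to certainty.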

A human mind experiences much cognitive dissonance when it keeps observing evidence that does not fit any of its models. The person attempting to explain evidence that is inconsistent with his world view clings to his background beliefs and shuts out the new theory his colleagues are discussing. He keeps insisting that this new evidence can’t be correct. Some systemic error must be leading those other researchers to think they have observed E, but they must be wrong. E is not what they say it is. “That can’t be right,” he says.

In the meantime, his more subversive colleague down the hall, even if only in her own mind, is arguing “I know what I saw. I know how careful I’ve been. E is right. Thus, the probability of H, at least in my mind, has grown. It’s such a relief to see a way out of all the cognitive dissonance I’ve been experiencing for the last few months. I get it now. Wow, this feels good!” Settling a score with a stubborn bit of old evidence that refused to fit into any of a scientist’s models of reality is a bit like finally whipping a bully who picked on her in elementary school – not really logical, but still very satisfying.

Normally, testing a new hypothesis involves performing an experiment that will generate new evidence. If the experiment delivers evidence that was predicted by the hypothesis, but not by our background concepts, then the hypothesis, as a way of explaining the real world, seems more likely or probable to us. The new evidence confirms the hypothesis. That’s Bayesianism and it fits us exactly.
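The way confidence builds over a run of successful experiments can be sketched in Python. Two things here are my assumptions rather than part of the text: Pr(E/B) is computed by the standard law of total probability from the cases “H is true” and “H is false”, and each experiment is treated as independent of the others. The function name `update` and all numbers are hypothetical.

```python
def update(prior: float, pr_e_given_h: float, pr_e_given_not_h: float) -> float:
    """One Bayesian update after a prediction of the hypothesis is confirmed."""
    # Pr(E/B): how much I expected the evidence before assuming H,
    # averaged over H being true and H being false.
    pr_e = pr_e_given_h * prior + pr_e_given_not_h * (1.0 - prior)
    return pr_e_given_h * prior / pr_e


belief = 0.01  # hypothetical starting confidence in a radical hypothesis
history = [belief]

for _ in range(5):
    # Each experiment: H predicts the result strongly (90%),
    # while my older models would have expected it only 20% of the time.
    belief = update(belief, pr_e_given_h=0.9, pr_e_given_not_h=0.2)
    history.append(belief)  # yesterday's posterior is today's prior

for count, value in enumerate(history):
    print(f"after {count} confirmed predictions: {value:.3f}")
```

Each confirmed prediction becomes part of the background for the next test, so confidence climbs step by step without ever reaching 100%.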

But I may also decide to try to use a hypothesis and the theory it is based on to explain some problematic old evidence. I’ll be studying whether the hypothesis and its predictions do in fact fit the old evidence situations. If I find that the hypothesis and the theory it is based on do successfully explain that problematic old evidence, what I’m confirming is not just the hypothesis and the theory it is based on, but also a new consistency between the evidence, the hypothesis, and all, or nearly all, of my background set of concepts. (I likely will have to drop a few of my old ways of thinking to make room for the new theory.)



  
                               Levitation (Is it real?) (credit: Wikimedia Commons)



And no, it is not obvious that evidence seen with my own eyes is 100 percent reliable, not even if I’ve seen a particular phenomenon repeated many times. Neither my longest-held, familiar background concepts nor the ordinary sense data I see in everyday experiences are trusted that much. If they were, then I and anyone who trusts gravity, light and human anatomy would be unable to watch a good magic show without having a nervous breakdown. Elephants disappear, men float, and women get sawn in half. By pure logic, if my most basic concepts were believed at the 100 percent level, then I would have to either gouge my eyes out or go mad.

But I know the magic is all a trick of some kind. And I choose, for the duration of the show, to suspend my desire to harmonize all my sense data with my set of background concepts. It is supposed to be a performance of fun and wonder. If I explain how the trick is done, I ruin my grandkids’ fun … and my own.

It’s important to point out here that the idea behind H&B, the set of the new hypothesis/theory plus my background concepts, is more complex than the equation can capture. This part of the formula should be read: “If I integrate the hypothesis into my whole background concept set.” The formula attempts to capture in symbols something that is almost not capturable. This is because the point of positing a hypothesis, H, is that it doesn’t fit neatly into my background set of beliefs. It is built around a new way of comprehending reality, and thus, it will only be fully integrated into my old background set of concepts and beliefs if some of those concepts are adjusted, by careful, gradual tinkering, and then, some are removed entirely.

Similarly, in the term Pr(H/E&B), the E&B part is trying to capture something no math term can capture. E&B is trying to say: “If I take both the evidence and my set of background beliefs to be 100% reliable.” 

But that way of stating the E&B part of the term merely highlights the issue of problematic old evidence. This evidence is problematic because I can’t make it consistent with my set of background concepts and beliefs, no matter how I tinker with them.

All the whole formula really does is try to capture the gist of human thinking and learning. It is a useful portrayal, a kind of metaphor, but we can’t become complacent about this formula for the Bayesian model of human thinking and learning any more than we can become complacent about any of our concepts. And that thought is consistent with the spirit of Bayesianism. It tells us not to become too blindly attached to any of our concepts; any of them may have to be radically updated and revised at any time.

Thus, on closer examination, the criticism of Bayesianism which says the Bayesian model can’t explain why we find a fit between a hypothesis and some problematic old evidence reassuring turns out not to be a fatal criticism, but more a useful tool, one that we may use to deepen our understanding of the Bayesian model of human thinking. 

We can hold onto the Bayesian model if we accept that all the concepts, thought patterns, and patterns of neuron firings in the brain – hypotheses, evidence, and assumed background concepts – are forming, reforming, aligning, realigning, and floating in and out of one another all the time – even concepts as basic as the ones we have about gravity, matter, space, and time. This whole view of the scary idea called “Bayesianism” arises if we simply apply Bayesianism to itself.

In short, Bayesianism says we keep adjusting our thinking until we die. 

The Bayesian way of thinking about our own thinking requires us to be willing to float all our concepts, even our most deeply held ones. Some are more central, and we use them more often with more confidence. A few we may believe almost absolutely. But in the end, none of our concepts is irreplaceable.

For humans, the mind is our means of surviving. It will adapt to almost anything. Let war, famine, plague, economics, technology and so on do what they may and rattle our ways of living to their foundations. We choose to go on.

We gamble heavily on the concepts we routinely use to organize our sense data and memories of sense data. I use my concepts to organize the memories already stored in my brain and the new sense data that are flooding into my brain all the time. I keep trying to acquire more concepts – including concepts for organizing other concepts – that will enable me to utilize my memories more efficiently to make faster and better decisions and to act increasingly effectively. In this constant, restless, searching mental life of mine, I never trust anything absolutely. If I did, a simple magic show would mesmerize and paralyze me. Or reduce me to catatonia.

But I choose to stand by my concepts in almost every such case, not because I am certain they’re perfect, but because they’ve been tested and found effective over so many trials and for so long that I’m willing to keep gambling on them. At least until someone proposes something even more promising to me. I don’t know for certain that the theories of the real world that my culture has gained via the scientific method are sure bets; they just seem very likely to be the most promising options available to me now. And I need at least some theories about reality in place every day. I have to see, recognize, and act. I can’t sit catatonic.



                             
                              Harry Houdini with his “disappearing” elephant, Jennie 
                                             (credit: Wikimedia Commons)



Life is constantly making demands on me to move and keep moving. I have to gamble on some models of reality just to live my life; I go with my best horses, my most successful and trusted concepts. And sometimes, I change my mind.

This flexibility on my part is not weakness or lack of discipline; it is just life. Bayesianism tells us what Kuhn’s thesis in The Structure of Scientific Revolutions tells us. We are constantly adjusting all our concepts as we try to make our ways of dealing with reality more effective.

And when a researcher begins to grasp a new hypothesis and the theory it is based on, the resulting experience is like a religious “awakening” – profound, even life-altering. Everything changes when we accept a new model or theory because we change. How we perceive and think changes. In order to “get it”, we have to change. We have to eliminate some old beliefs from our familiar background belief set and literally see in a new way.

And what of the shifting nature of our view of reality and the gambling spirit that is implicit in the Bayesian model? The general tone of all our mental experiences tells us that this overall view of our world and ourselves – though it may seem scary or maybe, for confident individuals, challenging – is just life.

We have now arrived at a point where we can feel confident that Bayesianism gives us a good base on which to build further reasoning, one solid enough to use so that we can get on with all the other thinking that must be done. It can answer its critics – both those who attack it with real-world counterexamples and those who attack it with pure logic. And it outperforms Rationalism and Empiricism every time.

Bayesianism is not logically unshakable. But in a sensible view of our world and ourselves, Bayesianism serves well. First, because it makes sense when it is applied to our real problem-solving behavior; second, because it works even when it is applied to itself; third, because we must have a foundational belief of some kind in place in order to get on with building a universal moral code; and, finally, because – as was shown earlier – we have to build that code. That task is required of us. Without it, we aren’t going to …anything.

We are now at a good place to pause to summarize our case so far. The next chapter is devoted to that summing up.




Notes

1. Thomas Kuhn, The Structure of Scientific Revolutions (Chicago: The University of Chicago Press, 3rd ed., 1996).







