#1 The Case of Dr. Semmelweis
Carl Hempel (Philosopher of Science)

As a simple illustration of some important aspects of scientific inquiry let us consider Semmelweis' work on childbed fever. Ignaz Semmelweis [1818-1865], a physician of Hungarian birth, did this work during the years from 1844 to 1848 at the Vienna General Hospital. As a member of the medical staff of the First Maternity Division in the hospital, Semmelweis was distressed to find that a large proportion of the women who were delivered of their babies in that division contracted a serious and often fatal illness known as puerperal fever or childbed fever. In 1844, as many as 260 out of 3,157 mothers in the First Division, or 8.2 per cent, died of the disease; for 1845, the death rate was 6.8 per cent, and for 1846, it was 11.4 per cent. These figures were all the more alarming because in the adjacent Second Maternity Division of the same hospital, which accommodated almost as many women as the First, the death toll from childbed fever was much lower: 2.3, 2.0, and 2.7 per cent for the same years. In a book that he wrote later on the causation and the prevention of childbed fever, Semmelweis describes his efforts to resolve the dreadful puzzle.

He began by considering various explanations that were current at the time; some of these he rejected out of hand as incompatible with well-established facts; others he subjected to specific tests. One widely accepted view attributed the ravages of puerperal fever to "epidemic influences", which were vaguely described as "atmospheric-cosmic-telluric changes" spreading over whole districts and causing childbed fever in women in confinement. But how, Semmelweis reasons, could such influences have plagued the First Division for years and yet spared the Second? And how could this view be reconciled with the fact that while the fever was raging in the hospital, hardly a case occurred in the city of Vienna or in its surroundings: a genuine epidemic, such as cholera, would not be so selective. Finally, Semmelweis notes that some of the women admitted to the First Division, living far from the hospital, had been overcome by labor on their way and had given birth in the street: yet despite these adverse conditions, the death rate from childbed fever among these cases of "street birth" was lower than the average for the First Division.

On another view, overcrowding was a cause of mortality in the First Division. But Semmelweis points out that in fact the crowding was heavier in the Second Division, partly as a result of the desperate efforts of patients to avoid assignment to the notorious First Division. He also rejects two similar conjectures that were current, by noting that there were no differences between the two Divisions in regard to diet or general care of the patients. In 1846, a commission that had been appointed to investigate the matter attributed the prevalence of illness in the First Division to injuries resulting from rough examination by the medical students, all of whom received their obstetrical training in the First Division. Semmelweis notes in refutation of this view that (a) the injuries resulting naturally from the process of birth are much more extensive than those that might be caused by rough examination; (b) the midwives who received their training in the Second Division examined their patients in much the same manner but without the same ill effects; (c) when, in response to the commission's report, the number of medical students was halved and their examinations of the women were reduced to a minimum, the mortality, after a brief decline, rose to higher levels than ever before.

Various psychological explanations were attempted. One of them noted that the First Division was so arranged that a priest bearing the last sacrament to a dying woman had to pass through five wards before reaching the sickroom beyond: the appearance of the priest, preceded by an attendant ringing a bell, was held to have a terrifying and debilitating effect upon the patients in the wards and thus to make them more likely victims of childbed fever. In the Second Division, this adverse factor was absent, since the priest had direct access to the sickroom. Semmelweis decided to test this conjecture. He persuaded the priest to come by a roundabout route and without ringing of the bell, in order to reach the sick chamber silently and unobserved. But the mortality in the First Division did not decrease. A new idea was suggested to Semmelweis by the observation that in the First Division the women were delivered lying on their backs; in the Second Division, on their sides. Though he thought it unlikely, he decided "like a drowning man clutching at a straw", to test whether this difference in procedure was significant. He introduced the use of the lateral position in the First Division, but again, the mortality remained unaffected.

At last, early in 1847, an accident gave Semmelweis the decisive clue for his solution of the problem. A colleague of his, Kolletschka, received a puncture wound in the finger, from the scalpel of a student with whom he was performing an autopsy, and died after an agonizing illness during which he displayed the same symptoms that Semmelweis had observed in the victims of childbed fever. Although the role of microorganisms in such infections had not yet been recognized at the time, Semmelweis realized that "cadaveric matter" which the student's scalpel had introduced into Kolletschka's blood stream had caused his colleague's fatal illness. And the similarities between the course of Kolletschka's disease and that of the women in his clinic led Semmelweis to the conclusion that his patients had died of the same kind of blood poisoning: he, his colleagues, and the medical students had been the carriers of the infectious material, for he and his associates used to come to the wards directly from performing dissections in the autopsy room, and examine the women in labor after only superficially washing their hands, which often retained a characteristic foul odor.

Again, Semmelweis put his idea to a test. He reasoned that if he were right, then childbed fever could be prevented by chemically destroying the infectious material adhering to the hands. He therefore issued an order requiring all medical students to wash their hands in a solution of chlorinated lime before making an examination. The mortality from childbed fever promptly began to decrease, and for the year 1848 it fell to 1.27 per cent in the First Division, compared to 1.33 in the Second. In further support of his idea, or of his hypothesis, as we will also say, Semmelweis notes that it accounts for the fact that the mortality in the Second Division consistently was so much lower: the patients there were attended by midwives, whose training did not include anatomical instruction by dissection of cadavers. The hypothesis also explained the lower mortality among "street births": women who arrived with babies in arms were rarely examined after admission and thus had a better chance of escaping infection. Similarly, the hypothesis accounted for the fact that the victims of childbed fever among the newborn babies were all among those whose mothers had contracted the disease during labor; for then the infection could be transmitted to the baby before birth, through the common bloodstream of mother and child, whereas this was impossible when the mother remained healthy.

Further clinical experiences soon led Semmelweis to broaden his hypothesis. On one occasion, for example, he and his associates, having carefully disinfected their hands, examined first a woman in labor who was suffering from a festering cervical cancer; then they proceeded to examine twelve other women in the same room, after only routine washing without renewed disinfection. Eleven of the twelve patients died of puerperal fever. Semmelweis concluded that childbed fever can be caused not only by cadaveric material, but also by "putrid matter derived from living organisms."

We have seen how, in his search for the cause of childbed fever, Semmelweis examined various hypotheses that had been suggested as possible answers. How such hypotheses are arrived at in the first place is an intriguing question which we will consider later. First, however, let us examine how a hypothesis, once proposed, is tested. Sometimes, the procedure is quite direct. Consider the conjectures that differences in crowding, or in diet, or in general care account for the difference in mortality between the two divisions. As Semmelweis points out, these conflict with readily observable facts. There are no such differences between the divisions; the hypotheses are therefore rejected as false. But usually the test will be less simple and straightforward. Take the hypothesis attributing the high mortality in the First Division to the dread evoked by the appearance of the priest with his attendant. The intensity of that dread, and especially its effect upon childbed fever, are not as directly ascertainable as are differences in crowding or in diet, and Semmelweis uses an indirect method of testing. He asks himself: Are there any readily observable effects that should occur if the hypothesis were true? And he reasons: If the hypothesis were true, then an appropriate change in the priest's procedure should be followed by a decline in fatalities. He checks this implication by a simple experiment and finds it false, and he therefore rejects the hypothesis.

Similarly, to test his conjecture about the position of the women during delivery, he reasons: If this conjecture should be true, then adoption of the lateral position in the First Division will reduce the mortality. Again, the implication is shown false by his experiment, and the conjecture is discarded. In the last two cases, the test is based on an argument to the effect that if the contemplated hypothesis, say H, is true, then certain observable events (e.g., decline in mortality) should occur under specified circumstances (e.g., if the priest refrains from walking through the wards, or if the women are delivered in lateral position); or briefly, if H is true, then so is I, where I is a statement describing the observable occurrences to be expected. For convenience, let us say that I is inferred from, or implied by, H; and let us call I a test implication of the hypothesis H. In our last two examples, experiments show the test implication to be false, and the hypothesis is accordingly rejected. The reasoning that leads to the rejection may be schematized as follows:

(2a)  If H is true, then so is I.
      But (as the evidence shows) I is not true.
      ------------------------------------------
      H is not true.

Any argument of this form . . . is deductively valid; that is, if its premisses (the sentences above the horizontal line) are true, then its conclusion (the sentence below the horizontal line) is unfailingly true as well. Hence, if the premisses of (2a) are properly established, the hypothesis H that is being tested must indeed be rejected.
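The validity of this form can also be checked mechanically, by running through every combination of truth values for H and I. The following is a minimal sketch in Python, not part of the original text: the helper names `implies` and `is_valid` are illustrative assumptions, and "if H is true, then so is I" is read as the material conditional of classical logic.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """Material conditional: 'if p then q' fails only when p is true and q is false."""
    return (not p) or q

def is_valid(premises, conclusion) -> bool:
    """A form is valid if no truth assignment makes every premise true and the conclusion false."""
    for h, i in product([True, False], repeat=2):
        if all(p(h, i) for p in premises) and not conclusion(h, i):
            return False
    return True

# The rejection argument: if H then I; I is not true; therefore H is not true.
modus_tollens = is_valid(
    premises=[lambda h, i: implies(h, i), lambda h, i: not i],
    conclusion=lambda h, i: not h,
)
print(modus_tollens)  # True
```

No assignment of truth values satisfies both premisses while leaving H true, which is what deductive validity amounts to here.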

Next, let us consider the case where observation or experiment bears out the test implication I. From his hypothesis that childbed fever is blood poisoning produced by cadaveric matter, Semmelweis infers that suitable antiseptic measures will reduce fatalities from the disease. This time, experiment shows the test implication to be true. But this favorable outcome does not conclusively prove the hypothesis true, for the underlying argument would have the form

(2b)  If H is true, then so is I.
      (As the evidence shows) I is true.
      ----------------------------------
      H is true.

And this mode of reasoning, which is referred to as the fallacy of affirming the consequent, is deductively invalid, that is, its conclusion may be false even if its premisses are true. This is in fact illustrated by Semmelweis' own experience. The initial version of his account of childbed fever as a form of blood poisoning presented infection with cadaveric matter essentially as the one and only source of the disease; and he was right in reasoning that if this hypothesis should be true, then destruction of cadaveric particles by antiseptic washing should reduce the mortality. Furthermore, his experiment did show the test implication to be true. Hence, in this case, the premisses of (2b) were both true. Yet, his hypothesis was false, for as he later discovered, putrid material from living organisms, too, could produce childbed fever.
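The invalidity of affirming the consequent can be exhibited by searching for a counterexample, that is, an assignment of truth values under which both premisses hold and the conclusion fails. The sketch below is an illustration added here, not part of the original text; the function name `counterexamples` is an assumption, and the conditional is again read as the material conditional.

```python
from itertools import product

def counterexamples():
    """Collect every (H, I) assignment where both premisses hold but the conclusion 'H is true' fails."""
    found = []
    for h, i in product([True, False], repeat=2):
        premise_1 = (not h) or i   # If H is true, then so is I
        premise_2 = i              # (As the evidence shows) I is true
        if premise_1 and premise_2 and not h:
            found.append({"H": h, "I": i})
    return found

print(counterexamples())  # [{'H': False, 'I': True}]
```

The single counterexample, H false and I true, has the same shape as Semmelweis' case: the predicted decline in mortality occurred, yet the hypothesis in its first version was false.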

Thus, the favorable outcome of a test, i.e., the fact that a test implication inferred from a hypothesis is found to be true, does not prove the hypothesis to be true. Even if many implications of a hypothesis have been borne out by careful tests, the hypothesis may still be false. The following argument still commits the fallacy of affirming the consequent:

(2c)  If H is true, then so are I1, I2, ..., In.
      (As the evidence shows) I1, I2, ..., In are all true.
      -----------------------------------------------------
      H is true.

This, too, can be illustrated by reference to Semmelweis' final hypothesis in its first version. As we noted earlier, his hypothesis also yields the test implications that among cases of street births admitted to the First Division, mortality from puerperal fever should be below the average for the Division, and that infants of mothers who escape the illness do not contract childbed fever; and these implications, too, were borne out by the evidence -- even though the first version of the final hypothesis was false. But the observation that a favorable outcome of however many tests does not afford conclusive proof for a hypothesis should not lead us to think that if we have subjected a hypothesis to a number of tests and all of them have had a favorable outcome, we are no better off than if we had not tested the hypothesis at all. For each of our tests might conceivably have had an unfavorable outcome and might have led to the rejection of the hypothesis. A set of favorable results obtained by testing different test implications, I1, I2, ..., In, of a hypothesis, shows that as far as these particular implications are concerned, the hypothesis has been borne out; and while this result does not afford a complete proof of the hypothesis, it provides at least some support, some partial corroboration or confirmation for it. The extent of this support will depend on various aspects of the hypothesis and of the test data.
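The asymmetry between favorable and unfavorable outcomes can be made concrete for any number n of test implications, written here as I1, ..., In. The following Python sketch is an added illustration under the same material-conditional reading; the function names `form_2c_is_valid` and `one_failure_refutes` are assumptions introduced for this example. It checks only the logical forms and says nothing about the degree of support that favorable results provide.

```python
from itertools import product

def form_2c_is_valid(n: int) -> bool:
    """Check 'If H then I1..In; I1..In are all true; therefore H' over all truth assignments."""
    for h, *imps in product([True, False], repeat=n + 1):
        premise_1 = (not h) or all(imps)   # If H is true, then so are I1, ..., In
        premise_2 = all(imps)              # I1, ..., In are all true
        if premise_1 and premise_2 and not h:
            return False                   # premisses true, conclusion false
    return True

def one_failure_refutes(n: int) -> bool:
    """Check 'If H then I1..In; some Ik is false; therefore not H' over all truth assignments."""
    for h, *imps in product([True, False], repeat=n + 1):
        premise_1 = (not h) or all(imps)
        premise_2 = not all(imps)          # at least one test implication failed
        if premise_1 and premise_2 and h:
            return False
    return True

for n in (1, 3, 5):
    print(n, form_2c_is_valid(n), one_failure_refutes(n))
```

For each n the first check returns False and the second returns True: however many implications are borne out, the inference to H remains invalid, whereas a single failed implication suffices for rejection.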

We have considered [a] scientific [investigation] in which a problem was tackled by proposing tentative answers in the form of hypotheses that were then tested by deriving from them suitable test implications and checking these by observation or experiment. But how are suitable hypotheses arrived at in the first place? It is sometimes held that they are inferred from antecedently collected data by means of a procedure called inductive inference ... ... . The idea that in scientific inquiry, inductive inference from antecedently collected data leads to appropriate general principles is clearly embodied in the following account of how a scientist would ideally proceed:

If we try to imagine how a mind of superhuman power and reach, but normal so far as the logical processes of its thought are concerned, ... would use the scientific method, the process would be as follows: First, all facts would be observed and recorded, without selection or a priori guess as to their relative importance. Secondly, the observed and recorded facts would be analyzed, compared, and classified, without hypothesis or postulates other than those necessarily involved in the logic of thought. Third, from this analysis of the facts generalizations would be inductively drawn as to the relations, classificatory or causal, between them. Fourth, further research would be deductive as well as inductive, employing inferences from previously established generalizations.

This passage distinguishes four stages in an ideal scientific inquiry: (1) observation and recording of all facts, (2) analysis and classification of these facts, (3) inductive derivation of generalizations from them, and (4) further testing of the generalizations. The first two of these stages are specifically assumed not to make use of any guesses or hypotheses as to how the observed facts might be interconnected; this restriction seems to have been imposed in the belief that such preconceived ideas would introduce a bias and would jeopardize the scientific objectivity of the investigation. But the view expressed in the quoted passage -- I will call it the narrow inductivist conception of scientific inquiry -- is untenable, for several reasons. A brief survey of these can serve to amplify and to supplement our earlier remarks on scientific procedure.

First, a scientific investigation as here envisaged could never get off the ground. Even its first phase could never be carried out, for a collection of all the facts would have to await the end of the world, so to speak; and even all the facts up to now cannot be collected, since there are an infinite number and variety of them. Are we to examine, for example, all the grains of sand in all the deserts and on all the beaches, and are we to record their shapes, their weights, their chemical composition, their distances from each other, their constantly changing temperature, and their equally changing distance from the center of the moon? Are we to record the floating thoughts that cross our minds in the tedious process? The shapes of the clouds overhead, the changing color of the sky? The construction and the trade name of our writing equipment? Our own life histories and those of our fellow investigators? All these, and untold other things, are, after all, among "all the facts up to now". Perhaps, then, all that should be required in the first phase is that all the relevant facts be collected. But relevant to what? Though the author does not mention this, let us suppose that the inquiry is concerned with a specified problem. Should we not then begin by collecting all the facts -- or better, all available data -- relevant to that problem? This notion still makes no clear sense. Semmelweis sought to solve one specific problem, yet he collected quite different kinds of data at different stages of his inquiry. And rightly so; for what particular sorts of data it is reasonable to collect is not determined by the problem under study, but by a tentative answer to it that the investigator entertains in the form of a conjecture or hypothesis.
Given the conjecture that mortality from childbed fever was increased by the terrifying appearance of the priest and his attendant with the death bell, it was relevant to collect data on the consequences of having the priest change his routine; but it would have been totally irrelevant to check what would happen if doctors and students disinfected their hands before examining their patients. With respect to Semmelweis' eventual contamination hypothesis, data of the latter kind were clearly relevant, and those of the former kind totally irrelevant. Empirical "facts" or findings, therefore, can be qualified as logically relevant or irrelevant only in reference to a given hypothesis, but not in reference to a given problem ... .

In sum, the maxim that data should be gathered without guidance by antecedent hypotheses about the connections among the facts under study is self-defeating, and it is certainly not followed in scientific inquiry. On the contrary, tentative hypotheses are needed to give direction to a scientific investigation. Such hypotheses determine, among other things, what data should be collected at a given point in a scientific investigation. It is of interest to note that social scientists trying to check a hypothesis by reference to the vast store of facts recorded by the U.S. Bureau of the Census, or by other data-gathering organizations, sometimes find to their disappointment that the values of some variable that plays a central role in the hypothesis have nowhere been systematically recorded. This remark is not, of course, intended as a criticism of data gathering: those engaged in the process no doubt try to select facts that might prove relevant to future hypotheses; the observation is simply meant to illustrate the impossibility of collecting "all the relevant data" without knowledge of the hypotheses to which the data are to have relevance.

The second stage envisaged in our quoted passage is open to similar criticism. A set of empirical "facts" can be analyzed and classified in many different ways, most of which will be unilluminating for the purposes of a given inquiry. Semmelweis could have classified the women in the maternity wards according to criteria such as age, place of residence, marital status, dietary habits, and so forth; but information on these would have provided no clue to a patient's prospects of becoming a victim of childbed fever. What Semmelweis sought were criteria that would be significantly connected with those prospects; and for this purpose, as he eventually found, it was illuminating to single out those women who were attended by medical personnel with contaminated hands; for it was with this characteristic, or with the corresponding class of patients, that high mortality from childbed fever was associated. Thus, if a particular way of analyzing and classifying empirical findings is to lead to an explanation of the phenomena concerned, then it must be based on hypotheses about how those phenomena are connected; without such hypotheses, analysis and classification are blind.

Our critical reflections on the first two stages of inquiry as envisaged in the quoted passage also undercut the notion that hypotheses are introduced only in the third stage, by inductive inference from antecedently collected data ... ... . There are ... no generally applicable "rules of induction", by which hypotheses or theories can be mechanically derived or inferred from empirical data. The transition from data to theory requires creative imagination. Scientific hypotheses and theories are not derived from observed facts, but invented in order to account for them. They constitute guesses at the connections that might obtain between the phenomena under study, at uniformities and patterns that might underlie their occurrence. "Happy guesses" of this kind require great ingenuity, especially if they involve a radical departure from current modes of scientific thinking, as did, for example, the theory of relativity and quantum theory. The inventive effort required in scientific research will benefit from a thorough familiarity with current knowledge in the field. A complete novice will hardly make an important scientific discovery, for the ideas that may occur to him are likely to duplicate what has been tried before or to run afoul of well-established facts or theories of which he is not aware.

Nevertheless, the ways in which fruitful scientific guesses are arrived at are very different from any process of systematic inference. The chemist Kekulé, for example, tells us that he had long been trying unsuccessfully to devise a structural formula for the benzene molecule when, one evening in 1865, he found a solution to his problem while he was dozing in front of his fireplace. Gazing into the flames, he seemed to see atoms dancing in snakelike arrays. Suddenly, one of the snakes formed a ring by seizing hold of its own tail and then whirled mockingly before him. Kekulé awoke in a flash: he had hit upon the now famous and familiar idea of representing the molecular structure of benzene by a hexagonal ring. He spent the rest of the night working out the consequences of this hypothesis. This last remark contains an important reminder concerning the objectivity of science. In his endeavor to find a solution to his problem, the scientist may give free rein to his imagination, and the course of his creative thinking may be influenced even by scientifically questionable notions. Kepler's study of planetary motion, for example, was inspired by his interest in a mystical doctrine about numbers and a passion to demonstrate the music of the spheres. Yet, scientific objectivity is safeguarded by the principle that while hypotheses and theories may be freely invented and proposed in science, they can be accepted into the body of scientific knowledge only if they pass critical scrutiny, which includes in particular the checking of suitable test implications by careful observation or experiment ... .

Scientific knowledge, as we have seen, is not arrived at by applying some inductive inference procedure to antecedently collected data, but rather by what is often called "the method of hypothesis", i.e. by inventing hypotheses as tentative answers to a problem under study, and then subjecting these to empirical test. It will be part of such test to see whether the hypothesis is borne out by whatever relevant findings may have been gathered before its formulation; an acceptable hypothesis will have to fit the available relevant data. Another part of the test will consist in deriving new test implications from the hypothesis and checking these by suitable observations or experiments. As we noted earlier, even extensive testing with entirely favorable results does not establish a hypothesis conclusively, but provides only more or less strong support for it. Hence ... ... any "rules of induction" will have to be conceived . . . as canons of validation rather than of discovery ... ... .

Philosophy of Natural Science, Carl G. Hempel