Computer Confirmation of “P” – A Biblicist’s Perspective

By Joshua Berman

Joshua Berman is a lecturer in Tanach at Bar-Ilan University and an associate fellow at the Shalem Center. His most recent book, Created Equal: How the Bible Broke with Ancient Political Thought (Oxford, 2008) was a National Jewish Book Award Finalist in Scholarship in 2008.

Prof. Moshe Koppel, a revered friend and senior colleague, has produced what is for me as a biblicist a fascinating and exciting study.[1] His demonstrated capacity to tease apart books like Jeremiah and Ezekiel when their verses are randomly shuffled will find good application, I am sure, within the field of biblical studies. At the conclusion of that study, Koppel and his team report that they applied their methodology to the Torah, and that their split of its verses “corresponds to the expert consensus regarding P and non-P for over 90% of the verses in the Pentateuch for which such consensus exists.”

The claim has generated much theological debate. In this post, however, I wish to respond to these findings solely within the terms and assumptions found within the discipline of biblical studies itself. The findings produced by Prof. Koppel’s team constitute an academic, scientific claim and deserve to be addressed in kind. I wish to extend my heartfelt appreciation to Prof. Koppel for sharing his data with me, and for elucidating many points for me in subsequent exchanges between us.

For two primary reasons, I conclude that the findings are of little consequence for Pentateuch studies, and to the extent that they are of consequence they suggest a different significance than that reported in his initial article.

The Koppel team’s compelling work, teasing apart books such as Jeremiah and Ezekiel, does not, in its present form, provide sufficient basis for an analysis of the Torah. Here’s why: Jeremiah and Ezekiel are two whole, integral works. Buoyed by the success of separating two books, Koppel’s team split the verses of the Torah into two “authorial categories,” one of which seems to have affinity with the set of verses source-critical scholars refer to as the priestly source of the Pentateuch. The other authorial category, however, “Non-P,” is not a single, integral, distinct book. Source critics (again, I am constructing an argument here that would be valid within the field of biblical studies) are unanimous that the non-priestly material in the Pentateuch is an amalgam of many other sources. It is more difficult to separate a single integral book from an amalgam of many different works than it is to separate one integral work from another. Put differently, an alternative experiment needs to be carried out utilizing other biblical books, before these data about the Torah can be statistically validated. Prof. Koppel and team would need to take, say, all of Jeremiah, and shuffle it with smatterings of verses from several other books, and successfully retrieve the verses from Jeremiah.

I am clueless about computational linguistics, and so I shared this intuition with Prof. Koppel. He confirmed for me that the experiments carried out separating whole integral books can only offer a “weak-indicator” for the case of the texts of the Torah.

But let’s give that weak-indicator the benefit of the doubt, or let’s assume that those experiments have been successfully executed, extracting Jeremiah, say, from an amalgam of other texts. What do the data in hand about the Torah suggest? I would propose that the significance is rather different than that which has been reported. To appreciate the data, however, requires a brief whirlwind tour of the history of source criticism of the Pentateuch, particularly of developments of the last twenty years.

Classical source-criticism does not merely claim that the first four books of the Torah may be divided into three sources, J, E and P. It claims that each of these three sources was originally a complete history of Israel, from Creation until the time of Moses. There was a time, in the early twentieth century, when the comprehensive theories of the great minds were considered to be the final word on the subject. Einstein, it was believed, had given us the final word on physics, Freud on psychology. In the world of biblical studies, it was Wellhausen who was believed to have given us the final word on the origins of the Pentateuch with his source-critical account.

But in the last generation many scholars have walked away from the table of source-criticism. For some, source-criticism proved too unruly; scholars seemed never to agree on the criteria necessary to split the putative sources. Others found source-criticism disappointing on the level of content: the sources did not seem to produce clear, consistent and differentiated theological agendas. Others found that many of the so-called indicators of multiple authorship could better be explained by recognizing the literary techniques at work in these texts as coherent wholes. For others (and this is significant) the inconsistencies in the text are best understood as individuated issues. There were indeed two flood stories in ancient Israel, they would say, but why assign these to larger strands or sources?[2]

Koppel and his team use the work of Richard Elliott Friedman, Who Wrote the Bible?, as a baseline to identify the Priestly source. They chose that work because Friedman, like Driver before him, schematically assigns each and every verse of the Torah to one of the four classical sources. Friedman’s work, written in 1987, is still quoted in some circles. But it cannot be said to be the universally accepted gospel on the Torah within the field of biblical studies. It is telling that since Friedman no scholar of any note has deigned to offer (let alone defend) a schematic accounting of all the verses of P, much less the whole Pentateuch. This speaks volumes to the methodological malaise currently afflicting Pentateuch studies in the academy. Unable to find a common methodological language, biblicists themselves now inhabit a post-Babel world.

Does this mean that there are no source-scholars left in the 21st Century? Does it mean that there is no consensus about anything pertaining to source-criticism within the academy? Hardly. Concerning biblical law, such consensus exists. Nearly all biblicists, for example, will happily identify legal sections pertaining to the cult as deriving from a source called P. Where there is far less consensus is in the area of narrative. It is true that those who engage in source-criticism of Pentateuchal narrative find a common language and accord on many passages in identifying them as P narratives. But they are a dwindling breed. For the vast majority of scholars working today, the determination that the laws of Leviticus should be ascribed to a source called P is an infinitely easier call than is the ascription to P of the account of Shimon and Levi in Shechem. It’s not that they’re convinced that the story should be ascribed to another source. Nor would they rule out the possibility that perhaps that account was penned by the author of a P source. The hesitation comes from misgivings about the entire enterprise of viewing Torah narratives as derived from distinct, continuous sources that each detailed the history of Israel.

Now let’s go to the data. In many instances, verses that source-critics assign to a source other than P, the Koppel team’s program, likewise, assigns to the category “Non-P.” At the same time, it is important to note that the program also fails to positively confirm much of what source critics see as priestly material. Here we come to the main point: in the genre of law, the program does a very good job of correlating with the sections that critics see as priestly law. These, not surprisingly, tend to be cultic law. In the genre of narrative, however, a very different story emerges. Of 253 verses in Genesis that source-critics identify as “priestly”, the program could only confirm 38 of them (15%). In Exodus the discrepancy is even greater, and the program confirms only 5 out of 80 (6%) priestly narrative verses. In Numbers, the correlation is much higher, but overall, the program’s results correlate with the so-called priestly narrative less than half the time.

Does this suggest a “flaw” in the Koppel team’s program? Not at all. Rather, it confirms the suspicions of a very large number of biblicists: the notion of priestly narrative in the Pentateuch is on much shakier ground than is the notion of priestly law.

Finally, moving away from the Koppel team’s stimulating findings, I’d like to address an issue that hovers over the entire discussion, and that is the definition of an “author” and the highly anachronistic use we make of that term, especially when analyzing ancient sources.

It’s a good thing that Rashi wrote in the eleventh century, because had he come out with his commentary to the Torah today, he would likely be assailed in some quarters as a plagiarizer. In many instances, Rashi attributes the midrashim he cites. But in many other instances he simply borrows entire sentences from chaza”l and presents them in his own voice. Only the discerning disciple will note that Rashi has, in fact, lifted entire sentences from an earlier source, without attribution. Of course, Rabban Shel Yisrael was no plagiarizer. Rather, Rashi, like many medieval and ancient writers before him, was writing in a world where old was good. The mark of a good writer was someone who appropriately saw himself as a member of a long and venerated tradition. Good writing was precisely writing that imitated earlier style (nay, sometimes a variety of styles) and in many cases directly lifted and borrowed whole passages.[3]

The same was true in ancient contexts as well. You’d think that to ask, “Who wrote the Code of Hammurabi?” is akin to asking, “Who’s buried in Grant’s Tomb?” But it isn’t. The issue is not simply that it was Hammurabi’s scribes who actually penned that classic, and not the Babylonian king himself. It’s that we now know that those scribes invoked a number of sanctioned styles in their composition, and lifted phrases from prior works.[4] The composition is unitary because a single agent (Hammurabi and his scribes) published this composition as something to be read as such. Ancient readers may, no doubt, have noticed the unevenness of its style. But they would have seen that as the mark of a great work, produced within a venerable tradition, in which writing routinely draws from a variety of canonical and classical works. Its status derived not from the fact that it was the original composition of a single author, but that it was produced by a single authority. Readers of the Code would never have even thought to ask whether every sentence originated with Hammurabi himself. But the fact that it was under his authority that multiple styles were brought together is what made it a great work. Many other classics of the ancient world, such as the Epic of Gilgamesh, likewise exhibit a plurality of styles. This is how many of the great works of the ancient world looked.[5]

In defense against the claims of higher biblical criticism, the faithful often counter that divine writing is different than human writing, for the Almighty speaks in several styles and voices. This dichotomy, whereby God alone can speak in many styles, but humans must exhibit consistency, is a false one. It is human writing, too, in the premodern, and especially ancient world, that expresses itself in a variety of styles.

In his commentary to the beginning of parashat Pekudei, the Ralbag questions why the Torah repeats the parshiyot of the Mishkan, in a manner that seems overly repetitious. He concludes that what appears to us as repetitious and hence, unseemly, may not have been the case in an earlier age:

ואפשר שנאמר שכבר היה מנהג האנשים ההם בזמן מתן תורה שיהיו סיפוריהם בזה האופן. והנביא אמנם ידבר לפי מנהג.

We may say that it was the norm at the time of Matan Torah that compositions were crafted in this fashion, and that the prophet speaks in the style of the times.

It may well be that divine writing expresses itself in multiple styles. But if the Almighty has indeed done so in the Torah, then He has done so לפי המנהג, according to literary tastes that may seem grating to modern notions of authorship, but were entirely in keeping with ancient sensitivities.

[1] See Moshe Koppel, Navot Akiva, Idan Dershowitz and Nachum Dershowitz, “Unsupervised Decomposition of a Document into Authorial Components.”
[2] On the waning interest in Pentateuchal source criticism within biblical studies, see Rolf Rendtorff, “What Happened to the ‘Yahwist’?: Reflections after Thirty Years” (http://www.sbl-site.org/publications/article.aspx?articleId=553) and David Clines, “Response to Rolf Rendtorff’s ‘What Happened to the Yahwist? Reflections after Thirty Years.’”
[3] See A. J. Minnis, Medieval Theory of Authorship: Scholastic Literary Attitudes in the Later Middle Ages (London: Scolar Press, 1984).
[4] Victor Avigdor Hurowitz, Inu Anum sirum: Literary Structures in the Non-juridical Sections of Codex Hammurabi (Philadelphia: University Museum, 1994).
[5] See David M. Carr, Writing on the Tablet of the Heart: Origins of Scripture and Literature (New York: Oxford University Press, 2005); Karel Van Der Toorn, Scribal Culture and the Making of the Hebrew Bible (Cambridge: Harvard University Press, 2007).



Attribution and Misattribution: On Computational Linguistics, Heresy and Journalism

by Moshe Koppel

Prof. Moshe Koppel is on the faculty of the Computer Science Department at Bar-Ilan University. He has published extensively on authorship attribution, as well as on a diverse array of topics of Jewish and scientific interest.

A few days ago, newspaper readers from New Jersey to New Zealand read about new computer software that “sheds light on the authorship of the Bible”[1]. By the time the news circled back to Israel, farteitcht and farbessert, readers of Haaretz were (rather gleefully) informed that the head of the project had announced that it had been proved that the Torah was written by multiple human authors[2], just as the Bible critics had been saying all along.
I’m always skeptical about that kind of grandiose claim and this is no exception, even though the person who allegedly made the claim in this particular case happens to be me. The news reports in question refer to a recently published paper[3] in computational linguistics involving decomposition of a document into authorial components. A brief reference to application of the method to the Torah (Pentateuch) is responsible for most of the noise.
In what follows, I’ll briefly provide some background about authorship attribution research, sketch the method used in the paper, outline the main results and say a few words about what they mean. My main purpose is to explain what has actually been proved and, more crucially in this case, what has not been proved.

Authorship Attribution
One of my areas of research for over a decade has been authorship attribution, the use of automated statistical methods to identify or profile the author of a given text. For example, we can determine, with varying degrees of accuracy, the age, gender and native language of the author of a text[4]. Under certain conditions, we can determine, with varying degrees of certainty, if two texts were written by the same person[5]. Some of this work has been applied to topics of particular interest to students of Jewish texts, such as strong evidence that the collection of responsa Torah Lishmah was written by Ben Ish Chai[6] (although he often quoted the work as if it were written by someone else) and that all of the letters in Genizat Harson are forgeries[7].
Whenever I have lectured on this topic, the first question has been: have you ever analyzed the Bible? The honest truth is that I never really understood the question and I suspect that in most cases the questioner didn’t have any very well-formed question in mind, beyond the vague thought that the Bible is of mysterious provenance and ought to be amenable to some sort of statistical analysis. I would always mumble something about the question being poorly defined, Bible books being too short to permit reliable statistical analysis, etc. But, while all those excuses were quite true, I also had a vague thought of my own, which was that whatever well-formed research question I could come up with regarding Tanach, it would probably land me in hot water.
One research question that I have been working on with my graduate student, Navot Akiva, involves decomposition of a document into distinct stylistic components. For example, if a document was written by multiple authors, each of whom presumably writes in some distinct style, we’d like to be able to identify the parts written by each author. (Bear in mind this is what is known in the jargon as an unsupervised problem: we don’t get known examples of each author’s writing to analyze. All we have is the composite text itself, from which we need to tease apart distinctive looking chunks of text.) The object is straightforward: given a text, split it up into families of chunks in the best possible way, where by “best” we mean that the chunks that are assigned to the same family are as similar to each other as possible.
Even I could see that this could have some bearing on Tanach. So when Prof. Nachum Dershowitz, a colleague with whom I share a number of research interests, introduced me to his son, Idan, a graduate student in the Tanach program at Hebrew University, we agreed to consider how to apply this work to Tanach (sort of fudging the question of whether this meant Torah or Nach). It happens that, apart from being the most studied and revered set of books ever written, Tanach offers another advantage as an object of linguistic analysis: precisely because it has been the subject of so much study, there are many available automated tools that we could exploit in our research.

The Method
Here’s how our computerized method works. Divide a text into chunks in some reasonable way. These chunks might be chapters or some fixed number of sentences or whatever; the details aren’t critical and need not concern us at this stage. I’m going to call these chunks “chapters” (only because it is a less technical sounding word), but bear in mind that we are not assuming that a chapter is stylistically homogeneous; that is, the split between authors might take place in the middle of a chapter.
Our object is to split our collection of chapters into families of stylistically similar chapters. (The chapters in a family need not be contiguous.) All the chapters that look a certain way, please step to the left; all others, please step to the right.
As a first step, for any pair of chapters, we’re going to have to measure the similarity between them. The trick is to measure this similarity in a way that captures style rather than content.
The way we do it is as follows: we begin by generating a list of synonym sets. For example, for the case of Tanach, we would consider synonym sets such as betoch, bekerev; begged, simla; sar, nasi; makel, mateh, shevet; and so on. There are about 200 such sets of Biblical synonyms. We generate this list automatically by identifying Hebrew roots that are translated by the same English root in the KJV. Note that not every occurrence of, for example, shevet (which can mean either “staff” or “tribe”) is a synonym for makel (which is always “staff”). We use online concordances to disambiguate, that is, to determine the intended sense of a word in a particular context. (In this respect, Tanach is especially convenient to work with.)
For every chapter and every such set of synonyms, we record which synonym (if any) that chapter uses. The similarity of a pair of chapters reflects the extent to which they make similar choices from among synonym sets. The idea is that if one chapter uses – for example – betoch, sar and mateh and the other uses bekerev, nasi and makel, the two chapters have low similarity. If a chapter doesn’t use any of the synonyms in a particular synonym set, that set plays no role in measuring the similarity between that chapter and any other chapter.
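To make the idea concrete, here is a minimal Python sketch of a synonym-choice similarity. It is an illustration only, not the code used in the paper: the four toy synonym sets come from the examples above, the function names are made up, the scoring rule (average overlap of choices over the synonym sets both chapters use) is an assumption, and sense disambiguation (the shevet problem) is ignored entirely.

```python
from typing import Dict, FrozenSet, List, Optional

# Toy synonym sets from the examples in the text (transliterated).
# The real list has about 200 sets; the set names here are hypothetical labels.
SYNONYM_SETS: Dict[str, FrozenSet[str]] = {
    "within": frozenset({"betoch", "bekerev"}),
    "garment": frozenset({"begged", "simla"}),
    "leader": frozenset({"sar", "nasi"}),
    "staff": frozenset({"makel", "mateh", "shevet"}),
}


def synonym_choices(words: List[str]) -> Dict[str, FrozenSet[str]]:
    """For each synonym set, record which of its members the chapter uses, if any."""
    present = set(words)
    choices = {}
    for name, members in SYNONYM_SETS.items():
        used = members & present
        if used:  # sets the chapter never touches are simply left out
            choices[name] = frozenset(used)
    return choices


def similarity(a: Dict[str, FrozenSet[str]],
               b: Dict[str, FrozenSet[str]]) -> Optional[float]:
    """Average overlap of choices over the synonym sets that BOTH chapters use.

    Returns None when the two chapters share no synonym set, since in that
    case there is nothing to compare.
    """
    shared = a.keys() & b.keys()
    if not shared:
        return None
    total = sum(len(a[s] & b[s]) / len(a[s] | b[s]) for s in shared)
    return total / len(shared)


chapter_1 = synonym_choices(["betoch", "sar", "mateh"])
chapter_2 = synonym_choices(["bekerev", "nasi", "makel"])
print(similarity(chapter_1, chapter_2))  # opposite choice in every shared set
```

The two chapters in the usage lines are the ones from the example above: they use all the same synonym sets but never the same word, so they come out as dissimilar as this measure allows.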
Once we know the similarity between every pair of chapters, we use formal methods to create optimal families. Ideally, we want all the chapters in the same family to be very similar to each other and to be very different from the chapters in other families. In fact, such clean divisions are unusual, but the formal methods will generally find a near-optimal clustering into families. (What we call families are called “clusters” in the jargon, and the process of finding them is called “clustering”. The particular clustering method we used is a spectral approximation method called n-cut.)
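For readers who want to see what a spectral approximation to n-cut looks like, here is a rough, dependency-free Python sketch for the two-family case only. It is not the implementation used in the paper. It assumes a symmetric similarity matrix with positive row sums, uses plain power iteration where a real implementation would call a linear algebra library, and splits the chapters by the sign of the second eigenvector; the function name and the toy matrix are invented for illustration.

```python
import math
from typing import List


def two_way_ncut(W: List[List[float]], iters: int = 500) -> List[int]:
    """Split items into two families via the spectral relaxation of normalized cut.

    W is a symmetric, non-negative similarity matrix. The result is one
    label (0 or 1) per item; which family gets which label is arbitrary.
    """
    n = len(W)
    degree = [sum(row) for row in W]
    if any(d <= 0 for d in degree):
        raise ValueError("every item needs some similarity to the others")
    s = [math.sqrt(d) for d in degree]

    # Normalized affinity D^-1/2 W D^-1/2, shifted to (M + I) / 2 so that all
    # eigenvalues are non-negative and power iteration finds the largest ones.
    M = [[0.5 * W[i][j] / (s[i] * s[j]) + (0.5 if i == j else 0.0)
          for j in range(n)] for i in range(n)]

    # The leading eigenvector is known in closed form: D^1/2 times all-ones.
    s_norm = math.sqrt(sum(x * x for x in s))
    top = [x / s_norm for x in s]

    # Power iteration with the leading eigenvector projected out each round,
    # which converges to the second eigenvector (for a generic start vector).
    v = [(-1.0) ** i * (i + 1) for i in range(n)]
    for _ in range(iters):
        dot = sum(a * b for a, b in zip(v, top))
        v = [a - dot * b for a, b in zip(v, top)]
        v = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(a * a for a in v)) or 1.0
        v = [a / norm for a in v]

    # Undo the D^1/2 scaling and split by sign.
    return [0 if v[i] / s[i] >= 0 else 1 for i in range(n)]


# Toy data: chapters 0-2 resemble each other, as do chapters 3-5.
W = [[0.0 if i == j else (0.9 if (i < 3) == (j < 3) else 0.1)
      for j in range(6)] for i in range(6)]
print(two_way_ncut(W))
```

Note that this sketch always returns exactly two families, whatever the data look like.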
A key question you should ask at this point is: how many families will we get? You might imagine that the clustering method will somehow figure out the right number of families. Indeed, there are clustering methods that can do that. But – note this carefully – the number of families we obtain is not determined by the clustering method we use. Rather it is given by us as an input. That is, we decide in advance how many families we want to get and the method is forced to give us exactly what we asked for. This is a crucial point and we’ll come back to it when we get to the meaning of all these results below.
In any case, at this stage, we have a tentative division of chapters into however many families we asked for. (For simplicity, let’s assume that we have split the chapters into exactly two families.) This is not the final result, for the simple reason that we have no guarantee that the chapters themselves are homogeneous. The next step is to identify those chapters that are at the core of each family; these are the chapters we are most confident we have assigned correctly and are consequently the ones most likely to be homogeneous. (Note that when I say “we are confident” I don’t mean anything subjective and wishy-washy; all this is done automatically according to formal criteria a bit too technical to get into here.)
Now that we have a selection of chapters that are assigned to respective families with high confidence, we use them as seeds for building a “model” that distinguishes between the two families. Very roughly speaking, we look for common words (ones not tied to any specific topic) that appear more in one family than in the other and we use formal methods (for those interested, we use SVM) to find just the right weight to give to each such word as an indicator of one family or the other. We now use this model to classify individual sentences as being in one family or the other.
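The flavor of this last step can be sketched in a few lines of Python. To be clear about what is swapped in: the paper uses an SVM to learn the word weights, whereas this sketch substitutes simple smoothed log-ratio weights so that it runs with no libraries. The function names, the min_count threshold and the toy word lists are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Optional


def train_word_weights(family_a: List[List[str]],
                       family_b: List[List[str]],
                       min_count: int = 2) -> Dict[str, float]:
    """Weight each reasonably common word by how strongly it favors family A over B.

    family_a and family_b are the core chapters of each family, given as
    lists of words. Positive weights point to A, negative weights to B.
    """
    count_a = Counter(w for chapter in family_a for w in chapter)
    count_b = Counter(w for chapter in family_b for w in chapter)
    total_a, total_b = sum(count_a.values()), sum(count_b.values())
    vocab = {w for w in count_a.keys() | count_b.keys()
             if count_a[w] + count_b[w] >= min_count}
    size = len(vocab)
    # Add-one smoothing keeps the logarithms finite for one-sided words.
    return {w: math.log((count_a[w] + 1) / (total_a + size))
               - math.log((count_b[w] + 1) / (total_b + size))
            for w in vocab}


def classify(sentence: List[str], weights: Dict[str, float]) -> Optional[str]:
    """Return 'A' or 'B', or None when no differentiating word appears."""
    score = sum(weights.get(w, 0.0) for w in sentence)
    if score > 0:
        return "A"
    if score < 0:
        return "B"
    return None


weights = train_word_weights(
    [["asher", "ki", "asher"], ["asher", "et"]],
    [["ki", "hinne", "hinne"], ["hinne", "et"]],
)
print(classify(["asher", "et"], weights))  # leans toward family A
print(classify(["ki"], weights))           # no differentiating word: None
```

A sentence containing only words that are equally common in both families gets no label at all, which is why some sentences remain unclassified by any method of this kind.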

Results
Wonderful, so we did all sorts of geeky hocus-pocus. Why should you believe that this works? Maybe the whole synonym idea is wrong because we ignore subtle differences in meaning between “synonyms”. Maybe the same author deliberately switches from one synonym to the other for literary reasons. Maybe we are biased because we believe something wicked and we subtly manipulated the method to obtain particular results.
These are legitimate concerns. That’s why we test the method on data for which we know the right answer to see if the method gives that right answer. In this case, our test works as follows. We take two books, each of which we can assume is written by a single distinct author, mix them up in some random fashion, and check if our method correctly unmixes them. In particular, we took as our main test set random mishmashes of Yirmiyahu and Yechezkel.
We found that the method works extremely well. About 17% of the psukim could not be classified (no differentiating words appeared in these psukim or their near neighbors). Of the approximately 2200 psukim that were classified into two families, all the Yirmiyahu psukim went into one family and all the Yechezkel psukim went into the other, with a total of 26 (1.2%) exceptions. We obtained similar results on a variety of other book pairs.
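Scoring an unmixing test of this kind takes one small bit of care: the method's two families have no names, so we have to try both ways of pairing families with books and keep the better one. The following Python sketch shows one way to do that; the function name and the toy labels are hypothetical, and it assumes exactly two books.

```python
from typing import List, Optional, Tuple


def unmixing_score(true_books: List[str],
                   predicted: List[Optional[int]]) -> Tuple[int, int, int]:
    """Return (unclassified, classified, exceptions) for a two-book test.

    true_books holds the real source of each verse; predicted holds the
    family (0 or 1) the method assigned, or None if it could not decide.
    """
    pairs = [(t, p) for t, p in zip(true_books, predicted) if p is not None]
    unclassified = len(true_books) - len(pairs)
    first, second = sorted(set(true_books))  # assumes exactly two books
    best = 0
    # The families are nameless, so try both book-to-family pairings.
    for mapping in ({first: 0, second: 1}, {first: 1, second: 0}):
        correct = sum(1 for t, p in pairs if mapping[t] == p)
        best = max(best, correct)
    return unclassified, len(pairs), len(pairs) - best


truth = ["Yirmiyahu", "Yirmiyahu", "Yechezkel", "Yechezkel", "Yirmiyahu", "Yechezkel"]
guess = [1, 1, 0, None, 0, 0]
print(unmixing_score(truth, guess))  # (1, 5, 1)
```

In these terms, the result reported above is 26 exceptions among roughly 2200 classified psukim, and 26 / 2200 is about 1.2%.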
So maybe we should have left well enough alone. But with a power tool like this in hand, how could you not want to see how it would split the chumash? Shoot me, but for me, like Rav Kahana hiding under his rebbe’s bed, Torah hee velilmod any tzarich (“it is Torah, and I need to learn it”). We did the experiment. I should hasten to mention, though, that the chumash experiment is only briefly mentioned in the published paper, which focuses on proving the efficacy of the method (it’s a computational linguistics paper, not a Bible paper).
Now, I should point out that until I got involved in this, I was a complete am haaretz in Bible Criticism, a perfectly agreeable state of affairs, as far as I was concerned. However, Idan Dershowitz immediately observed that our split was very similar to the split between what critics refer to as the Priestly (P) and non-Priestly portions of the Torah. Bear in mind that there are ongoing disagreements among the critics about precisely which psukim should be regarded as P and which not. We took two standard such splits, that of Driver and that of Friedman, and refer to the set of psukim for which they agree as “consensus” psukim. (They agree just over 90% of the time.)
Here’s the result. Our split of the Torah into two families corresponds with their split for about 90% of all consensus psukim.
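The bookkeeping behind a figure like this can be sketched as follows. This is a hypothetical Python illustration, not our actual evaluation script: the verse identifiers and labels are placeholders rather than real data, and the pairing of our nameless families with “P” and “non-P” is passed in by hand.

```python
from typing import Dict, Optional, Tuple


def consensus_agreement(split_a: Dict[str, str],
                        split_b: Dict[str, str],
                        ours: Dict[str, str],
                        family_to_label: Dict[str, str]) -> Tuple[Optional[float], int]:
    """Compare our families with two critics' splits on the verses where they agree.

    split_a and split_b map verse ids to 'P' or 'non-P'. ours maps verse ids
    to a family name, and family_to_label says which critical label each
    family is taken to correspond to. Returns (agreement rate, number of
    consensus verses); the rate is None if no consensus verse was classified.
    """
    # Consensus verses: both critics label the verse, and label it the same way.
    consensus = {v: label for v, label in split_a.items()
                 if split_b.get(v) == label}
    # Only verses our method actually classified can be scored.
    scored = [v for v in consensus if ours.get(v) in family_to_label]
    if not scored:
        return None, len(consensus)
    agree = sum(1 for v in scored if family_to_label[ours[v]] == consensus[v])
    return agree / len(scored), len(consensus)


# Placeholder data: four verses, two critics, one disagreement between them.
critic_1 = {"v1": "P", "v2": "non-P", "v3": "P", "v4": "P"}
critic_2 = {"v1": "P", "v2": "non-P", "v3": "non-P", "v4": "P"}
our_split = {"v1": "B", "v2": "B", "v3": "A", "v4": "A"}
rate, n_consensus = consensus_agreement(
    critic_1, critic_2, our_split, {"A": "P", "B": "non-P"})
print(n_consensus, rate)  # 3 consensus verses, 2 of them matched
```

Verses on which the two critics disagree never enter the calculation, so the percentage says nothing about them.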
Let me say a few words about the main areas of disagreement. To a significant extent, our split runs along lines of genre. One family is mostly – not completely – legal material and the other is mostly narrative. Since what the critics call the Priestly sections include pretty much all of Vayikra (which is mostly laws), as well as selected portions of Bereishis, Shemos and Bemidbar, their split also corresponds somewhat to the legal/narrative split. Most of the cases where our split is different than theirs involve narrative sections that they assign to P and our method assigns to the family that corresponds to non-P, for example, the first chapter of Bereishis. (The rest of the disagreements involve P sections that scholars now refer to as H and consider some sort of quasi-P, but I don’t want to get into all that, mostly because I’m still pretty clueless about it.)
Before you dismiss all this by saying that all we did was discover that stories don’t look like laws, let me point out there are plenty of narrative sections that the computerized analysis assigned to the P family (or, more precisely, to the nameless family that turns out to be very similar to what the critics call the P family). Two prominent examples are the story of Shimon and Levi in Shechem and the story of Pinchas and Zimri.
One more point: when we split the Torah into three or more families, our results do not coincide with those of the critics. In the case of three families, Devarim does seem to split off as its own family, as the critics claim, but there are a fair number of exceptions. And even with four or more families, no hint of the critics’ E/J split shows up at all.

Interpreting the Results
So does all this mean that we have proved that the Torah was written by at least two human authors, as the breathless reports claim? No.
First of all, as I noted above, our method does not determine the optimal number of families. That is, it does not make a claim regarding the number of authors. Rather, you decide in advance how many families you want and the method finds the optimal (or a near-optimal) split of the text into that number. If you ask it to split Moby Dick into two (or four or thirteen) parts, it will do so. Thus the fact that we split the Torah into two tells us exactly nothing about the actual number of authors.
Having said that, I want to temper any religious enthusiasm such a disclaimer might engender. First of all, with a few improvements to the method we could probably identify some optimal number of families for a given text. We simply haven’t done so. Second, the fact that – for the case of two families – the results of our method coincide (to some extent) with those of the critics would seem to suggest that the split the method suggests is not merely coincidental.
But, the deeper reason that our work is irrelevant to the question of divine authorship is simply that it does not – indeed, it could not – have a thing to say on that question. If you were to have some theory about what properties divine writing ought to have and close analysis revealed that a certain text probably did not have those properties, then you might have to change your prior belief about the divine provenance of that text. But does anyone really have some theory about what divine texts are supposed to look like? Several press reports about this work referenced the idea that “God could write in multiple voices”. I find that formulation a bit simplistic, but it captures the fact that any attempt to map from multiple writing styles to multiple authorship must be rooted in assumptions about human cognition and human performance that are simply not relevant to the question of divine action[8].
In short, our results seem to support some findings of higher Bible criticism regarding possible boundaries between distinct stylistic threads in the Torah. These results might have some relevance regarding literary analysis of the Torah. Taken on their own, however, they are not proof of multiple authorship. Furthermore, there is nothing in these results that should cause those of us committed to the traditional belief in divine authorship of the Torah to doubt that belief.

[3] M. Koppel, N. Akiva, I. Dershowitz and N. Dershowitz (2011), Unsupervised Decomposition of a Document Into Authorial Components, Proceedings of ACL, pp. 1356-1364.
[4] S. Argamon, M. Koppel, J. Pennebaker and J. Schler (2009), Automatically Profiling the Author of an Anonymous Text, Communications of the ACM, 52 (2): pp. 119-123 (virtual extension).
[5] M. Koppel, J. Schler and E. Bonchek-Dokow (2007), Measuring Differentiability: Unmasking Pseudonymous Authors, JMLR 8, July 2007, pp. 1261-1276.
[6] M. Koppel, D. Mughaz and N. Akiva (2006), New Methods for Attribution of Rabbinic Literature, Hebrew Linguistics: A Journal for Hebrew Descriptive, Computational and Applied Linguistics, 57, pp. 5-18.
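[7] מ. קופל, זיהוי מחברים בשיטות ממוחשבות: “גניזת חרסון”, ישורון כג (אלול ה’תש”ע), תקנט-תקסו. [M. Koppel, Identifying Authors by Computerized Methods: “Genizat Harson”, Yeshurun 23 (Elul 5770), pp. 559-566.]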
[8] I realize that this argument comes close to asserting that the claim of divine authorship is unfalsifiable, which for some might cast doubt on the meaningfulness of that claim. A proper response to that concern would involve a discussion of the nature and content of religious belief, a discussion that is well beyond the scope of this brief peroration.