Computer Confirmation of “P” – A Biblicist’s Perspective

By Joshua Berman

Joshua Berman is a lecturer in Tanach at Bar-Ilan University and an associate fellow at the Shalem Center. His most recent book, Created Equal: How the Bible Broke with Ancient Political Thought (Oxford, 2008) was a National Jewish Book Award Finalist in Scholarship in 2008.

Prof. Moshe Koppel, a revered friend and senior colleague, has produced what is for me, as a biblicist, a fascinating and exciting study.[1] His demonstrated capacity to tease apart books like Jeremiah and Ezekiel when their verses are randomly shuffled will find good application, I am sure, within the field of biblical studies. At the conclusion of that study, Koppel and his team report that they applied their methodology to the Torah, and that their split of its verses “corresponds to the expert consensus regarding P and non-P for over 90% of the verses in the Pentateuch for which such consensus exists.” The claim has generated much theological debate. In this post, however, I wish to respond to these findings solely within the terms and assumptions found within the discipline of biblical studies itself. The findings produced by Prof. Koppel’s team constitute an academic, scientific claim and deserve to be addressed in kind. I wish to extend my heartfelt appreciation to Prof. Koppel for sharing his data with me, and for elucidating many points for me in subsequent exchanges between us.

For two primary reasons, I conclude that the findings are of little consequence for Pentateuch studies, and to the extent that they are of consequence they suggest a different significance than that reported in his initial article.

The Koppel team’s compelling work, teasing apart books such as Jeremiah and Ezekiel, does not, in its present form, provide sufficient basis for an analysis of the Torah. Here’s why: Jeremiah and Ezekiel are two whole, integral works. Buoyed by the success of separating two books, Koppel’s team split the verses of the Torah into two “authorial categories,” one of which seems to have affinity with the set of verses source-critical scholars refer to as the priestly source of the Pentateuch. The other authorial category, however, “Non-P,” is not a single, integral, distinct book. Source critics (again, I am constructing an argument here that would be valid within the field of biblical studies) are unanimous that the non-priestly material in the Pentateuch is an amalgam of many other sources. It is more difficult to separate a single integral book from an amalgam of many different works than it is to separate one integral work from another. Put differently, an alternative experiment needs to be carried out utilizing other biblical books before these data about the Torah can be statistically validated. Prof. Koppel and team would need to take, say, all of Jeremiah, shuffle it with smatterings of verses from several other books, and successfully retrieve the verses from Jeremiah. I am clueless about computational linguistics, and so I shared this intuition with Prof. Koppel. He confirmed for me that the experiments carried out separating whole integral books can only offer a “weak-indicator” for the case of the texts of the Torah.

But let’s give that weak-indicator the benefit of the doubt, or let’s assume that those experiments have been successfully executed, extracting Jeremiah, say, from an amalgam of other texts. What do the data in hand about the Torah suggest? I would propose that the significance is rather different than that which has been reported. To appreciate the data, however, requires a brief whirlwind tour of the history of source criticism of the Pentateuch, particularly of developments of the last twenty years.

Classical source-criticism does not merely claim that the first four books of the Torah may be divided into three sources, J, E and P. It claims that each of these three sources was originally a complete history of Israel, from Creation until the time of Moses. There was a time, in the early twentieth century, when the comprehensive theories of the great minds were considered to be the final word on the subject. Einstein, it was believed, had given us the final word on physics, Freud on psychology. In the world of biblical studies, it was Wellhausen who was believed to have given us the final word on the origins of the Pentateuch with his source-critical account.

But in the last generation many scholars have walked away from the table of source-criticism. For some, source-criticism proved too unruly; scholars seemed never to agree on the criteria necessary to split the putative sources. Others found source-criticism disappointing on the level of content: the sources did not seem to produce clear, consistent and differentiated theological agendas. Others found that many of the so-called indicators of multiple authorship could better be explained by recognizing the literary techniques at work in these texts as coherent wholes. For others (and this is significant), the inconsistencies in the text are best understood as individuated issues. There were indeed two flood stories in ancient Israel, they would say, but why assign these to larger strands or sources?[2]

Koppel and his team use the work of Richard Elliott Friedman, Who Wrote the Bible?, as a baseline to identify the Priestly source. They chose that work because Friedman, like Driver before him, schematically assigns each and every verse of the Torah to one of the four classical sources. Friedman’s work, written in 1987, is still quoted in some circles. But it cannot be said to be the universally accepted gospel on the Torah within the field of biblical studies. It is telling that since Friedman no scholar of any note has deigned to offer (let alone defend) a schematic accounting of all the verses of P, much less the whole Pentateuch. This speaks volumes to the methodological malaise currently afflicting Pentateuch studies in the academy. Unable to find a common methodological language, biblicists themselves now inhabit a post-Babel world.

Does this mean that there are no source-scholars left in the 21st century? Does it mean that there is no consensus about anything pertaining to source-criticism within the academy? Hardly. Concerning biblical law, such consensus exists. Nearly all biblicists, for example, will happily identify legal sections pertaining to the cult as deriving from a source called P. Where there is far less consensus is in the area of narrative. It is true that those who engage in source-criticism of Pentateuchal narrative find a common language and accord on many passages in identifying them as P narratives. But they are a dwindling breed. For the vast majority of scholars working today, the determination that the laws of Leviticus should be ascribed to a source called P is an infinitely easier call than is the ascription to P of the account of Shimon and Levi in Shechem. It’s not that they’re convinced that the story should be ascribed to another source. Nor would they rule out the possibility that perhaps that account was penned by the author of a P source. The hesitation comes from misgivings about the entire enterprise of viewing Torah narratives as derived from distinct, continuous sources that each detailed the history of Israel.

Now let’s go to the data. In many instances, verses that source-critics assign to a source other than P, the Koppel team’s program, likewise, assigns to the category “Non-P.” At the same time, it is important to note that the program also fails to positively confirm what source critics see as priestly material. Here we come to the main point: in the genre of law, the program does a very good job of correlating with the sections that critics see as priestly law. These, not surprisingly, tend to be cultic law. In the genre of narrative, however, a very different story emerges. Of 253 verses in Genesis that source-critics identify as “priestly,” the program could only confirm 38 of them (15%). In Exodus the discrepancy is even greater, and the program confirms 5 out of 80 (6%) priestly narrative verses. In Numbers, the correlation is much higher, but overall, the program’s results correlate with the so-called priestly narrative less than half the time. Does this suggest a “flaw” in the Koppel team’s program? Not at all. Rather, it confirms the suspicions of a very large number of biblicists: the notion of priestly narrative in the Pentateuch is on much shakier ground than is the notion of priestly law.

Finally, moving away from the Koppel team’s stimulating findings, I’d like to address an issue that hovers over the entire discussion, and that is the definition of an “author” and the highly anachronistic use we make of that term, especially when analyzing ancient sources. It’s a good thing that Rashi wrote in the eleventh century, because had he come out with his commentary to the Torah today, he would likely be assailed in some quarters as a plagiarizer. In many instances, Rashi attributes the midrashim he cites. But in many other instances he simply borrows entire sentences from chaza”l and presents them in his own voice. Only the discerning disciple will note that Rashi has, in fact, lifted entire sentences from an earlier source, without attribution. Of course, Rabban Shel Yisrael was no plagiarizer. Rather, Rashi, like many medieval and ancient writers before him, was writing in a world where old was good. The mark of a good writer was someone who appropriately saw himself as a member of a long and venerated tradition. Good writing was precisely writing that imitated earlier style (nay, sometimes a variety of styles) and in many cases directly lifted and borrowed whole passages.[3]

The same was true in ancient contexts as well. You’d think that to ask, “Who wrote the Code of Hammurabi?” is akin to asking, “Who’s buried in Grant’s Tomb?” But it isn’t. The issue is not simply that it was Hammurabi’s scribes who actually penned that classic, and not the Babylonian king himself. It’s that we now know that those scribes invoked a number of sanctioned styles in their composition, and lifted phrases from prior works.[4] The composition is unitary because a single agent, Hammurabi and his scribes, published this composition as something to be read as such. Ancient readers may, no doubt, have noticed the unevenness of its style. But they would have seen that as the mark of a great work, produced within a venerable tradition, in which writing routinely draws from a variety of canonical and classical works. Its status derived not from the fact that it was the original composition of a single author, but that it was produced by a single authority. Readers of the Code would never have even thought to ask whether every sentence originated with Hammurabi himself. But the fact that it was under his authority that multiple styles were brought together is what made it a great work. Many other classics of the ancient world, such as the Epic of Gilgamesh, likewise exhibit a plurality of styles. This is how many of the great works of the ancient world looked.[5]

In defense against the claims of higher biblical criticism, the faithful often counter that divine writing is different than human writing, for the Almighty speaks in several styles and voices. This dichotomy, whereby God alone can speak in many styles, but humans must exhibit consistency, is a false one. It is human writing, too, in the premodern, and especially ancient world, that expresses itself in a variety of styles. In his commentary to the beginning of parashat Pekudei, the Ralbag questions why the Torah repeats the parshiyot of the Mishkan, in a manner that seems overly repetitious. He concludes that what appears to us as repetitious and hence unseemly may not have been the case in an earlier age:

ואפשר שנאמר שכבר היה מנהג האנשים ההם בזמן מתן תורה שיהיו סיפוריהם בזה האופן. והנביא אמנם ידבר לפי מנהג.

We may say that it was the norm at the time of Matan Torah that compositions were crafted in this fashion, and that the prophet speaks in the style of the times. It may well be that divine writing expresses itself in multiple styles. But if the Almighty has indeed done so in the Torah, then He has done so לפי המנהג – according to literary tastes that may seem grating to modern notions of authorship, but were entirely in keeping with ancient sensitivities.

[1] See Moshe Koppel, Navot Akiva, Idan Dershowitz and Nachum Dershowitz, “Unsupervised Decomposition of a Document into Authorial Components” (link).

[2] On the waning interest in Pentateuchal source criticism within biblical studies, see Rolf Rendtorff, “What Happened to the ‘Yahwist’?: Reflections after Thirty Years” (http://www.sbl-site.org/publications/article.aspx?articleId=553) and David Clines, “Response to Rolf Rendtorff’s ‘What Happened to the Yahwist? Reflections after Thirty Years’” (link).

[3] See A. J. Minnis, Medieval Theory of Authorship: Scholastic Literary Attitudes in the Later Middle Ages (London: Scolar Press, 1984).

[4] See Victor Avigdor Hurowitz, Inu Anum sirum: Literary Structures in the Non-juridical Sections of Codex Hammurabi (Philadelphia: University Museum, 1994).

[5] See David M. Carr, Writing on the Tablet of the Heart: Origins of Scripture and Literature (New York: Oxford University Press, 2005); Karel van der Toorn, Scribal Culture and the Making of the Hebrew Bible (Cambridge: Harvard University Press, 2007).

On the Plagiarism of a Tach-ve-Tat Chronicle

During this period, between the 17th of Tamuz and the 9th of Av, there is an increased focus upon various historical calamities that befell the Jewish people. Jewish history is unfortunately replete with such examples. Some instances have spawned specific days of commemoration while others have produced whole bodies of literature. And, while the literature surrounding these events is diverse, covering liturgy, poetry, and history, we focus on one type: the chronicle. Additionally, our focus is the Chmielnicki Massacres, or Gezerot Tach ve-Tat. The Hebrew refers to the dates, 1648-49, when most of the killings took place. While these events took place hundreds of years ago, their effects, including the total number of Jews killed, are still being debated by scholars. (See Jits van Straten, “Did Shmu’el Ben Nathan and Nathan Hanover Exaggerate: Estimates of Jewish Casualties in the Ukraine During the Cossack Revolt in 1648,” Zutot 6:1 (2009), 75-82, calling into question the lower estimates of Shaul Stampfer, “What Actually Happened to the Jews of Ukraine in 1648?” Jewish History 17:2 (May 2003), 207-27.)

The most well-known chronicle describing the events is that of R. Nathan of Hanover, Yaven Metzulah. There is an English translation of Hanover’s work, Abyss of Despair, translated by R. Abraham J. Mesch. The translation includes a “traditional drawing of Maharsha.”

While it is not noted, this illustration, which shows Maharsha with long flowing hair, first appears in the Vienna, 1814 edition of the Maharsha’s commentary (vol. I, vol. II). While Mesch indicates this is the “traditional drawing,” we know of no earlier instance than the Vienna edition. This was not the only Vienna edition to include a questionable portrait. The 1804 Vienna edition of R. Yitzhak Alfasi’s Halakhot also includes a portrait that is claimed to be R. Alfasi. Again, we know of no earlier evidence that would confirm such a rendering.

A collection of these chronicles was published most recently as Gezerat Tach ve-Tat, Jerusalem, 2004. Additionally, Joel Raba, Between Remembrance and Denial, Columbia Univ. Press, 1995, discusses these chronicles, as does the collection of articles that appears in the journal Jewish History 17:2 (May 2003).

We turn our attention, however, to a lesser-known work from this period, Tzok ha-Itim. Tzok was actually the first chronicle of the 1648-49 events to be published. It first appeared in Krakow, 1650 (link). Indeed, some have argued that Hanover relied heavily on Tzok in compiling Yaven Metzulah (first published in 1653).

Tzok was republished in Constantinople in 1652. This edition is exceedingly rare. According to Ya’ari, there is but one complete copy extant. (See Ya’ari, Kiryat Sefer 16 (1939-40).) This edition was published by R. Shmuel ben R. Shimson, who was on his way to Israel after fleeing the massacres. At the end of the book he includes a dirge (kinnah) about the events. He also penned his own introduction, which describes his own suffering. He says that “I am the only remaining survivor in my family as the rest were killed sanctifying god’s name . . . although I was spared . . . my wife and children I buried, I lost all of my possessions . . . .” He explains that “all I wanted was to dwell in the bet midrash and therefore I decided to travel to Jerusalem” and that while he was on his journey he came across Tzok and decided to reprint it in Constantinople “so that what has occurred shall not be forgotten.” (Ya’ari, Mechkerei Sefer, Jerusalem, 1958, p. 16, reprints the entire introduction; he also provides other accounts of people who, on their way to Israel, issued works related to the 1648-49 massacres.)

Tzok was then reissued in Venice in 1656.

The first two editions list R. Meir ben Shmuel of Szczebrzeszyn as the author. The 1656 edition, however, lists a completely different author, R. Joshua ben David of Lemberg. It is not only on the title page that a different author is listed. The work itself is not composed as traditional narrative. Instead, it is written in verse. The first verses in all the editions spell out the author’s name in an acrostic. Thus, the 1650 and 1652 editions have an acrostic that spells out R. Meir of Szczebrzeszyn’s name while the 1656 edition acrostic spells out R. Joshua’s name. In some instances words are added to create the “new” acrostic, while in other instances, the highlighted letters are changed.

Here is the introduction to the Constantinople edition:

And here is the introduction to the Venice edition:

As an aside, it is worth noting that this is not the only time a plagiarizer has been forced to change the acrostic to hide his stolen goods. (See Kitvei Pinchas Turburg, ed. A. R. Malachi, 24-36 for additional examples of acrostic changes, and see this earlier post discussing similar changes to hide the identity of the true author, and see this post where the plagiarizer was caught in the act and forced to admit his guilt and apologize.) Additionally, in at least one instance the acrostic was able to demonstrate authorship. In the Siddur Bet Ya’akov (although attributed to R. Y. Emden, this siddur contains numerous additions as compared to R. Emden’s actual siddur called Ammudei Shamayim – Sha’arei Shamayim; this is one of them) the Belzer Rebbe asserts that the author of the zemer Yom Shabbat Kodesh Hu had his song stolen. He came across the plagiarizer and challenged him to prove authorship. Specifically, the real author showed that his name, Yonatan, could be seen in the acrostic, and with this he vanquished the thief. R. Emden uses this story to explain the meaning behind the final verse, which loosely translates as “all the talk [about authorship] should [now] end now that I have enlarged the song [and demonstrated my authorship] . . . and that no one should ever steal from me as this song is my property.”

An example where the acrostic actually has the opposite effect, obscuring the original author, is also a zemer, Yom Zeh le-Yisrael. At times, this song can be confusing depending upon which bencher one is using. This is so because some benchers have a shorter version while others have a longer one (see here for example). Some argue that the two versions are indicative of two authors: the original author, whose acrostic spelled out only Yitzhak (and then lamed vav), and a later hand who added all the other verses, so that the acrostic now spells Yitzhak Luria Hazak. (Regarding this zemer see Naftali ben Menachem, Zemirot shel Shabbat, Israel, 1949, 144-45; I. Davidson, Thesaurus of Mediaeval Hebrew Poetry, Ktav, 1970, vol. II, 348.)

Returning to Tzok, because the acrostic lends support to either author, some didn’t know who the “real” author was. In the 1890s, a number of these chronicles regarding bad events in Jewish history were collected and published under the title Le-Korot ha-Gezerot ‘al Yisrael by C. Gorlin. Included is Tzok. But, instead of a traditional introduction, he prefaces Tzok with a section titled “Who is the real author?” Gorlin argues that the real author is indeed R. Meir and not R. Joshua. This is not the first time that there has been some confusion regarding who is the real author and who is the thief; for another example see here, and for an example of modern-day plagiarism see here.

With regard to the Constantinople edition, Ya’ari demonstrates that this edition is better than the first, in that many of the typos and the like have been corrected. Unfortunately, perhaps due to its rarity, the 2004 edition of Tzok relies upon the 1650 edition and not the better 1652. Additionally, the 1652 edition is one of the works published by a convert. Of course, this is probably what first got Ya’ari interested as he provides a bibliography of works published by converts.

It should be noted that Tzok was rather popular, even if it no longer is. When R. David ha-Levi Segal, author of the Turei Zehav commentary on the Shulhan Arukh, sent a delegation to the false messiah, Shabbatai Tzvi, the delegates recorded that, when they entered, Shabbatai Tzvi had a copy of Tzok on the table. (See G. Scholem, Sabbatai Sevi, Princeton Univ. Press, 1976, p. 623 quoting Leib Ozer, Sippurei Ma’ashi Sabbati Tzvi, p. 81 and Sefer Tziz Nobel Tzvi, ed. I. Tishby, pp. 77-79.)

Finally, we note that the most recent edition, the 2004 op. cit., uses the Krakow first edition, even though Ya’ari has already shown that the rare Constantinople edition corrected numerous errors that appear in the Krakow edition.

Attribution and Misattribution: On Computational Linguistics, Heresy and Journalism

by Moshe Koppel

Prof. Moshe Koppel is on the faculty of the Computer Science Department at Bar-Ilan University. He has published extensively on authorship attribution, as well as on a diverse array of topics of Jewish and scientific interest.

A few days ago, newspaper readers from New Jersey to New Zealand read about new computer software that “sheds light on the authorship of the Bible”[1]. By the time the news circled back to Israel, farteitcht and farbessert, readers of Haaretz were (rather gleefully) informed that the head of the project had announced that it had been proved that the Torah was written by multiple human authors[2], just as the Bible critics had been saying all along.

I’m always skeptical about that kind of grandiose claim and this is no exception, even though the person who allegedly made the claim in this particular case happens to be me. The news reports in question refer to a recently published paper[3] in computational linguistics involving decomposition of a document into authorial components. A brief reference to application of the method to the Torah (Pentateuch) is responsible for most of the noise.

In what follows, I’ll briefly provide some background about authorship attribution research, sketch the method used in the paper, outline the main results and say a few words about what they mean. My main purpose is to explain what has actually been proved and, more crucially in this case, what has not been proved.

Authorship Attribution

One of my areas of research for over a decade has been authorship attribution, the use of automated statistical methods to identify or profile the author of a given text. For example, we can determine, with varying degrees of accuracy, the age, gender and native language of the author of a text[4]. Under certain conditions, we can determine, with varying degrees of certainty, if two texts were written by the same person[5]. Some of this work has been applied to topics of particular interest to students of Jewish texts, such as strong evidence that the collection of responsa Torah Lishmah was written by Ben Ish Chai[6] (although he often quoted the work as if it were written by someone else) and that all of the letters in Genizat Harson are forgeries[7].

Whenever I have lectured on this topic, the first question has been: have you ever analyzed the Bible? The honest truth is that I never really understood the question and I suspect that in most cases the questioner didn’t have any very well-formed question in mind, beyond the vague thought that the Bible is of mysterious provenance and ought to be amenable to some sort of statistical analysis. I would always mumble something about the question being poorly defined, Bible books being too short to permit reliable statistical analysis, etc. But, while all those excuses were quite true, I also had a vague thought of my own, which was that whatever well-formed research question I could come up with regarding Tanach, it would probably land me in hot water.

One research question that I have been working on with my graduate student, Navot Akiva, involves decomposition of a document into distinct stylistic components. For example, if a document was written by multiple authors, each of whom presumably writes in some distinct style, we’d like to be able to identify the parts written by each author. (Bear in mind this is what is known in the jargon as an unsupervised problem: we don’t get known examples of each author’s writing to analyze. All we have is the composite text itself, from which we need to tease apart distinctive looking chunks of text.) The object is straightforward: given a text, split it up into families of chunks in the best possible way, where by “best” we mean that the chunks that are assigned to the same family are as similar to each other as possible.

Even I could see that this could have some bearing on Tanach. So when Prof. Nachum Dershowitz, a colleague with whom I share a number of research interests, introduced me to his son, Idan, a graduate student in the Tanach program at Hebrew University, we agreed to consider how to apply this work to Tanach (sort of fudging the question of whether this meant Torah or Nach). It happens that, apart from being the most studied and revered set of books ever written, Tanach offers another advantage as an object of linguistic analysis: precisely because it has been the subject of so much study, there are many available automated tools that we could exploit in our research.

The Method

Here’s how our computerized method works. Divide a text into chunks in some reasonable way. These chunks might be chapters or some fixed number of sentences or whatever; the details aren’t critical and need not concern us at this stage. I’m going to call these chunks “chapters” (only because it is a less technical sounding word), but bear in mind that we are not assuming that a chapter is stylistically homogeneous; that is, the split between authors might take place in the middle of a chapter.
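To make the chunking step concrete, here is a minimal sketch in Python. It assumes the text is already available as a list of verses and simply cuts it into consecutive fixed-size "chapters"; the function name `chunk_verses` and the chunk size are illustrative choices, not details taken from the paper.

```python
def chunk_verses(verses, size):
    """Split a list of verses into consecutive chunks of at most `size` verses."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [verses[i:i + size] for i in range(0, len(verses), size)]


# Placeholder verses; any list of strings (or token lists) would do.
verses = [f"verse {n}" for n in range(1, 8)]
chunks = chunk_verses(verses, 3)
```

The last chunk may be shorter than the others, which is harmless here since nothing later depends on all chunks having equal length.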

Our object is to split our collection of chapters into families of stylistically similar chapters. (The chapters in a family need not be contiguous.) All the chapters that look a certain way, please step to the left; all others, please step to the right.

As a first step, for any pair of chapters, we’re going to have to measure the similarity between them. The trick is to measure this similarity in a way that captures style rather than content.

The way we do it is as follows: we begin by generating a list of synonym sets. For example, for the case of Tanach, we would consider synonym sets such as betoch, bekerev; begged, simla; sar, nasi; makel, mateh, shevet; and so on. There are about 200 such sets of Biblical synonyms. We generate this list automatically by identifying Hebrew roots that are translated by the same English root in the KJV. Note that not every occurrence of, for example, shevet (which can mean either “staff” or “tribe”) is a synonym for makel (which is always “staff”). We use online concordances to disambiguate, that is, to determine the intended sense of a word in a particular context. (In this respect, Tanach is especially convenient to work with.)

For every chapter and every such set of synonyms, we record which synonym (if any) that chapter uses. The similarity of a pair of chapters reflects the extent to which they make similar choices from among synonym sets. The idea is that if one chapter uses – for example – betoch, sar and mateh and the other uses bekerev, nasi and makel, the two chapters have low similarity. If a chapter doesn’t use any of the synonyms in a particular synonym set, that set plays no role in measuring the similarity between that chapter and any other chapter.
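A rough sketch of this similarity measure might look like the following. It is an assumption-laden illustration, not the paper's exact formula: it assumes each chapter is a list of transliterated words that have already been disambiguated, it uses only the four synonym sets quoted above, and it scores a pair of chapters by the fraction of jointly used synonym sets on which their choices overlap. The names `SYNONYM_SETS`, `synonym_choices` and `similarity` are hypothetical.

```python
# Only the four example sets from the text; the real list has about 200.
SYNONYM_SETS = [
    {"betoch", "bekerev"},
    {"begged", "simla"},
    {"sar", "nasi"},
    {"makel", "mateh", "shevet"},
]


def synonym_choices(chapter_words, synonym_sets=SYNONYM_SETS):
    """For each synonym set, record which members the chapter uses (possibly none)."""
    words = set(chapter_words)
    return [words & syn_set for syn_set in synonym_sets]


def similarity(chapter_a, chapter_b, synonym_sets=SYNONYM_SETS):
    """Fraction of jointly used synonym sets on which the two chapters overlap.

    A set that either chapter does not use at all plays no role.
    Returns None when the chapters share no synonym set.
    """
    choices_a = synonym_choices(chapter_a, synonym_sets)
    choices_b = synonym_choices(chapter_b, synonym_sets)
    compared = agreed = 0
    for used_a, used_b in zip(choices_a, choices_b):
        if not used_a or not used_b:
            continue  # this set is silent for at least one chapter
        compared += 1
        if used_a & used_b:
            agreed += 1
    return agreed / compared if compared else None


ch1 = ["betoch", "sar", "mateh"]
ch2 = ["bekerev", "nasi", "makel"]
ch3 = ["betoch", "sar", "lechem"]
```

With these toy chapters, `ch1` and `ch2` make opposite choices in every shared set, while `ch1` and `ch3` agree wherever both use a set, and the third set (which `ch3` never uses) is ignored.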

Once we know the similarity between every pair of chapters, we use formal methods to create optimal families. Ideally, we want all the chapters in the same family to be very similar to each other and to be very different from the chapters in other families. In fact, such clean divisions are unusual, but the formal methods will generally find a near-optimal clustering into families. (What we call families are called “clusters” in the jargon, and the process of finding them is called “clustering”. The particular clustering method we used is a spectral approximation method called n-cut.)

A key question you should ask at this point is: how many families will we get? You might imagine that the clustering method will somehow figure out the right number of families. Indeed, there are clustering methods that can do that. But – note this carefully – the number of families we obtain is not determined by the clustering method we use. Rather it is given by us as an input. That is, we decide in advance how many families we want to get and the method is forced to give us exactly what we asked for. This is a crucial point and we’ll come back to it when we get to the meaning of all these results below.
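The point that the number of families is an input can be illustrated with a toy version of the clustering step. The sketch below is not the spectral n-cut approximation mentioned above; it assumes a small symmetric similarity matrix and simply tries every split into exactly two families, keeping the one with the lowest normalized-cut score. That brute-force search is only feasible for a handful of chapters, and the names `ncut_value` and `best_two_families` are hypothetical.

```python
from itertools import combinations


def ncut_value(sim, group_a, group_b):
    """Normalized-cut score of a two-way split of a symmetric similarity matrix."""
    n = len(sim)
    cut = sum(sim[i][j] for i in group_a for j in group_b)
    assoc_a = sum(sim[i][j] for i in group_a for j in range(n) if j != i)
    assoc_b = sum(sim[i][j] for i in group_b for j in range(n) if j != i)
    if assoc_a == 0 or assoc_b == 0:
        return float("inf")
    return cut / assoc_a + cut / assoc_b


def best_two_families(sim):
    """Try every split into exactly two non-empty families; keep the lowest score."""
    n = len(sim)
    best = None
    for size in range(1, n // 2 + 1):
        for group_a in combinations(range(n), size):
            group_b = tuple(i for i in range(n) if i not in group_a)
            score = ncut_value(sim, group_a, group_b)
            if best is None or score < best[0]:
                best = (score, group_a, group_b)
    return best[1], best[2]


# Toy similarities: chapters 0 and 1 resemble each other, as do 2 and 3.
sim = [
    [0.0, 0.9, 0.1, 0.2],
    [0.9, 0.0, 0.2, 0.1],
    [0.1, 0.2, 0.0, 0.8],
    [0.2, 0.1, 0.8, 0.0],
]
family_one, family_two = best_two_families(sim)
```

Note that the function always returns two families, however similar all the chapters may be to one another; the count is fixed by the caller, not discovered by the procedure.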

In any case, at this stage, we have a tentative division of chapters into however many families we asked for. (For simplicity, let’s assume that we have split the chapters into exactly two families.) This is not the final result, for the simple reason that we have no guarantee that the chapters themselves are homogeneous. The next step is to identify those chapters that are at the core of each family; these are the chapters we are most confident we have assigned correctly and are consequently the ones most likely to be homogeneous. (Note that when I say “we are confident” I don’t mean anything subjective and wishy-washy; all this is done automatically according to formal criteria a bit too technical to get into here.)

Now that we have a selection of chapters that are assigned to respective families with high confidence, we use them as seeds for building a “model” that distinguishes between the two families. Very roughly speaking, we look for common words (ones not tied to any specific topic) that appear more in one family than in the other and we use formal methods (for those interested, we use SVM) to find just the right weight to give to each such word as an indicator of one family or the other. We now use this model to classify individual sentences as being in one family or the other.
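As a very rough illustration of the modelling step, the sketch below learns one weight per common word from two sets of seed chapters and then scores individual sentences. It deliberately swaps in a simpler technique, smoothed log-odds weights, in place of the SVM used in the paper, and the vocabulary, seed data and function names are all invented for the example.

```python
import math
from collections import Counter


def train_word_weights(family_a, family_b, vocabulary):
    """Weight each word by smoothed log-odds: positive leans A, negative leans B.

    A simple stand-in for the SVM weights described in the text.
    """
    counts_a = Counter(w for chapter in family_a for w in chapter if w in vocabulary)
    counts_b = Counter(w for chapter in family_b for w in chapter if w in vocabulary)
    total_a = sum(counts_a.values()) + len(vocabulary)  # add-one smoothing
    total_b = sum(counts_b.values()) + len(vocabulary)
    return {
        w: math.log((counts_a[w] + 1) / total_a) - math.log((counts_b[w] + 1) / total_b)
        for w in vocabulary
    }


def classify(sentence, weights):
    """Return 'A' or 'B', or None when no differentiating word tips the score."""
    score = sum(weights.get(word, 0.0) for word in sentence)
    if score > 0:
        return "A"
    if score < 0:
        return "B"
    return None


# Hypothetical seed chapters, reduced to a few common (topic-neutral) words.
seeds_a = [["asher", "ki", "asher"], ["asher", "el"]]
seeds_b = [["ki", "al", "ki"], ["al", "ki"]]
weights = train_word_weights(seeds_a, seeds_b, {"asher", "ki", "el", "al"})
```

A sentence containing none of the weighted words gets a score of zero and is left unclassified, which is one simple way such a scheme can end up with sentences it declines to assign.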

Results

Wonderful, so we did all sorts of geeky hocus-pocus. Why should you believe that this works? Maybe the whole synonym idea is wrong because we ignore subtle differences in meaning between “synonyms”. Maybe the same author deliberately switches from one synonym to the other for literary reasons. Maybe we are biased because we believe something wicked and we subtly manipulated the method to obtain particular results.

These are legitimate concerns. That’s why we test the method on data for which we know the right answer to see if the method gives that right answer. In this case, our test works as follows. We take two books, each of which we can assume is written by a single distinct author, mix them up in some random fashion, and check if our method correctly unmixes them. In particular, we took as our main test set random mishmashes of Yirmiyahu and Yechezkel.

We found that the method works extremely well. About 17% of the psukim could not be classified (no differentiating words appeared in these psukim or their near neighbors). Of the approximately 2200 psukim that were classified into two families, all the Yirmiyahu psukim went into one family and all the Yechezkel psukim went into the other, with a total of 26 (1.2%) exceptions. We obtained similar results on a variety of other book pairs.
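The bookkeeping behind numbers like these can be sketched as follows. This is an illustrative reconstruction in Python, not the evaluation script from the paper: the function name and the tiny data set are made up. The one subtlety is that the method's families are nameless, so the sketch tries every way of matching families to books and keeps the match with the fewest errors. The last line just checks the arithmetic of the figure quoted above (26 exceptions out of roughly 2200 classified psukim).

```python
from itertools import permutations


def evaluate_unmixing(true_books, predicted_families):
    """Score an unmixing experiment.

    true_books: the real source of each verse, e.g. "Jer" or "Ezek".
    predicted_families: the family assigned to each verse, or None if the
    verse was left unclassified.
    Returns (fraction unclassified, error rate among classified verses).
    """
    classified = [(t, p) for t, p in zip(true_books, predicted_families)
                  if p is not None]
    unclassified = 1 - len(classified) / len(true_books)

    books = sorted(set(true_books))
    families = sorted({p for _, p in classified})

    # Families carry no names, so try each family-to-book matching
    # and keep the one that yields the fewest errors.
    best_errors = len(classified)
    for perm in permutations(books, len(families)):
        mapping = dict(zip(families, perm))
        errors = sum(1 for t, p in classified if mapping[p] != t)
        best_errors = min(best_errors, errors)
    return unclassified, best_errors / len(classified)


# Made-up miniature example: six verses, one unclassified, one misassigned.
true_books = ["Jer", "Jer", "Jer", "Ezek", "Ezek", "Ezek"]
predicted = [0, 0, None, 1, 1, 0]
unclassified_rate, error_rate = evaluate_unmixing(true_books, predicted)
print(unclassified_rate, error_rate)

# Arithmetic check of the reported figure: 26 exceptions out of ~2200.
reported_error_percent = round(26 / 2200 * 100, 1)
```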

So maybe we should have left well enough alone. But with a power tool like this in hand, how could you not want to see how it would split the chumash? Shoot me, but for me, like Rav Kahana hiding under his rebbe’s bed, Torah hee velilmod any tzarich. We did the experiment. I should hasten to mention, though, that the chumash experiment is only briefly mentioned in the published paper, which focuses on proving the efficacy of the method (it’s a computational linguistics paper, not a Bible paper).

Now, I should point out that until I got involved in this, I was a complete am haaretz in Bible Criticism, a perfectly agreeable state of affairs, as far as I was concerned. However, Idan Dershowitz immediately observed that our split was very similar to the split between what critics refer to as the Priestly (P) and non-Priestly portions of the Torah. Bear in mind that there are ongoing disagreements among the critics about precisely which psukim should be regarded as P and which not. We took two standard such splits, that of Driver and that of Friedman, and refer to the set of psukim for which they agree as “consensus” psukim. (They agree just over 90% of the time.)

Here’s the result. Our split of the Torah into two families corresponds with their split for about 90% of all consensus psukim.
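To spell out what "consensus" and "corresponds" mean here, the comparison can be sketched in a few lines of Python. Everything in the example is hypothetical: the verse identifiers and labels are placeholders, not real assignments by Driver, Friedman or our method, and the sketch assumes the method's two nameless families have already been matched up with "P" and "non-P".

```python
def consensus_agreement(expert_a, expert_b, method):
    """Compare a computed split against the verses two experts agree on.

    Each argument maps a verse id to a label ("P" or "non-P").
    Returns (fraction of verses on which the experts agree,
             fraction of those consensus verses where the method agrees).
    """
    # Consensus verses: both experts give the same label.
    consensus = {v: label for v, label in expert_a.items()
                 if expert_b.get(v) == label}
    # Only score consensus verses that the method actually classified.
    scored = [v for v in consensus if v in method]
    agree = sum(1 for v in scored if method[v] == consensus[v])
    return len(consensus) / len(expert_a), agree / len(scored)


# Placeholder data, invented purely to exercise the function.
expert_a = {"v1": "P", "v2": "P", "v3": "non-P", "v4": "non-P", "v5": "P"}
expert_b = {"v1": "P", "v2": "non-P", "v3": "non-P", "v4": "non-P", "v5": "P"}
method = {"v1": "P", "v2": "P", "v3": "non-P", "v4": "P", "v5": "P"}

expert_overlap, method_agreement = consensus_agreement(expert_a, expert_b, method)
print(expert_overlap, method_agreement)
```

Note that verses on which the two experts disagree (here, the placeholder "v2") are simply left out of the score, which is why the headline figure is stated only for consensus psukim.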

Let me say a few words about the main areas of disagreement. To a significant extent, our split runs along lines of genre. One family is mostly – not completely – legal material and the other is mostly narrative. Since what the critics call the Priestly sections include pretty much all of Vayikra (which is mostly laws), as well as selected portions of Bereishis, Shemos and Bemidbar, their split also corresponds somewhat to the legal/narrative split. Most of the cases where our split is different than theirs involve narrative sections that they assign to P and our method assigns to the family that corresponds to non-P, for example, the first chapter of Bereishis. (The rest of the disagreements involve P sections that scholars now refer to as H and consider some sort of quasi-P, but I don’t want to get into all that, mostly because I’m still pretty clueless about it.)

Before you dismiss all this by saying that all we did was discover that stories don’t look like laws, let me point out there are plenty of narrative sections that the computerized analysis assigned to the P family (or, more precisely, to the nameless family that turns out to be very similar to what the critics call the P family). Two prominent examples are the story of Shimon and Levi in Shechem and the story of Pinchas and Zimri.

One more point: when we split the Torah into three or more families, our results do not coincide with those of the critics. In the case of three families, Devarim does seem to split off as its own family, as the critics claim, but there are a fair number of exceptions. And even with four or more families, no hint of the critics’ E/J split shows up at all.

Interpreting the Results

So does all this mean that we have proved that the Torah was written by at least two human authors, as the breathless reports claim? No.

First of all, as I noted above, our method does not determine the optimal number of families. That is, it does not make a claim regarding the number of authors. Rather, you decide in advance how many families you want and the method finds the optimal (or a near-optimal) split of the text into that number. If you ask it to split Moby Dick into two (or four or thirteen) parts, it will do so. Thus the fact that we split the Torah into two tells us exactly nothing about the actual number of authors.

Having said that, I want to temper any religious enthusiasm such a disclaimer might engender. First of all, with a few improvements to the method we could probably identify some optimal number of families for a given text. We simply haven’t done so. Second, the fact that – for the case of two families – the results of our method coincide (to some extent) with those of the critics would seem to suggest that the split the method suggests is not merely coincidental.

But, the deeper reason that our work is irrelevant to the question of divine authorship is simply that it does not – indeed, it could not – have a thing to say on that question. If you were to have some theory about what properties divine writing ought to have and close analysis revealed that a certain text probably did not have those properties, then you might have to change your prior belief about the divine provenance of that text. But does anyone really have some theory about what divine texts are supposed to look like? Several press reports about this work referenced the idea that “God could write in multiple voices”. I find that formulation a bit simplistic, but it captures the fact that any attempt to map from multiple writing styles to multiple authorship must be rooted in assumptions about human cognition and human performance that are simply not relevant to the question of divine action[8].

In short, our results seem to support some findings of higher Bible criticism regarding possible boundaries between distinct stylistic threads in the Torah. These results might have some relevance regarding literary analysis of the Torah. Taken on their own, however, they are not proof of multiple authorship. Furthermore, there is nothing in these results that should cause those of us committed to the traditional belief in divine authorship of the Torah to doubt that belief.

[1] http://news.yahoo.com/israeli-algorithm-sheds-light-bible-163128454.html

[2] http://www.haaretz.co.il/captain/spages/1233355.html

[3] M. Koppel, N. Akiva, I. Dershowitz and N. Dershowitz, (2011). Unsupervised Decomposition of a Document Into Authorial Components, Proceedings of ACL, pp. 1356-1364.

[4] S. Argamon, M. Koppel, J. Pennebaker and J. Schler (2009), Automatically Profiling the Author of an Anonymous Text, Communications of the ACM, 52 (2): pp. 119-123 (virtual extension).

[5] M. Koppel, J. Schler and E. Bonchek-Dokow (2007), Measuring Differentiability: Unmasking Pseudonymous Authors, JMLR 8, July 2007, pp. 1261-1276.

[6] M. Koppel, D. Mughaz and N. Akiva (2006), New Methods for Attribution of Rabbinic Literature, Hebrew Linguistics: A Journal for Hebrew Descriptive, Computational and Applied Linguistics, 57, pp. 5-18.

[7] M. Koppel, Identifying Authors by Computerized Methods: the “Kherson Geniza” (Hebrew), Yeshurun 23 (Elul 5770), pp. 559-566.

[8] I realize that this argument comes close to asserting that the claim of divine authorship is unfalsifiable, which for some might cast doubt on the meaningfulness of that claim. A proper response to that concern would involve a discussion of the nature and content of religious belief, a discussion that is well beyond the scope of this brief peroration.

Machon Yerushalayim’s Book Week 2011

by Eliezer Brodt

As previously mentioned, there are many book sales this time of year all over Eretz Yisroel, and many publishers release new titles for the occasion. One of the publishing houses that holds a special sale every year is Machon Yerushalayim (MY). This year MY released many new titles for the sale.

Over twenty years ago MY released a beautiful new edition of the classic work Minchas Chinuch. Over time this edition has become a bestseller, selling thousands of copies worldwide. In the introduction to the work they wrote of plans for a fourth volume which would include indices and a collection of all the notes and discussions of the various gedolim on the Minchas Chinuch. For various reasons that project never got off the ground. But there was another work, from Rabbi Schlessinger, that does collect all the notes and discussions of the various gedolim on the Minchas Chinuch and comments on them. This work, although beautiful and full of great stuff, looked like it would shape up to be over twenty volumes, but no more volumes were released in the past fifteen years. Recently, for Pesach and Shavuos, MY released samples of this original project. Last week they released the first volume of this edition, which goes up to the forty-first Mitzvah. The work is excellent. It is more concise than Rabbi Schlessinger’s edition and similar to the style they have used in the Otzar Miforshei Hatalmud. I just have one small complaint – at this rate it will be around twelve volumes (although they estimate only six). Although it is nice to have the notes on the same page as the Minchas Chinuch, most people have already bought the Minchas Chinuch in three volumes, so it would be much easier for them if all the notes were put in two separate volumes as originally planned. Hopefully MY will offer this option in the future. Volume two of this work is due in a month or two.

Another very valuable work just printed is the sefer Shulchan Melachim. This work is based on manuscripts from Rabbi Yitzchak Bueno, who was a Rav in Yerushalayim over three hundred and fifty years ago. It is on Orach Chaim and is full of pesakim of his and other giants of his time. The Prei Chadash and Chida used it in manuscript. This volume contains many useful footnotes and is printed with the Shulchan Orach on the page.

Another work just printed is the extremely important work Tashbetz Koton, based on many manuscripts. There are over one hundred manuscripts of this work. A few years ago an edition of Tashbetz Koton was printed by Rabbi Schneerson with many excellent notes (he promised a second volume which has still not been printed). Rabbi Schneerson’s edition is based on the version of the sefer that was in front of the Beis Yosef, who quoted from it extensively. However, one weakness of Rabbi Schneerson’s edition is that it is in a completely different order than many of the other editions, making it very hard to use. The new MY edition follows the order in which most people quote the sefer, making it easier to use than Rabbi Schneerson’s. Besides this, it is based on many manuscripts and contains many useful notes to help one understand the text. One small problem I had with the MY edition is that they did not reprint the notes of R. Yeruchem Fischel Perlow, but it could be that they could not get permission.

Another very valuable work just printed is called Pinkaso Shel Shmuel from the Rashash. This work is a collection of indexes of thousands of topics found in the Rashash’s writings, organized according to topics, including topics related to Kelalim, Toldos Tanaim V’amoraim and more. It is mindboggling to see how many topics the Rashash touched upon in his works! One weakness of this work, in my personal opinion, is that they should have quoted exactly what he says instead of presenting it in index form; as he writes very concisely, the sefer would have been only a little bigger, and even better. The editor of this work seems to be unaware of the great work on the Rashash from Shua Englman, “Rabbi Samuel Strashun (HaRaShaSh) and his Hagahot on the Babylonian Talmud,” (PhD dissertation, Bar-Ilan University, 2008; Hebrew). All in all it’s a beautiful job.

Another work worth mentioning is a volume called Iggros Rabbenu Chaim Me-volozhiner. This small work (117 pp.) is mostly a commentary to a one-page letter of R. Chaim Volozhiner. Another volume which is due out any day is the fourth volume of the Shut Ha-Tashbetz. Some other volumes printed are two more volumes of their edition of the Shulchan Orach – Yoreh Deah, another volume of Yad Dovid, another volume of Sharei Torah and a shu”t from R. Chaim Kafusi. A very special title of theirs printed last year is the Shiltei Giborim. See here for a nice article about this work. Hopefully I will return to this work in a future post.

In addition to all this, all their seforim are available at reduced prices. Email me at Eliezerbrodt@gmail.com if you are interested in a complete catalog.

Upcoming Kestenbaum Auction #51 – Alfonso Cassuto Collection Part 2.

Kestenbaum & Co. will be holding an auction this Thursday, June 23. The catalog is available online at the Kestenbaum site. This auction includes the second part of the Alfonso Cassuto collection, which is heavily focused on books originating in or relating to the Iberian Peninsula. One can read more about that collection at the website or see the last auction catalog.

In addition, there are a few controversial books of note. First, lot 136 is the exceedingly rare first edition of Toldoth Ya’akov Yosef, the first Hassidic work published. It is both rare and controversial because it was the first and thus subject to bans and book burnings. Second, lot 153 is R. Azariah de Rossi’s Me’or Eynaim, Mantua, 1574. Of course, this book too was subject to a ban, in this case by R. Yosef Karo. De Rossi attempted to preempt his critics by removing certain pages and replacing them with “corrected” pages. Third, lot 159 is R. Ya’akov Emden’s polemic against the Frankist movement, Sefer Shimush, Altona c. 1758-62. This work too was banned, by the Va’ad Arba ha-Artzot. Aside from the controversial nature of the work, it is also notable for the illustrations it includes at the end depicting the punishment that is due the Frankists. Also see this post by On the Main Line for another notable illustration in Sefer Shimush. Here are the illustrations of the punishments:

Fourth, lot 253 is R. Menasheh of Ilya’s Binat Mikra. R. Menasheh himself was a controversial figure [see R. D. Kaminetsky, Ha-Gaon R. Menasheh me-Ilya, Yeshurun vol. 20, pp. 729-81]. In addition, this exceedingly rare work is also controversial in part because R. Menasheh records that the Gra himself told him that one is not limited to the interpretations of texts advanced by the Talmud. Finally, we have Nathan of Gaza, Tikun Krei’ah le-Chol Yom, Frankfurt O.M., 1666 (lot 271). Nathan was Sabbatai Tzvi’s “prophet.”

Turning to illustrations, we have R. Issachar Baer Eilenburg’s Be’er Sheva, Venice, 1614, lot 157. The title page prominently displays a bare-breasted woman. It is worth noting that this copy belonged to the Sadigur Rebbe, R. Nachum Dov-Baer Friedman, and his stamps also appear prominently on the title page.

This is not the only work belonging to the Sadigur Rebbe that contains such illustrations. Lot 269 is the Sadigur Rebbe’s copy of R. Avraham Rapa’s Mincha Belula, which contains R. Rapa’s herald that similarly contains bare-breasted women. Indeed, as previously discussed here and here, some have attempted to alter the herald to make it less objectionable. Although lot 262 does not appear to have belonged to the Sadigur Rebbe, it too has similar imagery, this time on the title page. In this case, it is a set of Mishne Torah, Amsterdam, 1702-03. Aside from the figures of Moses and Moses Maimonides, apparently dressed as Greek philosophers, flanking the title page, on the edifice at the top of the page there are two bare-breasted women.

This is not the only work from Maimonides that contains potentially objectionable imagery. Lot 260 is the Moreh Nevuchim, Sabbionetta, 1553, and in this case the Roman mythological figures Mars and Minerva appear at the bottom [for more on this title page see Marvin J. Heller, Mars and Minerva on the Hebrew Title-Page, Papers of the Bibliographical Society of America, 98:3, Sept. 2004 (now reprinted in Studies in the Making of the Early Hebrew Book, Brill, 2007)].

Finally, a few other books of note are included in this auction. Lot 244 is R. Shmuel David Luzzato’s personal copy of the first Hebrew bibliography by a Jew, Siftei Yesheinim by R. Shabbatai Bass (author of the popular commentary on Rashi, Siftei Hakhamim). For more on R. Bass, see this post. Lot 145 is the Sefer Avreikh, Munkatch, 1893, which, as Marc Shapiro has pointed out, is one of the works written by extraordinarily precocious authors; in this case, the author was nine years old (see this post). Lot 258 is the first English edition of R. Yehudah Aryeh of Modena’s Riti, translated by Edmund Chilmead.

Book Week 2011

by Eliezer Brodt

Book Week has just begun in Eretz Yisrael. As I have written in previous years, every year in Israel, around Shavuos time, there is a period of about ten days called Shavuah Hasefer – Book Week (see here, here and here). Shavuah HaSefer is a sale which takes place all across the country in stores, malls and special places rented out just for the sale. There are places where strictly “frum” seforim are sold, and other places have most of the secular publishing houses. Many publishing houses release new titles specifically at this time. In my reviews I sometimes include an older title if I just noticed the book. As I have written in the past, I do not intend to include all the new books. Eventually some of these titles will be the subject of their own reviews. I try to include titles of broad interest. Some books I cannot provide much information about as I just glanced at them quickly. I apologize in advance for any mistakes regarding transliteration. I also apologize in advance for using the word excellent so many times when describing seforim in this post!

Additionally, this year I am offering a service, for a small fee, to help one purchase these titles (or titles of previous years). For more information about this email me at Eliezerbrodt-at-gmail.com. Part of the proceeds will be going to support the efforts of the Seforim blog. Earlier this year I posted a list of recent academic titles; see here. It’s worth checking back to this post in the next few days as I will probably add some more titles, especially from Machon Yerushalayim.

1) Magnes Press has many special titles this year. First and foremost is Benny Brown’s book on the Chazon Ish (see here). Another special title is the three volume critical edition of Sifri from M. Kahane. Another nice title is Kabbalistic Manuscripts and Textual Theory from Daniel Abrams. A nice collection of articles is Chinuch Vedas. There is a new collection of articles related to R. J. B. Soloveitchik called Rav Be-olam Ha-chadosh. There are also some new works on philosophy of Halacha. One is called Halacha, Meta Halacha, u-Philosph. Another one is called Halacha Ke-mecholles Shinu. There is another collection from E. Melamed called Midrashe Halacha shel Ha-amoroyim be-talmud Bavli. Another title is from S. Tzefatman called Rosh Ve-rishon. Another title is from Shlomo Simonsohn on the Jews in Sicily. Another nice collection is related to the Cairo Genizah called Ha-kanon Ha-somu Min Hayin. Another nice title is a collection of articles related to Gan Eden called Gan Eden Me-kedem. In English, there is a new title called The Pinnacle of Hatred: The Blood Libel and the Jews.

2) Kibbutz Hamechuad has a very special new title from Chanan Gafni called Peshuto Shel Mishana (a table of contents is available upon request). I highly recommend it and hope to return to it in a post in the very near future.

3) The Israel Democracy Institute has some very good new titles. One is from Benny Brown (author of the work on Chazon Ish) dealing with R. Elyashiv, Shlita and Rav Shach. Another work which looks interesting is a collection of articles called Rabnut HaAtgur (two volumes; a table of contents is available upon request). There are some other good titles there for good prices.

4) Caramel reprinted N. Krochmal’s classic work Moreh Nevuchei Ha-zeman.

5) The Israel Academy of Sciences and Humanities does not have anything new but has some very good deals on old titles.

6) Merkaz Zalman Shazar has some new special titles. Amongst them is Amram Tropper’s Kechomer Beyad Hayotzer and the Hebrew translation of Professor Kanarfogel’s Peering through the Lattices, called Sod U-magiah Uprishut Bemishnsom Shel Balei Hatosfot.

7) Reuvan Mass has a few new titles. Amongst them are works on the Maggid Me-Mezritch, Ha-shar Le-aon and a work on R. Tzvi Mezigitov, Al Derech Ha-avodah.

8) The Bialik Institute has a few new titles, amongst them Lo Yossur Shevet Me-Yehudah in honor of Professor Simon Schwarzfuchs, which is an excellent collection of articles including pieces from Professors Chaim Soloveitchik, Shnayer Z. Leiman, A. Grossman, M. Rosman, Eric Zimmer, R. Reiner, Simcha Emanuel and many others (a table of contents is available upon request). Another nice title is Legalot Nistorot from Chana Werman. Aharon Shemesh has a book related to halacha in the Dead Sea Scrolls. Another title, Yosef Dat, is a Sefer Hayovel for Y. Salomon. In English there is a new title edited by Boaz Huss called Kabbalah and Contemporary Spiritual Revival.

9) Bar Ilan University has some new titles. The latest volume of Sidra is something special, with an all star line-up of writers (see here). A good title for those interested in academic Talmud is called Meleches Machsevet (see here). There are new volumes of Badad and Iyunei Mikra. Another title is on the Zionist movement in Poland (see here).

10) Mechon Ben Zvi has some new titles, amongst them some new volumes in their set of critical editions of classics of Sefer ha-Makabim. Other works include Safrut Ha-mikra and a new translation of Shemoneh Perakim of the Rambam from M. Schwartz and two more volumes in their special series.