On Verifying Wordprint Studies: Book of Mormon Authorship
John L. Hilton is adjunct professor of statistics at Brigham Young University. This chapter is reprinted by permission, with slight alteration for clarification. First published in BYU Studies 30/3 (1990): 89–108.
In 1980 Wayne A. Larsen, Alvin C. Rencher, and Tim Layton published the first complete analysis of the Book of Mormon using the then adolescent tool of computerized stylometry, or wordprinting.1 They analyzed author-specific word-use rate to show that the purported authors in the Book of Mormon are statistically different—that not one but many authors contributed to the book. Since then the science of wordprinting has continued to undergo considerable critical evaluation, particularly in its application to the Book of Mormon.
Shortly after BYU Studies published Larsen, Rencher, and Layton’s pioneering work, I joined forces with a small group of scientists in Berkeley, California, who were attempting to verify the accuracy of wordprinting in general and to check the Larsen-Rencher-Layton results specifically. After seven years of study and development, we concluded that wordprint measurements are now at the stage where scholars can use such tests confidently and without personal bias to analyze contested authorship in many literary works, including the Book of Mormon. This paper explores our conclusion by (1) discussing some general ideas about wordprints and wordprinting, (2) reviewing some early wordprint studies in the evolution of wordprint science, (3) summarizing the development of a new measurement technique, including important control studies to verify the objectivity of that technique, and (4) setting forth some verified Book of Mormon measurements. Before proceeding, I will establish the need for wordprints for the Book of Mormon and also discuss one important caveat.
The need for rigorous, legitimate wordprint measurements is obvious in attempting to settle some of the most prominent controversies surrounding the Book of Mormon: Are the word patterns of Joseph Smith, Oliver Cowdery, or Solomon Spaulding measurable in the Book of Mormon? Can wordprinting show that different sections of the Book of Mormon were written by different authors? Does Joseph Smith’s role as translator obfuscate patterns unique to ancient authors? Fortunately the Book of Mormon is a near-ideal document for such objective wordprint studies, provided the measurement is made correctly.
Of course, wordprint analysis, while it can measure certain facts objectively, cannot prove the holiness of the Book of Mormon. The understanding that the Book of Mormon has a divine origin is obtainable only by developing faith. Thus, while valid and objective wordprinting is no substitute for faith, wordprinting can, nevertheless, bolster the establishment of faith by rigorously demonstrating factual information about the book.
Wordprints and Wordprinting
Wordprinting is a developing science, notwithstanding that the first written suggestions that something like wordprinting might be useful in objectively identifying authors appeared at least as early as 1851. Yet, because of the complexity of the measurements, the first credible studies had to await the availability of modern computers with their precise counting accuracy and high-speed computation. Therefore, wordprinting has undergone almost all of its significant development during the last thirty years.
As is common in all developing sciences, wordprinters have had to identify and abandon those preliminary methods and theories that were later shown to be inaccurate. However, while wordprinting will undoubtedly continue to evolve toward ever-increasing reliability and sensitivity, the science has now developed to the point where one can construct a conservative, rigorous measuring technique that yields reliable answers when measuring singly authored documents of at least a few thousand free-flow, original words.2 (In the context of wordprinting, free-flow words are written without outside influence or superimposed structures that change an author’s personal word selection.)
Many people have difficulty believing that a clever author cannot fool a rigorous, quantifiable approach to measuring fixed writing habits. After all, when we read the fictional words of characters created by a good author, we all think the narrative sounds like different people telling the story. Nevertheless, wordprint measurements taken with our most recent methodology continue to show that there are extensive noncontextual word patterns hidden in the narrative that are unique to each author regardless of the character portrayed. Our wordprinting technique has shown that most highly skilled authors (e.g., Twain, Johnson, Heinlein, etc.), when intentionally trying to imitate the writings of different persons, are unable to successfully change their own free-flow noncontextual word patterns enough to simulate a different wordprint. Because of the mind’s inability to consciously recognize the extent of word patterns that are tabulated in the computer-assisted wordprint measurement, wordprinting is practically immune to deception by a forger.3
Most modern wordprint techniques measure only the placement of “noncontextual” words. Noncontextual words such as the, and, a, and of can often be interchanged or even dropped without a loss of overall meaning; they seem to add little contextual information and are often consciously ignored by writer and reader alike. Measuring noncontextual words therefore makes wordprinting less sensitive to subject matter. The technique also improves statistical accuracy: noncontextual words typically make up 20% to 45% of the total text, thereby providing a high number of statistical “events,” and the larger the number of events measured, the more reliable the results. Wordprint measurements made from large numbers of noncontextual words continue to show that an author’s free-flow writings use these words in a habitual, nearly subconscious, unique way.4 However, if the author consciously imposes an external structure, the free flow of the author’s wordprint pattern is modified, and accurate wordprint measurements become more difficult to obtain.
Wordprinting measures the difference in the way noncontextual word patterns occur in two compared texts. Usually one of the texts is of disputed authorship while the other is by an author suspected of writing the disputed text. If the same word pattern is found to be statistically different between the two texts, we identify the difference as a rejection.5 The total of the rejections measured when the two texts are tested for a large number of word patterns is identified as the number of rejections. The larger the number of rejections, the more likely the disputed text was not written by the author of the other compared text. Thus, testing a contested document against comparable texts from all possible candidate-authors will identify the most likely writer by eliminating authors whose texts generate high numbers of rejections.
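The tallying of rejections described above can be sketched in a few lines of Python. This is only an illustration: the function names, the sample numbers, and especially the placeholder test `toy_differs` are assumptions made for the example and are not the statistical test used in the study.

```python
def count_rejections(patterns_a, patterns_b, differs):
    """Count the word patterns on which two texts are judged different.

    patterns_a, patterns_b: dict mapping a pattern name to a list of
        per-block measurements for that text (hypothetical layout).
    differs: callable(list, list) -> bool, the statistical test to apply.
    """
    shared = sorted(set(patterns_a) & set(patterns_b))
    return sum(1 for name in shared if differs(patterns_a[name], patterns_b[name]))


def toy_differs(xs, ys, threshold=0.10):
    """Placeholder test (an assumption): flag a difference when means are far apart."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return abs(mean_x - mean_y) > threshold


# Invented measurements for two word patterns in two texts.
text_a = {"THE(fws)/#": [0.20, 0.22, 0.18], "AND(fb THE)/AND": [0.30, 0.28, 0.32]}
text_b = {"THE(fws)/#": [0.05, 0.06, 0.04], "AND(fb THE)/AND": [0.29, 0.31, 0.30]}

rejections = count_rejections(text_a, text_b, toy_differs)
print(rejections)
```

Passing the test in as a parameter keeps the tally separate from the choice of statistic, so a more defensible test can be substituted without touching the counting logic.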
Finding the most likely writer depends on the wordprinting technique’s accuracy. The accuracy (and usefulness) of a wordprint measuring technique critically depends on its statistical reliability in detecting which of its tested text pairs were not written by the same author. Statistical reliability is rigorously demonstrated by using the technique with a large number of control-author texts for the purpose of verifying the authorship of known texts. These texts correspond in size and include examples of the different literary parameters (genre, subject matter, writing period, position in an author’s career) that are to be studied later. The verifying measurements made between two control texts written by the same author are identified as within-author tests. The tests between texts written by different authors are called between-author tests. The statistical separation measured between the overall distributions of a large number of the within-author and between-author tests is the valid measurement of what will be expected when a contested author is later tested with the same technique. In other words, the difference between the number of rejections found between texts by the same author and texts by different authors will serve as a standard. This standard is used to evaluate the numbers of rejections found when testing texts of contested authorship.
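One way to picture the within-author and between-author tests is to enumerate every pair of control texts and sort the pairs by whether the two authors match. The sketch below does only that bookkeeping; the small corpus of labels is hypothetical and stands in for a real control set.

```python
from itertools import combinations

# Hypothetical control corpus: (author, text label) pairs only, no text content.
texts = [
    ("Twain", "A"),
    ("Twain", "B"),
    ("Johnson", "A"),
    ("Johnson", "B"),
    ("Heinlein", "A"),
]


def split_pairs(labeled_texts):
    """Pair every text with every other once; split the pairs by author match."""
    within, between = [], []
    for left, right in combinations(labeled_texts, 2):
        if left[0] == right[0]:
            within.append((left, right))
        else:
            between.append((left, right))
    return within, between


within, between = split_pairs(texts)
print(len(within), len(between))
```

Each pair would then be run through the rejection count, giving one distribution for the within-author pairs and one for the between-author pairs.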
Measuring the differences in word patterns between texts is the basic process of a wordprinting technique. Verifying such a technique, while straightforward in principle, is in practice very tedious. Thus, during the years of wordprinting development, many proposed wordprint measuring systems were verified only superficially on a narrow set of texts. Unfortunately, researchers often assumed that a wordprint measuring technique shown valid for one set of literary parameters would also be valid for all others. We now realize such assumptions are not valid; we must successfully verify each wordprint measuring methodology with control texts which represent all the literary parameters that are to be reliably measured later on.
Some Early Wordprint Studies
Perhaps one of the earliest successful wordprint studies in the United States was the classic work by the statisticians Frederick Mosteller and David L. Wallace, who published their work on author identification in 1964.6 While not the first scholars to attempt computer-assisted stylometry, they published one of the first complete and internally consistent studies on a set of historically important documents. Their work convincingly identified the author of several anonymously published Federalist Papers. Mosteller and Wallace measured the rates at which simple, noncontextual words were used per 1000 words of text. This statistical model appeared adequately sensitive and valid to unambiguously show that James Madison was the author of the disputed documents. They showed that the other two possible candidate-authors were overwhelmingly excluded as authors of any of the twelve disputed documents.
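The simple word-use rate mentioned above, occurrences per 1000 words of text, can be illustrated as follows. This is a minimal sketch, not a reconstruction of Mosteller and Wallace's procedure: the tokenizer, the function name, and the key-word list are assumptions chosen for the example.

```python
import re
from collections import Counter


def rates_per_thousand(text, key_words):
    """Return each key word's use rate per 1000 words of the given text."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        return {}
    counts = Counter(w for w in words if w in key_words)
    # Counter returns 0 for key words that never occur.
    return {w: 1000.0 * counts[w] / total for w in key_words}


sample = "Upon the whole, the plan is sound and quite wise."
rates = rates_per_thousand(sample, ["the", "upon", "and", "while"])
print(rates)
```

A ten-word sample is far too short for any real measurement; it is used here only so the arithmetic can be checked by hand.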
The wordprint study on the Federalist Papers had several advantages which facilitated statistical measurements. First, the documents are lengthy, were written in the same genre on the same subject, and have essentially the same vocabulary. Additionally, for control texts Mosteller and Wallace were able to use uncontested writings by the suspected authors, writings which are of the same length, genre, subject, and vocabulary as the suspect texts. That the simple Mosteller and Wallace wordprint technique had been shown to be valid only for their single, nearly ideal class of texts was at first not appreciated as important.
Not all succeeding studies had documents that presented as favorable a situation as did that of Mosteller and Wallace. In addition, most later wordprinters did not execute their studies in such a thorough way. Many omitted any independent control studies to confirm that their wordprint techniques were valid for their given case. As a consequence, some published studies purportedly giving objective answers later proved to be inaccurate.
Rev. A. Q. Morton of Edinburgh, Scotland, a long-time contributor to the development of wordprinting, was one of the scholars who recognized that the simple, noncontextual word-use rate (i.e., the frequency with which each of the noncontextual words is used per 1000 words of text), as studied by Mosteller and Wallace, was not always reliable for authorship measurements.7 Working with several colleagues, he discovered that better “stylometric” measurements were obtained when he extended his studies to measure carefully chosen noncontextual word-pattern ratios. By 1985 he had studied several different types of word patterns and recommended a battery of about 65 word patterns which had been successfully used in many different literary situations. We have found his 1985 list to be generally reliable. (See Appendix 1.)
A recent study (1986) that further verified the usefulness of Morton’s word-pattern ratios over the simple noncontextual word-use rate is the methodical work of Kendra L. Lindsay. She studied noncontroversial Greek documents of seven classical writers chosen for their comparability to the writings of Paul of the New Testament. She found that by using the standard statistical assumptions and analyzing the texts by counting the simple noncontextual word-use rate, she was able to correctly identify only 2 of the 7 authors. However, when she measured the ratios of word-pattern counts, she correctly identified 6 of the 7.8
The first extensive wordprint measurements of the Book of Mormon appeared in 1978 when Alvin C. Rencher and Wayne A. Larsen began reporting their pioneering study in author identification. This work was followed by their complete report in 1980.9 They also coined the term wordprint, and introduced to Church and world scholars the interesting possibility of objective author identification in the Book of Mormon. They used information gained from earlier approaches and applied the simple noncontextual word-use rate of Mosteller and Wallace’s technique but coupled it with a powerful, multivariate statistical analysis.
Unlike previous studies which introduced the concept of hand-tabulated word measurements to the Book of Mormon,10 the 1980 wordprint study published by Larsen, Rencher, and Layton was widely recognized as important both within and without the Church.11 If the measurement technique was in fact objective and verifiable, any competent student could duplicate the calculations to determine answers to a number of questions that have remained controversial among Book of Mormon believers and detractors.
Along with others who found the reported work of the BYU team of Larsen-Rencher-Layton interesting and challenging was a small group of scientific researchers in northern California to which I belonged. Our group, later known as the Berkeley Group, included major contributors from different scientific disciplines and differing religious persuasions. All of us shared the scientific curiosity which led us to test the intriguing Larsen-Rencher-Layton claim. In the fall of 1980, we began our study. As the major LDS contributor in the group, I was little different from my agnostic and Jewish colleagues: each of us seriously questioned whether objective measurement could determine who did or did not write a controversial document like the Book of Mormon. Therefore, armed with a healthy skepticism, we began a confirmational study—the kind of study scientists typically perform in the physical sciences—to recalculate the wordprint measurements while correcting any procedural or calculational flaws which could potentially have confused the results of the original study.
Because most members of the Berkeley Group doubted that stable wordprints could be objectively measured in the writings of most authors, we were not willing to accept the standard assumptions of the Larsen-Rencher-Layton study. Therefore, we began developing a completely new set of computer codes based on a very conservative, independently derived and verified theoretical model. While we tentatively thought that our study to verify Book of Mormon wordprints could be completed in a year, it soon became apparent that, with the redevelopment of wordprint theory as part of the work, the study would take much longer. It was not until September 1987, after perhaps 10,000 hours of work, that a paper describing the results of our efforts was completed.12
While one part of our Berkeley Group was redeveloping and verifying wordprint theory, others of us prepared a computer file of the earliest available Book of Mormon manuscripts (see Appendix 2). All reported Book of Mormon wordprint measurements in this paper were computed from files of the needed length, author, and literary form taken from this “Most Primitive Book of Mormon Manuscript.”13
During the time our Berkeley Group was doing its work, other Book of Mormon scholars were also studying the approach proposed by the Larsen-Rencher-Layton team. One of the most notable of these is the University of Utah statistician, D. James Croft. His work is that of a competent scholar as well as a conscientious believer in the divinity of the Book of Mormon. His published work is a carefully reasoned critique of the Larsen-Rencher-Layton paper.14 As would be expected from a scholar of the exact sciences, he cautioned his LDS readers about the unverified nature of the methodology: “Close scrutiny of the methodology of the BYU authorship study reveals several areas which seem vulnerable to criticism.” After calling for a redevelopment of methodology which could circumvent the specific areas he found questionable, he concludes, “Certainly any research done in the future will be indebted to Larsen, Rencher and Layton, who called our attention to an interesting and challenging area of Book of Mormon study. At the present time, however . . . it would be best to reserve judgment concerning whether or not it is possible to prove the existence of multiple authors of the Book of Mormon” (21).
We kept in close contact with Dr. Croft and others15 who were contributing to the continuing refinement of wordprinting during the years when our independent methodology was under development. We appreciated the continuing contributions of these scholars as they helped us ensure that the suspect areas recognized in the earlier methodologies would be avoided and that the verification of our new wordprint measuring technique would be complete enough to ensure reliable answers.
The rationale for our wordprint model and methodology was developed from basic information theory and basic statistics. Our resulting model was conservative and yet still able to calculate answers for the Book of Mormon authorship questions with very high statistical certainty. All results reported in this paper were calculated using this methodology. A detailed description of the evolution of the model and methods is reported in “On Maximizing Author Identification by Measuring 5000 Word Texts” by John L. Hilton and Kenneth D. Jenkins.16
Our new conservative measurements incorporate six points which were not used in earlier Book of Mormon wordprint studies. These points contribute to improved reliability when 5000-word texts are tested. They are (1) measuring the author’s wordprint by studying the use rate of sixty-five noncontextual word-pattern ratios as proposed by Morton (1985); (2) abandoning the commonly accepted statistical assumption of “normality” of word distribution and instead using the Mann-Whitney nonparametric statistic, which does not require the unverifiable normality simplification; (3) developing a “wrap-around” word-group counting method which helps break apart clusters of similar words in the sampled text words (this method helps provide the statistically required word-group homogeneity); (4) making comparison measurements between just two texts at a time; (5) using the oldest extant Book of Mormon manuscripts (the texts used do not include the repetitive use of the phrase and it came to pass, nor do they include significant direct quotations from the King James Bible—including such text would distort the noncontextual word counts for each author); and (6) verifying the sensitivity of the computer coding and measurement methodology by measuring a diverse set of texts of nondisputed authorship which represent the appropriate literary parameters.
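Point (2) above names the Mann-Whitney nonparametric statistic. The sketch below shows the idea of that test on two small sets of measurements. It is an illustration under stated assumptions, not the study's code: it uses the large-sample normal approximation with no continuity or tie correction, which is crude for samples this small, and the sample values are invented.

```python
import math


def mann_whitney_u(xs, ys):
    """Return (U, two-sided p) via the normal approximation (assumed here)."""
    n1, n2 = len(xs), len(ys)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    # Count the pairs in which x exceeds y; a tie counts as one half.
    u1 = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in xs for y in ys)
    u = min(u1, n1 * n2 - u1)
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean) / sd
    # Two-sided tail probability of a standard normal variable.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p


# Invented per-block ratios for one word pattern in two texts.
block_ratios_a = [0.21, 0.25, 0.19, 0.23, 0.27]
block_ratios_b = [0.10, 0.12, 0.08, 0.11, 0.09]

u_stat, p_value = mann_whitney_u(block_ratios_a, block_ratios_b)
print(u_stat, p_value)
```

Because the statistic depends only on how the two samples rank against each other, it does not require the word counts to follow a normal distribution, which is the property the text singles out.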
Developing and Verifying the Technique
Deriving the model becomes relatively unimportant compared to designing the control studies which verify or disprove the validity of the method. For our control studies, we specifically chose a representative set of literary texts which would test the extremes found in English-language writings. When we tested these control texts, we found that our technique yielded well-defined, bell-shaped distributions, showing that our new wordprint technique is essentially insensitive to the textual changes introduced by the differing literary parameters of genre, subject matter, writing period, position in an author’s career, or normal publication editing.
Specifically, this extended verification study tested the validity of our model by calculating 325 diverse wordprint tests. These tests studied 26 noncontroversial 5000-word texts which had been written under various conditions by nine different control authors (see Appendix 3). The within-author and between-author results rigorously supported the basic wordprint assumption: although all authors have many writing habits in common, they each show measurably unique, stable rates for some noncontextual word patterns. Among the nondisputed documents that were used in the testing were texts by Oliver Cowdery and samples of Joseph Smith’s autographic and dictated writings.
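The figure of 325 tests is what results from pairing each of the 26 control texts with every other text exactly once. This reading is an inference from the numbers rather than something the text states outright; the split on the right is the one reported with figure 1 (33 within-author and 292 between-author tests).

```latex
\[
\binom{26}{2} = \frac{26 \times 25}{2} = 325,
\qquad
325 = 33 + 292 .
\]
```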
We also studied English translations of semiclassical texts written by different German authors. These academic translations were all carefully done by the same German-to-English translator. The wordprint measurements regarding translations provided three significant results: (1) each translated author is consistent within himself; (2) when several German authors are translated by the same person, the English rendition of each author is clearly separable from the others; and (3) the translator’s other English writings have consistent wordprints that differ from any of his translated works.17 These findings demonstrate that, at least when an academic translator tries to produce a close translation from one modern language to another, the uniqueness of an original author’s wordprint can actually survive the translation process.18
The results of our verification tests are displayed in figure 1. Thirty-three of these tests are made by comparing texts written by the same author; 292 of the tests compared one author’s writing against that of another author. The black bars represent the 33 within-author measurements, which yield a statistically smooth distribution peaking at about 2 rejections, a result that is theoretically expected.19 The distribution peak for between-author comparisons is about 7 rejections. Therefore, about two-thirds of the true between-author measurements fall above even the extremes of the within-author distribution. This result means that when any 5000-word disputed text is tested against a known author’s comparable works and measures 7 or more rejections, the two texts are very likely not written by the same author.20 The lower the number of rejections, the greater the likelihood that the two texts were written by the same author; the higher the number of rejections, the more likely that different authors composed the two compared texts.
If we have only two 5000-word texts and their paired testing measures 1 to 6 rejections (as is expected for a true between-author pair in about one-third of the cases), we cannot assign authorship unambiguously because the within-author and between-author distributions overlap in this range. By contrast, for the few tests (about ten percent of the true within-author cases) that measure zero rejections, there is a high probability that the compared texts were written by the same author.
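The three ranges just described can be restated as a small lookup. The thresholds come from the text (0, 1 to 6, and 7 or more rejections for a single pair of 5000-word texts); the function name and the wording of the labels are assumptions made for this sketch.

```python
def interpret_rejections(rejections):
    """Map a rejection count for one pair of 5000-word texts to a reading."""
    if rejections < 0:
        raise ValueError("rejection count cannot be negative")
    if rejections == 0:
        return "likely same author"
    if rejections >= 7:
        return "likely different authors"
    # 1 to 6 rejections: the two control distributions overlap here.
    return "ambiguous"


for count in (0, 3, 7, 10):
    print(count, interpret_rejections(count))
```

The labels are qualitative on purpose: the text gives probabilities only for specific counts, so the sketch does not attach numbers to the three ranges.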
Some Book of Mormon Wordprint Measurements
We wished to make the most conservative measurement possible; therefore we compared the two Book of Mormon authors who have the largest number of 5000-word texts. Further, even though our verification testing showed that our new wordprint measuring technique is not unduly sensitive to normal changes of genre, we still chose the more conservative comparison by testing only within the same literary form. Therefore, we selected for our critical Book of Mormon verification measurements three independent, 5000-word texts from the didactic writings of each of the two major purported Book of Mormon authors, Nephi and Alma. Those texts are the largest same-genre pair in the book. Besides eliminating any possible lingering concern that changing genre might artificially cause additional rejections, the use of the didactic genre has the advantage of essentially excluding the possibly troublesome phrase and it came to pass. This phrase is the only phrase used repetitively enough in the Book of Mormon to be troubling to wordprint measurements.
Our results are displayed starting with figure 2, which shows the distribution of the number of wordprint rejections for the six possible within-author tests of Nephi against Nephi and Alma against Alma. The within-author tests for both show the same distribution as the within-author tests of our control studies, shown in grey in figure 2.
Figure 3 is a plot of the rejection distribution calculated from the between-author tests of direct interest to the Book of Mormon authorship question. The black bars show the comparisons of the texts purportedly written by Nephi when tested against those purportedly written by Alma. The tests show the same relatively large number of rejections found in the between-author distribution in figure 1 (shown in figure 3 in grey), which was derived from the comparisons made between the texts of the different control authors.
Table 1 shows the measurements for the individual wordprint tests used in producing figures 2 and 3. In the comparisons of Nephi versus Alma, eight of the nine tests resulted in 5 or more rejections. Four of these tests produced 7, 8, 9, and 10 rejections. These four high-rejection tests independently measure statistical confidences of greater than 99.5%, 99.9%, 99.99%, and 99.997%, respectively, that the authors’ patterns are very different, which is consistent with Nephi’s texts having been written by a different author than the one who wrote Alma’s.21 Therefore the Book of Mormon measures patterns of different authors according to its own internal claims.
For the within-author comparisons of the Nephi vs. Nephi and Alma vs. Alma texts, the rejections range from 1 to at most 5, with the distribution peaking at 2. Similarly, the other within-author tests show a tight internal consistency between the two Oliver Cowdery, two Solomon Spaulding, and three Joseph Smith 5000-word texts.22
By using a new wordprint measuring methodology which has been verified, we show that it is statistically indefensible to propose Joseph Smith or Oliver Cowdery or Solomon Spaulding as the author of the 30,000 words from the Book of Mormon manuscript texts attributed to Nephi and Alma. Additionally these two Book of Mormon writers have wordprints unique to themselves and measure statistically independent from each other in the same fashion that other uncontested authors do. Therefore, the Book of Mormon measures multiauthored, with authorship consistent to its own internal claims. These results are obtained even though the writings of Nephi and Alma were “translated” by Joseph Smith. We also described control studies of modern language academic translations where, in practice, a single translator can consistently preserve the unique wordprints of the several original authors he has translated.
Useful noncontextual word patterns meet the following conditions: they yield an unambiguous count, they occur frequently, they have common alternate expressions, their use rates tend to become habitual, and they are minimally affected by the period of the writer’s career, the subject matter, and the genre. Therefore, useful word patterns are typically made up of key words such as common articles, conjunctions, and prepositions. Measurements are calculated from the ratio of the overall key-word-use rate against the same key-word-use rate in certain sentence positions, word collocations, proportional pairs, or the use of key words adjacent to certain parts of speech and novel vocabulary words.
After defining sentence as all groups of words ending in a logical full stop, Morton (1985) lists the symbols used to interpret his battery of word-pattern ratios as follows:
- # represents the number of end of sentence markers
- fws represents the first word in a sentence
- lws represents the last word in a sentence
- 2nd lws represents the second to last word in a sentence
- fb means “followed by”
- pb means “preceded by”
- x represents any word
- r+l means that the word to the right and left are unique within the original 1000-word block.
For example, the test “A(fws)/#” yields this ratio: the number of times A appears as the first word in a sentence divided by the total number of sentences. Morton’s word-pattern ratios follow:
|A(fws)/#|AS x AS/AS|
|AN(fws)/#|AS x x AS/AS|
|OF(2nd lws)/#|I(fb HAVE)/I|
|THE(fws)/#|I x I/I|
|THE(2nd lws)/#|I x x I/I|
|WITH(2nd lws)/#|IN(fb A)/IN|
|A(2nd lws)/A|OF(fb A)/OF|
|A(fb adj)/A|OF(fb A)/OF|
|A(fb x AND)/A|OF(fb THE)/OF|
|A(fb x OF)/A|OF(fb x AND)/OF|
|A x A/A|THE(pb AND)/THE|
|A x x A/A|THE(pb OF)/THE|
|AND(fb adj)/AND|THE(pb IN)/THE|
|AND(fb THE)/AND|THE(pb TO)/THE|
|AND(fb x OF)/AND|THE(fb x AND)/THE|
|AND x AND/AND|THE(fb x THE)/THE|
|AND x x AND/AND|THE(fb x x THE)/THE|
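To make the notation concrete, the sketch below computes two kinds of ratio from the list above: a first-word-in-sentence ratio such as THE(fws)/# and a followed-by ratio such as OF(fb THE)/OF. It is a simplified illustration. The sentence splitter, the tokenizer, and the choice to count adjacency only within a sentence are assumptions, and the sample text is invented.

```python
import re


def sentences(text):
    """Split on full stops (., !, ?) and return each sentence as lowercase words."""
    parts = re.split(r"[.!?]+", text)
    tokenized = [re.findall(r"[a-z']+", part.lower()) for part in parts]
    return [words for words in tokenized if words]


def first_word_ratio(text, word):
    """WORD(fws)/#: sentences that begin with `word`, over all sentences."""
    sents = sentences(text)
    if not sents:
        return 0.0
    return sum(1 for s in sents if s[0] == word) / len(sents)


def followed_by_ratio(text, word, follower):
    """WORD(fb FOLLOWER)/WORD: uses of `word` directly followed by `follower`."""
    total = hits = 0
    for s in sentences(text):
        for i, w in enumerate(s):
            if w == word:
                total += 1
                if i + 1 < len(s) and s[i + 1] == follower:
                    hits += 1
    return hits / total if total else 0.0


sample = (
    "The king of the land spoke. A servant of old heard him. "
    "The people of the city rejoiced. They sang."
)
the_fws = first_word_ratio(sample, "the")
a_fws = first_word_ratio(sample, "a")
of_fb_the = followed_by_ratio(sample, "of", "the")
print(the_fws, a_fws, of_fb_the)
```

Patterns that depend on parts of speech, such as A(fb adj)/A, would need a tagger and are left out of this sketch.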
The photonegative of the 1966 filming of the Book of Mormon printer’s manuscript was courteously supplied, without endorsement, by the History Commission of the RLDS Church. By October of 1982, a board of seven editors prepared a primitive Book of Mormon text using the following sources: (1) a computer file of the 1830 Palmyra first printed edition of the Book of Mormon developed in the BYU Language Research Center by L. K. Browning, (2) the photo-offset copy of the first edition printed by Wilford C. Wood, (3) a copy of the text of extant sections of the original dictation manuscript collected by L. K. Browning, and (4) the complete printer’s manuscript. The editors prepared a composite file of the oldest sections from each manuscript to complete a Book of Mormon text computer file which we named “The Most Primitive Book of Mormon Manuscript Text.” The editors also prepared and verified line headers which identified the apparent original author, the literary form, modern book, chapter, verse, and line notation for each line of text. Similar line headings are now published in Book of Mormon Critical Text: A Tool for Scholarly Reference, Foundation for Ancient Research and Mormon Studies (hereafter cited as FARMS) STF-84aa, 3 vols. (Provo, Utah: FARMS, 1984—87).
All control-author samples were drawn from what were thought to be statistically independent source texts from each author’s heretofore noncontested works. Care was taken in author and text selection so as to represent a wide variety of writing ability, general background, time period, literary training, genre or literary form, working vocabulary, and apparent purity of the nominally specified single author. The authors and texts (of 4998 words each unless marked otherwise) used in the verification study are as follows:
I. Samuel Clemens (Mark Twain)
A. “Does the Race of Man Love a Lord?” an essay on American and European mores (1902) in The Complete Humorous Sketches and Tales of Mark Twain, ed. Charles Neider (New York: Doubleday & Company, 1961), 686–96.
B. “Early Days,” a narrative (1875) in Mark Twain’s Autobiography (New York: Harper & Brothers Publishers, 1875), 81–123.
C. “Extracts from Adam’s Diary,” fanciful fiction, a spoofing translation, likely a satire on the Book of Mormon (1893) from “The Diary of Adam and Eve” in The Complete Short Stories of Mark Twain, ed. Charles Neider (New York: Doubleday and Company, 1985), 272–80, 288–94.
D. “Eve’s Diary (Translated from the Original),” companion to “Extracts from Adam’s Diary” (C. above), author attempting to write for two different people (1905), 281–8.
II. Oliver Cowdery
A. Written religious discourse and biographical essays from Messenger and Advocate (1830).
B. A second selection from the same article series as used in (A) (1830).
III. Dr. William Dodd
A. Life of William Shakespeare, an essay, only 3528 words (about 1770). Photocopy in possession of the author, original found in Yale Library.
IV. Robert Heinlein
A. The Number of the Beast, fanciful science-fiction narrative; first-person narrative chapters simulating the writing of his character Hilda (New York: Ballantine, 1980).
B. A second selection from The Number of the Beast, chapters simulating the first-person narrative of his character Deety (A. above).
V. Samuel Johnson
A. The Rambler, first part of the newspaper essays (1750).
B. A second selection from The Rambler (1751).
C. The Idler, newspaper essays (1758).
D. A Journey to the Western Islands of Scotland, a personal travelogue (1775).
E. A second selection from (D) above (1775).
F. The Fountains: A Fairy Tale, fanciful narrative (1766), only 4879 words (London: Elkin Mathews and Manot, 1927), 9–48.
VI. Joseph Smith
A. Autographic letters to wife Emma, friends, and the Church (1834–38) in The Personal Writings of Joseph Smith, comp. and ed. Dean C. Jessee (Salt Lake City: Deseret Book, 1984).
B. A second selection from (A) above (1836).
C. Pearl of Great Price, Joseph Smith–History 1:1–75, dictated and carefully polished with the assistance of his clerks (1834–38).
VII. Harry Steinhauer
A. “The Novella,” an essay, written in English, in Twelve German Novellas, ed. and trans. Harry Steinhauer (Berkeley: University of California Press, 1977), Introduction, ix–xxiii.
B. A second selection from (A) above plus 1000 words from (C) below (1977 and 1974).
C. “Heine and Cecile Furtado: A Reconsideration,” biographical essay, written in English, Modern Language Notes 89 (April 1974): 422–47.
VIII. Heinrich Von Kleist
A. Michael Kohlhaas, novella, written in German (about 1850), trans. Harry Steinhauer (1977; see VII. A. above).
B. A second selection from (A) above (about 1850).
C. A third selection from (A) above (about 1850).
IX. Christoph M. Wieland
A. Love and Friendship Tested, novella, written in German (about 1770), trans. Harry Steinhauer (1977; see VII. A. above).
B. A second selection from (A) above (about 1770).
Book of Mormon wordprint studies are ongoing. The author notes, “This paper would not be possible were it not for seven years of critical work by Kenneth Jenkins, a gifted scientist with an untiring demand for accuracy, and by Lewis Carroll, whose time and knowledge of information theory contributed significantly to the statistical accuracy of our wordprint model. Thanks is expressed to all of the many participants who worked on each of the projects of the Berkeley Group. The editorial assistance and continuing encouragement from my wife Jan, my son Courtland Hilton, Dow Wilson, and John Welch is gratefully acknowledged.”
1. For Larsen, Rencher, and Layton’s Book of Mormon wordprint study, see “Who Wrote the Book of Mormon? An Analysis of Wordprints,” BYU Studies 20 (spring 1980): 225—51.
2. For a detailed discussion of wordprinting single-authored texts with a few thousand words, see John L. Hilton and Kenneth D. Jenkins, “On Maximizing Author Identification by Measuring 5000 Word Texts” (Provo, Utah: FARMS, 1987).
3. Works known to be written prior to computer-aided authorship are essentially immune. In principle one can argue that a modern, computer-assisted forger could manufacture a document capable of deceiving an authorship measurement. To attempt such a forgery would be an enormous task and would still leave the forger unsure beforehand as to which of all of the possible word patterns the wordprinter would ultimately use to test the manufactured document. Of course, such a fraudulent document would be susceptible to detection by the standard procedures now used to identify any pastiche.
4. To be a valid measurement, the words must be essentially the free-flow choice of the purported author. Extensive quoting of someone else’s words is different from free paraphrasing and, of course, tends to produce a wordprint closer to the pattern of the one being quoted. Further, deliberately writing to an externally imposed pattern which restricts the normal noncontextual word choices of the writer or repetitively using normally noncontextual words in textually important ways can also change the wordprint patterns. For an example of deliberate change in a wordprint, see Tim Hiatt and John Hilton, “Can Authors Alter their Wordprints? Faulkner’s Narrators in As I Lay Dying,” Selected Papers from the Proceedings of the Sixteenth Annual Symposium, ed. Melvin Luthy (Provo, Utah: Deseret Language and Linguistic Society, 1990). Examples of these wordprint problems found in the Book of Mormon are the extensive quotations from the King James Bible and the repetitive use of the phrase and it came to pass. Proper wordprint testing must take these special problems into account.
5. A rejection results from the statistical calculation of a null-hypothesis rejection (p<.05) for any one of the tested word patterns as the two texts are compared. A rejection is considered statistically useful only for word patterns that can be found five or more times in either of the compared 5000-word texts.
6. For the 1964 study, see F. W. Mosteller and D. Wallace, Inference and Disputed Authorship: The Federalist Papers (Reading, Mass.: Addison-Wesley, 1964); second edition published as Frederick Mosteller and David L. Wallace, Applied Bayesian and Classical Inference: The Case of the Federalist Papers (New York: Springer-Verlag, 1984).
7. Morton’s arguments for using word-pattern ratios instead of simple word-use rates are found in A. Q. Morton, Literary Detection: How to Prove Authorship and Fraud in Literature and Documents (New York: Charles Scribner’s Sons, 1978).
8. Kendra L. Lindsay, “An Authorship Study of the Pauline Epistles” (master’s thesis, Brigham Young University, 1986).
9. Larsen, Rencher, and Layton, “Who Wrote the Book of Mormon? An Analysis of Wordprints.”
10. Perhaps the most significant of the precomputer studies was Glade L. Burgon’s “An Analysis of Style Variations in the Book of Mormon” (master’s thesis, Brigham Young University, 1950).
11. Some publications that support Larsen, Rencher, and Layton’s work, besides those referenced in nn. 8 and 14, include New Era 9 (November 1979): 10—3, and Noel B. Reynolds’s Book of Mormon Authorship: New Light on Ancient Origins, Religious Studies Monograph Series, vol. 7 (Provo, Utah: BYU Religious Studies Center, 1982).
Perhaps the latest neutral reference to their work, representing those in the scholarly community, would be Joseph Rudman at the Dynamic Text Conference, Toronto, Canada, 7 June 1989. In his presentation on authorship attribution in the literary computing session, Rudman noted their work as significant.
Among the anti-Book of Mormon references, likely the most extensive work provoked by the Larsen-Rencher-Layton study was an attempt at a wordprint measurement by Ernest H. Taves as reported in his book Trouble Enough: Joseph Smith and the Book of Mormon (Buffalo, N.Y.: Prometheus Books, 1984), 225–60. Unfortunately, the Taves study was fundamentally flawed, as described in the critique of his work (John L. Hilton, “Review of Ernest Taves’ Book of Mormon Stylometry,” [Provo, Utah: FARMS, 1986]), and therefore did nothing to add to or detract from their work.
12. Hilton and Jenkins, “On Maximizing Author Identification.”
13. The Berkeley Group prepared extended word listings and counts from this composite Book of Mormon manuscript computer file during the time of its preparation and verification. These studies are in my possession. Representative of these studies are the following: “A Listing of the (Salt Lake) Book of Mormon References to Passages from the Text of the Printer’s Manuscript of the Book of Mormon for the Twenty-Four Major Authors, Their Literary Forms and Word Counts”; “Differences between the 1830 Edition and the Printer’s Manuscript of the Book of Mormon”; “Word Counts and Listings of Modern (Salt Lake) Book of Mormon References to Passages from the Text of the Printer’s Manuscript of the Book of Mormon for Each of the Nineteen Authors Having More than 2000 Words in a Single Literary Form”; “Individual Vocabularies and Word Counts for Each of the Twenty-Three Sections Which Were Assigned as a Single Literary Form from Text Taken from the Printer’s Manuscript of the Book of Mormon”; “Common Phrases between the King James Bible and the Book of Mormon.”
14. D. James Croft, “Book of Mormon ‘Wordprints’ Reexamined,” Sunstone 6 (March–April 1981): 15–21.
15. Significant assistance was received from Yehuda Radday of the Department of General Studies, Technion University, Haifa, Israel; Kenneth R. Beesley, graduate student working with Sidney Michaelson and A. Q. Morton, University of Edinburgh, School of Epistemics, Edinburgh, Scotland; and A. Q. Morton, The Abbey Manse, Culross, Fife, Scotland. Personal communications.
16. Hilton and Jenkins, “On Maximizing Author Identification.”
17. Subsequent to our study of the works of two German authors, we extended our work to include three more semiclassical German novella authors, all of whom had been translated by the same German-to-English translator, Harry Steinhauer. All of our new measurements gave the same results as before: each German author’s translated work was internally consistent but distinctly different from all other translated authors’ measurements.
18. Not all translators need show these differing patterns. Some translators think their nonliteral “free translation” is preferable. Complete free translations could be expected to yield only the translator’s personal paraphrase of the ideas from the original text. In the extreme, free translations would produce only a single wordprint pattern for all of the translator’s personal writings and translations of different foreign authors’ works.
19. Typically between 40 and 47 of Morton’s 65 word patterns are measured often enough to be accepted as statistically useful. We therefore expected true within-author comparisons to show, on average, slightly over two rejections (i.e., .05 x 40 = 2) when two texts are compared at alpha = .05 (95% probability). Our results confirmed our expectations.
20. The level of confidence that two texts were written by different authors is calculated by comparing the number of measured rejections against the full within-author distribution of rejections. Using a one-tailed Student’s t test with xbar=2.58, s=1.60, df=32, we find:
7 rejections (t=2.76) gives >99.5% confidence that the two texts are statistically different and therefore written by different authors.
8 rejections (t=3.39) gives >99.9% confidence that the two texts are statistically different and therefore written by different authors.
9 rejections (t=4.02) gives >99.99% confidence that the two texts are statistically different and therefore written by different authors.
10 rejections (t=4.64) gives >99.997% confidence that the two texts are statistically different and therefore written by different authors.
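As a rough check on the figures in note 20, the sketch below recomputes each t value as (rejections - xbar) / s and estimates the one-tailed tail probability of a Student's t distribution with 32 degrees of freedom. It assumes that this is how the note's calculation is to be read. The function names and the Simpson's-rule integration are illustrative choices, not the Berkeley Group's own code, and the tail probabilities it prints may differ somewhat from the rounded confidence levels listed above.

```python
import math

XBAR, S, DF = 2.58, 1.60, 32  # within-author mean, std. dev., and df quoted in note 20


def t_statistic(rejections, xbar=XBAR, s=S):
    """Distance of an observed rejection count from the within-author mean, in units of s."""
    return (rejections - xbar) / s


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail(t, df=DF, steps=2000):
    """One-tailed P(T > t) for t >= 0: 0.5 minus a Simpson's-rule integral of the pdf on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


if __name__ == "__main__":
    for k in (7, 8, 9, 10):
        t = t_statistic(k)
        print(f"{k} rejections: t = {t:.2f}, one-tailed p = {upper_tail(t):.6f}")
```

One minus the printed tail probability corresponds to the confidence that the two texts were written by different authors, under the assumptions stated in the note.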
21. Furthermore, because the data are categorical and in a statistical sense (approximately) independent, the probability is vanishingly small that Nephi and Alma could have had the same author in spite of all four texts measuring with high rejections. The combined probability would approach 1.3 x 10^-14. (This calculation is simply the product of each of the four probabilities for same authorship, that is, one minus the probability for different authorship reported above, which would be .005 x .001 x .0001 x .00003 = 1.3 x 10^-14.) Approximate independence of the four paired-test texts is assumed, as is customary in wordprinting (see A. Q. Morton, 154–5, n. 7). This approximate simultaneous calculation shows an enormous statistical overkill, demonstrating overwhelming statistical separation between the didactic writings of the purported Book of Mormon authors Nephi and Alma.
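The arithmetic in note 21 can be sketched in a few lines. The snippet multiplies the four rounded same-author bounds exactly as they are listed in the note, on the note's own assumption that the four paired tests are approximately independent. The variable and function names are illustrative. Multiplying the rounded bounds gives about 1.5 x 10^-14, the same order of magnitude as the figure quoted in the note. The small difference presumably reflects rounding of the inputs, though that is an assumption.

```python
import math

# Upper bounds on the same-author probability for each of the four paired tests,
# copied from note 21 (each is one minus the reported different-author confidence).
same_author_bounds = [0.005, 0.001, 0.0001, 0.00003]


def combined_probability(probabilities):
    """Product of the individual probabilities; meaningful only if the tests are independent."""
    return math.prod(probabilities)


combined = combined_probability(same_author_bounds)
print(f"combined same-author probability: {combined:.2e}")
```

Because the result is a product of upper bounds, it is itself best read as an approximate upper bound, not an exact probability.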
22. Care was taken to ensure that the texts used to represent the free-flow writing of Oliver Cowdery, Solomon Spaulding, and Joseph Smith were correctly chosen for minimal editorial rework and that they were correctly entered into the computer. In the case of Joseph Smith, two of the three 5000-word files were taken from his own autobiographic writings, the third from the earliest version of his dictated work used for Pearl of Great Price, Joseph Smith–History 1:1–75. Solomon Spaulding was sampled from a certified transcript of his manuscript labeled “Manuscript Story.” Oliver Cowdery is represented by bylined articles taken from numbers of the Kirtland, Ohio, newspaper Messenger and Advocate printed during the time he was the active editor.