from Medieval Academy News
Publish or Perish: How It Used to Work for Texts as Well as Authors
by John L. Cisne
Manuscript
traditions grow like family trees and can be studied in at least
as many ways. So far, however, study of the growth process has concentrated
almost exclusively on only one of these: deducing the branching
in a manuscript tradition's stemma from inferred slips of the pen,
in much the same way that a family tree's branching now can be deduced
from inferred slips in the copying of DNA. I described a complementary
alternative in "How Science Survived: The 'Demography' of Manuscripts
and the Survival of Classic Texts," Science 307 (2005), 1305–7.
The approach
itself was pioneered on family trees by Charles Darwin's younger
cousin Francis Galton, one of the founders of mathematical statistics.
Concerned for the survival of Britain's peerage, Galton posed what
came to be known as the Family Name Problem. This can be stated
in various ways: given statistical fluctuations in the growth of
the family tree, how is the number of a peer's male heirs likely
to change with time? How likely is the peer's title to pass eventually
to collateral relatives? How long could they expect to wait?
Change a few
words and the Family Name Problem becomes the Publish or Perish
Problem discussed here: how likely was a text to perish through
statistical fluctuation in scribes' "publication" of manuscripts?
How are surviving manuscripts now likely to be distributed by age?
The Family
Name Problem crops up in many other contexts: the growth of
a biological population, the fate of a mutant gene, the contagion
of a disease, the detonation of an atomic bomb. The common element
is that the maximum rate at which the statistical population can
grow is directly proportional to the population's size: the larger
the population, the faster it can grow. This is the recipe for exponential
growth. My paper points out that it also applies to manuscript traditions.
Unbeknownst to me when I wrote it, this had already been pointed
out by the late Michael P. Weitzman in "The Evolution of Manuscript
Traditions," Journal of the Royal Statistical Society, Series A, 150
(1987), 287–308. Had this young philologist lived, he would
have left me with nothing to add and much to read.
Obviously,
each of these examples has unique attributes that must be taken
into account in building a successful model. To get a better sense
of the abstractions, approximations, and simplifications involved,
let us concentrate for now on Galton's work rather than my own because
everyone has a feeling for what it is to be a twig on a genealogical
tree but not on a stemmatic one.
In an instance
of lèse-majesté perhaps unsurpassed in science, Galton conceptualized
the growth of Britain's great families in such general terms that
his model could apply to bacteria. As if inspired by Gilbert and
Sullivan's arch-aristocrat Pooh-Bah, who traced his ancestry back
to "a protoplasmal primordial atomic globule," Galton supposed that
at any given instant a peer has a certain probability λ per unit
time of budding off a son and heir, and likewise a certain probability
μ per unit time of dying. The statistical population's expected
growth rate per capita will be the difference between the birth
probability λ and the death probability μ.
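In equation form, and assuming the standard linear birth-and-death setup (each individual independently reproduces with probability λ and dies with probability μ per unit time, so a lineage of N individuals has total rates λN and μN), the expected lineage size grows or decays exponentially at the per-capita rate λ − μ. The symbol N_0 for the founding size is an editorial choice, not the paper's notation.

```latex
\frac{d\,\mathbb{E}[N(t)]}{dt} = (\lambda - \mu)\,\mathbb{E}[N(t)]
\quad\Longrightarrow\quad
\mathbb{E}[N(t)] = N_0\, e^{(\lambda - \mu)\,t}
```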
Galton's predictions
tended to confirm his fears. If death probability μ exceeds birth
probability λ, the lineage is doomed. It is expected to decay exponentially
in size and to go extinct with probability one. If birth probability
λ exceeds death probability μ, on the other hand, the lineage is
expected to grow exponentially, but may go extinct anyway. In theory,
a lineage founded by a single peer will go extinct with probability μ/λ if it can grow
indefinitely large, or with probability one if not. In practice,
a potentially viable population is expected either to be extinct
within relatively few generations or to have grown so large it likely
will survive until doomsday.
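The extinction claim can be checked by simulation. The sketch below is an illustration, not code from the paper. It assumes the linear birth-and-death model, with birth probability λ (`lam`) and death probability μ (`mu`) per individual per unit time, and starts each lineage from a single founder. It also treats any lineage that reaches a hypothetical ceiling `cap` as a survivor. Only the order of births and deaths decides extinction, so the sketch follows that sequence of events and never draws event times. The function and parameter names are hypothetical.

```python
import random


def extinction_probability(lam: float, mu: float, trials: int = 5000,
                           cap: int = 50, seed: int = 1) -> float:
    """Estimate the chance that a linear birth-and-death lineage founded by
    a single individual eventually dies out.

    With n individuals the total birth rate is lam * n and the total death
    rate is mu * n, so the next event is a birth with probability
    lam / (lam + mu) whatever n is. Extinction depends only on this event
    sequence, so event times are never drawn.
    """
    if lam <= 0 or mu < 0 or cap < 2:
        raise ValueError("need lam > 0, mu >= 0 and cap >= 2")
    rng = random.Random(seed)
    p_birth = lam / (lam + mu)
    extinct = 0
    for _ in range(trials):
        n = 1
        # Assumption: a lineage that reaches `cap` counts as a survivor.
        # For lam > mu its exact remaining extinction chance, (mu/lam)**cap,
        # is negligible at the default cap.
        while 0 < n < cap:
            n += 1 if rng.random() < p_birth else -1
        if n == 0:
            extinct += 1
    return extinct / trials


if __name__ == "__main__":
    for lam, mu in [(1.0, 0.5), (1.0, 0.2), (0.5, 1.0)]:
        estimate = extinction_probability(lam, mu)
        theory = min(1.0, mu / lam)
        print(f"lam={lam}, mu={mu}: simulated {estimate:.3f}, theory {theory:.3f}")
```

When λ > μ the estimates should land near μ/λ, and when μ > λ they should land near one. The same arithmetic explains the "few generations or doomsday" dichotomy. A lineage that has already reached n members dies out with probability (μ/λ)^n, which shrinks rapidly as n grows.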
Change a few
words and the preceding applies to manuscript traditions.
What Galton
discovered has come to be known as the birth-and-death process,
one form of the branching process. Though built to describe statistical
populations that either explode or fizzle, the birth-and-death model
can be recast to describe other statistical processes in which the
population can approach a steady state. I adapted one of these,
the logistic process, which occurs so widely that it was once held
up as a universal law of growth. In population biology, the logistic
differential equation has proved so important that one introductory
textbook actually pictured it emerging from a cloud, like some figure
on the ceiling of the Sistine Chapel.
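For reference, the usual textbook form of the logistic differential equation appears below. Here r is the intrinsic growth rate, K the carrying capacity, and N_0 the starting size. This is the generic equation, not the specific adaptation used in the paper, whose details are not given here.

```latex
\frac{dN}{dt} = r\,N\left(1 - \frac{N}{K}\right)
\quad\Longrightarrow\quad
N(t) = \frac{K}{1 + \dfrac{K - N_0}{N_0}\, e^{-r t}}
```

Growth is nearly exponential while N is much smaller than K. It levels off as N approaches K, which produces the S-shaped curve.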
Once a model
has been sketched out, the next step in applying the scientific
method is to test its predictions against observations to determine
whether it could indeed apply to the real world. Galton pioneered
many of the standard testing procedures. One useful statistic is
the squared standard deviation of the differences between observed
and predicted values divided by the total variance, the squared
standard deviation of the observed values. This can be written 1 − R²,
where R² is the coefficient of determination, the fraction
of the total variance explained by the model. For the perfect, too-good-to-be-true
model (R² = 1), predictions plotted against the observations fall along
a perfectly straight line. For the utterly hopeless model (R² = 0),
points typically form a round, featureless cloud.
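The statistic can be sketched in Python as follows. The function name is hypothetical. The sketch uses the common sum-of-squares form, R² = 1 − SS_res/SS_tot. This agrees with the variance-ratio description above whenever the differences between observed and predicted values average to zero.

```python
from typing import Sequence


def coefficient_of_determination(observed: Sequence[float],
                                 predicted: Sequence[float]) -> float:
    """Return R^2 = 1 - SS_res / SS_tot for paired observations and predictions."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if len(observed) < 2:
        raise ValueError("need at least two observations")
    mean_obs = sum(observed) / len(observed)
    # Total sum of squares: spread of the observations about their mean.
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    if ss_tot == 0:
        raise ValueError("observations have zero variance; R^2 is undefined")
    # Residual sum of squares: the part the model leaves unexplained.
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0, 4.0]
    print(coefficient_of_determination(obs, obs))                   # -> 1.0
    print(coefficient_of_determination(obs, [2.5, 2.5, 2.5, 2.5]))  # -> 0.0
```

The second call shows the hopeless case. A "model" that predicts only the mean explains none of the variance.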
The idea in
developing a successful model is to explain as much as possible
with as little as possible, that is, to maximize R² using a plausible
equation with as few constants as possible to be
estimated in fitting a curve to the data. For manuscript traditions,
as for biological populations, the logistic model is about as simple
as can be. My version predicts the distribution of surviving manuscripts
by age given the number of surviving manuscripts and the time of
the text's appearance. The shape of the curve is determined by μ/λ, the ratio
of the death to the birth probabilities encountered above, and it
changes continuously from an S-shaped logistic curve (μ/λ = 0) to
a more or less exponential growth curve (μ/λ > 0.2).
The paper
tests this no-frills model on four likely candidates, all works
by the Venerable Bede on technical matters. It fits curves to the
data points by maximizing R², as described above, while simultaneously
estimating λ and μ.
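The fitting step can be sketched as a brute-force search that keeps whichever parameter pair gives the largest R². This illustrates the idea under stated assumptions and is not the paper's procedure. The sketch swaps in the textbook logistic curve, with growth rate r, ceiling K, and a fixed starting size n0, in place of the paper's model in λ and μ. It uses a simple grid search and fits noise-free synthetic data generated from known parameters. All names are hypothetical.

```python
import math
from typing import Iterable, Sequence, Tuple


def logistic(t: float, r: float, k: float, n0: float) -> float:
    """Textbook logistic curve N(t) with growth rate r, ceiling k, start n0."""
    return k / (1.0 + (k - n0) / n0 * math.exp(-r * t))


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """R^2 = 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


def fit_by_max_r2(times: Sequence[float], observed: Sequence[float], n0: float,
                  r_grid: Iterable[float],
                  k_grid: Iterable[float]) -> Tuple[float, float, float]:
    """Return (best R^2, r, k) over every combination in the two grids."""
    k_values = list(k_grid)  # reused for every r, so materialize once
    best = (-math.inf, math.nan, math.nan)
    for r in r_grid:
        for k in k_values:
            predicted = [logistic(t, r, k, n0) for t in times]
            score = r_squared(observed, predicted)
            if score > best[0]:
                best = (score, r, k)
    return best


if __name__ == "__main__":
    times = [0.5 * i for i in range(11)]                   # 0 to 5 "centuries"
    data = [logistic(t, 2.0, 100.0, 1.0) for t in times]   # synthetic, noise-free
    r_grid = [i / 10 for i in range(5, 41)]                # 0.5 to 4.0
    k_grid = [float(k) for k in range(50, 151, 5)]         # 50 to 150
    print(fit_by_max_r2(times, data, 1.0, r_grid, k_grid))
```

A grid search is the crudest choice but makes the criterion explicit. With real, noisy counts, a finer grid or a numerical optimizer would normally replace it.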
So how well
does the model stand up to scrutiny? In each case, the model explains
more than 95% of the variance (R² > 0.95), leaving less than 5%
to be explained by any number of real or imagined complicating factors.
For similar data on biological populations living in field or laboratory
under conditions favorable for logistic growth, this would be considered
very good agreement. Past a certain point, scrutinizing a simple
model becomes as pointless as looking at a map with a microscope.
Trying to explain the unexplained 5% without more and better data
seems well past that point.
To verify
that the model can indeed be falsified when tested (qualifying it
as science as opposed to pseudoscience), my paper's online supplement
takes advantage of the well-documented perturbation of the English
monastery system by Vikings to demonstrate, using Bede's History
of the English Church and People, that the model does not test positive
where it is not supposed to.
The results,
rough as they are: λ ~ 3/century, μ ~ 0.1/century, and μ/λ ~ 0.03,
which translates as a logistic-looking curve shaped more like a
running sigma than an S-shaped logistic one.
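As a rough arithmetic check on these figures, take λ as the birth probability and μ as the death probability per century. The ratio follows directly, and the difference λ − μ is the expected per-capita growth rate in the early, nearly exponential phase. These are simple derived values, not additional results from the paper.

```latex
\frac{\mu}{\lambda} \approx \frac{0.1}{3} \approx 0.033 \approx 0.03,
\qquad
\lambda - \mu \approx 3 - 0.1 = 2.9 \ \text{per century}
```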
Conclusions?
At least for Bede's four works, Bernhard Bischoff was about right
after all in estimating that roughly one in seven manuscripts survives
in some form from Carolingian libraries. Medieval librarians drew
up remarkably short inventories, but claimed losses beyond measure
after the barbarians came or the building burned down. Insurance
adjustors must see cases like this all the time.
Feedback on
the paper has been something of an experiment in itself. From the
first, the signal seems not to have been received too clearly at
the other end of C. P. Snow's Bridge between the Two Cultures. The
resulting confusion even echoed back to the other side, and perhaps
even back again. Readers are invited to judge for themselves from
Science 307 (2005), 1208–9; 309 (2005), 698–701; 310 (2005),
1618, and, especially in the last case, their online supplements
(available at the Science website through libraries or other subscribers).
Two Cultures
are too many. I interpret the experiment as showing the need to
reverse the trend toward even more, and for my part resolve to improve
my deplorable Latin and all but nonexistent Greek.
Editor's note:
John Cisne teaches the course on dinosaurs at Cornell University.
He thanks Robert Ziomkowski, the medievalist patiently collaborating
with him in following up on the research discussed here, who deserves
much credit and none of the blame.