George Orwell wrote that “Every generation imagines itself to be more intelligent than the one that went before it, and wiser than the one that comes after it.”
And so it is with scientists. Until recently I was largely oblivious of the enormous corpus of botanical literature from the 18th and 19th centuries. Of course I was fully aware of the thousands of dusty volumes in libraries, but I stupidly assumed that they didn't contain much of worth. It wasn't until the Biodiversity Heritage Library made these volumes digitally available and searchable that I realized what a precious mine of information they contained.
Early Floras and Faunas are not just of historical interest; they contain descriptions of species, habitats and landscapes that give unique information on the biota of their time. This information is practically impossible to gather from any other source. The growth of trees can be examined using the width of their rings, and sometimes preserved remains can be found in bogs and in sediments, but for many plants there are no recent remains, except for what was written about them in books.
Europe in particular has the most to gain from the digitization of this literature. It has, by far, the biggest legacy of biodiversity literature, starting with the ancient Greeks and herbalists through to Linnaeus and the great plant hunters.
Fully digitizing a book requires scanning, optical character recognition (OCR), manual correction of the OCR results, and semantic enhancement of the text with annotations, such as the modern names of species, the coordinates of localities and links from references to their original sources. This can take a lot of effort, but digitization is work that only needs to be done once, particularly if the results are made openly available for everyone to share.
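To give a flavour of what the annotation step involves, here is a minimal sketch in Python. It is not the method used by any particular digitization project. The `Annotation` record, the `annotate_names` function, the lookup table and the sample sentence are all made up for illustration. It simply finds species names in corrected OCR text and records where they occur, along with the modern name each one should link to.

```python
import re
from dataclasses import dataclass


@dataclass
class Annotation:
    """A hypothetical annotation record for one name found in the text."""
    start: int        # offset of the first character of the match
    end: int          # offset one past the last character of the match
    text: str         # the name exactly as it appears in the OCR text
    modern_name: str  # the modern name the match should be linked to


def annotate_names(text, name_lookup):
    """Return annotations for every name in name_lookup found in text.

    name_lookup maps a name as printed in the book to its modern name.
    Matching is a plain case-insensitive string search, which is an
    assumption made for this sketch; real OCR text needs fuzzier matching.
    """
    annotations = []
    for printed, modern in name_lookup.items():
        pattern = re.compile(re.escape(printed), re.IGNORECASE)
        for match in pattern.finditer(text):
            annotations.append(
                Annotation(match.start(), match.end(), match.group(0), modern)
            )
    # Sort by position so the annotations follow the reading order.
    return sorted(annotations, key=lambda a: a.start)


# An invented sentence standing in for a line of corrected OCR text.
sample = "Chenopodium vulvaria grows on waste ground; C. vulvaria is rare."

# An illustrative lookup table: the full name and its abbreviated form.
lookup = {
    "Chenopodium vulvaria": "Chenopodium vulvaria",
    "C. vulvaria": "Chenopodium vulvaria",
}

for annotation in annotate_names(sample, lookup):
    print(annotation)
```

Storing the character offsets rather than rewriting the text keeps the original wording intact, which matters when the wording itself is the historical evidence.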
I recently published a paper in which I used biodiversity literature spanning more than 200 years to examine habitat and distribution changes in the small, smelly weed stinking goosefoot (Chenopodium vulvaria) (https://peerj.com/articles/723/). It was possible to show that the descriptions of this plant's habitat have changed with time. I look forward to the time when so much of our legacy literature is digitized that the same exercise can be conducted on hundreds of species, so that the last two hundred years' changes in biodiversity can be reconstructed from what has been written. Obviously, those changes will be viewed through the filter of those hundreds of authors' words, but it will nonetheless be a unique resource for environmental history.
If you’re interested in contributing to the digitization and transcription of biodiversity literature, investigate the Biodiversity Heritage Library, the transcription of books on Wikisource, and the full semantic publishing of legacy literature conducted by Pensoft.
Below are graphs showing the change in the use of various habitat categories that were used to describe the habitat of stinking goosefoot in literature and on specimen labels. The top four categories show significant changes, but the bottom four show no significant change.