The Natural History Production Line: An Industrial
Approach to the Digitization of Scientific Collections
MAARTEN HEERLIEN, JOOST VAN LEUSEN, STEPHANIE SCHNÖRR, SUZANNE DE JONG-KOLE,
NIELS RAES, and KIRSTEN VAN HULSEN, Naturalis Biodiversity Center
In 2010, Naturalis Biodiversity Center started one of the largest and most diverse programs for natural history collection
digitization to date. From a total collection of 37 million specimens and related objects, 7 million relevant objects are to be
digitized in a 5-year period. This article provides an overview of the program and discusses the chosen industrial production line
approach, the applied method for prioritization of collections that are to be digitized, and some preliminary results.
Categories and Subject Descriptors: E.1 [Data]: Records; I.3.3 [Computer Graphics]: Digitizing and Scanning
General Terms: Natural History, Collection Digitization
Additional Key Words and Phrases: Scientific collections, prioritization, industrial approach, production lines,
public engagement, crowdsourcing
ACM Reference Format:
Maarten Heerlien, Joost van Leusen, Stephanie Schnörr, Suzanne de Jong-Kole, Niels Raes, and Kirsten van Hulsen. 2015. The
natural history production line: An industrial approach to the digitization of scientific collections. ACM J. Comput. Cult. Herit.
8, 1, Article 3 (February 2015), 11 pages.
DOI: http://dx.doi.org/10.1145/2644822
Authors' addresses: M. Heerlien, J. van Leusen, S. Schnörr, S. de Jong-Kole, N. Raes, and K. van Hulsen, Naturalis Biodiversity
Center, Darwinweg 2, 2333 CR Leiden, The Netherlands; emails: {Maarten.Heerlien, Joost.vanLeusen, Stephanie.Schnoerr,
Suzanne.deJong-Kole, Niels.Raes, Kirsten.vanHulsen}@naturalis.nl.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided
that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with
credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from permissions@acm.org.
2015 Copyright is held by the owner/author(s). Publication rights licensed to ACM.
ACM 1556-4673/2015/02-ART3 $15.00
DOI: http://dx.doi.org/10.1145/2644822
ACM Journal on Computing and Cultural Heritage, Vol. 8, No. 1, Article 3, Publication date: February 2015.
1. INTRODUCTION
In 2010, the newly formed Naturalis Biodiversity Center, the national museum and research institute for biodiversity of the
Netherlands, was awarded a 30 million euro government grant from the Dutch Fund for Economic Structure Enhancement to give
shape to the new institute.1 From this grant, 13 million euro was allotted to a digitization program.2 Through this program,
Naturalis aims to digitize in detail a cross section of at least 7 million relevant specimens and related objects from a total
collection of 37 million specimens, whereas the remaining 30 million objects are to be digitized on a higher level in a 5-year
period. Furthermore, the program focuses on developing a sustainable infrastructure that will allow Naturalis to continue
digitization. This article provides an overview of the program, which is one of the largest natural history collection
digitization programs to date. The program approach with respect to prioritization of collections, to the digitization process,
and to public engagement is discussed. We conclude with an overview of program results so far.
1 This fund, Fonds Economische Structuurversterking in Dutch, was established in 1995 to invest profits from the natural gas
reserves from the northern regions of the Netherlands in the Dutch infrastructure and as of 2005 also in the Dutch knowledge
economy. Naturalis Biodiversity Center was among the last institutions to receive a grant from the fund, which was
discontinued in 2011.
2 The other 17 million euro was used (1) to integrate the collection of the National Museum of Natural History Naturalis with
those of the Zoological Museum of Amsterdam and of the National Herbarium of the Netherlands, with Naturalis Biodiversity
Center being the result of a merger between the three, and (2) to establish a DNA barcoding facility.
1.1 Motivation for Digitization
Natural history collections like those maintained by Naturalis are of great importance to the community at large,
as they give invaluable insights into the past and present state of global biodiversity
and potentially help to solve current challenges in many areas, such as the effects of environmental
change, public health, and crop pollination [Baird 2010]. Digitizing these collections makes the en-
closed knowledge more accessible to scientists researching these issues and facilitates more efficient
collection management as well as protection of collected specimens from overhandling. Additionally,
digitization makes virtual repatriation of collections possible to the countries from which they were
gathered—in Naturalis’ case, mainly former Dutch colonies such as Suriname and Indonesia.
1.2 Challenge
The challenge in this respect, and with regard to the heterogeneity of collection types maintained by
Naturalis, is to determine which 7 million objects out of a total of 37 million are the most relevant in
relation to current scientific, social, and economic issues and how to develop a digitization process for
these that would not exceed the 13 million euro budget, allowing for a maximum average price of 1.86
euro per digitized object, including overhead, permanent storage, and equipment costs. At the time
Naturalis applied for the grant, the average digitization cost per object was estimated to be close to
5.00 euro, based on experience from previous projects in which the traditional digitization approach was
taken. In this approach, high-quality images of selected specimens are made and all available specimen
data is registered. This labor-intensive method requires expert knowledge and is therefore costly, yet it
does not result in datasets that cover complete collections. With the objective to digitize 7 million
specimens for 1.86 euro per specimen, Naturalis needed to depart from this traditional method.
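The budget constraint described above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the figures quoted in the text; the function and variable names are illustrative and do not come from the program itself.

```python
# Figures quoted in the text; all names are illustrative only.
BUDGET_EUR = 13_000_000        # digitization budget
TARGET_OBJECTS = 7_000_000     # objects to be digitized in detail
TRADITIONAL_COST_EUR = 5.00    # estimated unit cost of the traditional approach


def max_average_cost(budget: float, objects: int) -> float:
    """Highest average cost per object that still fits the budget."""
    return budget / objects


def objects_affordable(budget: float, cost_per_object: float) -> int:
    """Number of whole objects that can be digitized at a given unit cost."""
    return int(budget // cost_per_object)


ceiling = max_average_cost(BUDGET_EUR, TARGET_OBJECTS)
traditional_reach = objects_affordable(BUDGET_EUR, TRADITIONAL_COST_EUR)

print(f"Maximum average cost: {ceiling:.2f} euro per object")
print(f"Objects affordable at {TRADITIONAL_COST_EUR:.2f} euro each: {traditional_reach:,}")
```

At the estimated traditional cost, the same budget would cover well under half of the 7 million target, which illustrates why a different process was needed.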
1.3 Approach
To deal with this challenge, an industrial approach was chosen based on the following starting points:
—To develop a framework for prioritization to determine which collections should be digitized
—To develop digitization processes based on the collection types, such as collections preserved in al-
cohol, dry collections, microscopic slides, and printed publications, enabling the digitization of any
collection in that type category regardless of its specific content
—To divide complicated and labor-intensive processes into a series of shorter tasks, each executed by
a coworker specialized in that task
—To standardize the data entry process through the use of one metadata standard and central data
management systems
—To limit metadata capture to a minimum by only registering metadata needed for collection man-
agement and for basic accessibility to researchers
—To only capture photographic reproductions of specimens where this has a proven added value
—To make use of (commercial) third parties for digitization where it makes sense (price/quality-ratio
driven).
Naturalis is not the first natural history institute to apply these kinds of starting points to develop a
large-scale digitization program. In recent years, there has been growing concern within the scientific
heritage community about the slow pace of the digitization of the estimated three billion zoological,
botanical, and geological specimens that are maintained in natural history collections worldwide
[Vollmar et al. 2010]. The Natural History Museum in London applied a similar approach in the
digitization of its entomological collection [Blagoderov et al. 2012], as did the Muséum National d'Histoire
Naturelle in Paris to digitize its entire herbarium collection, which is the largest in the world. The
Digitarium in Finland has applied a production line approach to both entomological and herbarium
collections [Tegelberg et al. 2012]. However, the Naturalis program is unique in the diversity of the
collection types that are being digitized. Whereas London, Paris, and Finland have focused on insect
drawers and herbarium sheets, Naturalis has diversified the approach to facilitate the digitization of
virtually all types of collections that natural history heritage encompasses on both the specimen and
storage unit level (Figure 1).
Fig. 1. Schematic overview of the Naturalis approach to digitization process organization.
The starting points presented earlier were worked out further in a series of 3- to 6-month pilots,
each focusing on developing and optimizing the process workflow and tools needed for the digitization
of a specific collection type, as well as developing a business case and success indicators to determine
whether or not the chosen approach was viable. The results were presented to the program’s steering
committee, consisting of the institute’s general and scientific management, to make a final decision
on whether or not a pilot would be promoted to a full project. Most pilots were taken into production,
whereas a few were deemed ineffective and were shut down. One example is a pilot for 3D capturing of
dry vertebrates, for which, at the start of the program, the investment turned out to be too large in relation
to the added value of the captured 3D images for scientists and collection managers [Van den Oever
and Gofferjé 2012]. In 2014, however, the decision not to apply 3D digitization was reevaluated, and
a new pilot for experimenting with various 3D imaging techniques (CT scanning, laser scanning, and
photogrammetry) was started in the summer of 2014.
1.4 Digistreets for Detailed Object Digitization
Projects that are taken into production are executed in so-called digitization streets, or Digistreets,
which are facilities comparable to production lines. Currently, Naturalis operates seven Digistreets, with all
except one situated within the institute itself and each equipped to handle the digitization of a specific
collection type. These are herbarium sheets, microscopic slides, entomology collections, 2D objects
(journals, rare books, archives, etc.), dry vertebrates, geological specimens, and specimens preserved
in alcohol. Two other Digistreets, the street for mollusks and the wood samples street, met their
targeted results in 2013 (the mollusk street surpassed its target by more than 100,000 specimens; Table I)
and were wound down.

Table I. Targeted Number of Objects to Be Digitized, Currently Realized Number of Digitized Objects,
and Average Cost per Object per Digistreet as of April 1, 2014

Digistreet                            Target       Realized                        Average Cost per Object in Euros
                                                                                   (including overhead)
Herbarium sheets (Leiden)             3,800,000    450,000 (3,500,000 scanned)     1.29
Microscopic slides                    900,000      350,000 (590,000 scanned)       1.57
2D objects (books, journals, etc.)    900,000 a    600,000 a                       1.87
Entomology                            850,000      600,000                         1.51
Herbarium sheets (Wageningen)         800,000      800,000                         1.47
Mollusks                              510,000      640,000                         1.37
Vertebrates dry                       325,000      180,000                         2.37
Geological specimens                  220,000      70,000                          1.90
Wood samples                          125,000      125,000                         1.27
Specimens preserved on alcohol        100,000      70,000                          4.65
Total                                 8,530,000    3,885,000
a Number of pages.
For each Digistreet, the targets are worked out in further detail based on the business case developed
in the pilot to determine how many objects can be digitized during the production phase; how many
people are needed to meet these targets; and how many objects need to be processed per day, week,
and month.
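The breakdown of a Digistreet target into daily, weekly, and monthly quotas can be sketched as follows. The Entomology target of 850,000 objects comes from Table I; the length of the production phase, the number of working days, and the output per employee are assumptions made purely for illustration and are not reported in the article.

```python
import math

# From Table I.
TARGET_OBJECTS = 850_000            # Entomology Digistreet target

# ASSUMPTIONS for illustration only (not taken from the article).
PRODUCTION_MONTHS = 36              # hypothetical 3-year production phase
WORKING_DAYS = 660                  # hypothetical 220 working days per year
DAYS_PER_WEEK = 5                   # hypothetical working week
OBJECTS_PER_PERSON_PER_DAY = 300    # hypothetical individual output


def required_rate(target: int, periods: int) -> int:
    """Objects that must be processed per period, rounded up."""
    return math.ceil(target / periods)


daily = required_rate(TARGET_OBJECTS, WORKING_DAYS)
weekly = daily * DAYS_PER_WEEK
monthly = required_rate(TARGET_OBJECTS, PRODUCTION_MONTHS)
staff = math.ceil(daily / OBJECTS_PER_PERSON_PER_DAY)

print(f"per day: {daily:,}  per week: {weekly:,}  per month: {monthly:,}  staff: {staff}")
```

Rounding up at each step keeps the plan on the safe side: a quota that is rounded down would leave the target unmet at the end of the production phase.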
1.5 Prioritization Framework
To facilitate the decision-making process on which 7 million objects are to be digitized in detail, a
framework for collection prioritization has been developed. This ensures that the digitized collections
are relevant to current scientific, educational, or economic affairs. The framework was used to fill in
80% of the total number of targeted objects per Digistreet, with 20% being reserved for additional
on-demand digitization (e.g., at the request of external research institutes).
In the first phase, scientists and collection managers are invited to submit proposals for the digiti-
zation of specific collections or parts thereof. A typical proposal contains a description of the collection
that is to be digitized as well as solid arguments for digitization, such as the expected benefits from
digitizing the specified collection with regard to current research programs and collection preservation.
In all cases where the method was applied, the total number of objects in the collections for which digi-
tization was proposed surpassed the target set for the Digistreet in question. To weigh the relevance of
the proposed projects as objectively as possible and to rank them accordingly, the second phase consists
of a two-step approach. First, a number of hard selection criteria are set, including the following (in
no particular order):
• Relation to the institute's own research priorities
• Relation to the institute's own public and educational programs
• Relation to national and international biodiversity programs
• Economic importance of the proposed collection
• Availability of existing collection documentation and data
• Physical state of the proposed collection.
The proposals are judged and ranked according to these criteria. Second, an online survey is held
among a large group of stakeholders, who are asked to rank the significance of each proposal with
regard to their own fields of expertise. In the third phase of the method, the results of both these
steps are processed into a recommendation on the most favorable proposals, which is submitted to the
program’s steering committee for a final decision.
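The ranking and selection steps described above can be illustrated with a small sketch. The article does not state how the criteria scores and the survey results are combined, so the equal weighting, the 0 to 1 score scale, the greedy fill, and the example proposals below are all assumptions; only the 80% share reserved for the framework and the Entomology target of 850,000 objects (Table I) come from the text.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Proposal:
    name: str
    objects: int            # number of objects in the proposed collection
    criteria_score: float   # step 1: score on the hard criteria (assumed 0-1 scale)
    survey_score: float     # step 2: stakeholder survey result (assumed 0-1 scale)


def recommend(proposals: List[Proposal], street_target: int,
              framework_percent: int = 80,
              criteria_weight: float = 0.5) -> Tuple[List[str], int]:
    """Rank proposals by a weighted score and fill the framework share of the target."""
    capacity = street_target * framework_percent // 100

    def combined(p: Proposal) -> float:
        # ASSUMPTION: a simple weighted mean of both steps.
        return criteria_weight * p.criteria_score + (1 - criteria_weight) * p.survey_score

    selected, used = [], 0
    for proposal in sorted(proposals, key=combined, reverse=True):
        if used + proposal.objects <= capacity:   # skip proposals that no longer fit
            selected.append(proposal.name)
            used += proposal.objects
    return selected, capacity - used


# Hypothetical proposals, invented for this example.
proposals = [
    Proposal("proposal A", 300_000, 0.9, 0.8),
    Proposal("proposal B", 250_000, 0.7, 0.9),
    Proposal("proposal C", 200_000, 0.6, 0.5),
    Proposal("proposal D", 100_000, 0.4, 0.6),
]
selected, unused = recommend(proposals, street_target=850_000)
print(selected, unused)
```

In the program itself the outcome is a recommendation to the steering committee rather than an automatic selection, so a sketch like this would at most support that final decision.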
1.6 Digistreet Production Process
After a positive decision, the collections are digitized in the Digistreet. Although the details of the
digitization process vary from street to street with respect to scanning techniques, registered data,
and so forth, the overall approach is the same for all Digistreets. Here, the process is illustrated by the
Digistreet for microscopic slides. This Digistreet, known as the Glass Street, is one of the largest that
is operational within Naturalis at this time. The target set for this street is to digitize (i.e., to capture
high-resolution images) and register all relevant label data of the entire collection of microscopic slides
maintained by Naturalis, comprising approximately 900,000 objects. In this sense, the Glass Street
diverges from regular practice, as in this case there is no need for prioritization. To reach this target,
an innovative industrial process approach was developed in which the Glass Street acts like an efficient
production line.
Step one in this line (Figure 2) is to supply the Glass Street with microscopic slides. This is done by
curators authorized to transport specimens from the storage facility to the Digistreet and back. Once
in the Digistreet, the next step is to label every slide with a unique data matrix code, which is linked
to an empty data record in the central collection registration system, after which they are placed in
a scan tray. This custom-made tray can hold 100 slides at once. The tray is scanned with the use
of a SatScan® collection scanner, a system for capturing high-resolution images of large area objects,
developed by SmartDrive. Naturalis operates two of these devices; the other is used to scan entomology
drawers containing pinned insects. (For details on SatScan, see Blagoderov et al. [2012] and Mantle
et al. [2012]). The result is a 600Mb high-resolution overview image containing all 100 slides. Subse-
quently, specially developed software cuts each individual slide from the overview image and renames
each cut-out image according to the object number contained in its data matrix code. The result is a
set of 100 individual slide images of 4Mb that are stored in an online repository and used for data reg-
istration in a later process stage. The image quality does not facilitate digital image–based specimen
research. Producing images with a resolution high enough for such purposes would take more produc-
tion time per slide as well as more digital storage capacity, thus making the process much less efficient.
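The cutting and renaming step can be sketched as follows. The article does not describe how the software locates the slides, so this example assumes a regular grid of 10 by 10 positions in the tray, assumes that the object numbers have already been decoded from the data matrix codes, and uses invented object numbers and an invented file extension.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]   # (left, upper, right, lower) in pixels


def crop_boxes(width: int, height: int, columns: int = 10, rows: int = 10) -> List[Box]:
    """Pixel boxes for a regular tray grid, listed row by row.

    ASSUMPTION: the 100 slides lie in a regular 10 x 10 grid.
    """
    cell_w = width // columns
    cell_h = height // rows
    return [
        (c * cell_w, r * cell_h, (c + 1) * cell_w, (r + 1) * cell_h)
        for r in range(rows)
        for c in range(columns)
    ]


def plan_cutouts(width: int, height: int, object_numbers: List[str]) -> Dict[str, Box]:
    """Map each output file name (derived from the object number) to its crop box."""
    boxes = crop_boxes(width, height)
    if len(object_numbers) != len(boxes):
        raise ValueError("expected one object number per tray position")
    return {f"{number}.tif": box for number, box in zip(object_numbers, boxes)}


# Hypothetical object numbers and overview image size.
numbers = [f"SLIDE.{i:07d}" for i in range(1, 101)]
plan = plan_cutouts(20_000, 30_000, numbers)
print(plan["SLIDE.0000001.tif"], plan["SLIDE.0000100.tif"])
```

Each box uses the (left, upper, right, lower) convention, so it could be handed to an image library's crop routine to produce the individual slide images.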
Using this scanning production line, the Glass Street scans between 2,000 and 2,500 slides per day,
with four to five employees operating it, each executing one step in the process. The scan tray is used for
the normal size microscopic slides of 1 by 3 inches. However, 30% of the collection consists of geological
slides that have a different size. These are scanned while in their storage trays. Besides the normal
and geological slides, there is a collection of slides that have irregular sizes. Depending on the
number and size, these slides are either photographed individually or scanned using a specially
sized scan tray. After the scanning process, the microscopic slides are returned to their original storage
container, to which each individual slide is digitally linked. Subsequently, the container is digitally
linked to its physical storage location.
In the last step of the Glass Street production line, the label data from the microscopic slides is
registered. To do so, street workers use the high-resolution images instead of the slides themselves.
This ensures minimal handling of the physical objects. The images are automatically linked to their
corresponding data entry record by the unique code that was assigned to the slide at the beginning
of the production line. Because of this, the image is visible once the corresponding data record in the
collection registration system is opened and the label data can be registered. Entering the data is a
manual process. Most labels are handwritten, ruling out automatic data capture.
Fig. 2. Schematic representation of the Glass Street production process. Illustration: Ben van Arkel.
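The automatic linking of images to data records relies on the unique code acting as a shared key. A minimal sketch of that idea is shown below; it assumes that each image file is named after its object number and that records can be looked up by that number. The record layout, file names, and object numbers are invented and do not reflect the actual collection registration system.

```python
from pathlib import Path
from typing import Dict, List


def link_images(records: Dict[str, dict], image_paths: List[str]) -> List[str]:
    """Attach each image to the record whose object number equals the file stem.

    Returns the paths that could not be matched, so they can be reviewed by hand.
    """
    unmatched = []
    for path in image_paths:
        object_number = Path(path).stem        # "SLIDE.0000001.tif" -> "SLIDE.0000001"
        record = records.get(object_number)
        if record is None:
            unmatched.append(path)
        else:
            record["image"] = path
    return unmatched


# Hypothetical empty records, created when the codes were assigned to the slides.
records = {"SLIDE.0000001": {}, "SLIDE.0000002": {}}
leftover = link_images(records, [
    "scans/SLIDE.0000001.tif",
    "scans/SLIDE.0000002.tif",
    "scans/SLIDE.0000099.tif",   # no record exists for this code
])
print(records, leftover)
```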
1.7 Outsourcing
Although six of the seven currently operational Digistreets are situated within Naturalis and are
operated by permanent and temporary staff members, operation of the seventh Digistreet, aimed at
digitizing herbarium sheets from Leiden, Amsterdam, and Utrecht,3 has been outsourced to Picturae,
a Dutch service provider in the field of collection digitization and digital collection management. This
Herbarium Street launched in July 2013 and was the last of the originally planned Digistreets to
become operational. It is also the largest Digistreet within the Naturalis digitization program, with
a target to digitize 3,800,000 herbarium sheets, that is, to capture high-resolution images of them and
to register all of the relevant label data. To reach this target in a cost-effective way, an innovative process
has been developed that closely resembles an actual industrial production line.
3 In addition to the herbarium sheets within the Leiden collection, there is also a (smaller) herbarium collection in the city of
Wageningen. This collection is digitized in a separate, internal Digistreet in the Wageningen location of the former National
Herbarium of the Netherlands, now part of Naturalis Biodiversity Center; see the Introduction and Table I.
First, the Herbarium Street is provided with boxes of herbarium sheets. Because of the volume of the
operation, this is done by a professional transport company. Once in the Digistreet, the box is labeled
with a unique data matrix code, after which it is emptied and placed onto a conveyor belt. Subsequently,
the corresponding covers and herbarium sheets are labeled with unique data matrix codes and placed
onto the conveyor belt as well. All three items are photographed with a Nikon D800e camera. Built-in
software checks the label and reads out color and sharpness. If an error occurs, the conveyor belt stops
and automatically takes a few steps back so that the photo can be retaken. In all, it takes 8 seconds to
produce the final file.
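The check-and-retake behavior of the conveyor belt can be sketched as a simple retry loop. The thresholds, the structure of the quality report, and the limit on the number of attempts are assumptions made for illustration; the article only states that the label, color, and sharpness are checked and that the photo is retaken after an error.

```python
from typing import Callable, Dict, Tuple

# Hypothetical thresholds; the article does not state the actual limits.
MIN_SHARPNESS = 0.8
MAX_COLOR_ERROR = 2.0


def passes_checks(photo: Dict) -> bool:
    """Accept a photo only if the label is readable and the quality values are in range."""
    return (photo["label_readable"]
            and photo["sharpness"] >= MIN_SHARPNESS
            and photo["color_error"] <= MAX_COLOR_ERROR)


def photograph_with_retry(capture: Callable[[], Dict],
                          max_attempts: int = 3) -> Tuple[Dict, int]:
    """Capture until a photo passes the checks or the attempts run out."""
    for attempt in range(1, max_attempts + 1):
        photo = capture()
        if passes_checks(photo):
            return photo, attempt
        # On the real line, the belt would stop and step back here before the retake.
    raise RuntimeError(f"no acceptable photo after {max_attempts} attempts")


# Simulated camera: the first shot is blurred, the second one is acceptable.
shots = iter([
    {"label_readable": True, "sharpness": 0.5, "color_error": 1.0},
    {"label_readable": True, "sharpness": 0.9, "color_error": 1.0},
])
photo, attempts = photograph_with_retry(lambda: next(shots))
print(f"accepted after {attempts} attempt(s)")
```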
The output of the process is a 300ppi TIFF image of the herbarium sheet, a 150ppi image of the
box and of the cover, and a comma-separated values (CSV) file. This file contains all information
from the image that is needed to start the data entry. At the end of the conveyor belt, the herbarium
sheets are packed into their boxes again in the exact same order as they were before digitization,
after which they are transported back to Naturalis. Using this production line, the Herbarium Street
is able to digitize between 22,000 and 24,000 herbarium sheets a day, with three conveyor belts and
12 employees operating it.4
The last step of the Herbarium Street production line concerns the metadata entry. This is done
in Paramaribo, Suriname, by a team of 40 employees trained in transcribing handwritten labels. For
this purpose, JPEG derivatives of the high-resolution images created in Leiden are used. As in the Glass
Street, the images are automatically linked to their corresponding data entry record by the unique data
matrix code that was assigned to them in the Herbarium Street at the beginning of the production line.
Entering the data is a manual process. Most labels are handwritten, and some of them are hundreds
of years old, ruling out automatic data capture.
1.8 Public Engagement
Since the digitization program is entirely financed with public funds, transparency with respect to
expenditure, approach, and results is key. Here, the Glass Street plays an important role, as it is
situated in one of the museum exhibition spaces where it functions as a public demonstrator of the
digitization program. In this exhibition space, called LiveScience, museum visitors get to observe the
digitization process and interact with the Digistreet workers, being only separated from them by a low,
open barrier.5 Before the launch of the Glass Street in LiveScience in April 2013, the exhibition space
was home to the Mollusk Street, which opened in May 2011 and reached (and surpassed) its target of
400,000 specimens in early 2013, after which it was wound down.
The public is also invited to participate in digitization. For the Mollusk Street and the Glass Street,
applications were developed that enable enthusiasts to transcribe object labels. The Web app developed
for Mollusk transcription was primarily aimed at fostering appreciation for the digitization of scientific
heritage among museum visitors by letting them try it for themselves and, to a lesser extent, at
producing high volumes of user-generated collection records.6 The Glass Street crowdsourcing application
took this to the next level. Here, an existing Dutch online platform for transcribing handwritten
heritage objects, VeleHanden (Many Hands), was used for a full-scale crowdsourcing project aimed at
the Dutch-speaking regions, called Glashelder! (roughly translating to Crystal Clear).7 The public was
encouraged to sign up for the project on the VeleHanden platform and to transcribe microscopic slide
labels, aided by a comprehensive manual, FAQs, and a forum for project members to discuss occurring
issues among themselves and with museum staff. Glashelder! served as an experiment to determine to
what extent online transcription of natural history data by volunteers can contribute, both quantitatively
and qualitatively, to collection digitization. To determine this, a separate production target was
set for the project (transcription of 100,000 microscopic slides in a period of 6 to 9 months), as well as
a set of success indicators.
4 For a visual presentation of the Herbarium Street digitization process, see http://www.youtube.com/watch?v=hmG4twyHXkE.
5 See http://www.naturalis.nl/en/museum/livescience.
6 See http://www.naturalis.nl/en/museum/livescience/crowd-sourcing.
7 See http://www.velehanden.nl/projecten/bekijk/details/project/nat_nbc (in Dutch).
Fig. 3. The transcription module of the Glashelder! project. Left: The current data entry form with (top to bottom) the fields
scientific name, author, and year (of publication of the scientific name), sex, type specimen, host species (relevant in case of
parasites), gathering country, locality, collection date, collector, number of specimens, and previous registration code. Right:
Microscopic slide image with several Aphids of the species Necatorosiphon persicae Sulzer. Image: Vele Handen–Picturae. With
permission of Picturae.
The Glashelder! project was launched on March 26, 2013, after a 1-month trial period during which
members of the VeleHanden platform (i.e., people with little, if any, knowledge about natural his-
tory collections) tested the transcription module and supporting documentation. Based on their user
feedback, several changes were made to the project design, most notably a simplification of the
transcription form (Figure 3). The first version of the form left too much room for interpretation while at
the same time not providing enough room for the recording of exceptions in, for instance, the zoological
nomenclature, which raised many questions among the test crowd. The final form was reduced to
11 fields.
The Glashelder! project is regarded as a success. On December 30, 2013, 9 months after launch, the
last of the 100,000 glass slide labels was transcribed, whereas the validation of the transcriptions was
finished on January 19, 2014. During the project, a total of 511 participants signed up, about one
third of whom did so in the first project month. In part, this can be credited to a media campaign,
but mostly the project was able to benefit from the community already present at VeleHanden. About
150 project members were regarded as active participants, each having transcribed up to 1,000 labels.
During this project, 23 members were regarded as super participants, each having transcribed
1,000 slides or more. Although the project was not specifically aimed at biologists and did not require
any prior knowledge of the domain to participate, an inquiry among the project members showed that
most either had a professional background in the natural sciences or were nature enthusiasts with an
above-average interest in and knowledge of the species preserved in the slides, mostly mites, aphids,
and springtails.
Together, the participants produced 200,000 transcriptions in 9 months, resulting in 100,000 val-
idated label transcriptions, as the label of each microscopic slide was transcribed twice by different
participants with a third, more experienced, participant validating the two transcriptions using a
comparison tool and submitting a final version. The average daily number of produced transcriptions
over the course of the entire project was 712 slide labels, whereas the highest number of slide labels
transcribed in 1 day was 1,914 (August 7, 2013). On average, 334 slides were validated daily. The first
of the crowd-generated datasets, the Collembola or Springtails set, has been published through the
Global Biodiversity Information Facility (GBIF), with the rest of the data to come.8
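The double transcription with validation described above can be illustrated with a small comparison routine. This is a sketch of the general idea only and not the comparison tool used on the VeleHanden platform; the normalization rule, the field names (borrowed from the form in Figure 3), and the sample values are assumptions.

```python
from typing import Dict, Optional, Tuple


def _normalise(value: Optional[str]) -> str:
    """Ignore case and whitespace differences (an assumed rule, for illustration)."""
    return " ".join((value or "").split()).casefold()


def compare_transcriptions(first: Dict[str, str],
                           second: Dict[str, str]) -> Dict[str, Tuple]:
    """Return the fields on which two independent transcriptions disagree."""
    fields = sorted(set(first) | set(second))
    return {
        field: (first.get(field), second.get(field))
        for field in fields
        if _normalise(first.get(field)) != _normalise(second.get(field))
    }


# Invented sample values for one slide label, entered by two participants.
first = {"scientific name": "Myzus persicae", "collector": "J. Jansen", "country": "Netherlands"}
second = {"scientific name": "myzus  persicae ", "collector": "J. Janssen", "country": "Netherlands"}

differences = compare_transcriptions(first, second)
print(differences)   # only these fields need the validator's attention

# Two transcriptions per label explain the totals reported in the text.
validated_labels = 100_000
total_transcriptions = 2 * validated_labels
```

Showing the validator only the fields that differ keeps the third pass short, which matters when each of the 100,000 labels has to be reviewed.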
1.9 Application of Digitized Collections
The largest part of the collections digitized during the program was chosen because of relevance to
current research topics. Therefore, the digitized collections play a vital role in the national and in-
ternational research programs that Naturalis leads or is otherwise involved in, such as the current
research program on the decline of pollinators in Europe and the effects on crop pollination and food
supply [Carvalheiro et al. 2013], in which the digitized bee and bumblebee collections were used to
analyze changes in the occurrences of these pollinators over the past 60 years. In addition, the
high-resolution images taken of the bumblebees are currently used to develop algorithms for automated
species identification based on their wing venation, the patterns of which provide the only means to
distinguish between some species of bumblebees.
The digitization of the herbarium sheets contributes to biological conservation research that identi-
fies global hotspots of botanical diversity, how past (glacial) climatic conditions have shaped the spatial
distribution of hotspots, and how future climate change predictions will likely impact their distribu-
tion. The digitized records are also used to determine the distribution and growth conditions of plants
that are the (crop) wild relatives of human food crops [Raes et al. 2013]. The breeding of crop species
with species that are evolutionarily closely related and are currently found to grow under warmer, drier,
or more saline conditions can result in crops that are more resistant to future climate change [IPCC 2014];
this type of research is known as climate-smart agriculture.
6 See http://www.gbif.org/dataset/4f8de55f-5967-46c4-b689-31de17090ed4.
1.10 Current Program Status
Currently, the digitization program is approaching the end of its fourth year and its third year of
full-scale production. In the past 3 years, all Digistreets were made operational and close to 4 million
specimens and related objects have been digitized. The realized average cost of digitization per object
varies from street to street because the objects treated differ in nature. However, through ongoing
optimization of the production processes and tools, the overall average cost of object digitization
has been reduced to 1.52 euro per object, including overhead costs. This enables Naturalis to digitize
1.5 million more collection objects during the digitization program than was originally planned. Table I
provides an overview of the targets and the objects realized as of April 1, 2014, and the average cost
per object per Digistreet. In addition, 70,000 entomology drawers have been digitized at a higher, less
detailed level by registering each drawer, the species and quantities it contains, the geographic
location from which the species were gathered, and the exact depot location of the drawer.7 Based on
that experience, a new project was started to treat the remaining specimens, which cannot be digitized
in detail, in a similar way. In this 30M project (where the M stands for million), the remaining
specimens will be digitized at the storage unit level, a storage unit being any unit in (or on) which
objects are stored, such as a box, drawer, or shelf. The 30 million specimens to which the project name
refers are stored in approximately 150,000 of these units, spread across several collection facilities.
During the 30M project, the taxonomic and geographical information recorded on the exterior of each
storage unit is registered in the central collection registration system, whereas specimen-specific
information is not recorded. This way, the basic information of all 37 million collection specimens
will be digitized, traceable, and virtually accessible by the end of 2015.
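As an illustration of registration at the storage unit level, the following minimal Python sketch models one such record and works out the average number of specimens that a single record stands for. All field names and example values are hypothetical and are not taken from the Naturalis collection registration system; only the totals of 30 million specimens and 150,000 units come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class StorageUnitRecord:
    """One registered storage unit; every field name here is hypothetical."""
    unit_id: str
    unit_type: str                 # e.g. "box", "drawer", "shelf"
    depot_location: str            # where the unit is kept
    taxa: list[str] = field(default_factory=list)     # names read from the exterior label
    regions: list[str] = field(default_factory=list)  # geographic origin on the label
    # Deliberately no per-specimen fields: the 30M project does not record them.


def mean_specimens_per_unit(total_specimens: int, total_units: int) -> float:
    """Average number of specimens represented by one storage-unit record."""
    return total_specimens / total_units


example = StorageUnitRecord(
    unit_id="UNIT-000001",                     # made-up identifier
    unit_type="drawer",
    depot_location="hypothetical depot, aisle 3",
    taxa=["Papilionidae"],
    regions=["Indonesia"],
)
print(example.unit_type, mean_specimens_per_unit(30_000_000, 150_000))
```

Under these figures, one storage unit record represents about 200 specimens on average, which indicates how coarse the metalevel registration is compared with specimen-level digitization.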
A series of additional projects to develop central data registration systems and workflows contributed
significantly to these positive results, most notably the implementation of a central collection
registration system for zoological and geological specimens and the subsequent implementation of a data
model to fit all collection data (the ABCD 2.06 standard for biological collections), thus aligning data
registration in all Digistreets.8 Additionally, a central workflow for processing the images captured in
the various Digistreets has been developed in cooperation with the Dutch National Institute for Sound and
Vision. Through this workflow, the captured images from each Digistreet are centrally processed into
user copies of various resolutions, whereas the original images, mostly TIFF files, are sent to a facility
for durable storage of digital content at Sound and Vision, all overnight. This process is capable of
handling 15,000 images totaling 600 GB per day.
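To put the stated capacity in perspective, the short Python sketch below derives the average image size and the sustained transfer rate it implies. It assumes that the daily volume means 600 gigabytes in decimal units, and the 8-hour overnight window is a purely illustrative assumption that the text does not state.

```python
def mean_image_size_mb(total_gb: float, image_count: int) -> float:
    """Average size per image in MB, using decimal units (1 GB = 1000 MB)."""
    return total_gb * 1000 / image_count


def sustained_rate_mb_per_s(total_gb: float, window_hours: float) -> float:
    """Transfer rate in MB/s needed to move total_gb within window_hours."""
    return total_gb * 1000 / (window_hours * 3600)


DAILY_VOLUME_GB = 600        # from the text, read as gigabytes (assumption)
DAILY_IMAGES = 15_000        # from the text
OVERNIGHT_HOURS = 8          # hypothetical processing window

print(mean_image_size_mb(DAILY_VOLUME_GB, DAILY_IMAGES))
print(round(sustained_rate_mb_per_s(DAILY_VOLUME_GB, OVERNIGHT_HOURS), 1))
```

On these assumptions, the average original image is about 40 MB, which is plausible for uncompressed TIFF files, and the workflow would need to sustain roughly 21 MB/s throughout the assumed window.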
With regard to visibility of the digitized collections, 3 million specimen records have been made
available through GBIF.9 In addition, a cross section of about 100,000 digitized objects has been
made available to Europeana,10 whereas the digitized 2D objects are being made available through the
European branch of the Biodiversity Heritage Library.11 The release of Naturalis’ own Web-accessible
public collection portal is planned for the end of the third quarter of 2014, together with a public
API that enables third parties to retrieve sets of data and multimedia from several of Naturalis’ core
content management systems. To advance the reuse of the digitized collections further, the data and
images generated in the Digistreets are provided under the Creative Commons Zero (CC0) copyright
waiver, effectively placing them in the public domain.
7 This is done separately from the Entomology Street, where a selection from the entomology collections,
850,000 specimens out of a total of 17 million specimens, is digitized in detail. See http://youtu.be/TywNYCigY0k.
8 For the botanical collections, a dedicated collection registration system (Brahms) and uniform data model
were already in place at the beginning of the program.
9 See http://www.gbif.org/publisher/396d5f30-dea9-11db-8ab4-b8a03c50a862.
10 See http://www.europeana.eu/portal/search.html?query=DATA_PROVIDER%3A"Naturalis+Biodiversity+Center".
11 See http://www.bhl-europe.eu/.
1.11 Conclusion
At this time, the Naturalis program remains one of the largest and most diverse digitization programs
in the natural history community. Although it may not seem like it, with little more than 3 million
specimens and related objects digitized in 3 years and more than 5 million objects to be digitized in
the remaining year, the program is well on schedule. Most of the remaining objects are to be digitized by
the Leiden Herbarium Street, which has been operational since July 2013 and has already produced
92% of its targeted images; the data transcription is up to speed as well. The regular Digistreets will
digitize the other 1.3 million objects. The digitization program will come to an end in June 2015.
By then, Naturalis Biodiversity Center will have made 23% of its entire collection digitally available
in detail, with the rest of it on a metalevel, and the institute will possess the digital infrastructure as
well as the expertise to keep digitizing in the years that follow.
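The 23% figure can be checked against the totals given earlier in the article. The Python sketch below assumes that the share is derived from the original target of 7 million objects plus the 1.5 million additional objects made possible by the lower cost per object; the article does not spell out this derivation, so the breakdown is an inference.

```python
TOTAL_COLLECTION = 37_000_000   # specimens and related objects in the collection
ORIGINAL_TARGET = 7_000_000     # objects to be digitized in detail, as first planned
EXTRA_OBJECTS = 1_500_000       # additional objects enabled by the lower unit cost


def detailed_share(total: int, *parts: int) -> float:
    """Fraction of the collection covered by the given object counts."""
    return sum(parts) / total


share = detailed_share(TOTAL_COLLECTION, ORIGINAL_TARGET, EXTRA_OBJECTS)
print(f"{share:.1%}")
```

Under this assumption, 8.5 million of 37 million objects amounts to about 23%, which agrees with the share stated above.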
REFERENCES
R. Baird. 2010. Leveraging the fullest potential of scientific collections through digitization. Biodiversity Informatics 7, 130–136.
V. Blagoderov, I. J. Kitching, L. Livermore, T. J. Simonsen, and V. S. Smith. 2012. No specimen left behind: Industrial scale
digitization of natural history collections. ZooKeys 209, 133–146.
L. G. Carvalheiro, W. E. Kunin, P. Keil, J. Aguirre-Gutiérrez, W. N. Ellis, R. Fox, Q. Groom, S. Hennekens, W. Van Landuyt,
D. Maes, F. Van de Meutter, D. Michez, P. Rasmont, B. Ode, S. G. Potts, M. Reemer, S. P. Roberts, J. Schaminée,
M. F. WallisDeVries, and J. C. Biesmeijer. 2013. Species richness declines and biotic homogenisation have slowed down for
NW-European pollinators and plants. Ecology Letters 16, 870–878.
B. L. Mantle, J. La Salle, and N. Fisher. 2012. Whole-drawer imaging for digital management and curation of a large
entomological collection. ZooKeys 209, 147–163.
R. K. Pachauri and L. Meyer (Eds.). 2014. Climate Change 2014: Synthesis Report. IPCC.
N. Raes, L. G. Saw, P. C. van Welzen, and T. Yahara. 2013. Legume diversity as indicator for botanical diversity on Sundaland,
South East Asia. South African Journal of Botany 38, 265–272.
R. Tegelberg, J. Haapala, T. Mononen, M. Pajari, and H. Saarenmaa. 2012. The development of a digitising service centre for
natural history collections. ZooKeys 209, 75–86.
J. P. Van den Oever and M. Gofferjé. 2012. From pilot to production: Large scale digitisation project at Naturalis Biodiversity
Center. ZooKeys 209, 87–92.
A. Vollmar, J. A. Macklin, and L. S. Ford. 2010. Natural history specimen digitization: Challenges and concerns. Biodiversity
Informatics 7, 93–112.
Received February 2014; revised April 2014; accepted July 2014
ACM Journal on Computing and Cultural Heritage, Vol. 8, No. 1, Article 3, Publication date: February 2015.
... For example, the field of anthracology, the study of ancient wood charcoal, requires access to reference collections of wood tissues, both historical and modern. Unfortunately, only a few herbaria digitization and imaging projects have included collections besides herbarium sheets, such as slides; moreover, when slides are digitized, they may not be imaged in a way that allows for specimen level research, prohibiting further scientific use of the digitized collection (Allan et al., 2019;Decker et al., 2018;Heerlien et al., 2015;Musson et al., 2020). ...
... For slide imaging projects, there have been a variety of solutions that weigh each of these constraints differently. The slide imaging projects from the Naturalis Biodiversity Center, Netherlands (Heerlien et al., 2015) and the Natural History Museum, London (Allan et al., 2019) prioritized efficiency and cost by digitizing hundreds of thousands of slides in a short amount of time, with a limited number of staff, by taking overhead photos of batches of 100 slides and using optical character recognition (OCR) to automate metadata capture. This approach relies on strict staging and label protocols to digitize the record of each specimen. ...
Article
Full-text available
As herbaria move to digitize their collections, the question remains of how to efficiently digitize collections other than standard herbarium sheets, such as wood slide collections. Beginning in September 2018, the Harvard University Herbaria began a project to image and digitize the wood slides contained in the Bailey-Wetmore Wood Collection. The primary goal of this project was to produce images of the wood tissue that could be used for specimen-level research and to make them available on the internet for remote scholarship. A secondary goal was to establish best practices for digitizing and imaging a microscope slide collection of tissue sections. Due to the size of the wood slide collection (approximately 30,000 slides), a medical histology scanner and virtual microscopy software were used to image these slides. This article outlines the workflow used to create these images and compares the results with digital resources currently available for wood anatomy research. Prior to this project, the very little of the Bailey-Wetmore Wood Collection was cataloged digitally and none of it was imaged, which made access to this unique collection difficult. By imaging and digitizing 6605 slides in the collection, this project has demonstrated how other institutions can make similar slide collections available to the broader scientific community.
... Technological advances and innovative workflows are allowing natural history museums to enter a new age of mass digitisation of NHCs (e.g., Blagoderov et al. 2012;Heerlien et al. 2015;Hudson et al. 2015;Blagoderov et al. 2017). Modern imaging technologies also enable scientists to extract new data from the same specimens (e.g., Schmidt et al. 2013;Cunningham et al. 2014). ...
Thesis
Present-day ecological communities and the deep-time fossil record both inform us about the processes that give rise to, and maintain, diversity of life on Earth. However, these two domains differ in temporal, spatial and taxonomic scales. Integrating these scales remains a major challenge in biodiversity research, mainly because the fossil record gives us an incomplete picture of the extinct communities. Planktonic Foraminifera provide an excellent model system to integrate present and past changes in biodiversity. They are single-celled marine zooplankton that produce calcite shells, yielding a remarkably complete fossil record across millions of years, and are alive today enabling genetic and ecological studies. Their fossil record has been widely used in the fields of stratigraphy and palaeoclimate. However, we have limited knowledge about their ecology, preventing us from fully understanding the evolutionary processes that shaped their diversity through time. The primary objective of this thesis is to improve our understanding of community ecology of extant planktonic Foraminifera species, to enable us to more comprehensively study their fossil record. I created a large image dataset of over 16,000 individuals from a historical museum collection (Chapter 2) and assessed its potential biases (Chapter 3). Using the data gathered from the collection, I investigated the extent to which individuals of the same species vary in shell size (Chapter 4). Size relates to many physiological and ecological characteristics of an organism, thus understanding how it varies within species and across space gives us insights about the function of the species in the ecosystem. Planktonic Foraminifera species greatly differ in how much size variation is explained by environmental (temperature and productivity) and/or ecological (local relative abundance) conditions, suggesting that the known pattern of large size at favourable conditions is not widespread in the group. 
Next, I explored how planktonic Foraminifera species interact with each other in ecological communities (Chapter 5). Their fossil record suggests that competition among species is an important ecological interaction limiting the number of species that can emerge within the group. I tested whether species are competing today in the oceans, and found no evidence for negative interactions. This result suggests that either the ecological processes acting on communities today are different than the ones driving planktonic Foraminifera evolution, or that competition among species did not shape the patterns we observe in their fossil record. Together, these discoveries extend our current understanding of planktonic Foraminifera biology and highlight the complexity of ecological dynamics. Future work using the planktonic Foraminifera fossil record to understand marine biodiversity changes will require scientific research across different scales as well as considering other interacting plankton groups.
... To digitize and store specimen metadata for each swallowtail together with their image, the Collection Registration System (CRS) in use by Naturalis was chosen. CRS was developed during the FES Collection Digitization project (Heerlien et al. 2015) from 2010 to 2015 to store collection related data and support collection management activities. It now holds over 8 million specimen records at object level and 32 million specimens at species/storage unit level. ...
Article
Full-text available
In terms of amateurs and professionals studying and collecting insects, Lepidoptera represent one of the most popular groups. It is this popularity, in combination with wings being routinely spread during mounting, which results in Lepidoptera often taking up the largest number of drawers and space in entomological collections. As resources grow increasingly scarce in natural history museums, any process that results in more efficient use of resources is a welcome addition to collection management practices. Therefore, we propose an alternative method to process papered Lepidoptera: a workflow to digitize (imaging and data registration) papered specimens and to store them (semi)permanently, still unmounted, in glassine envelopes. The mounting of specimens will be limited to those for which it is considered essential. The entire workflow of digitization and repacking can be carried out by non-expert volunteers. By releasing data and images on the internet, taxonomic experts worldwide can assist with identifications. This method was tested for Papilionidae. Results suggest that the workflow and permanent storage in glassine envelopes described here can be applied to most groups of Lepidoptera.
... In 2015, the Museum's DCP ran a pilot project for mass digitisation of microscope slides using a multi-slide imaging template and downstream image segmentation (Summerfield et al. 2019), similar to that run at Naturalis, Leiden (Heerlien et al. 2015). This pilot project utilised a volunteer workforce of 45 people, in teams of 3 -7 people per day, to scañ 100,000 microscope slides over 10 months using the SatScan (Smartdrive Ltd.) and industrial approaches as described by Blagoderov et al. (2012). ...
Article
Full-text available
The Natural History Museum, London (NHM) has embarked on an ambitious programme to digitise its collections. One aim of the programme has been to improve the workflows and infrastructure needed to support high-throughput digitisation and create comprehensive digital inventories of large scientific collections. This paper presents the workflow developed to digitise the entire Phthiraptera (parasitic lice) microscope slide collection (70,663 slides). Here we describe a novel process of semi-automated mass digitisation using both temporary and permanent barcode labels applied before and during slide imaging. By using a series of barcodes encoding information associated with each slide (i.e. unique identifier, location in the collection and taxonomic name), we can run a series of automated processes, including file renaming, image processing and bulk import into the NHM’s collection management system. We provide data on the comparative efficiency of these processes, illustrating how simple activities, like automated file renaming, reduces image post-processing time, minimises human error and can be applied across multiple collection types.
Article
The digitization of herbarium collections is greatly transforming plant biodiversity science, yet most herbarium data remain inaccessible. Here, we present a novel, single‐user photostation and associated workflow for efficiently mobilizing herbarium specimens. Our apparatus represents a significant improvement to existing technology and is scalable to a variety of digitization tasks from small to massive collections.
Article
Full-text available
DiSSCo, the Distributed System of Scientific Collections, is a pan-European Research Infrastructure (RI) mobilising, unifying bio- and geo-diversity information connected to the specimens held in natural science collections and delivering it to scientific communities and beyond. Bringing together 120 institutions across 21 countries and combining earlier investments in data interoperability practices with technological advancements in digitisation, cloud services and semantic linking, DiSSCo makes the data from natural science collections available as one virtual data cloud, connected with data emerging from new techniques and not already linked to specimens. These new data include DNA barcodes, whole genome sequences, proteomics and metabolomics data, chemical data, trait data, and imaging data (Computer-assisted Tomography (CT), Synchrotron, etc.), to name but a few; and will lead to a wide range of end-user services that begins with finding, accessing, using and improving data. DiSSCo will deliver the diagnostic information required for novel approaches and new services that will transform the landscape of what is possible in ways that are hard to imagine today. With approximately 1.5 billion objects to be digitised, bringing natural science collections to the information age is expected to result in many tens of petabytes of new data over the next decades, used on average by 5,000 – 15,000 unique users every day. This requires new skills, clear policies and robust procedures and new technologies to create, work with and manage large digital datasets over their entire research data lifecycle, including their long-term storage and preservation and open access. Such processes and procedures must match and be derived from the latest thinking in open science and data management, realising the core principles of 'findable, accessible, interoperable and reusable' (FAIR). 
Synthesised from results of the ICEDIG project ("Innovation and Consolidation for Large Scale Digitisation of Natural Heritage", EU Horizon 2020 grant agreement No. 777483) the DiSSCo Conceptual Design Blueprint covers the organisational arrangements, processes and practices, the architecture, tools and technologies, culture, skills and capacity building and governance and business model proposals for constructing the digitisation infrastructure of DiSSCo. In this context, the digitisation infrastructure of DiSSCo must be interpreted as that infrastructure (machinery, processing, procedures, personnel, organisation) offering Europe-wide capabilities for mass digitisation and digitisation-on-demand, and for the subsequent management (i.e., curation, publication, processing) and use of the resulting data. The blueprint constitutes the essential background needed to continue work to raise the overall maturity of the DiSSCo Programme across multiple dimensions (organisational, technical, scientific, data, financial) to achieve readiness to begin construction. Today, collection digitisation efforts have reached most collection-holding institutions across Europe. Much of the leadership and many of the people involved in digitisation and working with digital collections wish to take steps forward and expand the efforts to benefit further from the already noticeable positive effects. The collective results of examining technical, financial, policy and governance aspects show the way forward to operating a large distributed initiative i.e., the Distributed System of Scientific Collections (DiSSCo) for natural science collections across Europe. Ample examples, opportunities and need for innovation and consolidation for large scale digitisation of natural heritage have been described. 
The blueprint makes one hundred and four (104) recommendations to be considered by other elements of the DiSSCo Programme of linked projects (i.e., SYNTHESYS+, COST MOBILISE, DiSSCo Prepare, and others to follow) and the DiSSCo Programme leadership as the journey towards organisational, technical, scientific, data and financial readiness continues. Nevertheless, significant obstacles must be overcome as a matter of priority if DiSSCo is to move beyond its Design and Preparatory Phases during 2024. Specifically, these include: Organisational: Strengthen common purpose by adopting a common framework for policy harmonisation and capacity enhancement across broad areas, especially in respect of digitisation strategy and prioritisation, digitisation processes and techniques, data and digital media publication and open access, protection of and access to sensitive data, and administration of access and benefit sharing. Pursue the joint ventures and other relationships necessary to the successful delivery of the DiSSCo mission, especially ventures with GBIF and other international and regional digitisation and data aggregation organisations, in the context of infrastructure policy frameworks, such as EOSC. Proceed with the explicit aim of avoiding divergences of approach in global natural science collections data management and research. Strengthen common purpose by adopting a common framework for policy harmonisation and capacity enhancement across broad areas, especially in respect of digitisation strategy and prioritisation, digitisation processes and techniques, data and digital media publication and open access, protection of and access to sensitive data, and administration of access and benefit sharing. 
Pursue the joint ventures and other relationships necessary to the successful delivery of the DiSSCo mission, especially ventures with GBIF and other international and regional digitisation and data aggregation organisations, in the context of infrastructure policy frameworks, such as EOSC. Proceed with the explicit aim of avoiding divergences of approach in global natural science collections data management and research. Technical: Adopt and enhance the DiSSCo Digital Specimen Architecture and, specifically as a matter of urgency, establish the persistent identifier scheme to be used by DiSSCo and (ideally) other comparable regional initiatives. Establish (software) engineering development and (infrastructure) operations team and direction essential to the delivery of services and functionalities expected from DiSSCo such that earnest engineering can lead to an early start of DiSSCo operations. Adopt and enhance the DiSSCo Digital Specimen Architecture and, specifically as a matter of urgency, establish the persistent identifier scheme to be used by DiSSCo and (ideally) other comparable regional initiatives. Establish (software) engineering development and (infrastructure) operations team and direction essential to the delivery of services and functionalities expected from DiSSCo such that earnest engineering can lead to an early start of DiSSCo operations. Scientific: Establish a common digital research agenda leveraging Digital (extended) Specimens as anchoring points for all specimen-associated and -derived information, demonstrating to research institutions and policy/decision-makers the new possibilities, opportunities and value of participating in the DiSSCo research infrastructure. 
Establish a common digital research agenda leveraging Digital (extended) Specimens as anchoring points for all specimen-associated and -derived information, demonstrating to research institutions and policy/decision-makers the new possibilities, opportunities and value of participating in the DiSSCo research infrastructure. Data: Adopt the FAIR Digital Object Framework and the International Image Interoperability Framework as the low entropy means to achieving uniform access to rich data (image and non-image) that is findable, accessible, interoperable and reusable (FAIR). Develop and promote best practice approaches towards achieving the best digitisation results in terms of quality (best, according to agreed minimum information and other specifications), time (highest throughput, fast), and cost (lowest, minimal per specimen). Adopt the FAIR Digital Object Framework and the International Image Interoperability Framework as the low entropy means to achieving uniform access to rich data (image and non-image) that is findable, accessible, interoperable and reusable (FAIR). Develop and promote best practice approaches towards achieving the best digitisation results in terms of quality (best, according to agreed minimum information and other specifications), time (highest throughput, fast), and cost (lowest, minimal per specimen). Financial Broaden attractiveness (i.e., improve bankability) of DiSSCo as an infrastructure to invest in. Plan for finding ways to bridge the funding gap to avoid disruptions in the critical funding path that risks interrupting core operations; especially when the gap opens between the end of preparations and beginning of implementation due to unsolved political difficulties. Broaden attractiveness (i.e., improve bankability) of DiSSCo as an infrastructure to invest in. 
Plan for finding ways to bridge the funding gap to avoid disruptions in the critical funding path that risks interrupting core operations; especially when the gap opens between the end of preparations and beginning of implementation due to unsolved political difficulties. Strategically, it is vital to balance the multiple factors addressed by the blueprint against one another to achieve the desired goals of the DiSSCo programme. Decisions cannot be taken on one aspect alone without considering other aspects, and here the various governance structures of DiSSCo (General Assembly, advisory boards, and stakeholder forums) play a critical role over the coming years.
Article
Documenting, naming and classifying the diversity of life on Earth provides baseline information on the biosphere, which is crucially important to understand and mitigate the global changes of the Anthropocene. We should meet three main challenges, using new technological developments without throwing the well-tried and successful foundations of Linnaean nomenclature overboard. 1. Fully embrace cybertaxonomy, machine learning and DNA taxonomy to ease, not burden the workflow of taxonomists. 2. Emphasize diagnosis over description, images over words. 3. Understand promises and pitfalls of omics approaches to avoid taxonomic inflation.
Article
Full-text available
More and more herbaria are digitising their collections. Images of specimens are made available online to facilitate access to them and allow extraction of information from them. Transcription of the data written on specimens is critical for general discoverability and enables incorporation into large aggregated research datasets. Different methods, such as crowdsourcing and artificial intelligence, are being developed to optimise transcription, but herbarium specimens pose difficulties in data extraction for many reasons. To provide developers of transcription methods with a means of optimisation, we have compiled a benchmark dataset of 1,800 herbarium specimen images with corresponding transcribed data. These images originate from nine different collections and include specimens that reflect the multiple potential obstacles that transcription methods may encounter, such as differences in language, text format (printed or handwritten), specimen age and nomenclatural type status. We are making these specimens available with a Creative Commons Zero licence waiver and with permanent online storage of the data. By doing this, we are minimising the obstacles to the use of these images for transcription training. This benchmark dataset of images may also be used where a defined and documented set of herbarium specimens is needed, such as for the extraction of morphological traits, handwriting recognition and colour analysis of specimens.
Article
Full-text available
Access to digitised specimen data is a vital means to distribute information and in turn create knowledge. Pooling the accessibility of specimen and observation data under common standards and harnessing the power of distributed datasets places more and more information and the disposal of a globally dispersed work force, which would otherwise carry on its work in relative isolation, and with limited profile and impact. Citing a number of higher profile national and international projects, it is argued that a globally coordinated approach to the digitisation of a critical mass of scientific specimens and specimen-related data is highly desirable and required, to maximize the value of these collections to civil society and to support the advancement of our scientific knowledge globally.
Article
Full-text available
The Global Legume Diversity Assessment (GLDA) proposes the legume family (Fabaceae or Leguminosae) – one of the largest and economically important plant families – as a target for a global botanical diversity assessment project. Where in the Neotropics and Africa legumes dominate the rain forest in terms of diversity and abundance, the Dipterocarpaceae claim this role in South East Asia and on Sundaland in particular. This raises the question whether legumes are an indicator for overall botanical diversity on Sundaland? To answer this question we use the largest compiled database of collection records of the region and species distribution modelling techniques. As a proxy for total botanical diversity we selected seven plant families; Dipterocarpaceae, Ericaceae, Fagaceae, Lauraceae, Moraceae, Myristicaceae, and Sapindaceae. Although the legumes were the most diverse family, the predictive power of legume diversity for overall botanical diversity was poor. This related to the fact that the other seven selected families largely represent trees, whereas legume species more equally represent all different growth forms. After assigning individual legume species to different growth habits (tree, liana, herb, miscellaneous) we were able to predict 78% of the variance in botanical diversity on Sundaland. The lianas represent the single growth habit that best predicted (66%) the variance in botanical diversity. The herb- and miscellaneous growth habits had an inverse relationship to botanical diversity. Legumes can be used as a predictor of overall botanical diversity in tropical and seasonal rain forests, but the relationship should be fitted for different biogeographic regions individually.
Article
Full-text available
Concern about biodiversity loss has led to increased public investment in conservation. Whereas there is a widespread perception that such initiatives have been unsuccessful, there are few quantitative tests of this perception. Here, we evaluate whether rates of biodiversity change have altered in recent decades in three European countries (Great Britain, Netherlands and Belgium) for plants and flower visiting insects. We compared four 20-year periods, comparing periods of rapid land-use intensification and natural habitat loss (1930-1990) with a period of increased conservation investment (post-1990). We found that extensive species richness loss and biotic homogenisation occurred before 1990, whereas these negative trends became substantially less accentuated during recent decades, being partially reversed for certain taxa (e.g. bees in Great Britain and Netherlands). These results highlight the potential to maintain or even restore current species assemblages (which despite past extinctions are still of great conservation value), at least in regions where large-scale land-use intensification and natural habitat loss has ceased.
Article
Full-text available
Whole-drawer imaging is shown to be an effective tool for rapid digitisation of large insect collections. On-line, Whole-drawer images facilitate more effective collection management, virtual curation, and public engagement. The Whole-drawer imaging experience at the Australian National Insect Collection is discussed, with an explanation of workflow and examples of benefits.
Traditional approaches for digitizing natural history collections, which include both imaging and metadata capture, are labour- and time-intensive. Mass-digitization can only be completed if the resource-intensive steps, such as specimen selection and databasing of associated information, are minimized. Digitization of larger collections should employ an "industrial" approach, using the principles of automation and crowdsourcing, with minimal initial metadata collection including a mandatory persistent identifier. A new workflow for the mass-digitization of natural history museum collections based on these principles, and using the SatScan® tray scanning system, is described.
By the end of 2009, the Dutch Government had awarded €30M in funding for the establishment of NCB Naturalis. The amount is invested in three programmes: Scientific Infrastructure for DNA Barcoding, Integration and Relocation of Collections, and Collection Digitisation. In this article we describe the highlights of the Digitisation Programme.
Digitarium is a joint initiative of the Finnish Museum of Natural History and the University of Eastern Finland. It was established in 2010 as a dedicated shop for the large-scale digitisation of natural history collections. Digitarium offers service packages based on the digitisation process, including tagging, imaging, data entry, georeferencing, filtering, and validation. During the process, all specimens are imaged, and distance workers take care of the data entry from the images. The customer receives the data in Darwin Core Archive format, as well as images of the specimens and their labels. Digitarium also offers the option of publishing images through Morphbank, sharing data through GBIF, and archiving data for long-term storage. Service packages can also be designed on demand to respond to the specific needs of the customer. The paper also discusses logistics, costs, and intellectual property rights (IPR) issues related to the work that Digitarium undertakes.
A survey on the challenges and concerns involved with digitizing natural history specimens was circulated to curators, collections managers, and administrators in the natural history community in the spring of 2009, with over 200 responses received. The overwhelming barrier to digitizing collections was a lack of funding or issues directly related to funding, leaving institutions mostly responsible for providing the necessary support. The uneven digitization landscape leads to a patchy accumulation of records of varying quality, based on different priorities, ultimately influencing the data's fitness for use. The survey results also indicated that although the kinds of specimens found in collections and their storage can be quite variable, there are many similar challenges across disciplines when digitizing, including imaging, automated text scanning and parsing, and geo-referencing. Thus, better communication between domains could foster knowledge on digitization, leading to efficiencies that could be disseminated through documentation of best practices and training.
K. Pachauri and L. Meyer (eds.). Climate Change 2014, Synthesis Report. (IPCC, 2014).