
Long Term Digital Preservation Storage Infrastructures for Libraries, Archives and Research Institutions

Abstract

This presentation surveys the landscape of digital preservation storage infrastructures for libraries, archives and research institutions. It introduces digital preservation storage and the unique library models and characteristics that shape it. It then focuses pragmatically on how to get started with a working group and methodology, following the example of Texas State University Libraries and discussing background and history, the necessary middleware (Archivematica), storage needs estimates and consortial possibilities (Texas Digital Library). Next, it reviews recommendation methods, using environmental scans, peer-group comparison and a progressively narrowed focus to arrive at a digital storage recommendation and solution. Final candidates for long-term digital storage are reviewed through a cost/benefit analysis that outlines the options, considerations and benefits of various existing services: Preservica, in-house solutions, Amazon Web Services (Glacier, S3), Chronopolis, DuraCloud, Lyrasis, DuraSpace and the Texas Digital Library. The presentation concludes with a look at the larger rationales for long-term digital storage. This overview should suit any institution considering adopting and implementing a long-term digital preservation storage solution, and the methodologies and tools needed to get started.
Long Term
Digital Preservation
Storage Infrastructures
for Libraries, Archives and
Research Institutions
Ray Uzwyshyn, Ph.D. MBA MLIS
Director Collections and Digital Services
Texas State University Libraries
http://rayuzwyshyn.net , r_u@txstate.edu
February, 2020
What is Library Digital
Preservation Storage?
Simply put, very long-term storage.
The University Libraries, the Wittliff Collections and University Archives increasingly collect and gather digital information, media and data.
This data requires longer-term storage in line with research library standards (ISO 16363, ISO 16919, ISO 14721) and longer-term, new-millennium archival perspectives.
Digital Preservation in Research Libraries
follows a Unique Library Model
3-Legged Stool Model
Organization: Leverages existing human resources in libraries to build on their archival/stewardship expertise for the digital age
Technology: Synthesizes technological capabilities with traditional library archival/collection preservation models
Resources: Utilizes both library human resources and library network resources
Anne Kenney/Nancy McGovern, 2007
Unique Characteristics of Long-Term Digital Preservation
Migration and preservation of formats for long-term storage (normalization)
Risk mitigation for data and content: multiple bit-level copies, stored in locations that are disparate geographically, administratively, and technologically
Leverages the libraries' role in academic environments as keeper of the scholarly record in a digital sphere
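To make the bit-level risk-mitigation point concrete, here is a minimal Python sketch of a fixity audit across replicas. It assumes, as a simplification, that every copy is reachable as a local directory; real copies held in geographically, administratively and technologically separate domains would be reached through each provider's own service. The helper names (`audit_replicas`, `is_consistent`) and the node names are hypothetical.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large media files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_replicas(relative_path: str, replica_roots: list[Path]) -> dict[str, str | None]:
    """Hypothetical helper: checksum one object in every replica (None if the copy is missing)."""
    return {
        str(root): sha256_of(root / relative_path) if (root / relative_path).is_file() else None
        for root in replica_roots
    }


def is_consistent(results: dict[str, str | None]) -> bool:
    """True only if every replica is present and all copies are bit-identical."""
    values = list(results.values())
    return None not in values and len(set(values)) == 1


if __name__ == "__main__":
    # Simulate three copies of one preservation master, then damage one of them.
    with tempfile.TemporaryDirectory() as tmp:
        roots = [Path(tmp) / node for node in ("node_a", "node_b", "node_c")]
        for root in roots:
            root.mkdir()
            (root / "master.tif").write_bytes(b"preservation master bytes")
        print(is_consistent(audit_replicas("master.tif", roots)))  # True
        (roots[2] / "master.tif").write_bytes(b"bit rot")
        print(is_consistent(audit_replicas("master.tif", roots)))  # False
```

Keeping several independent copies only helps if they are compared regularly; a mismatch tells staff which copy to repair from the others.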
Texas State University Libraries
Digital Preservation Working Group
Background & History
Formed in 2015; consists of members of Libraries Digital and Web Services (Digitization Lab, Institutional Repositories), University Archives, Wittliff Collections and Library General Collections
The group began by investigating and then authoring the Libraries' first Digital Preservation Policy Document (August 2016), establishing benchmark minimums for preservation masters, etc.
Created Dedicated Local Server Space for Preservation Files
and Use Files with TR
Opened and developed an ongoing relationship with the Windows team (Todd)
2016-2018 New Digital Preservation Tools,
Platforms and Resources Became Available
Archivematica: Middleware standard for Digital
Preservation Metadata and Integrity
Archivematica bundles micro-services for normalizing files, managing
metadata and verifying file types, bit-level integrity (checksums) etc.
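The micro-service idea (many small, independent steps applied in sequence to each file) can be sketched in a few lines of Python. This is not Archivematica's code or API: Archivematica delegates these steps to dedicated tools, whereas here a toy signature table stands in for a real format registry such as PRONOM, and the `Package` type and step names are hypothetical.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable

# Toy signature table (hypothetical stand-in for a real format registry such as PRONOM).
SIGNATURES: dict[bytes, str] = {
    b"%PDF-": "application/pdf",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"II*\x00": "image/tiff",
    b"MM\x00*": "image/tiff",
    b"\xff\xd8\xff": "image/jpeg",
}


@dataclass
class Package:
    name: str
    data: bytes
    metadata: dict[str, str] = field(default_factory=dict)


def identify_format(pkg: Package) -> Package:
    """Micro-service 1: guess the file type from its leading bytes."""
    pkg.metadata["format"] = next(
        (mime for sig, mime in SIGNATURES.items() if pkg.data.startswith(sig)),
        "application/octet-stream",
    )
    return pkg


def compute_fixity(pkg: Package) -> Package:
    """Micro-service 2: record a SHA-256 checksum for later bit-level verification."""
    pkg.metadata["sha256"] = hashlib.sha256(pkg.data).hexdigest()
    return pkg


def record_event(pkg: Package) -> Package:
    """Micro-service 3: timestamp the ingest (a simplified stand-in for a PREMIS-style event)."""
    pkg.metadata["ingested_at"] = datetime.now(timezone.utc).isoformat()
    return pkg


PIPELINE: list[Callable[[Package], Package]] = [identify_format, compute_fixity, record_event]


def run_pipeline(pkg: Package) -> Package:
    """Apply each micro-service in order; each step only adds to the metadata."""
    for step in PIPELINE:
        pkg = step(pkg)
    return pkg


if __name__ == "__main__":
    pkg = run_pipeline(Package("scan_0001.tif", b"II*\x00" + b"\x00" * 16))
    print(pkg.metadata["format"])  # image/tiff
```

Because each step is independent, a step can be swapped or added (for example, a normalization step) without touching the others, which is the main appeal of the micro-service design.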
Texas State began R&D with Archivematica on Ubuntu Linux and first deployed a production-level instance on a new Red Hat Linux platform
University Archives and Wittliff Collections began experimenting with, learning and using the software
All areas gained expertise in the metadata/middleware workflow process (Archivematica) used to create AIPs (Archival Information Packages) to safely store, archive and retrieve files and metadata for later use
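Archivematica packages its AIPs following the BagIt convention (RFC 8493): a `data/` payload directory, a `bagit.txt` declaration and a checksum manifest. As a rough illustration of that packaging convention only (not Archivematica's own code, and omitting the METS/PREMIS metadata a real AIP carries), the sketch below builds and verifies a minimal bag. The function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import shutil
import tempfile
from pathlib import Path


def make_bag(source_dir: Path, bag_dir: Path) -> Path:
    """Copy source_dir into bag_dir/data and write bagit.txt plus a SHA-256 payload manifest."""
    payload = bag_dir / "data"
    shutil.copytree(source_dir, payload)  # creates bag_dir/data; fails if it already exists

    # Bag declaration required by BagIt (RFC 8493).
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )

    # One "<checksum>  <path relative to the bag>" line per payload file.
    lines = [
        f"{hashlib.sha256(f.read_bytes()).hexdigest()}  {f.relative_to(bag_dir).as_posix()}"
        for f in sorted(payload.rglob("*"))
        if f.is_file()
    ]
    manifest = bag_dir / "manifest-sha256.txt"
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest


def verify_bag(bag_dir: Path) -> bool:
    """Re-hash every file listed in the manifest; a missing or altered file fails the check."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        digest, rel_path = line.split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file() or hashlib.sha256(target.read_bytes()).hexdigest() != digest:
            return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        transfer = Path(tmp) / "transfer"
        transfer.mkdir()
        (transfer / "thesis.pdf").write_bytes(b"%PDF-1.7 example")
        aip = Path(tmp) / "aip"
        print(make_bag(transfer, aip).read_text(encoding="utf-8"))
        print(verify_bag(aip))  # True
```

Storing the manifest inside the package is what lets a later retrieval prove the files are the ones originally archived.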
Digital Preservation Group
Conducted Initial Digital
Storage Needs Estimate (2016)
Conclusions: 10-12 TB/year needed for all access files (not permanent digital storage), now requiring 60-70 TB
University Archives:
Thesis project: 500 GB per year
Yearbook/Football negatives: 235 GB per year
San Marcos Daily Record negatives: 1500 GB per year
Audio digitization: 500 GB per year
Misc imaging: 500 GB per year
Wittliff Collections:
Unique digitization projects: Lonesome Dove Dailies (20 TB), Powers (10 TB), Broyles (300 GB), Jerry Jeff Walker 2# reel tapes
O'Connor Collection/new major donation example (2 TB)
Austin Film Festival: 1.5 TB per year (2+ years)
Misc imaging: 2 TB per year
Audio digitization: 200 GB per year
General Collections:
Streaming media archive: 2 TB per year (General Collections are covered by LOCKSS and Portico memberships)
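The itemized figures can be totalled with a few lines of Python as a sanity check on the 10-12 TB/year conclusion. This sketch assumes decimal units (1 TB = 1000 GB), treats the "per year" entries as recurring and the named Wittliff projects as one-time, includes the streaming media archive, and omits the Jerry Jeff Walker tapes, for which no size is given.

```python
# Figures transcribed from the 2016 estimate, in GB (assumption: 1 TB = 1000 GB).
RECURRING_GB_PER_YEAR = {
    "University Archives": {
        "Thesis project": 500,
        "Yearbook/Football negatives": 235,
        "San Marcos Daily Record negatives": 1500,
        "Audio digitization": 500,
        "Misc imaging": 500,
    },
    "Wittliff Collections": {
        "Austin Film Festival": 1500,
        "Misc imaging": 2000,
        "Audio digitization": 200,
    },
    "General Collections": {
        "Streaming media archive": 2000,
    },
}

# One-time projects (the Jerry Jeff Walker tapes have no stated size, so they are left out).
ONE_TIME_GB = {
    "Lonesome Dove Dailies": 20000,
    "Powers": 10000,
    "Broyles": 300,
    "O'Connor Collection": 2000,
}


def annual_total_gb() -> int:
    """Sum every recurring per-year line across all units."""
    return sum(sum(items.values()) for items in RECURRING_GB_PER_YEAR.values())


def backlog_total_gb() -> int:
    """Sum the one-time project sizes."""
    return sum(ONE_TIME_GB.values())


if __name__ == "__main__":
    for unit, items in RECURRING_GB_PER_YEAR.items():
        print(f"{unit}: {sum(items.values()) / 1000:.3f} TB/year")
    print(f"Recurring total: {annual_total_gb() / 1000:.3f} TB/year")  # 8.935 TB/year
    print(f"One-time projects: {backlog_total_gb() / 1000:.1f} TB")   # 32.3 TB
```

Under these assumptions the itemized recurring lines come to about 8.9 TB/year and the one-time projects to 32.3 TB, so the rounded 10-12 TB/year conclusion sits somewhat above the itemized recurring sum.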
2016-2018 Texas Digital Library
Forms First State Digital Preservation
Resource Infrastructure
2016 TDL Preservation Services initiated (hires Courtney Mumma to focus on state digital preservation services)
2016 TDL forms alliance with DuraCloud, a service of the digital-preservation-focused non-profit DuraSpace (DuraCloud@TDL)
2017 TDL Creates Digital Preservation Services
Members receive “Space” in DuraCloud@TDL for
ingesting content, based on membership level.
2018 Texas-wide TDL Archivematica Users Group formed
2018-2019
Digital Preservation Working Group
Storage Recommendation Charge
Charge and Methodology
Conduct an environmental scan to identify library digital preservation storage options
Compare Texas peer groups (TDL) and national best practices for research libraries
Narrow the focus to pragmatic options suitable for University Libraries' needs
Forward a recommendation for AVP and VPIT review and approval
2019
Digital Preservation
Storage Focus
Investigation begins into various historic, library-centered, university and commercial solutions
Growing recognition of permanent digital
preservation storage needs
Growing recognition that Resource
Possibilities are maturing and widely
available both commercially and in the
library space
Possible solutions ranged from new to previous models, and from in-house to outsourcing possibilities
Environmental Scan
Digital Preservation Solutions (Texas Peer Institutions)
University of Texas at San Antonio: DuraCloud directly (not via Texas Digital Library, TDL)
University of Houston: Amazon S3 and Glacier directly (not via TDL)
UT Rio Grande Valley: Chronopolis via DuraCloud through TDL
University of Texas (Austin): LTO tape, moving to Texas Advanced Computing Center
Texas A&M University: Chronopolis and Amazon via DuraCloud@TDL
Three Final Candidates for Texas State University Preservation Storage
Option 1: Outsource Preservation Digital Storage (Preservica)
Option 2: In-House Texas State Data Center Solution (files.txstate.edu)
Option 3: DuraCloud through Texas Digital Library, with storage options Amazon S3, Amazon Glacier or Chronopolis
Option 1: Outsource (All-in-One Outsource Option, Preservica)
Benefits
Preservica creates AIPs (Archival Information Packages, metadata) and provides all technology set-up and support
Established archival best practices
Recognized library peer and community of practice
Considerations
Cost: $35,000.00/year for 20 TB
No local control of or access to the underlying technology (black box)
Variable response to local needs (similar considerations to @mire)
Option 2: In House
Expand TR/Texas State Data Center Relationship
-related expertise or best
-12 TB/year
-day window for recovery is currently not
Option 3: DuraCloud through TDL (Texas Digital Library) to Chronopolis Option
Chronopolis: Geographically Distributed Preservation Network
UC San Diego
National Center for Atmospheric Research
University of Maryland, Institute for Advanced Computer Studies
Texas Digital Library/TACC
Benefits
Geographic distribution across any 3 of the technologically diverse partner nodes
Non-commercial solution rooted in the libraries and cultural heritage community
Library community of practice around this (TDL/DuraCloud/Chronopolis)
File fixity and data integrity processes are transparent
Considerations
Subscription cost: $2500 annual fee includes 2 TB/year storage and ingest; $1000 initial setup (1st year only)
Storage: $165/year per additional TB
Ingest: $120 fee per additional TB
Significant human resources/time investment for initial technological integration
Option 3: DuraCloud Through the Texas Digital Library (TDL)
DuraCloud is a hosted middleware service from DuraSpace that lets organizations control where and how digital content is preserved.
The parent organization, DuraSpace, is a non-profit organization providing academic library leadership for open source technologies focused on durable, persistent access to digital data (e.g., Fedora, DSpace).
Currently, DuraSpace is part of Lyrasis, a longstanding library-related organization supporting libraries and technology initiatives.
DuraCloud would be administered through our TDL membership, with its consortial relationships, advantages (user groups, networks, etc.) and constraints.
The Texas Digital Library is a consortium of 22 Texas university libraries focused on enabling Texas libraries' digital infrastructure and new digital technology projects.
Option 3: DuraCloud through TDL / Amazon S3 and Glacier Option
Benefits
TDL possesses an established community of practice
Part of the DuraCloud suite
S3 suitable for streaming and dynamic access, or Glacier for long-term dark archive needs
Considerations
Commercial: not tailored to cultural heritage institutions. Does not meet requirements for geographic, administrative and technological distribution
File fixity and data integrity are a black box (process hidden from owners)
Subscription cost: $2500 annual fee includes 2 TB/year; $1000 initial setup (1st year only)
S3: $265/year per additional TB
Glacier: $50/year per additional TB
HR/time investment for initial technological integration
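The "narrow the focus" step of the methodology can be illustrated as a requirements filter over the considerations recorded in the option tables. The sketch below encodes only what those tables state for the outsourced and consortial options (the In-House option is not encoded); the requirement names and the True/False/None encoding are illustrative assumptions, not the working group's actual scoring.

```python
from __future__ import annotations

# True/False where the option tables state it, None where they do not (hypothetical encoding).
OPTIONS: dict[str, dict[str, bool | None]] = {
    "Preservica (outsourced)": {
        "transparent_fixity": False,   # "black box", no local access to the technology
        "distributed_copies": None,    # not stated in the table
        "library_rooted": None,        # not stated in the table
    },
    "DuraCloud@TDL -> Amazon S3/Glacier": {
        "transparent_fixity": False,   # fixity and integrity "a black box"
        "distributed_copies": False,   # does not meet distribution requirements
        "library_rooted": False,       # commercial, not tailored to cultural heritage
    },
    "DuraCloud@TDL -> Chronopolis": {
        "transparent_fixity": True,    # fixity processes are transparent
        "distributed_copies": True,    # replicated across partner nodes
        "library_rooted": True,        # non-commercial, library/cultural heritage
    },
}

REQUIREMENTS = ("transparent_fixity", "distributed_copies", "library_rooted")


def shortlist(options: dict[str, dict[str, bool | None]]) -> list[str]:
    """Keep only options that explicitly satisfy every requirement; unknowns count as failing."""
    return [
        name
        for name, attrs in options.items()
        if all(attrs.get(req) is True for req in REQUIREMENTS)
    ]


if __name__ == "__main__":
    print(shortlist(OPTIONS))  # ['DuraCloud@TDL -> Chronopolis']
```

Treating "not stated" as failing is a deliberately conservative choice; an institution could instead flag unknowns for follow-up with the vendor.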
Digital Preservation
Storage Working Group
Final Recommendation
Chronopolis via DuraCloud
through TDL (Texas Digital Library)
Provides strong library support through four academic-library-focused organizations (Chronopolis, DuraSpace, TDL, Lyrasis) for long-term viability and peer support networks
Anticipated Budgetary Request:
Year 1: $3500.00 ($2500.00 TDL Preservation/year, $1000.00 initial set-up/onboarding; includes 2 TB storage)
Years 2-3: $2785.00/year (includes an additional 1 TB storage/year)
Review storage and staff needs annually.
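The anticipated budget follows from the Chronopolis fee schedule, and a small Python function reproduces it. It assumes, as the $2785.00 figure implies ($2500 + $165 + $120), that each TB beyond the included 2 TB is charged both the storage and the ingest fee; whether storage charges for earlier years' additional TB accumulate is not stated, so the caller supplies the number of additional TB billed in a given year.

```python
from __future__ import annotations

# Fee schedule from the Chronopolis via DuraCloud@TDL option (US dollars).
ANNUAL_FEE = 2500           # includes 2 TB/year storage and ingest
SETUP_FEE = 1000            # first year only
STORAGE_PER_EXTRA_TB = 165  # per additional TB per year
INGEST_PER_EXTRA_TB = 120   # per additional TB ingested


def annual_cost(extra_tb: int, first_year: bool) -> int:
    """Cost for one year, charging storage and ingest for each TB beyond the included 2 TB."""
    if extra_tb < 0:
        raise ValueError("extra_tb must be non-negative")
    setup = SETUP_FEE if first_year else 0
    return ANNUAL_FEE + setup + extra_tb * (STORAGE_PER_EXTRA_TB + INGEST_PER_EXTRA_TB)


if __name__ == "__main__":
    budget = [annual_cost(0, first_year=True), annual_cost(1, False), annual_cost(1, False)]
    print(budget, sum(budget))  # [3500, 2785, 2785] 9070
```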
Deeper Rationale
For Long Term
Digital Preservation
Storage Infrastructure
New Level of Service Expected by Donors,
Researchers, Faculty and students.
Present Area of Focus for Research Libraries
Connects the Library with many state and national library technology organizations focused on these issues (TDL, Texas Digital Library; CNI, Coalition for Networked Information; JISC; LITA, Library and Information Technology Association; Chronopolis; DuraSpace)
Places Texas State University Libraries in line with institutions we have joined and aspire toward (GWLA, Greater Western Library Alliance; ARL, Association of Research Libraries)
Questions?