The Problem with Dates: Applying ISO 8601 to Research Data Management

Abstract

Dates appear regularly in research data and metadata but are a problematic data type to normalize due to a variety of potential formats. This suggests an opportunity for data librarians to assist with formatting dates, yet there are frequent examples of data librarians using diverse strategies for this purpose. Instead, data librarians should adopt the international date standard ISO 8601. This standard provides needed consistency in date formatting, allows for inclusion of several types of date-time information, and can sort dates chronologically. As regular advocates for standardization in research data, data librarians must adopt ISO 8601 and push for its use as a data management best practice.
Journal of eScience Librarianship
'85=6. B;;=. :<2,5.

#e Problem with Dates: Applying ISO 8601 to
Research Data Management
Kristin A. Briney
University of Wisconsin - Milwaukee
ISSN 2161-3974 JeSLIB 2018; 7(2): e1147
doi:10.7191/jeslib.2018.1147
Correspondence: Kristin Briney: briney@uwm.edu
Keywords: standards, data information literacy, date-time formatting, data standards
Rights and Permissions: Copyright Briney © 2018
Commentary
The Problem with Dates: Applying ISO 8601 to Research Data Management
Kristin Briney
University of Wisconsin-Milwaukee, Milwaukee, WI, USA
All content in Journal of eScience Librarianship, unless otherwise noted, is licensed under
a Creative Commons Attribution 4.0 International License.
Dates are a common element of managing research data. Researchers regularly record dates
as data points, write dates in research notebooks, label observations by date, and
communicate dates to collaborators. Dates also represent a significant hurdle in data cleaning
due to inconsistent and culturally specific formatting. For example, depending on where you
are in the world, “9/1/91” can represent either September 1, 1991 or January 9, 1991. The
same date may also be written “Sept 1, 1991,” “01-09-1991,” “1.Sep.1991,” etc. Normalizing
dates is a nuisance, and a common one, when working with research data.
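The ambiguity is easy to reproduce. As a minimal sketch in Python (the language and the variable names are illustrative choices, not part of the original commentary), the same string parses to two different dates depending on which regional convention the reader assumes:

```python
from datetime import datetime

raw = "9/1/91"

# Month-first reading (the convention common in the United States)
us_reading = datetime.strptime(raw, "%m/%d/%y").date()

# Day-first reading (the convention common in much of the rest of the world)
intl_reading = datetime.strptime(raw, "%d/%m/%y").date()

print(us_reading.isoformat())    # 1991-09-01
print(intl_reading.isoformat())  # 1991-01-09
```

Nothing in the string itself indicates which reading is correct; that information has to come from outside the data.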
Data librarians use a variety of strategies for managing and normalizing dates. This lack of a
shared approach represents a significant gap in our data management toolkit, given the prevalence of date data and our
expertise with standardization. Date-time formatting should be considered within the suite of
regular research data management advice that data librarians dispense. This commentary
asserts that data librarians should adopt the international date standard, ISO 8601
(International Organization for Standardization 2004), to format dates and liberally advise
researchers to do the same.
As librarians, we are familiar with standards and it should come as no surprise that a standard
exists for formatting dates. ISO 8601 was first developed in 1988, bringing together several
existing ISO standards for date and time. It is currently in its third edition, dating from 2004,
with updates expected in the near future. Other ISO 8601-based date and time standards
exist, such as the W3C Note on Date and Time Formats (Wolf and Wicksteed 2018) and
RFC 3339 (Internet Engineering Task Force 2002), with more non-ISO 8601 standards within
specific cultures and software tools.
There are many benefits to using a consistent date format and ISO 8601 in particular.
Consistent dates are easier to process and easier to reformat, if necessary, and can reduce
ambiguity regarding the exact date to which a value refers. ISO 8601 is an internationally
recognized standard that can be used to create that consistency. The standard comes with
added benefits that the format is extensible, allows for sorting, and enables mathematical
comparison between dates. For extensibility, the standard actually consists of several different
variants under one umbrella standard, allowing researchers to also include extra information
like time (more on this below). With respect to sorting, ISO 8601 formatted dates sort
chronologically as information is ordered from largest unit of time to smallest; this gives the
standard an edge in usability. Finally, ISO 8601 expresses all date information numerically
which facilitates easier calculation and comparison when using dates as data. Given the
prevalence of dates in research data, ISO 8601 is a natural standard to adopt.
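The sorting and comparison benefits can be illustrated with a short Python sketch. The observation dates below are hypothetical, and Python is only one of many tools that behave this way:

```python
from datetime import date

# Hypothetical observation dates, deliberately out of order
observations = ["1991-09-01", "1990-12-25", "1991-01-09"]

# Plain text sorting is already chronological for YYYY-MM-DD,
# because the largest unit of time comes first
sorted_obs = sorted(observations)

# The same dates written month-first do not sort chronologically as text
us_style = ["09/01/1991", "12/25/1990", "01/09/1991"]
sorted_us = sorted(us_style)

# ISO 8601 strings convert directly into date objects for arithmetic
first = date.fromisoformat(sorted_obs[0])
last = date.fromisoformat(sorted_obs[-1])
span_days = (last - first).days

print(sorted_obs)  # ['1990-12-25', '1991-01-09', '1991-09-01']
print(sorted_us)   # ['01/09/1991', '09/01/1991', '12/25/1990']
print(span_days)   # 250
```

The month-first list places the 1990 date last, which is the kind of silent misordering that a largest-to-smallest format avoids.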
As mentioned above, ISO 8601 is a standard composed of several variants. The most readily
adoptable are the date formats YYYY-MM-DD or YYYYMMDD. So September 1, 1991 would
be written as either 1991-09-01 or 19910901. Both are acceptable under the ISO 8601
standard, though the version with dashes is more human readable. Adoption of one or the
other may also depend on software requirements or character limitations.
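As a rough illustration, both forms can be produced from the same date value. This sketch assumes Python's standard datetime module; the variable names are hypothetical:

```python
from datetime import date, datetime

d = date(1991, 9, 1)

# Form with dashes: YYYY-MM-DD
extended = d.strftime("%Y-%m-%d")

# Form without dashes: YYYYMMDD
basic = d.strftime("%Y%m%d")

# Both forms parse back to the same date
from_extended = date.fromisoformat(extended)
from_basic = datetime.strptime(basic, "%Y%m%d").date()

print(extended)  # 1991-09-01
print(basic)     # 19910901
```

Because the two forms carry identical information, switching between them later is a mechanical conversion rather than a data-cleaning task.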
A few other useful formats under the ISO 8601 umbrella include:
• Year and month: YYYY-MM (e.g. 1991-09)
• Year: YYYY (e.g. 1991)
• Date and time: YYYY-MM-DDTHH:MM:SS (e.g. 1991-09-01T11:00:00)
• Year and week: YYYY-Www (e.g. 1991-W35)
• Year, week, and day: YYYY-Www-D (e.g. 1991-W35-7)
• Year and ordinal day: YYYY-DDD (e.g. 1991-244)
Note that the week starts on Monday and time uses a 24-hour clock. This short commentary
cannot cover every ISO 8601 variation (see the standard itself for more specifics), but
this list of the most useful variations highlights the standard’s breadth. While
YYYY-MM-DD is probably the most commonly used format, specific research needs will dictate
the use of other variants.
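The variants listed above can all be derived from a single date value. The following is a minimal sketch using Python's standard datetime module; the variable names are illustrative, and other languages offer comparable facilities:

```python
from datetime import date, datetime

d = date(1991, 9, 1)

# ISO week-numbering year, week number, and weekday (Monday = 1, Sunday = 7)
iso_year, iso_week, iso_weekday = d.isocalendar()

year_month = f"{d.year:04d}-{d.month:02d}"
year_week = f"{iso_year:04d}-W{iso_week:02d}"
year_week_day = f"{year_week}-{iso_weekday}"

# Ordinal day: the day's position within the year, padded to three digits
year_ordinal = f"{d.year:04d}-{d.timetuple().tm_yday:03d}"

# Date and time on a 24-hour clock, joined by the letter T
date_time = datetime(1991, 9, 1, 11, 0, 0).isoformat()

print(year_month)     # 1991-09
print(year_week)      # 1991-W35
print(year_week_day)  # 1991-W35-7
print(year_ordinal)   # 1991-244
print(date_time)      # 1991-09-01T11:00:00
```

Note that the week-based forms use the ISO week-numbering year, which can differ from the calendar year for dates close to January 1.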
In practice, ISO 8601 can be applied to both research data and metadata. Within datasets,
expressing all date information numerically allows for easier calculation, smoother analysis,
and comparison between date values. Additionally, some software packages expect dates in
the ISO 8601 format, such as the “lubridate” package for R (Grolemund and Wickham 2011).
The one analysis tool that will likely be the most
challenging when working with ISO 8601 is Excel. Excel has a long history of mangling dates
(Bahlai 2014; Woo 2014; Kosmala 2016; Broman and Woo 2018) and even interpreting
non-date data as dates (Ziemann, Eren, and El-Osta 2016), so it should not be surprising that
its date problems extend to ISO 8601. There are a few strategies for working with ISO
8601-formatted dates in Excel. First, the cells can be reformatted to display ISO 8601, though
this is a cosmetic change that is easily reverted, because reformatting alters only the display
and not the underlying serial number in which Excel stores date information. Second,
dates can be represented as YYYYMMDD and interpreted by Excel as an 8-digit number.
Finally, date parts can be stored in separate columns, one each for year, month, and day. The
last option is the best, as it is least likely to be mangled yet the information remains
readily computable (Bahlai and Pawlik 2016). Always refer to the specifics of your preferred
analysis tool for how it does or does not support ISO 8601-style dates.
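The second and third strategies can be prepared outside of Excel before a file is shared. The sketch below is one possible approach, assuming Python's standard csv module; the sample identifiers and column names are hypothetical:

```python
import csv
import io
from datetime import date

# Hypothetical sample records, each with an identifier and a collection date
samples = [("A1", date(1991, 9, 1)), ("A2", date(1991, 1, 9))]

# Write to an in-memory buffer here; a real workflow would write to a file
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["sample_id", "year", "month", "day", "yyyymmdd"])
for sample_id, d in samples:
    # Separate year, month, and day columns, plus an 8-digit number
    writer.writerow([sample_id, d.year, d.month, d.day, int(d.strftime("%Y%m%d"))])

csv_text = buffer.getvalue()
print(csv_text)

# The separate columns remain readily computable: rebuild the dates on import
rows = list(csv.DictReader(io.StringIO(csv_text)))
rebuilt = [date(int(r["year"]), int(r["month"]), int(r["day"])) for r in rows]
```

Since none of these columns looks like a date to a spreadsheet program, each is read as a plain number, and the full date can still be reassembled for analysis.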
Dates in metadata are another important use of ISO 8601. In many cases these dates act like
dates appearing in a dataset, as discussed in the previous paragraph, but there is a special
case worth further consideration: dates in file names. ISO 8601 and file names are a match
made in heaven. The reason for this is that 8601-formatted dates sort chronologically. In
combination with a consistent file naming scheme, this makes for wonderfully organized files.
One useful example is in the file names of meeting notes, such as “Meeting_2018-10-31.docx.”
Given a whole group of such files, it is simple to sort and scan through documents to find what
one needs.
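A short Python sketch shows why this works. The helper function and meeting dates are hypothetical, and the naming pattern simply follows the example above:

```python
from datetime import date

def meeting_file_name(meeting_date: date) -> str:
    """Build a file name with an ISO 8601 date, e.g. Meeting_2018-10-31.docx."""
    return f"Meeting_{meeting_date.isoformat()}.docx"

# Hypothetical meeting dates, deliberately out of order
meeting_dates = [date(2018, 10, 31), date(2017, 12, 5), date(2018, 2, 14)]
names = [meeting_file_name(d) for d in meeting_dates]

# An ordinary alphabetical sort, as a file browser would apply,
# also puts the files in chronological order
ordered = sorted(names)

for name in ordered:
    print(name)
```

The zero-padded month and day are what keep the alphabetical and chronological orders aligned, so a name such as Meeting_2018-2-14.docx would break the pattern.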
While ISO 8601 has many uses within research data management, it isn’t perfect. One
problem is that ISO 8601 is based on the western, Gregorian calendar, which is not used in all
countries. Additionally, while the standard can theoretically handle BCE (Before the Common
Era) dates, it is not an ideal format for this information. Moreover, few people are familiar with
ISO 8601, which may lead to date confusion. This is compounded by the fact that some
8601-formatted dates are less human readable.
As data librarians advise researchers to adopt more standardized workflows, we should not
forget to apply date standards to this work. ISO 8601 is a natural partner for research data
management, yet there are many examples of data librarians not utilizing this standard. I have
adopted ISO 8601 liberally in my own data, my own file names, and in committees to which I
belong, and will never go back—the benefits I reap from readily scanned file names and easily
analyzed dates are simply too great. I therefore urge my peers to learn the benefits of this
standard themselves and, in turn, advocate for its adoption with the researchers they advise.
Acknowledgments
The author thanks Yasmeen Shorish for her valuable feedback on a draft of this commentary.
Disclosure
The author reports no conflict of interest.
References
Bahlai, Christie. 2014. “Dealing with Dates as Data in Excel.” Practical Data Management for Bug Counters.
https://practicaldatamanagement.wordpress.com/2014/07/02/dealing-with-dates-as-data-in-excel
Bahlai, Christie, and Aleksandra Pawlik. 2016. “Data Organization in Spreadsheets: Dates as Data.”
Data Carpentry. https://datacarpentry.org/spreadsheet-ecology-lesson/03-dates-as-data
Broman, Karl W., and Kara H. Woo. 2018. “Data Organization in Spreadsheets.” The American Statistician
72(1): 2-10. https://doi.org/10.1080/00031305.2017.1375989
Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with Lubridate.” Journal of
Statistical Software 40(3): 1-25. http://dx.doi.org/10.18637/jss.v040.i03
International Organization for Standardization. 2004. “ISO 8601:2004.” https://www.iso.org/standard/40874.html
Internet Engineering Task Force. 2002. “RFC 3339.” https://www.ietf.org/rfc/rfc3339.txt
Kosmala, Margaret. 2016. “Beware This Scary Thing Excel Can Do to Your Data!” EC0L0GY B1TS.
http://ecologybits.com/index.php/2016/07/06/beware-this-scary-thing-excel-can-do-to-your-data
Wolf, Misha, and Charles Wicksteed. 2018. “Date and Time Formats.” W3C. https://www.w3.org/TR/NOTE-datetime
Woo, Kara. 2014. “Abandon All Hope, Ye Who Enter Dates in Excel – UC3:: California Digital Library.” UC3 Blog.
https://uc3.cdlib.org/2014/04/09/abandon-all-hope-ye-who-enter-dates-in-excel
Ziemann, Mark, Yotam Eren, and Assam El-Osta. 2016. “Gene Name Errors Are Widespread in the Scientific
Literature.” Genome Biology 17(1): 177. https://doi.org/10.1186/s13059-016-1044-7