ArticlePDF Available

Building a Holistic Taxonomy Model for Open Government Data Related Risks: Based on a Lifecycle Analysis

Authors:

Abstract and Figures

For many government departments, uncertainty aversion is a source of barriers in the advancement of data openness. A more active response to potential risks is needed and necessitates an in-depth examination of risks related to open government data (OGD). With a cross-case study in which three cases from the United Kingdom, the United States and China are examined, this study identifies potential risks that might emerge at different stages of the life cycle of OGD programs and constructs a taxonomy model for them. The taxonomy model distinguishes the "risks from OGD" from the "risks to OGD", which can help government departments make better responses. Finally, risk response strategies are suggested based on the research results.
Content may be subject to copyright.
RESEARCH PAPER
Building a Holistic Taxonomy Model for OGD-Related
Risks: Based on a Lifecycle Analysis
Fang Wang1,2†, An Zhao1, Hong Zhao1,3 & Jun Chu1
1Business School of Nankai University, Tianjin 300071, China
2Center for Network Society Governance of Nankai University, Tianjin 300071, China
3CETC Big Data Research Institute Co. Ltd., Guiyang 550081, China
Keywords: Open government data; Risk; Taxonomy; Life cycle; risk management
Citation: Wang, F., Zhao, A., Zhao, H. & Chu, J. Building a Holistic Taxonomy Model for OGD Related Risks: Based on a lifecycle
analysis. Data Intelligence 1(2019), 1-24. doi: 10.1162/dint_a_00018
Received: Jan. 25, 2019; Revised: May 10, 2019; Accepted: May 18, 2019
ABSTRACT
For many government departments, uncertainty aversion is a source of barriers in the advancement of data
openness. A more active response to potential risks is needed and necessitates an in-depth examination of
risks related to open government data (OGD). With a cross-case study in which three cases from the United
Kingdom, the United States and China are examined, this study identifies potential risks that might emerge
at different stages of the life cycle of OGD programs and constructs a taxonomy model for them. The
taxonomy model distinguishes the “risks from OGD” from the “risks to OGD”, which can help government
departments make better responses. Finally, risk response strategies are suggested based on the research
results.
1. INTRODUCTION
Government information disclosure has played an important role in the democratic development of
human society since Sweden passed the Freedom of the Press Act in 1766. In particular, since the United
States issued the Freedom of Information Act (FOIA) in 1966, many countries have passed laws or regulations
to protect the “right to know” of the public. With the advent of the era of big data, significant attention has
been focused on the value of open government data (OGD) in promoting government transparency and
accountability, public participation and social innovation [1, 2, 3, 4]. More than 70 countries have joined
Corresponding author: Fang Wang (Email: wangfangnk@nankai.edu.cn; ORCID: 0000-0002-2655-9975).
© 2019 Chinese Academy of Sciences Published under a Creative Commons Attribution 4.0 International (CC BY 4.0)
2 Data Intelligence
Building a Holistic Taxonomy Model for OGD-Related Risks: Based on a Lifecycle Analysis
the “Open Government Partnership” Program till 2016 [5]. In China, 46 Chinese local governments had
launched OGD websites as of May of 2018 [6]. In January 2019, the OPEN Government Data Act of the
United States was signed into law [7].
Nevertheless, along with the worldwide advance of OGD, various barriers have been reported by OGD-
leading countries such asthe UK [8], the Netherland [9], and the USA [10]. A number of studies have
explored various barriers to OGD. We categorize them into the following six classes according to their
perspectives on OGD: (1) the data user perspective, e.g. lack of knowledge of access to the OGD datasets
[11, 12]; (2) the data provider perspective, e.g. institutional barriers [13], basic resources, organizational
arrangement and technical capacity [14], fear of false conclusions [9], economic issues [15], etc; (3) the
data perspective, e.g. fragmented datasets [16], poorly documented metadata [17], poor data quality [13,
18], poor information usability [11], poor machine-processability [19, 20], and complex data formats [21]
etc.; (4) the legislation perspective [13], e.g. difficulties in evaluating the eligibility of a dataset [11],
difficulties in evaluating the privacy sensitivity of a dataset [10, 22], the complexity of data copyright [23],
etc; (5) the technology perspective [10, 13], e.g. the way in which the data are stored, obtained and used
by a department [24]; and (6) the environment perspective, e.g. external pressures [25].
Some of these barriers exist in reality, while others simply derive from an insufficient understanding of
the possible risks. It was reported that a database of registered usernames and email addresses of the data.
gov.uk had been leaked [26]. In 2018 BBC reported that sensitive information of the military has been
disclosed in a data visualization map of a fitness tracking company [27]. Besides, some people were also
concerned that the geospatial data published on data.gov by the Department of Agriculture of the U.S.
might be used to locate crops targeted for eradication via infestation, or even to commit acts of biological
warfare [28]. As shown in above cases, the fear of the potential risks may hinder the advance of OGD. The
culture of risk aversion is one source of existing institutional barriers to OGD [13]. Uncertainty avoidance
plays a negative moderating role in the relationship between other organization resources and the OGD
capacity of government departments [14].
The best strategy for addressing risks is understanding and managing them more effectively, rather than
ignoring or avoiding them. Therefore, we aim to address the following three questions: “What kinds of risks
are involved when government departments implement OGD initiative?”, “How are these risks distributed
over the life cycle of OGD?” and “What strategies could be adopted?.” To answer these questions, we
conduct a cross-case study on three OGD- related cases from three countries, the cease of the care.data
program in the UK, the IRS data breach in the USA and the tardy progress of the OGD program in China.
We then identified 14 risks and categorized them into a taxonomy model which distinguishes “risks to
OGD” from “risks from OGD”. This study deepens the current understanding of OGD related risks and
brings new insights into the mechanism design for advancing OGD.
Data Intelligence 3
Building a Holistic Taxonomy Model for OGD-Related Risks: Based on a Lifecycle Analysis
2. LITERATURE REVIEW AND ANALYTICAL FRAMEWORK
2.1 Risks Associated with OGD
Risk refers to the uncertainty of future results in a given condition and a particular period [29]. The risks
related to OGD that are frequently discussed are those pertaining to data leakage [30, 31, 32], invasion of
privacy [33, 34, 35] and other information security issues. [36] summarized 11 government risks in data
release: copyright, trade secret protection, privacy, the security of the infrastructure, publication of improper
data or information that might lead to negative attitude toward public institutions, inaccurate data,
misinterpretation of the data, absence of data consumers, less willing to cooperate, overlapping of data and
increased number of requests for data. The Office for National Statistics of UK also proposed to better
balance openness with privacy protection [37]. [38] believes that the risks posed to the official statistics
department by big data are also related to mission drift, damage to reputation and the loss of public trust,
inconsistent access and continuity, the fragmentation of approaches across jurisdictions, resource constraints
and cut-backs, privatization and competition, etc. Besides, OGD are also vulnerable to risks in terms of
effectiveness, relevance and trust [39]. [15] categorized the risks that may hinder OGD as those related to
governance, economic issues, licenses and legal frameworks, data characteristics, metadata, access, and
skills. Based on a single case study of Shanghai, [40] summarized the potential risks of OGD from the levels
of legislation, management and data.
2.2 Risk Management Related to OGD
In response to risks faced by government departments, the National Audit Office (NAO) of the UK
proposed embedding risk management into their core decision-making and planning management processes.
The NAO put forth the notion that risk management, including identifying, evaluating, processing and
reporting, can help governments make credible decisions and support innovation [41]. In 2013, a subordinate
of the USA Treadway Commission released the Internal Control-Integrated Framework to address enterprise
risk management in five aspects: controlled environment, risk assessment, control activities, information
and communication, and regulatory activities [42].
[36] proposed mitigation strategies for the identified risks of OGD, including monitoring and assessment
of the demand for data, proper specification of datasets to avoid duplication, compliance assessment, data
anonymization and data aggregation, quality control of data publication, establishment of internal and
external data catalog, linking to datasets already published, properly formulated terms and prompts for data
originating from third parties, clearly explained duties, and continuous monitoring of the impacts of OGD
initiatives. [43] holds an opinion that the application of tax-related data quality control and data technology
is a key element in control of tax risks. In 2017, the British government updated the Data Protection Act
promulgated in 1998 to reinforce the rights of citizens, such as the right to be forgotten, personal data,
privacy information and data migration stipulating that identifying personal information from anonymous
data and tampering with them will result in criminal charges [44]. In 2018, the General Data Protection
Regulation of the EU came into force [45], focusing on protecting and empowering all EU citizens regarding
data privacy and reshaping the way all organizations including governments approach data privacy.
4 Data Intelligence
Building a Holistic Taxonomy Model for OGD-Related Risks: Based on a Lifecycle Analysis
The afore mentioned regulations and studies have fully discussed risk management strategies associated
with OGD, but they did not address the context in which a specific risk occurs, for example, the stage in
the data life-cycle, its source and consequence, etc. Besides, a single country context may not be sufficient
to reveal the variety of risks in more comprehensive institutional backgrounds. These defects may weaken
the effectiveness of suggested strategies. In view of this, we have conducted a cross-case study in international
context from the perspective of the life cycle of OGD.
2.3 Life-cycle Model of OGD
Life-cycle model can guide the process of opening up data [46]. Life-cycle analysis draws on the
biological analysis method, and divides the development process of objects (e.g. records, data, products,
projects, organizations, etc.), from the stage of generation to that of extinction into several stages. Among
others, OGD, is also subject to changes in its life cycle. An analysis based on the life cycle of OGD can
help identify the risks that exist at different stages. Meanwhile, risk management itself can also be divided
into three phases of latency, occurrence and crisis response [47], which necessitates different measures.
According to [2], the OGD life-cycle comprises three sections, a pre-processing section (data creation,
selection, harmonization, and publishing), an exploitation section (data interlinking, discovery, exploration,
and exploitation), and a maintenance section (data curation). Based on an investigation conducted in the
Netherlands, [46] developed a community-driven open data life-cycle model, which comprises identification
of data, data preparation, data issue, and data reuse and data evaluation. [48] discussed the barriers of
OGD in China based on a data-centered life cycle model, including data organization and processing,
storage and distribution, discovery and acquisition, and appreciation and evaluation.
With different research purposes, the above mentioned OGD life-cycle models are either data management
centered or value realization centered. As the risks of OGD are mainly taken by government departments,
including both the data providers and users, a government centered life cycle model is needed. A life cycle
model of OGD should consist of at least five stages: data creation and collection, data organization, data
release, data utilization and data maintenance.
2.4 Analytical Framework
Based on the literature review above and the life cycle of OGD, we develop an analytical framework
for the subsequent case study, as shown in Figure 1.
Data Intelligence 5
Building a Holistic Taxonomy Model for OGD-Related Risks: Based on a Lifecycle Analysis
Figure 1. Analytical framework for case study based on the OGD lifecycle.
3. RESEARCH DESIGN
3.1 Research Procedure
The present study adopts a cross-case study method. Although previous studies have discussed some
OGD-related risks, they are neither integrated into a whole nor detailed from the perspective of the OGD
lifecycle. To bridge these theoretical gaps, this study adopts the case study method. Compared with a single
case study, a cross-case study is used to make generalization, examine themes, similarities and differences
across cases in quantitative or qualitative analysis. The research design involving multiple cases is generally
regarded as more robust than that of a single case study, as it provides the observation and analysis of a
phenomenon in several settings [49].
In this study, an analysis across three OGD cases from different countries is conducted. A content analysis
method is also used to examine the life-cycle distributions of OGD related risks. The research steps are as
follows:
1). Select three cases of OGD in the U.S., the UK and China based on their theoretical potentials and
representativeness in different stages of the OGD lifecycle.
2). Collect case data through the internet search and semi-structured interviews;
3). Identify the risks associated with OGD using within-case and cross-case analysis;
4). Categorize all identified risks into a taxonomy model;
5). Analyze the distributions of all risks over five stages of the OGD lifecycle;
6). Suggest countermeasures for OGD risk management.
6 Data Intelligence
Building a Holistic Taxonomy Model for OGD-Related Risks: Based on a Lifecycle Analysis
3.2 Case Selection
Three cases are selected for their theoretical richness and representative ability to answer the research
questions and address the different stages of OGD programs. They are the UK healthcare data program
“care.data,” the American Internal Revenue Service (IRS) data breach event and the Open Data initiative
of Shanghai, China. The “care.data” case is typical, in that it reveals the risks in data collection and sharing.
The IRS data breach is not a typical OGD case, but it reveals the mismanagement and malicious use of
government data; it is therefore representative in revealing the risks that are characteristic of the OGD
maintenance phase. The case of the Shanghai OGD is of typical significance, as it reveals the risk concerns
of government departments at the early stage of an OGD initiative. The three cases, which are both country-
and industry-specific, focusing on different phases of the OGD lifecycle, can jointly support the identification
of risks in the entire lifecycle.
3.3 Data Collection
Case data are collected via internet search and semi-structured interview. The former applies to cases
from the UK and the USA, and covers news reports and online commentaries; the latter applies to the case
of China and covers three one-to-one in-depth interviews and one focused on group interview with the
heads of seven government departments. All the interviews lasted for nine hours in total, and 55,600
Chinese words were transcribed within one week of the interviews.
3.4 Data Analysis
A bottom-up coding approach combined with a cross-case comparison is adopted to analyze the data.
The data analysis is a continuous process, starting with data collection. First, all concepts or entities related
to risks or real harms are identified; they are then compared and categorized within the case. Second, the
results of three cases are listed analyzed, compared and categorized. Third, the sources and consequences
of all risks are analyzed. Fourth, a taxonomy model of OGD-related risks is constructed. Finally, the
distribution of the risks over the five stages of the OGD lifecycle is analyzed and strategies to address them
are suggested.
4. CASE ANALYSIS
4.1 Case 1: The Discontinuation of the care.data Program in UK
4.1.1 Case introduction
In 2013, the long-established National Health Service (NHS) of the UK and the Health and Social Care
Information Centre (HSCIC) initiated a program called care.data, aiming to improve the safety and care of
patients by using information; the program also helped create an extensive health records database, whose
target users include pharmacies, mental health services, opticians, dentists, and education and training
institutions, and that will eventually support all healthcare facilities. The data have been anonymized and
only cover the patient’s age range, gender and area of residence, but in exceptional circumstances, such
Data Intelligence 7
Building a Holistic Taxonomy Model for OGD-Related Risks: Based on a Lifecycle Analysis
as during a pandemic, a researcher can apply to the Minister of Health for the removal of these privacy
protections. Researchers believe the data will help them develop new treatments and evaluate NHS
services. However, the “care.data” project, surprisingly, does not run smoothly. In February 2014, the NHS
acknowledged a serious crisis of confidence regarding the “care.data” project, and informed family doctors
to postpone the uploading of patient data for up to six months. In the fall of 2014, the NHS decided to
endeavor four new pilots to collect medical health data for two million patients, but the first pilot was not
officially launched until June 2015. Due to poor communication with the public, such as the absence of
press conferences, and the failed delivery of brochures to families, the NHS’s data collection was collectively
denounced by patients, doctors, the British Medical Association, the privacy campaign group Big Brother
Watch and the Association of Medical Research Charities; consequently, one million people withdrew from
the program. Under enormous pressure, the care.data was stopped by the NHS on July 6, 2016.
4.1.2 Risk identification of case 1
The NHS is an important source of population data. The care.data is a typical example of government
data collection and utilization. Through the analysis of data collected from online news reports, blogs,
comments, etc, the following risks are identified:
1). Privacy leakage risk. Health data are of a personal and sensitive nature. Mudie, an opponent of
care.data, said: “The human cost to the patient whose identity and medical history are made public
is potentially disastrous. Careers could be ended, jobs lost, insurance refused and relationships
destroyed if sensitive medical facts are made public or used by private firms, other people or, indeed,
the media. “ [50]. Although the patient data collected by the care.data had been anonymized, data
users typically used open data in conjunction with the holder’s closed data and accessible data, and
the public was worried that malicious analyzers could use specific techniques to identify the patient’s
private information [51].
2). Risks from implicit operational specifications. A lack of explicit operational specifications has
led to various difficulties in the opening and collection of data. The accusations from the public are:
“The ambiguous standards for obtaining health data poses a risk of trust between doctors and patients
[52].” “The regulations over data access, data inspection and balance are not yet established or
implemented [53],” etc. The NHS did not systematically organize and process the collected data
sets, leading to perplexed usage and potential safety hazards when cooperating with other agencies.
3). Risks of improper data use, especially in cooperation with commercial organizations. The focus of
the public concern is that sharing sensitive medical information with commercial companies can be
risky without the explicit consent of patients. The care.data hands over coded patient data to the
insurance industry to help actuaries calculate the average premiums, but the public believe that it
may identify an individual’s tendency to become ill, causing the insured to be biased against when
attempting to buyinsurance. In the cooperation between NHS and Google’s DeepMind, the public
is also worried that commercial organizations will use patients’ private data for profit. “Any effort to
ignore the use of data and only discuss open data policy will fail as they are faced with the disorder
in reality and compromise in practice [53].”
8 Data Intelligence
Building a Holistic Taxonomy Model for OGD-Related Risks: Based on a Lifecycle Analysis
4). Data quality risks induced by data collection methods, such as incompleteness, distortion,
inaccuracies, etc. For example, care.data requires general practitioners to collect patient data;
however, as a result, data pertaining to young and healthy males may be missing. Some people
obtain the prescription drugs but flush them down the toilet when unmonitored, causing distortions
in the drug’s performance data, etc.
5). Risks from immature techniques. One of the reasons for the public’s opposition to care.data is
the lack of technical sophistication. Sheila Bird, a professor of statistics of Strathclyde University,
stated: “Data-sharing as proposed by care.data was disastrously incompetent – both ethically and
technically. Professionals rebelled and prevailed in out-casting care.data, thereby ensuring that future
proposals will not succeed unless both technically proficient and in the public interest.” [54]
6). The risk of public trust crisis prompted by the immaturity of government regulation. Of all the
objections against care.data, many could be regarded as a function of the public trust crisis caused
by a flawed government supervision system. For example, “when you propose to share our most
confidential medical records, ambiguous promises and fictitious regulatory frameworks are disturbing
to the public “ [50].
7). The risk of poor communication. Huge external pressures on care.data stemmed from lack of
publicity and ineffective communication. While the project is valuable, it requires the understanding,
support and cooperation from citizens and other organizations. In November 2014, the All Party
Parliamentary Group in the British Parliament investigated the care.data project, accusing it of lacking
transparency and poor publicity, which eventually led to its failure.
8). The risk of unsustainable funding for public communication. Between October 2013 and 2014,
the NHS announced a £ 2 million to publicize the care.data project to the general public, but the
actual cost of advocacy was merely £ 1 million. The survey found that less than one-third of the
public received publicity brochures because of insufficient investment in public communication.
This eventually led to the project being forced to cease due to tremendous external pressure
9). The risk of oversized external pressures. After the care.data program was announced in early
2013, some privacy organizations launched the Medical Data Confidentiality Initiative,calling
attention to security risks in the use of medical data. Since then, these organizations have applied
significant pressure to initiatives in the collection of medical data.
10). The risk of unprofessional information governance. In January 2015, the NHS’s Independent
Information Governance Oversight Panel released a report indicating that the initial commitment to
the care.data project was not completed, and partly because of the lack of experts in information
governance.
4.2 Case 2: The IRS Data Breach in the U.S.
4.2.1 Case introduction
In 2015, a data breach took place in the Internal Revenue Service (IRS) of the United States [55]. With
the help of companies such as IBM, the IRS set up a complex data platform networked with other government
agencies. The site has an application called “Get Transcript” that allows citizens to easily access to previous
Data Intelligence 9
Building a Holistic Taxonomy Model for OGD-Related Risks: Based on a Lifecycle Analysis
tax records. In 2015, hackers illegally accessed about 724,000 taxpayers’ tax returns via the Get Transcript
app. This malicious behavior was not detected until three months later. After the investigation, J. Russell
George, the Treasury Inspector General for the tax administration, accused the IRS of not deploying the
web systems according to the requirements and recommendations, that caused serious data breaches in
the event of reduced staffing and increased capital input [56].
4.2.2 Risk identification of case 2
1). The risk of hacking. Hacking is a constant threat to government applications, and its immediate
consequences are leakages of privacy, trade secrets or even the security information of national
infrastructure.
2). The risk of poorly implemented government regulations. After the implementation of the My
Data project, the U.S. Open Data Action Plan established clear requirements regarding network and
data security systems for government departments, but the IRS did not follow these requirements
seriously, resulting in a colossal security breach.
3). The risk of poor communication and cooperation between departments. After the data breach
event, the IRS, the Treasury Department and other departments were at odds with each other. The
poor communication and cooperation between them resulted in an inadequate response to the risk.
4). The risk of delayed system updates and maintenance. The maintenance and update of an open
system platform must be carried out periodically; otherwise, security risks are likely. In this case,
some IRS applications were outdated and had many security vulnerabilities; this was coupled with
poor maintenance of the systems and platforms. Together, these factors were vulnerable to hacking,
which remained undetected for three months.
5). The risk of unsustainability in capital investment. According to the IRS, an increasing number
of government information systems security services are being outsourced; with the delay of at least
$ 400 million in investments due to IT budget cuts, many of the operations, including the maintenance
and replacement of old IT systems, are affected, thereby resulting in system failure and security
loopholes.
4.3 Case 3: The Tardy Progress of the OGD Program in China
During the short history of OGD in China, no major crisis has taken place yet. Therefore it is impossible
to have an event as a case study that exhibits the lifecycle. Instead, we only investigate the relevant risk
factors by examining a representative city government. Between December 2016 and May 2017, we
interviewed seven government departments in Shanghai, including the Human Resources and Social
Security Bureau, the Agriculture Commission, the Food and Drug Administration, the Audit Bureau, the
Health and Family Planning Commission, the Trade and Industry Bureau, and the Planning and Land
Resources Administration. According to the analysis of interview data, the risk concerns of the departments
related to OGD can be summarized into the following six aspects:
10 Data Intelligence
Building a Holistic Taxonomy Model for OGD-Related Risks: Based on a Lifecycle Analysis
1). Risk of data distortion. Government statistical data, though eye-catching and frequently used by
the public, are prone to distortion in the layer-by-layer hierarchical reporting system; this has become
a systemic problem. One interviewee said: “It is not that our attitude to OGD is inactive, but that
the authenticity of data reported by the subordinate departments cannot be guaranteed” In January
2016, the Eighth Inspectorate of the Central Discipline Inspection Commission stated in their
feedback to the Party Committee of the National Bureau of Statistics: some leading cadres sought
personal gain by “statistics”, with rent-seeking through power [57].
2). Risk of low data quality. Given that some government departments’ data are collected from
enterprises or reported by subordinate departments and other diversified channels, some data are
thus flawed with inconsistent format, unstandardized metadata, incompleteness and other quality
problems, which forestalls some departments from engaging in OGD pilots. One respondent noted:
“The data collected directly from medical institutions and health administration bureaus at county
or district level were found countless quality problems. We’ve been doing data cleaning for 2-3
consecutive years, so a lot of data have not yet been released.”
3). Risk of data aggregation. Although open data has been anonymized and desensitized, some
departments are still worried about the existence of security risks for collective data release. One
interviewee said: “The bulk of our data are disease information or disease prevention information,
if released, it implicates personal privacy.”
4). Risk of undefined operational norms. Some departments expressed that due to the lack of
technical standards and codes of practice, they are quite at sea for some problems in open data.
One interviewee suggested that “the scope of disclosure also needs to be stipulated. We are willing
to participate in the construction and also want to provide useful data, but the specification of the
application needs to be well defined.”
5). Risk of imperfect mechanisms. The lack of supervision, feedback and incentive mechanism led to
the slow progress of the OGD project. On account of the scanty feedback on the use of open data,
the enthusiasm of some departments for OGD has been dampened. One interviewee said: “We are
in want of a mechanism for feedback and supervision, we would like to know where and to what
extent the data we provide has realized its value. All these require feedback, which is also a positive
incentive for us.
6). Risk of data value-sparseness. Some departments want to know if their data are really useful to
the public. “Every year we provide a lot of open data to the community (for social services), and a
great deal of work has been done, but I am still a stranger to the data they (neighborhoods) are using
and how they use it. “ “We’d love to know who is using the data.”
5. RISK INTEGRATION AND CLASSIFICATION
Through iterative comparison and merging, the risks identified from the three cases are integrated into
14 types of risks: content legitimacy, data quality, data value, data management, platform support, information
security, organization support, resource input, institutional support, business process, use, imperfect
regulations and standards, scarce external resources and external pressures. The 14 types of risks are then
further categorized based on their sources and consequences respectively.
Data Intelligence 11
Building a Holistic Taxonomy Model for OGD-Related Risks: Based on a Lifecycle Analysis
5.1 Classification Based on the Risk Source
According to the source, the 14 types of risks are categorized into five classes: data (related) risks,
technology (related) risks, management (related) risks, utilization (related) risks and external environment
(related) risks.
5.1.1 Data risks
Most of the data published on OGD websites are submitted by various government departments, while
a small part of them are created by the OGD department itself. These data vary in their attributes accordingly
with the clear-cut department functions. Some attributes may lead to uncertain consequences if the data
are opened. For instance, some data concern citizens’ privacy, and their inappropriate disclosure may bring
harm to the individuals involved. These risks arising from data attributes are categorized as data risk, which
is mainly related to the legitimacy of the open content, the data quality, and the data value.
The risk in the legitimacy of open content refers to the possibility that the open contents do not comply
with laws, regulations, or other social norms. The risk of low data quality includes the possible distortion,
error, obsolescence, or incompleteness of data contents, format chaos, absent links or bad metadata. This
kind of risks may bring inestimable losses to data users or damage the government’s credibility. The main
problems lie in the phase of data collection and pre-processing. Some desensitized data also suffer from
quality degradation after de-identification. The risk of low data value refers to the possible investment loss
of resources due to opening data that have trivial value, have no clear users, or are rarely used.
5.1.2 Technology risks
Technology risks result from inadequate technical capacity and tardy understanding of new technologies,
such as data management, platform support and information security, etc. Technology problems in data
management may lead to unsatisfactory effects of OGD, such as the use of outdated techniques for data
collection, improper methods for data cataloging, unstandardized metadata and improper data formats that
encumber analysis and utilization, etc. The low capacity and delayed updates of the OGD platform bring
disastrous consequences, as evinced in case two. Because of the loopholes in the information security
technologies of OGD platforms, worrisome consequences, such as privacy leakage, data tampering and
corruption, platform damage or falsely authorized certification, may occur.
5.1.3 Management risks
Management risks are caused by problems that exist in OGD-related organizational structures, business
processes, management styles, or mechanism designs; they comprise the following four types: organizational
risks, resource input risks, institutional risks and business process risks. The organizational problems may
exist in the lack of special position setting, the unbalanced allocation of power and responsibility between
the data provider and receiver, poor inter-departmental communication and cooperation, weak OGD
promotion strategies, and poor management of business outsourcing, etc. Insufficient investment of talents
12 Data Intelligence
Building a Holistic Taxonomy Model for OGD-Related Risks: Based on a Lifecycle Analysis
or funds in OGD-related businesses may directly hinder the promotion of OGD. Incomplete management
rules, business guidelines, incentive mechanisms or ill-conceived business processes upon OGD may also
bring unwanted consequences.
5.1.4 Utilization risks
Utilization risks arise from misuse, malicious or improper use, or insufficient use of the opened data,
which may result from insufficient literacy or capacity of OGD users. This may result in erroneous decision
making, invasion of privacy, or insufficient exploitation of data value.
5.1.5 Environment risks
Environment risk refers to a harmful impact upon the process, mode, or outcome of an OGD program
due to the limitations of current institutions, resources, or public expectations, including: (1) Risk of
imperfect regulations and standards. Some obstacles to OGD may be attributable to imperfect laws or
regulations, inoperable technical standards, or rigid government administration systems. (2) Risk of scarce
external resources, e.g. the lack of data governance experts, insufficient talent supply, or immature knowledge
of OGD. (3) Risk of excessive external pressure. As shown in case one, online negative opinions of pressure
groups exerted significant pressure on the initiative of OGD.
5.2 Classification Based on the Risk Consequence
Judging from the consequences, the aforementioned 14 subtypes of risks can be divided into two classes:
risks to OGD and risks from OGD. The risks to OGD may hinder the smooth running of an OGD program
but will not undermine the legitimacy of OGD itself. These risks comprise10 sub-categories: risks in data
management, information security, platform support, organizational adjustment, resource investment,
institutional provision, business process, imperfect regulations and standards, scare external resources, and
excessive external pressures. The risks from OGD refers to the possible negative effects caused by OGD,
which may lead to challenges to the legitimacy of the OGD, including illegal contents, low data value,
poor data quality and improper data utilization. The risks from OGD may lead to a sceptical attitude to
OGD initiative, while the risks to OGD may influence the success of OGD program. The fear of both may
become the actual barriers to OGD.
5.3 A Holistic Taxonomy Model of Risks Associated with OGD
Based on the analysis above, a holistic taxonomy model of OGD related risks is constructed, as shown
in Figure 2. First, all risks identified from the three cases are listed together and compared. Those that are
similar are clustered and categorized into an upper class according to their sources and consequences
respectively. Next, the correspondence between the source-based and consequence-based classifications
is analyzed. Finally, a holistic taxonomy model of OGD related risks is constructed. This model can help
government departments and the public better understand OGD related risks and thus devise more efficient
response strategies.
Data Intelligence 13
Building a Holistic Taxonomy Model for OGD-Related Risks: Based on a Lifecycle Analysis
Figure 2. A taxonomy model of OGD related risks.
6. RISK DISTRIBUTION OVER THE LIFE CYCLE OF OGD
6.1 Distribution of Risks from five Sources
Using a content analysis for the qualitative data of the three cases, the life-cycle distributions of risks
from five sources are revealed, as shown in Table 1. The frequency of each type of risk occurring at every
stage of the OGD lifecycle is calculated. At different stages, the distributions of the 14 types of risks are
shown in Figure 3. Among the five stages, the data collection stage is risk intensive. This implies that an
appropriate risk management plan should be made before the OGD program begins.
14 Data Intelligence
Building a Holistic Taxonomy Model for OGD-Related Risks: Based on a Lifecycle Analysis
Table 1. The life cycle distribution of OGD-related risks.
Risk Collection Organiza-
tion Release Utilization Mainte-
nance Sum
Risks
from
OGD
Data risks Legitimacy 3
Data quality  3
Data value 5
Utilization risk Use 1
Risks to
OGD
Technology risks Data management  5
Platform support  5
Information security  5
Management risks Organizational
management
 4
Resource input  5
Institutional  5
Business process  4
Environment risks Imperfect regulations
and standards
 5
Scarce external
resources
 3
Excessive external
pressures
3
Sum 13 11 11 10 11 56
Figure 3. The life cycle distribution of 14 types of OGD related risks.
Data Intelligence 15
Building a Holistic Taxonomy Model for OGD-Related Risks: Based on a Lifecycle Analysis
6.2 The Occurrence Stages of 14 Types of Risks
A radar chart is made to show the number of the occurrence stages of 14 risks. As shown in Figure 4,
the risks that may occur at all five stages include those related to data value, data management, platform
support, information security, resource input, institutional risk, and flawed regulations and standards. This
indicates the importance of a strong technical platform and a sound management system for the success
of the OGD program. The risks that may occur at four stages include organizational and business process
risk, which still imply the importance of OGD project management. The risks that may occur at three stages
are those related to content legitimacy, data quality, the scarcity of external sources and external pressures.
Among the risks that occur at all five stages, only the risk related to data value is a risk from OGD. This
means that, except their natural characteristic, the value of the open data should be improved at every stage
of the OGD lifecycle. The other three types of risks from OGD emerge at three or fewer stages. The risk of
data use only occurs at the utilization stage. This implies that a specific type of risk that may be brought
by OGD should be controlled or avoided at a few specific stages. Most of the risks to OGD could be
mitigated by improving project management and a few of them could be avoided by strengthening
regulations and technical standards, e.g. the enforcement of EU GDPR.
335
5
5
1
5
4
5
5
4
5
3
3
0
1
2
3
4
5
Content legality
Data quality
Data value
Data management
Plaorm support
Use
Informaon security
Organizaonal
Resource input
Instuonal risk
Business process
Imperfect regulaon
& standard
Scarce external
resources
Excessive external
pressure
Occurrence stages
Occurrence frequency
Figure 4. The number of the occurrence stages of 14 types of risks.
16 Data Intelligence
Building a Holistic Taxonomy Model for OGD-Related Risks: Based on a Lifecycle Analysis
7. RISK MANAGEMENT STRATEGIES FOR OGD PROJECTS
Risks from OGD and risks to OGD have different sources and consequences and are distributed at
different stages in the OGD lifecycle. Strategies in response to them are suggested as follows:
7.1 Response Measures for Risks from OGD
(1) Data governance strategy for the risk in content legitimacy
The legitimacy of the open content is the first concern of many government departments, and has even
become the default explanation for any deficiency in data openness. In response to this kind of risk, the
government must accelerate the enactment of related laws and regulations (e.g. privacy protection act, and
data security act), establish examination criteria for privacy-involved data and build an effective data
governance system to ensure the legitimacy of open contents. In May 2018, the GDPR was enforced by
the EU, empower people with more control over their personal data collected by all companies operating
in the EU [45]. In the Nanhai district of Foshan city in Guangdong province, China, a data governance
committee was established to determine the legitimacy of open data [30].
(2) Value appraisal and feedback mechanism for the risk in data value
Among the departments that are inactive to data openness, some are unconvinced of the value of the
data to be released. To ensure the value of data that have yet to be disclosed, it is necessary to establish a
mechanism for data value appraisal mechanism. Further, a feedback mechanism should be set up to inform
the data providing departments about the results of data utilization, thus giving them sufficient information
to determine the priorities for future releases.
(3) Quality management and data provenance strategy for the risk in data quality
In response to the risk in data quality, guidelines and evaluation standards for stable data formats and
metadata, secure platform and data provenance are needed to ensure the authenticity, integrity and usability
of data at each stage of the life cycle of OGD program. For instance, the OPEN Government Data Act of
the U.S. requires federal agencies to publish their information online based on an underlying open standard
that is maintained by a standards organization [58].
(4) Legislation strategy for the risk in data utilization
In response to data utilization risks, it is necessary to clearly stipulate the purpose, means, scope, and
results of data utilization by speeding up the legislation, and mete out legal penalties for the malicious use
of data.
7.2 Improving Risk Management of OGD Project
(1) Through the lifecycle of the risk itself
During the different periods of the risk lifecycle, different strategies are needed to warn, evaluate and
respond appropriately. [59] proposed the establishment of a data risk warning mechanism, an internal
control mechanism and the fostering of risk coping ability. (1) During the period of latent risk, government
Data Intelligence 17
Building a Holistic Taxonomy Model for OGD-Related Risks: Based on a Lifecycle Analysis
departments must actively analyze and identify potential risks, make efforts to avoid and transfer them, and
establish reserve countermeasures for use in times of crisis. In this phase, although the government of
Shanghai had foreseen possible adverse consequences at the early stages of the OGD program, they were
not able to formulate a complete risk response plan because they lacked an understanding of OGD-related
risks. (2) In the risk occurrence phase, it is necessary to initiate the emergency plan immediately and take
effective measures to minimize losses as much as possible, including timely cut-loss, communication
enhancement and winning public understanding. Due to the failure of taking measures in time to pledge
public understanding since 2 million publicity funds were not invested as originally planned, the care.data
initiative of UK had to be stopped under public pressures [50]. (3) After the risk occurs, an emergency plan
should be initiated immediately to minimize any adverse effects. Due to its limited capacity for crisis
management and poor inter-departmental communication, the IRS failed to spot and solve the problem in
time after the hacker invasion, and allowed the data-theft to last undetected for three months [60].
(2) Through the life cycle of the OGD project
At all stages of the life cycle of an OGD project, it is necessary to implement corresponding risk control
strategies.
1). At the stage of data collection, a complete OGD plan is needed. A detailed handbook should be
developed to set the data open scope, conditions and technical means, especially the security level
of data involving secrets.
2). At the data organization stage, a set of operable technique standards is needed so as to ensure
trustworthy data quality, such as metadata, data preprocess, data catalogue, data documentation,
etc.
3). At the stage of data release and utilization, detailed regulations should be made to ensure proper
data use. Besides, emergency plans should be formulated in response to possible privacy leakage,
malicious use, hacker and virus and weak publicity.
4). At the maintenance stage, plans should be made to prevent unsustainable resource investment to
ensure the smooth progress of the project.
8. CONCLUSION AND DISCUSSION
Risk aversion is a potential factor of resistance to OGD, which stems largely from the lack of understanding
rather than their uncontrollability. The construction of a taxonomy model of risk can help government
departments understand them better, and thus avoid conservative inaction. By conducting a cross-case
analysis on three cases of OGD in the UK, the U.S., and China, this study identified 14 types of risks
associated with OGD. According to the risk source, the 14 types of risks are classified into five classes:
data, technology, management, utilization, and environmental risks. According to the risk consequence,
the 14 types of risks are classified into two groups: risks to OGD and risks from OGD. A holistic taxonomy
model of OGD related risks is then constructed. Based on the model, the distribution of each type of risk
across five stages of the OGD life cycle is analyzed. It is found that the stage of data collection is risk-
intensive and seven types of risks may emerge at all five stages. It is also found that most of the risks to
18 Data Intelligence
Building a Holistic Taxonomy Model for OGD-Related Risks: Based on a Lifecycle Analysis
OGD could be avoided or mitigated by improving risk management throughout the lifecycle of OGD
project. In response to the risks from OGD, it is necessary to improve data governance, data value appraisal
and feedback, quality management, and data provenance and speed up legislation.
Among the 14 types of identified risks, some have been discussed in previous studies, such as privacy
leakage [61, 62], low data quality [17, 63], and improper use [64] etc, but beyond these, this study also
identifies several new risks associated with OGD, including those concerning content legitimacy, data
value, weak data management, excessive external pressures, institutional flaws, external resource scarcity,
etc. Further, this study also makes the following contributions to this field: firstly, it distinguishes the risks
to OGD from the risks from OGD, and thus differentiates the legitimacy of OGD from its smooth realization.
This has a theoretical implications on deepening the understanding of OGD related risks and supporting
the rationality of OGD initiative. Secondly, this study identifies OGD related risks at different stages of the
life cycle of OGD program from three cases in different countries and builds a holistic taxonomy model
after clustering them. This fills the gap left by the deficiencies of previous studies that focused on a single
country and present the risks fragmentarily. Third, this study analyzes the distribution of each type of risk
over five stages of the lifecycle of OGD and suggests specific strategies in response to them. This has
practical implications for government departments that are promoting OGD.
The limitations of this study include: (1) Due to the authors’ geographical limitations, the data for the
cases of the UK and the U.S. are mainly collected from online sources. Although Google renders a
comprehensive search, the absence of interviews with related government officials has prevented the
researchers from directly assessing the attitudes of government staff on this point. To make up for this
limitation, several rounds of in-depth interviews were conducted in the case of Shanghai. (2) The taxonomy
model of OGD related risks is built with a bottom-up induction approach from three cases. Although it can
deepen the understanding of the OGD related risks, it might have not involved all the potential risks that
face other government departments. In the future, we will extend our approach to improve the taxonomy
model of OGD-related risks with more cases and to empirically test the correlation between risk management
and OGD performance.
AUTHOR CONTRIBUTIONS
F. Wang (wangfangnk@nankai.edu.cn) designed the whole framework, examined the results of data
analysis and wrote the final paper. A. Zhao (zhaoanstudy@163.com) collected most of the data of the first
two cases and made preliminary data analysis. H. Zhao (zhaohong@mail.nankai.edu.cn) analyzed part of
the data and revised the draft. J. Chu (chujun620@163.com) took part in all the interviews and transcribed
the records.
Data Intelligence 19
Building a Holistic Taxonomy Model for OGD-Related Risks: Based on a Lifecycle Analysis
ACKNOWLEDGEMENT
We would like to express special thanks to the reviewers and editors for their valuable comments, as
well as the interviewees and news commentators for their insightful viewpoints. We are grateful to Jiayue
Ma, Wei Zhao, the master students, Xiaoyu Wang, Weichong Zhang, Jing Yang, the doctoral students of
Nankai University, and Yichen Zhang, the master student of UCSD for their helps with data collection and
figure drawing.
This work has been funded by the project of National Engineering Laboratory of Big Data Application
Technology for Improving Government Governance Capability: “The Large-scale Intelligent Government
Document Processing Technology based on NLP and Deep Learning” and “Improving the Governance
Capability of the Government with Big Data”; the project of National Social Science Fund of China
“Network Society Governance in China” [granted number: 14ZDA063], and the project of National Natural
Science Fund of China: “Research on the Organization and Mode of Modern Social Governance” (grant
number: 71533002).
REFERENCES
[1] S. Dawes. Stewardship and usefulness policy principles for information-based transparency. Government
Information Quarterly 27(4) 2010, 377–383. doi: http://dx.doi.org/10.1016/j.giq.2010.07.001
[2] J. Attard, F. Orlandi, S. Scerri, et al. A systematic review of open government data initiatives. Government
Information Quarterly (32)4 2015, 399–418. doi: http://dx.doi.org/10.1016/j.giq.2015.07.006
[3] M. Janssen, & A. Zuiderwijk. Infomediary business models for connecting open data providers and users.
Social Science Computer Review 32(5) 2014, 694–711. doi: http://dx.doi.org/10.1177/0894439314525902
[4] OECD. Open government data, 2017. Available at: http://www.oecd.org/gov/digital-government/open-
government-data.htm.
[5] OGP. Global summit, 2016. Available at: https://en.ogpsummit.org/osem/conference/ogp-summit.
[6] DMG. China Open Data Index, 2018. Available at: http://ifopendata.fudan.edu.cn/.
[7] SPARC 2019. Passed into Law: OPEN Government Data Act (S.760/H.R. 1770). Available at: https://
sparcopen.org/our-work/open-government-data-act/.
[8] C. Martin. Barriers to the open government data agenda: Taking a multi-level perspective. Policy & Internet
6(3) 2014, 217–240. doi: http://dx.doi.org/10.1002/1944-2866.POI367
[9] P. Conradie, & S. Choenni. Exploring process barrier to release public sector information in local govern-
ment. In Proceedings of the ICEGOV, 2012, pp. 5–13. doi: http://dx.doi.org/10.1145/2463728.2463731
[10] J. Zhang, S. Dawes, & J. Sarkis. Exploring stakeholders’ expectation of the benefits of barriers of e-government
knowledge sharing. The Journal of Enterprise Information Management 18(5) 2005, 548–567. doi: http://
dx.doi.org/10.1108/17410390510624007
[11] A. Zuiderwijk, K. Jeffrey, & M. Janssen. The potential of metadata for linked open data and its value for
users and publishers. Journal of Electronic Democracy and Open Government 4(2) 2012, 222–244. doi:
http://dx.doi.org/10.29379/jedem.v4i2.138
[12] A. Zuiderwijk, M. Janssen, S. Choenni, R. Meijer, & R.S. Alibaks. Sociotechnical impediments of open data.
Electronic Journal of e-Government (10) 2012, 156–172. Available at: http://www.ejeg.com/issue/download.
html?idArticle=255
20 Data Intelligence
Building a Holistic Taxonomy Model for OGD-Related Risks: Based on a Lifecycle Analysis
[13] M. Janssen, Y. Charalabidis, & A. Zuiderwijk. Benefits, adoption barriers and myths of open data and open
government. Information Systems Management 29(4) 2012, 258–268. doi: http://dx.doi.org/10.1080/10580
530.2012.716740
[14] Y. Zhao, & B. Fan. Exploring open government data capacity of government agency: Based on the resource-
based theory. Government Information Quarterly 35(1) 2018, 1–12. doi: http://dx.doi.org/10.1016/j.giq.
2018.01.002)
[15] S. Martin, M. Foulonneau, S. Turki, & M. Ihadjadene. Risk analysis to overcome barriers to open data.
Electronic Journal of e-Government (11) 2013, 348–359. Available at http://www.ejeg.com/issue/download.
html?idArticle=296
[16] R. McLaren, & R. Waters. Governing location information in the UK. The Cartographic Journal 48(3) 2011,
172–178. doi: http://dx.doi.org/10.1179/000870411X13044121958902
[17] S. Kubler, J. Robert, S. Neumaier, J. Umbrich, & Y.L. Traon. Comparison of metadata quality in open data
portals using the analytic hierarchy process. Government Information Quarterly 35(1) 2018, 13–29. doi:
http://dx.doi.org/10.1016/j.giq.2017.11.003
[18] A. Vetrò, L. Canova, M. Torchiano, C.O. Minotas, & F. Morando. Open data quality measurement framework:
Definition and application to open government data. Government Information Quarterly 33(2) 2016,
325–337. doi: http://dx.doi.org/10.1016/j.giq.2016.02.001
[19] L. Zheng, & F. Gao. China open government data platform: Framework, current situation and advices.
E-Government (7) 2015, 8–16.
[20] Y. Cao. Government open data survival status: Investigation report on 19 local governments. Library and
Information Service (14) 2016, 94–101. doi: 10.13266/j.issn.0252-3116.2016.14.011
[21] A. Whitmore. Using open government data to predict war: A case study of data and systems challenges.
Government Information Quarterly 31(4) 2014, 622–630. doi: http://dx.doi.org/10.1016/j.giq.2014.04.003
[22] S. Choenni, J. van Dijk, & F. Leeuw, Preserving privacy whilst integrating data: Applied to criminal justice.
Information Polity, (15)2010, 125–138. doi: http://dx.doi.org/10.3233/IP-2010-0202
[23] X. Zhu. The failure of an early episode in the open government data movement: A historical case study.
Government Information Quarterly 34(2) 2017, 256–269. doi: http://dx.doi.org/10.1016/j.giq.2017.03.004
[24] P. Conradie, & S. Choenni. On the barriers for local government releasing open data. Government Informa-
tion Quarterly 31(S1) 2014, S10–S17. doi: http://dx.doi.org/10.1016/j.giq.2014.01.003
[25] H.J. Wang, & J. Lo. Adoption of open government data among government agencies. Government Informa-
tion Quarterly 33(1) 2016, 80–88. doi: http://dx.doi.org/10.1016/j.giq.2015.11.004
[26] BBC. Government data site user details leak, 2017. Available at https://www.bbc.com/news/technology-
40443601.
[27] BBC. Fitness app Strava lights up staff at military bases, 2018. Available at: https://www.bbc.com/news/
technology-42853072.
[28] J. Höchtl. Open Government Data - Security risk or mean for threat prevention, 2012. Available at: https://
www.slideshare.net/jhoechtl/open-government-data-security-risk-or-mean-for-threat-prevention.
[29] C.A. Williams, M. Smith, & P.C. Young. Risk Management and Insurance. New York: McGraw-Hill Publishing
Corporation, 1995, pp. 10–80.
[30] F. Wang, & F. Chen. Openness and exploitation of government big data in the course of sate governance.
Chinese Public Administration (11) 2015, 6–12. doi:10.3782/j.issn.1006-0863.2015.11.01
[31] J. Tao, & L. Mei. Big data technology embedding government regeneration risk and its control. Journal of
Tianshui College of Administration 17(1) 2016, 17–20.
[32] K. Granickas. Ethical and Responsible Use of Open Government Data. European Public Sector Information
Platform Topic Report, 2015, No.2.
Data Intelligence 21
Building a Holistic Taxonomy Model for OGD-Related Risks: Based on a Lifecycle Analysis
[33] T. Scassa. Privacy and open government. Future Internet (6) 2014, 397–413. doi: http://dx.doi.org/10.3390/
fi6020397
[34] L. Weng, & Y. Li. The opening and sharing of governmental big data: A study on the conditions, obstacles and
basic Principles. Comparative Economic and Social Systems (2) 2016, 113–122.
[35] C. Jiang. Big data risk assessment of national cyber security. China Information Security (5) 2015, 53–54.
doi: 10.3969/j.issn.1674-7844.2015.05.041
[36] J. Kucera, & D. Chlapek. Benefits and risks of open government data. Journal of Systems Integration 5(1)
2014, 30–41. doi: http://dx.doi.org/10.20470/jsi.v5i1.185
[37] R. Davidson. Open data, big data, public trust and risk at the Office for National Statistics. Beyond Infotopia:
Contextualising Risk, Openness and Transparency in the Information Age, Society for Risk Analysis Europe
Conference, 2016.
[38] R. Kitchin. Big data and official statistics: Opportunities, challenges and risks, 2015. Available at: http://
eprints.maynoothuniversity.ie/7231/1/PC.
[39] S. Dawes. A realistic look at open data, 2012. Available at: https://www.w3.org/2012/06/pmod/pmod2012_
submission_38.pdf.
[40] X. Liu, W. Sun, & L. Zheng. Potential risks of government open data and countermeasures: talk Shanghai as
an example. E-government 77(9) 2017, 22–30. doi: 10.16582/j.cnki.dzzw.2017.09.003
[41] National Audit Office. Managing risks to improve public services, 2004. Available at http://www.nao.org.uk.
[42] COSO. Internal control - integrated framework executive summary, 2013. Available at: https://www.coso.
org/Documents/990025P-Executive-Summary-final-may20.pdf.
[43] J. Liu. Tax risk control in big data era. Journal of Hunan Tax College (6) 2014, 19–21. doi: 10.3969/j.issn.1008-
4614.2014.06.006)
[44] J. Titcomb. Data protection bill: How will the new laws affect you, 2017. Available at: http://www.telegraph.
co.uk/technology/0/data-protection-bill-will-new-laws-affect/.
[45] GDPR. General Data Protection Regulation (GDPR). Available at https://gdpr-info.eu/.
[46] A. van Veenstra, & T. van den Broek. A Community-driven Open Data Lifecycle Model Based on Literature
and Practice. In: Boughzala I., Janssen M., Assar S. (eds) Case Studies in e-Government 2.0. Springer, Cham,
2015. doi: http://dx.doi.org/10.1007/978-3-319-08081-9_11
[47] Y. Lu, & J. Lu. Project Risk Management. Beijing: Tsinghua University Press, 2001.
[48] R. Huang, & T. Lai. Barriers of open government data in China from the perspective of data life cycle.
Information studies: Theory & Application 41(2) 2018, 7–13. doi: 10.16353/j.cnki.1000-7490.2018.02.002
[49] R.K. Yin. Case study research. Thousand Oaks, CA, Sage Publications, 1994.
[50] British Journal of Healthcare Computing. More care.data problems raised, 2014. Available at: http://www.
hitcentral.eu/british-journal-healthcare-computing/more-caredata-problems-raised.
[51] B. Goldacre. Care data is in chaos: It breaks my heart, 2014. Available at: https://www.theguardian.com/
commentisfree/2014/feb/28/care-data-is-in-chaos.
[52] L. Presser, M. Hruskova, H. Rowbottom, & J. Kancir. Care.data and access to UK health records: patient
privacy and public trust, 2015. Available at: https://techscience.org/a/2015081103/.
[53] G. Sollazzo, & D. Miller. Open data in the health sector: Users, stories, products and recommendations,
2017. Available at: http://openhealthcare.org.uk/open-data-in-the-health-sector/.
[54] D. Gray. Life science sector welcomes new guidance on NHS medical data sharing, 2016. Available at:
https://www.medicalplasticsnews.com/news/life-science-sector-new-guidance-on-nhs-medical-data/.
[55] J. Heckman. IRS: Frequent data breaches make it ‘fundamentally more difficult’ to verify taxpayers, 2018.
Available at: https://federalnewsnetwork.com/cybersecurity/2018/09/irs-frequent-data-breaches-make-it-
fundamentally-more-difficult-to-verify-taxpayers/
22 Data Intelligence
Building a Holistic Taxonomy Model for OGD-Related Risks: Based on a Lifecycle Analysis
[56] A. Rappeport. Up to 100,000 taxpayers compromised in Fafsa Tool Breach, 2017. Available at: https://www.
nytimes.com/2017/04/06/us/politics/internal-revenue-service-breach-taxpayer-data.html.
[57] Z. Lin. National Bureau of Statistics staff uses ‘data’ to Seek private property for rent, 2016. Available at:
http://china.caixin.com/2016-02-04/100907474.html.
[58] OPEN Government Data Act 2019. Data Coliation. Available at: https://www.datacoalition.org/open-
government-data-act/
[59] Y. Xia. Analyzing on the risks and risk management of open government data. Journal of the China Society
for Scientific and Technical Information (36) 2017, 18–27.
[60] E. McKee. IRS data breach allows hackers to steal $30 million from taxpayers, 2017. Available at: https://
www.atr.org/irs-data-breach-allows-hackers-steal-30-million-taxpayers.
[61] L. van Zoonen. Privacy concerns in smart cities. Government Information Quarterly 33(3) 2016, 472–480.
doi: http://dx.doi.org/10.1016/j.giq.2016.06.004
[62] J.J. Zhao, & S.Y. Zhao. Opportunities and threats: A security assessment of state e-government websites.
Government Information Quarterly 27(1) 2010, 49–56. doi: http://dx.doi.org/10.1016/j.giq.2009.07.004
[63] B. Fan, & Y. Zhao. The moderating effect of external pressure on the relationship between internal organiza-
tional factors and the quality of open government data. Government Information Quarterly 34(3) 2017,
396–405. doi: http://dx.doi.org/10.1016/j.giq.2017.08.006
[64] S. Dzazali, A. Sulaiman, & A.H. Zolait. Information security landscape and maturity level: Case study of
Malaysian Public Service (MPS) organizations. Government Information Quarterly 26(4) 2009, 584–593.
doi: http://dx.doi.org/10.1016/j.giq.2009.04.004
Data Intelligence 23
Building a Holistic Taxonomy Model for OGD-Related Risks: Based on a Lifecycle Analysis
AUTHOR BIOGRAPHY
Fang Wang (Corresponding Author) is a professor of library and information
Science in the Business School and the director of the Center for Network
Society Governance of Nankai University, China. She received her Ph.D
degree from Peking University. She has presided more than 20 projects of
NSFC and other foundations and published more than 100 papers in Chinese
and English as well as 10 books. Her research interests include government
information management and knowledge discovery. Email: wangfangnk@
nankai.edu.cn.
An Zhao is currently working in a government agency of Beijing. She received
her master degree in archive science from Nankai University. Her research
interest is e-government. Email: zhaoanstudy@163.com.
Hong Zhao is currently a Ph.D. student in Department of Information
Resource Management, Business School, Nankai University, China. He
received his M.S. degree in Information Science in 2008 from Nankai
University, China. His research interest is Government Information Resources
Management and Intelligent Processing. Email: zhaohong@mail.nankai.edu.cn.
24 Data Intelligence
Building a Holistic Taxonomy Model for OGD-Related Risks: Based on a Lifecycle Analysis
Jun Chu is currently working in a government agency of Tianjin. She received
her Master degree in Archive Science in 2018 from Nankai University. Her
research interest is government data sharing. Email: chujun620@163.com.
Chapter
Full-text available
The rhetoric related to benefits of Open Government Data (OGD) seems to lack anchor in practice affecting practitioners and empirical evidence restraining academia. This rhetoric could be hard to see for those already persuaded. As such, the rhetoric could contain inconsistencies that are based more on myths than facts, contributing to the slow pace of OGD development. OGD is sometimes based on dogmatic rhetoric that is overly simplistic, which hides significant benefits and blocks potential audiences from seeing the practical applications of OGD. The purpose of the present study was to analyse the persuasiveness of present OGD arguments from a rhetorical perspective to identify rhetorical patterns. We conducted desktop research, investigating the rhetoric of eight websites emphasising OGD benefits. Our findings include four common patterns of the rhetoric involving persuasion and dissuasion. The rhetoric contains paradoxes of promises and discoveries, which we categorised as the grand quest, promised opportunities, tribal solidarity, and the silver bullet patterns. A further finding was two mythical paradoxes: (1) promises versus discovery and (2) proving while arguing.KeywordsOpen government dataRhetoricPersuasionDissuasionMythical paradox
Article
Full-text available
In recent years, the amount of open data from governments and academic communities have increased rapidly. Open data are expected to promote the transparency and accountability of governments and academic communities, enable public participation, and facilitate digital innovation. However, open data are still facing problems such as unsatisfactory quality, insufficient data governance, increasing preservation cost and ineffective data reuse at present. This panel will invite experts in related fields to discuss the quality, reuse, and governance of open data, and propose feasible solutions from an international perspective. We will verify and promote the action plan in practice, and have more academic discussion with ASIS&T and relevant academic communities.
Article
Scene Graph Generation, which discovers the interaction between pairs of entities in an image, plays a significant role in image understanding. Most recent studies only consider visual features, ignoring the implicit effect of commonsense. We propose a novel model to take the advantage of commonsense knowledge in Scene Graph Generation, named Iterative Visual Relationship Detection with Commonsense Knowledge Graph (IVRDC). IVRDC consists of two modules: a feature module that predicts predicates by visual features and semantic features with a bi-directional recurrent neural network; and a commonsense knowledge module that constructs a specific commonsense knowledge graph for predicate prediction. These two modules roll out iteratively and cross-feed predictions from and to each other. The final predictions are made by taking the result of every iteration into account with an attention mechanism. Experimental results on the Visual Relationship Detection (VRD) dataset and the Visual Genome (VG) dataset demonstrate that our proposed model is competitive.
Article
Full-text available
The open government data (OGD) movement that focuses on government transparency and data reuse did not appear out of thin air. Some early episodes of this social movement can be traced to the early 1990s.This paper presents a historical case study of such an OGD episode, a campaign targeted at a government database called JURIS, initiated by OGD advocates in the early 1990s. JURIS was a legal information retrieval system created by the Department of Justice and used by government employees, which contained federal court decisions (or case law), among many other primary legal materials. Public interest groups and small publishers intended to open up the database for public access and data reuse, but their effort failed and eventually led to the shutdown of the JURIS system. This paper provides a detailed account of the history, analyzes the reasons of the failure, and discusses outcomes of the campaign. Drawing from social movement theories, especially the political opportunity structure, the paper illustrates the complexity of the social political environment surrounding the OGD movement, especially with regard to an important type of government data, primary legal information, in the United States.
Article
Full-text available
The development of big data is set to be a significant disruptive innovation in the production of official statistics offering a range of opportunities, challenges and risks to the work of national statistical institutions (NSIs). This paper provides a synoptic overview of these issues in detail, mapping out the various pros and cons of big data for producing official statistics, examining the work to date by NSIs in formulating a strategic and operational response to big data, and plotting some suggestions with respect to on-going change management needed to address the use of big data for official statistics.
Article
Full-text available
In this paper a framework is constructed to hypothesize if and how smart city technologies and urban big data produce privacy concerns among the people in these cities (as inhabitants, workers, visitors, and otherwise). The framework is built on the basis of two recurring dimensions in research about people's concerns about privacy: one dimensions represents that people perceive particular data as more personal and sensitive than others, the other dimension represents that people's privacy concerns differ according to the purpose for which data is collected, with the contrast between service and surveillance purposes most paramount. These two dimensions produce a 2×2 framework that hypothesizes which technologies and data-applications in smart cities are likely to raise people's privacy concerns, distinguishing between raising hardly any concern (impersonal data, service purpose), to raising controversy (personal data, surveillance purpose). Specific examples from the city of Rotterdam are used to further explore and illustrate the academic and practical usefulness of the framework. It is argued that the general hypothesis of the framework offers clear directions for further empirical research and theory building about privacy concerns in smart cities, and that it provides a sensitizing instrument for local governments to identify the absence, presence, or emergence of privacy concerns among their citizens.
Article
Full-text available
The diffusion of Open Government Data (OGD) in recent years kept a very fast pace. However, evidence from practitioners shows that disclosing data without proper quality control may jeopardize dataset reuse and negatively affect civic participation. Current approaches to the problem in literature lack a comprehensive theoretical framework. Moreover, most of the evaluations concentrate on open data platforms, rather than on datasets.In this work, we address these two limitations and set up a framework of indicators to measure the quality of Open Government Data on a series of data quality dimensions at most granular level of measurement. We validated the evaluation framework by applying it to compare two cases of Italian OGD datasets: an internationally recognized good example of OGD, with centralized disclosure and extensive data quality controls, and samples of OGD from decentralized data disclosure (municipality level), with no possibility of extensive quality controls as in the former case, hence with supposed lower quality.Starting from measurements based on the quality framework, we were able to verify the difference in quality: the measures showed a few common acquired good practices and weaknesses, and a set of discriminating factors that pertain to the type of datasets and the overall approach. On the basis of this evaluation, we also provided technical and policy guidelines to overcome the weaknesses observed in the decentralized release policy, addressing specific quality aspects.
Article
Full-text available
Despite the development of Open Data platforms, the wider deployment of Open Data still faces significant barriers. It requires identifying the obstacles that have prevented e-government bodies either from implementing an Open Data strategy or from ensuring its sustainability.This paper presents the results of a study carried out between June and November 2012, in which we analyzed three cases of Open Data development through their platforms, in a medium size city (Rennes, France), a large city (Berlin, Germany), and at national level (UK). It aims to draw a clear typology of challenges, risks, limitations, barriers, all terms used by the different stakeholders with diverse meanings and based on different motivations. Indeed the issues and constraints faced by re-users of public data differ from the ones encountered by the public data providers. Through the analysis of the experiences in opening data, we attempt to identify how barriers were overcome and how risks were managed. Beyond passionate debates in favor or against Open Data, we propose to consider the development of an Open Data initiative in terms of risks, contingency actions, and expected opportunities. We therefore present in the next sections the risks to Open Data organized in 7 categories: (1) governance, (2) economic issues, (3) licenses and legal frameworks, (4) data characteristics, (5) metadata, (6) access, and (7) skills.
Article
Open government data (OGD) are valued by many countries and governments worldwide because of its important political, economic, and social benefits. Based on the resource-based theory, we construct a research model from the aspects of tangible, intangible, and human resources, as well as organizational culture to explore the factors that influence open government data capacity (OGDC). Results indicate that data variables, basic resources, organizational arrangement and technical capacity are directly related to the OGDC of government agencies; power distance negatively moderates the relationship between organizational structure and OGDC; uncertainty avoidance moderates the relationship among basic resources, organizational arrangement and OGDC. On this basis, we put forward relevant suggestions for the following development of OGD.
Article
The quality of metadata in open data portals plays a crucial role for the success of open data. E-government, for example, have to manage accurate and complete metadata information to guarantee the reliability and foster the reputation of e-government to the public. Measuring and comparing the quality of open data is not a straightforward process because it implies to take into consideration multiple quality dimensions whose quality may vary from one another, as well as various open data stakeholders who - depending on their role/needs - may have different preferences regarding the dimensions' importance. To address this Multi-Criteria Decision Making (MCDM) problem, and since data quality is hardly considered in existing e-government models, this paper develops an Open Data Portal Quality (ODPQ) framework that enables end-users to easily and in real-time assess/rank open data portals. From a theoretical standpoint, the Analytic Hierarchy Process (AHP) is used to integrate various data quality dimensions and end-user preferences. From a practical standpoint, the proposed framework is used to compare over 250 open data portals, powered by organizations across 43 different countries. The findings of our study reveals that today's organizations do not pay sufficient heed to the management of datasets, resources and associated metadata that they are currently publishing on their portal.
Article
Open government data (OGD) have important political, economic, and social values that are highly valued by many countries around the world. These data represent the cross-boundary information sharing practices between governments and the public. Therefore, this paper investigates OGD by following the theory and practice of cross-boundary information sharing. Most of the current studies have focused on OGD platform, whereas this study has focused on individual government departments and datasets. From the perspective of public data users, we constructed an evaluation index to measure OGD quality, which reflected the degree of cross-boundary information sharing between the governments and the public. We specifically concentrated on the external environment of OGD and selected 128 government departments in Shanghai, Beijing, and Wuhan in China as samples. We found that institutional capacity is an important factor for OGD quality of individual government departments. However, technology capacity and organization arrangement demonstrated small significance for OGD quality. The pressure from the public and higher-level government departments can moderate the relationship between institutional capacity and OGD quality. These results can guide governments in prioritizing the improvement of OGD quality and implementing an OGD project.
Article
With an emerging interest in open government data (OGD) around the world, there has been an increasing need for research on the determinants of OGD adoption. This study examines factors influencing the adoption of OGD among government agencies in Taiwan to fill the existing knowledge gap. Accordingly, based on previous research on innovation adoption, we develop a research model that integrates the technology-organization-environment (TOE) framework and the following four factors that are central to adoption decisions: perceived benefits, perceived barriers, organizational readiness, and external pressures. We examined this model through survey data from 342 government agencies of the Executive Yuan in Taiwan. The results show a significant positive relationship among perceived benefits, organizational readiness, and external pressures and the adoption of OGD by government agencies. This study creates a valuable reference for other countries in the early stages of OGD initiatives and has significant implications for governmental policy practitioners.
Article
Public sector bodies maintain a large amount of data from various domains. This data represents a potential resource that organizations and individuals can use to enhance their own datasets or which can be used to develop new and innovative products and services. In order to foster the reuse of the data held by the public sector bodies a number of countries around the world has started to publish its data according to the Open Data principles. In this paper we present a set of benefits that can be achieved by publishing Open Government Data (OGD) and a set of risks that should be assessed when a dataset is considered for opening up. Benefits and risks presented in this paper were mostly identified during two of our OGD activities.