What Data Should I Protect? Recommender and Planning
Support for Data Security Analysts
Tianyi Li
Computer Science Department, Virginia Tech
(previously: UX Team, Informatica)
Blacksburg, VA
tianyili@vt.edu
Gregorio Convertino
UX Research, Google
(previously: UX Team, Informatica)
San Francisco, CA
gconvertino@gmail.com
Ranjeet Kumar Tayi
UX Team, Informatica
Redwood City, CA
rtayi@informatica.com
Shima Kazerooni
UX Team, Informatica
Redwood City, CA
skazerooni@informatica.com
ABSTRACT
Major breaches of sensitive company data, as for Facebook's 50 million user accounts in 2018 or Equifax's 143 million user accounts in 2017, are showing the limitations of reactive data security technologies. Companies and government organizations are turning to proactive data security technologies that secure sensitive data at source. However, data security analysts still face two fundamental challenges in data protection decisions: 1) the information overload from the growing number of data repositories and protection techniques to consider; 2) the optimization of protection plans given the current goals and available resources in the organization. In this work, we propose an intelligent user interface for security analysts that recommends what data to protect, visualizes simulated protection impact, and helps build protection plans. In a domain with limited access to expert users and practices, we elicited user requirements from security analysts in industry and modeled data risks based on architectural and conceptual attributes. Our preliminary evaluation suggests that the design improves the understanding and trust of the recommended protections and helps convert risk information into protection plans.
CCS CONCEPTS
• Human-centered computing → Interactive systems and tools.
KEYWORDS
Recommender Systems; Security Software; Data Protection; Multi-factor Decision-making; Intelligent User Interfaces; User-Centered Design.
ACM Reference Format:
Tianyi Li, Gregorio Convertino, Ranjeet Kumar Tayi, and Shima Kazerooni. 2019. What Data Should I Protect? Recommender and Planning Support for Data Security Analysts. In 24th International Conference on Intelligent User Interfaces (IUI '19), March 17–20, 2019, Marina del Rey, CA, USA. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3301275.3302294

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
IUI '19, March 17–20, 2019, Marina del Rey, CA, USA
© 2019 Association for Computing Machinery.
ACM ISBN 978-1-4503-6272-6/19/03...$15.00
https://doi.org/10.1145/3301275.3302294
1 INTRODUCTION
After the 2017 Equifax data breach impacting about 143 million users [29], Facebook reported an attack of their network system that exposed personal information of nearly 50 million users on September 28, 2018 [12]. Such cyber theft cases are topping the list of risks for which businesses are least prepared [9]. This alerts cybersecurity researchers and software providers that traditional reactive approaches, which are based on anti-intrusion technologies such as firewalls and digital signatures, can be circumvented and thus are insufficient to protect the application and data layers of company systems [19].
Reactive approaches are good at answering what is attacked and where. Models and tools have been developed to detect malicious activities that have already happened, such as anomalies in user activity [21, 26] and network systems [10]. However, these solutions are usually specialized for a certain type of risk. More importantly, detecting, isolating, and remediating infections can take weeks in most organizations (e.g., [6]). This long reaction time leaves detected vulnerabilities exposed to attackers and increases the loss. To mitigate risks before it is too late, organizations – and thus data security software vendors – are re-focusing attention on proactive data protection at source, i.e., data-centric security (Figure 1). They apply security techniques such as encryption to the source database from which other dependent databases and systems access the sensitive data (e.g., [1, 3, 20, 24]).
With this new focus, the problem now becomes what to protect and how, "proactively". This is a hard problem because making such decisions requires a good understanding of complex data risk situations – the data owned and managed by organizations are manifold and sensitive, and can be accessed, exported, and modified by different parties with varying authority. In fact, Equifax's data breach was caused by a third-party company that supported their online dispute portal. In addition, the human element is often the weakest link in information security strategies, no matter how secure the system is [14, 27]. Furthermore, as governments enforce information security policies, failure to comply will lead to legal and financial penalties plus reputation loss.
Figure 1: Data-Centric Security: re-focusing from indicators
of attacks (left) to proactive data protection at source (right)
The volume and complexity of data as well as the limited budget and resources make it difficult for organizations to protect everything equally [30, 34]. Intelligent user interfaces are a natural solution to bridge the gap between the need to make optimal protection decisions and the insufficient support for managing voluminous and complex risk information. Moreover, the same data might represent different value to different organizations and be regulated in multiple ways by more than one policy. Foraging for information relevant to the protection goals, and understanding and verifying why certain data warrant certain protection activities, are important to the quality and the efficiency of data protection.
In this paper, we present an explainable intelligent user interface that interactively recommends and simulates protection options and carries the insights into aggregated plans. We model data risks by distinguishing the architectural and conceptual attributes and compute risk metrics from different perspectives. The system 1) recommends groups of data stores by the expected protection impact, i.e., highest risk reduction with the given budget, 2) displays the related risk factors and visualizes the simulated protection impact to explain the recommendation rationale, 3) captures user interaction to interpret latent user preference and updates recommendations accordingly. We followed a user-centered design approach [7]: we first conducted user research on needs and then ran four iterative design and evaluation cycles with target users and proxies. The evaluation feedback suggests that our system design can help analysts better understand the risk situation, convert risk information into protection plans, and adjust their protection goals when necessary.
2 RELATED WORK
In this section, we rst motivate our work with the contemporary
data breach incidences, how reactive security systems are no longer
sucient, and the usability gap in current data-centric security
systems. We then elaborate on the key challenges of designing
proactive security support, and nally describe the opportunities of
IUI to address the challenges in data-centric security system design.
2.1 Proactive Detection and Information Overload
With more sophisticated hacking techniques [31] there is an increased incidence of data breaches. The Identity Theft Resource Center reported 1,579 breaches in the US in 2017, up 45% from 2016 [25]. This growing phenomenon and two motivators are leading organizations to adopt data security systems: avoiding business disruptions or losses due to data breaches and complying with sterner government regulations such as the General Data Protection Regulation (GDPR) (e.g., see the analysis in [33]). It is common among enterprises to adopt data security techniques such as encryption, tokenization, masking, or access control to protect their sensitive data, as traditional defenses like firewalls and signature-based technologies are being circumvented by attacks aimed at the application and data layers of company systems [19].
Here we categorize contemporary data security systems into two groups. The first group collects and analyzes user events and log data to detect anomalies or identify malicious user activities [4, 21, 26]. The second group flags risks on sensitive data (e.g., Informatica's Secure@Source [24], IBM's QRadar Security Intelligence [1], and Imperva's SecureSphere [3]). The first group is more reactive in nature as it focuses on the footprints of previous activities. The second group of systems, which our work aims to extend, allows preemptively defining and applying security policies across data silos, thus building stronger bastions against threats. Techniques such as machine learning are also increasing detection accuracy by replacing older rules-based and signature-based technologies.
Both groups of systems help with discovery and analytics. However, they rely on the human expert to prioritize data at risk and translate the discovered risks into protection decisions on a case-by-case basis. Our work aims at supporting the analysts in this final phase, by helping with information overload, prioritization against risk metrics, and optimization of protection plans.
2.2 Risk Quantification and Protection Prioritization
It is hard to get access to expert users and practices in this domain. We are aware of only a few studies, qualitative in nature. M'manga and collaborators [2] conducted a qualitative study with ten security analysts from the IT departments of three organizations. Their interviews highlighted factors that influence risk interpretation and the overall complexity of the decision-making task. They found that decisions to remediate vulnerabilities are made under constrained conditions and are based on non-standardized analysis, which they call 'folk risk analysis'. A similar interview-based study with thirty security practitioners was conducted by Werlinger and collaborators [36]. Their findings pointed to the collaborative nature of this work, with multiple stakeholders involved, and to the limitations of current security systems. They argue that the systems should, for example, help more with collaboration and knowledge sharing, reduce task complexity (e.g., by supporting task prioritization), and integrate data security and communication tools into one platform.

Other qualitative analyses have argued that data protection plans can be viewed as investment decisions for the organization. The investment should be proportionate to the risk and lifetime of the data (e.g., [11, 34, 35]). That is, not all data at risk can be protected equally: data of less value might not be worth the cost of protection. Analyzing and comparing the return on investment across protection plans is an area where future systems can help (e.g., [30, 35]).
In summary, a few studies pointed to several unfulfilled needs of analysts. Security analysts must decide what data to protect and with what priorities. At the same time, they need to manage multiple constraints to optimize protection plans. We are not aware of system evaluations that specifically address the decision-making aspect of data security analysts' work.
2.3 Intelligent UIs to Support Multi-factor Decisions
Automatic data protection systems have been easily compromised by attackers using commonly available attack vectors against known defensible vulnerabilities [22, 37]. In fact, the decision-making process of data protection is influenced by multiple actors and factors that change over time: the organizational structure and the industry, the stakeholder who administers the data security budget, the available budget, the business priorities, and so on (e.g., [16]). A data security team relies on the effective and collaborative use of people, processes, and technology [23, 36]. Thus, the human expert must be in the loop to identify data at risk, set goals and priorities with relevant stakeholders, understand constraints (e.g., budget), and decide what data to protect.
Current systems help with risk detection but then leave it to the human to translate the overwhelming risk information into protection decisions. We argue that a user-centric design based on Intelligent User Interfaces (IUI) and mixed-initiative systems can better support analysts with such multi-factor decision-making problems. IUI can help the analyst first set goals or, equivalently, select the relevant definitions of risk, which in turn help prioritize the data at risk. Future systems can support the prioritization, for example, by comparing economic models of information security investment [32], estimating returns on security investments [30, 34, 35], and applying methods to generate and aggregate rankings of the risks in a system (e.g., [28]).
Another reason for IUI is to help explain risk and priorities. In fact, one of the biggest challenges for data security teams is to estimate the value of data and the (negative) value of losing or disclosing it, i.e., the risk. In a survey of 37 cyber insurance experts, the European Union Agency for Network and Information Security (ENISA) found that cyber insurers and organizations face the challenge of defining risk measures [18]. Given the current basic understanding of risk, they recommend that organizations understand their risk before addressing it. For example, a classic constraint when prioritizing solutions is the limited budget for protections (e.g., see protections as investments in [11, 30, 34, 35]).
We propose applying existing IUI approaches to explain recommendations (e.g., by showing the relevant security policies, types of sensitive data, etc.), analogously to [13, 17], and to assess the expected protection impact of recommended data, analogously to [8, 17]. Our work is in line with these IUI approaches. We are not aware of existing systems that have applied these approaches to reduce information overload and support multi-factor decision making by security analysts.
3 USER-CENTERED DESIGN
The above related work suggests two hard problems in making data protection decisions: 1) translating the discovered risks into appropriate protection decisions, 2) optimizing protection decisions into executable plans based on multiple factors such as plan benefits and costs and the goals of the organization. Both problems require an in-depth understanding of real-world user practice and requirements. In our two years of work with practitioners we observed the following domain-specific challenges for system design in data security:

• In the security domain, it is hard to get access to a sufficient variety of real-world data, experts, and practices, since they are deemed sensitive by organizations.
• Data-centric security is a new domain and a new market for software applications. There are no de facto successful systems yet to use as references for designers.
• Data-centric security by definition should be adaptable to the data and its users. It is hard to standardize designs across various data structures and user classes.
• Finally, enterprise software systems have long sales and deployment cycles, which limits the opportunity for fast design and evaluation iterations with organizations that deploy a new system.

To address these challenges, our user-centered design comprised a year-long empirical user research effort with target users (Table 1) followed by four iterative design and evaluation cycles with proxy users (Table 2). In the following subsections, we first report the methods and findings of the user research, then report in more detail on the iterative design process and results. The four iterations were based on the user research and led to the final system design and implementation.

U  Industry     Size   Job Role
1  Human Res.   1.6k   Chief Info. Security Officer
2  Health Care  10k+   Data Architect
3  Education    10k+   Chief Info. Security Officer
4  Technology   1-5k   Sr. Director of Info. Security
5  Technology   10k+   Security Technical Program Mgr.
6  Finance      10k+   Database Adm. Manager
7  Technology   10k+   Chief Security Architect
8  Telecom.     5-10k  Security Capabilities Expert
9  Financial    200    IT Security and Compliance Mgr
10 Financial    10k+   IT Development Manager
11 Technology   1-5k   Director of Strategic Bus. Dev.
Table 1: Target users interviewed in the user research: company industry, company size (number of employees), and users' job roles.
3.1 User Research: Goals, Pain Points, and Requirements
The goal of the user research was to understand the participants' pain points, use cases, and expectations in data-centric security.

3.1.1 Method. We conducted one-on-one, semi-structured interviews with 11 target users (Security Managers or Security Analysts) at 11 companies (Table 1). Each session was remote (using WebEx and conference calling) or in person and lasted 90 minutes. After a 5-minute introduction to explain the purpose and agenda of the interview, we asked questions about participants' day-to-day activities and overall responsibilities related to security products (15 minutes). Then we spent 40 minutes asking the participants about their current practice regarding: 1) the overall risk assessment of the company, 2) user behavior analysis, 3) security violation alerts, and 4) policy compliance. After that, we collected more specific feedback on performing the above four tasks using [24] and on how to improve them. The interviews were audio and video recorded with participant permission. We transcribed the recordings and analyzed the findings in four two-hour expert review sessions. We summarize the findings below.
3.1.2 User Personas. Quoting our target users' own terminology, they are professional security analysts who focus on identifying and prioritizing datasets based on "business risks" evaluated against the "financial asset and liability valuations" and the "required security budget". The goal is to make investment decisions for data protection "proportionate to the risk mitigation and the lifetime of a dataset" [11, p. 9].
3.1.3 Disconnected Tools and Lack of Intelligence Hinder Analysis. The interviews revealed two major limitations of current tools that constrain security analysis. First, target users wanted to see sensitive data getting discovered, analyzed, and protected in one single tool. For example, U11 stated: "One tool is ideal so that I can define one set of policies; it can [ensure data security] across the enterprise". Having to manage too many tools distracts target users from the main analysis. Such analysis work, fragmented among different tools, impedes full visibility of the sensitive data and the associated risk across the many data stores. A similar limitation was found by [36]. Second, multiple target users requested to have their current systems augmented with intelligence and automation. For example, U9 stated: "Ideal way would be an application or tool that is automated and as smart as possible to learn from its mistakes". U6 stated: "It would be nice to just turn the service on and it can find the data and mask it. Automation."
3.1.4 Top Analysis Tasks and Requirements. The interviews with target users also revealed the following top tasks that data security analysts and managers perform.

Executive Tasks. Data security analysts and managers are responsible for defining security strategies and getting management buy-in for security investments. This requires them to maintain an updated understanding of the latest policies and the data risk situation in the organization.

Internal Housekeeping Tasks. Data security analysts need to create information security policies, controls, and procedures to monitor the status of the data stored and managed in the organization. The goal is to ensure that the data is safe both at rest and in motion. This requires them to create internal policies that set rules for the system to detect anomalies and push notifications.

Policy Auditing Tasks. Besides the system-level monitoring of the data, data security analysts also need to review and analyze violations of relevant policies and standards, as well as review the compliance level. Failure to comply at the required level would lead to considerable fines and reputation damage.

Protection Management Tasks. Data security analysts design and propose new data protection plans according to the latest risk situation. Data security managers review, approve, and assign the protection plans to the appropriate roles to address the detected violations.

Execution Tasks. Data security technicians are responsible for implementing the processes to secure the technology infrastructure and the company data, according to the approved protection plans. This also includes managing project integration of new systems and services, as well as enabling old and new partners.

Job Role                   Participant  Iterations 1 2 3 4
Product Manager            P1
Product Manager            P2           ✓ ✓ ✓
Sales Manager              P3           ✓ ✓ ✓
Security Manager           P4 (U)       ✓ ✓ ✓
Security Service Manager   P5 (U)
SW Development Manager     P6
Data Scientist             P7
Security Engineer          P8
Security Manager           P9 (U)
Security Architect         P10 (U)
Table 2: Participants of each iteration of the user-centered design: four are target users (marked by U), six are proxies.
3.2 Problem Modeling and Validation
In the first iteration, drawing on the findings of the user research, we prioritized and validated the user requirements in focus and modeled the problem by separating the two major data security concerns: what to protect and how.

3.2.1 Method. We conducted a design workshop that involved the four authors and one proxy user. There were two sessions: 1) classifying the user requirements collected during prior investigations with target users; 2) sketching paper prototype designs to address the validated requirements, with follow-up discussions. There was a 30-minute break in between.

The first half of the workshop was a one-hour requirement validation session. P1 (see Table 2) is the product manager of an existing data security system [24]. P1 has rich experience and a deep understanding of the requirements and pain points of end users from different domains. The two authors who led the year-long empirical user research with target users played proxy users and relayed the findings of the user research. The team of workshop participants then agreed on the prioritization of the user requirements. The authors finally worked on the problem modeling based on the information from the proxy users.

The second half of the workshop was a one-hour design session. Each of the four authors was given 30 minutes to sketch a paper prototype of a design that would address the requirements. After that, all the designs were put together and evaluated in discussions involving all the participants. In summary, the workshop allowed specifying requirements, exploring alternative design concepts, and discussing potential design choices with proxy users.
3.2.2 Separation of Concerns: What and How. We summarized the requirements from the user research in two main security concerns: selecting what data an organization should protect first, given the relevant criteria; and building protection plans optimized against multiple competing criteria or priorities.

Figure 2: Cartesian space of data attributes relevant to security analysis: conceptual and architectural axes. Each colored ball with a number represents a column in a database. Each cylinder is a group of columns that have the corresponding architectural and conceptual attributes. For example, Data Store B has 6 columns of data governed by GDPR policy, 2 containing SSN data. In Data Store A, columns 1, 2, and 9 contain SSN data.

Figure 3: Example of Architectural and Conceptual Attributes of Data Units: Amanda is the first name of a customer, and other types of information about this customer are stored in multiple tables and data stores.

Deciding how the data should be protected depends on data attributes distinct from those informing the decision whether the data should be protected. The "whether-to-protect" decisions depend primarily on the conceptual value people attach to data, which can be quantified as the business cost for the company if the data is lost or compromised (e.g., government penalties, customer lawsuits, reputation damage [32]). The "how-to-protect" decisions are constrained primarily by the architecture and technology used to store and manage the data. Based on the information from proxy users, we proposed the model in Figure 2 to separate concerns between these two types of constraints.
On the architectural axis, we list the attributes that describe the architecture and technology for data management. The same type of sensitive data, governed by the same policy, might live in different databases. Thus, the constraints of the architecture (platform, service) need to be accounted for. For example, different databases have different compatibility and technical support for data operations. The data protection technique Apache Sentry can regulate user access control on Hadoop clusters, but might not be as helpful on a MongoDB database.

Figure 4: Low-Fidelity Prototype: Recommendations on Protection Options Based on the Goals Set by the User

The conceptual axis is relevant when deciding what to protect. The architectural axis is relevant when executing the protection. The proposed model allows a practical separation of concerns among the attributes of a data unit, without worrying about the relationships among these attributes.
3.3 Validating Design Concepts and Workflow
In the second iteration, we developed and evaluated a low-fidelity prototype. The prototype shows a sidebar built as an extension of an existing data-centric security system [24]. The sidebar has two main functions: showing protections (recommended or user-created) and building plans.

3.3.1 Method. We conducted semi-structured interviews with three participants (P2-P4 in Table 2). Each interview started with a 5-minute introduction to explain the purpose and the agenda, followed by a general question session (20 minutes) about the participant's job role, the risk metrics they commonly use, examples of their real risk reduction projects, and unfulfilled needs (e.g., "Given a risk reduction goal, what would you like to be recommended by the data security management system?"). In the second part of the interview (40 minutes), the participants were shown the low-fidelity prototype (Figures 4 and 5) in a Wizard of Oz manner. The participants gave feedback on each screen. The main evaluation criteria were ease of understanding of the design components, the utility of the functions included, and whether there were any missing functions or information in the prototype.
3.3.2 More Intelligent and Explainable Recommendations. A key finding was that all participants requested to see system recommendations at the outset of the task, rather than focusing on specifying their goal first. They wanted to see "what protections would have the most impact with the lowest cost and efforts" (P4), the "top N actions to reduce risk the most" (P3), and "what area should I focus on first" (P2). At this early stage of the analysis, they usually do not have enough knowledge and understanding of the risk situation to set a goal or manually create a protection from scratch. In addition, P2 and P3 suggested that they would like to see both current and estimated future risk metrics to analyze the impact and get a sense of progress. P2 also suggested that more visual cues would help her understand and trust system recommendations.

Figure 5: Low-Fidelity Prototype: Optimization of Protection Plans

3.3.3 Reorganization of Screen Real Estate. P2 and P4 suggested that the planning and optimization of protection options deserve more screen real estate to incorporate more details of the potential plans.
3.4 Evaluating Interaction Design
Based on the feedback from the first two iterations, we implemented an interactive prototype to collect further feedback from more target users during a third iteration.

3.4.1 Method. We evaluated the interactive prototype with the same semi-structured interview method as in the second iteration. It involved seven participants (P2-P8 in Table 2). The feedback collected revealed more specific requirements, summarized in the three areas listed below.

1. Risk metric selection and ordering. P3 suggested that it would be useful to allow grouping or filtering of policies by sensitivity level so that he could focus on policies (i.e., recommendations) of higher sensitivity first. P3 ranked the risk metrics by importance as data risk cost, protection expense, and risk score, whereas P2 ranked risk score as most important, then protection expense. She did not consider data risk cost relevant. Furthermore, what P2 really cared about was the ratio (risk score/protection expense), to assess the cost-effectiveness of protections. P3 and P4 commented that it is complicated and challenging to estimate protection expense. This is due to uncertainty about the future, the different systems deployed in each company, and non-standardized terms and concepts (e.g., some protections will have expenses in different areas that are hard to compare). However, all participants pointed out that it is important to consider these metrics, and that showing a rough estimate of a confidence interval for each metric is also helpful.

2. Terminology and visual cues. P2 commented that the terms used in the prototype should be more intuitive and less technical so that a broader business audience could understand and benefit from the system. In this version, we had thumbs-up and thumbs-down controls for users to indicate whether the recommendations were useful. P2 did not understand these cues. P4 suggested that whether a user selects a recommendation to include in the plan would be enough to indicate the usefulness of the recommendation.

3. Interaction and transition of functionality. In this prototype, users can see the details of selected recommendations at the bottom of the bar. P4 commented that it would be more intuitive to expand the recommendation in place.
4 FINAL SYSTEM DESIGN AND IMPLEMENTATION
The final system design includes a recommender sidebar (Figure 6) and a plan building workspace (Figure 8). The intelligent UI builds on the risk information coming from an underlying system [24], our risk modeling, and our impact-aware recommendation algorithm.

4.1 Interactive Recommendation on the Sidebar
In response to the overwhelming volume and complexity of detected risks, the system makes recommendations based on the "attributes" with the highest impact on risk metrics, i.e., the data stores that share those attributes are the most worthy of protection.
4.1.1 Impact Analysis: What If I Protect This? The recommendations are listed in a sidebar that extends an existing data security system (Figure 6). At the top of the sidebar is an impact analysis carousel that compares the current and expected future values of each risk metric. Clicking on the checkmarks beside each recommendation will include or remove a group of data stores in the current plan. Consequently, the expected future risk scores in the carousel will be updated to estimate the aggregated impact of all selected recommendations.

The data stores are recommended in groups based on their security policies, data domains, or other attributes on the conceptual axis in Figure 2. Each group has a title describing the grouping criterion, the number of data stores, the total number of data fields, and the expected/current risk metric values. Clicking on the title will expand and display the details of the group in a tabular view (see Figure 7).

Users can slide the carousel to see different risk metrics (see "Risk Score" at the top right in Figure 6). Users can also "Select all" recommendations, turn the impact analysis "On/Off", show "More" or "Less" groups, or "Reset" the recommendations. "View by" allows users to apply filters to further narrow down the list of recommendations. "Rank by" selects the risk metrics used to rank the recommendations.
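To make the carousel's estimates concrete, the following is a minimal sketch of the "what if I protect this" computation, assuming (as discussed in Section 6.2) that protecting the selected stores removes their current risk contribution while everything else stays as is. All names (DataStore, risk_score, simulated_risk) are illustrative, not the product's actual API.

# Minimal sketch of the impact analysis carousel update. Hypothetical
# names; we assume protecting a store removes its entire risk
# contribution while all other risk factors stay fixed.
from dataclasses import dataclass

@dataclass
class DataStore:
    name: str
    risk_score: float      # current risk from the underlying system
    protected: bool = False

def simulated_risk(stores, selected):
    """Expected total risk if the selected stores were protected."""
    return sum(s.risk_score for s in stores
               if not s.protected and s.name not in selected)

stores = [DataStore("HR_DB", 8.2), DataStore("CRM", 5.1), DataStore("Logs", 1.3)]
current = sum(s.risk_score for s in stores if not s.protected)
future = simulated_risk(stores, selected={"HR_DB", "CRM"})
print(f"Risk Score: {current:.1f} -> expected {future:.1f}")  # carousel values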
4.1.2 Recommendation Algorithms and User Input. The recommender reads the risk information from the underlying system [24] and computes protection impact as the potential changes in risk metrics. The underlying system scans the data assets in the organization and quantifies the current risk using the method described in [20].

As found in our user research, security analysts are most interested in defining protections that will bring the highest risk reduction with the given budget (highest impact). We measure the impact of a protection decision by the expected risk score reduction, protection coverage increment, expenditure on execution, and elimination of loss.

Figure 6: Recommender Sidebar Extending an Existing Data Security System [24] (See Video [5])

Figure 7: Recommender Sidebar: Recommendation Details

In the recommender algorithm, we implemented two ways for the user to transfer knowledge to the system. The first is at the risk factor level. Security analysts can customize risk computation by tuning the weights of different risk factors, such as the governing policies of a data field or the user activity count (see Figure 13). The second and more frequently used way to transfer user knowledge to the system is by allowing the security analyst to select the most relevant protection from the recommendation sidebar. Data security analysts usually do not have a comprehensive understanding of the entire risk situation in the organization. Yet they can decide whether protecting a recommended group of data is in line with their goals. The interface captures user interactions such as selecting, unselecting, or expanding a recommendation to interpret latent user preferences. For example, if the user selects a recommendation to protect all data stores governed by the GDPR policy, the weight of the GDPR policy will increase, as will the weights of the data domains included in the GDPR policy. The precision-recall values are computed by comparing the recommended groups of data stores with the selected groups of data stores.
We summarize the recommendation process as follows:
(1) (Cold start) Initialize the weights of each dimension on the conceptual axis to user-specified values (equal weights by default).
(2) Compute the future values of each risk metric of the fields on each conceptual dimension.
(3) Compute the rankings of dimensions by their impact on each risk metric and feed the data to the UI.
(4) Increment the weights of the dimensions related to user-selected recommendations (e.g., if the user selects a policy, the weights of the data domains governed by this policy are also incremented).
(5) Go back to Step 2 with the updated dimension weights.
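A minimal sketch of this loop follows, under the simplifying assumptions that each conceptual dimension (a policy or data domain) carries one weight, that impact is the weighted sum of the risk scores its fields would shed, and that a selection multiplies the weights of the chosen and related dimensions by a fixed boost. The names, data, boost factor, and impact formula are illustrative, not the production algorithm.

# Hypothetical sketch of the impact-aware recommendation loop (Steps 1-5).
# "Dimensions" are conceptual groupings (policies, data domains); each maps
# to the risk scores of the fields it covers.

def init_weights(dimensions):
    # Step 1 (cold start): equal weights unless the user specifies otherwise.
    return {name: 1.0 for name in dimensions}

def recommend(dimensions, weights, top_n=3):
    # Steps 2-3: estimate each dimension's protection impact (here, the
    # weighted sum of the risk scores it would remove) and rank by it.
    impact = {name: weights[name] * sum(field_risks)
              for name, field_risks in dimensions.items()}
    return sorted(impact, key=impact.get, reverse=True)[:top_n]

def on_select(name, related, weights, boost=1.2):
    # Step 4: selecting a recommendation raises its weight and the weights
    # of related dimensions (e.g., domains governed by a selected policy).
    for n in [name, *related]:
        weights[n] *= boost
    return weights  # Step 5: recommend() is re-run with the updated weights

dimensions = {"GDPR": [8.2, 5.1], "PCI": [4.0], "SSN domain": [6.3, 2.2]}
weights = init_weights(dimensions)
print(recommend(dimensions, weights))             # initial ranking
weights = on_select("GDPR", ["SSN domain"], weights)
print(recommend(dimensions, weights))             # ranking after user feedback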
4.2 Plan Building Workspace
After selecting some candidate protection options, data security analysts need to build and compare different plans to analyze the expected benefits and costs.

4.2.1 Create and Iteratively Edit a New Plan. The plan building workspace allows fine-tuning a protection plan by adding or removing individual data stores. The data stores in the current plan are ordered in a bar chart by their impact on risk reduction. For example, in Figure 9, the risk metric selected is "data risk cost", so the data stores leading to more reduction in data risk cost are at the top. The numbers beside each bar show the residual values of the risk metrics after protecting that data store and those above it. Users can hover over the data store names to see the exact risk reduction and other details of each data store. Selecting a different risk metric will change the ranking of the data stores. The number of stores currently in the plan is shown at the top left of the workspace. The user can also add more data stores manually or access recommendations (see Figure 10).
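The residual numbers can be read as a running remainder: sort the plan's stores by their reduction of the selected metric, then subtract cumulatively. A small sketch under assumed store names and dollar values:

# Illustrative computation of the residual values shown beside each bar:
# stores are ranked by their reduction of the selected metric, and each
# residual is the metric value left after protecting that store and those
# ranked above it. Store names and figures are invented for the example.
def residuals(total, reductions):
    ranked = sorted(reductions.items(), key=lambda kv: kv[1], reverse=True)
    remaining, out = total, []
    for store, cut in ranked:
        remaining -= cut
        out.append((store, remaining))
    return out

plan = {"HR_DB": 40000, "CRM": 25000, "Logs": 5000}  # data risk cost cuts ($)
for store, left in residuals(total=100000, reductions=plan):
    print(f"{store}: ${left:,} data risk cost remaining")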
Users can also edit and save the details of the current plan (see Figure 9, right). This design affords future integration with coordination and collaboration functions such as creating, assigning, and executing protection plans.
Figure 8: Plan Building Workspace: All Saved Plans (Left) and Compare Plans Views (Right)

Figure 9: Plan Building Workspace: Create New Plan

Figure 10: Plan Building Workspace: Iteratively Edit and Save Plans with Details

4.2.2 Review and Compare Multiple Plans. As found in our user research, data security analysts usually build multiple plans and evaluate and iteratively edit them before putting a protection plan into execution. We extend the plan building workspace with a low-fidelity prototype that provides full-page views for users to review the details of existing plans (Figure 8, left) or compare multiple plans (Figure 8, right). The top of the plan detail page shows the accumulated impact of all selected plans, with all risk metrics visible side by side. For each plan, the details of data stores are shown in a compact table card, including the current and estimated future values of risk metrics.

Data security analysts can also select several plans for in-depth comparison and evaluation. The plan comparison page displays the selected plans side by side in a tabular view (Figure 8, lower right). At the top of the page is the aggregated impact of the selected plans on the entire data assets of the organization. The table shows the current and estimated future risk metric values of the data covered in each plan. Below the risk metrics, the interface shows the number of data stores, data domains, and data fields covered in each plan. Users can click on one plan (see the highlighted blue frame in Figure 8, lower right) to see its overall impact on the entire data asset; they can also drag a plan card to the left to rank it as more preferred, or to the right as less preferred. Users can add more plans to the comparison view by clicking the "Add Plans" button at the upper right of the view, or exit by clicking "Cancel". The current plan comparison can be exported as a report for review by stakeholders and budget approval by managers. Users can iterate on plan refinement before executing the protection.
4.3 Risk Modeling
We build on the risk quantification in [20] and model the risk by separating the two major concerns: how a data unit is used and regulated (conceptual attributes), and how a data unit is stored in systems (architectural attributes) (Figure 2).
4.3.1 Architectural Attributes: How Data is Stored in Systems. The architectural axis encapsulates how data are stored and transferred in various systems and services. Each data unit (e.g., "Amanda" is the First Name of a customer in Figure 3) is stored in a (column, row) cell in some table of some data store. Computationally, we construct a vector to represent where a data unit is stored in data stores, its user access authority, and its history. Other information can be computed from these basic elements. We compute the architectural risk score of each data store as a weighted sum of the following four architectural risk factors.
Figure 11: Hierarchy of Data Structures in the Risk Modeling

Figure 12: Risk Score Computation: utility functions formalize architectural/conceptual attribute values into scores, which are then combined as a weighted sum.
Figure 13: Risk Factor: Weights (set by users) and Values
(1) Protection Percentage measures the percentage of encrypted rows in a data store. If any data store in a lineage is compromised, all data stores in the same lineage are considered at risk and requiring protection.
(2) Number of Targets measures the number of data stores involved in the same data lineage as the current data store.
(3) User Access Count measures the number of user accounts having access to the columns in a data store.
(4) Impressions measures the total number of rows actually accessed by users in a data store.
We encode and normalize each factor as a score (rightmost column in Figure 12) and compute the weighted sum (with weights set by users, as shown in Figure 13) as the architectural risk score of a data store.
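Concretely, the computation in Figure 12 amounts to normalizing each factor with a utility function and taking a user-weighted sum. The sketch below assumes simple min-max scaling as the utility function and assumes that a higher protection percentage lowers risk; the actual utility functions, bounds, and default weights are not published here, so every name and number is illustrative.

# Hypothetical sketch of the architectural risk score of a data store
# (Figure 12): normalize each factor to [0, 1] with a utility function,
# then take the user-weighted sum. Min-max scaling stands in for the
# unpublished utility functions.
def normalize(value, lo, hi):
    return (value - lo) / (hi - lo) if hi > lo else 0.0

def architectural_risk(store, bounds, weights):
    factors = {
        "protection_pct": 1.0 - store["protection_pct"],  # less encryption -> more risk
        "num_targets": normalize(store["num_targets"], *bounds["num_targets"]),
        "user_access": normalize(store["user_access"], *bounds["user_access"]),
        "impressions": normalize(store["impressions"], *bounds["impressions"]),
    }
    return sum(weights[f] * score for f, score in factors.items())

store = {"protection_pct": 0.25, "num_targets": 4, "user_access": 120, "impressions": 5000}
bounds = {"num_targets": (0, 10), "user_access": (0, 500), "impressions": (0, 100000)}
weights = {"protection_pct": 0.4, "num_targets": 0.2, "user_access": 0.2, "impressions": 0.2}
print(f"architectural risk score: {architectural_risk(store, bounds, weights):.2f}")

The conceptual risk score in the next subsection follows the same normalize-and-weight pattern over its own four factors.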
4.3.2 Conceptual Attributes: How Data is Used and Regulated. Each organization has its own implementation of data structures. For example, a customer's First Name might be stored in the column "First Name" in one data store, but "Given Name" in another. The conceptual axis categorizes the business value of data, such as data domains, according to standardized security regulations like the EU General Data Protection Regulation (GDPR) [33]. A data domain is a category of columns in data stores. For example, as shown in Figure 3, an SSN can uniquely identify a person and thus forms a Personal Identifier (PID) data domain. Another example is given by First Name or Address, each considered Personally Identifiable Information (PII) but not sensitive enough alone to uniquely identify a person; yet combined, First Name plus Address can uniquely identify a person and thus form a PID data domain.

Risk factors based on conceptual attributes are primarily assessed by data owners and by security policies like the GDPR requirements. We model the conceptual risk score of a data store as a weighted sum of the following four conceptual risk factors.
(1) Number of Sensitive Fields measures the number of data columns governed by policies in a data store.
(2) Policy Impressions measures the number of rows governed by a policy in a data store.
(3) Sensitivity Level measures the highest sensitivity level of the policies that govern the data domains in a data store.
(4) Risk Cost measures the unit monetary loss per data record if the data were to be compromised, including tangible losses such as policy penalties and intangible losses such as reputation damage. For proof of concept, we define risk cost as the monetary penalty per row according to policies. The risk cost of a data store is the sum of the risk costs of each row.
We encode and normalize each factor (Figure 12) and compute the weighted sum as the conceptual risk score of a store.
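Of these factors, risk cost is the only one with its own arithmetic: a per-row monetary penalty summed over the rows each policy governs, as defined above for the proof of concept. A sketch under invented penalty figures:

# Illustrative risk cost of a data store: per-row monetary penalty of each
# governing policy, summed over the rows that policy covers (the paper's
# proof-of-concept definition). Penalty values are hypothetical.
PENALTY_PER_ROW = {"GDPR": 2.0, "HIPAA": 1.5}   # assumed $/row penalties

def risk_cost(store_policy_rows):
    """store_policy_rows maps policy name -> number of governed rows."""
    return sum(PENALTY_PER_ROW[p] * rows for p, rows in store_policy_rows.items())

print(f"risk cost: ${risk_cost({'GDPR': 10000, 'HIPAA': 2000}):,.0f}")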
5 FINAL EVALUATION
The goal of the final evaluation was an end-to-end assessment of the final design, including the recommender sidebar and the plan building workspace, and the collection of new requirements for the plan building workspace.

5.1 Method
The final evaluation involved five participants (P2-P4, P9, P10 in Table 2). We conducted semi-structured interviews with the same evaluation criteria as the second and third iterations (ease of understanding, utility, and missing functions or information). In the first 40 minutes of the interview, each participant evaluated the interactive prototype (Figures 6-10). In the next 20 minutes they evaluated the low-fidelity prototype of the plan review and comparison pages (Figure 8), presented in a Wizard of Oz manner as an extension of the plan building workspace.
5.2 Results
All returning participants (P2-P4) were satisfied with this improved version of the design. The new participants (P9, P10) praised the ease of understanding and the utility of the design components.
5.2.1 Recommender Sidebar Reduces Information Overload. All participants found the sidebar easy to understand and useful as a means to cope with information overload and prioritize protection options. We observed that P2 selected "Risk Score" to rank the recommendations, while P3 selected "Data Risk Cost". This echoed the different priorities in risk metrics they had expressed earlier. In addition, they both selected at least one other risk metric to re-rank the recommendations, and thereby discovered and selected top-ranked recommendations that they would have otherwise overlooked. This suggests that the recommender sidebar accommodates different analysis priorities and gives a more comprehensive view of the risk.
5.2.2 Iterative Plan Building Reduces Analysis Overhead. All participants applauded the seamless connection between the recommender sidebar and the plan building workspace. "So all the data stores in this view are from the previous recommendations [I selected]? That is nice." (P10) We observed P9 experimenting with different rankings of data stores and re-opening the recommender sidebar to add more data stores. P4 deleted two data stores that require different protection techniques than the remaining data stores: "...but it would be helpful to have it [protection techniques and compatibility information] at hand, as it can get complicated and hard to keep track".
5.2.3 Impact Analysis Supports Multi-factor Decision-Making. Besides supporting the selection of protection options, the impact analysis function was also deemed helpful for communication among multiple stakeholders. P9 saw himself using the system for "reporting": "tell them this is how much is protected, this is what I still need to do and my plan of going forward". Understanding that the expected impact is only an estimate, P9 would use the system to estimate loss by "how much money that has been lost and this is the number of people that have been affected", to evaluate the cost of data loss in terms of "dollars per person". In doing so he would be able to understand and communicate the overall data risk cost, and use this information to decide and justify requests for budgets.
5.2.4 Further Requirements and Suggestions for Future Extensions. P9 pointed to more granular ways to aggregate risk information and protection plans, such as by line of business or by policy. Lines of business (i.e., the leaders of business divisions) are the stakeholders to whom P9 needs to report his analyses and suggestions. P10 suggested that other useful aggregation criteria are location, platform, and data domain (or type of data), to see, for example, what data domains reside in which data stores and what the risk levels are for each.

All participants appreciated the functions for reviewing plan details and comparing plans (Figure 8). P2, P3, and P4 suggested a few more extensions of features and concepts. P2 advised that the "department" would be the next most important grouping attribute needed in the design. P4 recommended that information on protection techniques should also appear in the plan detail and comparison pages of the plan building workspace. Two of our target users (P4 and P9, Table 2) highlighted the need to save multiple alternative plans which can be vetted by the relevant stakeholders. P4 reported that his team creates about 12 plans per year and that about 20-25% of them get approved for implementation. This validates the value of the design in Figure 8.
Figure 14: Impact analysis function implemented in [24]
6 DISCUSSION
This paper contributes user requirements from security analysts in industry, a risk model, and an intelligent user interface design that supports decision making and plan building in data security analysis. We implemented an interactive prototype that recommends what data to protect, visualizes the expected impact of protections, and allows building and comparing protection plans.
6.1 Separation of Concerns for Iterative Analysis
Our user research suggests that current systems still have a sense-making gap between their tools for risk detection and those for data protection decisions. The feedback from our ten participants during the iterations suggests that the proposed design can help fill this gap and better support decision-making around data protection. The design addresses two key challenges: reducing information overload when selecting what to protect and facilitating multi-factor decisions around protection plans.

Sensitive data scattered across data stores carries different value to different business departments of the enterprise. We use attributes of the data to measure risks (e.g., the department that owns the data, those who have access to it, the database that stores it, how the data is collected and used), make recommendations on what data should be protected, and help optimize protection plans.

Drawing on our user research findings, the design decomposes the problem into two sub-problems: reviewing and selecting the data to protect and building plans for the protection. The data attributes can then be categorized accordingly: conceptual attributes define data values and are more relevant when selecting the protection targets; architectural attributes define data protection constraints and are more relevant when deciding how to protect the targets. This separation of concerns allows analysts to focus on the data attributes that matter to the sub-problem at hand rather than everything at once. Beginning with the conceptual attributes, analysts familiarize themselves with the system and understand the overall risk situation. Once they have a good understanding of the data risk and a reduced problem space of data worthy of protection, they turn to the architectural attributes to plan protections.

This two-step decision process can be iterative. When analysts prioritize and plan for protection execution, they can do a second round of filtering of what data to protect, further restricting or expanding the decision-making space. The Cartesian space in Figure 2 models the overlap among different conceptual attributes while maintaining the flexibility of slice-and-dice in the analysis (e.g., a policy might govern multiple data domains, where policy and domain are overlapping conceptual attributes).
6.2 Trustworthiness and Prediction Accuracy
A limitation of the proposed intelligent UI pertains to the trustworthiness of the expected future value of each risk metric. In the current prototype it is a mere "estimate" based on the current risk situation. The expected future values are computed by simulating the protection of a set of data stores that are currently unprotected, while keeping everything else as is. In the real world, the values of other risk factors are likely to change and affect our estimates.

By the time a plan is executed, the database configurations and data storage are usually different from when the plan was built (a change in architectural attributes). Also, protecting one data store usually has a ripple effect on other data stores and related business operations (a change in conceptual attributes). For example, suspending a data store that contains employee information for a week would affect tasks on that data store and those dependent on it, delaying the work and causing additional costs.

The user access count, the number of departments with access to a data store, and the proliferation values are shown in our recommendation and plan details. These are useful indicators for analysts to qualitatively judge the impact of a recommendation. But in the real world such impact will vary across organizations, requiring user input to make the estimates more accurate and explainable. Future systems need to 1) keep track of the changing risk situation and 2) customize recommendations and impact estimates via conversational tools that leverage input from the parties involved [15].

Another limitation that influences system trustworthiness is that the evaluation remained at the level of the interface design and did not include an evaluation of the recommender system with data collected from a community of users after they have adopted the system. Future work will be needed to deploy the proposed system design with a broader community of users and learn from "in vivo" user decisions as they use the system in real organizations and for enough time.
6.3 Representativeness and Generalizability
For practical reasons, the prototype was built as an extension of an existing data-centric security system. This introduces an obvious bias. However, we believe our work represents a first step towards testing and generalizing the proposed design in the context of other data security systems.

It is also important to consider that having a large number of test users for evaluating a novel system design is rarely an option in the security domain. Due to this domain-specific challenge, the evaluation of the tool was conducted with a small sample of users and can be considered a case study. Evaluations with larger samples will be needed to further validate and improve the proposed system design.
7 CONCLUSION
This paper proposed a design that uses recommendations and interactive impact analysis tools to reduce security analysts' cognitive load and improve their performance. Our target users and proxies welcomed the separation of concerns between target selection and plan building. Our evaluation confirmed the utility of applying a mixed-initiative approach to support data protection decisions. As goals and constraints change case by case, fully automated solutions appear less practical than mixed-initiative solutions.

We are pleased to share that some impact analysis functions of the plan building workspace have been implemented in the underlying system [24] (see Figure 14). We hope this work can serve as a first step toward designing and testing intelligent user interfaces and mixed-initiative tools for the new field of data-centric security applications.
REFERENCES
[1] 2012. IBM QRadar. https://www.ibm.com/us-en/marketplace/ibm-qradar-siem. (2012). Latest Release: 2017-12-13.
[2] 2017. Folk Risk Analysis: Factors Influencing Security Analysts' Interpretation of Risk. In Thirteenth Symposium on Usable Privacy and Security (SOUPS 2017). USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/soups2017/workshop-program/wsiw2017/mmanga
[3] 2018. Imperva SecureSphere. https://www.imperva.com/products/securesphere/. (2018). Latest Release: 2018.
[4] 2018. Splunk Enterprise Security. https://splunkbase.splunk.com/app/263/. (2018). Latest Release: 5.1.0 2018.
[5] 2018. Video Demo of Interactive Design Prototype. https://youtu.be/JPx4DBJSM8g. (2018).
[6] Nurul Hidayah Ab Rahman and Kim-Kwang Raymond Choo. 2015. A Survey of Information Security Incident Handling in the Cloud. Comput. Secur. 49, C (March 2015), 45–69. https://doi.org/10.1016/j.cose.2014.11.006
[7] Chadia Abras, Diane Maloney-Krichmar, and Jenny Preece. 2004. User-centered design. In Bainbridge, W. (Ed.), Encyclopedia of Human-Computer Interaction. Thousand Oaks: Sage Publications 37, 4 (2004), 445–456.
[8] Oznur Alkan, Elizabeth M. Daly, and Inge Vejsbjerg. 2018. Opportunity Team Builder for Sales Teams. In 23rd International Conference on Intelligent User Interfaces (IUI '18). ACM, New York, NY, USA, 251–261. https://doi.org/10.1145/3172944.3172968
[9] SE Allianz. 2016. Allianz Risk Barometer: Top Business Risks 2015. Technical Report. Allianz.
[10] Dustin L Arendt, Russ Burtner, Daniel M Best, Nathan D Bos, John R Gersh, Christine D Piatko, and Celeste Lyn Paul. 2015. Ocelot: User-centered design of a decision support visualization for network quarantine. In Visualization for Cyber Security (VizSec), 2015 IEEE Symposium on. IEEE, 1–8.
[11] Brian Lowans, Neil MacDonald, Marc-Antoine Meunier, and Brian Reed. 2017. Predicts 2017: Application and Data Security. Technical Report. Gartner, Inc.
[12] Carole Cadwalladr and Emma Graham-Harrison. 2018. Revealed: 50 million Facebook profiles harvested for Cambridge Analytica in major data breach. The Guardian 17 (2018).
[13] Li Chen and Feng Wang. 2017. Explaining Recommendations Based on Feature Sentiments in Product Reviews. In Proceedings of the 22nd International Conference on Intelligent User Interfaces (IUI '17). ACM, New York, NY, USA, 17–28. https://doi.org/10.1145/3025171.3025173
[14] Kim-Kwang Raymond Choo. 2011. The cyber threat landscape: Challenges and future research directions. Computers & Security 30, 8 (2011), 719–731.
[15] Elizabeth F. Churchill. 2018. Designing Recommendations. Interactions 26, 1 (Dec. 2018), 24–25. https://doi.org/10.1145/3292029
[16] Daniel Dor and Yuval Elovici. 2016. A model of the information security investment decision-making process. Computers & Security 63 (2016), 1–13. https://doi.org/10.1016/j.cose.2016.09.006
[17] Malin Eiband, Hanna Schneider, Mark Bilandzic, Julian Fazekas-Con, Mareike Haug, and Heinrich Hussmann. 2018. Bringing Transparency Design into Practice. In 23rd International Conference on Intelligent User Interfaces (IUI '18). ACM, New York, NY, USA, 211–223. https://doi.org/10.1145/3172944.3172961
[18] ENISA. 2016. The European Union Agency for Network and Information Security (ENISA). Technical Report. ENISA.
[19] Paul German. 2016. Face the facts–your organisation will be breached. Network Security 2016, 8 (2016), 9–10.
[20] Richard Grondin and Rahul Gupta. 2017. Identifying and Securing Sensitive Data at its Source. (May 6 2017). US patent US9785795B2, granted on 2017-10-10.
[21] Bar Haim, Eitan Menahem, Yaron Wolfsthal, and Christopher Meenan. 2017. Visualizing Insider Threats: An Effective Interface for Security Analytics. In Proceedings of the 22nd International Conference on Intelligent User Interfaces Companion. ACM, 39–42.
[22] Mark Hall. 2016. Why people are key to cyber-security. Network Security 2016, 6 (2016), 9–10. https://doi.org/10.1016/S1353-4858(16)30057-5
[23] J. Todd Hamill, Richard F. Deckro, and Jack M. Kloeber. 2005. Evaluating information assurance strategies. Decision Support Systems 39, 3 (2005), 463–484. https://doi.org/10.1016/j.dss.2003.11.004
[24] Informatica. 2015. Secure@Source. https://www.informatica.com/products/data-security/secure-at-source.html. (2015).
[25] Identity Theft Resource Center (ITRC). 2017. 2017 Annual Data Breach Year-End Review. Technical Report. Identity Theft Resource Center (ITRC).
[26] Philip A Legg. 2015. Visualizing the insider threat: challenges and tools for identifying malicious user activity. In Visualization for Cyber Security (VizSec), 2015 IEEE Symposium on. IEEE, 1–7.
[27] Stephen Lineberry. 2007. The human element: The weakest link in information security. Journal of Accountancy 204, 5 (2007), 44.
[28] Simon Miller, Christian Wagner, Uwe Aickelin, and Jonathan M. Garibaldi. 2016. Modelling cyber-security experts' decision making processes using aggregation operators. Computers & Security 62 (2016), 229–245. https://doi.org/10.1016/j.cose.2016.08.001
[29] SA O'Brien. 2017. Giant Equifax data breach: 143 million people could be affected. CNN Tech (2017).
[30] Steve A. Purser. 2004. Improving the ROI of the security management process. Computers & Security 23, 7 (2004), 542–546. https://doi.org/10.1016/j.cose.2004.09.004
[31] Sasha Romanosky, Alessandro Acquisti, and Richard Sharp. 2010. Data breaches and identity theft: when is mandatory disclosure optimal? (2010).
[32] Rachel Rue. 2007. A Framework for Classifying and Comparing Models of Cyber Security Investment to Support Policy and Decision-Making. In WEIS.
[33] Michael Veale, Reuben Binns, and Max Van Kleek. 2018. Some HCI Priorities for GDPR-Compliant Machine Learning. CoRR abs/1803.06174 (2018). arXiv:1803.06174 http://arxiv.org/abs/1803.06174
[34] Jingguo Wang, Aby Chaudhury, and H. Raghav Rao. 2008. Research Note–A Value-at-Risk Approach to Information Security Investment. Information Systems Research 19, 1 (2008), 106–120. https://doi.org/10.1287/isre.1070.0143
[35] Matthew P Warnecke. 2013. Examining the Return on Investment of a Security Information and Event Management Solution in a Notional Department of Defense Network Environment. Master's thesis. (June 2013).
[36] Rodrigo Werlinger, Kirstie Hawkey, David Botta, and Konstantin Beznosov. 2009. Security Practitioners in Context: Their Activities and Interactions with Other Stakeholders Within Organizations. Int. J. Hum.-Comput. Stud. 67, 7 (July 2009), 584–606. https://doi.org/10.1016/j.ijhcs.2009.03.002
[37] Charles Cresson Wood. 2004. Why information security is now multi-disciplinary, multi-departmental, and multi-organizational in nature. Computer Fraud & Security 2004, 1 (2004).