K. Selcuk Candan
Arizona State University | ASU · CIDSE
PhD
About
Publications: 383
Reads: 32,900
Citations: 5,655
Additional affiliations
August 1997 - present
Publications (383)
Despite the exceptional success of machine learning (ML) technologies in many applications, users are starting to notice a critical shortcoming of ML: correlation is a poor substitute for causation. The conventional way to discover causal relationships is to use randomized controlled trials (RCTs); in many situations, however, these ar...
Many socio-economically critical domains (such as sustainability, public health, and disasters) are characterized by highly complex and dynamic systems, requiring data- and model-driven simulations to support decision-making. Due to the large number of unknowns, decision-makers usually need to generate ensembles of stochastic scenarios, requiring hundre...
Building automatic fault detection and diagnosis (AFDD) technologies have shown great potential for energy savings. Research on building AFDD mainly focuses on traditional data available from building automation systems (BAS) or one-time measurements. In this research, we investigate the capability of acoustic emission (AE), a non-traditi...
The increasing use of remote or mobile access, integrated wearable technologies, data exchange, and cloud-based data analytics in modern smart buildings is steering the building industry towards open communication technologies. The increased connectivity and accessibility could lead to more cyber-attacks in smart buildings. On the other hand, physi...
Successfully tackling many urgent challenges in socio-economically critical domains, such as public health and sustainability, requires a deeper understanding of causal relationships and interactions among a diverse spectrum of spatio-temporally distributed entities. In these applications, the ability to leverage spatio-temporal data to obtain caus...
Wetlands are important to communities, offering benefits ranging from water purification and flood protection to recreation and tourism. Therefore, identifying and prioritizing potential wetland areas is a critical decision problem. While data-driven solutions are feasible, their use is complicated by significant data sparsity due to the low proportion...
Deep learning has been applied successfully to sequence understanding and translation problems, especially in univariate, unimodal contexts where large amounts of supervision data are available. The effectiveness of deep learning in more complex (multi-modal, multi-variate) contexts, where supervision data is rare, however, is generally not satisfa...
Purpose of Review
Preparing for pandemics requires a degree of interdisciplinary work that is challenging under the current paradigm. This review summarizes the challenges faced by the field of pandemic science and proposes how to address them.
Recent Findings
The structure of current siloed systems of research organizations hinders effective inte...
Discovering causal relationships in complex socio-behavioral systems is challenging but essential for informed decision-making. We present Upload, PREprocess, Visualize, and Evaluate (UPREVE), a user-friendly web-based graphical user interface (GUI) designed to simplify the process of causal discovery. UPREVE allows users to run multiple algorithms...
Streamflow prediction is a complex problem due to large uncertainties in flow model parameters.
➢ Data with high granularity for streamflow prediction is not readily available.
➢ To obtain more accurate predictions, we use Causal Discovery with Reinforcement Learning for streamflow prediction on large datasets.
➢ Physics-based simulations (CaMa-Fl...
Modern Building Automation Systems (BASs), as the brain that enables the smartness of a smart building, often require increased connectivity both among system components and with outside entities, such as the cloud, to enable low-cost remote management, optimized automation via outsourced cloud analytics, and increased building-grid integrati...
Online user engagement is highly influenced by various machine learning models such as recommender systems. These systems recommend new items to the user based on the user’s historical interactions. Implicit recommender systems reflect a binary setting showing whether or not a user interacted with (e.g., clicked on) an item. However, the observed c...
The convenient access to copious multi-faceted data has encouraged machine learning researchers to reconsider correlation-based learning and embrace the opportunity of causality-based learning, i.e., causal machine learning (causal learning). Recent years have therefore witnessed great effort in developing causal learning algorithms aiming to help...
Most neural network-based classifiers extract features using several hidden layers and make predictions at the output layer by utilizing these extracted features. We observe that not all features are equally pronounced in all classes; we call such features class-specific features. Existing models do not fully utilize the class-specific differences...
Sampling is a technique to help identify a representative data subset that captures the characteristics of the whole dataset. Most existing sampling algorithms require assumptions about the distribution of the multivariate data, which may not be available beforehand. This study proposes a new metric called Eigen-Entropy (EE), which is based on information en...
Machine learning models have gained widespread success in applications ranging from healthcare to personalized recommendations. One of the basic assumptions of these models is that the data are independent and identically distributed (i.i.d.); under this assumption, the training and test data are sampled from the same distribution. However, this assumption seldom holds in the real wor...
Recommender systems suffer from biases that may misguide the system when learning user preferences. Under the causal lens, the user’s exposure to items can be seen as the treatment assignment, the ratings of the items are the observed outcome, and the different biases act as confounding factors. Therefore, to infer debiased preferences and to captu...
The demand for searching and querying multimedia data such as images, video, and audio is omnipresent, and effectively accessing such data for various applications is a critical task. Nevertheless, these data are usually encoded as multi-dimensional arrays, or tensors, and traditional data mining techniques might be limited due to the curse of dimensionality....
The COVID-19 outbreak was declared a pandemic by the World Health Organization (WHO) on March 11, 2020. To minimize casualties and the impact on the economy, various mitigation measures, such as complete lockdown, social distancing, and random testing, have been employed to slow the spread of the infection. The key contribution of this...
Increasing advancements in building digitization, smart sensing, and metering technologies have allowed large amounts of data to be collected and saved for monitoring, analyzing, and controlling building systems. However, due to sensor or communication failures, the collected data are often incomplete and of poor quality. Data imputation approaches...
The 15th ACM International Conference on Web Search and Data Mining (WSDM 2022) was successfully held online on February 21-25, 2022. It was originally planned to be held in Tempe, Arizona, but was changed to a virtual format due to the Omicron surge. It was a five-day event, with the middle three days dedicated to keynote speeches, single-session presentati...
Online review systems are the primary means through which many businesses seek to build their brands and spread their messages. Prior research studying the effects of online reviews has mainly focused on a single numerical cause, e.g., ratings or sentiment scores. We argue that such notions of causes entail three key limitations: they solely cons...
Building automatic fault detection and diagnosis (AFDD) technologies have shown great potential for energy savings. To enable AFDD, a baseline depicting the normal operation mode is needed to detect whether the building operation deviates from normality. Existing research using physics-based knowledge and models for AFDD has mainly taken a trial-an...
Recommender systems aim to recommend new items to users by learning user and item representations. In practice, these representations are highly entangled, as they consist of information about multiple factors, including users' interests and item attributes, along with confounding factors such as user conformity and item popularity. Considering these e...
Literature on building Automatic Fault Detection and Diagnosis (AFDD) mainly focuses on simulated system data due to the high cost and difficulty of obtaining and analyzing real building data. There is a lack of validation of the performance and scalability of data-driven AFDD approaches using simulated data, and of how it compares to that from real b...
In many fields of computer science, tensor decomposition techniques are increasingly being adopted as the core of many applications that rely on multi-dimensional datasets for implementing knowledge discovery tasks. Unfortunately, a major shortcoming of state-of-the-art tensor analyses is that, despite their effectiveness when the data is certain,...
This is a report on the eleventh edition of the Conference and Labs of the Evaluation Forum (CLEF 2021), held virtually on September 21-24, 2021, in Bucharest, Romania. CLEF was a four-day event combining a Conference and an Evaluation Forum. The Conference featured keynotes by Naila Murray and Mark Sanderson, and presentation of peer-reviewed r...
A common challenge in multimedia data understanding is the unsupervised discovery of recurring patterns, or motifs, in time series data. The discovery of motifs in uni-variate time series is a well-studied problem and, while multi-variate motif discovery is a relatively new area of research, there are also several proposals for it. Unfortunately, m...
Naïve extensions of uni-variate prediction techniques lead to an unwelcome increase in the cost of multi-variate model learning and significant deteriorations in the model performance. In this paper, we first argue that (a) one can learn a more accurate forecasting model by leveraging temporal alignments among variates to quantify the importance of...
Urban systems are characterized by complexity and dynamicity. Data-driven simulations represent a promising approach in understanding and predicting complex dynamic processes in the presence of shifting demands of urban systems. Yet, today’s silo-based, de-coupled simulation engines fail to provide an end-to-end view of the complex urban system, pr...
Factorization Machines (FMs) enhance an underlying linear regression or classification model by capturing feature interactions. Intuitively, FMs warp the feature space to help capture the underlying non-linear structure of the machine learning task. In this paper, we propose novel Doubly-Warped Factorization Machines (or W2FMs) that le...
This book constitutes the refereed proceedings of the 12th International Conference of the CLEF Association, CLEF 2021, held virtually in September 2021.
The conference has a clear focus on experimental information retrieval with special attention to the challenges of multimodality, multilinguality, and interactive search ranging from unstructured...
Tensor decomposition is a multi-modal dimensionality reduction technique to support similarity search and retrieval. Yet, the decomposition process itself is expensive and subject to the curse of dimensionality. Tensor train decomposition is designed to avoid the explosion of intermediary data, which plagues other tensor decomposition techniques. However,...
Networked observational data presents new opportunities for learning individual causal effects, which play an indispensable role in decision making. Such data poses the challenge of confounding bias. Previous work presents two desiderata to handle confounding bias. On the treatment group level, we aim to balance the distributions of confounder rep...
The recent success of machine learning (ML) has led to a massive emergence of AI applications and to rising expectations for AI systems to achieve human-level intelligence. Nevertheless, these expectations have been met with multi-faceted obstacles. One major obstacle is that ML aims to predict future observations given real-world data dependencies wh...
Social media has become an indispensable tool in the face of natural disasters due to its broad appeal and ability to quickly disseminate information. For instance, Twitter is an important source for disaster responders to search for (1) topics that have been identified as being of particular interest over time, i.e., common topics such as “disaste...
This study explores self-reports of 241 older adults (aged 63–95) regarding loneliness and social disconnectedness, and the potential for information and communication technologies (ICT) and ride-hailing services to mitigate these phenomena. The samples are drawn from four older adult living communities in Maricopa County, Arizona. Lonelier older a...
Deep architectures are trained on massive amounts of labeled data to guarantee the performance of classification. In the absence of labeled data, domain adaptation often provides an attractive option given that labeled data of a similar nature but from a different domain is available. Previous work has chiefly focused on learning domain invariant r...
Today, most information sources provide factual, objective knowledge, but they fail to capture personalized contextual knowledge which could be used to enrich the available factual data and contribute to their interpretation, in the context of the knowledge of the user who queries the system. This would require a knowledge framework which can accom...
Real-time decision making has attracted increasing interest as a means of efficiently operating complex systems. The main challenge in achieving real-time decision making is to understand how to develop next-generation optimization procedures that can work efficiently using: (i) real data coming from a large complex dynamical system, (ii) simulation...
With many applications relying on multi-dimensional datasets for decision making, matrix factorization (or decomposition) is becoming the basis for many knowledge discovery and machine learning tasks, from clustering, trend detection, and anomaly detection to correlation analysis. Unfortunately, a major shortcoming of matrix analysis operations is tha...
Despite their impressive success when these hyper-parameters are suitably fine-tuned, the design of good network architectures remains an art-form rather than a science: while various search techniques, such as grid-search, have been proposed to find effective hyper-parameter configurations, often these parameters are hand-crafted (or the bounds of...
Tensors are commonly used for representing multi-modal data, such as Web graphs, sensor streams, and social networks. As a consequence, tensor-based algorithms, most notably tensor decomposition, are becoming a core tool for data analysis and knowledge discovery, including clustering. Intuitively, the tensor decomposition process generalizes ma...
Efficient implementations of range and nearest neighbor queries are critical in many large multimedia applications. Locality Sensitive Hashing (LSH) is a popular technique for performing approximate searches in high-dimensional multimedia, such as image or sensory data. Often, these multimedia data are represented as a collection of important...
Wireless sensor networks and other power-efficient devices fill increasingly important roles in modern society. At the same time, they also face increasing internal and external threats, such as node capture or protocol disruption by adversarial agents. Providing reliable and secure service in the face of these challenges remains an ongoing problem...
Data- and model-driven computer simulations are increasingly critical in many application domains. Yet, several critical data challenges remain in obtaining and leveraging simulations in decision making. Simulations may track hundreds of parameters, spanning multiple layers and spatial-temporal frames, affected by complex inter-dependent dynamic proces...
Many distributed systems assume participants are both performant and secure, characteristics offered by many cloud-based systems. However, scaling distributed techniques down to highly resource- or power-constrained contexts may require alternative approaches. One such context is the deployment of ad hoc distributed systems in insecure or uncontroll...
Dynamic topic models (DTM) are commonly used for mining latent topics in evolving web corpora. In this paper, we note that a major limitation of conventional DTM-based models is that they assume a predetermined and fixed scale of topics. In reality, however, topics may have varying spans and topics of multiple scales can co-exist in a single we...
Many applications generate and/or consume multi-variate temporal data, and experts often lack the means to adequately and systematically search for and interpret multi-variate observations. In this article, we first observe that multi-variate time series often carry localized multi-variate temporal features that are robust against noise. We then ar...
Addressing archaeology's most compelling substantive challenges requires synthetic research that exploits the large and rapidly expanding corpus of systematically collected archaeological data. That, in turn, requires a means of combining datasets that employ different systematics in their recording while at the same time preserving the semantics o...
Measures of node ranking, such as personalized PageRank, are utilized in many web and social-network based prediction and recommendation applications. Despite their effectiveness when the underlying graph is certain, however, these measures become difficult to apply in the presence of uncertainties, as they are not designed for graphs that include...
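For reference, the node-ranking measure named in the entry above, personalized PageRank, can be computed on a certain (deterministic) graph by power iteration. The sketch below is a minimal illustration under stated assumptions: the graph is given as a hypothetical adjacency dict mapping each node to its out-neighbors, teleportation is uniform over a set of seed nodes, and dangling nodes return their mass to the seeds. It does not model edge uncertainty and is not the method proposed in that work.

```python
def personalized_pagerank(adj, seeds, alpha=0.15, tol=1e-10, max_iter=1000):
    """Standard personalized PageRank by power iteration (certain graph only).

    adj: hypothetical input format, a dict mapping every node to a list of
         out-neighbors; every neighbor must also be a key of adj.
    seeds: non-empty set of nodes the random walker restarts from.
    alpha: restart (teleport) probability at each step.
    """
    nodes = list(adj)
    # Restart distribution: uniform over the seed nodes, zero elsewhere.
    restart = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    rank = dict(restart)
    for _ in range(max_iter):
        # Every node first receives its share of the restart mass.
        new_rank = {v: alpha * restart[v] for v in nodes}
        for u in nodes:
            out = adj[u]
            if out:
                # Spread the remaining mass evenly over out-neighbors.
                share = (1 - alpha) * rank[u] / len(out)
                for v in out:
                    new_rank[v] += share
            else:
                # Dangling node: send its mass back to the seed nodes.
                for v in nodes:
                    new_rank[v] += (1 - alpha) * rank[u] * restart[v]
        # L1 change between iterations decides convergence.
        diff = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if diff < tol:
            break
    return rank


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores = personalized_pagerank(graph, seeds={"a"})
print(sorted(scores, key=scores.get, reverse=True))
```

Because the update conserves total probability mass, the scores always sum to 1; in this toy graph the seed node "a" ranks highest. Extending such a measure to graphs whose edges are uncertain is exactly the difficulty the entry points out.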