
David C. Anastasiu
Santa Clara University | SCU · Department of Computer Engineering
Ph.D.
About
Publications: 67
Reads: 18,446
Citations: 1,736
Introduction
My research interests fall broadly at the intersection of machine learning, data mining, computational genomics, and high performance computing. My projects focus on algorithmic design for machine learning problems with real-world applications and impact, especially those with unconventional inputs, such as sparse data, sets of multivariate time series, and video streams.
Additional affiliations
September 2019 - present
Publications (67)
Low-latency inference for machine learning models is increasingly becoming a necessary requirement, as these models are used in mission-critical applications such as autonomous driving, military defense (e.g., target recognition), and network traffic analysis. A widely studied and used technique to overcome this challenge is to offload some or all...
In the hydrology field, time series forecasting is crucial for efficient water resource management, improving flood and drought control and increasing the safety and quality of life for the general population. However, predicting long-term streamflow is a complex task due to the presence of extreme events. It requires the capture of long-range depe...
The number of people diagnosed with advanced stages of kidney disease has been rising every year. Early detection and constant monitoring are the only minimally invasive means to prevent severe kidney damage or kidney failure. We propose a cost-effective machine learning-based testing system that can facilitate inexpensive yet accurate kidney heal...
Accurate time series forecasting is critical in a variety of fields, including transportation, weather prediction, energy management, infrastructure monitoring, and finance. Forecasting highly skewed and heavy-tailed time series, particularly in multivariate environments, is still difficult. In these cases, accurately capturing the relationships be...
Thrombin is a key enzyme involved in the development and progression of many cardiovascular diseases. Direct thrombin inhibitors (DTIs), with their minimum off-target effects and immediacy of action, have greatly improved the treatment of these diseases. However, the risk of bleeding, pharmacokinetic issues, and thrombotic complications remain majo...
Forecasting time series with extreme events has been a challenging and prevalent research topic, especially when the time series data are affected by complicated uncertain factors, such as is the case in hydrologic prediction. Diverse traditional and deep learning models have been applied to discover the nonlinear relationships and recognize the co...
With the aim of analyzing large-sized multidimensional single-cell datasets, we are describing a method for Cosine-based Tanimoto similarity-refined graph for community detection using Leiden’s algorithm (CosTaL). As a graph-based clustering method, CosTaL transforms the cells with high-dimensional features into a weighted k-nearest-neighbor (kNN)...
The AI City Challenge's seventh edition emphasizes two domains at the intersection of computer vision and artificial intelligence - retail business and Intelligent Traffic Systems (ITS) - that have considerable untapped potential. The 2023 challenge had five tracks, which drew a record-breaking number of participation requests from 508 teams across...
With the aim of analyzing large-sized multidimensional single-cell datasets, we are describing our method for Cosine-based Tanimoto similarity-refined graph for community detection using Leiden's algorithm (CosTaL). As a graph-based clustering method, CosTaL transforms the cells with high-dimensional features into a weighted k-nearest-neighbor (kNN...
This article presents a synthetic distracted driving (SynDD1) dataset for machine learning models to detect and analyze drivers' various distracted behavior and different gaze zones. We collected the data in a stationary vehicle using three in-vehicle cameras positioned at locations: on the dashboard, near the rearview mirror, and on the top right-...
We introduce the Competitive Learning Platform (CLP), an online continuous improvement tool that provides automatic partial performance feedback to students or groups of students on individual or collaborative assignments. CLP motivates students to do their best and come up with new solutions that can lead to improved assignment results before the...
The 6th edition of the AI City Challenge specifically focuses on problems in two domains where there is tremendous unlocked potential at the intersection of computer vision and artificial intelligence: Intelligent Traffic Systems (ITS), and brick and mortar retail businesses. The four challenge tracks of the 2022 AI City Challenge received particip...
This article presents a synthetic dataset for machine learning models to detect and analyze drivers' various distracted behavior and different gaze zones. We collected the data in a stationary vehicle using three in-vehicle cameras positioned at locations: on the dashboard, near the rearview mirror, and on the top right-side window corner. The data...
A majority of microbial infections are associated with biofilms. Targeting biofilms is considered an effective strategy to limit microbial virulence while minimizing the development of antibiotic resistance. Toward this need, antibiofilm peptides are an attractive arsenal since they are bestowed with properties orthogonal to small molecule drugs. I...
A majority of microbial infections are associated with biofilms. Targeting biofilms is considered an effective strategy to limit microbial virulence while minimizing the development of antibiotic resistance. Towards this need, antibiofilm peptides are an attractive arsenal since they are bestowed with properties orthogonal to small molecule drugs....
The AI City Challenge was created with two goals in mind: (1) pushing the boundaries of research and development in intelligent video analysis for smarter cities use cases, and (2) assessing tasks where the level of performance is enough to cause real-world adoption. Transportation is a segment ripe for such adoption. The fifth AI City Challenge at...
With the advent of accurate deep learning-based object detection methods, it is now possible to employ prevalent city-wide traffic and intersection cameras to derive actionable insights for improving traffic, road infrastructure, and transit. A crucial tool in signal timing planning is capturing accurate movement- and class-specific vehicle counts....
The AI City Challenge was created to accelerate intelligent video analysis that helps make cities smarter and safer. Transportation is one of the largest segments that can benefit from actionable insights derived from data captured by sensors, where computer vision and deep learning have shown promise in achieving large-scale practical deployment....
Autism spectrum disorders (ASDs) are a group of conditions characterized by impairments in reciprocal social interaction and by the presence of restricted and repetitive behaviors. Current ASD detection mechanisms are either subjective (survey-based) or focus only on responses to a single stimulus. In this work, we develop machine learning methods...
Finding nearest neighbors is an important topic that has attracted much attention over the years and has applications in many fields, such as market basket analysis, plagiarism and anomaly detection, community detection, ligand-based virtual screening, etc. As data are easier and easier to collect, finding neighbors has become a potential bottlenec...
Urban traffic optimization using traffic cameras as sensors is driving the need to advance state-of-the-art multi-target multi-camera (MTMC) tracking. This work introduces CityFlow, a city-scale traffic camera dataset consisting of more than 3 hours of synchronized HD videos from 40 cameras across 10 intersections, with the longest distance between...
Given the great amounts of data being transmitted between devices in the 21st century, existing channels of wireless communication are getting congested. In the wireless space, the focus up to now has been on the microwave frequency range. An alternative for high-speed medium- and long-range communication is the millimeter wave spectrum, which is m...
The nearest neighbor graph is an important structure in many data mining methods for clustering, advertising, recommender systems, and outlier detection. Constructing the graph requires computing up to n^2 similarities for a set of n objects. This high complexity has led researchers to seek approximate methods, which find many but not all of the nea...
Cosine similarity graph construction, or All-Pairs similarity search, is an important kernel in many data mining and machine learning methods. Constructing the graph is a difficult task. Naively, up to n^2 pairs of objects must be compared to solve the problem for a set of n objects. For large object sets...
Communication is paramount, especially during a natural disaster or other emergency. Even when traditional lines of communication become unavailable, emergency response teams must be able to communicate with each other and the outside world. To facilitate this need, major cities across the United States are deploying wireless emergency networks (WE...
Tanimoto, or extended Jaccard, is an important similarity measure which has seen prominent use in fields such as data mining and chemoinformatics. Many of the existing state-of-the-art methods for market basket analysis, plagiarism and anomaly detection, compound database search, and ligand-based virtual screening rely heavily on identifying Tanimo...
The k-nearest neighbor graph is an important structure in many data mining methods for clustering, advertising, recommender systems, and outlier detection. Constructing the graph requires computing up to n^2 similarities for a set of n objects. This high complexity has led researchers to seek approximate methods, which find many but not all of the...
Tanimoto, or (extended) Jaccard, is an important similarity measure which has seen prominent use in fields such as data mining and chemoinformatics. Many of the existing state-of-the-art methods for market-basket analysis, plagiarism and anomaly detection, compound database search, and ligand-based virtual screening rely heavily on identifying Tani...
Recommender systems are ubiquitous in today's marketplace and have great commercial importance, as evidenced by the large number of companies that sell recommender systems solutions. Successful recommender systems use past product purchase and satisfaction data to make high quality personalized recommendations. The vast amounts of data available to...
Solving the AllPairs similarity search problem entails finding all pairs of vectors in a high dimensional sparse dataset that have a similarity value higher than a given threshold. The output from this problem is a crucial component in many real-world applications, such as clustering, online advertising, recommender systems, near-duplicate document...
The k-nearest neighbor graph is often used as a building block in information retrieval, clustering, online advertising, and recommender systems algorithms. The complexity of constructing the exact k-nearest neighbor graph is quadratic on the number of objects that are compared, and most existing methods solve the problem approximately. We present...
The k-nearest neighbor graph is often used as a building block in information retrieval, clustering, online advertising, and recommender systems algorithms. The complexity of constructing the exact k-nearest neighbor graph is quadratic on the number of objects that are compared, and most existing methods solve the problem approximately. We present...
The proliferation of computing devices in recent years has dramatically changed the way people work, play, communicate, and access information. The personal computer (PC) now has to compete with smartphones, tablets, and other devices for tasks it used to be the default device for. Understanding how PC usage evolves over time can help provide the...
Frequent pattern mining is an essential data mining task, with a goal of discovering knowledge in the form of repeated patterns. Many efficient pattern mining algorithms have been discovered in the last two decades, yet most do not scale to the type of data we are presented with today, the so-called "Big Data". Scalable parallel algorithms hold...
The All-Pairs similarity search, or self-similarity join problem, finds all pairs of vectors in a high dimensional sparse dataset with a similarity value higher than a given threshold. The problem has been classically solved using a dynamically built inverted index. The search time is reduced by early pruning of candidates using size and value-base...
The authors investigated the use of microblogs - or weibos - and related censorship practices using 111 million microblogs collected between 1 January and 30 June 2012. Using a matched case-control study design helped researchers determine a list of Chinese terms that discriminate censored and uncensored posts written by the same microbloggers. Thi...
In a world flooded with information, document clustering is an important tool that can help categorize and extract insight from text collections. It works by grouping similar documents, while simultaneously discriminating between groups. In this article, we provide a brief overview of the principal techniques used to cluster documents and introduce...
How to organize and present search results plays a critical role in the utility of search engines. Due to the unprecedented scale of the Web and diversity of search results, the common strategy of ranked lists has become increasingly inadequate, and clustering has been considered as a promising alternative. Clustering divides a long list of dispara...
Analysts are overwhelmed with information. They have large archives of historical data, both structured and unstructured, and continuous streams of relevant messages and documents that they need to match to current tasks, digest, and incorporate into their analysis. The purpose of the READ project is to develop technologies to make it easier to cat...
Precision-oriented search results such as those typically returned by the major search engines are vulnerable to issues of polysemy. When the same term refers to different things, the dominant sense is preferred in the rankings of search results. In this paper, we propose a novel two-box technique in the context of Web search that utilizes contextu...
Precision-oriented search results such as those typically returned by the major search engines are vulnerable to issues of polysemy. When the same term refers to different things, the dominant sense is preferred in the rankings of search results. In this paper, we propose a novel technique in the context of web search that utilizes contextual terms...
We introduce and theoretically study the Gardener's problem that well models many web information monitoring scenarios, where numerous dynamically changing web sources are monitored and local information needs to be periodically updated under communication and computation capacity constraints. Typical such examples include maintenance of inverted i...