Carlos Baquero’s research while affiliated with University of Porto and other places


Publications (148)


CRDT-Based Game State Synchronization in Peer-to-Peer VR
  • Conference Paper

April 2025

·

1 Read

Abel Dantas

·

Carlos Baquero

Figure 1. A snapshot of BrickSync in action, showcasing two users collaboratively manipulating virtual objects in a shared VR environment. Real-time synchronization of object states is achieved using CRDTs over a P2P WebRTC connection.
Figure 2. A Unity GameObject Transform component. Rotation is non-commutative: the order of rotations affects the final orientation. Rotation is stored as a quaternion (a 4D representation that avoids gimbal lock) and displayed as Euler angles in the UI.
Figure 4. Impact of Connection Type and Architecture on Latency. The x-axis shows the network configurations, grouped by architecture. As expected, P2P is much faster.
Figure 6. Distribution of latency results based on underlying architecture.
Figure 8. Average Latency Across Connection Types: PC to PC LAN shows the lowest latency, while Quest to Quest Hotspot exhibits the highest.


CRDT-Based Game State Synchronization in Peer-to-Peer VR
  • Preprint
  • File available

March 2025

·

6 Reads

Virtual presence demands ultra-low latency, a factor that centralized architectures, by their nature, cannot minimize. Local peer-to-peer architectures offer a compelling alternative, but also pose unique challenges in terms of network infrastructure. This paper introduces a prototype leveraging Conflict-Free Replicated Data Types (CRDTs) to enable real-time collaboration in a shared virtual environment. Using this prototype, we investigate latency, synchronization, and the challenges of decentralized coordination in dynamic non-Byzantine contexts. We aim to question prevailing assumptions about decentralized architectures and explore the practical potential of P2P in advancing virtual presence. This work challenges the constraints of mediated networks and highlights the potential of decentralized architectures to redefine collaboration and interaction in digital spaces.
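The abstract does not say which CRDT designs the prototype uses. As a minimal sketch of the general idea, assuming a state-based last-writer-wins register for a single object's position, the merge step could look like the following. The class name `LWWRegister`, its fields, and the use of a logical timestamp with a replica-id tie-breaker are illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LWWRegister:
    """Hypothetical state-based last-writer-wins register for one object's position."""
    value: Tuple[float, float, float]
    timestamp: int   # logical clock of the write (assumed, not from the paper)
    replica_id: str  # tie-breaker when two writes carry the same timestamp

    def write(self, value: Tuple[float, float, float], timestamp: int,
              replica_id: str) -> "LWWRegister":
        # A local write is just a merge with a freshly stamped candidate state.
        return self.merge(LWWRegister(value, timestamp, replica_id))

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        # Keep the state with the larger (timestamp, replica_id) pair.
        # Taking a maximum under a total order is commutative, associative,
        # and idempotent, so replicas converge whatever the delivery order.
        if (other.timestamp, other.replica_id) > (self.timestamp, self.replica_id):
            return other
        return self


# Two peers move the same brick concurrently, then exchange states.
start = LWWRegister((0.0, 0.0, 0.0), 0, "a")
peer_a = start.write((1.0, 0.0, 0.0), 1, "a")
peer_b = start.write((0.0, 2.0, 0.0), 1, "b")
merged_on_a = peer_a.merge(peer_b)
merged_on_b = peer_b.merge(peer_a)
```

Both peers end with the same state because the tie on timestamp 1 is broken by replica id. Last-writer-wins discards one of the concurrent moves, which is a deliberate simplification here.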


Figure 3. Scatterplot of the relation of price and carat in a hexagon bin count format, with the number of bins set to 50.
Figure A2. Mean absolute difference between the R glm function and the GLM Distributed algorithm version, with the number of observations set to 10000 or 100000 and the number of predictors to 1, 3, or 5, along 100 replicas, for the number of virtual nodes from 10 to 100 in increments of 5.
Distributed linear regression with 5 simulated nodes: mean absolute differences for 100 replicas, with varying number of observations and predictors.
Distributed generalized linear regression with 5 simulated nodes: mean absolute errors for 100 replicas, with varying numbers of observations and predictors.
Coefficients estimates for the credit card dataset shared by centralized and distributed approaches.
Distributed Generalized Linear Models: A Privacy-Preserving Approach

March 2025

·

1 Read

This paper presents a novel approach to classical linear regression, enabling model computation from data streams or in a distributed setting while preserving data privacy in federated environments. We extend this framework to generalized linear models (GLMs), ensuring scalability and adaptability to diverse data distributions while maintaining privacy-preserving properties. To assess the effectiveness of our approach, we conduct numerical studies on both simulated and real datasets, comparing our method with conventional maximum likelihood estimation for GLMs using iteratively reweighted least squares. Our results demonstrate the advantages of the proposed method in distributed and federated settings.
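The abstract does not spell out the algorithm. One well-known way to fit ordinary least squares without pooling raw rows is for each node to share only additive summary statistics, which a coordinator sums and solves. The sketch below shows that idea for a single predictor. The function names `local_stats`, `combine`, and `fit` are hypothetical, and this is not claimed to be the paper's method. Sharing aggregates is also not, on its own, a formal privacy guarantee.

```python
from typing import Iterable, Sequence, Tuple

# (n, sum_x, sum_y, sum_xx, sum_xy): additive statistics for y = a + b * x
Stats = Tuple[float, float, float, float, float]


def local_stats(xs: Sequence[float], ys: Sequence[float]) -> Stats:
    """Computed on each node; only these five numbers leave the node."""
    return (
        float(len(xs)),
        float(sum(xs)),
        float(sum(ys)),
        float(sum(x * x for x in xs)),
        float(sum(x * y for x, y in zip(xs, ys))),
    )


def combine(parts: Iterable[Stats]) -> Stats:
    """Coordinator side: the statistics are sums, so they add element-wise."""
    total = [0.0] * 5
    for part in parts:
        for i, v in enumerate(part):
            total[i] += v
    return (total[0], total[1], total[2], total[3], total[4])


def fit(stats: Stats) -> Tuple[float, float]:
    """Solve the normal equations for (intercept, slope)."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Two simulated nodes holding disjoint rows of the line y = 1 + 2x.
node_a = local_stats([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
node_b = local_stats([3.0, 4.0], [7.0, 9.0])
intercept, slope = fit(combine([node_a, node_b]))
```

Because the statistics are plain sums, the same update also works for a data stream: each new batch adds its own five numbers to the running total.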


Social Compliance with NPIs, Mobility Patterns, and Reproduction Number: Lessons from COVID-19 in Europe

January 2025

·

13 Reads

·

·

Carlos Baquero

·

[...]

·

Non-pharmaceutical interventions (NPIs), including measures such as lockdowns, travel limitations, and social distancing mandates, play a critical role in shaping human mobility, which subsequently influences the spread of infectious diseases. Using COVID-19 as a case study, this research examines the relationship between restrictions, mobility patterns, and the disease's effective reproduction number (Rt) across 13 European countries. Employing clustering techniques, we uncover distinct national patterns, highlighting differences in social compliance between Northern and Southern Europe. While restrictions strongly correlate with mobility reductions, the relationship between mobility and Rt is more nuanced, driven primarily by the nature of social interactions rather than mere compliance. Additionally, employing XGBoost regression models, we demonstrate that missing mobility data can be accurately inferred from restrictions, and missing infection rates can be predicted from mobility data. These findings provide valuable insights for tailoring public health strategies in future crises and refining analytical approaches.


Performance and Explainability of Feature Selection-Boosted Tree-based Classifiers for COVID-19 Detection

December 2023

·

32 Reads

·

3 Citations

Heliyon

In this paper, we evaluate the performance and analyze the explainability of machine learning models boosted by feature selection in predicting COVID-19-positive cases from self-reported information. In essence, this work describes a methodology to identify COVID-19 infections that considers the large amount of information collected by the University of Maryland Global COVID-19 Trends and Impact Survey (UMD-CTIS). More precisely, this methodology performs a feature selection stage based on the recursive feature elimination (RFE) method to reduce the number of input variables without compromising detection accuracy. A tree-based supervised machine learning model is then optimized with the selected features to detect COVID-19-active cases. In contrast to previous approaches that use a limited set of selected symptoms, the proposed approach builds the detection engine considering a broad range of features including self-reported symptoms, local community information, vaccination acceptance, and isolation measures, among others. To implement the methodology, three different supervised classifiers were used: random forests (RF), light gradient boosting (LGB), and extreme gradient boosting (XGB). Based on data collected from the UMD-CTIS, we evaluated the detection performance of the methodology for four countries (Brazil, Canada, Japan, and South Africa) and two periods (2020 and 2021). The proposed approach was assessed in terms of various quality metrics: F1-score, sensitivity, specificity, precision, receiver operating characteristic (ROC), and area under the ROC curve (AUC). This work also shows the normalized daily incidence curves obtained by the proposed approach for the four countries. Finally, we perform an explainability analysis using Shapley values and feature importance to determine the relevance of each feature and the corresponding contribution for each country and each country/year.


Time-limited Bloom Filter

June 2023

·

14 Reads

A Bloom Filter is a probabilistic data structure designed to check, rapidly and memory-efficiently, whether an element is present in a set. It has been widely used in various computing areas, and several variants that allow deletions, support dynamic sets, or work with sliding windows have surfaced over the years. When summarizing data streams, it becomes relevant to identify the more recent elements in the stream. However, most sliding window schemes consider the most recent items of a data stream without considering time as a factor. While this allows, e.g., storing the most recent 10000 elements, it does not easily translate into storing elements received in the last 60 seconds, unless the insertion rate is stable and known in advance. In this paper, we present the Time-limited Bloom Filter, a new BF-based approach that can retain information for a given time period and correctly identify it as present when queried, while also being able to retire data when it becomes stale. The approach supports variable insertion rates while striving to keep a target false positive rate. We also make available a reference implementation of the data structure as a Redis module.
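The abstract does not describe the internal design of the Time-limited Bloom Filter. To illustrate the general problem of retiring stale entries by time rather than by count, here is a simple sketch that keeps one small Bloom filter per time slice and drops slices older than the window. The class `TimeSlicedBloomFilter`, its parameters, and the slicing scheme are assumptions made for illustration. It uses fixed-size slices and does not adapt to the insertion rate or target a false positive rate as the paper's approach does.

```python
import hashlib
from typing import Dict, Iterator


class TimeSlicedBloomFilter:
    """Hypothetical sketch: one fixed-size Bloom filter per time slice."""

    def __init__(self, bits: int = 1024, hashes: int = 3,
                 slice_seconds: int = 10, window_seconds: int = 60) -> None:
        self.bits = bits
        self.hashes = hashes
        self.slice_seconds = slice_seconds
        self.window_seconds = window_seconds
        self.slices: Dict[int, int] = {}  # slice index -> bitmap held in an int

    def _positions(self, item: str) -> Iterator[int]:
        # Derive `hashes` bit positions by salting SHA-256 with the hash index.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def _expire(self, now: float) -> None:
        # Drop every slice that ended before the start of the window.
        oldest = int((now - self.window_seconds) // self.slice_seconds)
        for index in [i for i in self.slices if i < oldest]:
            del self.slices[index]

    def add(self, item: str, now: float) -> None:
        self._expire(now)
        index = int(now // self.slice_seconds)
        bitmap = self.slices.get(index, 0)
        for pos in self._positions(item):
            bitmap |= 1 << pos
        self.slices[index] = bitmap

    def might_contain(self, item: str, now: float) -> bool:
        # True if some live slice has all of the item's bits set.
        self._expire(now)
        positions = list(self._positions(item))
        return any(all((bitmap >> p) & 1 for p in positions)
                   for bitmap in self.slices.values())


# The clock is passed in explicitly so the behaviour is easy to follow.
recent = TimeSlicedBloomFilter()
recent.add("sensor-42", now=0)
seen_soon = recent.might_contain("sensor-42", now=30)
seen_late = recent.might_contain("sensor-42", now=75)
```

With these defaults an element stays visible for at least 60 seconds and at most 70, since a whole slice is retired at once. As with any Bloom filter, a positive answer can be a false positive, while a negative answer within the window is always correct.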



Figure 3: ROC curves and their 95% confidence intervals yielded by the proposed approach using different classification models in each of the four countries and for 2021. Each ROC curve includes the AUC corresponding value.
Feature Selection for an Explainability Analysis in Detection of COVID-19 Active Cases from Facebook User-Based Online Surveys

June 2023

·

36 Reads

In this paper, we introduce a machine-learning approach to detecting COVID-19-positive cases from self-reported information. Specifically, the proposed method builds a tree-based binary classification model that includes a recursive feature elimination step. Based on Shapley values, the recursive feature elimination method preserves the most relevant features without compromising the detection performance. In contrast to previous approaches that use a limited set of selected features, the machine learning approach constructs a detection engine that considers the full set of features reported by respondents. Various versions of the proposed approach were implemented using three different binary classifiers: random forest (RF), light gradient boosting (LGB), and extreme gradient boosting (XGB). We consistently evaluate the performance of the implemented versions of the proposed detection approach on data extracted from the University of Maryland Global COVID-19 Trends and Impact Survey (UMD-CTIS) for four different countries: Brazil, Canada, Japan, and South Africa, and two periods: 2020 and 2021. We also compare the performance of the proposed approach to that of state-of-the-art methods under various quality metrics: F1-score, sensitivity, specificity, precision, receiver operating characteristic (ROC), and area under the ROC curve (AUC). The proposed machine learning approach outperformed state-of-the-art detection techniques in terms of the F1-score metric. In addition, this work shows the normalized daily case curves obtained by the proposed approach for the four countries and compares the estimated curves to those in official reports. Finally, we perform an explainability analysis, using Shapley values and the relevance ranking of the classification models, to identify the most significant variables contributing to detecting COVID-19-positive cases. This analysis allowed us to determine the relevance of each feature and the corresponding contribution to the detection task.


Consistent comparison of symptom-based methods for COVID-19 infection detection

June 2023

·

12 Reads

·

5 Citations

International Journal of Medical Informatics

Background: During the global pandemic crisis, various detection methods of COVID-19-positive cases based on self-reported information were introduced to provide quick diagnosis tools for effectively planning and managing healthcare resources. These methods typically identify positive cases based on a particular combination of symptoms, and they have been evaluated using different datasets. Purpose: This paper presents a comprehensive comparison of various COVID-19 detection methods based on self-reported information using the University of Maryland Global COVID-19 Trends and Impact Survey (UMD-CTIS), a large health surveillance platform, which was launched in partnership with Facebook. Methods: Detection methods were implemented to identify COVID-19-positive cases among UMD-CTIS participants reporting at least one symptom and a recent antigen test result (positive or negative) for six countries and two periods. Multiple detection methods were implemented for three different categories: rule-based approaches, logistic regression techniques, and tree-based machine-learning models. These methods were evaluated using different metrics including F1-score, sensitivity, specificity, and precision. An explainability analysis has also been conducted to compare methods. Results: Fifteen methods were evaluated for six countries and two periods. We identify the best method for each category: rule-based methods (F1-score: 51.48% - 71.11%), logistic regression techniques (F1-score: 39.91% - 71.13%), and tree-based machine learning models (F1-score: 45.07% - 73.72%). According to the explainability analysis, the relevance of the reported symptoms in COVID-19 detection varies between countries and years. However, there are two variables consistently relevant across approaches: stuffy or runny nose, and aches or muscle pain. 
Conclusions: Regarding the categories of detection methods, evaluating detection methods using homogeneous data across countries and years provides a solid and consistent comparison. An explainability analysis of a tree-based machine-learning model can assist in identifying infected individuals specifically based on their relevant symptoms. This study is limited by the self-report nature of data, which cannot replace clinical diagnosis.
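For reference, the four metrics named in the abstract follow directly from a confusion matrix of predicted versus test-confirmed cases. The sketch below uses the standard definitions. The function name `detection_metrics` and the counts in the example are made up for illustration and are not results from the study.

```python
from typing import Dict


def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (at least one actual positive,
    one actual negative, and one predicted positive).
    """
    sensitivity = tp / (tp + fn)   # share of true cases that were detected
    specificity = tn / (tn + fp)   # share of non-cases correctly rejected
    precision = tp / (tp + fp)     # share of flagged cases that were true
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Illustrative counts only.
example = detection_metrics(tp=40, fp=10, tn=40, fn=10)
```

The F1-score is the harmonic mean of precision and sensitivity, so it ignores true negatives. This is one reason the abstract reports specificity alongside it.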



Citations (63)


... 1) Articles not written in English; 2) Articles without complete bibliometric information; 3) Duplicates of the same study; 4) Articles that are not fully relevant to the field of explainability in medicine research or are general-purpose papers without case study implementations. [44]–[235] Forty more papers were excluded because it was not possible to access the full-text articles. At the end of this phase, the remaining 214 papers were identified as eligible reports. ...

Reference:

XAI Unveiled: Revealing the Potential of Explainable AI in Medicine - A Systematic Review
Performance and Explainability of Feature Selection-Boosted Tree-based Classifiers for COVID-19 Detection
  • Citing Article
  • December 2023

Heliyon

... There are works that analyze the severity of new COVID-19 variants compared to others (Nyber et al., 2023) but do not carry out a prediction of severity. They only carry out a study of reported cases in their databases; other works compare different methods but only for the detection of COVID-19 through symptoms (Rufino et al., 2023). On the other hand, there are works that evaluate the severity of patients with COVID-19 and their possibilities of entering the Intensive Care Unit (ICU) (Boussen et al., 2022), but they do so using respiratory rate and oxygen saturation signals, whereas other works use clinical data but only limit themselves to the prediction of intubated cases (Arvind et al., 2021). ...

Consistent comparison of symptom-based methods for COVID-19 infection detection
  • Citing Article
  • June 2023

International Journal of Medical Informatics

... Under intensive testing, these numbers would likely not have differed much from the peak of 300000 weekly cases in the simulated Control Population. During the Omicron era, however, the observed disease burden was much lower by the persisting strong vaccine protection against severe disease [26][27][28] and the lower severity of Omicron [29]. ...

Using survey data to estimate the impact of the omicron variant on vaccine efficacy against COVID-19 infection

... The UMD Global COVID-19 Trends and Impact Survey is a daily tracking survey of Facebook users conducted in 115 countries, including Peru [95][96][97][98]. The survey is conducted through Facebook recruitment that is weighted to represent the characteristics of each national population, with non-response and missing data treated as Missing Completely at Random (MCAR) and computed with non-response weights with an inverse propensity score weighting (IPSW) approach (for details on survey methodology, please refer to [91,92,99]). ...

Consistent Comparison of Symptom-based Methods for COVID-19 Infection Detection

... Only Menasria et al., 76 reported the prevalence of the different variants of SARS-CoV-2 in some North African countries as 0.1 and 6.3% for Omicron in Egypt and Morocco, respectively, in December 2021. However, multiple studies reported the prevalence in South Africa, which was 24.74% between June and December 2021 22,23,26,44 and increased to 89.2% between October 1, 2021, and April 26, 2022. ...

Using Survey Data to Estimate the Impact of the Omicron Variant on Vaccine Efficacy against COVID-19 Infection

... Furthermore, electronic surveys can automatically check for incomplete or invalid responses and clarify how to respond correctly (for example, suggesting typing a number in digits rather than letters). Whereas, in the 1980s, the most severe drawback of an electronic survey was that many people did not have access to a personal terminal, with mobile phone technology now pervasive in our lives, everybody is now potentially reachable by an online survey, even during a pandemic (Baquero et al., 2021). ...

The CoronaSurveys System for COVID-19 Incidence Data Collection and Processing

Frontiers in Computer Science

... In a replicated KV-store, a consistency model is enforced through replication protocols [17,20,22,25,28,41], which manage the coordination between replicas and perform data replication. Replication protocols generally rely on consensus algorithms to achieve agreement among replicas on the order of operations. ...

Efficient replication via timestamp stability
  • Citing Conference Paper
  • April 2021

... Primarily, the choice of a fixed relative infectiousness value for mild and asymptomatic cases (φ), which also directly affects the model output for the number of infected individuals who are not in the hospital (A(t)), may have hindered the realism of our model, as it would likely be affected by changes in the mobility of the population. Nevertheless, when looking at the cumulative number of infected individuals, our model outputs (see section 2 of Additional File) are aligned with estimates of seroprevalence in Spain [24,91,92], Italy [93], the Basque Country [91], and Tuscany [93,94]. Further research could focus on refining the model to incorporate relevant dynamics (such as mobility, compliance with NPIs, vaccination, waning immunity, and the emergence of COVID-19 variants) to answer questions relating to the effectiveness of health policies. ...

Estimating the COVID-19 Prevalence in Spain With Indirect Reporting via Open Surveys

... The goal of this paper is to mitigate liquidity fragmentation across rollups without introducing the aforementioned limitations. We propose leveraging Conflict-free Replicated Data Types (CRDTs), abstract data types that converge to the same state in distributed environments [18], [19], as a promising solution. Specifically, we introduce a universal abstract token standard, UAT20, which unifies a user's ERC20 tokens across multiple rollups. ...

Efficient Synchronization of State-Based CRDTs
  • Citing Conference Paper
  • April 2019

... After clearing outliers from the dataset (e.g., respondents that declared a higher number of vaccinated colleagues than the total number of people they stated they knew in their working network), we used the remaining data (510 observations) to build a post-stratified average and confidence interval according to the procedure adopted in Ojo et al. (2020). We summarize the procedure in Appendix 2. ...

CoronaSurveys: Using Surveys with Indirect Reporting to Estimate the Incidence and Evolution of Epidemics