César Soto-Valero
PhD in Computer Science
cesarsotovalero.net
About
43
Publications
84,763
Reads
431
Citations
Introduction
Additional affiliations
September 2018 - April 2024
Publications (43)
Maven artifacts are immutable: an artifact that is uploaded to Maven Central can be neither removed nor modified. The only way for developers to upgrade their library is to release a new version. Consequently, Maven Central accumulates all the versions of all the libraries that are published there, and applications that declare a dependency on a li...
Build automation tools and package managers have a profound influence on software development. They facilitate the reuse of third-party libraries, support a clear separation between the application’s code and its external dependencies, and automate several software development tasks. However, the wide adoption of these tools introduces new challeng...
We study the evolution and impact of bloated dependencies in a single software ecosystem: Java/Maven. Bloated dependencies are third-party libraries that are packaged in the application binary but are not needed to run the application. We analyze the history of 435 Java projects. This historical data includes 48,469 distinct dependencies, which we...
Baseball is a statistics-rich sport, and predicting the winner of a particular Major League Baseball (MLB) game is an interesting and challenging task. To date, there is no definitive formula for determining which factors will lead a team to victory, but many trends can emerge from the analysis of many years of historical records. Rec...
Software bills of materials (SBOMs) promise to become the backbone of software supply chain hardening. We deep-dive into six tools and the SBOMs they produce for complex open source Java projects, revealing challenges regarding the accurate production and usage of SBOMs.
Large-scale code reuse significantly reduces both development costs and time. However, the massive share of third-party code in software projects poses new challenges, especially in terms of maintenance and security. In this paper, we propose a novel technique to specialize dependencies of Java projects, based on their actual usage. Given a project...
Like all software, blockchain nodes are exposed to faults in their underlying execution stack. Unstable execution environments can disrupt the availability of blockchain node interfaces, resulting in downtime for users. This paper introduces the concept of N-version blockchain nodes. This new type of node relies on simultaneous execution of differen...
Modern software systems rely on a multitude of third-party dependencies. This large-scale code reuse reduces development costs and time, but it also poses new challenges with respect to maintenance and security. Techniques such as tree shaking or shading can remove dependencies that are completely unused by a project, which partly addresses these challeng...
Like all software, blockchain nodes are exposed to faults in their underlying execution stack. Unstable execution environments can disrupt the availability of blockchain nodes' interfaces, resulting in downtime for users. This paper introduces the concept of N-Version Blockchain nodes. This new type of node relies on simultaneous execution of differe...
Ethereum is the single largest programmable blockchain platform today. Ethereum nodes operate the blockchain, relying on a vast supply chain of third-party software dependencies. In this article, we perform an analysis of the software supply chain of Java Ethereum nodes and distill the challenges of maintaining and securing this blockchain technolo...
Software bloat is code that is packaged in an application but is actually not necessary to run the application. The presence of software bloat is an issue for security, for performance, and for maintenance. In this paper, we introduce a novel technique for debloating, which we call coverage-based debloating. We implement the technique for one singl...
The rise of blockchain technologies has triggered tremendous research interest, coding effort, and monetary investment in the last decade. Ethereum is the largest programmable blockchain platform today. It features cryptocurrency trading, digital art, and decentralized finance through smart contracts. So-called Ethereum nodes operate the blockch...
Hyrum’s law states a common observation in the software industry: “With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody”. Meanwhile, recent research results seem to contradict this observation when they state that “for most APIs, the...
The automatic interpretation of sign languages is a challenging task, as it requires high-level vision and high-level motion processing systems to provide accurate image perception. In this paper, we use Convolutional Neural Networks (CNNs) and transfer learning to enable computers to interpret signs of the Swedish Sign Language...
Software engineering researchers look for software artifacts to study their characteristics or to evaluate new techniques. In this paper, we introduce DUETS, a new dataset of software libraries and their clients. This dataset can be exploited to gain many different insights, such as API usage, usage inputs, or novel observations about the test suit...
This work aimed to model and predict animal conditions in Cuba using time series. The analysis was based on the temperature and humidity index (ITH), calculated from the data of 64 meteorological stations across the country, grouped into three regions: east, center, and west. Dickey-Fuller and KPSS tests were appl...
During compilation from Java source code to bytecode, some information is irreversibly lost. In other words, compilation and decompilation of Java code is not symmetric. Consequently, decompilation, which aims at producing source code from bytecode, relies on strategies to reconstruct the information that has been lost. Different Java decompilers u...
This work aimed to model and predict animal conditions in Cuba using time series. The analysis was based on the temperature and humidity index (ITH), calculated from the data of 64 meteorological stations across the country, grouped into three regions: east, center, and west. Dickey-Fuller and KPSS tests were applie...
Objectives: Spastic hemiplegia is one of the most common forms of cerebral palsy, in which one side of the body is affected to a greater extent than the other. Hemiplegia severity (i.e., moderate vs. mild forms) is currently used in some Para sports for classification purposes. This study evaluates the sensitivity of several tests of stability (...
Competitive balance is a key concept in sport because it creates uncertainty about the outcome, which leads to increased interest in and demand for these events. The Spanish Professional Football League (LaLiga) has been one of the top European leagues in the last decade, and it has given rise to a particular research interest regarding its characteristics...
During compilation from Java source code to bytecode, some information is irreversibly lost. In other words, compilation and decompilation of Java code are not symmetric. Consequently, the decompilation process, which aims at producing source code from bytecode, must establish some strategies to reconstruct the information that has been lost. Moder...
To characterize the relevant meteorological variables and phenomena that affect inland aquaculture, reservoirs in the provinces of Villa Clara and Santiago de Cuba were selected. To determine the influence of these variables, water temperature (°C) and oxygen concentration (mg·L⁻¹) were studied at different depths...
To characterize the relevant meteorological variables and phenomena that affect inland aquaculture, reservoirs in the provinces of Villa Clara and Santiago de Cuba were selected. To determine the influence of the variables, water temperature (°C) and oxygen concentration (mg·L⁻¹) were studied at different depths and from the meteorological point o...
Machine learning is a very useful tool for analyzing the large amount of data handled in modern sport. Today, methods of this kind have become a research area with great prospects for application. This work presents a review of the state of the art on the main...
The popularity of Android has motivated a significant increase in the amount of malware specifically designed to target this operating system. In recent years, the threat has become more serious, and every day cybercriminals create and share new specimens through almost all existing markets. This situation has promoted a notable research interest...
A growing body of research in empirical software engineering applies recurrent pattern analysis to make sense of developers' behavior during their interactions with IDEs. However, the exploration of hidden real-time structures of programming behavior remains a challenging task. In this paper, we investigate the presence of temporal be...
A pitcher’s performance is a key factor in winning or losing baseball games. Predicting when a starting pitcher will enter an unfortunate pitching sequence is one of the most difficult decision-making problems for baseball managers. Since 2007, vast amounts of pitch-by-pitch records have been available for free via the PITCHf/x system, but obtaining...
The application of machine learning methods has proven to be a successful approach for managing a wide variety of computer science problems. The aim of this technical report is to present some ideas related to the analysis of source code and software systems using machine learning techniques. In particular, we focus our study on its applications to...
The generation and availability of football data have increased considerably in recent decades, mostly due to the sport's popularity and to technological advances. Gaussian mixture clustering models represent a novel approach to exploring and analyzing performance data in sports. In this paper, we use principal components analysis in conjunction wi...
Machine learning on sports data is a novel research area. Prediction tasks have attracted particular attention in the sports context, mainly because of market interests and their broad applications in decision support. A large amount of data and historical records is currently available...
Time series make it possible to describe a wide variety of phenomena that unfold over time. Methods that analyze time series using data mining techniques can solve many problems, overcoming many of the limitations of the statistical and mathematical models used...
The automatic extraction of useful knowledge and statistical information from water polo video sequences is a complex task that has so far received little study. Following the guidelines of the Observational Methodology, this paper describes the implementation of ACI-Polo, a computer system for the analysis of individual competitive activity in water polo games....
Defensive efficiency is essential to winning a bout and is one of the basic elements in the pursuit of sporting expertise in boxing. In this work we propose exercises, drawn from technical and tactical preparation, with the aim of improving the effectiveness of arm, trunk, and leg defenses in junior bo...
Sabermetrics is recognized as a new trend in the study of the game of baseball. It is based on the rigorous statistical study of objective evidence and has been used extensively in the empirical analysis of the game. Considering both theoretical and practical contributions, sabermetrics involves the constant quest to understand how to play baseball...
Time series make it possible to describe a wide variety of phenomena that unfold over time. Models that analyze time series using data mining techniques can solve many problems, overcoming the limitations of traditional statistical methods. Weka is a powerful learning system...