César Soto-Valero

KTH Royal Institute of Technology | KTH · Department of Software and Computer Systems

Master of Science
I'm a Ph.D. student doing research in software technology

About

35
Publications
61,315
Reads
139
Citations
Introduction
I'm a WASP Ph.D. student at KTH Royal Institute of Technology, in Sweden. My current research focuses on leveraging static and dynamic program analysis techniques for software debloating. For more details on what I'm doing, visit my personal website https://www.cesarsotovalero.net
Additional affiliations
September 2018 - present
KTH Royal Institute of Technology
Position
  • PhD Student
Description
  • I’m a WASP PhD student working with Benoit Baudry at KTH Royal Institute of Technology in Sweden. My current research is focused on software diversity, automatic program debloating and software specialization.
September 2014 - July 2015
Universidad Central "Marta Abreu" de Las Villas
Position
  • Instructor

Publications (35)
Conference Paper
Full-text available
Maven artifacts are immutable: an artifact that is uploaded on Maven Central cannot be removed nor modified. The only way for developers to upgrade their library is to release a new version. Consequently, Maven Central accumulates all the versions of all the libraries that are published there, and applications that declare a dependency towards a li...
Preprint
Full-text available
Software bloat is code that is packaged in an application but is actually not used and not necessary to run the application. The presence of bloat is an issue for software security, for performance, and for maintenance. In recent years, several works have proposed techniques to detect and remove software bloat. In this paper, we introduce a novel t...
Article
Full-text available
Build automation tools and package managers have a profound influence on software development. They facilitate the reuse of third-party libraries, support a clear separation between the application’s code and its external dependencies, and automate several software development tasks. However, the wide adoption of these tools introduces new challeng...
Conference Paper
Full-text available
We study the evolution and impact of bloated dependencies in a single software ecosystem: Java/Maven. Bloated dependencies are third-party libraries that are packaged in the application binary but are not needed to run the application. We analyze the history of 435 Java projects. This historical data includes 48,469 distinct dependencies, which we...
Article
Full-text available
Baseball is a statistics-rich sport, and predicting the winner of a particular Major League Baseball (MLB) game is an interesting and challenging task. To date, there is no definitive formula for determining which factors will lead a team to victory, but many trends can emerge from the analysis of many years of historical records. Rec...
Preprint
Full-text available
The rise of blockchain technologies has triggered tremendous research interest, coding efforts, and monetary investments in the last decade. Ethereum is the largest programmable blockchain platform today. It features cryptocurrency trading, digital art, and decentralized finance through smart contracts. So-called Ethereum nodes operate the blockch...
Article
Hyrum’s law states a common observation in the software industry: “With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody”. Meanwhile, recent research results seem to contradict this observation when they state that “for most APIs, the...
Preprint
Full-text available
We study the evolution and impact of bloated dependencies in a single software ecosystem: Java/Maven. Bloated dependencies are third-party libraries that are packaged in the application binary but are not needed to run the application. We analyze the history of 435 Java projects. This historical data includes 48,469 distinct dependencies, which we...
Article
Full-text available
The automatic interpretation of sign languages is a challenging task, as it requires the usage of high-level vision and high-level motion processing systems for providing accurate image perception. In this paper, we use Convolutional Neural Networks (CNNs) and transfer learning to make computers able to interpret signs of the Swedish Sign Language...
Conference Paper
Full-text available
Software engineering researchers look for software artifacts to study their characteristics or to evaluate new techniques. In this paper, we introduce DUETS, a new dataset of software libraries and their clients. This dataset can be exploited to gain many different insights, such as API usage, usage inputs, or novel observations about the test suit...
Preprint
Full-text available
Software engineering researchers look for software artifacts to study their characteristics or to evaluate new techniques. In this paper, we introduce DUETS, a new dataset of software libraries and their clients. This dataset can be exploited to gain many different insights, such as API usage, usage inputs, or novel observations about the test suit...
Preprint
Full-text available
During compilation from Java source code to bytecode, some information is irreversibly lost. In other words, compilation and decompilation of Java code is not symmetric. Consequently, decompilation, which aims at producing source code from bytecode, relies on strategies to reconstruct the information that has been lost. Different Java decompilers u...
Article
Full-text available
This work aimed to model and predict animal conditions in Cuba using time series. The analysis was based on the temperature and humidity index (ITH) calculated from the data of 64 meteorological stations across the country, grouped into three regions: east, center, and west. Dickey-Fuller and KPSS tests were applie...
Article
Full-text available
During compilation from Java source code to bytecode, some information is irreversibly lost. In other words, compilation and decompilation of Java code is not symmetric. Consequently, decompilation, which aims at producing source code from bytecode, relies on strategies to reconstruct the information that has been lost. Different Java decompilers u...
Article
Full-text available
Objectives: Spastic hemiplegia is one of the most common forms of cerebral palsy, in which one side of the body is affected to a greater extent than the other one. Hemiplegia severity (i.e. moderate vs mild forms) is currently used in some Para sports for classification purposes. This study evaluates the sensitivity of several tests of stability (...
Article
Full-text available
Competitive balance is a key concept in sport because it creates uncertainty about the outcome, which leads to increased interest in and demand for these events. The Spanish Professional Football League (LaLiga) has been one of the top European leagues in the last decade, and it has given rise to particular research interest regarding its characteristics...
Conference Paper
Full-text available
During compilation from Java source code to bytecode, some information is irreversibly lost. In other words, compilation and decompilation of Java code is not symmetric. Consequently, the decompilation process, which aims at producing source code from bytecode, must establish some strategies to reconstruct the information that has been lost. Moder...
Preprint
During compilation from Java source code to bytecode, some information is irreversibly lost. In other words, compilation and decompilation of Java code is not symmetric. Consequently, the decompilation process, which aims at producing source code from bytecode, must establish some strategies to reconstruct the information that has been lost. Modern...
Article
Full-text available
To characterize the relevant meteorological variables and phenomena that affect inland-water aquaculture, reservoirs in the provinces of Villa Clara and Santiago de Cuba were selected. To determine the influence of these variables, water temperature (°C) and oxygen concentration (mg·L⁻¹) were studied at different dep...
Preprint
Full-text available
Maven artifacts are immutable: an artifact that is uploaded on Maven Central cannot be removed nor modified. The only way for developers to upgrade their library is to release a new version. Consequently, Maven Central accumulates all the versions of all the libraries that are published there, and applications that declare a dependency towards a li...
Article
Full-text available
Machine learning is a very useful tool for analyzing the large amount of data handled in modern sport. Today, methods of this kind have become a research area with great prospects for application. This paper reviews the state of the art on the main...
Article
Full-text available
The popularity of Android has motivated a significant increase in the amount of malware specially designed to target this operating system. In recent years, the threat has become more serious, and every day cybercriminals create and share new specimens through almost all existing markets. This situation has prompted notable research interest...
Conference Paper
Full-text available
A growing body of research in empirical software engineering applies recurrent pattern analysis to make sense of developers' behavior during their interactions with IDEs. However, the exploration of hidden real-time structures of programming behavior remains a challenging task. In this paper, we investigate the presence of temporal be...
Article
Full-text available
A pitcher’s performance is a key factor in winning or losing baseball games. Predicting when a starting pitcher will enter an unfortunate pitching sequence is one of the most difficult decision-making problems for baseball managers. Since 2007, vast amounts of pitch-by-pitch records have been available for free via the PITCHf/x system, but obtaining...
Technical Report
Full-text available
The application of machine learning methods has proven to be a successful approach for managing a wide variety of computer science problems. The aim of this technical report is to present some ideas related to the analysis of source code and software systems using machine learning techniques. In particular, we focus our study on its applications to...
Article
Full-text available
The generation and availability of football data have increased considerably in recent decades, mostly due to the sport's popularity and to technological advances. Gaussian mixture clustering models represent a novel approach to exploring and analyzing performance data in sports. In this paper, we use principal component analysis in conjunction wi...
Thesis
Full-text available
Machine learning on sports data is a novel research area. Prediction tasks have drawn attention in the sports context, mainly because of market interests and their broad applications in decision support. Currently, a large amount of data and histori...
Conference Paper
Full-text available
Time series make it possible to describe a wide variety of phenomena that unfold over time. Methods that analyze time series using data mining techniques can solve many problems, overcoming many of the limitations present in the statistical and mathematical models use...
Article
Full-text available
The automatic extraction of useful knowledge and statistical information from water polo video sequences is a complex task that has so far received little study. Following the guidelines of the Observational Methodology, this paper describes the implementation of ACI-Polo, a computer system for the analysis of individual competitive activity in water polo games....
Article
Full-text available
Defensive efficiency is essential to winning a boxing bout and is one of the basic elements in the pursuit of sporting expertise in boxing. In this work, we propose exercises, drawn from technical and tactical preparation, aimed at improving the effectiveness of arm, trunk, and leg defenses in junior bo...
Article
Full-text available
Sabermetrics is recognized as a new trend in the study of baseball. It is based on the rigorous statistical study of objective evidence and has been used extensively in empirical analysis of the game. Considering both theoretical and practical contributions, sabermetrics involves the constant quest to understand how to play baseball...
Thesis
Full-text available
Time series can describe a wide variety of phenomena that unfold over time. Models that analyze time series using data mining techniques can solve many problems, overcoming the limitations of traditional statistical methods. Weka is a powerful system for...

Projects (4)
Project
In this project, we investigate the diversity of software artifacts in the Maven Central ecosystem.
Project
The main goal of this project is to use Data Science for analyzing data in software repositories.