• Home
  • Yaroslav Golubev
Yaroslav Golubev

Yaroslav Golubev
JetBrains Research · Laboratory of Machine Learning Methods in Software Engineering

Master of Technology
Senior Researcher at JetBrains Research

About

40
Publications
2,777
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
59
Citations
Introduction
My main areas of interest include large-scale code analysis, statistics, machine learning in software engineering, and, as a hobby, linguistics and natural language processing.
Education
September 2020 - August 2025
September 2018 - August 2020
ITMO University
Field of study
  • Laser Technologies
September 2014 - August 2018
ITMO University
Field of study
  • Applied Physics

Publications

Publications (40)
Preprint
Full-text available
Despite the availability of refactoring as a feature in popular IDEs, recent studies revealed that developers are reluctant to use them, and still prefer the manual refactoring of their code. At JetBrains, our goal is to fully support refactoring features in IntelliJ-based IDEs and improve their adoption in practice. Therefore, we start by raising...
Preprint
Full-text available
Code changes constitute one of the most important features of software evolution. Studying them can provide insights into the nature of software development and also lead to practical solutions - recommendations and automations of popular changes for developers. In our work, we developed a tool called PythonChangeMiner that allows to discover code...
Preprint
Full-text available
Code cloning plays a very important role in open-source software engineering. The presence of clones within a project may indicate a need for refactoring, and clones between projects are even more interesting, since code migration takes place and violations are possible. But how is code being copied? How prevalent is the process and on what level d...
Preprint
Full-text available
In software engineering, different approaches and machine learning models leverage different types of data: source code, textual information, historical data. An important part of any project is its dependencies. The list of dependencies is relatively small but carries a lot of semantics with it, which can be used to compare projects or make judgem...
Preprint
Full-text available
Integrated Development Environments (IDE) are designed to make users more productive, as well as to make their work more comfortable. To achieve this, a lot of diverse tools are embedded into IDEs, and the developers of IDEs can employ anonymous usage logs to collect the data about how they are being used to improve them. A particularly important c...
Preprint
Full-text available
The automatic collection of stack traces in bug tracking systems is an integral part of many software projects and their maintenance. However, such reports often contain a lot of duplicates, and the problem of de-duplicating them into groups arises. In this paper, we propose a new approach to solve the deduplication task and report on its use on th...
Preprint
Full-text available
In recent years, Jupyter notebooks have grown in popularity in several domains of software engineering, such as data science, machine learning, and computer science education. Their popularity has to do with their rich features for presenting and visualizing data, however, recent studies show that notebooks also share a lot of drawbacks: high numbe...
Preprint
Full-text available
In this paper, we present Lupa - a framework for large-scale analysis of the programming language usage. Lupa is a command line tool that uses the power of the IntelliJ Platform under the hood, which gives it access to powerful static analysis tools used in modern IDEs. The tool supports custom analyzers that process the rich concrete syntax tree o...
Preprint
Full-text available
The task of finding the best developer to fix a bug is called bug triage. Most of the existing approaches consider the bug triage task as a classification problem, however, classification is not appropriate when the sets of classes change over time (as developers often do in a project). Furthermore, to the best of our knowledge, all the existing mo...
Preprint
Full-text available
We have developed a plugin for IntelliJ IDEA called AntiCopyPaster that tracks the pasting of code inside the IDE and suggests appropriate Extract Method refactorings to combat the propagation of duplicates. To implement the plugin, we gathered a dataset of code fragments that should and should not be extracted, compiled a list of metrics of code t...
Preprint
Full-text available
Jupyter notebooks represent a unique format for programming - a combination of code and Markdown with rich formatting, separated into individual cells. We propose to perceive a Jupyter Notebook cell as a simplified and raw version of a programming function. Similar to functions, Jupyter cells should strive to contain singular, self-contained action...
Conference Paper
Full-text available
The popularity of cloud technologies has led to the development of a new type of applications that specifically target cloud environments. Such applications require a lot of cloud infrastructure to run, which brought about the Infrastructure as Code approach, where the infrastructure is also coded using a separate language in parallel to the main a...
Conference Paper
Full-text available
Similarly to production code, code smells also occur in test code, where they are called test smells. Test smells have a detrimental effect not only on test code but also on the production code that is being tested. To date, the majority of the research on test smells has been focusing on programming languages such as Java and Scala. However, there...
Preprint
Full-text available
In software engineering, a great number of new approaches are being actively researched, and a lot of tools are being developed based on them. These tools require a framework for their creation and an opportunity to be used by potential developers. Modern IDEs provide both. In this paper, we describe the main capabilities of the IntelliJ Platform t...
Preprint
Full-text available
Many code changes that developers make in their projects are repeated and constitute recurrent change patterns. It is of interest to collect such patterns from the version history of open-source repositories and suggest the most useful of them as quick fixes. In this paper, we present Revizor - a tool aimed to build custom plugins for PyCharm, a po...
Preprint
Full-text available
The popularity of cloud technologies has led to the development of a new type of applications that specifically target cloud environments. Such applications require a lot of cloud infrastructure to run, which brought about the Infrastructure as Code approach, where the infrastructure is also coded using a separate language in parallel to the main a...
Preprint
Full-text available
Similarly to production code, code smells also occur in test code, where they are called test smells. Test smells have a detrimental effect not only on test code but also on the production code that is being tested. To date, the majority of the research on test smells has been focusing on programming languages such as Java and Scala. However, there...
Preprint
Full-text available
Software development is a complex process that includes many different tasks besides just writing code. One of the aspects of software engineering is selecting and managing licenses for the given project. In this paper, we present Sorrel - a plugin for managing licenses and detecting potential incompatibilities for IntelliJ IDEA, a popular Java IDE...
Preprint
Full-text available
A lot of problems in the field of software engineering - bug fixing, commit message generation, etc. - require analyzing not only the code itself but specifically code changes. Applying machine learning models to these tasks requires us to create numerical representations of the changes, i.e. embeddings. Recent studies demonstrate that the best way...
Preprint
Full-text available
Recent trends in Web development demonstrate an increased interest in serverless applications, i.e. applications that utilize computational resources provided by cloud services on demand instead of requiring traditional server management. This approach enables better resource management while being scalable, reliable, and cost-effective. However, i...
Preprint
Full-text available
Clone detection plays an important role in software engineering. Finding clones within a single project introduces possible refactoring opportunities, and between different projects it could be used for detecting code reuse or possible licensing violations.In this paper, we propose a modification to bag-of-tokens based clone detection that allows d...
Article
Double-pulse femtosecond laser ablation of thin aluminum films and bulk aluminum counterintuitively demonstrated a strong (60-70%) raise of the thickness-dependent thresholds for inter-pulse delays of 20-200 ps, preventing material removal at above-threshold fluencies. Time-resolved optical pump-probe reflection and double-pump transmission studies...
Preprint
Full-text available
In this paper, we present Sosed, a tool for discovering similar software projects. We use fastText to compute the embeddings of subtokens into a dense space for 120,000 GitHub repositories in 200 languages. Then, we cluster embeddings to identify groups of semantically similar sub-tokens that reflect topics in source code. We use a dataset of 9 mil...
Preprint
Full-text available
Software refactoring plays an important role in increasing code quality. One of the most popular refactoring types is the Move Method refactoring. It is usually applied when a method depends more on members of other classes than on its own original class. Several approaches have been proposed to recommend Move Method refactoring automatically. Most...
Preprint
Full-text available
With an ever-increasing amount of open source software, the popularity of services like GitHub that facilitate code reuse, and common misconceptions about the licensing of open source software, the problem of license violations in the code is getting more and more prominent. In this study, we compile an extensive corpus of popular Java projects fro...
Article
Full-text available
The process of formation and the characteristics of silver nanostructures created by pulsed laser annealing in the air are studied. Nanoparticles were obtained by way of irradiating thin silver films (62 and 175 nm) on the dielectric (glass) base with an excimer laser emission ($lambda$ = 193 nm). Created nanostructures were studied using the metho...

Network

Cited By