Chapter

Services Extraction for Integration in Software Projects via an Agent-Based Negotiation System


Abstract

The rapid development of the internet and its associated systems has brought many new capabilities to software development. One such development was the emergence of code repositories, which allow developers to share their projects and let other developers contribute to their growth and improvement. However, these systems are now so widely used, and host so many projects with very similar names and themes, that it is no longer easy to quickly find a repository that fully matches a developer's needs. Searching for repositories that fit an initial set of requirements has become a complex task. Developers therefore need tools that can download and analyse large amounts of repository information programmatically, a problem that approaches such as big data and scraping can address. This paper presents the design of a data ingestion system for libraries, components and repositories in a multi-agent programming environment.
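The chapter itself does not include an implementation. As a rough, minimal sketch of the kind of programmatic repository ingestion the abstract describes, the following Python snippet queries GitHub's public repository search API; the query string, the selected metadata fields and the printed output are illustrative assumptions, not the chapter's actual multi-agent pipeline.

```python
# Minimal sketch of programmatic repository ingestion via the GitHub
# search API. Query term and output handling are illustrative only.
import json
import urllib.parse
import urllib.request

def search_repositories(query: str, per_page: int = 10) -> list[dict]:
    """Fetch basic metadata for repositories matching `query`."""
    url = ("https://api.github.com/search/repositories"
           f"?q={urllib.parse.quote(query)}&per_page={per_page}")
    with urllib.request.urlopen(url) as response:
        payload = json.load(response)
    # Keep only the fields a downstream ranking agent might use.
    return [{"name": repo["full_name"],
             "description": repo.get("description"),
             "stars": repo["stargazers_count"],
             "topics": repo.get("topics", [])}
            for repo in payload["items"]]

if __name__ == "__main__":
    for repo in search_repositories("multi-agent framework language:python"):
        print(repo["stars"], repo["name"])
```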


... Such services need to be discovered, usually through a manual search by the developer. In SmartCLIDE, service discovery is assisted by an AI agent that receives as input the necessary functional information about the service to be used, and goes through several sources (e.g., programmable web) to identify fitting services [2], [3]. ...
... After the user sends a query with the service specification through a Natural Language Query Interface, the discovery draws results from web pages, code repositories and service registries by invoking 3rd-party search APIs. SmartCLIDE can rewrite the provided user query on the basis of indexed popular search queries, leveraging AI-based techniques, and displays the identified services to the user as a ranked list [2], [3]. ...
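As a hedged illustration of the ranked-list step described in the excerpt, the sketch below scores candidate service descriptions against a free-text query with TF-IDF cosine similarity. This is a generic stand-in, not SmartCLIDE's actual AI-based ranking; the registry entries are invented.

```python
# Rank candidate service descriptions against a query (generic TF-IDF
# stand-in for the AI-based ranking the excerpt describes).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_services(query: str, services: dict[str, str]) -> list[tuple[str, float]]:
    """Return (service name, relevance score) pairs, best match first."""
    names = list(services)
    corpus = [services[n] for n in names] + [query]
    matrix = TfidfVectorizer().fit_transform(corpus)
    scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    return sorted(zip(names, scores), key=lambda p: p[1], reverse=True)

candidates = {  # hypothetical registry entries
    "geo-codec": "REST service for geocoding postal addresses",
    "pay-gw": "payment gateway service with REST and SOAP bindings",
}
print(rank_services("address geocoding REST API", candidates))
```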
Conference Paper
Nowadays the majority of cloud applications are developed based on the Service-Oriented Architecture (SOA) paradigm. Large-scale applications are structured as a collection of well-integrated services that are deployed in public, private or hybrid clouds. Despite the inherent benefits of service-based cloud development, the process is far from trivial, in the sense that it requires the software engineer to be (at least) comfortable with the various technologies in the long cloud development toolchain: programming in various languages, testing tools, build/CI tools, repositories, deployment mechanisms, etc. In this paper, we propose an approach and corresponding toolkit (termed SmartCLIDE, developed as part of an EU-funded research project) for facilitating SOA-based software development for the cloud, by extending a well-known cloud IDE from Eclipse. The approach aims at shortening the toolchain for cloud development, hiding process complexity and lowering the level of knowledge required from software engineers. The approach and tool underwent an initial validation by professional cloud software developers. The results underline the potential of such an automation approach, as well as the usability of the research prototype, opening further research opportunities and providing benefits for practitioners.
Article
Full-text available
New mapping and location applications focus on offering improved usability and services based on multi-modal, door-to-door passenger experiences. This helps citizens develop greater confidence in and adherence to multi-modal transport services. These applications adapt to the needs of the user during their journey through the data, statistics and trends extracted from previous uses of the application. The My-Trac application is dedicated to the research and development of these user-centered services to improve the multi-modal experience using various techniques. Among these techniques are preference extraction systems, which extract user information from social networks such as Twitter. In this article, we present a system that builds a profile of each user's preferences from the tweets published on their Twitter account. The system extracts the tweets from the profile, analyzes them with the proposed algorithms, and returns the result as a document containing the categories and the degree of affinity the user has with each category. In this way, the My-Trac application includes a recommender system in which the user receives preference-based suggestions about activities or services on the route to be taken.
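A minimal sketch of the category-affinity idea, assuming invented categories and keyword lexicons (the paper's own algorithms are described only at a high level):

```python
# Simplified category-affinity scoring over a user's tweets.
# Categories and keyword lists are invented for illustration.
from collections import Counter

CATEGORY_KEYWORDS = {  # hypothetical category lexicons
    "sport":   {"match", "football", "training", "league"},
    "culture": {"museum", "concert", "exhibition", "theatre"},
    "food":    {"restaurant", "recipe", "brunch", "tapas"},
}

def affinity_profile(tweets: list[str]) -> dict[str, float]:
    """Fraction of tweets that mention each category's vocabulary."""
    hits = Counter()
    for tweet in tweets:
        words = set(tweet.lower().split())
        for category, keywords in CATEGORY_KEYWORDS.items():
            if words & keywords:
                hits[category] += 1
    total = max(len(tweets), 1)
    return {c: hits[c] / total for c in CATEGORY_KEYWORDS}

print(affinity_profile(["Great concert tonight!", "Sunday league match"]))
```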
Article
Full-text available
Programmers often write code that is similar to code that has already been written elsewhere. A tool that could help programmers search for such similar code would be immensely useful. Such a tool could help programmers extend partially written code snippets to completely implement the necessary functionality, discover extensions to the partial code which are commonly included by other programmers, cross-check against similar code written by other programmers, or add extra code that fixes common mistakes and errors. We propose Aroma, a tool and technique for code recommendation via structural code search. Aroma indexes a huge code corpus including thousands of open-source projects, takes a partial code snippet as input, searches the corpus for method bodies containing the partial code snippet, and clusters and intersects the results of the search to recommend a small set of succinct code snippets which both contain the query snippet and appear as part of several methods in the corpus. We evaluated Aroma on 2000 randomly selected queries created from the corpus, as well as 64 queries derived from code snippets obtained from Stack Overflow, a popular website for discussing code. We implemented Aroma for 4 different languages, and developed an IDE plugin for Aroma. Furthermore, we conducted a study where we asked 12 programmers to complete programming tasks using Aroma, and collected their feedback. Our results indicate that Aroma is capable of retrieving and recommending relevant code snippets efficiently.
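As a toy approximation of Aroma's retrieval stage only, the sketch below returns method bodies whose identifier tokens contain all of the query snippet's tokens. Aroma itself works on structural parse-tree features and adds clustering and intersection steps that this sketch omits; the corpus entries are invented.

```python
# Token-containment search over method bodies (toy stand-in for
# Aroma's structural retrieval stage).
import re

def tokens(code: str) -> set[str]:
    return set(re.findall(r"[A-Za-z_]\w*", code))

def containment_search(query: str, corpus: dict[str, str]) -> list[str]:
    """Return names of methods whose tokens are a superset of the query's."""
    needle = tokens(query)
    return [name for name, body in corpus.items()
            if needle <= tokens(body)]

corpus = {  # hypothetical indexed method bodies
    "readAll": "try (BufferedReader r = new BufferedReader(f)) { return r.lines(); }",
    "sum":     "int s = 0; for (int x : xs) s += x; return s;",
}
print(containment_search("new BufferedReader(f)", corpus))  # -> ['readAll']
```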
Article
Full-text available
README files play an essential role in shaping a developer's first impression of a software repository and in documenting the software project that the repository hosts. Yet, we lack a systematic understanding of the content of a typical README file as well as tools that can process these files automatically. To close this gap, we conduct a qualitative study involving the manual annotation of 4,226 README file sections from 393 randomly sampled GitHub repositories and we design and evaluate a classifier and a set of features that can categorize these sections automatically. We find that information discussing the 'What' and 'How' of a repository is very common, while many README files lack information regarding the purpose and status of a repository. Our multi-label classifier which can predict eight different categories achieves an F1 score of 0.746. This work enables the owners of software repositories to improve the quality of their documentation and it has the potential to make it easier for the software development community to discover relevant information in GitHub README files.
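A hedged sketch of multi-label section classification in the spirit of the paper, using a TF-IDF bag-of-words with a one-vs-rest linear classifier; the toy sections, labels and feature choices are assumptions, not the authors' exact setup.

```python
# Multi-label README-section classification (illustrative setup only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

sections = [  # toy training data
    "## Installation\npip install mypkg",
    "## What is this?\nA tool for parsing logs.",
    "## Usage\nRun mypkg --help for options.",
]
labels = [["how"], ["what"], ["how"]]

binarizer = MultiLabelBinarizer()
y = binarizer.fit_transform(labels)
model = make_pipeline(TfidfVectorizer(),
                      OneVsRestClassifier(LogisticRegression(max_iter=1000)))
model.fit(sections, y)
pred = model.predict(["## Setup\nClone the repo and run make."])
print(binarizer.inverse_transform(pred))
```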
Conference Paper
Full-text available
Android has grown to be the world's most popular mobile platform with apps that are capable of doing everything from checking sports scores to purchasing stocks. In order to assist researchers and developers in better understanding the development process as well as the current state of the apps themselves, we present a large dataset of analyzed open-source Android applications and provide a brief analysis of the data, demonstrating its potential usefulness. This dataset contains 1,179 applications, including 4,416 different versions of these apps and 435,680 total commits. Furthermore, for each app we include the analytical results obtained from several static analysis tools including Androguard, Sonar, and Stowaway. In order to better support the community in conducting research on the security characteristics of the apps, our large analytical dataset comes with detailed information, including various versions of AndroidManifest.xml files and synthesized information such as permissions, intents, and minimum SDK. We collected 13,036 commits of the manifest files and recorded over 69,707 total permissions used. The results and a brief set of analytics are presented on our website: http://androsec.rit.edu.
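As a small illustration of the kind of synthesized information the dataset provides, the following sketch extracts declared permissions from an AndroidManifest.xml with the Python standard library; the dataset itself was produced with Androguard, Sonar and Stowaway, not with this code.

```python
# Pull declared permissions out of an AndroidManifest.xml.
import xml.etree.ElementTree as ET

ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

def declared_permissions(manifest_path: str) -> list[str]:
    """List the android:name of every <uses-permission> element."""
    root = ET.parse(manifest_path).getroot()
    return [elem.attrib.get(f"{ANDROID_NS}name", "")
            for elem in root.iter("uses-permission")]

# e.g. declared_permissions("AndroidManifest.xml")
# -> ['android.permission.INTERNET', ...]
```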
Article
Full-text available
Text mining has become an exciting research field, as it tries to discover valuable information in unstructured texts. Unstructured texts contain vast amounts of information but cannot simply be processed further by computers; precise processing methods, algorithms and techniques are therefore vital for extracting this valuable information, which is what text mining accomplishes. In this paper, we discuss the general idea of text mining and compare its techniques. In addition, we briefly discuss a number of text mining applications in present and future use.
Article
The optimization of energy use in family homes and public buildings is an ongoing topic of discussion. State-of-the-art research has almost always focused on reducing the consumption of heating, air-conditioning or lighting systems. Despite their importance, user-related variables, such as comfort, are normally not included in the optimization process, although these aspects must be considered to minimize energy consumption effectively. There is thus a need for a comprehensive energy optimization approach that considers both climatological factors and user behaviour, and learning about user behaviour is key to effective optimization. In this work, the proposed architecture's capacity to organize Virtual Agent Organizations (VAOs) allows it to adapt to highly variable user behaviour and preferences. This agent methodology manages Wireless Sensor Networks (WSNs), Artificial Neural Networks (ANNs) and Case-Based Reasoning (CBR) to obtain user preferences and predict user behaviour in the home or building. The proposed approach has been tested in two different buildings, a traditional-construction house and a modular home, obtaining savings of 30.16% and 13.43%, respectively. These results validate the proposed mixed approach of temperature adjustment algorithms together with the extraction of user behaviour patterns to establish a preference-based threshold.
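A minimal Case-Based Reasoning sketch in the spirit of the described architecture: retrieve the most similar stored case and reuse its setpoint. The features, case base and distance choice are invented for illustration.

```python
# Nearest-case retrieval and reuse (CBR's retrieve/reuse steps only).
import math

cases = [  # hypothetical past cases: ((outdoor °C, occupants), setpoint °C)
    ((5.0, 2), 22.0),
    ((18.0, 1), 20.5),
    ((30.0, 3), 24.0),
]

def suggest_setpoint(outdoor_temp: float, occupants: int) -> float:
    """Reuse the setpoint of the nearest stored case (Euclidean distance)."""
    query = (outdoor_temp, occupants)
    def distance(case):
        features, _ = case
        return math.dist(features, query)
    _, setpoint = min(cases, key=distance)
    return setpoint

print(suggest_setpoint(7.5, 2))  # -> 22.0
```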
Article
Educational institutions continually strive to improve the services they offer: their aim is to have the best possible teaching staff and to increase the quality of teaching and the academic performance of their students. Knowledge of the factors that affect student learning could help universities and study centres adjust their curricula and teaching methods to the needs of their students. One of the first measures employed by teaching institutions was to create Virtual Learning Environments (VLEs). This type of environment makes it possible to attract a larger number of students because it enables them to study from wherever they are in the world, meaning that the student's location is no longer a constraint. Moreover, VLEs facilitate access to teaching resources and make it easier to monitor the activity of the teaching staff and the interactions between students and teachers. Online environments thus make it possible to assess the factors that cause students' academic performance to increase or decrease. To understand the factors that influence the university learning process, this paper applies a series of machine learning techniques, including tree-based models and different types of Artificial Neural Networks (ANNs), to a public dataset. Having applied these techniques to the dataset, the number of times students access the resources made available on VLE platforms was identified as a key factor affecting student performance. This factor was analysed through a real case study involving 120 students taking a master's degree over a VLE platform; specifically, the participants were master's degree students in areas related to computer engineering at the University of Salamanca.
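A hedged sketch of the tree-based analysis described: predicting pass/fail from VLE activity counts with a shallow decision tree. The data values and features are invented, not the paper's dataset.

```python
# Predict pass/fail from VLE activity with a shallow decision tree.
from sklearn.tree import DecisionTreeClassifier

# features: [resource accesses, forum posts]; label: 1 = passed
X = [[120, 14], [15, 0], [80, 6], [10, 1], [200, 25], [30, 2]]
y = [1, 0, 1, 0, 1, 0]

clf = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(clf.predict([[90, 5]]))  # -> [1]
print(dict(zip(["accesses", "posts"], clf.feature_importances_)))
```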
Conference Paper
Discovering regularities in source code is of great interest to software engineers, both in academia and in industry, as regularities can provide useful information to help in a variety of tasks such as code comprehension, code refactoring, and fault localisation. However, traditional pattern mining algorithms often find too many patterns of little use and hence are not suitable for discovering useful regularities. In this paper we propose FREQTALS, a new algorithm for mining patterns in source code based on the FREQT tree mining algorithm. First, we introduce several constraints that effectively enable us to find more useful patterns; then, we show how to efficiently include them in FREQT. To illustrate the usefulness of the constraints we carried out a case study in collaboration with software engineers, where we identified a number of interesting patterns in a repository of Java code.
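As a very reduced illustration of constrained pattern mining over parse trees, the sketch below counts parent-child node-type pairs across Python sources and keeps those above a minimum support. FREQTALS mines whole subtrees with richer constraints; this only echoes the minimum-support idea.

```python
# Count parent->child node-type pairs in Python ASTs and keep the
# frequent ones (a drastically simplified cousin of tree mining).
import ast
from collections import Counter

def type_pairs(source: str) -> Counter:
    tree = ast.parse(source)
    pairs = Counter()
    for parent in ast.walk(tree):
        for child in ast.iter_child_nodes(parent):
            pairs[(type(parent).__name__, type(child).__name__)] += 1
    return pairs

def frequent_patterns(sources: list[str], min_support: int) -> dict:
    total = Counter()
    for src in sources:
        total.update(type_pairs(src))
    return {p: n for p, n in total.items() if n >= min_support}

repo = ["for x in xs:\n    print(x)", "for y in ys:\n    y += 1"]
print(frequent_patterns(repo, min_support=2))  # e.g. {('Module', 'For'): 2, ...}
```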
Article
With the rapid development of Cloud-based services, a Cloud service discovery engine has become a fundamental requirement, and a semantic focused crawler is one of its key components. However, the huge number and varied functionalities of Cloud services on the Web make it difficult for crawlers to retrieve relevant Cloud services effectively; amid this explosion of information, it is a challenge for semantic crawlers to target only URLs that offer Cloud services. To address these issues, this paper proposes a self-adaptive semantic focused crawler based on Latent Dirichlet Allocation (LDA) for efficient Cloud service discovery. We present a Cloud Service Ontology (CSOnt) that defines Cloud service categories; CSOnt contains a set of concepts that allow the crawler to automatically collect and categorize Cloud services. Moreover, the proposed crawler adopts URL prioritisation techniques to maintain the order in which URLs are parsed, for efficient retrieval of the relevant Cloud services. Additionally, the crawler has an ontology-learning function that automatically improves the proposed Cloud Service Ontology and maintains the crawler's performance.
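A sketch of the URL-prioritisation idea in isolation: a crawl frontier ordered by a topical relevance score. The paper scores pages with LDA against its Cloud Service Ontology; here a simple keyword-overlap score stands in, and the topic terms are invented.

```python
# Crawl frontier ordered by topical relevance (keyword overlap as a
# stand-in for the paper's LDA-based scoring).
import heapq

TOPIC_TERMS = {"cloud", "saas", "iaas", "paas", "service"}  # stand-in for CSOnt

def relevance(anchor_text: str) -> float:
    words = set(anchor_text.lower().split())
    return len(words & TOPIC_TERMS) / len(TOPIC_TERMS)

frontier: list[tuple[float, str]] = []  # max-heap via negated scores

def enqueue(url: str, anchor_text: str) -> None:
    heapq.heappush(frontier, (-relevance(anchor_text), url))

def next_url() -> str:
    return heapq.heappop(frontier)[1]

enqueue("https://example.com/pricing", "cloud SaaS service pricing")
enqueue("https://example.com/blog", "company picnic photos")
print(next_url())  # -> the cloud-related URL first
```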
Conference Paper
The number of open source software projects has been growing exponentially. The major online software repository host, GitHub, has accumulated tens of millions of publicly available Git version-controlled repositories. Although the research potential enabled by the available open source code is clearly substantial, no significant large-scale open source code datasets exist. In this paper, we present the Public Git Archive, a dataset of 182,014 top-bookmarked Git repositories from GitHub, and describe the novel data retrieval pipeline needed to reproduce it. We also elaborate on the strategy for performing dataset updates and on legal issues. The Public Git Archive occupies 3.0 TB on disk and is an order of magnitude larger than current source code datasets. The dataset is made available through HTTP and provides the source code of the projects, the related metadata, and the development history. The data retrieval pipeline employs an optimized worker queue model and an optimized archive format to efficiently store forked Git repositories, reducing the amount of data to download and persist. The Public Git Archive aims to open a myriad of new opportunities for "Big Code" research.
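As a toy version of the fan-out pattern behind a worker-queue retrieval pipeline (the paper's pipeline is a purpose-built, optimized system; this only shows the shape), assuming git is available on the PATH:

```python
# Fan a list of repository URLs out to a pool of clone workers.
from concurrent.futures import ThreadPoolExecutor, as_completed
import subprocess

def clone_repo(url: str, dest: str) -> str:
    """Shallow-clone one repository; returns the URL on success."""
    subprocess.run(["git", "clone", "--depth", "1", url, dest], check=True)
    return url

def fetch_all(urls: list[str], workers: int = 8) -> None:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(clone_repo, u, u.rsplit("/", 1)[-1]): u
                   for u in urls}
        for future in as_completed(futures):
            print("done:", future.result())

# fetch_all(["https://github.com/org/repo-a.git", ...])
```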
Article
Performance of any search engine relies heavily on its Web crawler. Web crawlers are the programs that fetch webpages from the Web by following hyperlinks; these webpages are indexed by a search engine and can be retrieved by a user query. The area of Web crawling still lacks an exhaustive study that covers all crawling techniques. This study follows the guidelines of a systematic literature review and applies them to the field of Web crawling. We used the standard procedure for carrying out a systematic literature review on 248 studies from a total of 1488 articles published in 12 leading journals and other premier conferences and workshops. The existing literature on Web crawlers is classified into key subareas, and each subarea is further divided according to the techniques being used. We analyzed the distribution of the articles using multiple criteria and drew conclusions. Studies that use open source Web crawlers are also reported. We highlight future areas of research, call for increased awareness in various fields of Web crawling, and identify how techniques from other domains can be used for crawling the Web. Limitations and recommendations for future work are also discussed. WIREs Data Mining Knowl Discov 2017, 7:e1218. doi: 10.1002/widm.1218
Article
We present the first method for automatically mining code idioms from a corpus of previously written, idiomatic software projects. We take the view that a code idiom is a syntactic fragment that recurs across projects and has a single semantic role. Idioms may have metavariables, such as the body of a for loop. Modern IDEs commonly provide facilities for manually defining idioms and inserting them on demand, but this does not help programmers to write idiomatic code in languages or using libraries with which they are unfamiliar. We present HAGGIS, a system for mining code idioms that builds on recent advanced techniques from statistical natural language processing, namely, nonparametric Bayesian probabilistic tree substitution grammars. We apply HAGGIS to several of the most popular open source projects from GitHub. We present a wide range of evidence that the resulting idioms are semantically meaningful, demonstrating that they do indeed recur across software projects and that they occur more frequently in illustrative code examples collected from a Q&A site. Manual examination of the most common idioms indicates that they describe important program concepts, including object creation, exception handling, and resource management.
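A loose analogy to idiom mining, under strong simplifying assumptions: HAGGIS fits a probabilistic tree substitution grammar, whereas this sketch only normalises identifiers to a metavariable and counts statement shapes that literally recur across sources.

```python
# Normalise identifiers to a metavariable, then count statement shapes
# that recur across sources (a crude proxy for "idioms").
import ast
from collections import Counter

class Normalise(ast.NodeTransformer):
    def visit_Name(self, node):
        # Replace every identifier with a single metavariable.
        return ast.copy_location(ast.Name(id="_VAR_", ctx=node.ctx), node)

def statement_shapes(source: str) -> list[str]:
    tree = Normalise().visit(ast.parse(source))
    return [ast.dump(stmt) for stmt in ast.walk(tree)
            if isinstance(stmt, ast.stmt)]

counts = Counter()
for project_src in ["with open(f) as fh:\n    data = fh.read()",
                    "with open(g) as h:\n    text = h.read()"]:
    counts.update(set(statement_shapes(project_src)))
print([s for s, n in counts.items() if n >= 2])  # the recurring 'with open' shape
```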
TabNine: Autocompletion with deep learning (2019). https://www.kite.com/. Accessed 2020
Balog, M., Gaunt, A.L., Brockschmidt, M., Nowozin, S., Tarlow, D.: DeepCoder: learning to write programs. arXiv preprint arXiv:1611.01989 (2016)