Conference Paper

Service Classification through Machine Learning: Aiding in the Efficient Identification of Reusable Assets in Cloud Application Development


Abstract

Developing software based on services is one of the most prominent emerging programming paradigms in software development. Service-based software development relies on the composition of services (i.e., pieces of code already built and deployed in the cloud) through orchestrated API calls. Black-box reuse can play a prominent role when using this programming paradigm, in the sense that identifying and reusing already existing/deployed services can save substantial development effort. According to the literature, identifying reusable assets (i.e., components, classes, or services) is more successful and efficient when the discovery process is domain-specific. To facilitate domain-specific service discovery, we propose a service classification approach that can assign services to an application domain, given only the service description. To validate the accuracy of our classification approach, we have trained a machine-learning model on thousands of open-source services and tested it on 67 services developed within two companies employing service-based software development. The study results suggest that the classification algorithm can perform adequately on a test set that does not overlap with the training set, and is thus (with some confidence) transferable to other industrial cases. Additionally, we expand the body of knowledge on software categorization by highlighting sets of domains that constitute 'grey-zones' in service classification.
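The paper's actual model is not reproduced here; as a rough illustration of the core idea (assigning a service to a domain from its description text alone), the following sketch uses a plain bag-of-words nearest-centroid classifier. The TRAIN data, domain names, and whitespace tokenization are all invented for illustration, not the paper's dataset or algorithm.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term frequencies for a service description."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Hypothetical training data: description -> application domain.
TRAIN = [
    ("send sms and voice messages to phone numbers", "Telephony"),
    ("route sms notifications over carrier networks", "Telephony"),
    ("charge credit cards and process refunds", "Payments"),
    ("process card payments and invoices for merchants", "Payments"),
]

def centroids(train):
    """One summed bag-of-words vector per domain."""
    by_domain = {}
    for text, domain in train:
        by_domain.setdefault(domain, Counter()).update(vectorize(text))
    return by_domain

def classify(description, cents):
    """Assign the domain whose centroid is most similar to the description."""
    vec = vectorize(description)
    return max(cents, key=lambda d: cosine(vec, cents[d]))
```

A real pipeline would replace the raw counts with TF-IDF weighting and a trained classifier, but the shape of the task is the same: description in, domain label out.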


... In the recent past, service development has demonstrated remarkable success in overcoming the complexities associated with multiplatform software products by using automation in service classification (Zou et al., 2022; Alizadehsani et al., 2022), discovery (Ma et al., 2021a), composing and developing web services (Alwasouf and Kumar, 2019; Wang et al., 2022; Sun et al., 2019; Sangaiah et al., 2020), and web API recommendation (Wu et al., 2022). Due to the textual features of web services, most recent works on SOA tasks take advantage of text mining, DL-based NLP techniques, and ML. For example, Zou et al. (2022) propose a service classification model called DeepLTSC, which aims to address two service classification problems, i.e., multi-dimensional web service data and the unequal distribution of service data. ...
Article
Service-Oriented Architectures (SOA) have become a standard for developing software applications, including but not limited to cloud-based ones and enterprise systems. When using SOA, software engineers organize the desired functionality into self-contained and independent services that are invoked through end-points (with API calls). The use of this emerging technology has drastically changed the way that software reuse is performed, in the sense that a "service" is a "code chunk" that is reusable (preferably in a black-box manner), but in many (especially "in-house") cases, white-box reuse is also meaningful. To confront the reuse challenges opened up by the rise of SOA, in the SmartCLIDE project we have developed a framework (a methodology and a platform) to aid software engineers in systematic and more efficient (in terms of time, quality, defects, and process) reuse of services, when developing SOA-based cloud applications. In this work, we: (a) present the SmartCLIDE methodology and the Eclipse Open SmartCLIDE platform; and (b) evaluate the usefulness of the framework, in terms of relevance, usability, and obtained benefits. The results of the study have confirmed the relevance and rigor of the framework, unveiled some limitations, pointed to interesting future work directions, and provided some actionable implications for researchers and practitioners.
Article
Over the last decades, web services have been used to perform specific tasks demanded by users. The most important task of a service classification system is to match an unknown input service against the stored, pre-classified web services. The most challenging issue is that web services are currently organized and classified according to syntax, while the context of the requested service is ignored. Motivated by this, a Cloud-based Classification Methodology is proposed, presenting a new methodology based on semantic web service classification. Furthermore, cloud computing is used not only for storing but also for allocating web services at high scale, with both high availability and accessibility. Fog technology is employed to reduce latency and speed up response time. The experimental results using the suggested methodology show a better performance of the proposed system regarding both precision and accuracy in comparison with most of the methods discussed in the literature of the current study.
Article
Traditional code search engines (e.g., Krugle) often do not perform well with natural language queries. They mostly apply keyword matching between query and source code. Hence, they need carefully designed queries containing references to relevant APIs for the code search. Unfortunately, preparing an effective search query is not only challenging but also time-consuming for the developers, according to existing studies. In this article, we propose a novel query reformulation technique, RACK, that suggests a list of relevant API classes for a natural language query intended for code search. Our technique offers such suggestions by exploiting keyword-API associations from the questions and answers of Stack Overflow (i.e., crowdsourced knowledge). We first motivate our idea using an exploratory study with 19 standard Java API packages and 344K Java related posts from Stack Overflow. Experiments using 175 code search queries randomly chosen from three Java tutorial sites show that our technique recommends correct API classes within the Top-10 results for 83% of the queries, with 46% mean average precision and 54% recall, which are 66%, 79% and 87% higher respectively than that of the state-of-the-art. Reformulations using our suggested API classes improve 64% of the natural language queries and their overall accuracy improves by 19%. Comparisons with three state-of-the-art techniques demonstrate that RACK outperforms them in the query reformulation by a statistically significant margin. Investigation using three web/code search engines shows that our technique can significantly improve their results in the context of code search.
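The keyword-API association idea behind RACK can be approximated with simple co-occurrence counting. The sketch below uses invented question/answer pairs (POSTS) and hypothetical Java API class names; it is not RACK's actual mining or scoring procedure, only an illustration of ranking API classes by how often they co-occur with query keywords.

```python
from collections import defaultdict

# Hypothetical crowdsourced pairs: question keywords -> API classes in the answer.
POSTS = [
    ({"read", "file", "lines"}, {"BufferedReader", "FileReader"}),
    ({"read", "text", "file"}, {"Files", "BufferedReader"}),
    ({"parse", "json", "string"}, {"JsonParser"}),
]

def build_associations(posts):
    """Count keyword-API co-occurrences across Q&A posts."""
    assoc = defaultdict(lambda: defaultdict(int))
    for keywords, apis in posts:
        for kw in keywords:
            for api in apis:
                assoc[kw][api] += 1
    return assoc

def suggest(query_keywords, assoc, top_n=3):
    """Rank API classes by total association with the query keywords."""
    scores = defaultdict(int)
    for kw in query_keywords:
        for api, count in assoc.get(kw, {}).items():
            scores[api] += count
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [api for api, _ in ranked[:top_n]]
```

The suggested classes can then be appended to the original natural language query before it is sent to a conventional code search engine.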
Article
With the emergence of the Programmable Web paradigm, the World Wide Web is evolving into a Web of Services, where data and services can be effectively reused across applications. Given the wide diversity and scale of published Web services, the problem of service discovery is a big challenge for service-based application development. This is further compounded by the limited availability of intelligent categorization and service management frameworks. In this paper, an approach that extends service similarity analysis by using morphological analysis and machine learning techniques for capturing the functional semantics of real-world Web services for facilitating effective categorization is presented. To capture the functional diversity of the services, different feature vector selection techniques are used to represent a service in vector space, with the aim of finding the optimal set of features. Using these feature vector models, services are classified as per their domain, using ensemble machine learning methods. Experiments were performed to validate the classification accuracy with respect to the various service feature vector models designed, and the results emphasize the effectiveness of the proposed approach.
Article
Servitization is one of the most significant trends reshaping the information world and society in recent years. The requirement of collecting, storing, processing, and sharing Big Data has led to massive software resources being developed and made accessible as web-based services to facilitate such processes. The services that handle Big Data come from various domains and heterogeneous networks, and converge into a huge complicated service network (or ecosystem), called the Big Service. The key issue facing the big data and big service ecosystem is how to optimally configure and operate the related service resources to serve the specific requirements of possible applications, i.e., how to reuse the existing service resources effectively and efficiently to develop new applications or software services, to meet the massive individualized requirements of end-users. Based on analyzing the big service ecosystem, we present in this paper a new paradigm for software service engineering, RE2SEP (Requirement-Engineering Two-Phase of Service Engineering Paradigm), which includes three components: service-oriented requirement engineering, domain-oriented service engineering, and a software service development approach. RE2SEP enables the rapid design and implementation of service solutions to match the requirement propositions of massive individualized customers in the Big Service ecosystem. A case study on people's mobility service in a smart city environment is given to demonstrate the application of RE2SEP. RE2SEP can potentially revolutionize traditional life-cycle oriented software engineering, leading to a new approach to software service engineering.
Article
Automatic and semi-automatic approaches for classification of web services have garnered much interest due to their positive impact on tasks like service discovery, matchmaking and composition. Currently, service registries support only human classification, which results in limited recall and low precision in response to queries, due to keyword-based matching. The syntactic features of a service, along with certain semantics-based measures used during classification, can result in accurate and meaningful results. We propose an approach for web service classification based on the conversion of services into a class-dependent vector by applying the concept of semantic relatedness, and on generating classes of services ranked by their semantic relatedness to a given query. We used the OWLS-TC service dataset for evaluating our approach, and the experimental results are presented in this work.
Article
Both practitioners and academics often frown on pragmatic and opportunistic reuse. Large organizations report that structured reuse methods and software product lines are often the way to go when it comes to efficient software reuse. However, opportunistic (that is, nonstructured) reuse has proven profitable for small to medium-sized organizations. Here, we describe two start-ups that have opportunistically and pragmatically developed their products, reusing functionality from others that they could never have built independently. We demonstrate that opportunistic and pragmatic reuse is necessary to rapidly develop innovative software products. We define pragmatic reuse as extending software with functionality from a third-party software supplier that was found without a formal search-and-procurement process and might not have been built with reuse in mind. We define opportunistic reuse as extending software with functionality from a third-party software supplier that wasn't originally intended to be integrated and reused.
Conference Paper
The increasing number of available online services demands distributed architectures to promote scalability, as well as semantics to enable their precise and efficient retrieval. Two common approaches toward this goal are Semantic Overlay Networks (SONs) and Distributed Hash Tables (DHTs) with semantic extensions. This paper presents ERGOT, a system that combines DHTs and SONs to enable semantic-based service discovery in distributed infrastructures such as Grids and Clouds. ERGOT takes advantage of semantic annotations that enrich service specifications in two ways: (i) services are advertised in the DHT on the basis of their annotations, thus allowing a SON to be established among service providers; (ii) annotations enable semantic-based service matchmaking, using a novel similarity measure between service requests and descriptions. Experimental evaluations confirmed the efficiency of ERGOT in terms of search accuracy and network traffic.
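The annotation-based advertisement in (i) resembles consistent hashing on a DHT ring. The toy sketch below (node ids, hash width, and table layout are assumptions, not ERGOT's design) shows the mechanism that makes an overlay possible: services sharing a semantic annotation are advertised at the same responsible node, so their providers can be linked together.

```python
import hashlib
from bisect import bisect_right

def ring_hash(key):
    """Deterministic 32-bit position on the ring for a string key."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % 2**32

class AnnotationDHT:
    """Toy DHT ring: each semantic annotation is advertised to the node
    whose id follows the annotation's hash position on the ring."""

    def __init__(self, node_ids):
        self.ring = sorted((ring_hash(n), n) for n in node_ids)

    def responsible(self, annotation):
        """Node responsible for storing advertisements of this annotation."""
        keys = [k for k, _ in self.ring]
        i = bisect_right(keys, ring_hash(annotation)) % len(self.ring)
        return self.ring[i][1]

    def advertise(self, service, annotations, table):
        """Register a service under each of its annotations."""
        for a in annotations:
            table.setdefault((self.responsible(a), a), set()).add(service)
```

Because all services annotated with, say, "weather" land on the same node, that node can answer discovery queries for the annotation and introduce the providers to one another.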
Conference Paper
A promising way of software reuse is Component-Based Software Development (CBSD). There is an increasing number of OSS products available that can be freely used in product development. However, OSS communities themselves have not yet taken full advantage of the "reuse mechanism". Many OSS projects duplicate effort and code, even when sharing the same application domain and topic. One successful counter-example is the FFMpeg multimedia project, since several of its components are widely and consistently reused in other OSS projects. This paper documents the history of the libavcodec library of components from the FFMpeg project, which at present is reused in more than 140 OSS projects. Most of the recipients use it as a black-box component, although a number of OSS projects keep a copy of it in their repositories and modify it as such. In both cases, we argue that libavcodec is a successful example of a reusable OSS library of components.
Article
A Web service is Web-accessible software that can be published, located and invoked by using standard Web protocols. Automatically determining the category of a Web service, from several pre-defined categories, is an important problem with many applications such as service discovery, semantic annotation and service matching. This paper describes AWSC (Automatic Web Service Classification), an automatic classifier of Web service descriptions. AWSC exploits the connections between the category of a Web service and the information commonly found in standard descriptions. In addition, AWSC bridges different styles for describing services by combining text mining and machine learning techniques. Experimental evaluations show that this combination helps our classification system improve its precision. In addition, we report an experimental comparison of AWSC with a related work.
Conference Paper
Software reuse can become a key factor for improving and guaranteeing software quality, when adopted systematically throughout the software process. The main characteristic of reuse-oriented processes is that they require a common repository for storing, searching and retrieving software modules. Moreover, reuse occurs systematically and is an integrated part of the process. Previous works of the same authors have empirically shown that the full reuse maintenance model (FRM) slows down quality degradation following maintenance interventions on a software system. This work is a further step in the investigation towards demonstrating how reuse-oriented development (ROD) impacts software quality; how it favors the FRM model; and finally, whether reuse-oriented development influences productivity, and is thus more efficient. This has been done through a case study carried out on two ongoing industrial projects. Results are positive and support our research hypotheses.
Conference Paper
Domain engineering is successful in promoting reuse. An approach to domain-specific reuse in service-oriented environments is proposed to facilitate service requesters to reuse Web services. In the approach, we present a conceptual model of domain-specific services (called domain services). Domain services in a certain business domain are modeled by semantic and feature modeling techniques, and bound to Web services with diverse capabilities through a variability-supported matching mechanism. By reusing pre-modeled domain services, service requesters can describe their requests easily through a service customization mechanism. Web service selection based on customized results can also be optimized by reusing the pre-matching results between domain services and Web services. Feasibility of the whole approach is demonstrated on an example.
Chapter
Reusing software is a promising way to reduce software development costs. Nowadays, applications compose available web services to build new software products. In this context, service composition faces the challenge of proper service selection. This paper presents a model for classifying web services. The service dataset has been collected from the well-known public service registry ProgrammableWeb. The results were obtained by breaking service classification into a two-step process. First, web service data pre-processed with Natural Language Processing (NLP) were clustered with the agglomerative hierarchical clustering algorithm. Second, several supervised learning algorithms were applied to determine service categories. The findings show that the hybrid approach combining hierarchical clustering and SVM provides acceptable results in comparison with other unsupervised/supervised combinations.
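The clustering half of that two-step process can be illustrated with a naive single-linkage agglomerative clustering over token sets. The Jaccard similarity, the merge threshold, and the sample documents below are all assumptions for illustration; the chapter's actual pipeline uses NLP pre-processing and a standard clustering implementation followed by a supervised classifier such as SVM.

```python
from itertools import combinations

def jaccard(a, b):
    """Similarity of two token sets: shared tokens over all tokens."""
    return len(a & b) / len(a | b)

def agglomerate(docs, threshold=0.2):
    """Naive single-linkage agglomerative clustering: repeatedly merge
    the two most similar clusters until no pair exceeds the threshold."""
    clusters = [[i] for i in range(len(docs))]

    def sim(c1, c2):
        # Single linkage: similarity of the closest pair across clusters.
        return max(jaccard(docs[i], docs[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), sim(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda kv: kv[1])
        if best < threshold:
            break
        clusters[i] += clusters.pop(j)  # i < j, so i stays valid
    return clusters
```

In the hybrid scheme, the resulting clusters (rather than raw documents) feed the supervised step that assigns final service categories.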
Article
In software technology, over a diversified environment, services can be rendered using an innovative mechanism of a novel paradigm called web services. In a business environment, rapid changes and requirements from various customers can be accommodated using this service. For service management and discovery, the classification of Web services having the same functions is an efficient technique. However, Web service functional description documents are short, carry little information, and have sparse features. This makes modelling such short texts difficult for various topic models and affects the classification of Web services. A Mixed Wide and PSO-Bi-LSTM-CNN model (MW-PSO-Bi-LSTM-CNN) is proposed in this work to address this issue. In this technique, the breadth prediction of the Web service category is performed by combining the discrete features of the Web service description document, exploiting the wide learning model. In the next stage, the PSO-Bi-LSTM-CNN model is used to mine the context information and word order of the words in the Web service description document, to perform the depth prediction of the Web service category. Here, particle swarm optimization (PSO) is integrated with the Bi-LSTM-CNN network to compute the various hyper-parameters automatically. In the third stage, the breadth and depth prediction results are integrated using a linear regression model into the final service classification result. Finally, the MW-PSO-Bi-LSTM-CNN, Wide&Bi-LSTM, and Wide&Deep web service classification techniques are compared, and the experimental results show that the proposed technique achieves better web service classification accuracy.
Conference Paper
Software reuse is a well-established software engineering process that aims at improving development productivity. Although reuse can be performed in a very systematic way (e.g., through product lines), in practice reuse is performed in many cases opportunistically, i.e., by copying small code chunks either from the web or from in-house developed projects. Knowledge sharing communities, and especially StackOverflow, constitute the primary source of code-related information for amateur and professional software developers. Despite the obvious benefit of increased productivity, reuse can have a mixed effect on the quality of the resulting code, depending on the properties of the reused solutions. An efficient concept for capturing a wide range of internal software qualities is the metaphor of Technical Debt, which expresses the impact of shortcuts in software development on its maintenance costs. In this paper, we present the results from an empirical study on the effect of code retrieved from StackOverflow on the technical debt of the target system. In particular, we study several open-source projects and identify non-trivial pieces of code that exhibit a perfect or near-perfect match with code provided in the context of answers on StackOverflow. Then, we compare the technical debt density of the reused fragments, obtained as the ratio of inefficiencies identified by SonarQube over the lines of reused code, to the technical debt density of the target codebase. The results provide insight into the potential impact of code reuse on technical debt and highlight the benefits of assessing code quality before committing changes to a repository.
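The debt-density comparison described above reduces to a simple ratio. A minimal sketch (the function names are illustrative, not SonarQube's API; issue counts would come from a static analyzer in practice):

```python
def td_density(issues, loc):
    """Technical debt density: flagged inefficiencies per line of code."""
    return issues / loc if loc else 0.0

def reuse_worse_than_target(reused_issues, reused_loc,
                            target_issues, target_loc):
    """True if the reused fragments are denser in debt than the host codebase."""
    return td_density(reused_issues, reused_loc) > td_density(target_issues,
                                                              target_loc)
```

For example, a 100-line reused fragment with 5 flagged issues (density 0.05) would compare unfavourably against a 1000-line target codebase with 20 issues (density 0.02).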
Conference Paper
Software reuse is a popular practice, which is constantly gaining ground among practitioners. The main reason for this is the potential that it provides for reducing development effort and increasing end-product quality. At the same time, Open-Source Software (OSS) repositories are nowadays flourishing and can facilitate the reuse process, through the provision of a variety of software artifacts. However, up-to-date OSS reuse processes have mostly been opportunistic, leading to not fully capitalizing on existing reuse potential. In this study we propose a process (namely REACT) for improving planned OSS reuse practices, i.e., we define the activities that a software engineer can perform to reuse OSS artifacts. To illustrate the applicability of REACT, we provide an example in which a mobile application is developed based upon the reuse of OSS artifacts. To validate the proposed process, we compared the effort required to develop the application with and without adopting the REACT process. Our preliminary results suggest that REACT may reduce by up to 50% the effort required to build an application from scratch.
Conference Paper
The reuse of reliable, domain-specific software components is a strategy commonly used in the avionics industry to develop safety critical airborne systems. One method of achieving reuse is to use domain specific languages that map closely onto abstractions in the problem domain. While this works well for control algorithms, it is less successful for some complex ancillary functions such as failure management. The characteristics of device failures are often difficult to predict resulting in late requirements changes. Hence a small semantic gap is especially desirable but difficult to achieve. Object-oriented design techniques include mechanisms, such as inheritance, that cater well for variations in behaviour. However, object-oriented notations such as the UML lack the precision, and rigor, needed for safety critical software. UML-B is a profile of the UML for formal modelling. In this paper we show how UML-B can be used to model failure management systems via progressive refinement, and indicate how this approach could utilise UML concepts to cope with high variability, while providing rigorous verification.
Article
My last column ended with some comments about Kuhn and word2vec. Word2vec has racked up plenty of citations because it satisfies both of Kuhn's conditions for emerging trends: (1) a few initial (promising, if not convincing) successes that motivate early adopters (students) to do more, as well as (2) leaving plenty of room for early adopters to contribute and benefit by doing so. The fact that Google has so much to say on 'How does word2vec work' makes it clear that the definitive answer to that question has yet to be written. It also helps citation counts to distribute code and data to make it that much easier for the next generation to take advantage of the opportunities (and cite your work in the process).
Book
With the rapid development of computing hardware, high-speed networks, web programming, distributed and parallel computing, and other technologies, cloud computing has recently emerged as a commercial reality. Software Reuse in the Emerging Cloud Computing Era targets a spectrum of readers, including researchers, practitioners, educators, students, and even some end users in software engineering, computing, networks and distributed systems, and information systems. The handbook will help to clarify the fast-advancing literature on the current state of the art and knowledge in the areas of the development and reuse of reusable assets in emerging software systems and applications, as part of the information science and technology literature. It will no doubt expand the above literature, and promote the exchange and evolution of the above advances in software reuse and cloud computing among multiple disciplines and a wide spectrum of research, industry, and user communities.
Book
Contents: Introduction; Design of the Case Study; Data Collection; Data Analysis; Reporting and Dissemination; Lessons Learned
Chapter
Abstract This chapter provides an overview of social customer relationship management (CRM) and explores the Web-based platforms that provide social CRM solution in software as a service (SaaS) model as well as the applications and tools that complement traditional CRM systems. Based on a review of current practices, the chapter also outlines the potential benefits social CRM provides to organizations in their sales, service, and marketing efforts. Furthermore, while the Web and its new breed of technologies and applications open new opportunities for businesses, these technologies also pose several new challenges for organizations in implementation, integration, data security, and consumer privacy, among others. In addition, these technologies can be exploited in a negative way to propagate misinformation against businesses and their reputations. In view of this, this chapter also examines ethical and legal challenges businesses could face in embracing social media technologies at the core of their customer management processes and systems.
Article
Software reuse is the process of creating software systems from existing software rather than building software systems from scratch. This simple yet powerful vision was introduced in 1968. Software reuse has, however, failed to become a standard software engineering practice. In an attempt to understand why, researchers have renewed their interest in software reuse and in the obstacles to implementing it. This paper surveys the different approaches to software reuse found in the research literature. It uses a taxonomy to describe and compare the different approaches and make generalizations about the field of software reuse. The taxonomy characterizes each reuse approach in terms of its reusable artifacts and the way these artifacts are abstracted, selected, specialized, and integrated. Abstraction plays a central role in software reuse. Concise and expressive abstractions are essential if software artifacts are to be effectively reused. The effectiveness of a reuse technique can be evaluated in terms of cognitive distance, an intuitive gauge of the intellectual effort required to use the technique. Cognitive distance is reduced in two ways: (1) Higher level abstractions in a reuse technique reduce the effort required to go from the initial concept of a software system to representations in the reuse technique, and (2) automation reduces the effort required to go from abstractions in a reuse technique to an executable implementation. This survey will help answer the following questions: What is software reuse? Why reuse software? What are the different approaches to reusing software? How effective are the different approaches? What is required to implement a software reuse technology? Why is software reuse difficult? What are the open areas for research in software reuse?
Article
The reuse of reliable, domain-specific software components is a strategy commonly used in the avionics industry to develop safety critical airborne systems. One method of achieving reuse is to use domain specific languages that map closely onto abstractions in the problem domain. While this works well for control algorithms, it is less successful for some complex ancillary functions such as failure management. The characteristics of device failures are often difficult to predict resulting in late requirements changes. Hence a small semantic gap is especially desirable but difficult to achieve. Object-oriented design techniques include mechanisms, such as inheritance, that cater well for variations in behaviour. However, object-oriented notations such as the UML lack the precision, and rigor, needed for safety critical software. UML-B is a profile of the UML for formal modelling. In this paper we show how UML-B can be used to model failure management systems via progressive refinement, and indicate how this approach could utilise UML concepts to cope with high variability, while providing rigorous verification.
Article
The authors explore the concept of software as a service, which envisages a demand-led software market in which businesses assemble and provide services when needed to address a particular requirement. The SaaS vision focuses on separating the possession and ownership of software from its use. Delivering software's functionality as a set of distributed services that can be configured and bound at delivery time can overcome many current limitations constraining software use, deployment, and evolution.
A. Lampropoulos, A. Ampatzoglou, S. Bibi, A. Chatzigeorgiou, and I. Stamelos, "REACT - a process for improving open-source software reuse," in 2018 11th International Conference on the Quality of Information and Communications Technology (QUATIC), 2018, pp. 251-254.
R. Nisa and U. Qamar, "A text mining based approach for web service classification," Information Systems and e-Business Management, vol. 13, no. 4, pp. 751-768, 2015.
J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," in 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Minneapolis, MN, USA: Association for Computational Linguistics, 2019, pp. 4171-4186.