Conference Paper

Service Classification through Machine Learning: Aiding in the Efficient Identification of Reusable Assets in Cloud Application Development


Abstract

Service-based software development is one of the fastest-emerging programming paradigms. It relies on the composition of services (i.e., pieces of code already built and deployed in the cloud) through orchestrated API calls. Black-box reuse can play a prominent role in this paradigm, since identifying and reusing already deployed services can save substantial development effort. According to the literature, identifying reusable assets (i.e., components, classes, or services) is more successful and efficient when the discovery process is domain-specific. To facilitate domain-specific service discovery, we propose a service classification approach that assigns a service to an application domain, given only the service description. To validate the accuracy of our classification approach, we have trained a machine-learning model on thousands of open-source services and tested it on 67 services developed within two companies employing service-based software development. The study results suggest that the classification algorithm performs adequately on a test set that does not overlap with the training set and is therefore (with some confidence) transferable to other industrial cases. Additionally, we expand the body of knowledge on software categorization by highlighting sets of domains that constitute 'grey zones' in service classification.
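
As a rough illustration of what assigning a service to an application domain from its description alone can look like, the sketch below trains a small bag-of-words Naive Bayes classifier. It is not the model used in the study, which the abstract does not specify. The class name `NaiveBayesDomainClassifier`, the two domains, and the four training descriptions are hypothetical and exist only to make the example runnable.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a description and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesDomainClassifier:
    """Multinomial Naive Bayes with add-one smoothing over word counts.

    Hypothetical helper for illustration; not the classifier from the paper.
    """

    def fit(self, descriptions, domains):
        # Number of training descriptions per domain (used for the prior).
        self.doc_counts = Counter(domains)
        # Word frequencies per domain (used for the likelihood).
        self.word_counts = defaultdict(Counter)
        for text, domain in zip(descriptions, domains):
            self.word_counts[domain].update(tokenize(text))
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, description):
        # Words never seen in training carry no evidence, so drop them.
        tokens = [t for t in tokenize(description) if t in self.vocabulary]
        total_docs = sum(self.doc_counts.values())
        best_domain, best_score = None, -math.inf
        for domain, n_docs in self.doc_counts.items():
            counts = self.word_counts[domain]
            denom = sum(counts.values()) + len(self.vocabulary)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_docs / total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_domain, best_score = domain, score
        return best_domain


# Made-up training data: (service description, application domain).
TRAIN = [
    ("Process credit card payments and issue refunds", "finance"),
    ("Convert currency amounts using daily exchange rates", "finance"),
    ("Geocode a street address into latitude and longitude", "mapping"),
    ("Compute driving routes and distances between locations", "mapping"),
]

model = NaiveBayesDomainClassifier().fit(
    [text for text, _ in TRAIN], [label for _, label in TRAIN]
)
print(model.predict("Issue refunds for card payments"))       # finance
print(model.predict("Find the route between two locations"))  # mapping
```

With realistic data, descriptions whose vocabulary is shared by several domains would receive similar scores for each of them, which is one plausible way to picture the 'grey zones' the abstract mentions.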

Article
Over the last decades, web services have been used to perform specific tasks demanded by users. The most important task of a service classification system is to match an anonymous input service with the stored pre-classified web services. The most challenging issue is that web services are currently organized and classified according to syntax, while the context of the requested service is ignored. To address this, a Cloud-based Classification Methodology is proposed, a new methodology based on semantic web service classification. Furthermore, cloud computing is used not only for storing but also for allocating web services at large scale with both high availability and accessibility. Fog technology is employed to reduce latency and speed up response time. The experimental results show that the proposed system achieves better precision and accuracy than most of the methods discussed in the literature reviewed in the study.
Article
Traditional code search engines (e.g., Krugle) often do not perform well with natural language queries. They mostly apply keyword matching between query and source code. Hence, they need carefully designed queries containing references to relevant APIs for the code search. Unfortunately, preparing an effective search query is not only challenging but also time-consuming for the developers according to existing studies. In this article, we propose a novel query reformulation technique–RACK–that suggests a list of relevant API classes for a natural language query intended for code search. Our technique offers such suggestions by exploiting keyword-API associations from the questions and answers of Stack Overflow (i.e., crowdsourced knowledge). We first motivate our idea using an exploratory study with 19 standard Java API packages and 344K Java related posts from Stack Overflow. Experiments using 175 code search queries randomly chosen from three Java tutorial sites show that our technique recommends correct API classes within the Top-10 results for 83% of the queries, with 46% mean average precision and 54% recall, which are 66%, 79% and 87% higher respectively than that of the state-of-the-art. Reformulations using our suggested API classes improve 64% of the natural language queries and their overall accuracy improves by 19%. Comparisons with three state-of-the-art techniques demonstrate that RACK outperforms them in the query reformulation by a statistically significant margin. Investigation using three web/code search engines shows that our technique can significantly improve their results in the context of code search.
Article
With the emergence of the Programmable Web paradigm, the World Wide Web is evolving into a Web of Services, where data and services can be effectively reused across applications. Given the wide diversity and scale of published Web services, the problem of service discovery is a big challenge for service-based application development. This is further compounded by the limited availability of intelligent categorization and service management frameworks. In this paper, an approach that extends service similarity analysis by using morphological analysis and machine learning techniques for capturing the functional semantics of real-world Web services for facilitating effective categorization is presented. To capture the functional diversity of the services, different feature vector selection techniques are used to represent a service in vector space, with the aim of finding the optimal set of features. Using these feature vector models, services are classified as per their domain, using ensemble machine learning methods. Experiments were performed to validate the classification accuracy with respect to the various service feature vector models designed, and the results emphasize the effectiveness of the proposed approach.
Article
Servitization is one of the most significant trends that reshapes the information world and society in recent years. The requirement of collecting, storing, processing, and sharing of the Big Data has led to massive software resources being developed and made accessible as web-based services to facilitate such process. These services that handle the Big Data come from various domains and heterogeneous networks, and converge into a huge complicated service network (or ecosystem), called the Big Service. The key issue facing the big data and big service ecosystem is how to optimally configure and operate the related service resources to serve the specific requirements of possible applications, i.e., how to reuse the existing service resources effectively and efficiently to develop the new applications or software services, to meet the massive individualized requirements of end-users. Based on analyzing the big service ecosystem, we present in this paper a new paradigm for software service engineering, RE2SEP (Requirement-Engineering Two-Phase of Service Engineering Paradigm), which includes three components: service-oriented requirement engineering, domain-oriented service engineering, and software service development approach. RE2SEP enables the rapid design and implementation of service solutions to match the requirement propositions of massive individualized customers in the Big Service ecosystem. A case study on people's mobility service in a smart city environment is given to demonstrate the application of RE2SEP. RE2SEP can potentially revolutionize the traditional life-cycle oriented software engineering, leading to a new approach to software service engineering.
Article
Both practitioners and academics often frown on pragmatic and opportunistic reuse. Large organizations report that structured reuse methods and software product lines are often the way to go when it comes to efficient software reuse. However, opportunistic (that is, nonstructured) reuse has proven profitable for small to medium-sized organizations. Here, we describe two start-ups that have opportunistically and pragmatically developed their products, reusing functionality from others that they could never have built independently. We demonstrate that opportunistic and pragmatic reuse is necessary to rapidly develop innovative software products. We define pragmatic reuse as extending software with functionality from a third-party software supplier that was found without a formal search-and-procurement process and might not have been built with reuse in mind. We define opportunistic reuse as extending software with functionality from a third-party software supplier that wasn't originally intended to be integrated and reused.
Conference Paper
The increasing number of available online services demands distributed architectures to promote scalability as well as semantics to enable their precise and efficient retrieval. Two common approaches toward this goal are Semantic Overlay Networks (SONs) and Distributed Hash Tables (DHTs) with semantic extensions. This paper presents ERGOT, a system that combines DHTs and SONs to enable semantic-based service discovery in distributed infrastructures such as Grids and Clouds. ERGOT takes advantage of semantic annotations that enrich service specifications in two ways: (i) services are advertised in the DHT on the basis of their annotations, thus allowing to establish a SON among service providers, (ii) annotations enable semantic-based service matchmaking, using a novel similarity measure between service requests and descriptions. Experimental evaluations confirmed the efficiency of ERGOT in terms of accuracy of search and network traffic.
Conference Paper
A promising way of software reuse is Component-Based Software Development (CBSD). There is an increasing number of OSS products available that can be freely used in product development. However, OSS communities themselves have not yet taken full advantage of the “reuse mechanism”. Many OSS projects duplicate effort and code, even when sharing the same application domain and topic. One successful counter-example is the FFMpeg multimedia project, since several of its components are widely and consistently reused into other OSS projects. This paper documents the history of the libavcodec library of components from the FFMpeg project, which at present is reused in more than 140 OSS projects. Most of the recipients use it as a black-box component, although a number of OSS projects keep a copy of it in their repositories, and modify it as such. In both cases, we argue that libavcodec is a successful example of reusable OSS library of components.
Article
A Web service is a Web accessible software that can be published, located and invoked by using standard Web protocols. Automatically determining the category of a Web service, from several pre-defined categories, is an important problem with many applications such as service discovery, semantic annotation and service matching. This paper describes AWSC (Automatic Web Service Classification), an automatic classifier of Web service descriptions. AWSC exploits the connections between the category of a Web service and the information commonly found in standard descriptions. In addition, AWSC bridges different styles for describing services by combining text mining and machine learning techniques. Experimental evaluations show that this combination helps our classification system at improving its precision. In addition, we report an experimental comparison of AWSC with a related work.
Conference Paper
Software reuse can become a key factor for improving and guaranteeing software quality when adopted systematically throughout the software process. The main characteristic of reuse-oriented processes is that they require a common repository for storing, searching and retrieving software modules. Moreover, reuse occurs systematically and is an integrated part of the process. Previous works of the same authors have empirically shown that the full reuse maintenance model (FRM) slows down quality degradation following maintenance interventions on a software system. This work is a further step in the investigation towards demonstrating how reuse-oriented development (ROD) impacts software quality; how it favors the FRM model; and finally, whether reuse-oriented development influences productivity and, as such, is more efficient. This has been done through a case study carried out on two ongoing industrial projects. Results are positive and support our research hypotheses.
Conference Paper
Domain engineering is successful in promoting reuse. An approach to domain-specific reuse in service-oriented environments is proposed to help service requesters reuse Web services. In the approach, we present a conceptual model of domain-specific services (called domain services). Domain services in a certain business domain are modeled by semantic and feature modeling techniques, and bound to Web services with diverse capabilities through a variability-supported matching mechanism. By reusing pre-modeled domain services, service requesters can describe their requests easily through a service customization mechanism. Web service selection based on customized results can also be optimized by reusing the pre-matching results between domain services and Web services. Feasibility of the whole approach is demonstrated on an example.
Article
In software technology, web services are a paradigm for rendering services across diverse environments. In a business environment, they make it possible to adapt to rapid changes and to the requirements of various customers. For service management and discovery, classifying Web services that have the same functions is an efficient technique. However, the functional description documents of Web services are short, carry little information, and have sparse features. This makes it difficult to model the short text with topic models and degrades the classification of Web services. A Mixed Wide and PSO-Bi-LSTM-CNN model (MW-PSO-Bi-LSTM-CNN) is proposed in this work to address this issue. In this technique, the breadth prediction of the Web service category is performed by combining the discrete features of the Web service description documents, which exploits the wide learning model. In the next stage, the PSO-Bi-LSTM-CNN model mines the word order and context information of the words in the Web service description documents to perform the depth prediction of the Web service category. Here, particle swarm optimization (PSO) is integrated with the Bi-LSTM-CNN network to compute various hyper-parameters automatically. In the third stage, the results of the depth and breadth predictions of the Web service categories are integrated using a linear regression model to give the final service classification result. Finally, the MW-PSO-Bi-LSTM-CNN, Wide&Bi-LSTM, and Wide&Deep web service classification techniques are compared, and the experimental results show that the proposed technique obtains better web service classification accuracy.
Conference Paper
Software reuse is a well-established software engineering process that aims at improving development productivity. Although reuse can be performed in a very systematic way (e.g., through product lines), in practice, reuse is performed in many cases opportunistically, i.e., copying small code chunks either from the web or in-house developed projects. Knowledge sharing communities and especially StackOverflow constitute the primary source of code-related information for amateur and professional software developers. Despite the obvious benefit of increased productivity, reuse can have a mixed effect on the quality of the resulting code depending on the properties of the reused solutions. An efficient concept for capturing a wide range of internal software qualities is the metaphor of Technical Debt, which expresses the impact of shortcuts in software development on its maintenance costs. In this paper, we present the results from an empirical study on the effect of code retrieved from StackOverflow on the technical debt of the target system. In particular, we study several open-source projects and identify non-trivial pieces of code that exhibit a perfect or near-perfect match with code provided in the context of answers in StackOverflow. Then, we compare the technical debt density of the reused fragments (obtained as the ratio of inefficiencies identified by SonarQube over the lines of reused code) to the technical debt density of the target codebase. The results provide insight into the potential impact of code reuse on technical debt and highlight the benefits of assessing code quality before committing changes to a repository.
Conference Paper
The reuse of reliable, domain-specific software components is a strategy commonly used in the avionics industry to develop safety critical airborne systems. One method of achieving reuse is to use domain specific languages that map closely onto abstractions in the problem domain. While this works well for control algorithms, it is less successful for some complex ancillary functions such as failure management. The characteristics of device failures are often difficult to predict resulting in late requirements changes. Hence a small semantic gap is especially desirable but difficult to achieve. Object-oriented design techniques include mechanisms, such as inheritance, that cater well for variations in behaviour. However, object-oriented notations such as the UML lack the precision, and rigor, needed for safety critical software. UML-B is a profile of the UML for formal modelling. In this paper we show how UML-B can be used to model failure management systems via progressive refinement, and indicate how this approach could utilise UML concepts to cope with high variability, while providing rigorous verification.
Book
With the rapid development of computing hardware, high-speed networks, web programming, distributed and parallel computing, and other technologies, cloud computing has recently emerged as a commercial reality. Software Reuse in the Emerging Cloud Computing Era targets a spectrum of readers, including researchers, practitioners, educators, students, and even some end users in software engineering, computing, networks and distributed systems, and information systems. The handbook will help to clarify the fast-advancing literature on the current state of the art and knowledge in the development and reuse of reusable assets in emerging software systems and applications, as part of the information science and technology literature. It will expand that literature and promote the exchange and evolution of these advances in software reuse and cloud computing among multiple disciplines and a wide spectrum of research, industry, and user communities.
Article
The authors explore the concept of software as a service (SaaS), which envisages a demand-led software market in which businesses assemble and provide services when needed to address a particular requirement. The SaaS vision focuses on separating the possession and ownership of software from its use. Delivering software's functionality as a set of distributed services that can be configured and bound at delivery time can overcome many current limitations constraining software use, deployment, and evolution.
A. Lampropoulos, A. Ampatzoglou, S. Bibi, A. Chatzigeorgiou, and I. Stamelos, "React - a process for improving open-source software reuse," in 2018 11th International Conference on the Quality of Information and Communications Technology (QUATIC), 2018, pp. 251-254.
S. K. S., A. Ahmed, and M. Shankar, "A composite classification model for web services based on semantic & syntactic information integration," in 2015 IEEE International Advance Computing Conference (IACC), 2015, pp. 1169-1173.
R. Nisa and U. Qamar, "A text mining based approach for web service classification," Information Systems and e-Business Management, vol. 13, no. 4, pp. 751-768, 2015.
Z. Alizadeh-Sani, P. Martínez, G. González, A. González-Briones, P. Chamoso, and J. Corchado, "A hybrid supervised/unsupervised machine learning approach to classify web services," in 2021 International Workshops of Practical Applications of Agents and Multi-Agent Systems (PAAMS). Cham: Springer, 2021, pp. 93-103.
J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," in 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Minneapolis, MN, USA: Association for Computational Linguistics, 2019, pp. 4171-4186.