Kirity Rapuru’s research while affiliated with Otto-von-Guericke University Magdeburg and other places

Publications (3)


Analysis and Comparison of Block-Splitting-Based Load Balancing Strategies for Parallel Entity Resolution
  • Conference Paper

November 2020 · 25 Reads

Nishanth Entoor Venkatarathnam · Kirity Rapuru · [...]


Exploring Spark-SQL-Based Entity Resolution Using the Persistence Capability: 14th International Conference, BDAS 2018, Held at the 24th IFIP World Computer Congress, WCC 2018, Poznan, Poland, September 18-20, 2018, Proceedings
  • Chapter
  • Full-text available

August 2018 · 91 Reads · 2 Citations

Communications in Computer and Information Science

Performance Comparison of Three Spark-Based Implementations of Parallel Entity Resolution: DEXA 2018 International Workshops, BDMICS, BIOKDD, and TIR, Regensburg, Germany, September 3–6, 2018, Proceedings

August 2018 · 178 Reads · 3 Citations

Communications in Computer and Information Science

During the last decade, several big data processing frameworks have emerged, enabling users to analyze large-scale data with ease. With the help of those frameworks, users can more easily manage distributed programming, failures, and data partitioning. Entity resolution is a typical application that requires big data processing frameworks, since its time complexity increases quadratically with the size of the input data. In recent years, Apache Spark has become popular as a big data framework that provides a flexible programming model and supports in-memory computation. Spark offers three APIs: RDDs, which give users core low-level data access, and the high-level DataFrame and Dataset APIs, which are part of the Spark SQL library and undergo query optimization. Because these APIs differ in their features, the choice of API can be expected to influence the performance of applications. However, few studies offer experimental measurements that characterize the effect of such distinctions. In this paper we evaluate the performance impact of such choices for the specific application of parallel entity resolution under two different scenarios, with the goal of offering practical guidelines for developers.
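
The quadratic growth mentioned in the abstract can be illustrated with a minimal single-machine sketch. This is not the paper's implementation and does not use Spark: the function names `jaccard` and `resolve_naive`, the token-based Jaccard similarity, the 0.5 threshold, and the sample records are all assumptions chosen for illustration. The sketch compares every pair of records once, so n records require n(n-1)/2 comparisons.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity (an illustrative choice of measure)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def resolve_naive(records, threshold=0.5):
    """Compare every unordered pair of records exactly once.

    Returns the index pairs judged to be matches and the number of
    comparisons performed, which is n * (n - 1) / 2 for n records.
    """
    matches = []
    comparisons = 0
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        comparisons += 1
        if jaccard(a, b) >= threshold:
            matches.append((i, j))
    return matches, comparisons


# Hypothetical sample records, made up for this sketch.
records = [
    "Apache Spark SQL",
    "apache spark sql library",
    "Entity Resolution",
    "parallel entity resolution",
]
matches, comparisons = resolve_naive(records)
print(matches, comparisons)
```

With 4 records the sketch performs 6 comparisons, and 10 records would need 45. This pairwise cost is what makes distributing the work across a cluster attractive for larger inputs.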