Roberto V. Zicari's research while affiliated with Goethe-Universität Frankfurt am Main and other places

Publications (21)

Chapter
Enlightenment is man’s emergence from his self-imposed immaturity. Immaturity is the inability to use one’s understanding without guidance from another.
Chapter
Digitalization is advancing unchecked, but it also endangers our democracy if we do not rein it in. What must we do?
Chapter
The power of data can be used for good and for bad purposes. Five principles for big data ethics.
Chapter
Big data, nudging, behavioral control: Are we facing the automation of society through algorithms and artificial intelligence? An appeal to secure freedom and democracy.
Conference Paper
To be able to handle big data workloads, modern NoSQL database management systems like Cassandra are designed to scale well over multiple machines. However, with each additional machine in a cluster, the likelihood for hardware failure increases. In order to still achieve high availability and fault tolerance, the data needs to be replicated within...
Research
Big Data, Artificial Intelligence, Big Nudging, and Cybernetic Society: Is the automation of society coming? A joint appeal to secure freedom and democracy. This is the English translation of the Digital Manifesto, which appeared in Spektrum der Wissenschaft: http://www.spektrum.de/pdf/digital-manifest/1376682
Chapter
In the first part of this chapter we illustrate how a big data project can be set up and optimized. We explain the general value of big data analytics for the enterprise and how value can be derived by analyzing big data. We go on to introduce the characteristics of big data projects and how such projects can be set up, optimized and managed. Two e...
Conference Paper
This work investigates the performance of Big Data applications in virtualized Hadoop environments, hosted on a single physical node. An evaluation and performance comparison of applications running on a virtualized Hadoop cluster with separated data and computation layers against standard Hadoop installation is presented. Our experiments show how...
Article
This report evaluates the new analytical capabilities of DataStax Enterprise (DSE) [1] through the use of standard Hadoop workloads. In particular, we run experiments with CPU and I/O bound micro-benchmarks as well as OLAP-style analytical query workloads. The performed tests should show that DSE is capable of successfully executing Hadoop applicat...
Article
In this report we investigate the performance of Hadoop clusters, deployed with separated storage and compute layers, on top of a hypervisor managing a single physical host. We have analyzed and evaluated the different Hadoop cluster configurations by running CPU bound and I/O bound workloads. The report is structured as follows: Section 2 provides...
Article
In this work, we present a system called PoliTwi, which was designed to detect emerging political topics (Top Topics) in Twitter sooner than other standard information channels. The recognized Top Topics are shared via different channels with the wider public. For the analysis, we have collected about 4,000,000 tweets before and during the parliame...
Article
Helpfulness prediction of online consumer reviews is an interesting research topic with immediate practical applications both from a data mining and marketing perspective. As such, a set of studies has been published in the last few years to tackle this problem, targeting the reviews' textual characteristics. In this paper, we propose and evaluate...
Article
The well-known 3V architectural paradigm for Big Data introduced by Laney (2011) provides a simplified framework for defining the architecture of a big data platform to be deployed in various scenarios tackling processing of massive datasets. While additional components such as Variability and Veracity have been discussed as an extension to the 3V...
Article
Current trends on the Internet indicate an increasing supply of content from anonymous users (e.g. blogs), which may become popular among website visitors. Motivated by these Internet trends, the present study explores the tradeoffs between the source's reputation and the way content is displayed or offered on the web page as well as the effects of...
Conference Paper
In this paper, we report our experience in the implementation of a module for creating user profiles of Web visitors by using Zones, Weights and Actions. The module is part of Gugubarra 2.0, a tool for better understanding and management of communities of registered Web visitors, currently being developed by the database group at the Computer Scien...
Conference Paper
This paper addresses the issue of how to define clusters of web visitors with respect to their behavior and supposed interests. We will use the non-obvious user profiles (NOPs) approach defined in (10), and present a new clustering algorithm which is a combination of hierarchical clustering together with a centroid based method with priority, which...
Conference Paper
In (6) we have introduced the concept of non-obvious user profiles (NOPs) to capture the hypothetical interest of web users. In this paper we present the design principles and rules of our Gugubarra engine, which is a tool to calculate and visualize these non-obvious user profiles.

Citations

... But on the contrary, we now have overwhelming empirical evidence that AI applications are not objective or value neutral [35] or even robust or safe [39]. We now know that notwithstanding the conveniences they create [5], AI applications disadvantage minorities, engender inequality, and pose a big challenge to the stability of democratic processes [8,18,35]. ...
... Given the increased opportunities for manipulation and misuse that have accrued to business and politics on the basis of Big Data, training in data literacy appears to be an urgent task for 'democracies under stress test'. Programmatically, there has been a call "to promote the maturity of citizens in the digital world" (Helbing et al. 2015 prior knowledge, software used, etc. (cf. ...
... A rejection seems particularly absurd when it is based on empirical evidence which, rooted in world wide data collection, cannot realistically be falsified; when a margin of statistical error can be provided, and is sufficiently small; and when the decision is made by a computer which, per wide-spread belief, cannot err. A critical mind might 13 User Data Manifesto 2.0 https://userdatamanifesto.org/, the European Digital Charta https://digitalcharta.eu/, the Swiss manifest for digital democracy http://digital-manifest.ch/, or the digital manifesto in Helbing et al. (2015). 14 We leave aside for a moment the question of who may choose the target function for optimization. ...
... Another notable work in the adoption of gamification for data science education could be found in [22], in which the authors propose a tailored data science programme for anyone who wishes to become a data science professional. In this broad work, the authors identify various professions under the broad term data science according to the EDISON data science education framework [23] and also propose a mechanism to map learning resources to an appropriate profession using a method they call data crowdsourcing. ...
... Usually the technology specific benchmarks are used to simulate specific types of applications, which will be hosted on the platform and should run in an optimal way. At the Frankfurt Big Data Lab we use benchmarks to not only evaluate the performance of big data platforms [61] [62], but also to evaluate the availability and fault-tolerance [63]. ...
... Ivanov et al. [31] compared the performances of two enterprise-grade applications, DataStax Enterprise (DSE), a production-level implementation of Apache Cassandra with extended features like in-memory computing and advanced security, to name but two, and Cloudera's Distribution of Hadoop (CDH) comprising core Hadoop elements HDFS and YARN integrated with elements belonging to the Hadoop ecosystem. DSE's HDFS compatible file system (CSF) lets Hadoop applications run without any modification. ...
... As data are increasingly becoming the fuel of algorithms that are key in almost every process of companies, and play a role even in individuals' everyday life, a new awareness of the importance of how these data are managed and used is growing all over the world. The widespread opinion that "we are experiencing the largest transformation since the end of the Second World War" carries a clear message, as described in [4]: "after the automation of production and the creation of the self-driving cars the automation of society is next. With this, society is at a crossroads, which promises great opportunities, but also considerable risks. ...
... database, integration and analytics skills". This point is not trivial, and James Kobielus adds [7]: "Data-driven organizations succeed when all personnel, both technical and business, have a common understanding of the core big data best skills, tools and practices. You need all the skills of data management, integration, modeling, and so forth that you already have running your data marts, warehouses, OLAP cubes". ...
... The textual content contains rich information [16], which is an ideal source [11] for learning helpfulness information. On the other hand, star ratings [40] provide a more straightforward form to quantify reviewers' opinions. The valence (positive or negative) [66] and extremity [15,45,57] of ratings are shown to have considerable impact on review helpfulness. ...
... Apache Spark consists of several components, namely (1) Spark SQL, the latest evolution after Shark SQL, which was used earlier in Spark. Spark SQL can perform in-memory storage, or in other words In-Memory Columnar Storage, from Shark; Spark SQL is also compatible with Hive, one of the query engines available on Hadoop (Ivanov et al. 2014 (Wang, Fu, and Wang 2016) (Aminudin 2019), (3) GraphX is a parallel computation engine that uses chart and graph APIs for data processing. GraphX brings major improvements in performance and a reduction in memory overhead. ...