Chapter

Data Analysis Using R Programming

To read the full-text of this research, you can request a copy directly from the author.

Abstract

Data are facts or figures from which conclusions can be drawn. Turning data into information involves several steps, collectively known as data processing. This chapter describes data processing and how computers perform these steps efficiently and effectively. Many of these processing activities may be undertaken using R programming, or performed in an R environment with the aid of available R packages, in which R functions and datasets are stored. Quality control is a regulatory procedure through which one measures quality, compares it with pre-set standards, and then acts on any differences. To learn statistical analysis and computation, one may start by treating the R programming language as a simple calculator. In epidemiology, once the collected datasets have been prepared for biostatistical analysis, the first step is to enter them into the R environment.
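As a minimal sketch of the ideas in the abstract, the R snippet below uses R as a calculator, enters a small dataset into the R environment, and runs a simple range check of the kind a quality-control step might apply. The data frame `cases`, its columns, its values, and the 0 to 120 age limit are hypothetical and chosen only for illustration; they do not come from the chapter.

```r
# R as a simple calculator
2 + 3 * 4              # usual operator precedence: 14
sqrt(16)               # 4
log(100, base = 10)    # base-10 logarithm

# Enter a small dataset by hand (hypothetical values, illustration only)
cases <- data.frame(
  id      = 1:5,
  age     = c(34, 51, 47, 29, 62),
  exposed = c(TRUE, FALSE, TRUE, TRUE, FALSE)
)

# Enter a dataset from a file: write a temporary CSV, then read it back
path <- tempfile(fileext = ".csv")
write.csv(cases, path, row.names = FALSE)
cases_from_file <- read.csv(path)

# A simple check against a pre-set standard (assumed here: 0 <= age <= 120)
age_ok <- cases_from_file$age >= 0 & cases_from_file$age <= 120

# Basic descriptive summary of one variable
summary(cases_from_file$age)
```

In practice the temporary file would be replaced by the path to the collected dataset, and any rows where `age_ok` is `FALSE` would be reviewed before analysis.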

... They are tools for extracting data patterns and unknown data correlations in datasets. A few examples of the well-known data analysis tools are R-Programming [12] [13], Apache Hadoop [14] [15] [16], RapidMiner [17] [18] [19], and Microsoft Azure [20] [21] [22] [23]. Aside from data utilization, data privacy [24] [25] [26] [27] [28] [29] [30] [31] [32] [33] must also be considered when datasets are allowed to be utilized in big data analytics. ...
Data utility and data privacy are serious issues that must be considered, and traded off against each other, when datasets are used in big data analytics. That is, datasets with high data utility often carry a high risk of privacy violation. To balance data utility and data privacy in datasets provided for big data analytics, several privacy preservation models have been proposed, e.g., k-Anonymity, l-Diversity, t-Closeness, Anatomy, k-Likeness, and (lp1, . . . , lpn)-Privacy. Unfortunately, these privacy preservation models are highly complex data models and still have data utility issues that must be addressed. To address these weaknesses, a new privacy preservation model is proposed in this work. It is based on aggregate query answers that can guarantee the confidence of the range and the number of values that can be re-identified. Furthermore, extensive experiments show that the proposed model is more efficient and effective in big data analytics.