Conference Paper

Confidentiality, Integrity, and Availability (CIA) Approach for Private Dataset Management System Design


Abstract

Dataset management systems are essential for helping research and development (R&D) organizations apply data governance protocols, especially in managing the utilization of datasets. In R&D, datasets are vital sources for producing analyses, making machine learning predictions, and supporting decision-making. It is therefore important to deploy dataset management systems with consideration for Confidentiality, Integrity, and Availability (CIA), a design principle that emphasizes information protection and facilitates collaboration among dataset users. This study proposes an analysis and a design for a dataset management system that takes the CIA aspects into account. In the future, the proposed design will serve as a benchmark for building a dataset management system. The Waterfall method was adopted in this study. A use case diagram, an ERD (Entity Relationship Diagram), and an activity diagram were constructed in the design stage. In the project implementation stage, the Lara…
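The preview does not detail the proposed design, but the confidentiality aspect of a CIA-oriented dataset management system is commonly enforced through role-based access control. A minimal Python sketch of such a check; the role names and permission table below are purely illustrative assumptions, not taken from the paper:

```python
# Hypothetical role-to-permission table a CIA-aware dataset management
# system might consult before releasing a dataset. Roles and actions
# are illustrative assumptions.
ROLE_PERMISSIONS = {
    "admin":      {"read", "write", "delete"},
    "researcher": {"read", "write"},
    "guest":      {"read"},
}

def can_access(role: str, action: str) -> bool:
    """Return True if the given role is allowed to perform the action."""
    return action in ROLE_PERMISSIONS.get(role, set())
```

Unknown roles fall back to an empty permission set, so access is denied by default, which is the conservative choice for confidentiality.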

Article
Full-text available
Numerous studies on stunting supplementation interventions in Indonesia have been published. The information can be extracted through data mining, especially from academic research databases. In this paper, we present a text mining-based literature review strategy to create a pipeline that researchers can use to accelerate the development of stunting supplementation intervention research in Indonesia. Utilizing various NLP (Natural Language Processing) techniques, data were crawled, processed, and visualized using Python. The crawling step used a module for the PubMed API (Application Programming Interface) to collect literature papers. The NLTK (Natural Language Toolkit) module and itertools were used to process text data. The n-gram model was applied to process tokens into bigrams and trigrams. Text information was visualized using the Matplotlib and Word cloud packages. Our results show an increasing number of publications on stunting supplementation interventions, observed from 2015 to 2021. West Java was the province where most of the stunting research has been conducted, as determined from research abstracts. Top occurrences obtained from the bigram and trigram model calculations produced different terms. The word pairings that occurred most frequently in the bigram and trigram analyses were "child-aged" and "iron-folic-acid," respectively. The findings of this study are expected to help researchers obtain the latest research topics related to stunting supplementation interventions in Indonesia.
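The bigram-counting step described above (done in the paper with NLTK and itertools) can be sketched in a few lines of pure Python; the token list below is a toy example, not the paper's data:

```python
from collections import Counter

def ngrams(tokens, n):
    """Yield consecutive n-token tuples from a token list."""
    return zip(*(tokens[i:] for i in range(n)))

# Toy token list for illustration only
tokens = "iron folic acid iron folic".split()
bigram_counts = Counter(ngrams(tokens, 2))
# The pair ("iron", "folic") occurs twice in this toy list
```

The same `ngrams` helper with `n=3` yields trigrams, and `Counter.most_common()` gives the top occurrences the abstract refers to.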
Article
Full-text available
The high prevalence of tuberculosis (TB) in Indonesia gives the country the second-highest national TB prevalence in the world, after India. This high prevalence can cause failures to deliver medical treatment to TB patients, exacerbated by the disproportionate distribution of doctors in Indonesia. To address this issue, an artificial intelligence (AI) system is necessary to help doctors screen a large number of patients in a short time. However, to develop a robust AI system for this purpose, a large dataset is needed. This study aims to develop a database system for storing TB sputum sample images, which can be used as the dataset to train an AI system for TB detection. The developed system can help doctors and health workers manage the images during their daily work.
Article
Full-text available
Evidence shows that appropriate use of technology in education has the potential to increase the effectiveness of, e.g., teaching, learning and student support. There is also evidence that technology can introduce new problems and ethical issues, e.g., student privacy. This article maps some limitations of technological approaches that ensure student data privacy in learning analytics from a critical data studies (CDS) perspective. In this conceptual article, we map the claims, grounds and warrants of technological solutions to maintaining student data privacy in learning analytics. Our findings suggest that many technological solutions are based on assumptions, such as that individuals have control over their data (‘data as commodity’), which can be exchanged under agreed conditions, or that individuals embrace their personal data privacy as a human right to be respected and protected. Regulating student data privacy in the context of learning analytics through technology mostly depends on institutional data governance, consent, data security and accountability. We consider alternative approaches to viewing (student) data privacy, such as contextual integrity; data privacy as ontological; group privacy; and indigenous understandings of privacy. Such perspectives destabilise many assumptions informing technological solutions, including privacy enhancing technology (PET).
Practitioner notes
What is already known about this topic: Various actors (including those in higher education) have access to and collect, use and analyse greater volumes of personal (student) data, with finer granularity, increasingly from multiple platforms and data sources. There is growing awareness and concern about individual (student) privacy. Privacy enhancing technologies (PETs) offer a range of solutions to individuals to protect their data privacy.
What this paper adds: A review of the assumption that technology provides adequate or complete solutions for ensuring individual data privacy. A mapping of five alternative understandings of personal data privacy and their implications for technological solutions. Consideration of implications for the protection of student privacy in learning analytics.
Implications for practice and/or policy: Student data privacy is not only a technological problem to be solved but should also be understood as a social problem. The use of PETs offers some solutions for data privacy in learning analytics. Strategies to protect student data privacy should include student agency, literacy and a whole-system approach.
Article
Full-text available
The benefits and drawbacks of various technologies, as well as the scope of their application, are thoroughly discussed. The use of anonymity technology and differential privacy in data collection can aid in the prevention of attacks based on background knowledge gleaned from data integration and fusion. The majority of medical big data are stored on a cloud computing platform during the storage stage. To ensure the confidentiality and integrity of the information stored, encryption and auditing procedures are frequently used. Access control mechanisms are mostly used during the data sharing stage to regulate the objects that have access to the data. The privacy protection of medical and health big data is carried out under the supervision of machine learning during the data analysis stage. Finally, acceptable ideas are put forward from the management level as a result of the general privacy protection concerns that exist throughout the life cycle of medical big data throughout the industry.
Article
Full-text available
Cervical cancer is one of the leading causes of death for women worldwide. To reduce the mortality caused by cervical cancer, early screening techniques such as the pap smear need to be carried out more extensively. For that, the availability of an automatic screening system is essential. In this paper, we propose a system that can collect the dataset needed to train an Artificial Intelligence (AI) system for automatic pap smear screening. The proposed system can be integrated seamlessly into the current pap smear result recording procedure, hence avoiding any possible complication.
Article
Full-text available
Digital personal data is increasingly framed as the basis of contemporary economies, representing an important new asset class. Control over these data assets seems to explain the emergence and dominance of so-called "Big Tech" firms, consisting of Apple, Microsoft, Amazon, Google/Alphabet, and Facebook. These US-based firms are some of the largest in the world by market capitalization, a position that they retain despite growing policy and public condemnation, or "techlash", of their market power based on their monopolistic control of personal data. We analyse the transformation of personal data into an asset in order to explore how personal data is accounted for, governed, and valued by Big Tech firms and other political-economic actors (e.g., investors). However, our findings show that Big Tech firms turn "users" and "user engagement" into assets through the performative measurement, governance, and valuation of user metrics (e.g., user numbers, user engagement), rather than extending ownership and control rights over personal data per se. We conceptualize this strategy as a form of "techcraft" to center attention on the means and mechanisms that Big Tech firms deploy to make users and user data measurable and legible as future revenue streams.
Article
Full-text available
In recent years, research in human counting from CCTV (Closed Circuit Television) images has seen increasing demand for deployment in real-world applications. The applications have been implemented in various settings, both indoor and outdoor. In the indoor case, we found a type of room setting that poses a problem for human counting models when only the humans inside a room need to be counted. In this respect, we present the RHC (Room Human Counting) dataset, whose images are captured in the aforementioned setting. The dataset can be used to develop a robust model that can differentiate between humans inside and outside a room. The dataset is publicly available at https://data.mendeley.com/datasets/vt5c8h6kmh/1.
Article
Full-text available
The domestic combustion of polluting fuels is associated with an estimated 3 million premature deaths each year and contributes to climate change. In many low- and middle-income countries (LMICs), valid and representative estimates of people exposed to household air pollution (HAP) are scarce. The Demographic and Health Survey (DHS) is an important and consistent source of data on household fuel use for cooking and has facilitated studies of health effects. However, the body of research based on DHS data has not been systematically identified, nor its strengths and limitations critically assessed as a whole. We aimed to systematically review epidemiological studies using DHS data that considered cooking fuel type as the main exposure, including the assessment of the extent and key drivers of bias. Following PRISMA guidelines, we searched PubMed, Web of Science, Scopus and the DHS publication portal. We assessed the quality and risk of bias (RoB) of studies using a novel tool. Of 2748 records remaining after removing duplicates, 63 were read in full. A total of 45 out of 63 studies were included in our review, spanning 11 different health outcomes and representing 50 unique analyses. In total, 41 of 45 (91%) studies analysed health outcomes in children
Article
Full-text available
The concept of smart building includes the optimization of energy usage in a building. One possible solution for this is to adaptively adjust appliance utilization according to the activity level in the building. Thus, an intelligent activity estimation system needs to be developed. However, a massive annotated dataset is necessary to train the system. Therefore, we propose a system that enables rapid data annotation to collect such a dataset. With the proposed system, an image can be annotated within 4.8 seconds on average.
Article
Full-text available
Nowadays, health prediction has become essential in modern life. Big data analysis plays a crucial role in predicting future health status and offers preeminent health outcomes to people. Heart disease is a prevalent disease that causes death around the world. A lot of research on predictive analytics using machine learning techniques is ongoing to enable better decision making. Big data analysis fosters great opportunities to predict future health status from health parameters and provide the best outcomes. We used a Big Data Predictive Analytics Model for Disease Prediction using the Naive Bayes Technique (BPA-NB). It provides probabilistic classification based on Bayes' theorem with independence assumptions between the features. The Naive Bayes approach is suitable for huge data sets, especially big data. The approach was trained on heart disease data taken from the UCI machine learning repository and then made predictions on the test data. The results reveal that the proposed BPA-NB scheme provides better accuracy, about 97.12%, in predicting the disease rate. The proposed BPA-NB scheme used Hadoop-Spark as the big data computing tool to obtain significant insight into healthcare data. The experiments predict different patients' future health conditions, using the training dataset to estimate the health parameters necessary for classification. The results show early disease detection to figure out the future health of patients.
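The Bayes'-theorem classification idea behind BPA-NB can be illustrated with a toy calculation; the class priors and per-feature likelihoods below are made-up numbers for illustration, not the paper's UCI heart disease model:

```python
# Toy Naive Bayes posterior over two classes given two binary features,
# assumed conditionally independent given the class. All probabilities
# here are illustrative assumptions, not the paper's data.
priors = {"disease": 0.3, "healthy": 0.7}
# P(feature present | class), one entry per feature
likelihoods = {
    "disease": [0.8, 0.6],
    "healthy": [0.2, 0.3],
}

def posterior(features):
    """Normalized P(class | features) under the independence assumption."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for present, p in zip(features, likelihoods[cls]):
            score *= p if present else (1.0 - p)
        scores[cls] = score
    total = sum(scores.values())
    return {cls: s / total for cls, s in scores.items()}
```

With both features present, the "disease" class dominates (0.3 × 0.8 × 0.6 beats 0.7 × 0.2 × 0.3); with both absent, "healthy" dominates, mirroring how the classifier picks the class with the highest posterior.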
Article
Full-text available
This study explores a big and open database of soccer leagues in 10 European countries. Data related to players, teams and matches covering seven seasons (from 2009/2010 to 2015/2016) were retrieved from Kaggle, an online platform in which big data are available for predictive modelling and analytics competition among data scientists. Based on preliminary data analysis, experts' evaluation and players' position on the football pitch, role-based indicators of teams' performance have been built and used to estimate the win probability of the home team with the binomial logistic regression (BLR) model, which has been extended to include the ELO rating predictor and two random effects due to the hierarchical structure of the dataset. The predictive power of the BLR model and its extensions has been compared with that of other statistical modelling approaches (Random Forest, Neural Network, k-NN, Naïve Bayes). Results showed that role-based indicators substantially improved the performance of all the models used both in this work and in previous works available on Kaggle. The base BLR model increased prediction accuracy by 10 percentage points and showed the importance of defence performances, especially in the last seasons. Inclusion of both the ELO rating predictor and the random effects did not substantially improve prediction, as the simpler BLR model performed equally well. With respect to the other models, only Naïve Bayes showed more balanced results in predicting both win and no-win of the home team.
Article
Full-text available
Studies are underway to catalog genetic diversity in plants and animals and identify genetic variants predictive of economically important traits (e.g., yield, resistance to disease). The high-throughput DNA genotyping and sequencing technologies used in these studies produce large amounts of complex data. Agriculture geneticists are faced with numerous computational obstacles in storing, processing, and analyzing genetic and trait data. We introduce a web application for large-scale agriculture genetic diversity and association studies that aims to simplify and automate many of the data management and analysis tasks common across studies. We present a case study where our software is configured and populated with genome-wide data of over 750,000 genetic markers from the commercially available BovineHD array (Illumina Inc.). Our software is scalable to multiple species and applicable to a wide range of genotyping and sequencing technologies and study designs.
Article
In modern reproducible, hypothesis-driven plant research, scientists are increasingly relying on research data management (RDM) services and infrastructures to streamline the processes of collecting, processing, sharing, and archiving research data. FAIR (i.e., findable, accessible, interoperable, and reusable) research data play a pivotal role in enabling the integration of interdisciplinary knowledge and facilitating the comparison and synthesis of a wide range of analytical findings. The PLANTdataHUB offers a solution that realizes RDM of scientific (meta)data as evolving collections of files in a directory - yielding FAIR digital objects called ARCs - with tools that enable scientists to plan, communicate, collaborate, publish, and reuse data on the same platform while gaining continuous quality control insights. The centralized platform is scalable from personal use to global communities and provides advanced federation capabilities for institutions that prefer to host their own satellite instances. This approach borrows many concepts from software development and adapts them to fit the challenges of the field of modern plant science undergoing digital transformation. The PLANTdataHUB supports researchers in each stage of a scientific project with adaptable continuous quality control insights, from the early planning phase to data publication. The central live instance of PLANTdataHUB is accessible at (https://git.nfdi4plants.org), and it will continue to evolve as a community-driven and dynamic resource that serves the needs of contemporary plant science.
Conference Paper
In the field of bioinformatics, protein Post-Translational Modification (PTM) site prediction has been widely studied, and Web Information Systems (WIS) have been deployed by researchers for this task. Through a literature review and benchmarking process, we identified the requirements, which included quick predictions, efficient memory usage, and input validation. However, no detailed designs have been proposed so far, which may have contributed to some requirements not being implemented in some of the websites. Therefore, we propose a detailed WIS conceptual design that can be used to predict the sites of multiple PTM types, equipped with a validation algorithm, and we compared the usage of various string searching algorithms as well as file storage formats. Experiment results showed that the linear search algorithm is the fastest for this task, and that storing the protein data in npz format when performing multi-PTM site prediction can help reduce memory usage. The proposed design can be implemented into user-friendly web tools that are efficient in both speed and memory usage in future studies.
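The linear search that the experiments found fastest can be sketched as a straightforward scan over the sequence; the motif and sequence below are illustrative toy values, not from the paper:

```python
def find_sites(sequence: str, motif: str):
    """Linear scan returning all start indices where the motif occurs
    in a protein sequence (overlapping matches included)."""
    hits = []
    for i in range(len(sequence) - len(motif) + 1):
        if sequence[i:i + len(motif)] == motif:
            hits.append(i)
    return hits

# Toy example: find every "KS" in a short (made-up) sequence
sites = find_sites("MKSKSP", "KS")  # matches start at indices 1 and 3
```

The scan does one comparison per position, which keeps memory usage flat regardless of sequence length, consistent with the efficiency requirement the abstract names.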
Article
Academic libraries are one of the vital institutions that collect and manage the intellectual and cultural output of human beings, both in print and digital forms. Along with the growth of digital collections, the threats to or vulnerability of those collections are also increasing. This paper aims to describe the digital disaster preparedness measures implemented by respondent academic libraries in Indonesia that participated in the study. The study was conducted in Jakarta, Indonesia. The finding shows that even though libraries have taken steps to protect their digital data and collections, most of the libraries that participated in this study do not have a written digital disaster preparedness policy and had never conducted a risk analysis of the potential for digital disasters in their institutions. This research recommends that the libraries need to do a risk assessment first to prevent and manage any potential disaster.
Article
Industry 4.0 and the associated IoT and data applications are evolving rapidly and expanding into various fields. Industry 4.0 also manifests in the farming sector, where the wave of Agriculture 4.0 provides multiple opportunities for farmers, consumers and the associated stakeholders. Our study presents the concept of Data Sharing Agreements (DSAs) as an essential path and a template for AI applications of data management among various actors. The approach we introduce adopts design science principles and develops role-based access control based on AI techniques. The application is presented through a smart farm scenario while we incrementally explore the data sharing challenges in Agriculture 4.0. Data management and sharing practices should enforce defined contextual policies for access control. The approach could inform policymaking decisions for role-based data management, specifically data-sharing agreements in the context of Industry 4.0 in broad terms and Agriculture 4.0 in specific.
Article
Internet of Things (IoT) has fundamentally changed the way information technology and communication environments work, with significant advantages derived from wireless sensors and nanotechnology, among others. While IoT is still a growing and expanding platform, the current research in privacy and security shows there is little integration and unification of security and privacy, which may affect user adoption of the technology because of fear of personal data exposure. The surveys conducted so far focus on vulnerabilities based on information exchange technologies applicable to the Internet. None of the surveys has brought out the integrated privacy and security perspective centered on the user. The aim of this paper is to provide the reader with a comprehensive discussion on the current state of the art of IoT, with particular focus on what has been done in the areas of privacy and security threats, attack surface, vulnerabilities and countermeasures, and to propose a threat taxonomy. IoT user requirements and challenges were identified and discussed to highlight the baseline security and privacy needs and concerns of the user. The paper also proposes a threat taxonomy to address the security requirements from a broader perspective. This survey of IoT privacy and security was undertaken through a systematic literature review using online databases and other resources to search for all articles that meet certain criteria, entering information about each study into a personal database, and then drawing up tables summarizing the current state of the literature. As a result, the paper distills the latest developments in IoT privacy and security, highlights the open issues and identifies areas for further research.
Conference Paper
A database is an organized collection of data. A number of techniques, such as encryption and electronic signatures, are currently available to protect data transmitted across sites. Database security refers to the collective measures used to protect and secure a database or database management software from illegitimate use and malicious threats and attacks. In this paper, we present six methods for storing and retrieving database information in ways that are more secure while remaining convenient and efficient. Confidentiality, integrity, and availability, also known as the CIA triad, is a model designed to guide policies for information security within the database. Among the many available cryptography techniques, ECC is one of the most powerful. A user who wants to store or request data must first authenticate; an authenticated user obtains a key from a key generator and then encrypts or decrypts data within the database. All keys are stored in and retrieved from the key generator. We use 256-bit AES encryption for row-level, column-level, and element-level encryption of the database. The next two methods encrypt the random 256-bit AES key using 521-bit ECC encryption and signatures, for row-level and column-level encryption. The last method, the most secure in this paper, is element-level encryption with AES and ECC encryption for confidentiality, with an ECC signature for every element within the database for integrity. As well as encrypting data at rest, it is also important to ensure confidential data are encrypted in motion over the network. The advantage of element-level encryption is that attacks are difficult, because a compromised key exposes only a single element; the disadvantage is the need to manage thousands or millions of keys.
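The per-element integrity tagging described above can be sketched with standard-library HMAC-SHA256 as a stand-in for the paper's per-element ECC signatures (AES encryption and the key-generator service are omitted for brevity; the key below is an illustrative placeholder, not a real key-management scheme):

```python
# Element-level integrity sketch. HMAC-SHA256 substitutes here for the
# ECC signatures used in the paper; the idea is the same: each database
# element carries its own tamper-evidence tag.
import hashlib
import hmac

SECRET_KEY = b"demo-key"  # placeholder; real keys would come from the key generator

def sign_element(value: bytes) -> bytes:
    """Produce an integrity tag for one database element."""
    return hmac.new(SECRET_KEY, value, hashlib.sha256).digest()

def verify_element(value: bytes, tag: bytes) -> bool:
    """Check that the element has not been tampered with,
    using a constant-time comparison."""
    return hmac.compare_digest(sign_element(value), tag)
```

Because each element is tagged independently, corrupting one element invalidates only that element's tag, which mirrors the element-level trade-off the abstract describes: fine-grained protection at the cost of many keys or tags to manage.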
Conference Paper
This research discusses the implementation of the open data program in DKI Jakarta Province. DKI Jakarta is the first region in Indonesia to implement an open data program. The implementation of an open data program must pay attention to various supporting aspects in order to run properly. This research uses a post-positivist approach with a qualitative data analysis technique. The theory applied is the theory of implementation drivers, which has three core components: competency drivers, organization drivers, and leadership drivers. The results show that the implementation of the open data program falls short of fulfilling the implementation drivers components for competency drivers and organization drivers. In the competency drivers component, the problematic indicators are staff selection, training, and coaching; in the organization drivers component, they are facilitative administration and system-level intervention.
Article
High quality models of factors influencing rice crop yield are needed in countries where rice is a staple food. These models can help select optimal rice varieties for expected field conditions. Developing a system to help scientists track and make decisions using these data is challenging: it involves incorporating complex data structures (genomic, phenotypic, and remote sensing) with computationally intensive statistical modeling. In this article, the authors present a web portal designed to help researchers manage and analyze their datasets, apply machine learning to detect how factors taken together influence crop production, and summarize the results to help scientists make decisions based on the learned models. The authors developed the system to be easily accessed by the entire team, including rice scientists, geneticists, and farmers. As such, they built the system on a server architecture comprising a SQLite database, a web interface developed in Python, the Celery job scheduler, and statistical computing in R.
A. Fali Oklilas et al., "Data annotation system for intelligent energy conservator in smart building," IOP Conf. Ser.: Earth Environ. Sci., vol. 426, no. 1, p. 012008, Feb. 2020, doi: 10.1088/1755-1315/426/1/012008.

R. M. Firzatullah, "Development of XYZ University's Student Admission Site Using Waterfall Method," Jurnal Mantik, vol. 5, no. 1, pp. 201-206, 2021.