Article

Big data: A survey

Authors: Min Chen, Shiwen Mao, Yunhao Liu

Abstract

In this paper, we review the background and state of the art of big data. We first introduce the general background of big data and review related technologies, such as cloud computing, the Internet of Things, data centers, and Hadoop. We then focus on the four phases of the big data value chain, i.e., data generation, data acquisition, data storage, and data analysis. For each phase, we introduce the general background, discuss the technical challenges, and review the latest advances. We finally examine several representative applications of big data, including enterprise management, Internet of Things, online social networks, medical applications, collective intelligence, and smart grid. These discussions aim to provide readers with a comprehensive overview and big picture of this exciting area. The survey concludes with a discussion of open problems and future directions.


... Big data provides the capability to investigate very large datasets. It mitigates food security issues [46], supports predictive analysis and real-time decision making, and introduces new business models [178,242]. All of these benefits stem from the improved efficiency of the end-to-end supply chain in smart agricultural systems. ...
... Support Vector Machines (SVM) and Artificial Neural Networks (ANN) have been applied within an integrated big data platform to ensure the safety of the milk production chain [110]. The big data workflow in a smart agriculture system is depicted in Fig. 4 [46,242]. It begins with the gathering of data at a variety of sensor nodes and finishes with the use of a variety of methodologies for data analysis, which can include both conventional and big data analysis. ...
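As a rough illustration of the classification step such platforms apply to sensor data, here is a minimal scikit-learn sketch; the feature names, thresholds, and synthetic data are assumptions for illustration, not details from the cited work [110]:

```python
# Minimal sketch: an SVM classifier flagging unsafe milk-chain sensor readings.
# Synthetic data only; feature names and the labeling rule are illustrative.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 1000
# Columns: storage temperature (C), pH, somatic cell count (log10)
X = np.column_stack([
    rng.normal(4.0, 1.5, n),   # temperature
    rng.normal(6.7, 0.2, n),   # pH
    rng.normal(5.0, 0.5, n),   # log10 somatic cell count
])
# Label a sample "unsafe" when temperature or cell count is high (toy rule)
y = ((X[:, 0] > 6.0) | (X[:, 2] > 5.7)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
model.fit(X_tr, y_tr)
print("held-out accuracy:", model.score(X_te, y_te))
```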
Article
Full-text available
The world population is anticipated to increase by 2 billion by 2050, causing a rapid escalation of food demand. A recent projection shows that the world is lagging in accomplishing the “Zero Hunger” goal despite some advancements. Socioeconomic factors can affect food security, leading to malnutrition among vulnerable populations. The agricultural industry must be upgraded, smartened, and automated to serve the growing population. Adopting existing technologies can make traditional agriculture efficient, sustainable, and eco-friendly. In this survey, we present Agriculture 4.0 and its applications, technology trends, available datasets, networking, and implementation challenges. We concentrate on Artificial Intelligence (AI) and Machine Learning (ML) technologies that support automation, as well as Distributed Ledger Technology (DLT), which provides data integrity and security. Following an in-depth investigation of several architectures, we also provide a framework for smart agriculture that relies on data processing locations. We have discussed open research problems in smart agriculture from two perspectives: technology and communications. AI, ML, DLT, and Physical Unclonable Function (PUF)-based hardware security fall under the technology group, whereas Internet-based attacks, fake data injection, and similar threats fall under the network research problem group. The survey aims to provide researchers with an in-depth study of recent works, challenges, and open research problems in smart agriculture.
... The early 2000s saw the advent of big data, characterized by the exponential growth of data generated by digital technologies and the internet. This era provided the necessary fuel for modern AI systems, allowing machine learning algorithms to be trained on vast datasets and improving their performance significantly [7]. The combination of increased computational power, sophisticated algorithms, and abundant data led to breakthroughs in natural language processing, computer vision, and other AI applications. ...
... Machine learning algorithms can now uncover patterns and insights from data that were previously inaccessible, leading to improved decision-making and operational efficiency. For example, predictive analytics is used in healthcare to forecast disease outbreaks and in finance to predict market trends [7]. However, the use of AI in data analytics raises ethical concerns about privacy and data security, particularly when dealing with sensitive personal information. ...
Article
Full-text available
The rapid advancement of artificial intelligence (AI) technologies has fundamentally transformed the landscape of information technology (IT), offering unprecedented opportunities for innovation and efficiency. However, these advancements also bring significant ethical challenges, including issues of bias, privacy, transparency, and accountability. This paper explores these ethical challenges and proposes a comprehensive ethical framework for the responsible development and deployment of AI in IT. Through an examination of historical context, current trends, and detailed case studies, the framework aims to provide actionable guidelines to mitigate biases, protect privacy, enhance transparency, and ensure accountability in AI systems. By fostering ethical AI practices, this framework aspires to support the sustainable and equitable advancement of AI technologies, ultimately benefiting society as a whole
... The telecommunications industry is increasingly reliant on data-driven services, and cloud platforms provide the computational power and storage capacity necessary to support these services. This capability is particularly important in the era of 5G and the Internet of Things (IoT), where the demand for real-time data processing and analytics is paramount (Chen et al., 2014). By integrating cloud computing, telecommunications companies can offer advanced services such as enhanced mobile broadband, ultra-reliable low latency communication, and massive machine-type communication, thereby meeting the evolving needs of consumers and businesses. ...
... IoT devices generate massive amounts of data that need to be processed, analyzed, and stored efficiently. Cloud computing provides the necessary infrastructure to handle this data influx, enabling telecommunications companies to offer advanced IoT services such as smart cities, connected vehicles, and industrial automation (Chen et al., 2014). As IoT continues to expand, the integration of cloud computing will be essential in managing and leveraging the data generated by these connected devices. ...
Article
Full-text available
This paper explores the transformative potential of cloud computing in the telecommunications industry, emphasizing its scalability and flexibility. The objective is to analyze how cloud computing solutions can revolutionize telecommunications by providing scalable, cost-effective, and flexible infrastructures that accommodate the industry's growing demands. The research methodology involves a comprehensive literature review, case studies of leading telecommunications companies adopting cloud computing, and an analysis of industry reports and data. Key findings indicate that cloud computing significantly enhances the scalability of telecommunications networks, allowing for dynamic resource allocation and efficient handling of fluctuating traffic patterns. The flexibility of cloud-based solutions facilitates rapid deployment of new services, seamless integration with emerging technologies such as 5G and IoT, and improved disaster recovery capabilities. Additionally, cloud computing reduces capital expenditures and operational costs by shifting from traditional hardware-based models to virtualized environments. The paper concludes that cloud computing is a critical enabler for the future of telecommunications, offering a robust framework for innovation and growth. By leveraging cloud technologies, telecommunications providers can achieve greater agility, optimize network performance, and deliver enhanced services to customers. The study underscores the need for continued investment in cloud infrastructure and the development of standardized protocols to ensure interoperability and security. Ultimately, the adoption of cloud computing represents a paradigm shift that positions the telecommunications industry to meet future challenges and opportunities effectively.
... RF builds multiple decision trees and aggregates their predictions, making it robust to overfitting and able to handle noisy data. This method has been used effectively for a variety of tasks, including gene expression analysis and disease classification [6]. ...
Article
Full-text available
The article presents a comprehensive comparative study of two different methodologies for classifying DNA sequences from both healthy and diseased individuals, and describes the advantages and limitations of their application. The first approach uses a k-mer representation, in which every possible k-mer (a substring of length k in a DNA sequence) is encoded as a binary feature. These features are then classified with the random forest (RF) algorithm, a powerful ensemble learning technique known for its robustness, its ability to handle high-dimensional data, and its interpretability. The algorithm builds multiple decision trees during training and aggregates their predictions, providing a reliable classification framework for handling diverse and noisy data. The second approach uses convolutional neural networks (CNN), which learn directly from raw DNA sequences provided in FASTA format. CNNs are designed to automatically extract hierarchical features from the input data through multiple convolution and pooling layers, allowing them to recognize complex patterns and subtle variations in DNA sequences that may indicate a healthy or diseased state. CNN training uses backpropagation, an algorithm widely applied for optimizing neural networks, which iteratively adjusts the network weights to minimize classification error and improve prediction accuracy. The results of the study show that CNNs, despite their high accuracy in detecting complex sequence patterns, require considerably more computational resources and are harder to interpret than RF. CNNs are particularly effective at capturing nonlinear relationships in the data, making them suitable for tasks that demand high accuracy. The RF approach, however, offers a more computationally efficient solution with faster training and prediction, and provides a higher degree of interpretability. This makes RF especially valuable in contexts where model transparency matters, for example in regulatory environments or when results must be communicated to stakeholders without deep technical expertise.
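A minimal sketch of the first (k-mer plus random forest) pipeline described in this abstract, using scikit-learn on synthetic sequences; k = 3, the planted motif, and all dataset details are illustrative assumptions:

```python
# Sketch of the k-mer + random forest pipeline described above.
# Sequences and labels are synthetic; k=3 is an illustrative choice.
from itertools import product
import numpy as np
from sklearn.ensemble import RandomForestClassifier

K = 3
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}

def kmer_features(seq: str) -> np.ndarray:
    """Binary vector: 1 if the k-mer occurs anywhere in the sequence."""
    x = np.zeros(len(KMERS), dtype=np.uint8)
    for i in range(len(seq) - K + 1):
        j = INDEX.get(seq[i:i + K])
        if j is not None:
            x[j] = 1
    return x

rng = np.random.default_rng(1)
def random_seq(length=200):
    return "".join(rng.choice(list("ACGT"), size=length))

# Toy dataset: "disease" sequences get a planted motif
healthy = [random_seq() for _ in range(100)]
disease = [random_seq()[:90] + "ACGTAC" + random_seq()[:104] for _ in range(100)]
X = np.array([kmer_features(s) for s in healthy + disease])
y = np.array([0] * 100 + [1] * 100)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```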
... "The goal is to turn data into information and information into insight", former Hewlett-Packard CEO, Carly Fiorina, 2004. In recent years it has become much easier to record, store and accumulate data -leading to the concept of so-called "big data" (Chen et al., 2014). This has led to a huge growth in data-driven models, especially machine learning methods and statistical inference (Bishop, 2006;Girolami, 2011). ...
Chapter
Full-text available
In this Chapter we will discuss modelling and reductionism in science and engineering, and how this relates to the new idea of digital twins. In particular, we focus on the historical context of modelling and reductionism for dynamics and control of engineering systems. Both active and passive control methods will be discussed, including the novel ideas associated with the inerter. Based on a selected review of the philosophy of modelling, we consider the role of knowledge and complexity in model making. The related topics of systems engineering, uncertainty analysis and artificial intelligence are also briefly discussed in the context of digital twins. We will argue that utility, trust and insight are the three key properties of models that will ideally be extended to digital twins. We then consider how digital twins will require the dynamic assembly of digital objects in order to recreate emergent behaviours. In order to implement a digital twin, an operational platform is required. We briefly present an aircraft example of a digital twin operational platform. Lastly we consider digital twin knowledge models and ontologies, and how this topic might help shape digital twins in the future.
... This integration has fundamentally changed industry processes and is driving innovation. The problems and opportunities posed by Big Data, characterised by its immense size, quick production, and many formats, necessitate the use of sophisticated analytical methods to fully exploit its potential (Chen, Mao, & Liu, 2015). Artificial intelligence (AI) improves these endeavours by utilising advanced tools for analysing data and creating predictive models, which allows for more precise and perceptive decision-making (Cheng & Zhao, 2015). ...
Article
Full-text available
The intersection of Artificial Intelligence (AI) and Big Data represents a crucial advancement in the age of digital transformation, fundamentally altering industries and transforming organisational operations. Artificial Intelligence (AI) has the ability to perform sophisticated data analysis and predictive modelling. This allows for the extraction of valuable insights from large datasets, leading to better informed decision-making and promoting innovation in various industries. This paper provides a thorough examination of the mutually beneficial connection between Artificial Intelligence (AI) and Big Data. It emphasises important approaches and technologies, including data mining, natural language processing, and neural networks, that enable this integration. The suggested technique exhibits substantial enhancements in predicting performance, with an accuracy rate of 95.6%. The evaluation metrics indicate a mean absolute error (MAE) of 0.401 and a root mean square error (RMSE) of 0.206, highlighting the success of the approach. These measurements demonstrate the method's accuracy and dependability in practical scenarios. This paper explores practical applications and case studies in several sectors such as healthcare, banking, retail, and transportation, illustrating the significant influence of AI-powered Big Data analytics. The topic encompasses ethical considerations and challenges, such as data privacy, algorithmic bias, and the imperative for openness, to ensure the responsible and fair use of these technologies.
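For reference, this is how the two quoted metrics are computed; note that, by definition, the RMSE of an error vector is never smaller than its MAE. A minimal sketch with toy numbers, not the paper's data:

```python
# How MAE and RMSE (the metrics quoted above) are computed; toy data only.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([3.0, 2.5, 4.0, 5.1])
y_pred = np.array([2.8, 2.9, 3.7, 5.0])

mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # RMSE >= MAE always holds
print(f"MAE={mae:.3f}  RMSE={rmse:.3f}")
```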
Article
In modern society, a growing number of situations require effective verification of people's identities. Biometrics is the most effective technology for personal authentication. Theoretical research on automated biometrics recognition mainly started in the 1960s and 1970s. In the following 50 years, the research and application of biometrics have achieved fruitful results. Around 2014–2015, with the successful applications of some emerging information technologies and tools, such as deep learning, cloud computing, big data, mobile communication, smartphones, location-based services, blockchain, new sensing technology, the Internet of Things and federated learning, biometric technology entered a new development phase. Therefore, taking 2014–2015 as the time boundary, the development of biometric technology can be divided into two phases. In addition, according to our knowledge and understanding of biometrics, we further divide the development of biometric technology into three phases, i.e., biometrics 1.0, 2.0 and 3.0. Biometrics 1.0 is the primary development phase, or the traditional development phase. Biometrics 2.0 is an explosive development phase due to the breakthroughs caused by some emerging information technologies. At present, we are in the development phase of biometrics 2.0. Biometrics 3.0 is the future development phase of biometrics. In the biometrics 3.0 phase, biometric technology will be fully mature and can meet the needs of various applications. Biometrics 1.0 is the initial phase of the development of biometric technology, while biometrics 2.0 is the advanced phase. In this paper, we provide a brief review of biometrics 1.0. Then, the concept of biometrics 2.0 is defined, and the architecture of biometrics 2.0 is presented. In particular, the application architecture of biometrics 2.0 in smart cities is proposed. The challenges and perspectives of biometrics 2.0 are also discussed.
Article
Cloud‐based Electronic Health Records (EHRs) have seen a substantial increase in usage in recent years, especially for remote patient monitoring. Researchers are interested in investigating the use of Healthcare 4.0 in smart cities. This involves using Internet of Things (IoT) devices and cloud computing to remotely access medical processes. Healthcare 4.0 focuses on the systematic gathering, merging, transmission, sharing, and retention of medical information at regular intervals. Protecting the confidential and private information of patients presents several challenges in terms of thwarting illegal intrusion by hackers. Therefore, it is essential to prioritize the protection of patient medical data that is stored, accessed, and shared on the cloud to avoid unauthorized access or compromise by the authorized components of E‐healthcare systems. A multitude of cryptographic methodologies have been devised to offer safe storage, exchange, and access to medical data in cloud service provider (CSP) environments. Traditional methods have not been effective in providing a harmonious integration of the essential components for EHR security solutions, such as efficient computing, verification on the service side, verification on the user side, independence from a trusted third party, and strong security. Recently, there has been a lot of interest in security solutions that are based on blockchain technology. These solutions are highly effective in safeguarding data storage and exchange while using little computational resources. The researchers focused their efforts exclusively on blockchain technology, namely on Bitcoin. The present emphasis has been on the secure management of healthcare records through the utilization of blockchain technology. This study offers a thorough examination of modern blockchain‐based methods for protecting medical data, regardless of whether cloud computing is utilized or not. This study utilizes and evaluates several strategies that make use of blockchain. The study presents a comprehensive analysis of research gaps, issues, and a future roadmap that contributes to the progress of new Healthcare 4.0 technologies, as demonstrated by research investigations.
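As a toy illustration of the integrity mechanism underlying the blockchain-based EHR schemes surveyed here (hash-linking successive records so that tampering is detectable), the following Python sketch uses only the standard library; the record contents are invented and this is not any specific scheme from the study:

```python
# Toy illustration of hash-chaining, the core integrity mechanism behind
# blockchain-based EHR schemes (not a specific scheme from this survey).
import hashlib, json

def block_hash(block: dict) -> str:
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

chain = [{"index": 0, "record": "genesis", "prev": "0" * 64}]
for i, record in enumerate(["visit:2024-01-03", "lab:2024-01-10"], start=1):
    chain.append({"index": i, "record": record, "prev": block_hash(chain[-1])})

# Tampering with an earlier record breaks every later link
chain[1]["record"] = "visit:2024-01-04"
valid = all(chain[i]["prev"] == block_hash(chain[i - 1]) for i in range(1, len(chain)))
print("chain valid after tamper:", valid)   # False
```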
Article
This study proposes an unparalleled integrated scheduling model for simultaneous loading, unloading, and maintenance operations in a port container terminal, wherein several types of heterogeneous handling equipment serve inbound and outbound vessels. In this regard, a multi-mode loading approach and the possibility of multiple allocations are also considered, wherein outbound vessels can be loaded with containers belonging to inbound vessels or the storage area. In addition, the proposed model covers all cases of equality and inequality in the number of inbound and outbound containers. More importantly, a flexible loading approach based on a class-based stowage plan is considered to better utilize equipment and increase the efficiency of the loading process. Additionally, since the timing of maintenance operations undoubtedly affects scheduling and entails miscellaneous uncertainties, a two-phase data-driven method is presented to estimate robust maintenance operation times. The first phase involves a hybrid machine learning strategy that combines the Savitzky–Golay Filter (SGF), the Takagi–Sugeno–Kang (TSK) fuzzy system, and minibatch gradient descent with regularization, DropRule, and AdaBound (MBGD-RDA) for the estimation of maintenance operation times. The second phase employs the distributionally robust optimization (DRO) technique, which leverages ϕ-divergence to address the uncertainties associated with the estimated times. Finally, a case study is conducted to demonstrate the effectiveness and applicability of the proposed framework, accompanied by an examination of various simulation experiments. The results obtained indicate that the implementation of a flexible loading strategy can lead to a 24% reduction in the loading time of vessels.
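The first phase above combines Savitzky–Golay filtering with a TSK fuzzy system; as a hedged sketch of just the smoothing step, here is scipy.signal.savgol_filter applied to a synthetic maintenance-time series (the window length and polynomial order are illustrative choices, not the paper's settings):

```python
# Sketch of the Savitzky-Golay smoothing step (first phase above); synthetic data.
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(2)
t = np.linspace(0, 10, 200)
clean = 30 + 5 * np.sin(t)                          # underlying trend (toy)
noisy_times = clean + rng.normal(0, 1.0, t.size)    # observed maintenance times

# window_length must be odd; polyorder < window_length (illustrative values)
smoothed = savgol_filter(noisy_times, window_length=21, polyorder=3)
print("residual std before/after:",
      np.std(noisy_times - clean).round(2),
      np.std(smoothed - clean).round(2))
```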
Article
Full-text available
Competitiveness and digitalization are important topics for businesses, as in the rapidly changing environment, they determine the ability to survive and thrive. This study examines the impact of information technology (IT) investments on firms’ competitiveness. The study adopts the dynamic capability approach to examine how IT investments enable firms to adapt to digital transformation and generate value. This study employs causal econometrics methods to test the hypothesis that supplementary IT investments enhance the growth, efficiency, and capital accumulation of firms, which are key indicators of ex-ante competitiveness. The hypotheses are tested on a dataset of 65536 Hungarian firms from 1999 to 2014. Empirical evidence was found to support these hypotheses and confirm the positive relationship between IT investments and firm-level growth, efficiency, and capital accumulation. The findings indicate that a small IT investment does not improve efficiency, while an excessive investment is likely to include irrational investments as well.
Article
Full-text available
The Rise of Big Data: Today, the scale and scope of massive data management have expanded to an unprecedented state. In this paper, we examine how data science has developed over the recent past and argue that artificial intelligence (AI) and data integration are converging to create intelligent data ecosystems that reach new heights. The paper combines AI's strength in automating analytics with data integration, which can harmonize different data sources, and shows how businesses can benefit from a unified approach to innovation, operational efficiency, and decision-making. We also cover some of the key use cases in the industry and the ways these technologies are gaining momentum and reshaping the future of data science.
Thesis
Full-text available
In an era marked by the rapid digitization of data and the widespread adoption of cloud computing, the security of sensitive information has emerged as a paramount concern for organizations and individuals alike. This dissertation explores the integration of cryptographic techniques into cloud computing and big data security, emphasizing the need for robust protective measures against increasing threats. The study employs a comprehensive methodology based on the Waterfall Model to design and implement a functional web-based cryptographic prototype utilizing the Advanced Encryption Standard (AES) in Galois/Counter Mode (GCM). User-friendly interfaces developed with HTML, CSS, and JavaScript facilitate seamless encryption and decryption processes, allowing users to engage with cryptographic practices effectively. Through empirical user testing, the findings reveal that the prototype significantly enhances users' confidence in handling sensitive data while simultaneously fostering a deeper understanding of the importance of data security. Users expressed satisfaction with the application's usability, highlighting the effectiveness of intuitive design and clear instructions in promoting user engagement and compliance with best practices. The research underscores the significance of integrating cryptographic solutions within broader security frameworks to address regulatory challenges, such as the General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA). In conclusion, this study contributes to the existing body of knowledge by illustrating how practical applications of cryptography can mitigate risks associated with data breaches and unauthorized access in cloud environments. It advocates for ongoing user education and the adoption of advanced security measures, thereby paving the way for a more secure and resilient digital landscape in the face of evolving cybersecurity threats. Keywords: Data Protection, Cryptography, Cloud Computing, Big Data Security, Web-Based Portal, User Authentication.
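The dissertation's prototype performs AES-GCM encryption in a web front end; as a hedged server-side analogue, here is a minimal AES-GCM round trip using the Python cryptography package (the key handling and sample record are illustrative only, not the prototype's actual code):

```python
# Minimal AES-GCM encrypt/decrypt sketch (the mode used by the prototype).
# Uses the 'cryptography' package; key handling here is illustrative only.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)   # store securely in practice
aesgcm = AESGCM(key)

nonce = os.urandom(12)                      # 96-bit nonce, unique per message
plaintext = b"patient record #1234"
aad = b"record-metadata"                    # authenticated but not encrypted

ciphertext = aesgcm.encrypt(nonce, plaintext, aad)
recovered = aesgcm.decrypt(nonce, ciphertext, aad)
assert recovered == plaintext
print("round-trip OK; ciphertext length:", len(ciphertext))
```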
Article
Full-text available
Building Information Modelling (BIM) has been increasingly integrated with Artificial Intelligence (AI) solutions to automate building construction processes. However, the methods for effectively transforming data from BIM formats, such as Industry Foundation Classes (IFC), into formats suitable for AI applications still need to be explored. This paper conducts a Systematic Literature Review (SLR) following the PRISMA guidelines to analyse current data preparation approaches in BIM applications. The goal is to identify the most suitable methods for AI integration by reviewing current data preparation practices in BIM applications. The review included a total of 93 articles from SCOPUS and WoS. The results include eight common data types, two data management frameworks, and four primary data conversion methods. Further analysis identified three barriers: first, the IFC format’s lack of support for time-series data; second, limitations in extracting geometric information from BIM models; and third, the absence of established toolchains to convert IFC files into usable formats. Based on the evidence, the data readiness is at an intermediate level. This research may serve as a guideline for future studies to address the limitations in data preparation within BIM for AI integration.
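As a hedged sketch of the IFC-to-tabular conversion step such studies discuss, the following assumes the open-source ifcopenshell package and a local model.ifc file; the chosen entity type and extracted attributes are illustrative:

```python
# Hedged sketch: flattening IFC entities into a tabular form for ML use.
# Assumes the ifcopenshell package and a local 'model.ifc' file exist.
import ifcopenshell

model = ifcopenshell.open("model.ifc")
rows = []
for wall in model.by_type("IfcWall"):
    rows.append({
        "global_id": wall.GlobalId,   # IfcRoot attributes
        "name": wall.Name,
    })
print(f"extracted {len(rows)} walls")
```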
Chapter
Precision agriculture or smart farming brings the promise of significantly more efficient systems which lay a substantial foundation for the EU Green Deal in terms of carbon footprint reduction, sustainability, and increased productivity coupled with energy efficiency. However, deployment of technological solutions that enable this new era of agriculture is not without substantial challenges and significant barriers. This chapter addresses the many challenges from multiple perspectives. Beginning with the current agriculture landscape in terms of technology adoption barriers and challenges, we present future pathways toward a new era of smart farming. Next, issues specific to the employment of IoT in the agrifood sector are elaborated upon, ranging from informational, behavioral, and social issues to technological, business, and financial matters. Following this, the capability of the DEMETER project to address these is presented. In this respect, an overview of the DEMETER main concepts and objectives is provided, while the multi-actor approach employed to tackle the project’s vision is introduced. Finally, this chapter provides an outline of the 20 large-scale pilots to be executed by DEMETER, which spans 18 countries, 80 sites, and several thousands of farmers, devices, and machinery and aims to optimize various operations in the entire food supply chain resulting in reduced effort and costs, higher food quality and safety, reduced environmental footprint, increased stakeholder cooperation, etc. This chapter concludes with an insight on the current situation of the digital transformation of the agrifood sector and the role that DEMETER aims to fulfill in promoting this.
Article
In response to the COVID-19 pandemic, an abrupt wave of digitisation and online migration swept higher education institutions around the globe. In the aftermath of this digital transformation, which endures as the legacy of the pandemic, what remains unclear is how effective the anti-COVID measures were in maintaining quality education. Using machine learning to analyse student grades as a proxy for educational standards, this study investigates and demonstrates the evaluative potential of machine learning (vs. traditional statistics) with respect to not only crisis responses in education but also applied fields such as Information Systems and Tourism. The main implication of this study is the analytical utility of machine learning even when educational data are irregular and small. However, incorporating accurate and meaningful data points into existing online educational systems is crucial to leverage this utility of machine learning.
Chapter
This chapter explores the transformative role of AI-driven personalization in omnichannel marketing, emphasizing its importance in enhancing customer engagement and loyalty. It begins by defining omnichannel marketing and tracing its evolution from traditional to digital channels. The chapter then delves into key components of AI-driven personalization, including data collection and analysis, customer segmentation, predictive analytics, and real-time personalization. Implementation strategies are discussed, highlighting the integration of AI tools, data management, and the design of personalized marketing strategies. The chapter also addresses best practices for overcoming common challenges and measuring customer engagement using key performance indicators (KPIs). Additionally, the chapter examines future trends in AI-driven personalization, such as emerging AI technologies, integration with IoT and AR/VR, and the evolving landscape of marketing strategies.
Article
Full-text available
Today, making decisions in multi-parameter contexts requires long computational times and increasingly powerful computers. In addition, we are often interested not only in the final decision to be made, but above all in the decision-making path to follow, i.e., the set of subsequent steps to reach a final goal. Following the development of a methodology to connect the original space of the problem with a space of reduced dimensionality, in this paper we present the algorithms to build the decision path in a reduced space with a level of complexity lower than that of the original problem space. Starting from an n-dimensional space represented by the problem variables (referred to as CSFs - Critical Success Factors, as fixed by the expert), a dimensional embedding procedure is used to move to a two-dimensional space. In the 2-dimensional space, thanks to new lattice motion algorithms, the decision support system can quickly determine the optimal solution at lower computational cost based on the decision-maker's preferences. Then, thanks to an algorithm that takes into account the hierarchical order of importance of the 7 CSFs, as ranked by the expert or according to his optimization logic, the results are restored from the 2-dimensional space to the n-dimensional one, and the final solution is shown in the original space. As we will see, the starting and ending states in the n-dimensional space (referred to as micro-states), when projected into the two-dimensional space, generate states (referred to as macro-states) that are degenerate. In other words, the correspondence between micro-states and macro-states is not one-to-one, as multiple micro-states correspond to one macro-state. As a result, following the decision-maker's preferences, the DSS provides the decision-maker with the micro-state of interest in the n-dimensional space (dimensional emergence procedure), starting from the obtained optimal macro-state. The target of the present paper is the definition and implementation of the algorithms that obtain the optimal solution in the 2-dimensional space. The optimisation is realised by reducing the information disorder and by increasing the dynamics of the system subjected to the DSS in order to study its evolution; this is done with specific algorithms able to suggest state transitions of the system as movements on a discrete 2-dimensional lattice. Moreover, specific algorithms are developed to emerge from the 2-dimensional space back to the n-dimensional original space, where the original semantics of the problem apply.
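A toy reconstruction of the embed-then-walk idea, under stated assumptions: PCA stands in for the paper's dimensional embedding, the 2D macro-states are snapped to an integer lattice, and the walk takes greedy axis-aligned unit steps toward a goal state; none of this reproduces the paper's actual algorithms:

```python
# Toy sketch of the embed-then-walk idea: project 7-dimensional CSF states
# to 2D with PCA, then take greedy unit steps on a 2D lattice toward a goal.
# PCA stands in for the paper's embedding; all numbers are illustrative.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
states = rng.normal(size=(500, 7))          # micro-states in the 7-CSF space
pca = PCA(n_components=2).fit(states)
grid = np.round(pca.transform(states))      # snap macro-states to an integer lattice

start, goal = grid[0].astype(int), grid[1].astype(int)
path, pos = [tuple(start)], start.copy()
while not np.array_equal(pos, goal):        # greedy axis-aligned moves
    axis = int(np.argmax(np.abs(goal - pos)))
    pos[axis] += np.sign(goal[axis] - pos[axis])
    path.append(tuple(pos))
print("lattice decision path:", path[:5], "...", path[-1])
```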
Article
Full-text available
Public Transportation (PT) is a universal service in most countries, and it is acknowledged for its social and environmental role in enhancing accessibility and promoting a sustainable transport system. However, when disruptions alter the service, the level of service (LoS) can be massively affected. Consequently, the perceived quality can be influenced, and users can be encouraged (or forced) to modify their subsequent modal choice, in accordance with the users’ socioeconomic profile. A survival analysis, namely a Cox proportional hazards model, was tested in Bologna, Italy, using real data provided by TPER S.p.A, specifically Automatic Vehicle Location (AVL) and Automatic Passenger Counter (APC). This analysis aimed to assess the variations in demand over time taking into account variables related to the socioeconomic characteristics of the demand and several service attributes. The results contribute to the literature in several ways. Firstly, they confirm the predominant role of PT in the modal alternative spectrum of disadvantaged users. Secondly, they provide insights into the perception of quality service among different user categories, including commuters and non-frequent users.
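As an illustrative sketch of the Cox proportional hazards analysis described, using the lifelines package on synthetic stand-in data rather than the TPER AVL/APC records (all column names and distributions are invented):

```python
# Illustrative Cox proportional hazards fit with the 'lifelines' package.
# Synthetic stand-in for the AVL/APC data used in the study.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(4)
n = 300
df = pd.DataFrame({
    "duration": rng.exponential(30, n),    # days until the user's next PT trip
    "returned": rng.integers(0, 2, n),     # 1 = user observed returning to PT
    "commuter": rng.integers(0, 2, n),     # socioeconomic covariate (toy)
    "delay_min": rng.normal(5, 2, n),      # disruption severity proxy (toy)
})
cph = CoxPHFitter()
cph.fit(df, duration_col="duration", event_col="returned")
cph.print_summary()                        # hazard ratios per covariate
```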
Article
Full-text available
As the demand for faster and more reliable mobile networks intensifies, the deployment of 5G has emerged as a transformative solution to meet the growing needs of connectivity. However, to fully leverage the potential of 5G networks, it is crucial to optimize their performance. This paper explores the application of Artificial Intelligence (AI) and Machine Learning (ML) algorithms in the performance tuning of 5G networks, focusing on areas such as network resource allocation, traffic prediction, and real-time decision-making. By analyzing vast datasets generated by 5G infrastructures, AI and ML enable dynamic adjustments, thereby improving network efficiency, reducing latency, and enhancing overall user experience. The integration of these advanced technologies allows for self-optimizing networks that adapt in real time, minimizing human intervention and operational costs. This research highlights the key algorithms and techniques used for performance optimization, discusses the challenges of implementing AI in real-world 5G networks, and outlines the future directions for achieving fully autonomous network management. Ultimately, the study illustrates how AI and ML are pivotal in driving the future of telecommunications through intelligent 5G network tuning.
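One of the tasks named above is traffic prediction; here is a minimal hedged sketch with scikit-learn that predicts the next value of a synthetic cell-load series from its lagged values (a real system would use live 5G telemetry and a more elaborate model):

```python
# Hedged sketch of cell-traffic prediction from lagged load values.
# Synthetic series; features would come from 5G infrastructure telemetry.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(5)
t = np.arange(2000)
load = 50 + 20 * np.sin(2 * np.pi * t / 96) + rng.normal(0, 3, t.size)  # daily cycle

LAGS = 8                                    # predict from the last 8 samples
X = np.column_stack([load[i:len(load) - LAGS + i] for i in range(LAGS)])
y = load[LAGS:]

split = int(0.8 * len(y))                   # time-ordered train/test split
model = GradientBoostingRegressor().fit(X[:split], y[:split])
print("R^2 on held-out data:", round(model.score(X[split:], y[split:]), 3))
```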
Article
Full-text available
While Artificial Intelligence (AI), Data Science and Data Integration have dramatically changed the technological landscape in their own right, their impact is even more dramatic when combined. This article looks into the commonalities in these fields of practice and shows them to be extracts from a framework that incorporates integrated data management, advanced analytics capabilities and intelligent algorithms which collectively have had profound impacts on how we continue to innovate across industries. The article then explores the nature of these technologies at present, the associated challenges and their future potential to yield insights into strategic approaches toward the alignment of AI, Data Science and Data Integration with the common goal of converting data into actionable intelligence.
Article
Full-text available
Objetivo: Explorar as relações entre Ciência da Informação e Ciência de Dados de modo a identificar possíveis aplicações em Sistemas de Informação Governamentais, relacionando essas intersecções com benefícios relacionados ao Governo Eletrônico. Metodologia: Pesquisa exploratória com abordagem qualitativa, utilizando revisão bibliográfica e análise de conteúdo para análise dos dados obtidos, contextualização e classificação dos resultados.Resultados: A partir do levantamento bibliográfico encontraram-se significativas interações entre Ciência de Dados e Ciência da Informação com aplicações possíveis em Sistemas de Informação Governamentais, tendo-se criado grupos categóricos representativos dessas intersecções: dados abertos, interoperabilidade, Gestão do Conhecimento, analytics e Inteligência Artificial.Conclusões: Evidencia-se que os Sistemas de Informação Governamentais tendem a incorporar tecnologias para atender demandas por transparência, responsabilização, acesso à informação e melhores serviços governamentais, podendo-se correlacionar Ciência da Informação e Ciência de Dados em áreas como Dados Abertos, interoperabilidade, Gestão do Conhecimento, analytics e Inteligência Artificial. Entende-se que o estudo e aplicação dessas correlações pode gerar benefícios à operação de Sistemas de Informação Governamentais, considerando competências informacionais da Ciência da Informação como suporte ao uso de tecnologias, contribuindo para maior interação entre governo e sociedade e ao atendimento das necessidades informacionais envolvidas, havendo ampla possibilidade de estudos e aplicações da temática.
Article
Full-text available
Areas of computational mechanics such as uncertainty quantification and optimization usually involve repeated evaluation of numerical models that represent the behavior of engineering systems. In the case of complex nonlinear systems, however, these models tend to be expensive to evaluate, making surrogate models quite valuable. Artificial neural networks approximate systems very well by taking advantage of the inherent information in their training data. In this context, this paper investigates the improvement of the training process by including sensitivity information, i.e., partial derivatives w.r.t. inputs, as outlined by Sobolev training. In computational mechanics, sensitivities can be applied to neural networks by expanding the training loss function with additional loss terms, thereby improving training convergence and lowering generalisation error. This improvement is shown in two examples of linear and non-linear material behavior. More specifically, the Sobolev-designed loss function is expanded with residual weights that adjust the effect of each loss term on the training step. Residual weighting is the scaling given to the different training data, which in this case are the response and the sensitivities. These residual weights are optimized by an adaptive scheme, whereby varying objective functions are explored, with some showing improvements in the accuracy and precision of the overall training convergence.
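A minimal PyTorch sketch of a Sobolev-style loss as described: the usual response MSE plus a residual-weighted sensitivity MSE, with the input gradient obtained via autograd. The toy function, network size, and fixed residual weights are assumptions; the paper adapts the weights during training:

```python
# Sketch of a Sobolev-style loss: response MSE plus a weighted sensitivity
# (input-gradient) MSE term. Residual weights w_r, w_s are fixed here for
# illustration; the paper optimizes them adaptively.
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

x = torch.linspace(-1, 1, 128).unsqueeze(1).requires_grad_(True)
y_true = (x ** 3).detach()          # toy response data
dy_true = (3 * x ** 2).detach()     # its known sensitivity dy/dx

w_r, w_s = 1.0, 0.5                 # residual weights (illustrative)
for step in range(500):
    opt.zero_grad()
    y = net(x)
    # d(net)/dx via autograd; keep the graph so it can be backpropagated
    dy = torch.autograd.grad(y.sum(), x, create_graph=True)[0]
    loss = w_r * torch.mean((y - y_true) ** 2) + w_s * torch.mean((dy - dy_true) ** 2)
    loss.backward()
    opt.step()
print("final Sobolev loss:", float(loss))
```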
Article
Full-text available
This article explores the concept of digitalisation and its impact on marketing management in the tourism industry. The study examines the evolution of digitalisation from its origins in the 1960s and 1970s to its current role in the economy, with a particular focus on the tourism sector. Objective of the research. The aim is to explain the multifaceted nature of digitalisation and highlight its importance as a transformative force in the tourism industry. The study aims to demonstrate how digitalisation fosters innovation, increases operational efficiency and creates personalised and interactive experiences for consumers. It seeks to provide a clear understanding of how digital tools and technologies can be strategically integrated into marketing practices to improve business results and customer satisfaction. Research methodology. The study employs a comprehensive literature review, in which the various definitions and interpretations of digitalisation proposed by renowned scholars are subjected to analysis. Furthermore, case studies and practical applications of digital technologies, including Big Data, blockchain, artificial intelligence, mobile applications, virtualisation technologies, and the Internet of Things (IoT) in the tourism industry, are also considered. A comparative analysis is applied to identify the most informative and widely recognised definitions of digitalisation, with particular emphasis on the definition proposed by Laudon and Laudon (2019) due to its clarity and comprehensiveness. Research results. The findings indicate that digitalisation is a critical catalyst for rewriting the travel narrative for consumers, transforming destinations into dynamic, interactive, and responsive ecosystems. The incorporation of digital technologies within the tourism sector not only optimises operational efficiency but also markedly enhances the customer experience. The implementation of personalised marketing, the utilisation of AI-based customer service tools and the application of predictive analytics have been identified as key benefits. However, the study also identifies challenges associated with data privacy, integration with existing systems, and the necessity for continuous adaptation and improvement of CRM strategies to align with evolving customer expectations. Practical implications. Digitalisation opens up significant opportunities for the tourism industry to innovate and maintain competitive advantage in a rapidly changing market environment.
Article
This study employs advanced data mining techniques to investigate the DASS‐42 questionnaire, a widely used psychological assessment tool. Administered to 680 students at Necmettin Erbakan University's Ahmet Kelesoglu Faculty of Education, the DASS‐42 comprises three distinct subscales—depression, anxiety and stress—each consisting of 14 items. Departing from traditional statistical methodologies, the study harnesses the power of the WEKA data mining program to analyse the dataset. Employing Naive Bayes (NB), Artificial Neural Network (ANN), Logistic Regression (LR), Support Vector Machine (SVM) and Random Forest (RF) algorithms, the research unveils novel insights. The ANN method emerges as a standout performer, achieving remarkable distinctiveness scores for all subscales: depression (99.26%), anxiety (98.67%) and stress (97.35%). The study highlights the potential of data mining in enhancing psychological assessment and showcases the ANN's prowess in capturing intricate patterns within complex psychological dimensions. By charting a course beyond conventional statistical methods, this research pioneers a new frontier for employing data mining within the realm of social sciences. As a result of the study, it is recommended that teacher candidates in the teacher education process should have knowledge about depression, anxiety and stress, and relevant courses on these topics should be added to the curriculum of teacher education programs.
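A scikit-learn analogue of the five-algorithm comparison reported above, run on synthetic data rather than the DASS-42 responses; model settings and the cross-validation scheme are illustrative assumptions:

```python
# scikit-learn analogue of the WEKA comparison above (synthetic data, not DASS-42).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# 680 samples, 14 features: mirrors the study's scale, content is synthetic
X, y = make_classification(n_samples=680, n_features=14, random_state=0)
models = {
    "NB": GaussianNB(),
    "ANN": MLPClassifier(max_iter=1000, random_state=0),
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "RF": RandomForestClassifier(random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```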
Article
This article explores the transformative potential of quantum computing in the field of business analytics. It begins with an introduction to quantum computing, explaining its fundamental principles and recent advancements. The study highlights the limitations of current business analytics methods and demonstrates how quantum computing could address these limitations by offering enhanced data processing capabilities, advanced algorithms, and solutions to complex optimization problems. A comprehensive literature review is conducted to provide context and identify gaps in the existing research. The article then outlines a research design that incorporates both real-world and simulated data, using online datasets and quantum computing frameworks for analysis. The findings reveal significant opportunities for quantum computing to revolutionize business analytics, including improved efficiency, accuracy, and the ability to solve previously intractable problems. However, the article also addresses key challenges such as technical limitations, cost, accessibility, and integration issues. The discussion highlights emerging trends and provides strategic recommendations for businesses considering the adoption of quantum computing. The article concludes with a summary of the implications of integrating quantum computing into business analytics and reflects on future potential and challenges.
Chapter
This chapter explores the ethical considerations and challenges associated with AI-driven visualizations. It highlights the importance of ethics in maintaining trust, fairness, transparency, and privacy. The chapter discusses key challenges such as bias, transparency, privacy, accountability, and accessibility. Strategies for addressing these challenges include implementing ethical AI frameworks, enhancing transparency, promoting fairness, ensuring privacy, and fostering an ethical culture. Case studies from IBM Watson and Microsoft AI are examined to illustrate these points. Future trends in AI and ML for data visualization are also considered, emphasizing the need for responsible use of technology.
Article
Full-text available
In the rapidly evolving field of commerce, data analytics has emerged as a transformative tool for driving growth and competitive advantage. This paper explores the role of data analytics in commerce, emphasizing how businesses leverage insights to make informed decisions, enhance operational efficiency, and stimulate growth. By integrating data-driven strategies, companies can uncover valuable patterns and trends that inform product development, marketing strategies, and customer engagement initiatives. The paper examines various data analytics techniques, including descriptive, diagnostic, predictive, and prescriptive analytics, and their applications in different commercial contexts. Additionally, it highlights case studies of organizations that have successfully implemented data analytics to achieve significant business outcomes. The discussion extends to the challenges of integrating data analytics into business operations, such as data quality, privacy concerns, and the need for skilled personnel. By addressing these challenges and showcasing best practices, the paper provides a comprehensive understanding of how data analytics can be effectively harnessed to drive commerce growth and innovation.
Article
Full-text available
Crisis management has become a critical aspect of modern business and public administration, especially in the face of global crises such as economic recessions, pandemics, and natural disasters. In this context, digital technologies are playing an increasingly important role, providing new tools and approaches for effective crisis management. The definition of crisis management includes a set of measures aimed at identifying, assessing and neutralizing crisis situations, as well as minimizing their negative consequences. It is a management discipline that covers strategic, operational and tactical actions that allow organizations to respond quickly to changes in the external and internal environment. The role of digital technologies in modern management cannot be overestimated. They provide tools for the rapid collection, analysis and processing of information, which is critical in crisis situations. For example, Big Data management systems allow analyzing huge amounts of information in real time, which contributes to a more accurate assessment of the situation and informed decision-making. Cloud technologies provide access to resources and data from anywhere in the world, which is especially important in a crisis when it is necessary to ensure the continuity of business processes and the work of teams on a remote basis. Big data analytics is one of the key components of digital technologies in crisis management. It allows collecting and analyzing data from various sources, including social media, news, internal company systems, etc., to identify potential crises at early stages and predict their development. This enables organizations to respond quickly to threats and minimize negative consequences. Cloud technologies provide flexibility and scalability of the IT infrastructure, allowing organizations to quickly adapt to changes in the external environment and ensure business continuity. They also help reduce IT infrastructure costs and increase resource efficiency. Artificial intelligence and machine learning are powerful tools for automating crisis management processes. They can be used to analyze large amounts of data, detect anomalies, predict the development of crisis situations, and support decision-making. Machine learning algorithms can analyze historical data to identify patterns that precede crises and recommend appropriate actions to prevent them. Digital platforms and tools for communication and collaboration, such as Microsoft Teams, Slack, Zoom, ensure continuous interaction between employees and teams, which is critical in crisis situations. They allow for quick information exchange, virtual meetings, and coordination of actions, which contributes to more effective crisis management. Practical cases of successful use of digital technologies in crisis management include the experience of large corporations, government organizations, and international organizations. For example, Microsoft uses Azure cloud technologies to ensure business continuity during crises such as the COVID-19 pandemic. Government agencies, such as the US Federal Emergency Management Agency (FEMA), use big data management systems and analytics to coordinate actions during natural disasters. Recommendations for the implementation of digital technologies in crisis management include strategies and steps for successful implementation, the role of management, IT department and employees, as well as planning and preparation for implementation. It is important that the organization's management understands the importance of digital technologies in crisis management and provides the necessary resources for their implementation. The IT department should be prepared to quickly deploy new technologies and ensure their smooth operation. Employees should be trained to use new tools and technologies, which may require additional training and professional development. Planning and preparation for implementation include the development of a detailed action plan that covers all stages of digital technology implementation, from needs assessment and technology selection to deployment and integration into existing business processes. It is also important to ensure monitoring and evaluation of the effectiveness of the implemented technologies in order to be able to identify and eliminate possible problems in time. The conclusions summarize the importance of digital technologies for crisis management, the main results of the study, and prospects for further research in this area. In particular, it is noted that the use of digital technologies allows organizations to more effectively manage crisis situations, reduce risks and minimize negative consequences. Prospects for further research include the study of new technologies, such as blockchain and the Internet of Things (IoT), and their potential for crisis management.
Chapter
The concept of smart cities has emerged as a response to the increasing urbanization and the need to create sustainable, efficient, and livable environments for the growing urban population (Angelidou, J. Urban Technol. 24:3–28, 2017, [8]).
Chapter
Human geography and urban planning have long relied on a variety of data sources to understand spatial patterns, analyze demographic trends, and inform policy decisions.
Article
Full-text available
This study proposes a novel multimodal approach for mixed-frequency time series forecasting in the oil industry, enabling the use of high-frequency (HF) data in their original frequency. We specifically address the challenge of integrating HF data streams, such as pressure and temperature measurements, with daily time series without introducing noise. Our approach was compared with existing econometric regression model mixed-data sampling (MIDAS) and with the data-driven models N-HiTS and a GRU-based network, across short-, medium-, and long-term prediction horizons. Additionally, we validated the proposed method on datasets from other domains beyond the oil industry. The experimental results indicate that our multimodal approach significantly improves long-term prediction accuracy.
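As a simplified sketch of the mixed-frequency idea (closer to an unrestricted MIDAS regression than to the paper's multimodal model), the following numpy example regresses a daily target on weighted hourly lags; all data are synthetic:

```python
# Simplified MIDAS-style alignment: regress a daily target on weighted
# intraday (hourly) lags of a high-frequency series. Synthetic data;
# the paper's multimodal model is considerably more elaborate.
import numpy as np

rng = np.random.default_rng(6)
days, per_day = 200, 24
hf = rng.normal(size=(days, per_day))              # hourly sensor readings
true_w = np.exp(-0.2 * np.arange(per_day))         # recent hours matter more
daily_y = hf @ true_w + rng.normal(0, 0.1, days)   # daily target series

# Least-squares estimate of the lag weights ("U-MIDAS" flavour)
w_hat, *_ = np.linalg.lstsq(hf, daily_y, rcond=None)
print("weight recovery corr:", np.corrcoef(true_w, w_hat)[0, 1].round(3))
```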
Article
Full-text available
In today's digital era, organizations are inundated with vast amounts of data, presenting both challenges and opportunities for businesses. Big data analytics has emerged as a powerful tool to extract valuable insights from this abundance of data. However, for organizations to derive maximum benefit from big data analytics, alignment with business objectives is crucial. This paper explores the relationship between big data analytics and business alignment, aiming to elucidate the ways in which organizations can effectively integrate analytics into their strategic decision-making processes. The research begins by providing an overview of big data analytics and its significance in the contemporary business landscape. It then delves into the concept of business alignment, elucidating its importance for organizational success. Drawing on existing literature and empirical studies, the paper examines the various dimensions of alignment between big data analytics initiatives and business goals, including organizational culture, leadership support, technological infrastructure, and human resources capabilities. Furthermore, the paper investigates the challenges and barriers that organizations may encounter in aligning big data analytics with business objectives, such as data quality issues, skill shortages, and cultural resistance to change. Strategies and best practices for overcoming these challenges are also discussed, drawing on real-world examples and case studies. Ultimately, this research contributes to the growing body of knowledge on big data analytics by highlighting the critical role of business alignment in maximizing the value of analytics investments. By fostering alignment between analytics initiatives and strategic business objectives, organizations can enhance their competitiveness, drive innovation, and achieve sustainable growth in an increasingly data-driven world.
Research
Full-text available
Project Research Cryptography at Tartu University, Estonia
Article
Full-text available
The architecture of Blue Martini Software's e-commerce suite has supported data collection, data transformation, and data mining since its inception. With clickstreams being collected at the application-server layer, high-level events being logged, and data automatically transformed into a data warehouse using meta-data, common problems plaguing data mining using weblogs (e.g., sessionization and conflating multi-sourced data) were obviated, thus allowing us to concentrate on actual data mining goals. The paper briefly reviews the architecture and discusses many lessons learned over the last four years and the challenges that still need to be addressed. The lessons and challenges are presented across two dimensions: business-level vs. technical, and throughout the data mining lifecycle stages of data collection, data warehouse construction, business intelligence, and deployment. The lessons and challenges are also widely applicable to data mining domains outside retail e-commerce.
Article
Full-text available
The promise of data-driven decision-making is now being recognized broadly, and there is growing enthusiasm for the notion of "Big Data," including the recent announcement from the White House about new funding initiatives across different agencies, that target research for Big Data. While the promise of Big Data is real -- for example, it is estimated that Google alone contributed 54 billion dollars to the US economy in 2009 -- there is no clear consensus on what is Big Data. In fact, there have been many controversial statements about Big Data, such as "Size is the only thing that matters." In this panel we will try to explore the controversies and debunk the myths surrounding Big Data.
Conference Paper
Full-text available
We present Resilient Distributed Datasets (RDDs), a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner. RDDs are motivated by two types of applications that current computing frameworks handle inefficiently: iterative algorithms and interactive data mining tools. In both cases, keeping data in memory can improve performance by an order of magnitude. To achieve fault tolerance efficiently, RDDs provide a restricted form of shared memory, based on coarse-grained transformations rather than fine-grained updates to shared state. However, we show that RDDs are expressive enough to capture a wide class of computations, including recent specialized programming models for iterative jobs, such as Pregel, and new applications that these models do not capture. We have implemented RDDs in a system called Spark, which we evaluate through a variety of user applications and benchmarks.
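A minimal PySpark example of the operations this abstract describes: coarse-grained transformations, caching an RDD in memory, and actions that materialise it (assumes a local Spark installation; the sample log lines are invented):

```python
# Minimal RDD example in PySpark (assumes a local Spark installation).
# Demonstrates coarse-grained transformations and in-memory caching.
from pyspark import SparkContext

sc = SparkContext("local[2]", "rdd-demo")
lines = sc.parallelize(["error: disk", "ok", "error: net", "ok"])

errors = lines.filter(lambda s: s.startswith("error")).cache()  # kept in memory
print("error count:", errors.count())          # first action materialises the RDD
print("by kind:", errors.map(lambda s: (s.split(": ")[1], 1))
                        .reduceByKey(lambda a, b: a + b)
                        .collect())
sc.stop()
```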
Conference Paper
Full-text available
Web applications have come a long way, both in their adoption for providing information and services and in the technologies used to develop them. With the emergence of richer and more advanced technologies such as AJAX, web applications have become more interactive, responsive, and user friendly. These applications, often called Rich Internet Applications (RIAs), changed web applications in two ways: (1) dynamic manipulation of client-side state and (2) asynchronous communication with the server. However, at the same time, such techniques also introduced new challenges. One important challenge is the difficulty of automatically crawling these new applications. Without crawling, RIAs can be neither indexed nor tested automatically. Traditional crawlers are not able to handle these newer technologies. This paper surveys the research on addressing the problem of crawling RIAs and provides some experimental results to compare existing crawling strategies. In addition, we provide some future directions for research on crawling RIAs.
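The core loop that RIA crawling strategies share can be sketched as a breadth-first exploration of client-side states, where a state is (say) a hash of the DOM and transitions are fired events. The sketch below is illustrative; `events_of` and `fire` stand in for a real browser driver such as Selenium.

```python
# Toy breadth-first exploration of client-side states. `events_of` enumerates
# the events a state exposes; `fire` executes an event and returns a hash of
# the resulting DOM. Both are stand-ins for a real browser driver.
from collections import deque

def crawl(initial_state, events_of, fire):
    seen = {initial_state}
    frontier = deque([initial_state])
    model = []                                  # (state, event, next_state)
    while frontier:
        state = frontier.popleft()
        for event in events_of(state):
            nxt = fire(state, event)
            model.append((state, event, nxt))
            if nxt not in seen:                 # only enqueue unseen states
                seen.add(nxt)
                frontier.append(nxt)
    return model
```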
Article
Full-text available
We are now seeing governments and funding agencies looking at ways to increase the value and pace of scientific research through increased or open access to both data and publications. In this point-of-view article, we look at another aspect of these twin revolutions, namely, how to enable developers, designers, and researchers to build intuitive, multimodal, user-centric scientific applications that can aid and enable scientific research.
Article
Full-text available
This document is obsolete. The definitive document is Standard ECMA-404 The JSON Data Interchange Syntax. JavaScript Object Notation (JSON) is a lightweight, text-based, language-independent data interchange format. It was derived from the ECMAScript Programming Language Standard. JSON defines a small set of formatting rules for the portable representation of structured data.
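As a concrete illustration of that small rule set, Python's standard-library `json` module round-trips the format's basic types (objects, arrays, strings, numbers, booleans, null):

```python
# JSON's formatting rules in practice: serialize a structured record to a
# portable string and parse it back unchanged.
import json

record = {"id": 42, "tags": ["big data", "survey"], "score": 0.97, "draft": None}
text = json.dumps(record)            # serialize to a portable string
assert json.loads(text) == record    # parse back to the same structure
```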
Article
We provide here an overview of the new and rapidly emerging research area of privacy-preserving data mining. We also propose a classification hierarchy that sets the basis for analyzing the work which has been performed in this context. A detailed review of the work accomplished in this area is also given, along with each work's coordinates within the classification hierarchy. A brief evaluation is performed, and some initial conclusions are made.
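One classic technique that falls under this umbrella is additive value perturbation: numeric values are released with random noise so that individual records are masked while aggregate statistics survive. A toy sketch (illustrative of the general idea, not the paper's own classification):

```python
# Additive value perturbation: release each numeric value with Gaussian noise;
# the miner works with aggregates of the perturbed data.
import random

def perturb(values, scale=10.0):
    return [v + random.gauss(0.0, scale) for v in values]

ages = [23, 37, 41, 29, 55]
released = perturb(ages)
# Individual values are masked, but the mean is approximately preserved.
print(sum(released) / len(released), sum(ages) / len(ages))
```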
Article
Editor's Note: This article proposes several models of community, including a model of "mobile community"---an extension of physical community merged with online community. The authors also provide examples of how these models have contributed to the development of community applications in their work at Samsung. ---Hugh Dubberly
Conference Paper
This presentation will set out the eScience agenda by explaining the current scientific data deluge and the case for a “Fourth Paradigm” for scientific exploration. Examples of data intensive science will be used to illustrate the explosion of data and the associated new challenges for data capture, curation, analysis, and sharing. The role of cloud computing, collaboration services, and research repositories will be discussed.
Article
Dryad is a general-purpose distributed execution engine for coarse-grain data-parallel applications. A Dryad application combines computational "vertices" with communication "channels" to form a dataflow graph. Dryad runs the application by executing the vertices of this graph on a set of available computers, communicating as appropriate through files, TCP pipes, and shared-memory FIFOs. The vertices provided by the application developer are quite simple and are usually written as sequential programs with no thread creation or locking. Concurrency arises from Dryad scheduling vertices to run simultaneously on multiple computers, or on multiple CPU cores within a computer. The application can discover the size and placement of data at run time, and modify the graph as the computation progresses to make efficient use of the available resources. Dryad is designed to scale from powerful multi-core single computers, through small clusters of computers, to data centers with thousands of computers. The Dryad execution engine handles all the difficult problems of creating a large distributed, concurrent application: scheduling the use of computers and their CPUs, recovering from communication or computer failures, and transporting data between vertices.
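The vertex/channel model can be mimicked in a few lines: vertices are sequential functions, channels carry outputs to downstream inputs, and a scheduler runs whichever vertices have all inputs ready. This is a toy sketch in the spirit of the description above, not Dryad's actual API:

```python
# Toy vertex/channel dataflow: vertices are sequential functions of their
# input list; a vertex runs once every upstream channel has delivered.
from collections import defaultdict

def run_dataflow(vertices, edges, sources):
    inputs = defaultdict(list, {k: list(v) for k, v in sources.items()})
    indegree = defaultdict(int)
    for _, dst in edges:
        indegree[dst] += 1
    ready = [v for v in vertices if indegree[v] == 0]
    outputs = {}
    while ready:
        v = ready.pop()
        outputs[v] = vertices[v](inputs[v])
        for src, dst in edges:                  # push along outgoing channels
            if src == v:
                inputs[dst].append(outputs[v])
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    ready.append(dst)
    return outputs

out = run_dataflow(
    vertices={"merge": lambda xs: sum(xs, []),   # concatenate input lists
              "count": lambda xs: len(xs[0])},
    edges=[("merge", "count")],
    sources={"merge": [[1, 2], [3]]},
)
print(out["count"])   # -> 3
```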
Article
This paper presents BCube, a new network architecture specifically designed for shipping-container based, modular data centers. At the core of the BCube architecture is its server-centric network structure, where servers with multiple network ports connect to multiple layers of COTS (commodity off-the-shelf) mini-switches. Servers act as not only end hosts, but also relay nodes for each other. BCube supports various bandwidth-intensive applications by speeding-up one-to-one, one-to-several, and one-to-all traffic patterns, and by providing high network capacity for all-to-all traffic. BCube exhibits graceful performance degradation as the server and/or switch failure rate increases. This property is of special importance for shipping-container data centers, since once the container is sealed and operational, it becomes very difficult to repair or replace its components. Our implementation experiences show that BCube can be seamlessly integrated with the TCP/IP protocol stack and BCube packet forwarding can be efficiently implemented in both hardware and software. Experiments in our testbed demonstrate that BCube is fault-tolerant and load-balancing, and that it significantly accelerates representative bandwidth-intensive applications.
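The server-centric structure is easy to state programmatically: in BCube(n, k), each server has an address of k+1 base-n digits, and its level-i switch connects it to the n-1 servers whose addresses differ only in digit i. A small sketch of that neighbor rule (the digit ordering below is a convention chosen for illustration):

```python
# BCube(n, k): each server address is a tuple of k+1 digits in range(n); the
# level-i switch connects servers whose addresses differ only in digit i.
def level_neighbors(addr, level, n):
    """Servers one switch-hop away from `addr` at the given level."""
    return [addr[:level] + (d,) + addr[level + 1:]
            for d in range(n) if d != addr[level]]

# BCube(4, 1), server (2, 3):
print(level_neighbors((2, 3), 0, 4))   # [(0, 3), (1, 3), (3, 3)]
print(level_neighbors((2, 3), 1, 4))   # [(2, 0), (2, 1), (2, 2)]
```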
Article
Modern data centers are massive, and support a range of distributed applications across potentially hundreds of server racks. As their utilization and bandwidth needs continue to grow, traditional methods of augmenting bandwidth have proven complex and costly in time and resources. Recent measurements show that data center traffic is often limited by congestion loss caused by short traffic bursts. Thus an attractive alternative to adding physical bandwidth is to augment wired links with wireless links in the 60 GHz band. We address two limitations with current 60 GHz wireless proposals. First, 60 GHz wireless links are limited by line-of-sight, and can be blocked by even small obstacles. Second, even beamforming links leak power, and potential interference will severely limit concurrent transmissions in dense data centers. We propose and evaluate a new wireless primitive for data centers, 3D beamforming, where 60 GHz signals bounce off data center ceilings, thus establishing indirect line-of-sight between any two racks in a data center. We build a small 3D beamforming testbed to demonstrate its ability to address both link blockage and link interference, thus improving link range and number of concurrent transmissions in the data center. In addition, we propose a simple link scheduler and use traffic simulations to show that these 3D links significantly expand wireless capacity compared to their 2D counterparts.
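The geometry behind ceiling bounces is simple enough to sketch: mirroring the receiver across the ceiling plane turns the reflected path into a straight line, giving the path length and launch angle from just the rack separation and ceiling clearance. A toy calculation (free-space geometry only, ignoring reflection loss):

```python
# Mirror-image geometry for a ceiling-bounce link: reflect the receiver across
# the ceiling plane and the bounced path becomes a straight line.
import math

def bounce_path_length(d, h):
    """d: horizontal rack separation (m); h: ceiling height above antennas (m)."""
    return math.hypot(d, 2 * h)   # straight line to the mirrored receiver

def launch_angle_deg(d, h):
    """Elevation angle at which to aim the transmit beam."""
    return math.degrees(math.atan2(2 * h, d))

print(bounce_path_length(10.0, 2.0), launch_angle_deg(10.0, 2.0))
```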
Article
Nowadays, attacking passwords is one of the most straightforward attack vectors for gaining access to an information system. There are numerous feasible ways to guess or crack passwords, using different methods, approaches, and tools. This paper analyzes the possibilities of these tools and demonstrates, through tests, how password guessing can be carried out with different methods, together with a comparison of input dictionary lists. The overall aim is to serve the reader's potential needs: preventing password cracking, information security auditing, password recovery, security policy, etc.
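The dictionary-guessing approach that such tools automate reduces to hashing each candidate word and comparing digests. A toy sketch with an unsalted SHA-256 hash (real systems use salted, deliberately slow hashes, which is exactly what makes this attack expensive):

```python
# Minimal dictionary attack: hash each wordlist candidate and compare against
# the stored digest. Toy example with unsalted SHA-256.
import hashlib

def dictionary_guess(target_hex, wordlist):
    for word in wordlist:
        if hashlib.sha256(word.encode()).hexdigest() == target_hex:
            return word
    return None

stored = hashlib.sha256(b"sunshine").hexdigest()
print(dictionary_guess(stored, ["password", "letmein", "sunshine"]))
```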
Article
GOOD with numbers? Fascinated by data? The sound you hear is opportunity knocking. Mo Zhou was snapped up by I.B.M. last summer, as a freshly minted Yale M.B.A., to join the technology company's fast-growing ranks of data consultants. They help businesses make sense of an explosion of data — Web traffic and social network comments, as well as software and sensors that monitor shipments, suppliers and customers — to guide decisions, trim costs and lift sales. "I've always had a love of numbers," says Ms. Zhou, whose job as a data analyst suits her skills. To exploit the data flood, America will need many more like her. A report last year by the McKinsey Global Institute, the research arm of the consulting firm, projected that the United States needs 140,000 to 190,000 more workers with "deep analytical" expertise and 1.5 million more data-literate managers, whether retrained or hired. The impact of data abundance extends well beyond business. Justin Grimmer, for example, is one of the new breed of political scientists. A 28-year-old assistant professor at Stanford, he combined math with political science in his undergraduate and graduate studies, seeing "an opportunity because the discipline is becoming increasingly data-intensive." His research involves the computer-automated analysis of blog postings, Congressional speeches and press releases, and news articles, looking for insights into how political ideas spread. The story is similar in fields as varied as science and sports, advertising and public health — a drift toward data-driven discovery and decision-making. "It's a revolution," says Gary King, director of Harvard's Institute for Quantitative Social Science. "We're really just getting under way. But the march of quantification, made possible by enormous new sources of data, will sweep through academia, business and government. There is no area that is going to be untouched." Welcome to the Age of Big Data. The new megarich of Silicon Valley, first at Google and now Facebook, are masters at harnessing the data of the Web — online searches, posts and messages — with Internet advertising. At the World Economic Forum last month in Davos, Switzerland, Big Data was a marquee topic. A report by the forum, "Big Data, Big Impact," declared data a new class of economic asset, like currency or gold.
Conference Paper
In recent years, RFID technologies have been used in many applications, such as inventory checking and object tracking. However, raw RFID data are inherently unreliable due to physical device limitations and different kinds of environmental noise. Currently, existing work mainly focuses on RFID data cleansing in a static environment (e.g. inventory checking). It is therefore difficult to cleanse RFID data streams in a mobile environment (e.g. object tracking) using the existing solutions, which do not address the data missing issue effectively. In this paper, we study how to cleanse RFID data streams for object tracking, which is a challenging problem, since a significant percentage of readings are routinely dropped. We propose a probabilistic model for object tracking in a mobile environment. We develop a Bayesian inference based approach for cleansing RFID data using the model. In order to sample data from the movement distribution, we devise a sequential sampler that cleans RFID data with high accuracy and efficiency. We validate the effectiveness and robustness of our solution through extensive simulations and demonstrate its performance by using two real RFID applications of human tracking and conveyor belt monitoring.
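The flavor of Bayesian cleansing can be sketched with a tiny particle filter: particles track the object's zone, a motion model moves them, and a missed read merely reweights them instead of being taken as evidence of absence. This is a deliberately simplified illustration, not the paper's actual model or sampler:

```python
# Simplified particle-filter sketch of Bayesian RFID cleansing: particles
# carry hypotheses about the object's zone; a missed read (None) reweights
# rather than resets them.
import random

ZONES, P_MOVE, P_DETECT = 5, 0.3, 0.7

def step(particles, reading):
    # Motion model: a particle may drift to an adjacent zone.
    moved = [min(ZONES - 1, max(0, z + random.choice((-1, 0, 1))))
             if random.random() < P_MOVE else z
             for z in particles]
    # Measurement model: a reading favors particles in the read zone; a
    # missed read is treated as (nearly) uninformative.
    def weight(z):
        if reading is None:
            return 1.0
        return P_DETECT if z == reading else (1.0 - P_DETECT) / (ZONES - 1)
    return random.choices(moved, weights=[weight(z) for z in moved],
                          k=len(moved))

particles = [random.randrange(ZONES) for _ in range(500)]
for reading in [2, None, None, 3, None]:         # reads with dropouts
    particles = step(particles, reading)
print(max(set(particles), key=particles.count))  # most likely zone
```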
Conference Paper
Connections established by users of online social networks are influenced by mechanisms such as preferential attachment and triadic closure. Yet, recent research has found that geographic factors also constrain users: spatial proximity fosters the creation of online social ties. While the effect of space might need to be incorporated to these social mechanisms, it is not clear to which extent this is true and in which way this is best achieved. To address these questions, we present a measurement study of the temporal evolution of an online location-based social network. We have collected longitudinal traces over 4 months, including information about when social links are created and which places are visited by users, as revealed by their mobile check-ins. Thanks to this fine-grained temporal information, we test and compare whether different probabilistic models can explain the observed data adopting an approach based on likelihood estimation, quantitatively comparing their statistical power to reproduce real events. We demonstrate that geographic distance plays an important role in the creation of new social connections: node degree and spatial distance can be combined in a gravitational attachment process that reproduces real traces. Instead, we find that links arising because of triadic closure, where users form new ties with friends of existing friends, and because of common focus, where connections arise among users visiting the same place, appear to be mainly driven by social factors. We exploit our findings to describe a new model of network growth that combines spatial and social factors. We extensively evaluate our model and its variations, demonstrating that it is able to reproduce the social and spatial properties observed in our traces. Our results offer useful insights for systems that take advantage of the spatial properties of online social services.
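The gravitational-attachment finding suggests a link-creation rule in which attachment probability grows with the target's degree and decays with distance. One illustrative form is P(target) ∝ degree / distance^α; the exponent and functional form here are assumptions for the sketch, not the paper's fitted model:

```python
# Toy gravity-style attachment: pick a link target with probability
# proportional to degree / distance**alpha.
import random

def pick_target(candidates, degree, dist, alpha=1.0):
    weights = [degree[c] / (dist[c] ** alpha) for c in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]

degree = {"a": 50, "b": 5, "c": 20}
dist_km = {"a": 400.0, "b": 2.0, "c": 40.0}
print(pick_target(["a", "b", "c"], degree, dist_km))
```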
Conference Paper
This paper introduces CIEL, a universal execution engine for distributed data-flow programs. Like previous execution engines, CIEL masks the complexity of distributed programming. Unlike those systems, a CIEL job can make data-dependent control-flow decisions, which enables it to compute iterative and recursive algorithms. We have also developed Skywriting, a Turing-complete scripting language that runs directly on CIEL. The execution engine provides transparent fault tolerance and distribution to Skywriting scripts and high-performance code written in other programming languages. We have deployed CIEL on a cloud computing platform, and demonstrate that it achieves scalable performance for both iterative and non-iterative algorithms.
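The distinguishing feature, data-dependent control flow, is easy to illustrate: the driver inspects intermediate results and decides at run time whether to spawn another round of tasks, something a fixed acyclic task graph cannot express directly. A toy sketch with a thread pool (not CIEL or Skywriting syntax):

```python
# Data-dependent control flow: the number of parallel rounds is decided from
# intermediate results at run time, not fixed in advance.
from concurrent.futures import ThreadPoolExecutor

def refine(x):                      # one parallel "task"
    return (x + 2.0 / x) / 2.0      # Newton step toward sqrt(2)

with ThreadPoolExecutor() as pool:
    estimates = [1.0, 3.0, 0.5]
    # Iterate until converged; a static, acyclic task graph cannot express
    # this run-time decision directly.
    while max(abs(e * e - 2.0) for e in estimates) > 1e-9:
        estimates = list(pool.map(refine, estimates))
print(estimates)
```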
Conference Paper
Multimedia event detection (MED) has a significant impact on many applications. Though video concept annotation has received much research effort, video event detection remains largely unaddressed. Current research mainly focuses on sports and news event detection or abnormality detection in surveillance videos. Our research on this topic is capable of detecting more complicated and generic events. Moreover, the curse of reality, i.e., that precisely labeled multimedia content is scarce, necessitates the study of how to attain respectable detection performance using only limited positive examples. Research addressing these two aforementioned issues is still in its infancy. In light of this, we explore Ad Hoc MED, which aims to detect complicated and generic events by using few positive examples. To the best of our knowledge, our work makes the first attempt on this topic. As the information from these few positive examples is limited, we propose to infer knowledge from other multimedia resources to facilitate event detection. Experiments are performed on real-world multimedia archives consisting of several challenging events. The results show that our approach outperforms several other detection algorithms. Most notably, our algorithm outperforms SVM by 43% and 14% in Average Precision when using the Gaussian and χ² kernels, respectively.
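The χ² kernel mentioned in the comparison is available off the shelf; as a sketch, it can be plugged into an SVM as a precomputed Gram matrix like so (toy random data, not the paper's MED features or pipeline):

```python
# chi-squared kernel SVM via scikit-learn's precomputed-kernel interface.
import numpy as np
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((40, 16))                  # nonnegative, histogram-like features
y = rng.integers(0, 2, size=40)

K = chi2_kernel(X, X, gamma=0.5)          # k(x, z) = exp(-gamma * chi2(x, z))
clf = SVC(kernel="precomputed").fit(K, y)
print(clf.predict(chi2_kernel(X[:3], X, gamma=0.5)))
```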
Conference Paper
With the widely deployed cloud services, data center networks are evolving toward large-scale, multi-path networks, which cannot be supported by conventional routing methods such as OSPF and RIP. To alleviate this issue, new routing methods, such as PortLand and BSR, have been proposed for data center networks. However, these routing methods are typically designed for a specific network architecture, and thus lack adaptability and require complex fault-tolerance mechanisms. To address this issue, this paper proposes a generic routing method, named fault-avoidance routing (FAR), for data center networks that have regular topologies. FAR simplifies route learning by leveraging the regularity in a topology. FAR also greatly reduces the size of routing tables by introducing a novel negative routing table (NRT) at routers. The operation of FAR is illustrated on an example fat-tree network and the performance of FAR is analyzed in detail. The advantages of FAR are verified through extensive OPNET simulations.
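The negative routing table idea can be sketched as a lookup that first collects next hops blacklisted for the destination and then falls back to the compact, topology-derived positive table. Table formats and names below are illustrative assumptions, not FAR's actual data structures:

```python
# Sketch of a two-table lookup: the negative table blacklists next hops for
# destinations behind failed links; the positive table holds the regular,
# topology-derived routes.
import ipaddress

def far_lookup(dst, positive_table, negative_table):
    """positive_table / negative_table: lists of (network, next_hop)."""
    addr = ipaddress.ip_address(dst)
    avoid = {hop for net, hop in negative_table
             if addr in ipaddress.ip_network(net)}
    for net, hop in positive_table:           # regular routes, checked in order
        if addr in ipaddress.ip_network(net) and hop not in avoid:
            return hop
    return None

positive = [("10.1.0.0/16", "portA"), ("10.1.0.0/16", "portB")]
negative = [("10.1.2.0/24", "portA")]         # link behind portA failed
print(far_lookup("10.1.2.7", positive, negative))   # -> "portB"
```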
Article
Increasingly, scientific computing applications must accumulate and manage massive datasets, as well as perform sophisticated computations over these data. Such applications call for data-intensive scalable computer (DISC) systems, which differ in fundamental ways from existing high-performance computing systems.
Article
We present a novel framework, CloudView, for storage, processing and analysis of massive machine maintenance data, collected from a large number of sensors embedded in industrial machines, in a cloud computing environment. This paper describes the architecture, design, and implementation of CloudView, and how the proposed framework leverages the parallel computing capability of a computing cloud based on a large-scale distributed batch processing infrastructure that is built of commodity hardware. A case-based reasoning (CBR) approach is adopted for machine fault prediction, where the past cases of failure from a large number of machines are collected in a cloud. A case-base of past cases of failure is created using the global information obtained from a large number of machines. CloudView facilitates organization of sensor data and creation of the case-base with global information. Case-base creation jobs are formulated using the MapReduce parallel data processing model. CloudView captures the failure cases across a large number of machines and shares the failure information with a number of local nodes in the form of case-base updates that occur on a time scale of every few hours. At the local nodes, the real-time sensor data from a group of machines in the same facility/plant is continuously matched against the cases from the case-base to predict incipient faults; this local processing takes a much shorter time of a few seconds. The case-base is updated regularly (on the time scale of a few hours) in the cloud to include new cases of failure, and these case-base updates are pushed from CloudView to the local nodes. Experimental measurements show that fault predictions can be made in real time (on a timescale of seconds) at the local nodes, and massive machine data analysis for case-base creation and updating can be done on a timescale of minutes in the cloud. Our approach, in addition to being the first reported use of the cloud architecture for maintenance data storage, processing, and analysis, also evaluates several possible cloud-based architectures that leverage the advantages of the parallel computing capabilities of the cloud to make local decisions with global information efficiently, while avoiding potential data bottlenecks that can occur in getting the maintenance data in and out of the cloud.
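The local-node side of the scheme, matching live sensor readings against the cloud-built case-base, is essentially nearest-neighbor retrieval. A minimal sketch (field names, the toy case-base, and the Euclidean metric are illustrative assumptions):

```python
# Nearest-neighbor case matching at a local node: compare a live sensor vector
# against case signatures and report the associated fault.
import math

case_base = [
    {"signature": [70.0, 0.9, 120.0], "fault": "bearing_wear"},
    {"signature": [95.0, 2.5, 180.0], "fault": "overheating"},
]

def predict_fault(reading):
    best = min(case_base, key=lambda c: math.dist(reading, c["signature"]))
    return best["fault"]

print(predict_fault([93.0, 2.2, 175.0]))   # -> "overheating"
```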
Article
The promise of data-driven decision-making is now being recognized broadly, and there is growing enthusiasm for the notion of "Big Data." While the promise of Big Data is real -- for example, it is estimated that Google alone contributed 54 billion dollars to the US economy in 2009 -- there is currently a wide gap between its potential and its realization. Heterogeneity, scale, timeliness, complexity, and privacy problems with Big Data impede progress at all phases of the pipeline that can create value from data. The problems start right away during data acquisition, when the data tsunami requires us to make decisions, currently in an ad hoc manner, about what data to keep and what to discard, and how to store what we keep reliably with the right metadata. Much data today is not natively in structured format; for example, tweets and blogs are weakly structured pieces of text, while images and video are structured for storage and display, but not for semantic content and search: transforming such content into a structured format for later analysis is a major challenge. The value of data explodes when it can be linked with other data, thus data integration is a major creator of value. Since most data is directly generated in digital format today, we have the opportunity and the challenge both to influence the creation to facilitate later linkage and to automatically link previously created data. Data analysis, organization, retrieval, and modeling are other foundational challenges. Data analysis is a clear bottleneck in many applications, both due to lack of scalability of the underlying algorithms and due to the complexity of the data that needs to be analyzed. Finally, presentation of the results and its interpretation by non-technical domain experts is crucial to extracting actionable knowledge. During the last 35 years, data management principles such as physical and logical independence, declarative querying, and cost-based optimization have led to a multi-billion-dollar industry. More importantly, these technical advances have enabled the first round of business intelligence applications and laid the foundation for managing and analyzing Big Data today. The many novel challenges and opportunities associated with Big Data necessitate rethinking many aspects of these data management platforms, while retaining other desirable aspects. We believe that appropriate investment in Big Data will lead to a new wave of fundamental technological advances that will be embodied in the next generations of Big Data management and analysis platforms, products, and systems. We believe that these research problems are not only timely, but also have the potential to create huge economic value in the US economy for years to come. However, they are also hard, requiring us to rethink data analysis systems in fundamental ways. A major investment in Big Data, properly directed, can result not only in major scientific advances, but also lay the foundation for the next generation of advances in science, medicine, and business.
Article
Along with the explosive growth of multimedia data, automatic multimedia tagging has attracted great interest of various research communities, such as computer vision, multimedia, and information retrieval. However, despite the great progress achieved in the past two decades, automatic tagging technologies still can hardly achieve satisfactory performance on real-world multimedia data that vary widely in genre, quality, and content. Meanwhile, the power of human intelligence has been fully demonstrated in the Web 2.0 era. If well motivated, Internet users are able to tag a large amount of multimedia data. Therefore, a set of new techniques has been developed by combining humans and computers for more accurate and efficient multimedia tagging, such as batch tagging, active tagging, tag recommendation, and tag refinement. These techniques are able to accomplish multimedia tagging by jointly exploring humans and computers in different ways. This article refers to them collectively as assistive tagging and conducts a comprehensive survey of existing research efforts on this theme. We first introduce the status of automatic tagging and manual tagging and then state why assistive tagging can be a good solution. We categorize existing assistive tagging techniques into three paradigms: (1) tagging with data selection & organization; (2) tag recommendation; and (3) tag processing. We introduce the research efforts on each paradigm and summarize the methodologies. We also provide a discussion on several future trends in this research direction.
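Of the three paradigms, tag recommendation is the easiest to sketch: suggest the tags that most often co-occur with those already assigned. A toy co-occurrence recommender (the corpus and scoring are illustrative, not a method from the survey):

```python
# Toy co-occurrence tag recommender: score candidate tags by how often they
# co-occur with the tags a user has already given.
from collections import Counter

corpus = [{"beach", "sunset", "sea"}, {"beach", "sea", "surf"},
          {"city", "night", "lights"}, {"sunset", "city"}]

def recommend(given, k=2):
    scores = Counter()
    for tags in corpus:
        if given & tags:
            scores.update(tags - given)
    return [t for t, _ in scores.most_common(k)]

print(recommend({"beach"}))   # e.g. ['sea', 'sunset']
```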
Article
Given the deluge of multimedia content that is becoming available over the Internet, it is increasingly important to be able to effectively examine and organize these large stores of information in ways that go beyond browsing or collaborative filtering. In this paper we review previous work on audio and video processing, and define the task of Topic-Oriented Multimedia Summarization (TOMS) using natural language generation: given a set of automatically extracted features from a video (such as visual concepts and ASR transcripts), a TOMS system will automatically generate a paragraph of natural language ("a recounting"), which summarizes the important information in a video belonging to a certain topic area, and provides explanations for why a video was matched and retrieved. We see this as a first step towards systems that will be able to discriminate visually similar, but semantically different videos, compare two videos and provide textual output, or summarize a large number of videos at once. In this paper, we introduce our approach to solving the TOMS problem. We extract visual concept features and ASR transcription features from a given video, and develop a template-based natural language generation system to produce a textual recounting based on the extracted features. We also propose possible experimental designs for continuously evaluating and improving TOMS systems, and present results of a pilot evaluation of our initial system.
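The template-based generation step can be sketched in a few lines: extracted concepts and ASR keywords are slotted into a fixed sentence frame. The template and feature names below are invented for illustration, not the paper's:

```python
# Template-based recounting: slot extracted features into a sentence frame.
def recount(topic, concepts, asr_keywords):
    return (f"This video matches the topic '{topic}' because it shows "
            f"{', '.join(concepts)} and the speech mentions "
            f"{', '.join(asr_keywords)}.")

print(recount("changing a tire",
              ["car", "jack", "wheel"],
              ["lug nuts", "spare"]))
```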
Book
Social network analysis applications have experienced tremendous advances within the last few years, due in part to increasing trends towards users interacting with each other on the internet. Social networks are organized as graphs, and the data on social networks takes the form of massive streams, which are mined for a variety of purposes. Social Network Data Analytics covers an important niche in the social network analytics field. This edited volume, contributed by prominent researchers in the field, presents a wide selection of topics on social network data mining, such as structural properties of social networks, algorithms for structural discovery of social networks, and content analysis in social networks. The book is also unique in focusing on the data-analytic aspects of social networks in the internet setting, rather than the traditional sociology-driven emphasis prevalent in existing books, which do not address the data-intensive characteristics of online social networks. Emphasis is placed on simplifying the content so that students and practitioners benefit from this book. The book targets advanced-level students and researchers in computer science as a secondary text or reference. Data mining, database, information security, electronic commerce, and machine learning professionals will find this book a valuable asset, as will members of associations such as ACM, IEEE, and Management Science.
Article
This talk describes the optimal (revenue maximizing) auction for sponsored search advertising. We show that a search engine's optimal reserve price is independent of the number of bidders. Using simulations, we consider the changes that result from a ...
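The independence claim is consistent with the standard Myerson-style argument: with bidder values drawn i.i.d. from a regular distribution F with density f, and a seller value v_s, the optimal reserve solves an equation in which the number of bidders never appears. A sketch of that reasoning (standard auction theory, not necessarily the paper's exact model):

```latex
% Virtual value and the reserve-price condition (Myerson-style, assuming
% i.i.d. private values from a regular distribution F with density f, and
% seller value v_s):
\[
  \phi(v) \;=\; v - \frac{1 - F(v)}{f(v)} \qquad \text{(virtual value)}
\]
\[
  \phi(r^{*}) = v_s
  \;\iff\;
  r^{*} - \frac{1 - F(r^{*})}{f(r^{*})} = v_s .
\]
% The defining equation for r* involves only F, f, and v_s -- the number of
% bidders n never appears, so the optimal reserve is independent of n.
```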