Texas A&M University – Commerce
Question
Asked 16 February 2024
Is Data Science a Science?
In his book What Is This Thing Called Science?, Chalmers mentions that science is knowledge obtained from information. The most important endeavors of science are the prediction and explanation of phenomena. The emergence of big (massive) data has led to the field of data science (DS), whose main focus is prediction. Indeed, data always belong to a specific field of knowledge or science (physics, economics, ...).
If DS is able to make predictions for the field of sociology (for example), to whom should the credit go: the data scientist or the sociologist?
DOI: 10.1007/s11229-022-03933-2
#DataScience #ArtificialIntelligence #Naturallanguageprocessing #DeepLearning #Machinelearning #Science #Datamining
Most recent answer
Yes, data science is considered a science because it involves systematic methods, processes, and algorithms to extract knowledge and insights from structured and unstructured data, grounded in principles of statistics, mathematics, and computer science.
All Answers (5)
University of Leicester
It is an interesting and, at the same time, meaningless question. Data mining is more art than science, but work in this area requires a lot of knowledge of mathematics. Mathematics is definitely a science, but data science is not. I have many publications in medicine, genetics, sociology, chemistry, and biology, but I am a mathematician. The models I help to develop can be used in different areas, but I remain the same. Data science is a tool to be used in different areas.
I am not sure that my explanation explains anything, but it is my understanding of your question.
Data science in its current state seeks to establish empirical statistical associations, data correlations, and patterns in the hope that they can help uncover hidden insights. But this is precisely why data science cannot be called a real science at all.
In general, science can be broadly defined as a logical system for obtaining and organizing insights in some area of knowledge as a set of principles that can be used to make previously unknown but testable predictions. To qualify as a principle, an insight must be both highly general (applicable to many settings) and stable (relevant now and in future developments).
What specific data science principles would make it a real science? Data science is just a loose collection of various computational methods applied to fit empirical datasets. An empirical association can, of course, be a starting point, but it will never become real science until it includes a causal relationship between the variables (factors or features), i.e., understanding.
Indeed, is a data science forecast based on empirically collected (past) data always possible? No. In general, predicting or forecasting the future solely from past data is not possible, regardless of the amount of past data. It is possible only in a limited number of cases, in which a stable pattern in past data can be reliably extended over a relatively short future horizon.
The data-centric thinking that dominates both the data science and machine learning cultures is actually just “data-fitting,” in contrast to the “data-interpretation” that guides causal inference. The data-fitting school is driven by the faith that the secret to rational decisions lies in the data itself, if only we are sufficiently clever at finding data patterns. In contrast, the data-interpreting school views data not as the sole object of inquiry but as an auxiliary means of interpreting reality, where “reality” stands for a causal understanding of the processes that generate the data.
University of Leeds
I think that is the difference between data analysis and data science. Data analysis covers all the techniques (e.g., machine learning), whereas data science combines analytical expertise with a specific domain of knowledge. In this case, data science could be defined as a tool to produce knowledge, and the research subject (e.g., nutrition, economics, transport) would be considered the science.
Evgeny Mirkes, I am glad that we are on the same page: data science in its current form is not a science at all; it is just a loose collection of various statistical tools.
Similar questions and discussions
The Future of Data Science
Saikat Barua
The 21st century has witnessed an unprecedented surge in data generation, transforming nearly every facet of human endeavor. This "datafication" of our world has spurred the emergence of data science, a multidisciplinary field dedicated to extracting knowledge and insights from this vast ocean of information [1]. Data science is no longer a nascent discipline; it is a transformative force driving innovation, informing decision-making, and reshaping the landscape of science, industry, and society. This review synthesizes the current state of data science, exploring its evolution, key challenges, and promising directions for the future. We examine the diverse applications of data science across various domains, highlighting the critical role of machine learning, cloud computing, and educational initiatives in shaping its trajectory.
The Evolution and Scope of Data Science
The genesis of data science can be traced to the evolution of data analysis techniques. Initially focused on statistical methods, the field has expanded to encompass a broad spectrum of disciplines, including computer science, mathematics, and domain-specific knowledge [1]. This evolution reflects the increasing complexity and volume of data, demanding more sophisticated tools and methodologies. Data science aims to extract meaningful patterns, trends, and relationships from data to address complex problems and drive informed decisions.
The core components of data science typically include data collection and preparation, exploratory data analysis, model building and evaluation, and communication of findings [1]. The process often involves cleaning, transforming, and integrating data from diverse sources, followed by the application of statistical, machine learning, and visualization techniques to uncover hidden insights.
Data Science in Diverse Domains
The impact of data science is evident across a multitude of fields. In materials science, data-driven approaches are revolutionizing the discovery and design of new materials [5]. By leveraging materials databases, machine learning algorithms, and high-throughput methods, researchers are accelerating the identification of promising material candidates and optimizing their properties. Similarly, in the automotive industry, data science and machine learning are key technologies for optimizing processes and products [8]. Applications range from product development and manufacturing to marketing and customer relationship management. The ability to analyze vast amounts of data from connected vehicles allows for improved performance, enhanced safety features, and personalized customer experiences.
The healthcare sector is also undergoing a data-driven transformation. The analysis of patient data, including electronic health records, medical images, and genomic information, enables the development of personalized medicine, predictive diagnostics, and improved treatment outcomes [3]. Furthermore, the rise of surgical data science is leading to the development of novel algorithms and tools for surgical planning, execution, and assessment [10]. This includes the use of ontologies to capture the semantics of surgical data and algorithms, facilitating knowledge sharing and comparison [10].
In the realm of scientific research, the advent of data-driven science is changing the way discoveries are made [3]. The sheer volume of data generated in fields like astronomy, geology, and biodiversity necessitates cloud-based services for storage, computation, and analysis [3]. This shift towards the fourth paradigm of science, where data is the primary driver of research, holds immense promise for accelerating scientific breakthroughs.
Machine Learning and Cloud Computing: Pillars of the Data Science Ecosystem
Machine learning (ML) has become an indispensable component of data science, providing the tools to build predictive models and automate complex tasks [3, 8]. ML algorithms can learn from data without being explicitly programmed, enabling the identification of patterns and relationships that would be difficult or impossible for humans to discern. The application of ML spans various domains, from image and video processing to natural language processing and robotics [3].
Cloud computing provides the infrastructure to support the computational demands of data science [3]. Cloud-based platforms offer scalable storage, processing power, and a wide range of services, enabling researchers and practitioners to handle massive datasets and execute complex analyses efficiently. The integration of cloud technologies with data science allows for the development of new services and paradigms, such as approximate computing and quantum computing, which can further enhance the capabilities of data-driven approaches [3].
Data Infrastructure and Management
The effective use of data depends heavily on the availability of robust data infrastructure and efficient data management practices [9]. This includes the creation of materials databases, data repositories, and data communication pipelines [5, 9]. Investment in such infrastructure is critical for accelerating discovery and innovation in fields such as materials science, where large volumes of experimental data are generated [9].
Data veracity, integration of experimental and computational data, data longevity, and standardization are all key challenges to the development of effective data infrastructures [5]. Addressing these challenges is crucial for ensuring the reliability, accessibility, and reusability of data.
Data Science Education: Cultivating Future Expertise
The rapid growth of data science has created a significant demand for skilled professionals. Addressing this need requires a concerted effort to develop comprehensive data science education programs [6]. These programs should equip students with a strong foundation in mathematics, statistics, computer science, and domain-specific knowledge, as well as the ability to apply these skills to real-world problems [6].
Effective data science education should also emphasize critical thinking, problem-solving, and communication skills [2, 6]. Students should be trained to navigate the diverse backgrounds of data science, adapting to various learning approaches [2]. The curriculum should incorporate active learning, industry experts, and real-world projects to provide students with a practical understanding of the field [6].
Addressing the Challenges in Data Science
Despite the immense potential of data science, several challenges must be addressed to realize its full promise. One key challenge is the ethical implications of data collection, analysis, and use. Data privacy, algorithmic bias, and the potential for misuse of data require careful consideration and the development of ethical guidelines and regulations [1].
Another challenge is the need for improved data quality and standardization. The value of data science depends on the availability of clean, reliable, and well-documented data. Efforts should be made to develop standardized data formats, data validation procedures, and data governance frameworks to ensure data quality and facilitate data sharing [5].
The "black box" nature of some machine learning algorithms also poses a challenge. While these algorithms can achieve high accuracy, they may lack interpretability, making it difficult to understand the reasoning behind their predictions. Research is needed to develop explainable AI (XAI) techniques that can provide insights into the decision-making processes of machine learning models [1].
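One simple, model-agnostic technique from the XAI toolbox is permutation feature importance: shuffle one input column at a time and measure how much the model's error increases. The sketch below applies it to a hypothetical "black box" function standing in for an opaque model; the model, the data, and all function names are assumptions for illustration only and are not taken from the cited reference [1].

```python
import random
import statistics

def black_box(row):
    """Hypothetical opaque model: relies heavily on feature 0, slightly on 1, ignores 2."""
    return 4 * row[0] + 0.5 * row[1]

def mae(model, rows, targets):
    """Mean absolute error of the model's predictions."""
    return statistics.fmean(abs(model(r) - t) for r, t in zip(rows, targets))

def permutation_importance(model, rows, targets, seed=0):
    """Increase in error when each feature column is shuffled independently."""
    rng = random.Random(seed)
    baseline = mae(model, rows, targets)
    scores = []
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        # Rebuild each row with only feature j replaced by its shuffled value.
        permuted = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
        scores.append(mae(model, permuted, targets) - baseline)
    return scores

rng = random.Random(42)
rows = [tuple(rng.gauss(0, 1) for _ in range(3)) for _ in range(1000)]
targets = [black_box(r) for r in rows]
importances = permutation_importance(black_box, rows, targets)
print([round(s, 2) for s in importances])
```

The scores rank the features by how much the model depends on them, without opening the model; they describe the model's behavior, not causal effects in the underlying data.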
Future Directions in Data Science
The future of data science holds immense promise, with several key areas poised for further development. One area is the integration of data science with emerging technologies, such as quantum computing and the Internet of Things (IoT) [3]. Quantum computing offers the potential to accelerate complex computations, while the IoT generates vast amounts of data from connected devices, creating new opportunities for data analysis and insights.
Another promising direction is the development of advanced machine learning techniques, such as deep learning and reinforcement learning [3]. These techniques are capable of learning complex patterns and making sophisticated predictions, opening up new possibilities in areas such as natural language processing, computer vision, and robotics.
The development of new data visualization techniques and tools is also crucial for communicating complex data insights effectively [1]. The ability to present data in a clear, concise, and engaging manner is essential for conveying findings to stakeholders and making data-driven decisions.
Furthermore, exploring the "future work" sections of scientific articles is valuable for guiding researchers toward emerging research directions [4]. Mining these sections can support the analysis of future work and help researchers search and browse the future work proposed in a research area [4].
Finally, the advancement of data science requires continued investment in education and training [2, 6]. This includes developing new curricula, providing opportunities for hands-on learning, and fostering collaboration between academia, industry, and government.
Conclusion
Data science has emerged as a transformative field, revolutionizing how we understand and interact with the world. From materials science and healthcare to the automotive industry and scientific research, data-driven approaches are driving innovation, informing decision-making, and shaping the future. While challenges remain, the continued development of machine learning, cloud computing, and data infrastructure, combined with a focus on ethical considerations and education, will pave the way for even greater advancements in the years to come. The ability to harness the power of data will be critical for addressing the complex challenges facing society and unlocking new opportunities for progress.
==================================================
References
[1] Longbing Cao. Data Science: A Comprehensive Overview. arXiv:2007.03606v1 (2020). Available at: http://arxiv.org/abs/2007.03606v1
[2] Yehia Elkhatib. Navigating Diverse Data Science Learning: Critical Reflections Towards Future Practice. arXiv:1807.03750v1 (2018). Available at: http://arxiv.org/abs/1807.03750v1
[3] Hrishav Bakul Barua. Data science and Machine learning in the Clouds: A Perspective for the Future. arXiv:2109.01661v1 (2021). Available at: http://arxiv.org/abs/2109.01661v1
[4] Yue Hu, Xiaojun Wan. Mining and Analyzing the Future Works in Scientific Articles. arXiv:1507.02140v1 (2015). Available at: http://arxiv.org/abs/1507.02140v1
[5] Lauri Himanen, Amber Geurts, Adam S. Foster, Patrick Rinke. Data-driven materials science: status, challenges and perspectives. arXiv:1907.05644v2 (2019). Available at: http://arxiv.org/abs/1907.05644v2
[6] Brian Wright, Peter Alonzi, Ali Riveria. The Future of Data Science Education. arXiv:2407.11824v1 (2024). Available at: http://arxiv.org/abs/2407.11824v1
[7] Nicolas Chappe, Ludovic Henrio, Amaury Maillé, Matthieu Moy, Hadrien Renaud. An Optimised Flow for Futures: From Theory to Practice. arXiv:2107.07298v1 (2021). Available at: http://arxiv.org/abs/2107.07298v1
[8] Martin Hofmann, Florian Neukart, Thomas Bäck. Artificial Intelligence and Data Science in the Automotive Industry. arXiv:1709.01989v1 (2017). Available at: http://arxiv.org/abs/1709.01989v1
[9] Kevin R. Talley, Robert White, Nick Wunder, Matthew Eash, Marcus Schwarting, Dave Evenson, John Perkins, William Tumas, Kristin Munch, Caleb Phillips, Andriy Zakutayev. Research Data Infrastructure for High-Throughput Experimental Materials Science. arXiv:2105.05160v1 (2021). Available at: http://arxiv.org/abs/2105.05160v1
[10] Darko Katić, Maria Maleshkova, Sandy Engelhardt, Ivo Wolf, Keno März, Lena Maier-Hein, Marco Nolden, Martin Wagner, Hannes Kenngott, Beat Peter Müller-Stich, Rüdiger Dillmann, Stefanie Speidel. What does it all mean? Capturing Semantics of Surgical Data and Algorithms with Ontologies. arXiv:1705.07747v1 (2017). Available at: http://arxiv.org/abs/1705.07747v1