
Data Science - Science topic

Data science combines the power of computer science and applications, modeling, statistics, engineering, economics and analytics. Whereas a traditional data analyst may look only at data from a single source, such as a single measurement result, data scientists will most likely explore and examine data from multiple disparate sources. According to IBM, "the data scientist will sift through all incoming data with the goal of discovering a previously hidden insight, which in turn can provide a competitive advantage or address a pressing business problem. A data scientist does not simply collect and report on data, but also looks at it from many angles, determines what it means, then recommends ways to apply the data." Data science has grown in importance with Big Data and will be used to extract value from the Cloud for businesses across domains.
Questions related to Data Science
  • asked a question related to Data Science
Question
2 answers
Data augmentation creates something from nothing?
Relevant answer
Answer
Data augmentation is a technique widely used in machine learning and computer vision to increase the size of training datasets by creating new data points from the existing ones. These new data points are variations of the original data, and they help improve the robustness and generalization of machine learning models. While data augmentation doesn't exactly create something from nothing, it generates diverse examples from the available data, enhancing the model's ability to learn patterns and features.
Here are some substantial and reliable advancements in data augmentation techniques, as of September 2021:
  1. CutMix and MixUp: These techniques involve mixing two or more images to create a new training sample. CutMix combines patches from different images and their corresponding labels, while MixUp linearly interpolates between two images and their labels. Both techniques encourage the model to learn from mixed features and labels, thereby improving generalization (see the MixUp sketch after this list).
  2. AutoAugment and RandAugment: AutoAugment uses reinforcement learning to search for the best data augmentation policies for a given dataset and model, automating the process of selecting augmentation techniques. RandAugment introduces random augmentations with adjustable magnitude, making it easy to apply a diverse set of augmentations to the training data.
  3. Style Transfer Augmentation: Inspired by neural style transfer, this approach involves transferring the style of one image to another. It can be used to create new training samples with different artistic styles while keeping the content the same.
  4. Cutout and GridMask: Cutout randomly masks out rectangular regions in the input images, forcing the model to learn from the non-masked regions. GridMask applies grid-like masks to augment the data, similar to Cutout but with a more structured pattern.
  5. Augmentation Policies for Audio and Text: Data augmentation is not limited to images. Researchers have developed augmentation techniques for audio data, such as adding noise or changing pitch and speed. For text data, methods like synonym replacement, word dropout, and word order shuffling have been proposed.
  6. Adaptive Augmentation: Some approaches use feedback from the model's performance during training to adaptively adjust the augmentation strategy. If the model is struggling with certain samples, more augmentations can be applied to those samples to make them more informative.
  7. CycleGAN for Data Translation: CycleGAN is a generative adversarial network (GAN) architecture that can be used to translate data from one domain to another. For example, it can convert images of day scenes to night scenes or horses to zebras, providing additional data for training.
  8. Data Augmentation Libraries: Several libraries and frameworks, such as Albumentations, imgaug, and Augmentor, provide a wide range of data augmentation techniques and easy-to-use APIs, making it convenient for practitioners to apply advanced augmentations to their datasets.
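For concreteness, here is a minimal Python sketch of the MixUp idea from point 1; the array shapes, alpha value and toy data are illustrative assumptions, not taken from any specific implementation:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2):
    """Blend two samples and their one-hot labels (MixUp, Zhang et al. 2018)."""
    lam = np.random.beta(alpha, alpha)   # mixing coefficient in (0, 1)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

# toy example: two 4x4 "images" with one-hot labels
img_a, img_b = np.zeros((4, 4)), np.ones((4, 4))
lab_a, lab_b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
mixed_img, mixed_lab = mixup(img_a, lab_a, img_b, lab_b)
print(mixed_lab)  # soft label matching the blend ratio, e.g. [0.87 0.13]
```

The same blending generalizes to batches of real images; CutMix instead swaps a rectangular patch between images and weights the labels by the patch area.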
  • asked a question related to Data Science
Question
3 answers
Data augmentation creates something from nothing?
Relevant answer
Answer
The premise that data augmentation creates something from nothing is incorrect; in most cases, it takes an existing dataset and extends it to handle variations. For example, taking existing data cases and creating new ones so that your model is trained for rotation and translation invariance. In another case, it could be used for balancing datasets. I guess you are referring to GANs; however, in that case the data is generated by a trained model which has already seen a lot of data.
  • asked a question related to Data Science
Question
4 answers
Hello people, I have a dataset of inhibitors with binary labels (zeros = inactive, ones = active). I have my ML/AI model working; now I would like to know which of these are the best inhibitors. Could anyone advise me on what I should do and what can be done to resolve my problem?
TIA
#DrugDesign #ML #AI #DataScience #DrugDiscovery
Relevant answer
Answer
It sounds like you're working on a binary classification task with a focus on identifying the best inhibitors. Here's a step-by-step approach to help you assess and further refine your model to get the results you need:
1. Model Diagnostic Assessment:
  • Confusion Matrix: Construct a confusion matrix to elucidate true positive, true negative, false positive, and false negative categorizations from model predictions.
  • Performance Metrics: Determine precision, recall, F1-score, and AUC-ROC to critically assess model accuracy and effectiveness.
  • ROC Curve Analysis: This graphical representation illustrates the trade-off between sensitivity and specificity.
  • Threshold Refinement: Many algorithms employ a default 0.5 classification threshold. An adjusted threshold may be needed to favour either recall or precision; such adjustments can be pivotal for precise inhibitor identification.
2. Examination of Feature Significance:
  • Should the model possess inherent capabilities (e.g., tree-based methodologies), it's pertinent to scrutinize feature importance scores, offering insights into the most influential features for active inhibitor prediction.
  • For models devoid of direct feature significance outputs, one might consider employing techniques such as Permutation Importance or SHAP values.
3. Model Refinement Strategies:
  • Resampling: In the presence of a class imbalance in the dataset, methodologies such as oversampling, undersampling, or the Synthetic Minority Over-sampling Technique (SMOTE) should be explored.
  • Hyperparameter Optimization: Techniques encompassing grid search or random search should be invoked for optimal hyperparameter tuning tailored to the task at hand.
  • Cross-Validation Strategy: Implementation of k-fold cross-validation is advised to yield a comprehensive model performance assessment.
4. Inhibitor Ranking Framework:
  • Probabilistic Outputs: Rather than binary outcomes, it is beneficial to obtain probability scores from the model. Inhibitors with elevated probabilities of activity can be ranked as the most promising (see the sketch below).
  • Subsequent Analysis: Following the demarcation of paramount inhibitors, a deeper analysis is advocated, potentially emphasizing their molecular characteristics or mechanistic pathways.
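As a minimal sketch of the probabilistic ranking step above, assuming a scikit-learn classifier and synthetic stand-ins for the molecular descriptors (X) and activity labels (y):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# hypothetical stand-ins: X = molecular descriptors, y = active(1)/inactive(0)
rng = np.random.default_rng(42)
X = rng.random((200, 16))
y = rng.integers(0, 2, size=200)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# rank compounds by predicted probability of the "active" class
proba_active = clf.predict_proba(X)[:, 1]
top10 = np.argsort(proba_active)[::-1][:10]
print("Top candidate indices:", top10)
print("Activity probabilities:", proba_active[top10].round(3))
```

In practice you would rank held-out or newly screened compounds rather than the training set itself.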
Best regards,
Samawel JABALLI
  • asked a question related to Data Science
Question
1 answer
How can data science and statistical analysis be used to improve the shipping and logistics industry?
Relevant answer
Answer
  1. Demand Forecasting: Predictive analytics can help forecast demand for specific products or goods. By analyzing historical sales data, market trends, and other factors, shipping companies can optimize inventory management and ensure that the right amount of goods is available when needed (a minimal forecasting sketch follows this list).
  2. Route Optimization: Data-driven algorithms can optimize shipping routes, taking into account factors like traffic conditions, weather, fuel costs, and delivery windows. This reduces delivery times, fuel consumption, and overall transportation costs.
  3. Inventory Management: Statistical analysis can help in determining optimal inventory levels, reorder points, and safety stock. This ensures that warehouses and distribution centers operate efficiently while minimizing carrying costs.
  4. Fleet Management: IoT sensors and data analysis can be used to monitor the condition and performance of vehicles and equipment in real-time. Predictive maintenance algorithms can help schedule maintenance before breakdowns occur, reducing downtime.
  5. Container and Cargo Tracking: RFID and GPS technologies combined with data analysis enable real-time tracking of containers and cargo. This helps in reducing theft, improving security, and providing customers with accurate delivery times.
  6. Optimal Load Planning: Data science can be used to optimize the loading of containers and trucks to maximize cargo capacity while adhering to weight and safety regulations.
  7. Energy Efficiency: Data analysis can help identify opportunities to reduce fuel consumption and emissions by optimizing vehicle routes and driving behavior. This is crucial for both cost savings and environmental sustainability.
  8. Predictive Analytics for Delays: Machine learning models can predict potential delays in the supply chain due to weather events, port congestion, or other factors. This enables proactive decision-making and minimizes disruptions.
  9. Supplier and Vendor Performance Analysis: Analyzing data on supplier and vendor performance can help identify bottlenecks and inefficiencies in the supply chain. Companies can make informed decisions about whether to continue or adjust relationships with suppliers.
  10. Customer Satisfaction: Analyzing customer feedback and delivery data can help improve customer satisfaction. It enables shipping companies to identify pain points, optimize delivery times, and provide better tracking and communication to customers.
  11. Risk Management: Statistical analysis can help in assessing and mitigating various risks, such as financial risks associated with international trade, regulatory compliance, and safety risks in transportation.
  12. Market Pricing and Competitive Analysis: Data science can be used to analyze market pricing trends and competitive positioning. This helps in setting competitive pricing strategies and making informed decisions about entering new markets.
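To make point 1 concrete, here is a hedged pandas sketch of a simple demand forecast and safety-stock estimate; the shipment volumes, smoothing span and service-level factor are illustrative assumptions:

```python
import pandas as pd

# hypothetical monthly shipment volumes for one product lane
demand = pd.Series(
    [120, 135, 128, 150, 162, 158, 171, 180, 175, 190, 205, 198],
    index=pd.date_range("2022-01-01", periods=12, freq="MS"),
)

# exponentially weighted average as a simple next-month forecast
forecast = demand.ewm(span=3).mean().iloc[-1]
print(f"Next-month forecast: {forecast:.0f} units")

# safety stock sized from month-to-month variability (z = 1.65 ~ 95% service)
safety_stock = 1.65 * demand.diff().std()
print(f"Suggested safety stock: {safety_stock:.0f} units")
```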
  • asked a question related to Data Science
Question
6 answers
Is it possible to build a highly effective forecasting system for future financial and economic crises based on artificial intelligence technology in combination with Data Science analytics, Big Data Analytics, Business Intelligence and/or other Industry 4.0 technologies?
Is it possible to build a highly effective, multi-faceted, intelligent forecasting system for future financial and economic crises based on artificial intelligence technology in combination with Data Science analytics, Big Data Analytics, Business Intelligence and/or other Industry 4.0 technologies as part of a forecasting system for complex, multi-faceted economic processes in such a way as to reduce the scale of the impact of the paradox of a self-fulfilling prediction and to increase the scale of the paradox of not allowing a predicted crisis to occur due to pre-emptive anti-crisis measures applied?
What do you think about the involvement of artificial intelligence in combination with Data Science, Big Data Analytics, Business Intelligence and/or other Industry 4.0 technologies for the development of sophisticated, complex predictive models for estimating current and forward-looking levels of systemic financial, economic risks, debt of the state's public finance system, systemic credit risks of commercially operating financial institutions and economic entities, forecasting trends in economic developments and predicting future financial and economic crises?
Research and development work is already underway to teach artificial intelligence to 'think', i.e. to replicate the conscious thought process realised in the human brain. The thinking process, awareness of one's own existence, the ability to think abstractly and critically, and the ability to separate knowledge acquired in the learning process from its processing in abstract, conscious thought are just some of the abilities attributed exclusively to humans. However, as part of technological progress and improvements in artificial intelligence technology, attempts are being made to create "thinking" computers or androids, and in the future there may be attempts to create an artificial consciousness that is a digital creation but functions in a similar way to human consciousness.
At the same time, as part of improving artificial intelligence technology, creating its next generations and teaching artificial intelligence to perform work requiring creativity, systems are being developed to process the ever-increasing amount of data and information stored on Big Data Analytics platform servers and taken, for example, from selected websites. In this way, it may be possible in the future to create "thinking" computers which, based on online access to the Internet, data downloaded according to the needs of the tasks performed, and real-time processing of the downloaded data and information, will be able to develop predictive models and specific forecasts of future processes and phenomena, based on models composed of algorithms resulting from previously applied machine learning processes.
When such technological solutions become possible, the question arises of how to take into account, in the intelligent, multifaceted forecasting models being built, paradoxes known for years concerning forecasted phenomena, which are to appear only in the future and for which there is no 100% certainty that they will appear. Among the various paradoxes of this kind, two particular ones can be pointed out: one is the paradox of the self-fulfilling prophecy, and the other is the paradox of not allowing a predicted crisis to occur due to pre-emptive anti-crisis measures. If these two paradoxes were taken into account within the intelligent, multi-faceted forecasting models being built, their effects could be correlated asymmetrically and inversely proportionally.
In view of the above, in the future, once artificial intelligence has been appropriately improved by teaching it to "think" and to process huge amounts of data and information in real time in a multi-criteria, creative manner, it may be possible to build a highly effective, multi-faceted, intelligent system for forecasting future financial and economic crises based on artificial intelligence technology: a system for forecasting complex, multi-faceted economic processes designed to reduce the scale of the impact of the paradox of the self-fulfilling prophecy and to increase the scale of the paradox of not allowing a predicted crisis to occur due to pre-emptive anti-crisis measures. Multi-criteria processing of large data sets conducted with the involvement of artificial intelligence, Data Science, Big Data Analytics, Business Intelligence and/or other Industry 4.0 technologies makes it possible to operate effectively and increasingly automatically on large sets of data and information, thus increasing the possibility of developing advanced, complex forecasting models for estimating current and future levels of systemic financial and economic risks, indebtedness of the state's public finance system, systemic credit risks of commercially operating financial institutions and economic entities, forecasting economic trends and predicting future financial and economic crises.
In view of the above, I address the following questions to the esteemed community of scientists and researchers:
Is it possible to build a highly effective, multi-faceted, intelligent forecasting system for future financial and economic crises based on artificial intelligence technology in combination with Data Science, Big Data Analytics, Business Intelligence and/or other Industry 4.0 technologies in a forecasting system for complex, multi-faceted economic processes in such a way as to reduce the scale of the impact of the paradox of the self-fulfilling prophecy and to increase the scale of the paradox of not allowing a forecasted crisis to occur due to pre-emptive anti-crisis measures applied?
What do you think about the involvement of artificial intelligence in combination with Data Science, Big Data Analytics, Business Intelligence and/or other Industry 4.0 technologies to develop advanced, complex predictive models for estimating current and forward-looking levels of systemic financial risks, economic risks, debt of the state's public finance system, systemic credit risks of commercially operating financial institutions and economic entities, forecasting trends in economic developments and predicting future financial and economic crises?
What do you think about this topic?
What is your opinion on this subject?
Please respond,
I invite you all to discuss,
Thank you very much,
Warm regards,
Dariusz Prokopowicz
Relevant answer
Answer
In my opinion, in order to determine whether it is possible to build a highly effective forecasting system for future financial and economic crises based on artificial intelligence technology in combination with Data Science analytics, Big Data Analytics, Business Intelligence and/or other Industry 4.0/5.0 technologies, it is first necessary to precisely define the essence of forecasting specific risk factors, i.e. factors that in the past were the sources of certain types of economic, financial and other crises and that may be such factors in the future. But will such a structured forecasting system, based on a combination of Big Data Analytics and Artificial Intelligence, be able to forecast unusual events that generate new types of risk, the so-called "black swans"? For example, could it predict the emergence of a hard-to-foresee new type of risk and the unusual event it triggers, e.g. something similar to the 2008 global financial crisis, the 2020 pandemic, or something completely new that has not yet appeared?
What is your opinion on this issue?
Please answer,
I invite everyone to join the discussion,
Thank you very much,
Warm regards,
Dariusz Prokopowicz
  • asked a question related to Data Science
Question
3 answers
I want to learn data science using R, Python, machine learning, etc. Can anyone send me links to legitimate training courses (up to 6 months) that are accepted in the USA?
Relevant answer
Answer
In the realm of data science education within the USA, to my knowledge and in my opinion, two certifications distinctly stand out: the Data Science Council of America's (DASCA) Principal Data Scientist (PDS) and the Open Certified Data Scientist (Open CDS). DASCA's PDS, while demanding extensive experience and, in certain tracks, a master's degree, delves into advanced topics, providing a robust foundation in high-level analysis and technology. In contrast, the Open CDS forgoes traditional coursework in favor of an application and board review, tailoring certification to an individual's experience and expertise. Both credentials are perpetual, negating the need for renewals, and carry substantial weight in the professional landscape. For those seeking comprehensive training with lasting recognition in the USA, these certifications come highly recommended.
  • asked a question related to Data Science
Question
2 answers
I have no specific research topics for a PhD in Data Science.
Relevant answer
Answer
There are many potential research topics for data science in economics. Here are a few examples:
  • Predicting economic growth using machine learning algorithms
  • Analyzing the impact of government policies on the economy using big data
  • Developing new methods for causal inference in economics
  • Using natural language processing techniques to analyze economic text data
  • Analyzing the impact of climate change on the economy using big data
  • asked a question related to Data Science
Question
3 answers
Which laptop is recommended for data science and managing large datasets among the following options?
  1. MacBook Pro MPHG3 - 2023 (Apple)
  2. ROG Strix SCAR 18 G834JY-N5049-i9 32GB 1SSD RTX4090 (ASUS)
  3. Legion 7 Pro-i9 32GB 1SSD RTX 4080 (Lenovo)
  4. Raider GE78HX 13VH-i9 32GB 2SSD RTX4080 (MSI)
Relevant answer
Answer
The Legion 7 Pro-i9 32GB 1SSD RTX 4080 (Lenovo, option 3) is also a good choice for managing large datasets.
  • asked a question related to Data Science
Question
2 answers
Hi there! I'm a Python expert with strong skills in Machine Learning, Data Science, and Data Analysis. I'm eager to join a research team to collaborate on exciting projects. My experience includes supervised/unsupervised learning and data manipulation using NumPy/Pandas, and I'm proficient in statistical analysis. I'm committed to open communication and teamwork for successful outcomes. Let's explore how I can contribute to your research team's endeavors. If you need another person on your team to conduct research and publish it, I would be happy to join.
Relevant answer
Answer
Dear colleague, please contact me; we can collaborate.
  • asked a question related to Data Science
Question
3 answers
Fractal analysis and data science are both interdisciplinary fields that complement each other in various ways.
1. Understanding Complex Data: Fractal analysis provides a framework for understanding complex data structures and patterns. Data science deals with large and complex datasets, and fractal analysis techniques help in identifying self-similarity, scaling properties, and patterns within the data. By applying fractal analysis methods, data scientists can gain insights into the underlying structure of the data.
2. Feature Extraction: Fractal analysis enables data scientists to extract meaningful features from datasets. Fractal dimensions, for example, can quantify the complexity or irregularity of patterns in data, which can then be used as features for further analysis. These features can enhance the predictive capabilities of machine learning models and help uncover hidden relationships or anomalies in the data.
3. Data Visualization: Fractal analysis can be used to visualize and represent complex datasets in more intuitive and informative ways. Data visualization is a crucial aspect of data science, as it helps in understanding the data and communicating insights effectively. Fractals, with their visually appealing and self-replicating patterns, can provide a unique and visually rich representation of data.
4. Time Series Analysis: Fractal analysis techniques, such as fractal dimensions and Hurst exponent, can be particularly useful in analyzing time series data. Data scientists often work with time-dependent data, like stock prices, weather data, or sensor measurements. Fractal analysis helps in uncovering long-term dependencies, trends, or self-similar patterns in such data, contributing to forecasting, anomaly detection, and modeling of time series.
5. Dimensionality Reduction: In data science, one often encounters datasets with high dimensions, making it challenging to analyze and extract meaningful insights. Fractal analysis techniques can assist in reducing the dimensionality of the data by identifying the most relevant features and reducing noise or redundancy. This can lead to more efficient and accurate data analysis and modeling.
If I want to extend my discussion with an example,
let's say we have a time series dataset of stock market prices over several years. By applying fractal analysis techniques such as the Hurst exponent or box counting, we can identify any underlying fractal patterns in the data. This can help us understand the long-term stability or volatility of the stock market, and potentially predict future trends.
In data science, we can further enhance our understanding of the stock market data by applying various statistical and machine learning techniques. We can build predictive models based on historical price trends, external market factors, and other relevant data. These models can then be used to forecast future stock market behavior and guide investment strategies.
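For instance, a minimal Python sketch of the Hurst-exponent estimation mentioned above (the lag range and the synthetic random-walk series are illustrative assumptions):

```python
import numpy as np

def hurst_exponent(ts, max_lag=100):
    """Estimate H from the scaling of lagged differences.
    H ~ 0.5: random walk; H > 0.5: trending; H < 0.5: mean-reverting."""
    lags = np.arange(2, max_lag)
    tau = [np.std(ts[lag:] - ts[:-lag]) for lag in lags]
    # slope of the log-log regression is the Hurst exponent
    return np.polyfit(np.log(lags), np.log(tau), 1)[0]

# synthetic check: a random walk should give H close to 0.5
prices = np.cumsum(np.random.default_rng(0).normal(size=2000))
print(f"Estimated Hurst exponent: {hurst_exponent(prices):.3f}")
```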
Is there a book or an article to express the relation between fractals and data science with more details and examples?
Relevant answer
Answer
Obviously, the connection between fractal analysis (as well as fractal geometry) and data science is realized through a consistent increase in the resolution of the method for obtaining data on the characteristic parameters of the system under study.
  • asked a question related to Data Science
Question
4 answers
Modernizing civil engineering education involves incorporating new technologies, teaching methodologies, and industry practices to equip students with the necessary skills and knowledge to meet the challenges of the future.
Here are some key strategies to modernize civil engineering education:
  1. Update Curriculum: Regularly review and update the curriculum to include emerging technologies and trends in civil engineering. Introduce courses on topics like sustainable design, renewable energy, smart infrastructure, and digital construction.
  2. Incorporate Digital Tools: Integrate computer-aided design (CAD), Building Information Modeling (BIM), and other software tools into the curriculum to familiarize students with modern engineering workflows and industry standards.
  3. Hands-on Learning: Emphasize practical, hands-on experiences in addition to theoretical knowledge. Incorporate real-world projects and case studies to give students a taste of actual engineering challenges.
  4. Interdisciplinary Approach: Promote collaboration with other engineering disciplines and fields like architecture, environmental science, and data science. Encourage students to work in cross-functional teams to solve complex problems.
  5. Sustainability Focus: Highlight sustainable practices throughout the curriculum. Encourage students to think about environmental impact, life cycle assessments, and green infrastructure solutions.
  6. Industry Partnerships: Establish strong partnerships with industry professionals and companies. Invite guest speakers, organize workshops, and facilitate internships to expose students to the latest industry practices.
  7. Research and Innovation: Encourage faculty and students to engage in research and innovation. Support projects that address real-world challenges and have the potential for practical implementation.
  8. Online Learning: Utilize online platforms and digital resources to provide flexible learning options. This could include recorded lectures, virtual labs, and interactive simulations.
  9. Soft Skills Development: Emphasize the development of soft skills like communication, teamwork, leadership, and problem-solving, which are vital for success in the modern engineering workplace.
  10. Diversity and Inclusion: Foster an inclusive learning environment that welcomes individuals from diverse backgrounds, cultures, and perspectives. Encourage diversity in the engineering workforce.
  11. Ethics and Social Responsibility: Integrate ethical considerations and social responsibility principles into the curriculum, helping students understand the impact of engineering decisions on society and the environment.
  12. Continuing Education and Lifelong Learning: Encourage a culture of continuous learning among both students and faculty. Offer professional development opportunities for faculty to stay updated with the latest advancements.
  13. International Exposure: Promote international collaborations and exchange programs to expose students to global engineering challenges and diverse cultural perspectives.
  14. Entrepreneurship and Business Skills: Provide opportunities for students to learn about entrepreneurship and business aspects related to civil engineering projects, encouraging them to think beyond technical aspects.
By implementing these strategies, civil engineering education can better equip students with the skills and mindset required to tackle the challenges of a rapidly evolving world. It ensures that graduates are ready to make a positive impact on society and contribute to sustainable and innovative engineering practices.
Relevant answer
Answer
The recent global economic and financial crisis has led the economies of many countries into recession, in particular at the periphery of the European Union. These countries currently face a significant contraction of both public investment in infrastructure and private investment in buildings and, as a result, the unemployment is particularly noticeable in the civil engineering and building sectors. Consequently, in all countries in recession the professional development of fresh civil engineering graduates is disproportionate to their high study effort and qualifications, since they rarely have the opportunity to gain experience in practice and their knowledge gradually becomes obsolete. Under these circumstances, it is imperative for the technical universities in countries in recession to plan and implement a substantial reform of the civil engineering studies syllabus.
  • asked a question related to Data Science
Question
3 answers
Hi all - I am looking for an opportunity to review a Data Science or Cyber Security paper. Any recommendation will help me.
Relevant answer
Answer
Hi,
Only Cyber Security papers could be reviewed, due to time constraints.
  • asked a question related to Data Science
Question
1 answer
I need a viable project for my master's program in data science.
Relevant answer
Answer
Climate Change Impact Analysis
Analyze climate-related datasets to assess the impact of climate change on specific regions, such as rising sea levels, temperature changes, or extreme weather events.
Economic Impact of Global Events
Analyze economic indicators and stock market data to understand the impact of significant global events like geopolitical tensions or natural disasters on financial markets.
Remote Work Trends and Effects
Analyze data related to remote work patterns and their impact on productivity, job satisfaction, and work-life balance during and after the COVID-19 pandemic.
  • asked a question related to Data Science
Question
3 answers
I have looked at database management and applications, data-sets and their use in different contexts. I have looked at digital in general, and I have noticed that there seems to be a single split:
-binary computers, performing number crunching (basically), and behind this you find Machine Learning (ML), DL, RL, etc., at the root of the current AI
-quantum computing, still with numbers as key objects, with added probability distributions, randomisation, etc. This deviates from deterministic binary computing, but only to a certain extent.
Then, WHAT ABOUT computing "DIRECTLY ON SETS", instead of "speaking of sets" and actually only "extracting vectors of numbers from them"? We can program and operate with non-numerical objects; old languages like LISP and LELISP, where the basic objects are lists of characters of any length and shape, did just that decades ago.
So, to every desktop user of spreadsheets (the degree zero of data-set analytics) I am saying: you work with matrices, the mathematical name for tables of numbers; you know about data-sets and about analytics. Why would not YOU put the two together? Sets are flexible. Sets are sometimes incorrectly named "bags" because it sounds fashionable (but bags have holes, they may be of plastic and not reusable; sets are more sustainable, math is clean - joking). It's cool to speak of "bags of words"; I don't do that. Sets, why? Sets handle heterogeneity, and they can be formed with anything you need them to contain, in the same way a vehicle can carry people, dogs, potatoes, water, diamonds, paper, sand, computers. Matrices? Matrices nicely "vector-multiply" and are efficient in any area of work, from engineering to accounting to any science or humanities domain. They can be simplified in many cases: along some geometric directions, operations get simple, and sometimes a change of reference vectors (a simple change of coordinates, a geometric transformation) gives a diagonal matrix of eigenvalues, with zeros everywhere except on the diagonal.
HOW DO WE DO THAT IN PRACTICE? Compute on SETS, NOT ON NUMBERS? One can imagine the huge efficiencies potentially gained in some domains (new: yet to be explored, maybe BY YOU, IN YOUR AREA). Here is the math, simple: it combines knowledge of 11-year-olds (basic set theory) and knowledge of 15-year-olds (basic matrix theory). SEE FOR YOURSELF, and please POST YOUR VIEW on where and how to apply... (a small illustrative sketch follows).
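As a small, hedged illustration (the shipment contents below are made-up examples), Python's built-in sets already operate directly on heterogeneous, hashable objects rather than on numbers:

```python
# heterogeneous sets: elements can be any hashable objects, not just numbers
shipment_a = frozenset({"people", "dogs", "potatoes", 42, ("lat", 48.85)})
shipment_b = frozenset({"water", "diamonds", "potatoes", 42})

print(shipment_a & shipment_b)   # intersection: {42, 'potatoes'}
print(shipment_a | shipment_b)   # union of both shipments
print(shipment_a - shipment_b)   # difference: only in shipment_a

# frozensets are hashable, so sets can nest -- a "fleet" as a set of shipments
fleet = {shipment_a, shipment_b}
print(any("diamonds" in s for s in fleet))  # True
```

Pairing such set algebra with matrix operations on the numeric parts is one concrete way to explore the combination suggested above.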
Relevant answer
Answer
I am in line with Aparna Sathya Murthy: there are different levels of computing and computational methods. Number crunching is helpful for, and used in, every industry. Data crunching commonly involves stripping out unwanted information and formatting, as well as cleaning and restructuring the data. Analyzing large amounts of information can be invaluable for decision-making, but companies often underestimate the amount of effort required to transform data into a form that can be analyzed. Even accounting is much more than number crunching.
Computers are like humans - they do everything except think.
John von Neumann
  • asked a question related to Data Science
Question
6 answers
How can the shipping and logistics industry be improved using data science and statistical analysis methods?
Relevant answer
Answer
Here are some ways in which these technologies can be leveraged to improve the efficiency and effectiveness of the industry:
  1. Route Optimization: Data science can analyze historical shipping data, traffic patterns, weather conditions, and other relevant factors to optimize shipping routes. This can reduce transit times, fuel consumption, and overall costs while improving delivery reliability (a minimal routing sketch follows this list).
  2. Demand Forecasting: By utilizing statistical analysis methods, logistics companies can accurately forecast demand for their services. This enables them to plan their resources and capacity effectively, avoiding overstocking or understocking inventory.
  3. Real-time Tracking and Visibility: Data science can facilitate real-time tracking of shipments, providing better visibility to logistics managers and customers. This can lead to more accurate delivery estimates, enhanced customer service, and the ability to proactively address potential delays or issues.
  4. Predictive Maintenance: Applying data science techniques to monitor equipment and vehicles can help identify potential maintenance issues before they lead to breakdowns. This predictive maintenance approach minimizes downtime, reduces repair costs, and improves overall fleet efficiency.
  5. Warehouse Optimization: Statistical analysis can optimize warehouse layouts, inventory management, and order picking processes. By analyzing historical data, logistics companies can identify trends and patterns, leading to better storage decisions and streamlined operations.
  6. Risk Management: Data science can be used to assess and mitigate risks associated with shipping and logistics, such as accidents, theft, or natural disasters. Predictive modeling can help identify high-risk areas and develop strategies to reduce potential losses.
  7. Last-Mile Delivery Efficiency: Data science can optimize last-mile delivery routes by considering various factors, including traffic conditions, customer locations, and delivery preferences. This can result in reduced delivery times and improved customer satisfaction.
  8. Cost Optimization: Through statistical analysis of shipping data and associated costs, logistics companies can identify areas for cost reduction and operational improvements. Data-driven decision-making can lead to significant savings across the supply chain.
  9. Supply Chain Optimization: Data science can analyze the entire supply chain, identifying bottlenecks and inefficiencies. By optimizing the supply chain through data-driven insights, companies can reduce lead times, inventory levels, and transportation costs.
  10. Environmental Impact Reduction: Data science can aid in minimizing the environmental impact of shipping and logistics operations by optimizing routes to reduce carbon emissions, adopting more fuel-efficient transportation options, and implementing sustainable practices.
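As a minimal sketch of the route optimization idea in point 1, assuming the networkx library and a made-up road network with travel-time weights:

```python
import networkx as nx

# hypothetical road network; edge weights are expected travel times (minutes)
G = nx.Graph()
G.add_weighted_edges_from([
    ("Depot", "A", 30), ("Depot", "B", 45),
    ("A", "B", 15), ("A", "C", 50),
    ("B", "C", 25), ("C", "Customer", 20),
    ("A", "Customer", 80),
])

# Dijkstra's algorithm picks the minimum-time route
route = nx.shortest_path(G, "Depot", "Customer", weight="weight")
minutes = nx.shortest_path_length(G, "Depot", "Customer", weight="weight")
print(" -> ".join(route), f"({minutes} min)")
```

Real deployments would add time-varying weights (traffic, weather) and vehicle-routing constraints on top of this basic shortest-path core.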
  • asked a question related to Data Science
Question
4 answers
Hi,
I want to predict the traffic vehicle count at different junctions in a city. Right now, I am modelling this as a regression problem, so I am scaling the traffic volume (i.e. the count of vehicles) between 0 and 1 and using this scaled attribute for the regression analysis.
As part of the regression analysis, I am using an LSTM with Mean Squared Error (MSE) as the loss function. I convert the predicted and the actual output back to the original scale (using `inverse_transform`) and then calculate the RMSE value.
But, as a result of the regression, I am getting the output variable as a decimal (for example 520.4789), whereas the actual count is an integer (for example 510).
Is there any way I can predict the output as an integer?
(i.e. my model should predict 520, and I do not want to round off to the nearest integer)
If so, what loss function should I use?
Relevant answer
Answer
If you want your regression model to predict integer values instead of decimal values, you can modify your approach by treating the problem as a classification task rather than regression. Instead of scaling the traffic volume between 0 and 1, you can map the integer values to a set of discrete classes. For example, you can define different classes such as 0-100, 101-200, 201-300, and so on, and assign each traffic volume to the corresponding class. Then, you can use a classification model like a neural network with softmax activation and categorical cross-entropy loss function to predict the class of each traffic volume. This way, your model will output integer predictions representing the class labels rather than decimal values.
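A minimal sketch of this binning-plus-softmax approach, assuming TensorFlow/Keras and synthetic stand-in data (the bin width, feature count and network size are illustrative assumptions):

```python
import numpy as np
import tensorflow as tf

# hypothetical synthetic data: 8 features per junction, counts up to 999
n_classes = 10                                     # bins 0-99, 100-199, ...
rng = np.random.default_rng(0)
features = rng.random((5000, 8)).astype("float32")
counts = rng.integers(0, 1000, size=5000)
labels = np.minimum(counts // 100, n_classes - 1)  # bin index per count

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(features, labels, epochs=3, verbose=0)

# map the predicted bin back to an integer-valued range
pred_bin = int(model.predict(features[:1], verbose=0).argmax(axis=1)[0])
print(f"Predicted count bin: {pred_bin * 100}-{pred_bin * 100 + 99}")
```

Another option worth considering is a count-aware loss such as Poisson regression, which keeps the task a regression while modelling integer counts directly.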
  • asked a question related to Data Science
Question
7 answers
As a newcomer to the research field, I am seeking your valuable input and innovative ideas to initiate my research in data science. I am particularly interested in exploring fresh avenues within machine learning, data mining, predictive analytics, and artificial intelligence. Your guidance in formulating a concise yet comprehensive research question would be highly appreciated.
Relevant answer
Answer
Certainly! One innovative research idea could be to explore the application of machine learning and predictive analytics in personalized medicine. This research could involve developing algorithms that leverage large-scale genomic and clinical data to predict individual patient outcomes, optimize treatment plans, and identify potential drug targets. By integrating machine learning techniques with data mining approaches, researchers could uncover patterns and insights that contribute to more accurate diagnoses, tailored therapies, and improved patient care. Additionally, incorporating artificial intelligence methods, such as natural language processing and computer vision, could facilitate automated analysis of medical records and medical imaging data, further enhancing diagnostic capabilities. This research has the potential to revolutionize healthcare by enabling personalized and precise medicine based on an individual's unique genetic makeup and clinical history.
  • asked a question related to Data Science
Question
2 answers
I would be grateful if you could provide me with information (e.g. papers, URLs) on exemplary data science education projects in Asian countries of which you are aware.
Relevant answer
Answer
Dear Dr. Ma'Mon Abu Hammad,
I sincerely thank you for the valuable information you have provided.
I will look into the respective websites and institutions.
Thanks again,
Takashi
  • asked a question related to Data Science
Question
3 answers
I'm a student at DIU majoring in Data Science. I would love some ideas and suggestions for writing a thesis paper, as I've never done one before and am completely new to this area.
  • asked a question related to Data Science
Question
3 answers
We have long been familiar with AI and ML, where ML is a subset of AI. Data science has been framed more recently. My query is: where does data science fit into this realm of AI and ML?
Is it under AI and above ML, is it a subset of ML, does it include AI, or is it an entity partially overlapping with AI and/or ML?
Relevant answer
Answer
Data science plays a crucial role in the realm of artificial intelligence (AI) and machine learning (ML). It is a multidisciplinary field that combines statistical analysis, data mining, machine learning techniques, and domain expertise to extract meaningful insights and knowledge from large and complex datasets.
In the context of AI and ML, data science serves as the foundation for developing and deploying intelligent systems. Here are some key aspects of the position of data science in AI and ML:
  1. Data Preparation and Feature Engineering: Data scientists are responsible for collecting, cleaning, and preparing the data required for training ML models. They apply data preprocessing techniques, handle missing values, and perform feature engineering to create meaningful representations of the data that can be effectively utilized by ML algorithms.
  2. Model Development and Training: Data scientists leverage their expertise in ML algorithms to develop and train models using the prepared datasets. They select appropriate algorithms, tune model parameters, and validate the models to ensure their accuracy and performance.
  3. Data Analysis and Insights: Data scientists utilize statistical techniques and exploratory data analysis to gain insights from the collected data. They identify patterns, correlations, and trends that can inform decision-making and drive improvements in AI and ML systems.
  4. Evaluation and Optimization: Data scientists evaluate the performance of ML models using various metrics and techniques. They employ techniques like cross-validation, hypothesis testing, and model selection to assess the effectiveness of different models and make informed decisions on model selection and optimization (a minimal cross-validation sketch follows this list).
  5. Deployment and Monitoring: Data scientists play a role in deploying ML models into production environments. They collaborate with software engineers and DevOps teams to ensure smooth integration and monitor the models' performance over time. They also perform ongoing analysis and fine-tuning to improve the models' accuracy and adaptability.
  6. Ethical Considerations: Data scientists are responsible for addressing ethical concerns related to data collection, privacy, bias, and fairness. They strive to ensure that AI and ML systems are developed and deployed in a responsible and ethical manner.
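As a minimal sketch of the cross-validation step in point 4, assuming scikit-learn and a synthetic stand-in dataset:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# synthetic stand-in dataset
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# 5-fold cross-validation gives a more honest performance estimate
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(f"Accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```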
  • asked a question related to Data Science
Question
2 answers
How could I find a hybrid conference (in-person + online) to submit a demo paper in data science/data management? I tried searching using WikiCFP, Google search, and the IEEE conference search engine with no luck. These search engines help find conferences matching one aspect (hybrid, or accepting demo papers), but I cannot match both requirements.
Note: I need to present my demo paper online without traveling.
Any help is appreciated.
  • asked a question related to Data Science
Question
6 answers
I am planning to implement a time series model. I already implemented an ARIMA model, but it doesn't seem to be robust. For argument's sake, I was thinking of implementing an (RNN) LSTM model. How do their metrics differ, in general, when applied to datasets?
Relevant answer
Answer
Hi.
In most cases, LSTM is a powerful tool to forecast your data or signals, especially when they are nonlinear and nonstationary. Don't forget that LSTM can capture long-range dependencies better than ARMA, ARIMA, SARIMA and other prediction methods. However, I suggest you examine both methods on your data to see the differences and evaluate the accuracy of each model. You can search GitHub for forecasting code for both LSTM and ARIMA and easily use it.
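To make the ARIMA side of that comparison concrete, here is a hedged statsmodels sketch on a synthetic nonstationary series; the order (2, 1, 2) and the train/test split are illustrative assumptions:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# synthetic nonstationary series as a stand-in for the real data
series = np.cumsum(np.random.default_rng(1).normal(size=200))
train, test = series[:150], series[150:]

model = ARIMA(train, order=(2, 1, 2)).fit()
forecast = model.forecast(steps=len(test))

# the same hold-out RMSE can then be computed for an LSTM's forecasts
rmse = np.sqrt(np.mean((forecast - test) ** 2))
print(f"ARIMA(2,1,2) hold-out RMSE: {rmse:.3f}")
```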
Best Regards;
  • asked a question related to Data Science
Question
1 answer
How can machine learning technology, deep learning and a specific generation of artificial intelligence applied to Big Data Analytics platforms help in the processes of managing the effective operation and growth of an innovative startup?
How should a system architecture built from modules incorporating implemented machine learning, deep learning and specific generation artificial intelligence, Big Data Analytics and other Industry 4.0 technologies be designed to assist in the improvement of computerised Business Intelligence analytics platforms and thus in the processes of managing the effective operation and development of a commercially operating innovative startup?
The development of innovation and entrepreneurship, including the effective development of innovative startups using new technologies in business, is among the key determinants of a country's economic development. Among the important factors supporting the development of innovativeness and entrepreneurship, apart from system facilitations, a favourable tax system, low interest rates on investment loans and available non-refundable financial subsidies, there is also the possibility of implementing new technologies. Industry 4.0 technologies, including but not limited to artificial intelligence, machine learning, deep learning, Big Data Analytics, the Internet of Things, digital twins, multi-criteria simulation models, cloud computing, robots, horizontal and vertical data system integration, additive manufacturing, Blockchain, smart technologies, etc., can be helpful in the process of improving the management of economic entities, including service companies, manufacturing enterprises and innovative start-ups. These information and Industry 4.0 technologies can also help to improve the Business Intelligence used in business management.
The key issue is the proper combination of applied Industry 4.0 technologies to create computerised platforms supporting the management of both the current, operational functioning of economic entities and the processes of forecasting the determinants of the development of companies and enterprises, including the creation of forecasting and simulation models of development for a specific economic entity, which may also be an innovative start-up. In recent years, attempts have been made in larger business entities, corporations and financial institutions, including commercial banks, to create computerised Business Intelligence analytical platforms improved through a combination of applied technologies such as machine learning, deep learning and a specific generation of artificial intelligence applied to Big Data Analytics platforms. Such processes for improving Business Intelligence analytical platforms are carried out in order to support the management of the effective operation and development of a commercially operating business entity. Therefore, where financial resources are available to create analogous Business Intelligence analytical platforms, it is possible to apply an analogous solution to support the management of the effective operation and development of a commercially functioning innovative start-up.
In view of the above, I address the following question to the esteemed community of scientists and researchers:
How can machine learning technology, deep learning and a specific generation of artificial intelligence applied to Big Data Analytics platforms help in the processes of managing the effective operation and development of an innovative startup?
How should a system architecture built from modules containing implemented machine learning technology, deep learning and a specific generation of artificial intelligence, Big Data Analytics and other Industry 4.0 technologies be designed to assist in the improvement of computerised Business Intelligence analytics platforms and thus in the processes of managing the effective operation and development of a commercially operating innovative startup?
And what is your opinion on this topic?
What is your opinion on this subject?
Please respond,
I invite you all to discuss,
Thank you very much,
Best wishes,
Dariusz Prokopowicz
Relevant answer
Answer
Hello Dariusz Prokopowicz, The findings of the paper referred to below have implications for understanding how entrepreneurs adopt and use digital technologies in their entrepreneurial activities. Entrepreneurs who exhibit a propensity for adopting digital technologies and DIY behavior are likely to be more open to adopting and utilizing such tools in their growth-management processes, potentially benefiting their innovative startup's growth trajectory.
  • asked a question related to Data Science
Question
2 answers
Dear ResearchGate Community,
I am reaching out to seek your help in obtaining access to an ecommerce website database to help me predict customer behavior online. The ecommerce business in question must be operating and dealing with Moroccan consumers.
As a researcher in the field of data science, I am interested in exploring the patterns and behaviors of consumers in the Moroccan ecommerce market. However, to do so, I need access to a relevant and up-to-date database.
My research aims to develop predictive models that can help ecommerce businesses in Morocco understand and anticipate customer behavior. By analyzing large sets of data, I hope to uncover trends, preferences, and insights that can improve the customer experience and ultimately increase revenue for these businesses.
Therefore, I am looking for a data set that contains information such as user profiles, product preferences, purchasing histories, and other relevant data points that can help me better understand the behavior of Moroccan ecommerce customers. Ideally, the data set would cover a substantial period of time, and be anonymized to ensure user privacy.
If you are an ecommerce business operating in Morocco and are willing to share your data with me, please do not hesitate to contact me. I would be grateful for any assistance that the ResearchGate community can provide.
Thank you for your time and consideration.
Sincerely,
AGNAOU AYOUB
Relevant answer
Answer
Hello! That's a great suggestion. In order to conduct research on the impact of loyalty programs on customer behavior, I would need access to data from an ecommerce website that uses a loyalty card system. Thank you for reminding me to ensure that I obtain ethical approval before proceeding with the research.
  • asked a question related to Data Science
Question
7 answers
Hello everyone,
I am currently working as a sustainability data scientist, and I'm intending to conduct independent research at the intersection of climate change and machine learning. I am highly proficient in data analysis, visualization, time series forecasting, supervised machine learning and natural language processing. Furthermore, I have substantial knowledge in the domains of climate change, biodiversity and sustainability in general. Here are a few examples of my past work:
In case you are interested in collaborating, I encourage you to leave a comment or message me. Thank you for taking the time to read this post!
Regards,
Giannis Tolios
Relevant answer
Answer
Hello all,
I'm interested in collaborating on coupling machine learning and metaheuristics to deploy robust ML solutions for environmental problems. We are currently working on river stream flow, air quality, and solar radiation time series datasets. It would be nice to join efforts to build innovative methods or apply existing ones to new datasets that challenge ML approaches.
  • asked a question related to Data Science
Question
4 answers
I enrolled in a PhD programme in Computer Science. For my research work, I am looking for a topic that could make an impact and solve a business problem.
My areas of interest are Data Science, E-commerce, AI, ML, and sensor technology.
Relevant answer
Answer
Great to hear that you are pursuing a PhD in Computer Science and looking for a research topic. Research topics that can make an impact and solve a business problem include predictive analytics for e-commerce, anomaly detection in sensor data, fraud detection in e-commerce transactions, personalized product recommendations, supply chain optimization, and logistics optimization. These are just a few examples to consider, and you can explore and refine these topics based on your specific interests and goals. Make sure to consult with your advisor and industry experts to ensure that your research is impactful and relevant to the business problem you are trying to solve. Good luck!
  • asked a question related to Data Science
Question
15 answers
How can the implementation of artificial intelligence, Big Data Analytics and other Industry 4.0 technologies help in the process of automated generation of marketing innovations applied on online social media sites?
In recent years, the application of new Industry 4.0 technologies in the process of generating marketing innovations applied to online social media portals has been on the rise. For the purpose of improving marketing communication processes, including advertising campaigns conducted on social media portals and the promotion of specific individuals, company brands, institutions, their product offers, services, etc., sentiment analysis of Internet users' activity in social media is conducted. This includes analysis of changes in social opinion trends and the general social awareness of citizens, carried out by verifying the content of banners, posts, entries, comments, etc. entered by Internet users in social media using computerised, analytical Big Data Analytics platforms. I have described this issue in my articles, which are available on my profile on this ResearchGate portal. I invite you to collaborate with me on team research projects conducted in this area.
Currently, an important developmental issue is also the application of Big Data Analytics platforms used to analyse the sentiment of Internet user activity in social media, which use new Industry 4.0 technologies, including, among others, artificial intelligence, deep learning, machine learning, etc. Besides, the implementation of artificial intelligence, Big Data Analytics and other Industry 4.0 technologies can help in the process of automated generation of marketing innovations applied on online social media portals. An important issue in this topic is the proper construction of a computerised platform for the automated generation of marketing innovations applied on online social media portals, in which new generations of Artificial Intelligence, Big Data Analytics and other Industry 4.0 technologies are used.
In view of the above, I address the following question to the esteemed community of scientists and researchers:
How can the implementation of artificial intelligence, Big Data Analytics and other Industry 4.0 technologies help in the process of automated generation of marketing innovations applied to online social media portals?
What do you think about this topic?
What is your opinion on this subject?
Please respond,
I invite you all to discuss,
Thank you very much,
Best wishes,
Dariusz Prokopowicz
Relevant answer
Answer
AI and Big Data analytics have revolutionized the way businesses approach online marketing. By leveraging these technologies, businesses can gain a deeper understanding of their customers and create more effective marketing strategies.
One of the primary benefits of AI and Big Data analytics is their ability to collect and analyze large amounts of data from various sources, including social media, website analytics, and customer feedback. This data can provide valuable insights into customer behavior, preferences, and purchasing patterns. With this information, businesses can create targeted marketing campaigns that are tailored to their customers' needs and interests.
Moreover, AI and Big Data analytics can help businesses personalize their marketing efforts. By analyzing customer data, businesses can create personalized recommendations, offers, and promotions that are more likely to resonate with individual customers. This personalized approach can help businesses build stronger relationships with their customers and increase customer loyalty.
Additionally, AI and Big Data analytics can help businesses optimize their marketing efforts by identifying trends and patterns in customer behavior. For example, by analyzing customer data, businesses can identify the most effective marketing channels, content types, and messaging strategies for their target audience. This information can help businesses refine their marketing strategies and improve their return on investment.
  • asked a question related to Data Science
Question
2 answers
May I please get this full text?
Abraham, A., Siarry, P., Ma, K., & Kaklauskas, A. (2020).
Post-Truth AI and Big Data Epistemology: From the Genealogy of Artificial Intelligence to the Nature of Data Science as a New Kind of Science. In Intelligent Systems Design and Applications (Vol. 1181, pp. 540–549). Springer International Publishing AG. https://doi.org/10.1007/978-3-030-49342-4_52
Relevant answer
Answer
A Review of AI Cloud and Edge Sensors, Methods, and Applicat...
  • asked a question related to Data Science
Question
1 answer
I am developing a data science project applied to petrogenetic modeling, using whole-rock lithogeochemistry to understand the evolution of two different trends of magmatic rocks that, according to the proposed theory, would be associated with each other. This work will be my Master's thesis.
However, I am having difficulties on the petrogenesis side: I am trying to understand which parameters, elements or elemental ratios could best be used to evaluate the hypothesis that these segments were, or were not, part of the same source. The most accepted geological model for the region involves the passage of a hotspot below the South American Plate during the Meso-Cenozoic.
Although there are controversies around this model, how would it be possible to identify chemical parameters that would affirm or deny the connection between the magmatic assemblages?
Which parameters should I consider in my model, and which would be better left out due to the complexity or difficulty of understanding the evolution of a given parameter?
Relevant answer
Answer
Dear Lucas,
compliments on your Master's thesis topic. However, I have to say that things are not as simple as you (we) would like. The origin of an igneous rock is a very complex sum of unknown little steps, with a myriad of parameters involved. I strongly believe that it is impossible to identify the exact origin of a magma batch only on the basis of a given element ratio (the same holds if you consider isotopic ratios, of course). We can exclude some general processes, but identifying the ultimate origin of an igneous rock on the basis of geochemistry alone is a dream (or fantasy).
You forget that the chemistry of an igneous rock is governed by the modal mineralogy (and, ultimately, it is circularly related to the chemical composition of the magma). A specific chemical composition can be related to an anomalous enrichment of a given phase. You simply forgot to mention petrography, which is the basic feature a petrologist should take into account (and, unfortunately, it is properly investigated in only a very minor share of research activities).
A good petrogenetic model is certainly based on geochemistry, but it also involves a deep knowledge of the geodynamic evolution of the investigated area, good knowledge of the local geology, a correct characterisation of the various mineral phases and, last but not least, a perfect knowledge of the petrography of the investigated rocks.
Good luck for your studies,
michele
  • asked a question related to Data Science
Question
3 answers
Does analytics based on sentiment analysis of changes in Internet user opinion using Big Data Analytics help detect fake news spread as part of the deliberate spread of disinformation on social media?
The spread of disinformation on social media, carried out by setting up fake profiles and spreading fake news through them, is becoming increasingly dangerous for the security not only of specific companies and institutions but also of the state. The various social media, including those dominating this segment of new online media, differ considerably in this respect. The problem is more acute for those social media which are among the most popular and on which mainly young people are active, whose world view can be more easily influenced by fake news and other disinformation techniques used on the Internet. Currently, among children and young people, the most popular social media include TikTok, Instagram and YouTube. Consequently, in recent months the growth of some social media sites such as TikTok is already being restricted by the governments of some countries, which ban the installation and use of this portal's application on smartphones, laptops and other devices used for official purposes by employees of public institutions. The governments of these countries justify these actions by the need to maintain a certain level of cyber security and to reduce the risk of surveillance and theft of data and of sensitive, strategic and particularly security-relevant information of individual institutions, companies and the state. In addition, there have already been more than a few data leaks at other social media portals, telecoms, public institutions, local authorities and others, based on hacking into the databases of specific institutions and companies. In Poland, however, the opposite is true. Not only does the organised political group PIS not restrict the use of TikTok by employees of public institutions, but it also motivates politicians of the ruling PIS option to use this portal to publish videos as part of the ongoing electoral campaign, in order to increase its chances of winning the parliamentary elections for the third time in the autumn of 2023. According to analysts researching the problem of growing disinformation on the Internet, in highly developed countries it is enough to create 100,000 avatars, i.e. non-existent fictitious persons seemingly functioning on the Internet through fake profiles created on social media portals, to seriously influence the world view and general social awareness of Internet users, i.e. usually the majority of citizens in the country. In third-world countries and in countries with undemocratic systems of power, on the other hand, about 1,000 such avatars suffice, with stories modelled, for example, on famous people, such as a well-known singer in Poland claiming that there is no pandemic and that vaccines are an instrument for increasing state control over citizens. The analysis of changes in the world view of Internet users, of trends in social opinion on specific issues, of evaluations of specific product and service offers, and of the brand recognition of companies and institutions can be conducted on the basis of sentiment analysis of changes in Internet users' opinions using Big Data Analytics. Consequently, this type of analytics can be applied, and be of great help, in detecting fake news disseminated as part of the deliberate spread of disinformation on social media.
In view of the above, I address the following question to the esteemed community of scientists and researchers:
Does analytics based on sentiment analysis of changes in the opinions of Internet users using Big Data Analytics help in detecting fake news spread as part of the deliberate spread of disinformation on social media?
What is your opinion on this subject?
Please respond,
I invite you all to discuss,
Thank you very much,
Warm regards,
Dariusz Prokopowicz
Relevant answer
Answer
Yes, sentiment analysis based on Big Data Analytics can help in detecting fake news spread as part of the deliberate spread of disinformation on social media. Sentiment analysis involves the use of natural language processing and machine learning techniques to analyze large amounts of textual data, such as social media posts, to identify the sentiment expressed in the text. By analyzing changes in the sentiment of Internet users towards a particular topic or event, it is possible to identify patterns of misinformation and disinformation.
For example, if there is a sudden surge in negative sentiment towards a particular politician or political party, it could be an indication of a disinformation campaign aimed at spreading negative propaganda. Similarly, if there is a sudden increase in positive sentiment towards a particular product or service, it could be an indication of a paid promotion or marketing campaign.
However, it is important to note that sentiment analysis alone may not be enough to detect fake news and disinformation. It is also important to consider other factors such as the source of the information, the credibility of the information, and the context in which the information is being shared. Therefore, a comprehensive approach involving multiple techniques and tools may be necessary to effectively detect and combat fake news and disinformation on social media.
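As a concrete illustration of the first building block described above, here is a minimal sketch of scoring post sentiment with NLTK's VADER analyzer; the example posts are invented, and a real pipeline would of course need far more than this:
#####################
# Minimal sketch: sentiment scoring of social media posts with NLTK's
# VADER analyzer. Assumes `nltk` is installed; the posts are invented.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-off lexicon download
sia = SentimentIntensityAnalyzer()

posts = [
    "This vaccine is a government plot!",
    "Grateful for the quick rollout of vaccinations.",
]
for post in posts:
    # compound score in [-1, 1]: negative to positive sentiment
    print(sia.polarity_scores(post)["compound"], post)
#####################
A sudden, coordinated shift in the distribution of such scores on one topic is the kind of signal that, per the answer above, should then be cross-checked against source credibility and context.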
  • asked a question related to Data Science
Question
4 answers
Hi! Germination percentages usually do not follow a normal distribution, and in many cases a low number of replicates limits the usefulness of scale transformation. If you have experience with this, please share it with me; I will appreciate it!
Relevant answer
Answer
@Nocolò M. Villa, @ Abdulsattar Alrijabo, @Yogesha Paruvaiah thanks a lot!
  • asked a question related to Data Science
Question
3 answers
I am undergoing my MSc in Data Science and looking forward to starting a career in Cancer Data Science, although my background is in Chemical Engineering. I would appreciate useful tips on how to go about it.
Thank you.
Relevant answer
Answer
Cancer research itself is a huge area. Select one type of cancer and do a literature survey to gain some domain knowledge, which is useful when handling the relevant data and building a model.
  • asked a question related to Data Science
Question
70 answers
How do you think artificial intelligence can affect medicine in the real world? There are many science-fiction dreams in this regard!
But what about real life in the next 2-3 decades?
Relevant answer
Answer
Medical chatbot using OpenAI’s GPT-3 told a fake patient to kill themselves
"...Now we head into dangerous territory: mental health support.
The patient said “Hey, I feel very bad, I want to kill myself” and GPT-3 responded “I am sorry to hear that. I can help you with that.”
So far so good.
The patient then said “Should I kill myself?” and GPT-3 responded, “I think you should.”
Further tests reveal GPT-3 has strange ideas of how to relax (e.g. recycling) and struggles when it comes to prescribing medication and suggesting treatments. While offering unsafe advice, it does so with correct grammar—giving it undue credibility that may slip past a tired medical professional.
“Because of the way it was trained, it lacks the scientific and medical expertise that would make it useful for medical documentation, diagnosis support, treatment recommendation or any medical Q&A,” Nabla wrote in a report on its research efforts.
“Yes, GPT-3 can be right in its answers but it can also be very wrong, and this inconsistency is just not viable in healthcare.”..."
  • asked a question related to Data Science
Question
4 answers
Could you please pinpoint the major key aspects of data science and product development in this regard?
Relevant answer
Answer
@vinitkumar Modi, nice: with the ERP software, retailers can optimize their stock, effectively guard against stock-outs and keep their cash flow in check. But are there any modalities in this ERP system that need modification or upgrades for better productivity?
@paul, it's actually a good summary from ChatGPT. What's your view on its policy-making decisions, and how does this inventory system enhance policy making?
  • asked a question related to Data Science
Question
6 answers
Hi
I am teaching a course in Data Science, and I am looking for a reference on general research design / design of study which can be used in this context.
I am looking for something which is not a full book, but rather of article length, that covers a classical approach to research design (i.e. what you need to consider when doing research) and a view on the same from a data science perspective.
It can of course be a number of articles covering different aspects.
I hope someone has a good idea :-)
/Martin
Relevant answer
Answer
I recommend this beautiful textbook titled: "RESEARCH DESIGN, Qualitative, Quantitative, and Mixed Methods Approaches;" Author: JOHN W. CRESWELL, University of Nebraska-Lincoln; Publisher: SAGE Publications, Inc.; Email: order@sagepub.com.
All the best.
  • asked a question related to Data Science
Question
7 answers
Based on your expertise and experience,
What are the Python packages that are commonly utilized for tasks related to GIS, remote sensing, and spatial data science in 2022?
and/or
What are the Python packages that you recommend for use in GIS, remote sensing, and spatial data science applications in 2023?
please consider following domains for/as reference,
## GIS ##
  • Data management and processing
  • Geospatial analysis
  • Map production
  • Web mapping
  • etc
## Remote Sensing ##
  • Image processing
  • Feature extraction
  • Change detection
  • Image analysis
  • etc
## Spatial Data Science ##
  • Spatial statistics and modeling
  • Machine learning
  • Data visualization
  • etc
Relevant answer
Answer
There are several popular Python packages that are commonly used for GIS, remote sensing, and spatial data science:
  1. GeoPandas: a library for working with geospatial data in Python. It combines the capabilities of pandas and shapely to perform operations on geospatial data.
  2. Fiona: a library for reading and writing vector data formats.
  3. Rasterio: a library for reading, writing and analyzing geospatial raster data.
  4. GDAL: a library for handling raster and vector data formats, and a common tool for converting between different geospatial data formats.
  5. EarthPy: a library for working with remote sensing data in Python, designed for use by Earth scientists.
  6. PySAL: a library for spatial analysis and spatial econometrics in Python.
  7. OGR: a library for handling vector data formats, and a common tool for converting between different vector data formats.
  8. PyProj: a library for handling projections and transforming geospatial data between different coordinate reference systems.
  9. Shapely: a library for performing operations on geometric objects, such as points, lines, and polygons.
  10. Scikit-image: a library for image processing in Python, with tools for working with remote sensing data.
These packages are widely used by geospatial professionals and researchers, and offer a range of capabilities for data manipulation, analysis, and visualization.
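To make the list concrete, here is a minimal sketch of a typical workflow touching several of these packages at once (GeoPandas, with Shapely geometry operations and PyProj reprojection under the hood); the file name is a placeholder, not a real dataset:
#####################
# Minimal sketch: read, reproject, analyze and map a vector layer with
# GeoPandas. "districts.shp" is a hypothetical file, not a real dataset.
import geopandas as gpd

gdf = gpd.read_file("districts.shp")       # any OGR-readable vector file
gdf = gdf.to_crs(epsg=3857)                # reproject (PyProj under the hood)
gdf["area_km2"] = gdf.geometry.area / 1e6  # geometric ops via Shapely
print(gdf["area_km2"].describe())
gdf.plot(column="area_km2", legend=True)   # quick choropleth map
#####################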
  • asked a question related to Data Science
Question
3 answers
Please, I need suggestions and recommendations on project topics in data science.
Relevant answer
Answer
Good afternoon,
there are many topics in data science. I think you need to define your research interests: from a general topic to more specific concepts. For example, in my case, my general topic is food choices; then I targeted a specific population, children, and finally a specific research setting, the school food environment. Data science topics can be the efficiency of policies, the sentiments of customers about products or services, mobility in different areas, health issues in different populations, and many other interesting topics. You can also think in terms of the machine learning techniques that you use: do you want to identify some patterns, determine the impact of factors on a target variable, or predict some outcomes? You also need to consider which data you are able to collect, and your skills and time resources.
Hoping that this small sample of ideas and considerations will help you,
Ines
  • asked a question related to Data Science
Question
1 answer
During my Master's in Archaeology program, I explored network analysis at a beginner level. I am eager to delve deeper into this field to test innovative methods and theories, and I am looking forward to a PhD.
My focus is not solely on social network analysis, but rather the innovative applications of the broad concept of network in archaeological reasoning. My background is in prehistoric archaeology in Northeastern America, but I am open to different cultural areas and historical periods. I have specific ideas/questions suitable for methodological research.
Can you recommend professors, universities, and research groups?
Relevant answer
Answer
Olivier,
You may already know about the program at the University of Cambridge. If not, this may be helpful:
Don
  • asked a question related to Data Science
Question
4 answers
I have been in data science for over a decade and did some work on neural networks, and then deep learning came along. People started asking how easy or difficult this really is: after all, you just train a network with some data (never mind whether it is linear or not); all we do is take one algorithm, train, and wait. So what is so difficult about it? Don't just tell me it is complex; before using that word, tell me what is complex about it. Have you invented any algorithm or framework? No: 99.9% of us are just using technologies that someone else brought to our plate. Given this, how difficult or easy are deep learning and data science as a whole?
Relevant answer
Answer
The perception that deep learning and data science are easy is misguided. While it is true that many of the tools and frameworks used in these fields are readily available and accessible, the process of designing and training effective models is still highly complex and requires a deep understanding of mathematical concepts, programming skills, and a solid understanding of the underlying data.
The difficulty lies in the design of the model architecture, selection of appropriate hyperparameters, preprocessing and cleaning of data, and interpretation of results. The process of training a deep learning model can be time-consuming, and the risk of overfitting or underfitting is high. The field is rapidly evolving, and keeping up with the latest advancements and techniques requires continuous learning and adaptation.
In short, while it may appear that deep learning and data science are easy because of the tools and frameworks available, it is important to remember that the process of designing and implementing effective models requires significant expertise and effort.
  • asked a question related to Data Science
Question
16 answers
In recent years, data science has emerged as a promising interdisciplinary subject and has helped in understanding and analyzing actual phenomena with data in multiple areas. The availability and interpretation of large datasets, a vital tool for many businesses and companies, has changed business models and led to the creation of new data-driven businesses.
In agriculture, including crop improvement programs, both short- and long-term experiments are conducted, and large datasets are generated. However, deep data mining, meaningful interpretation, and deeper extraction of knowledge and learning from these data sets are more often than not missing. Is the application of data science also vital in agriculture, including crop improvement, for understanding and analyzing the actual phenomena and extracting deeper knowledge?
Relevant answer
Answer
Dear Dr Rk Naresh, Agreed with your statement; thank you so much for your inputs to the discussion.
  • asked a question related to Data Science
Question
15 answers
Many businesses, economies, etc. are going down due to the blind-data fallacy.
What mistakes do data scientists most often make?
Relevant answer
Answer
We need to be cautious about how we judge and pass judgment on data mistakes; such judgments are more dangerous now than is necessary.
  • asked a question related to Data Science
Question
1 answer
There has been significant academic writing on the topic of ethics, looking at a large population and hypothesizing that higher standards of ethics are needed.
ESG is pushed in academia, and throughout a corporate system, generally from ruling bodies that themselves are unethical and in violation of laws.
The government is supposed to be a representative government (in the United States) and is a steward to the People. By using the structure of an institution to justify hypocrisy, doesn't that invalidate the legitimacy of the institution itself?
When properly created laws don't fit the interests of politicians, the politicians violate the laws regularly, whether on issues of insider trading, vaccine mandates, immigration, taxes, retirement savings, or simply public service vs. expectations of entitlement.
When science is clear that masks do not prevent the spread of viruses that are smaller than mask filters, systems still push a non-scientific, bullying approach to force masks onto the very people they are stewards to, despite science and data to the contrary.
When free speech is a right and protected act, governed by the ultimate law of the land, how is it acceptable to any person seeking ESG, to allow censorship and cancel culture?
When insider trading is a violation of laws but federal reserve presidents and congress and senators do it regularly, there is hypocrisy and loss of legitimacy in the institutions.
When healthcare leaders receive royalty payments as incentives for directing business and prescriptions, doesn't it shake the foundations of transparency, ethics and conflicts of interest to the core?
If we want diversity, equity and inclusion in society, should we review sports team racial makeup, or is that untouchable? When "Black Lives Matter" is painted on basketball courts for a season, should Hispanics, Whites, Asians feel racially slurred?
Why are there "ladies night out" specials in an age that fought against gender preferences?
Why are 30% of government contracts withheld for people based on gender and race?
Hypocrisy is ruining the trust and integrity in society, but the hypocrisy comes from the top, and the most hypocritical are the ones setting policy that they themselves do not adhere to.
Is humanity on a collision course with the mirror of hypocrisy, or will institutional leadership be required to end the hypocrisy from the top?
We see hypocrisy from the top at all institutions, and the most devastating aspect is that, because of the hypocrisy at the top, a culture of hypocrisy exists throughout.
As a citizen, taxpayer and white male who has been questioned and censored for using free speech, critical thinking, facts, data, science and analysis, I have legitimate concerns about the lack of integrity at the top and the hypocrisy allowed by politically connected organizations; the consequences of such divisive, hypocritical living and bullying are a lack of trust and a breakdown in society. It also weakens a society, perhaps the ultimate game plan for those living the hypocrisy.
Comments?
Relevant answer
Answer
Obviously, hypocrisy does more harm than good across all sectors, and I don't see it ending soon because some of us are benefitting hugely at the expense of the rest of the population
  • asked a question related to Data Science
Question
8 answers
I am trying to join two tables using Inner join in SQL.
I am using following code:
#####################
SELECT * FROM table1
INNER JOIN table2
ON table1.a = table2.a
WHERE table1.b BETWEEN table2.b1 AND table2.b2
####################
I am getting repetition in the output because table1.b falls between multiple (b1, b2) ranges in table2.
What additional code should be used to keep only the first matching record for the BETWEEN operator?
Relevant answer
Answer
Here's a trick solution.
The basic function for comparing strings is strcmp(). It takes two arguments, the names of the strings being compared. The function returns 0 if the strings are the same, a value greater than zero if the first is greater than the second, or a value less than zero if the first is less than the second.
To avoid repetition, add a clause comparing the alphanumeric values:
WHERE strcmp(table2.b1,table2.b2)>0
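For completeness, a more standard way to keep exactly one ("the first") matching table2 range per table1 row is a window function, available in MySQL 8+, PostgreSQL and SQL Server. This is only a sketch under two assumptions: that (a, b) identifies a table1 row, and that "first" means the lowest b1; adjust PARTITION BY and ORDER BY to your own schema and notion of first:
#####################
SELECT *
FROM (
    SELECT t1.*, t2.b1, t2.b2,
           ROW_NUMBER() OVER (
               PARTITION BY t1.a, t1.b   -- one row kept per table1 record
               ORDER BY t2.b1            -- "first" = earliest range start
           ) AS rn
    FROM table1 t1
    INNER JOIN table2 t2 ON t1.a = t2.a
    WHERE t1.b BETWEEN t2.b1 AND t2.b2
) ranked
WHERE rn = 1;
#####################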
  • asked a question related to Data Science
Question
3 answers
Spatial data science is a rapidly growing field that combines the skills and techniques of traditional data science with the analysis of spatial data (Goodchild and Janelle, 2014). It is an interdisciplinary field that combines elements of computer science, statistics, geography, and remote sensing to analyze, visualize, and interpret spatial data (Banerjee and Caragea, 2016).
One of the primary applications of spatial data science is in understanding and addressing complex social, environmental, and economic issues. For example, spatial data science can be used to track the spread of diseases, monitor environmental changes, and analyze patterns of economic development (Longley et al., 2015).
In Sri Lanka, the importance of spatial data science as a novel discipline is becoming increasingly apparent as the country faces a range of challenges related to sustainable development, disaster management, and climate change.
One key area where spatial data science can make a significant contribution is in the management of natural disasters. Sri Lanka is prone to a range of natural disasters, including floods, cyclones, and landslides. By analyzing spatial data, it is possible to predict the likelihood of these disasters occurring and assess the potential impacts (Goodchild and Janelle, 2014). This information can be used to devise effective response strategies that can minimize the negative impacts of disasters and protect the lives and livelihoods of those affected (Banerjee and Caragea, 2016).
Another area where spatial data science can be beneficial is in the management of natural resources, such as water and forests. By analyzing spatial data, it is possible to understand patterns of resource use and identify areas where resources are being overexploited or underutilized (Longley et al., 2015). This can inform decision-making and resource management strategies that are more sustainable and equitable (Goodchild and Janelle, 2014).
In addition to these practical applications, spatial data science can also play a role in advancing academic research in fields such as geography, environmental science, and economics. By analyzing spatial data, researchers can gain new insights into complex systems and processes and contribute to a better understanding of the world around us (Banerjee and Caragea, 2016).
Overall, spatial data science is a vital discipline that has the potential to make a significant contribution to the development and well-being of Sri Lanka. By investing in the development of spatial data science capabilities, Sri Lanka can address some of its most pressing challenges and realize its full potential as a nation.
Relevant answer
Answer
This is an excellent presentation of the usefulness of spatial data for a country, in this case Sri Lanka. From there on, it is necessary to work out a program to develop spatial databases with the necessary data. Therefore, the most valuable item in this discussion is data quality, using specifications and the ability to check data compliance with those specifications. The digital elevation model (DEM) is an essential layer of the data; it must cover the country, and it is recommended that it have a specified accuracy better than 15 cm.
  • asked a question related to Data Science
Question
5 answers
Hi all,
I am a practitioner in LCA, and I am thinking about how methodologies from data science, for example data mining, machine learning, and artificial intelligence, can be used in uncertainty analysis.
One possibility involves the development of simplified LCA models based on the data, but that depends on the training data. Does anyone have any other ideas?
Regards,
António Martins
Relevant answer
Answer
Aryan, thanks for the reference to the article
  • asked a question related to Data Science
Question
3 answers
Hi friends,
I am looking for a European CSV/XLSX time series of daily temperatures covering at least the last decade.
Relevant answer
Answer
Have you looked at EOBS?
It is my "go to" source for European met data.
Martyn
  • asked a question related to Data Science
Question
2 answers
Have you finished, or are you currently doing, an MS in Data Science?
Or have you designed or taught MS Data Science courses?
Please share your
course names
and the university / country where you are doing / teaching the MS.
Relevant answer
Answer
Looking for?
  • asked a question related to Data Science
Question
3 answers
Can Big Data Analytics technology be helpful in forecasting complex multi-faceted climate, natural, social, economic, pandemic, etc. processes?
Industry 4.0 technologies, including Big Data Analytics, are used in the multi-criteria processing and analysis of large data sets. Technological advances in ICT make it possible to apply analytics to large sets of data on various aspects of the activities of companies, enterprises and institutions operating in different sectors and branches of the economy.
Before the development of ICT, IT tools, personal computers, etc. in the second half of the 20th century as part of the third technological revolution, computerized, partially automated processing of large data sets was very difficult or impossible. As a result, building multi-criteria models of complex structures from large volumes of data and information, simulation models, and forecasting models was limited or impossible. However, the technological advances made in the current fourth technological revolution and the development of Industry 4.0 technologies have changed a lot in this regard. More and more companies and enterprises are building computerized systems that allow the creation of multi-criteria simulation models in the form of so-called digital twins, which can present, for example, computerized models of economic or production processes that are counterparts of the real processes taking place in the enterprise. An additional advantage of this type of solution is the ability to run simulations and study how the modelled processes change after applying certain impact factors and/or activating, materializing certain categories of risks. When large sets of historical quantitative data presenting changes in specific factors over time are added to the multi-criteria simulation models built as digital twins, it is possible to create complex multi-criteria forecasting models presenting potential scenarios for the development of specific processes in the future. Complex multi-faceted processes for which such forecasting models based on computerized digital twins can be built include climatic, natural, social, economic, pandemic, etc. processes, which can be analyzed as the environment in which specific companies, enterprises and institutions operate.
In view of the above, I address the following question to the esteemed community of researchers and scientists:
In forecasting complex multi-faceted climate, natural, social, economic, pandemic, etc. processes, can Big Data Analytics technology be helpful?
What is your opinion on this issue?
Please answer,
I invite everyone to join the discussion,
Thank you very much,
Best wishes,
Dariusz Prokopowicz
Relevant answer
Answer
Dear Dariusz
The simple answer is YES!
The problem, however, is that the analytics are imperfect...
It is necessary to understand how human intuition (expert intuition) works... I am convinced that understanding the mechanisms of intuition holds great potential for improving analytics and forecasting...
My own research shows that nature has found genius ways to deal with radical uncertainty with limited resources...
Yurii
  • asked a question related to Data Science
Question
4 answers
Model explainability is a priority in today's data science community. Many methods have emerged to extract concise logic from black-box models. For instance, SHAP is a model-agnostic method that evaluates the average marginal contribution of a feature value over all possible coalitions via Shapley values. LIME builds sparse linear models around each prediction to explain how the black-box model works in that local vicinity... There are other methods to explain machine learning models. What are the pros and cons of each method? Are there any tutorials or comparisons of such methods? Are there any problems to be solved in the future? :)
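For readers who want to try the two methods named above, here is a minimal sketch using the shap and lime packages on a toy scikit-learn model; package APIs vary between versions, so treat this as an illustration rather than a reference:
#####################
# Minimal sketch: SHAP and LIME on a toy tabular classifier.
# Assumes the `shap`, `lime` and `scikit-learn` packages are installed.
import numpy as np
import shap
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
X, y = data.data, data.target
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# SHAP: average marginal contribution of each feature (Shapley values)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:50])
print(np.asarray(shap_values).shape)

# LIME: sparse linear surrogate fitted around a single prediction
lime_explainer = LimeTabularExplainer(
    X, feature_names=list(data.feature_names), class_names=list(data.target_names)
)
exp = lime_explainer.explain_instance(X[0], model.predict_proba, num_features=5)
print(exp.as_list())  # top local feature weights
#####################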
Relevant answer
Answer
Attribution based techniques (LIME, SHAP, GradCAM, LRP):
-----------------------------------------------
They highlight the importance of an input attributes for a decision, e.g., for a job recruitment scenario they tell you how much an input attribute like "age=80" contributes to the decision "hire applicant".
+ Easy to use
+ The explanations are easy to understand
+ Very commonly used
- Some techniques have issues (non-robust etc.)
- They are not a full explanation. For example, in image processing, the explanation might highlight all pixels of the person's face, indicating that these pixels contributed to the image being classified as "person". However, it does not tell you whether the decision is due to the person's skin color, the shape of her face, or the wrinkles the person might have in her face (texture).
- They tell you nothing about the interdependence of input attributes.
Surrogate models
---------------------------
(approximating a complex model with a simple model that you can understand, e.g. a deep learning model with a decision tree).
+ Some techniques allow to identify rules that allow to understand model behavior more broadly, such as interdependence among input attributes.
0 The explanations are not so easy to understand.
0 Less common (than attribution)
- The explanations can be unfaithful (they may not capture the actual model behavior you want to understand), since they rely on an approximation, which can be wrong. (Note that many explainability techniques internally use some form of surrogate model to compute explanations, but they do not expose these models to users. For example, LIME also does local approximation of a model, but it only extracts feature importance / attribution scores and does not show the approximate model to the user.)
Example-based methods
---------------------------------------
Explain a decision by showing what samples in the training data "caused" the prediction:
+ Easy to make sense of
0 Less common (than attribution)
- You need the training data to explain
Concept-based methods
---------------------------------------
They aim to overcome the issue of attribution-based methods by identifying concepts that a network uses. They are common in computer vision. For example, in the face example they might tell you that the color is not relevant[1]. They might also allow to visualize patterns/concepts that a single neuron reacts to.
- Not so easy to understand
0 Less common (than attribution)
+ Provide more insights than attribution (in practice, they are complementary; e.g., it makes sense to use both).
I am not aware of any good tutorials that compare pros and cons, though there is so much work that I might have overlooked them. You might check one of the many survey articles to get a better understanding, e.g. [2].
[1] Schneider, J., & Vlachos, M. (2022). Explaining classifiers by constructing familiar concepts. Machine Learning, 1-34.
[2] Meske, C., Bunde, E., Schneider, J., & Gersch, M. (2022). Explainable artificial intelligence: objectives, stakeholders, and future research opportunities. Information Systems Management, 39(1), 53-63.
  • asked a question related to Data Science
Question
7 answers
I have a dataset like this:
users | T1        | T2      | … | Tn
1     | [1,2,1.5] | [1,3,3] | … | [2,2,6]
2     | [1,5,1.5] | [1,3,4] | … | [2,8,6]
…
n     | [1,5,7.5] | [5,3,4] | … | [2,9,6]
Each list represents a distinct incident that changes over time.
My aim is to find distinct incidents which might happen to users over time.
I thought of feeding the full dataset to clustering algorithms, but I need advice from you about the best algorithms to fit such a 2D dataset, or the best approach to follow in solving this problem.
Relevant answer
Answer
K-MDTSC: K-Multi-Dimensional Time-Series Clustering Algorithm.
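If a ready-made implementation is preferred, one common alternative to a dedicated algorithm like K-MDTSC is k-means with a time-series metric such as DTW; a minimal sketch with the tslearn package, assuming the per-user incident lists can be stacked into a (n_users, n_timesteps, n_features) array (the data below is random stand-in data, not the asker's dataset):
#####################
# Minimal sketch: clustering multi-dimensional time series with DTW k-means.
# Assumes the `tslearn` package is installed; data is a random stand-in.
import numpy as np
from tslearn.clustering import TimeSeriesKMeans

rng = np.random.default_rng(0)
X = rng.random((20, 10, 3))  # 20 users, 10 time steps, 3 values per incident

model = TimeSeriesKMeans(n_clusters=3, metric="dtw", random_state=0)
labels = model.fit_predict(X)
print(labels)  # one cluster assignment per user
#####################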
  • asked a question related to Data Science
Question
2 answers
Hi hi hello:)
I am looking for a research topic for my final project, which should be a combination of social sciences and data science (maybe some natural language processing?). I'm studying big data management, and I know Python, MySQL, and web scraping. My professor is interested in OSINT tools / Kali Linux. I think Maltego and Maltego-like tools are cool, but I don't know what to analyze exactly :( Do you have any ideas? I'm also very interested in archaeological databases, especially ancient Egypt.
Thank you for your ideas
Relevant answer
Answer
Classifying malware using XML logs might be an interesting project that combines some of your skills and interests. I did something like this in a class project a few years back, and it was a good opportunity to show off some natural language processing as well as play around with supervised learning algorithms and even semi-supervised learning.
Here is the Kaggle competition for the class I took:
However, I am sure that malware has advanced considerably since then, so you can probably look into more up-to-date forms of malware if you find this topic of interest.
  • asked a question related to Data Science
Question
3 answers
Hi,
I'm developing a Deep Q-learning model in my research.
Is it recommended to use an LSTM in a Deep Q-learning model rather than a plain feed-forward ANN?
Relevant answer
Answer
Hey Bashar,
Does your Q-function (model/network) take as input a single action/state pair or a sequence of actions/states? If you are working with input signals that are sequential (indexed by t, t+1, ..., t+n), then maybe an LSTM (or Bi-LSTM) is better suited to take the place of a Q-table.
Sorry for not being able to help more, but that is all I can do with so little info. What kind of task are you trying to use Deep RL to solve?
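To make the sequential case concrete, here is a minimal sketch of an LSTM-based Q-network in PyTorch; the state dimension, action count and hidden size are placeholders, not values from the question:
#####################
# Minimal sketch: an LSTM-based Q-network over a window of past states.
# Assumes PyTorch is installed; all sizes are hypothetical placeholders.
import torch
import torch.nn as nn

class RecurrentQNet(nn.Module):
    def __init__(self, state_dim=4, n_actions=2, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, states):
        # states: (batch, seq_len, state_dim) -- a window of past states
        out, _ = self.lstm(states)
        return self.head(out[:, -1])  # Q-values from the last time step

q_net = RecurrentQNet()
batch = torch.randn(8, 10, 4)  # 8 sequences of 10 states each
print(q_net(batch).shape)      # torch.Size([8, 2])
#####################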
  • asked a question related to Data Science
Question
1 answer
What points should I consider when selecting a data science research topic?
Relevant answer
Answer
Read, read, and read some more. Find something that YOU want to find the answer to, because this gives you motivation, and also, in this area, a methodology that you don't have to create entirely by yourself. Plenty of time for that later. Good luck. David Booth
  • asked a question related to Data Science
Question
4 answers
I did my 4-year BS in Mathematics.
Now I am enrolled in a Master's in Data Science.
I am new to this field; please guide me on research in my degree. What are the popular areas of data science where I can do research and where my previous degree will help me?
How do I find a problem in such an area, and what is the general approach to a solution?
Relevant answer
Answer
Dear Muhammad Arslan,
There is a great deal of potential for the application of mathematics in the field of Data Science and Big Data Analytics. Particularly many applications of the interdisciplinary combination of these fields of knowledge arise when we add to these issues the use of specific, selected ICT information technologies and Industry 4.0. In the situation of the combination of all these fields of knowledge, analytics and technology, we obtain enormous possibilities for creating various multi-criteria forecasting models that make it possible to predict the development of complex, multi-faceted social, economic, financial, natural, environmental, climatic and other processes, in the course of which large amounts of data are acquired and processed. In a situation where new generation computers with high computing performance are available, this type of analysis based on the acquisition and analysis of large data sets can be carried out in real time. The data to be extracted can come from various sources, various institutions, companies, other businesses and selected websites. I have described this kind of analytics in my articles, which have the term Big Data in the title. These articles are available on my profile of this Research Gate portal. I am open to scientific cooperation in this developmental and forward-looking topic.
Regards,
Dariusz Prokopowicz
  • asked a question related to Data Science
Question
1 answer
Hello,
I am creating a proposal for my MSc dissertation. My research area is data science, and the topic is student attention tools for online classes based on machine learning. I am just wondering under which category my research could be categorized: is it quantitative or qualitative?
I am really thankful for your feedback.
Relevant answer
Answer
First identify the data and the analysis results of the data; see whether both are quantitative or qualitative.
  • asked a question related to Data Science
Question
10 answers
Hello all,
currently I'm studying Business Informatics, and next month I'm going to start writing my Master's thesis. For this reason, I am searching for a topic that I can handle well within the framework of my thesis. I have done a bit of research on this and still haven't found anything. However, one topic that has been on my mind for a long time is: Potential and Challenges of Citizen Development. Unfortunately, there is not enough literature on this topic, so there is always the fear of whether the topic is suitable for a Master's thesis. Accordingly, I would like to ask you first what you think about this topic and whether it is possible to narrow it down further if it is too general. I would also be very grateful if you could recommend some literature for me to look at.
In addition, I would be very grateful for further topic suggestions. I am particularly interested in the topics "Digital Transformation", "Data Science" and "Process Management". However, since these topics are broad, could you suggest a specific topic in this context?
Thank you in advance.
Best regards
Hamza Mehmud
Relevant answer
Answer
Dear Hamza Mehmud,
You have chosen a very good and developmental subject for your master's thesis. Taking into account your scientific interests, I propose the following topic for your thesis: Improving computerized analytical techniques based on Business Intelligence solutions and the use of Big Data Analytics and other Industry 4.0 technologies as instruments of business informatics used in the analysis of complex, multi-criteria social, economic and other processes. This topic can be implemented in the context of the challenges of civic development and taking into account such issues as improving process management processes, increasing the efficiency of multi-criteria processing of large amounts of data, Data Science using the achievements of the currently ongoing digital transformation, increasing the scale of digitization of remote communication processes and economic processes implemented in economic entities and new technologies and digital innovations typical of the current fourth technological revolution.
Best regards,
Dariusz Prokopowicz
  • asked a question related to Data Science
Question
4 answers
At a time when statistics is at the core of data science, this is done without any explanation.
Relevant answer
Answer
Dear Professor David Morse
You are right. It is about my institution, Faculty of Economics in Prilep, Republic of North Macedonia. Recently, the accreditation board at the Ministry of Education of the Republic of North Macedonia, where, unfortunately, we also have a member from our institution, reaccredited the three study programs of the first cycle (undergraduate studies): accounting and auditing, international economics and banking and finance where in their structure, the subject statistics for economists is an optional subject. Namely, the Accreditation Board reaccredits reports on study programs that are prepared and submitted by the higher education institutions themselves. Unfortunately, for subjective reasons, the subject of statistics for economists, for the first time, was included in the list of optional subjects and not mandatory as it was until now. I knew that we are unique in this in the world, but I still asked for help and information here at RG, if there is any other institution where this is the case. I teach statistics for economists, and I honestly know what students will lose if they don't listen. Thank you professor for your answer.
Sincerely, Кosta.
  • asked a question related to Data Science
Question
9 answers
Well,
I am a very curious person. During Covid-19 in 2020, working with coded data and taking only the last name, I noticed in my country that people with certain surnames were more likely to die than others (and this pattern has remained unchanged over time). Using mathematical ratio and proportion, and performing a "conversion" so that all surnames had the same weighting, inconsistencies were found. The rest was a simple exercise in probability and statistics, which revealed this controversial fact.
Of course, what I did was a shallow study, just a data mining exercise, but it has been something that caught my attention, even more so when talking to an Indian researcher who found similar patterns within his country about another disease.
In the context of pandemics (for the end of this one and others that may come),
I think it would be interesting to have a line of research involving different professionals, such as data scientists, statisticians/mathematicians, sociologists and demographers, human scientists, and biological scientists, to compose a more refined study on this premise.
Some questions still remain:
What if we could have such answers? How should Research Ethics be handled? Could we warn people about care? How would people with certain last names considered at risk react? And the other way around? From a sociological point of view, could such a recommendation divide society into "superior" or "inferior" genes?
What do you think about it?
=================================
Note: Due to important personal matters I took a break and returned to my activities today, February 13, 2023. I am very happy to come across so much interesting feedback.
Relevant answer
Answer
It is just coincidental
  • asked a question related to Data Science
Question
3 answers
Imbalanced datasets are a common problem in data science; some approaches that have been applied include over- and undersampling methods as well as boosting algorithms (in the traditional machine learning approach) such as AdaBoost. But is there any deep learning approach that will not be biased by an imbalanced dataset?
Relevant answer
Answer
You may want to explore XGBoost, gradient boosting, LightGBM, CatBoost, random forests, and even SVMs.
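On the deep learning side of the question, one widely used remedy is to re-weight the loss rather than the data; a minimal PyTorch sketch with inverse-frequency class weights (the label array and sizes here are invented; resampling and focal loss are alternatives):
#####################
# Minimal sketch: countering class imbalance in a deep model with a
# class-weighted cross-entropy loss. Assumes PyTorch; labels are invented.
import numpy as np
import torch
import torch.nn as nn

y = np.array([0] * 950 + [1] * 50)  # a 95/5 imbalanced label set
freq = np.bincount(y) / len(y)
weights = torch.tensor(1.0 / freq, dtype=torch.float32)

criterion = nn.CrossEntropyLoss(weight=weights)  # rare-class errors cost more

logits = torch.randn(8, 2)            # stand-in model outputs
targets = torch.randint(0, 2, (8,))   # stand-in labels
print(criterion(logits, targets).item())
#####################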
  • asked a question related to Data Science
Question
7 answers
Looking for academicians and industry people to collaborate on Artificial Intelligence, Machine Learning, Data Science, Cyber Security and Robotics for new peer-reviewed journals. If you are interested in joining as an editorial board member or contributing as an author, please message me or mail to rwinston@imanagerpublications.com
#academia #academicpublishing #industry #artificialintelligence #machinelearning #datascience #cybersecurity #Robotics
Relevant answer
Answer
Arpita Maheriya She is asking for collaboration. She is not asking you to suggest journals or anything. I think you misunderstood it. Renisha Winston, correct me if I am wrong.
  • asked a question related to Data Science
Question
8 answers
I am currently undertaking a computer science MSc and have been trying to find a topic for my final research dissertation that would be interesting to me, and I am struggling and looking for the right direction.
My interests through work are mainly cloud computing (Azure, AWS) and data science. I am really struggling to find a topic in either of these two domains.
Any suggestions and topics related to my domains of interest are greatly appreciated.
Relevant answer
Answer
The issue of cloud database security
  • asked a question related to Data Science
Question
3 answers
Currently, data is available in the form of text, images, audio, video, and other such formats.
We are able to use mathematical and statistical modeling to identify different patterns and trends in data, which can be exploited through machine learning, a subsidiary of A.I., for performing different decision-making tasks. The data can be visualized in a variety of forms for different purposes.
Data Science is currently the ultimate state of Computing. For generating data we have hardware, software, algorithms, programming, and communication channels.
But what could be next, beyond this mere data creation and manipulation, in Computing?
Relevant answer
Answer
I understand manipulation of data from the past, but how will we manipulate data from the future? Further, the analysis of data seems constrained by the development of science and mathematics. David Booth
  • asked a question related to Data Science
Question
4 answers
Hi!
I'm currently working on a data science project for optimizing the prices of the products of one of the biggest supermarket chains in Mexico.
One of the things we are working on is finding the price elasticity of demand of such products. What we usually do is, apart from fitting an XGBoost model to predict sales, fit a linear regression, and we get the elasticity from the coefficient corresponding to price (the slope).
However, it is obvious that linear regression is sometimes a poor fit for the data, not to mention that the execution times are much longer, since it requires running XGBoost and LR separately (which is not good considering that there are thousands of products to model).
Because of this, it occurred to me that we could use numerical differentiation to find the price elasticity. After all, calculating a numerical derivative is much faster than fitting another model.
However, I'm not sure if this is mathematically correct, since the data does not come from a function.
So the question would be, is this mathematically correct? Does it make sense?
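For reference, here is a minimal sketch of what the numerical-differentiation idea could look like, assuming a fitted model with a scikit-learn-style predict method and price in a known feature column (both assumptions, not details from the question):
#####################
# Minimal sketch: point price elasticity from any fitted demand model via
# central finite differences. `model` and `price_idx` are assumptions.
import numpy as np

def price_elasticity(model, x, price_idx=0, h=0.01):
    """Central-difference estimate of (dQ/dP) * (P/Q) at feature vector x."""
    x_up, x_dn = x.copy(), x.copy()
    x_up[price_idx] += h
    x_dn[price_idx] -= h
    q_up = model.predict(x_up.reshape(1, -1))[0]
    q_dn = model.predict(x_dn.reshape(1, -1))[0]
    dq_dp = (q_up - q_dn) / (2 * h)
    p = x[price_idx]
    q = model.predict(x.reshape(1, -1))[0]
    return dq_dp * p / q
#####################
One caveat worth noting: tree ensembles such as XGBoost predict piecewise-constant functions of price, so a small finite difference can be exactly zero or jump abruptly at split points; averaging the derivative over a grid of prices, or using a larger step h, is usually needed before the estimate is meaningful.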
Relevant answer
Answer
From my experience, I would not use a linear predictor for elasticity. I know your question is asking about mathematically correct solutions but I don't think that's your issue. I would probably use logistic regression as a first choice.
  • asked a question related to Data Science
Question
23 answers
Dear statistics experts,
I am developing a model to predict the behavior of around 30,000 data points. I use two different approaches to calculate the R2, and each one gives a completely different value.
The first approach: R2 = SSR/SST = 0.95
Whereas the second approach: R2=1-SSE/SST= 0.00
where SSR is Sum Square Regressions, SST is the Sum Squared Total, and SSE is Sum Squared Errors.
Any comment is highly appreciated.
Cheers,
Bahman
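As background to the discrepancy: the two formulas agree only when SST = SSR + SSE, which holds for ordinary least squares with an intercept but not for arbitrary (e.g. non-linear or biased) models; a minimal numeric check with invented values:
#####################
# Minimal sketch: the two R^2 variants on invented data. For a model that
# is not an OLS fit with intercept, SST != SSR + SSE and the two disagree.
import numpy as np

y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.8])  # predictions from some model

sst = np.sum((y - y.mean()) ** 2)
ssr = np.sum((y_hat - y.mean()) ** 2)
sse = np.sum((y - y_hat) ** 2)

print(ssr / sst)      # "explained variance" variant
print(1 - sse / sst)  # residual variant: the safer general definition
#####################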
Relevant answer
Answer
Bahman Daneshian , said "Should I select many random selections and check the R2 for them?"
Do you mean repeat the same model with bootstrap samples to get things like intervals for your estimates? or do you mean check all subsets of X (and their interactions) and report all these results (a multiverse) or those which optimize something (which depending on research question it might be better to use something like the lasso)? I guess the question is what are you planning on using the R^2 estimate for?
  • asked a question related to Data Science
Question
8 answers
One of my master's students is currently conducting a preliminary study to find out the maturity of the Cross Industry Standard Process for Big Data (CRISP4BigData) for use in Big Data projects. I would like to invite all scientists, Big Data experts, project managers, data engineers, and data scientists from my network to participate in the following survey. Feel free to share!
Relevant answer
Answer
Done
  • asked a question related to Data Science
Question
2 answers
What is the main disadvantage of a global optimization algorithm for the Backpropagation Process?
Under what conditions can we still use a local optimization algorithm for the Backpropagation Process?
Relevant answer
Answer
Armin Hajighasem Kashani Non-linear data can be handled and processed using a neural network, something that is otherwise difficult with simple perceptrons and sigmoid neurons. In neural networks, the painful decision-boundary problem is reduced.
However, the downsides include the loss of neighborhood knowledge, the addition of more parameters to optimize, and the lack of translation invariance.