Science topic

Predictive Modeling - Science topic

Explore the latest questions and answers in Predictive Modeling, and find Predictive Modeling experts.
Questions related to Predictive Modeling
  • asked a question related to Predictive Modeling
Question
6 answers
  • Machine Learning Integration: AI enhances predictive modelling in biostatistics.
  • Big Data Analytics: Analysing massive datasets improves disease detection.
  • Bayesian Statistics: Prior knowledge refines personalized treatment analysis.
  • Bioinformatics Fusion: Genomic data analysis reveals disease mechanisms.
  • Predictive Modeling: Forecasts guide personalized healthcare and treatments.
  • Cloud Computing: Enables real-time, scalable data sharing solutions.
  • Data Privacy Focus: Compliance ensures secure biomedical data management.
Relevant answer
Answer
Dear Anitha Roshan Sagarkar – thank you for giving me a moment of amusement. Bruce Weaver – I guess your response was the best out of two, which is some kind of distinction…
I do have to tell you, however, Anitha Roshan Sagarkar, that ResearchGate metrics are no longer influenced by posting nonsense questions. There was an epidemic of this – people using ChatGPT even, when they were clearly not clever enough to think of their own questions, if you can imagine this. As a result of this epidemic of junk questions, RG eliminated the metric that was based on the quality of your community participation. I'm sure you will be disappointed by this, but in the end the forum became overrun by people – often operating as a cartel – who were recommending each other's rubbish questions and answers.
So you can stop now, long story short.
  • asked a question related to Predictive Modeling
Question
6 answers
As engineering and material science increasingly adopt data-driven approaches, I am intrigued by the rapid advancements in supervised Machine Learning (ML) and Deep Learning (DL). These tools have proven transformative, but I am eager to explore a question at the heart of innovation in this space:
What are the most recent inventions, techniques, and tools in AI that are driving meaningful improvements in predictive modelling?
Specifically, I am inviting AI researchers and practitioners to share insights into:
  1. Breakthrough innovations in algorithms or architectures that have emerged recently and are demonstrating real-world impact.
  2. New feature extraction or data mining techniques that enhance model performance, especially in multidisciplinary fields like engineering and material science.
  3. Practical strategies for improving the accuracy and robustness of predictive models, whether through data preprocessing, hyperparameter tuning, or novel methodologies.
In my concrete durability and sustainability research, I aim to leverage AI tools for academic insights and to provide actionable solutions for designing better, longer-lasting materials. To achieve this, it is critical to understand and integrate the most cutting-edge tools available today. For example:
  • What are the emerging trends in handling complex or imbalanced data in engineering applications?
  • How are advancements in explainable AI helping bridge the gap between model predictions and practical decisions?
  • Are there innovative ways to adapt state-of-the-art techniques like graph neural networks or transformer models for real-world engineering challenges?
I am curious to hear from the community:
  • What recent advancements in AI have you found most impactful for improving model performance?
  • Do specific feature extraction, data augmentation, or optimization techniques stand out?
  • What innovations do you see shaping the future of predictive modelling in multidisciplinary fields?
This is not just a call to share tools or techniques. It is an invitation to discuss how these advancements can be meaningfully applied to solve practical problems in engineering and beyond. I look forward to hearing about your experiences, discoveries, and perspectives on the evolving role of AI in research and practice.
Let’s connect and explore how we can drive innovation with purpose and impact!
Relevant answer
Answer
Arguments for genuine advancement:
  • Accelerated discovery: ML algorithms can sift through vast datasets of experimental and computational results, identifying patterns and trends that might be missed by human researchers. This can significantly speed up the process of discovering new materials with desired properties.
  • Improved material design: ML models can be used to predict the properties of materials based on their composition and structure, enabling researchers to design new materials with tailored characteristics.
  • Enhanced materials characterization: ML techniques can analyze complex experimental data, such as microscopy images and spectroscopy results, to extract valuable insights into the microstructure and properties of materials.
  • Optimization of manufacturing processes: ML can be used to optimize manufacturing processes, such as 3D printing and chemical synthesis, leading to improved efficiency, reduced costs, and enhanced product quality.
Arguments for chasing trends:
  • Hype and overpromising: The field of ML in materials science is sometimes characterized by hype and overpromising, with some claims of revolutionary breakthroughs that may not be fully substantiated.
  • Data limitations: Many ML algorithms rely on large datasets of high-quality data, which can be challenging to obtain in materials science, where experiments can be time-consuming and expensive.
  • Interpretability challenges: Some ML models, such as deep neural networks, can be difficult to interpret, making it challenging to understand how they arrive at their predictions and to gain insights into the underlying physics of materials.
  • Limited adoption: Despite the potential benefits, ML techniques are not yet widely adopted in the materials science community, with many researchers still relying on traditional methods.
Conclusion:
The use of machine learning in materials science is a promising area of research with the potential to revolutionize the field. However, it's important to approach this field with a critical and nuanced perspective, recognizing both the potential benefits and the challenges. By addressing the limitations and fostering interdisciplinary collaboration, researchers can harness the power of ML to accelerate materials discovery and innovation.
  • asked a question related to Predictive Modeling
Question
1 answer
Artificial Intelligence (AI) is rapidly transforming industries across the globe, from healthcare and finance to education and entertainment. In particular, the fields of research and writing have been significantly impacted by AI’s capabilities. AI tools are helping researchers and writers optimize their workflows, boost productivity, and improve the quality of their work. Whether it's generating ideas, improving grammar, or assisting in data analysis, AI has revolutionized how research papers are written, edited, and managed.
In academic writing, AI tools are particularly beneficial for both novice and experienced researchers. These tools can assist in various stages of writing, from brainstorming topics to generating drafts, all the way to final editing. By automating repetitive tasks, AI allows researchers to focus more on the critical aspects of their work, such as data analysis, hypothesis formulation, and scholarly discourse. In this article, we will explore the role of AI in writing research papers and the tools that are transforming the research process.
AI Writing Tools for Research Papers
One of the most popular and impactful uses of AI in academic writing is through AI-powered writing tools. These tools help researchers improve their writing by offering grammar and style suggestions, sentence structure enhancements, and readability improvements. Some of the most widely used AI writing tools include Grammarly, Hemingway, and ProWritingAid.
  • Grammarly: This tool is a widely recognized writing assistant that not only corrects grammar and spelling mistakes but also offers suggestions for improving sentence structure and clarity. For researchers, this can be especially helpful when drafting lengthy research papers or manuscripts. Grammarly’s AI-powered suggestions ensure that writing is clear, concise, and free of grammatical errors, which is crucial in academic writing.
  • Hemingway: Hemingway is another AI tool that focuses on improving readability. It highlights overly complex sentences, passive voice, and adverbs that can make writing harder to follow. For researchers, Hemingway helps make technical writing more accessible, ensuring that the paper is easy to understand, even for readers outside of the specific academic field.
  • ProWritingAid: This tool is an all-in-one writing assistant that checks for grammar issues, style problems, and readability concerns. It also offers in-depth reports on writing style, including suggestions for sentence structure, word choice, and overall flow. For academic researchers, ProWritingAid can help refine research papers, dissertations, and articles by offering suggestions that improve both technical accuracy and overall readability.
In addition to improving grammar and style, these AI tools can also assist researchers in generating ideas and outlining papers. For instance, AI-powered systems like QuillBot can help researchers paraphrase text, generate content ideas, and even suggest keywords or phrases related to specific research topics. Some advanced AI systems can also help in suggesting relevant literature by analysing existing research and identifying gaps in the literature that can be explored in new papers.
AI in Literature Reviews
A literature review is an essential part of any research paper, dissertation, or thesis. It involves reviewing existing research to identify gaps, trends, and key findings that inform the research project. AI tools have significantly simplified the process of conducting literature reviews by enabling researchers to analyse vast databases of academic papers and summarize relevant findings.
  • AI-Powered Literature Review Tools: AI tools like Iris.ai and Ref-N-Write are designed to help researchers conduct comprehensive literature reviews. These tools can analyse large datasets of academic papers, identify key research trends, and summarize findings from multiple sources. Iris.ai, for example, uses Natural Language Processing (NLP) to understand the context of research papers and match them with the researcher’s topic. This reduces the time researchers spend manually searching for relevant studies and helps them identify papers they might otherwise have missed.
  • Literature Analysis and Trend Identification: AI systems can also help researchers track trends in the literature over time. By analysing research papers, AI tools can identify recurring themes, methodologies, or findings across multiple studies. This is particularly useful when conducting a systematic review of the literature, as AI can pinpoint trends or contradictions in existing research. AI can help organize the literature review, ensuring that it follows a logical structure, identifies key research gaps, and highlights major contributions in the field.
By automating the literature review process, AI tools can save researchers hours of manual work, allowing them to focus on synthesizing the findings and developing new research questions.
AI-Powered Research Assistance
AI is not only transforming the writing and review process but also plays a significant role in managing the research process itself. Tools like Zotero, EndNote, and Mendeley are widely used by researchers to manage citations, references, and research materials efficiently.
  • Zotero, EndNote, and Mendeley: These AI-powered reference management tools help researchers organize and store their sources, making it easier to create citations and bibliographies. These tools can automatically extract citation information from academic papers, articles, and books, significantly reducing the time spent formatting references manually. They also allow researchers to create collections of research materials, organize them by topic, and search for specific sources based on keywords or tags.
  • AI in Data Analysis: AI has also found its way into data analysis. Research environments such as R and Python offer AI-powered libraries for analysing large datasets, identifying patterns, and generating insights. For example, R's integration with AI libraries allows researchers to run complex statistical analyses with ease. Python's AI libraries, such as TensorFlow and Scikit-learn, can be used for machine learning and predictive modelling, making them invaluable for researchers in fields like bioinformatics, economics, and social sciences.
AI-powered tools help researchers save time by automating tedious tasks like managing references, analysing data, and formatting papers. This allows them to focus on higher-level tasks like formulating hypotheses, designing experiments, and interpreting results.
AI’s Role in Writing Assistance
AI tools are also making strides in assisting researchers with the writing process itself. OpenAI’s GPT-3, for example, is an AI model capable of generating human-like text. GPT-3 can assist researchers by generating drafts, suggesting sentence structures, or even creating content for specific sections of a research paper.
  • GPT-3 for Draft Generation: GPT-3 can help researchers generate initial drafts for research papers, saving time in the early stages of writing. By providing a starting point for academic writing, GPT-3 can help researchers overcome writer’s block and get the process moving.
  • Improving Clarity and Coherence: GPT-3 and similar AI tools can also assist in improving the clarity and coherence of academic writing. These tools can suggest rewording sentences for better readability, removing redundant phrases, and ensuring that the writing is cohesive and logically structured. This is particularly valuable in academic writing, where clarity and precision are paramount.
However, researchers should be cautious when using AI tools like GPT-3. While these tools can be incredibly helpful in drafting content and improving readability, they may not fully understand the nuances of academic writing, particularly in specialized fields. Researchers should always review the generated content and ensure that it aligns with the specific requirements of their research.
Conclusion
The integration of AI in the research and writing process has revolutionized how academic work is completed. AI tools offer significant benefits, including improved writing quality, faster research, better organization, and enhanced productivity. Tools like Grammarly, Hemingway, Zotero, and GPT-3 can help researchers with everything from generating content to managing references, performing data analysis, and writing literature reviews.
While AI tools can greatly enhance the efficiency and quality of academic writing, it is important for researchers to retain control over their work. AI should be seen as an assistant, not a replacement, for human intelligence. The integrity of research and writing still depends on the researcher’s expertise, judgment, and understanding of the subject matter.
Researchers are encouraged to embrace AI tools for the efficiency they offer but to always ensure that their work maintains its originality, rigour, and scholarly value. If you're looking for personalized support in your research and writing journey, consider reaching out to Hamza Omullah for expert guidance, coaching, and consulting services.
Explore AI writing tools today and share your experiences with them! For personalized research and writing assistance, feel free to reach out to Hamza at HAMNIC Solutions.
#ResearchConsultant #WritingSupport #AcademicWriting #ResearchHelp #ContentCreation #ProfessionalWriting #ThesisWriting #DissertationHelp #AcademicSuccess #ClientSuccess #hamnicwriting #hamnicsolutions #AI
Relevant answer
Answer
Artificial Intelligence (AI) is transforming research paper writing by automating various tasks and enhancing efficiency. Key roles include:
Idea Generation: Tools like GPT assist in brainstorming research questions and hypotheses.
Literature Review: AI-powered search engines and summarization tools help quickly identify relevant studies and extract insights.
Drafting and Editing: AI tools streamline writing, suggest improvements, correct grammar, and ensure clarity.
Data Analysis: AI algorithms facilitate the analysis of complex datasets, generating results for inclusion in papers.
Plagiarism Detection: AI ensures originality by detecting and preventing plagiarism.
Formatting and Citations: Automates compliance with journal-specific guidelines and referencing styles.
While AI boosts productivity and reduces effort, ethical considerations around over-reliance and potential biases remain critical.
  • asked a question related to Predictive Modeling
Question
26 answers
The Schrödinger equation, as foundational as it is in quantum mechanics, fails to adequately describe the true nature of quantum particle motion, as demonstrated by my recent research ( DOI: 10.9790/4861-1505012633). This raises the critical question: What alternative frameworks can we use to better understand quantum mechanics? Given that the current models have been proven insufficient, it becomes crucial to explore different ways to model the behavior of particles. Can we ever truly predict the motion of quantum particles accurately, considering our limited understanding of both the inner workings of matter (with medical science only scratching the surface of human biology) and the physical universe (with only 7% of the observable matter understood)? The complexities of quantum motion are so vast, and our scientific knowledge so constrained, that predicting exact particle states may well remain unattainable.
This query invites the ResearchGate community to propose alternatives to the Schrödinger equation. What would a new model look like, and can we develop a formula that predicts quantum motion in a way that better aligns with the complexities and limitations of our current knowledge? Given the undeniable limitations of modern science, can we ever predict the precise mechanics of a particle? Your insights and suggestions are welcome.
Relevant answer
Answer
Alan Dennis Clark Jesse Daniel Brown & Research Gate Community
Dear Colleagues,
Thank you for your continued engagement in this profound discussion. After a thorough review of the extensive dialogues on ResearchGate, including the insights from Alan Dennis Clark and Jesse Daniel Brown, as well as excerpts from my paper titled "Schrödinger Equation unfit for fundamental law" (DOI: 10.9790/4861-1505012633), I aim to provide a response as per below that addresses all raised concerns and integrates perspectives from various research collaborations:
1. Limitations of the Schrödinger Equation
The Schrödinger equation has been foundational in quantum mechanics, offering a framework for understanding quantum systems. However, several limitations have been identified:
  • Real-Time Predictions: The referenced paper argues that the Schrödinger equation, being a second-order linear differential equation, is inadequate for describing the real-time motion of quantum particles, particularly when considering waveforms like square waves. This suggests a fundamental limitation in predicting quantum particle behavior in real-time.
  • Relativistic Constraints: The equation does not account for relativistic effects, making it inadequate for particles moving at speeds comparable to light.
  • Quantum Scarring: Phenomena such as quantum scarring, where quantum eigenstates exhibit enhanced probability densities along classical periodic orbits in classically chaotic systems, are not explicitly predicted by the Schrödinger equation.
2. Alternative Frameworks and Models
To address these limitations, several alternative and extended frameworks have been proposed:
  • Dirac Equation: A relativistic wave equation that accounts for spin-½ particles, providing a more comprehensive description of fermions.
  • Quantum Field Theory (QFT): A theoretical framework that extends quantum mechanics to fields, accommodating particle creation and annihilation processes.
  • De Broglie–Bohm Theory: Also known as the pilot-wave theory, it introduces deterministic trajectories for particles guided by a wave function, offering an alternative interpretation of quantum mechanics.
  • Quantum Geometry: Recent advancements, such as those by Carolina Figueiredo, propose frameworks where quantum events emerge from abstract geometric structures beyond traditional space-time, potentially addressing phenomena like quantum scarring.
3. Integrating Discrete Models and Computational Approaches
The concept of a discrete, frame-by-frame universe, as discussed in the paper "Integration and Refinement of Digital Physics, Unifying Quantum and Classical with a Calculation: A Formal Approach to Subparticles and Discrete Universe Frames," suggests a model where the universe is rendered in discrete frames, potentially offering explanations for phenomena like instantaneous communication transfer. While this approach is intriguing, it is essential to recognize that the Schrödinger equation, as a continuous differential equation, has been extensively validated through experimental results and remains a cornerstone of quantum mechanics. Discrete models must demonstrate empirical success and predictive power comparable to the Schrödinger equation to be considered viable alternatives.
While the Schrödinger equation has been instrumental in the development of quantum mechanics, its limitations in certain scenarios necessitate the exploration of alternative models and frameworks. Integrating insights from quantum geometry, discrete models, and other theoretical advancements can provide a more comprehensive understanding of quantum phenomena. It is through the synthesis of these diverse perspectives that we can aspire to develop a unified theory capable of accurately describing the complexities of the quantum realm.
I look forward to further discussions and collaborative explorations on this topic.
Best regards,
Sandeep Jaiswal
  • asked a question related to Predictive Modeling
Question
6 answers
“Does anyone have any reference or methodology for projecting an Urban Heat Island (UHI) layer into the future using GIS? I am interested in techniques that combine satellite data and predictive modeling tools to project changes in urban surface temperature. I welcome any input or related studies.”
Att:
Jhonsy O. Silva-López
Relevant answer
Answer
How about Trend Analysis and Extrapolation Methodology Using Statistical Analysis?
With this method, you’ll have to use the UHI data from 2010 and 2020 to calculate the average annual increase (or change) in UHI. You can apply linear regression to model the trend and project future UHI values for 2030.
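If it helps, here is a minimal sketch of that per-pixel extrapolation in Python (the file handling is omitted and the array values are made up; it assumes the 2010 and 2020 layers are already co-registered and loaded as NumPy arrays, e.g. with rasterio):
```python
# Minimal sketch: per-pixel linear extrapolation of a UHI (surface temperature) layer.
# Assumes two co-registered rasters for 2010 and 2020 have already been read into
# NumPy arrays; the values below are synthetic placeholders.
import numpy as np

def project_uhi(uhi_2010: np.ndarray, uhi_2020: np.ndarray, target_year: int = 2030) -> np.ndarray:
    """Linearly extrapolate a UHI layer per pixel from two observation years."""
    annual_change = (uhi_2020 - uhi_2010) / (2020 - 2010)   # average change per year
    years_ahead = target_year - 2020
    return uhi_2020 + annual_change * years_ahead           # projected layer

# Example with synthetic 3x3 layers (surface temperature anomaly, degrees Celsius):
uhi_2010 = np.array([[1.0, 1.2, 0.9], [1.5, 2.0, 1.1], [0.8, 1.0, 1.3]])
uhi_2020 = np.array([[1.4, 1.5, 1.1], [2.0, 2.6, 1.4], [1.0, 1.3, 1.6]])
print(project_uhi(uhi_2010, uhi_2020, 2030))
```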
  • asked a question related to Predictive Modeling
Question
4 answers
What is the basis for determining both H and K in ARIMA models when selecting the best predictive model?
Relevant answer
Answer
Let us first suppose that the series can be considered as generated by a stationary process. I recommend using the ACF and the PACF together with ARMA model fitting: check whether they can be seen as truncated after a small lag k. If the ACF seems more clearly truncated than the PACF, try an MA(k) model, with k parameters. If the PACF seems more clearly truncated than the ACF, try an AR(k) model, with k parameters. Then fit the selected model and look at the ACF and PACF of the residuals. Repeat the procedure, combining the new conclusion with the previously selected model. For instance, if an AR(1) was first selected and fitted, and its residuals show an ACF truncated after lag 2, then go for an ARMA(1, 2) model. You can then omit insignificant parameters one by one. Very frequently, only 2 or 3 iterations of this procedure are enough. When judging truncation, ignore (partial) autocorrelations smaller in absolute value than 2/sqrt(n), where n is the series length, as well as large values at lags for which you cannot find an explanation. And do not forget that autocorrelations are sensitive to outliers: for instance, a large autocorrelation at lag 7 can be produced by two outliers separated by 7 time units.
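For readers who prefer to work through this in code, below is a minimal sketch of the same iterative procedure using Python's statsmodels (the toy series and the orders tried are placeholders; in practice they come from inspecting your own ACF/PACF plots):
```python
# Minimal sketch of ACF/PACF-guided iterative ARMA fitting with statsmodels.
import numpy as np
import pandas as pd
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
y = pd.Series(np.cumsum(rng.normal(size=200))).diff().dropna()  # a stationary toy series

# Step 1: inspect ACF and PACF; bars outside +/- 2/sqrt(n) deserve attention.
plot_acf(y, lags=20)
plot_pacf(y, lags=20)

# Step 2: if, say, the PACF looks truncated after lag 1, try an AR(1), i.e. ARMA(1, 0).
fit = ARIMA(y, order=(1, 0, 0)).fit()

# Step 3: examine the residual ACF; if it looks truncated after lag 2, refit as ARMA(1, 2).
plot_acf(fit.resid, lags=20)
fit2 = ARIMA(y, order=(1, 0, 2)).fit()
print(fit2.summary())
```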
  • asked a question related to Predictive Modeling
Question
1 answer
I am looking for collaborators to work on a research project based on prediction models of tomato yield losses caused by tomato brown rugose fruit virus (ToBRFV). The study is being carried out in Mexico.
More information is available through this thread or by e-mail.
Relevant answer
Answer
Hello, I am interested.
  • asked a question related to Predictive Modeling
Question
2 answers
I am developing a predictive model for a water supply network that involves 20 influencing points. However, I only have historical data for 10 out of these 20 points.
I would like to know how to approach building a predictive model given this incomplete dataset. Specifically:
  1. What methods can I use to handle the missing data for the remaining 10 points? Are there any standard techniques or best practices for dealing with missing data in such scenarios?
  2. How can I effectively incorporate the data from the 10 points I do have into the model? What strategies can I employ to ensure that the available data is utilized efficiently to make accurate predictions?
  3. Are there specific techniques or models that can help in making predictions despite having incomplete data? I am interested in methods that can manage and leverage incomplete data effectively.
Relevant answer
Answer
I would add to Ivan's already sufficient answer that in order to do the (linear?) interpolation, you have to be sure that those 20 influencing points follow the same "law", i.e. that interpolation as such is indeed possible.
Besides the concepts of "interpolation" and "approximation", you can also google "imputation": based on a probabilistic distribution, you fill the missing values with the most likely values. You might even get an associated probability for each of the 10 missing points, corresponding to the confidence you can place in it.
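As a concrete illustration of the imputation idea, here is a minimal sketch using scikit-learn's IterativeImputer on synthetic data (column names are hypothetical). Note that this only helps when the 10 unmeasured points are at least partially observed or correlated with the measured ones; points with no observations at all cannot be recovered by imputation alone:
```python
# Minimal sketch: model-based imputation of partially missing monitoring points.
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(1)
observed = rng.normal(size=(100, 10))                          # 10 fully monitored points
partial = observed * 0.5 + rng.normal(scale=0.1, size=(100, 10))  # 10 correlated points
partial[rng.random((100, 10)) < 0.6] = np.nan                  # 60% of their values unobserved

data = pd.DataFrame(np.hstack([observed, partial]),
                    columns=[f"point_{i}" for i in range(20)])  # hypothetical column names

imputed = IterativeImputer(max_iter=10, random_state=0).fit_transform(data)
print(np.isnan(imputed).sum())  # 0: every gap filled with a model-based estimate
```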
  • asked a question related to Predictive Modeling
Question
3 answers
AI-driven technologies offer previously unheard-of capabilities to process enormous volumes of data, extract insightful knowledge and improve predictive models, according to the UN’s World Meteorological Organization (WMO). That means improved modelling and predicting climate change patterns that can help communities and authorities to draft effective adaptation and mitigation strategies.
Relevant answer
Answer
Within the next five years, the Middle East might become an uninhabitable region, prompting significant migration. Currently, at DASSAT, we are conducting a study to predict which plant types (C3 or C4) will survive or perish under conditions of severe climate change.
  • asked a question related to Predictive Modeling
Question
4 answers
As a natural resources manager, I am exploring the viability of using slope as the primary indicator for assessing soil susceptibility to erosion. While slope is a significant factor, I am keen to understand its limitations and the additional variables that must be considered for a comprehensive evaluation.
Key Points for Consideration:
  1. Environmental Contexts: How does the reliability of slope as an indicator vary across different geographical regions and soil types?
  2. Complementary Factors: What other critical factors (e.g., soil composition, vegetation cover, rainfall patterns) should be integrated to enhance the accuracy of erosion risk assessments?
  3. Case Studies and Research: Can you provide examples of research where slope alone was insufficient or highly effective in predicting soil erosion? What methodologies were used to account for its limitations?
  4. Predictive Models: How do current erosion prediction models incorporate slope, and what advancements are being made to improve their predictive power?
I am looking for detailed insights, experiences, and references that can deepen the understanding of this complex issue and contribute to more robust soil erosion assessment methodologies.
Relevant answer
Answer
Of course, the steepness of the slope is one of the most important factors in the development of soil erosion, but far from the only one. There are many factors in erosion development, including the shape, exposure and length of slopes, the type of crop rotation and crops, the direction of slope tillage, the current degree of erosion, and the physical and chemical composition of the soils. Climatic factors also play an important role: the amount, frequency and intensity of precipitation; and, for soil erosion from snowmelt, antecedent moisture, the depth of soil freezing, the water content of the snow, the unevenness of the snow cover, and so on. Our research in the northern forest-steppe of Central Siberia has revealed the important role of slope length: under conditions of low slope steepness but large slope length (500 m or more), soil erosion in the middle and lower parts of the slopes can reach very high values. So, when studying soil erosion processes, it is important to investigate and take into account the whole complex of influencing factors, and not just some of them.
  • asked a question related to Predictive Modeling
Question
3 answers
The project is to conduct a comprehensive analysis of online retail sales trends and customer behaviour using data collected from online sources. The project aims to derive actionable insights for optimising marketing strategies, improving customer experience, and enhancing overall business performance for e-commerce businesses. It will include the analysis of online retail data sourced from various e-commerce platforms, APIs, or web scraping techniques. The focus will be on understanding the factors influencing sales, identifying patterns in customer behaviour, and predicting future sales trends. The analysis will include customer segmentation and predictive modelling.
Relevant answer
Answer
Predictive analysis of online retail sales trends and customer behavior involves using historical sales data, customer interactions, and advanced analytics techniques (like machine learning) to forecast future sales, identify purchasing patterns, and tailor marketing strategies to improve customer engagement and sales performance.
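As a small illustration of the segmentation component, here is a minimal RFM-plus-k-means sketch in Python (the transactions table and its column names are hypothetical):
```python
# Minimal sketch: RFM-style customer segmentation with k-means on a toy transactions table.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2, 3],
    "order_date": pd.to_datetime(["2024-01-05", "2024-03-02", "2024-02-10",
                                  "2024-02-20", "2024-03-15", "2024-01-30"]),
    "amount": [50.0, 20.0, 120.0, 80.0, 60.0, 15.0],
})

snapshot = transactions["order_date"].max() + pd.Timedelta(days=1)
rfm = transactions.groupby("customer_id").agg(
    recency=("order_date", lambda d: (snapshot - d.max()).days),  # days since last order
    frequency=("order_date", "count"),                            # number of orders
    monetary=("amount", "sum"),                                    # total spend
)

rfm["segment"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(rfm))
print(rfm)
```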
  • asked a question related to Predictive Modeling
Question
4 answers
I need software that can help me calculate Harrell's C-index for the concordance of a predictive model.
Relevant answer
Answer
I think R software would work.
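For example, the Hmisc and survival packages in R provide concordance functions. If Python is an option, the lifelines package exposes Harrell's C-index directly; a minimal sketch with made-up numbers:
```python
# Minimal sketch: Harrell's C-index with lifelines (toy numbers, not real data).
from lifelines.utils import concordance_index

event_times = [5, 10, 12, 3, 8]          # observed follow-up times
predicted_scores = [12, 8, 15, 2, 9]     # model predictions (higher = longer survival)
event_observed = [1, 1, 0, 1, 1]         # 1 = event occurred, 0 = censored

c_index = concordance_index(event_times, predicted_scores, event_observed)
print(f"Harrell's C-index: {c_index:.3f}")
```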
  • asked a question related to Predictive Modeling
Question
5 answers
Please
Relevant answer
Answer
I still haven't found the database
  • asked a question related to Predictive Modeling
Question
3 answers
I am using the CERES rice model in DSSAT to model CA rice production. The model is unable to predict lodging or damage from pests and diseases. I am modeling a cultivar that has been fairly extensively studied in California, and the wealth of field trial data available for this cultivar gives a fairly accurate sense of losses due to lodging and disease. It seems logical that I could apply these loss rates to the total yield predicted by the model to get a more accurate yield output. However, I'm struggling to determine whether there is precedent for this kind of post-model adjustment.
Relevant answer
Can you explain more about the CERES rice model in DSSAT?
  • asked a question related to Predictive Modeling
Question
10 answers
I understand that there are many machine learning predictive models; I just want to know the best model that can be used to predict customer behavior with the least amount of error.
Relevant answer
Answer
Predicting customer behavior can involve various factors, and the choice of machine learning model depends on the specific problem and data available. However, considering a general scenario, ensemble methods and deep learning models often perform well in customer behavior prediction tasks.
Extreme Gradient Boosting (XGBoost), Random Forest, Gradient Boosting Decision Trees and Recurrent Neural Networks (RNNs) are good candidates for starting your work.
The best model depends on the nature of your data and the specific problem you're trying to solve. You might need to experiment with multiple models and tuning hyperparameters to find the one that performs best with your specific dataset. Additionally, using a combination of models in an ensemble approach can often yield better results than relying on a single model.
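If it helps to get started, here is a minimal sketch of a gradient-boosted-tree baseline for a customer behaviour task, using scikit-learn's GradientBoostingClassifier on synthetic data (swapping in XGBoost's XGBClassifier is straightforward):
```python
# Minimal sketch: gradient-boosted trees as a churn-style prediction baseline.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for customer features (recency, frequency, spend, etc.) and a binary target.
X, y = make_classification(n_samples=1000, n_features=12, n_informative=6, random_state=0)

model = GradientBoostingClassifier(n_estimators=200, learning_rate=0.05, max_depth=3,
                                   random_state=0)
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"Cross-validated AUC: {scores.mean():.3f} +/- {scores.std():.3f}")
```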
  • asked a question related to Predictive Modeling
Question
4 answers
ROC curve analysis to detect the best cut-off of the independent variables in the prediction models
Relevant answer
Answer
If you simply want to know if the AUROC of two models or datasets is statistically significant, use the DeLong test.
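For the cut-off part of the question, one common (though not the only) criterion is Youden's J statistic, which picks the threshold maximising sensitivity + specificity - 1. A minimal sketch with synthetic data standing in for a fitted model's scores:
```python
# Minimal sketch: optimal ROC cut-off via Youden's J statistic.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
scores = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]

fpr, tpr, thresholds = roc_curve(y, scores)
best = np.argmax(tpr - fpr)                 # Youden's J = sensitivity + specificity - 1
print(f"Optimal cut-off: {thresholds[best]:.3f} (TPR={tpr[best]:.2f}, FPR={fpr[best]:.2f})")
```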
  • asked a question related to Predictive Modeling
Question
2 answers
We are working on a Lasso-based predictive model and are evaluating it in a validation cohort built from publicly available data.
1. ROC-AUC 0.800 (95%CI 0.551-1.000)
2. ROC-AUC 0.750 (95%CI 0.350-1.000)
3. ROC-AUC 0.575 (95%CI 0.317-1.000)
I am wondering how to describe the evaluation of (2): please let me know whether I can describe (1) and (2) as statistically well-performing predictions. Given that the lower limit of the 95% CI is 0.35, despite the high ROC-AUC of 0.75, should we really describe the evaluation of the model positively? Thank you so much for your support.
Relevant answer
Answer
As far as I know, regardless of the ML method involved, a model with an AUC value higher than 0.8 is generally considered to have good predictive performance.
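One practical way to make the uncertainty in such small validation cohorts explicit is to bootstrap the AUC and report the resulting interval alongside the point estimate; a minimal sketch with synthetic labels and scores:
```python
# Minimal sketch: bootstrap confidence interval for the ROC-AUC of a small validation cohort.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=40)                    # small validation cohort (synthetic)
y_score = y_true * 0.4 + rng.random(40) * 0.8           # imperfect predicted probabilities

boot_aucs = []
for _ in range(2000):
    idx = rng.integers(0, len(y_true), len(y_true))     # resample with replacement
    if len(np.unique(y_true[idx])) < 2:                 # need both classes in the resample
        continue
    boot_aucs.append(roc_auc_score(y_true[idx], y_score[idx]))

lo, hi = np.percentile(boot_aucs, [2.5, 97.5])
print(f"AUC {roc_auc_score(y_true, y_score):.3f} (95% CI {lo:.3f}-{hi:.3f})")
```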
  • asked a question related to Predictive Modeling
Question
3 answers
How can I produce calibration curves for prediction models in SPSS? I think calibration is important in addition to external validation.
Relevant answer
Answer
Amount of bleeding, or presence of bleeding (yes/no)? If the latter, then the Cross Validated thread I mentioned earlier looks relevant.
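For illustration of the concept (in Python rather than SPSS), a calibration curve simply bins the predicted probabilities and compares them with the observed event rate per bin; a minimal sketch with synthetic data:
```python
# Minimal sketch: calibration curve for a binary prediction model.
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

probs = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
frac_pos, mean_pred = calibration_curve(y_te, probs, n_bins=10)  # observed vs predicted per bin

plt.plot(mean_pred, frac_pos, "o-", label="model")
plt.plot([0, 1], [0, 1], "--", label="perfect calibration")
plt.xlabel("Mean predicted probability")
plt.ylabel("Observed event rate")
plt.legend()
plt.show()
```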
  • asked a question related to Predictive Modeling
Question
1 answer
Subject: Request for Access to CEB-FIP Database (or similar) for Developing ML Predictive Models on Corroded Prestressed Steel
Dear ResearchGate Community,
I am in the process of developing a machine learning (ML) predictive model to study the degradation and performance of corroded prestressed steel in concrete structures. The objective is to utilize advanced ML algorithms to predict the long-term effects of corrosion on the mechanical properties of prestressed steel.
For this purpose, I am seeking access to the CEB-FIP database or any similar repository containing comprehensive data on corroded prestressed steel. This data is crucial for training and validating the ML models to ensure accurate predictions. I am particularly interested in datasets that include corrosion rates, mechanical property degradation, fatigue life, and other parameters critical to the structural performance of these materials.
If anyone has access to the CEB-FIP database or knows of similar databases that could serve this research purpose, I would greatly appreciate your assistance in gaining access.
Your support would be invaluable in furthering our understanding of material behavior in civil engineering and developing robust tools for predicting structural integrity.
I am open to collaborations and would be keen to discuss potential joint research initiatives that explore the application of machine learning in civil and structural engineering.
Thank you for your time and consideration. I look forward to any possible assistance or collaboration from the community.
Best regards,
M. Kovacevic
Relevant answer
Answer
Access to specific databases like the CEB-FIP database might require institutional or professional memberships. However, you can explore academic databases like Scopus, IEEE Xplore, or Web of Science for research papers and articles on corroded prestressed steel. Additionally, reaching out to relevant academic institutions or research organizations specializing in structural engineering or corrosion might provide access to valuable data and resources.
  • asked a question related to Predictive Modeling
Question
2 answers
I would like to know whether there is a program or website (platform) that can create predictive models simply from a provided database. That is, is there a free program that implements the standard molecular modeling (statistical) steps and is easy to use?
Relevant answer
Answer
Sure, there are many free programs, such as Orange. Orange provides a friendly visual interface that helps non-expert programmers build machine learning workflows in a few simple steps, and it is built on Python. Link: https://orangedatamining.com
  • asked a question related to Predictive Modeling
Question
1 answer
From my understanding, the baseline hazard or baseline survival function is unknown because Cox regression is a semi-parametric model. So why and how can we use it as a prediction model, for example to predict the 10-year survival probability?
Relevant answer
Answer
We can. In Stata, for example, you can build a predictive model after Cox regression analysis by fitting a generalized linear model with a Poisson distribution.
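In Python, the lifelines package offers another route: its Cox model estimates the baseline hazard from the data (a Breslow-type estimate), so predicted survival probabilities at a chosen horizon can be read off directly. A minimal sketch using the example dataset that ships with lifelines (week 52 stands in for the 10-year horizon here):
```python
# Minimal sketch: predicting survival probabilities from a fitted Cox model with lifelines.
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi

df = load_rossi()                               # example data: 'week' (time) and 'arrest' (event)
cph = CoxPHFitter().fit(df, duration_col="week", event_col="arrest")

new_subjects = df.drop(columns=["week", "arrest"]).head(3)   # covariates for new individuals
surv_at_52 = cph.predict_survival_function(new_subjects, times=[52])  # survival at week 52
print(surv_at_52)
```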
  • asked a question related to Predictive Modeling
Question
2 answers
Hi, I have estimated an ARMAX model using the Python SIPPY library. The estimation gives me two transfer functions, H and G. How can I combine them into a single one to predict the model output for a new input u(t), or to compute the unit step response? I thought I could perhaps derive a state-space representation...
Relevant answer
Answer
André Kummerow: In your case, you've estimated two transfer functions, H and G, using the Python SIPPY library. These transfer functions represent different aspects of the ARMAX model:
  1. H: This transfer function typically represents the relation between the output and the noise in the system.
  2. G: This is the transfer function that relates the exogenous input u(t) to the output.
Now, to predict the model output for new inputs or to compute the unit step response, you need to combine these transfer functions. Here's a simple way to understand the combination process:
  1. Conceptual Understanding: Think of the ARMAX model as a system where your input signal u(t) passes through a filter (represented by G) and adds to a noise component (represented by H) to produce the output. In a more technical sense, the output of the system is the sum of the responses from each transfer function to their respective inputs.
  2. Mathematical Approach: To combine H and G, you'd typically use the principle of superposition, which is valid for linear systems like ARMAX. The total output y(t) of the system can be expressed as the sum of the output due to the input u(t) (processed by G) and the output due to the noise (processed by H).
  3. Implementation: In Python, using libraries like SIPPY or control systems libraries, you can simulate this behavior. For a given input u(t), you can simulate the response of G to this input and separately simulate the response of H to the noise input. Adding these two responses gives you the total system output.
  4. State-Space Representation: Converting to a state-space representation can be a good idea if you're comfortable with it. State-space models offer a more general framework for representing linear systems and can be more intuitive for simulation and control purposes. Each transfer function (H and G) can be represented in state-space form, and you can then combine these state-space models appropriately.
  5. Practical Tips: Ensure that the data you use for simulation is well-prepared, and the noise characteristics (for H) are well understood. The accuracy of your predictions heavily relies on the quality of your model and the data.
  6. Advanced Considerations: If you're delving deeper, consider the frequency response of your combined system and its stability. These are crucial for ensuring that your model behaves as expected in various conditions.
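A minimal sketch of the superposition idea in points 2 and 3, using the python-control package (the coefficients below are placeholders; in practice you would pass in the discrete-time numerator and denominator polynomials identified by SIPPY for G and H):
```python
# Minimal sketch: combining the input path G and noise path H of an ARMAX model by superposition.
import numpy as np
import control

dt = 1.0                                                  # sampling time of the identified model
G = control.TransferFunction([0.5], [1, -0.8], dt)        # input -> output path (placeholder)
H = control.TransferFunction([1, 0.2], [1, -0.8], dt)     # noise -> output path (placeholder)

t = np.arange(0, 50) * dt
u = np.ones_like(t)                                       # unit step on the exogenous input
e = np.random.default_rng(0).normal(scale=0.05, size=t.shape)  # white-noise disturbance

_, y_u = control.forced_response(G, T=t, U=u)             # deterministic part driven by u(t)
_, y_e = control.forced_response(H, T=t, U=e)             # stochastic part driven by the noise
y = y_u + y_e                                             # total predicted output by superposition

_, y_step = control.step_response(G, T=t)                 # noise-free unit step response of G alone
print(y[-1], y_step[-1])
```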
  • asked a question related to Predictive Modeling
Question
3 answers
Hi
I have a non-linear equation
I want to convert it into a linear equation using the predictive model method
Can anyone help?
and then optimize it with a linear programming method.
Please send an e-mail.
Relevant answer
Answer
Roozbeh Mousavi: definitely, a great answer is provided by "Google Bard". To summarize, it is enough to take several courses in statistics, nonlinear programming, linear regression and probability theory, and the answer will reveal itself. ...
  • asked a question related to Predictive Modeling
Question
5 answers
I want to develop a predictive model using random forest, SVM, and decision tree algorithms. How can I estimate my sample size? Is there any formula?
Relevant answer
Answer
Determining the sample size for machine learning-based predictive model development is crucial to ensure that your model is statistically sound and can generalize well to unseen data. The sample size depends on various factors, including the complexity of the problem, the desired level of statistical confidence, and the machine learning algorithm you plan to use. Here's a general process to help you calculate an appropriate sample size:
  1. Define Your Objectives: Clearly define the goals of your predictive model. Determine the minimum effect size or improvement you want to detect.
  2. Select Significance Level (Alpha): Choose a significance level (alpha) to represent the probability of making a Type I error (false positive). Common values are 0.05 or 0.01.
  3. Choose Power (1 - Beta): Select the desired statistical power (1 - beta) to represent the probability of correctly detecting a real effect. A typical value is 0.80, which implies an 80% chance of detecting a true effect if it exists.
  4. Estimate Expected Variability: Determine the expected variability in your data. This can be done by analyzing past data, conducting a pilot study, or consulting domain experts.
  5. Select the Appropriate Test: Choose the statistical test that matches your research question and data type. For predictive modeling, this may involve regression, classification, or other machine learning algorithms.
  6. Use Sample Size Formulas: Use sample size formulas specific to your chosen statistical test. For example, sample size formulas for t-tests, chi-square tests, or regression analysis. Sample size calculators and statistical software can also help determine the required sample size. Some popular tools include G*Power, R's pwr package, or online calculators (a short Python sketch appears at the end of this answer).
  7. Account for Complexity: Take into account the complexity of your machine learning model. More complex models may require larger sample sizes to avoid overfitting.
  8. Cross-Validation: In machine learning, cross-validation is often used to assess model performance. The sample size used for training, validation, and testing subsets may differ, but the overall sample size should be sufficient to provide meaningful results.
  9. Consider Imbalanced Data: If your dataset is highly imbalanced (e.g., rare events), you may need to oversample the minority class or use techniques like SMOTE (Synthetic Minority Over-sampling Technique).
  10. Iterate and Validate: Once you have a preliminary sample size estimate, validate it through simulations or by collecting data from a smaller pilot study. Be prepared to iterate and adjust the sample size if necessary.
  11. Ethical Considerations: Ensure that your sample size respects ethical considerations, including privacy, consent, and data protection regulations.
  12. Consult a Statistician: If your project involves complex statistical methods or if you're unsure about the appropriate sample size, consider consulting a statistician for guidance.
The sample size calculation is a critical step in ensuring the validity and reliability of your machine learning-based predictive model. It's important to strike a balance between statistical rigor and practical constraints to obtain meaningful results.
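As a small illustration of step 6, here is a minimal Python sketch using statsmodels' power calculations for a classical two-group comparison (the effect size, alpha and power values are just the example figures mentioned above; ML-specific sizing is usually checked empirically with learning curves on top of a calculation like this):
```python
# Minimal sketch: classical sample-size calculation for a two-group comparison.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(f"Required sample size per group: {n_per_group:.0f}")   # roughly 64 per group
```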
  • asked a question related to Predictive Modeling
Question
6 answers
Is it possible to build a highly effective forecasting system for future financial and economic crises based on artificial intelligence technology in combination with Data Science analytics, Big Data Analytics, Business Intelligence and/or other Industry 4.0 technologies?
Is it possible to build a highly effective, multi-faceted, intelligent forecasting system for future financial and economic crises based on artificial intelligence technology in combination with Data Science analytics, Big Data Analytics, Business Intelligence and/or other Industry 4.0 technologies as part of a forecasting system for complex, multi-faceted economic processes in such a way as to reduce the scale of the impact of the paradox of a self-fulfilling prediction and to increase the scale of the paradox of not allowing a predicted crisis to occur due to pre-emptive anti-crisis measures applied?
What do you think about the involvement of artificial intelligence in combination with Data Science, Big Data Analytics, Business Intelligence and/or other Industry 4.0 technologies for the development of sophisticated, complex predictive models for estimating current and forward-looking levels of systemic financial, economic risks, debt of the state's public finance system, systemic credit risks of commercially operating financial institutions and economic entities, forecasting trends in economic developments and predicting future financial and economic crises?
Research and development work is already underway to teach artificial intelligence to 'think', i.e. the conscious thought process realised in the human brain. The aforementioned thinking process, awareness of one's own existence, the ability to think abstractly and critically, and to separate knowledge acquired in the learning process from its processing in the abstract thinking process in the conscious thinking process are just some of the abilities attributed exclusively to humans. However, as part of technological progress and improvements in artificial intelligence technology, attempts are being made to create "thinking" computers or androids, and in the future there may be attempts to create an artificial consciousness that is a digital creation, but which functions in a similar way to human consciousness.
At the same time, as part of improving artificial intelligence technology, creating its next generation, teaching artificial intelligence to perform work requiring creativity, systems are being developed to process the ever-increasing amount of data and information stored on Big Data Analytics platform servers and taken, for example, from selected websites. In this way, it may be possible in the future to create "thinking" computers, which, based on online access to the Internet and data downloaded according to the needs of the tasks performed and processing downloaded data and information in real time, will be able to develop predictive models and specific forecasts of future processes and phenomena based on developed models composed of algorithms resulting from previously applied machine learning processes.
When such technological solutions become possible, the following question arises, i.e. the question of taking into account in the built intelligent, multifaceted forecasting models known for years paradoxes concerning forecasted phenomena, which are to appear only in the future and there is no 100% certainty that they will appear. Well, among the various paradoxes of this kind, two particular ones can be pointed out. One is the paradox of a self-fulfilling prophecy and the other is the paradox of not allowing a predicted crisis to occur due to pre-emptive anti-crisis measures applied. If these two paradoxes were taken into account within the framework of the intelligent, multi-faceted forecasting models being built, their effect could be correlated asymmetrically and inversely proportional.
In view of the above, in the future, once artificial intelligence has been appropriately improved by teaching it to "think" and to process huge amounts of data and information in real time in a multi-criteria, creative manner, it may be possible to build a highly effective, multi-faceted, intelligent forecasting system for future financial and economic crises based on artificial intelligence technology, a system for forecasting complex, multi-faceted economic processes in such a way as to reduce the scale of the impact of the paradox of a self-fulfilling prophecy and increase the scale of the paradox of not allowing a predicted crisis to occur due to pre-emptive anti-crisis measures applied. This concerns multi-criteria processing of large data sets conducted with the involvement of artificial intelligence, Data Science, Big Data Analytics, Business Intelligence and/or other Industry 4.0 technologies, which make it possible to effectively and increasingly automatically operate on large sets of data and information, thus increasing the possibility of developing advanced, complex forecasting models for estimating current and future levels of systemic financial and economic risks, indebtedness of the state's public finance system, systemic credit risks of commercially operating financial institutions and economic entities, forecasting economic trends and predicting future financial and economic crises.
In view of the above, I address the following questions to the esteemed community of scientists and researchers:
Is it possible to build a highly effective, multi-faceted, intelligent forecasting system for future financial and economic crises based on artificial intelligence technology in combination with Data Science, Big Data Analytics, Business Intelligence and/or other Industry 4.0 technologies in a forecasting system for complex, multi-faceted economic processes in such a way as to reduce the scale of the impact of the paradox of the self-fulfilling prophecy and to increase the scale of the paradox of not allowing a forecasted crisis to occur due to pre-emptive anti-crisis measures applied?
What do you think about the involvement of artificial intelligence in combination with Data Science, Big Data Analytics, Business Intelligence and/or other Industry 4.0 technologies to develop advanced, complex predictive models for estimating current and forward-looking levels of systemic financial risks, economic risks, debt of the state's public finance system, systemic credit risks of commercially operating financial institutions and economic entities, forecasting trends in economic developments and predicting future financial and economic crises?
What do you think about this topic?
What is your opinion on this subject?
Please respond,
I invite you all to discuss,
Thank you very much,
Warm regards,
Dariusz Prokopowicz
Relevant answer
Answer
In my opinion, in order to determine whether it is possible to build a highly effective forecasting system for future financial and economic crises based on artificial intelligence technology in combination with Data Science analytics, Big Data Analytics, Business Intelligence and/or other Industry 4.0/5.0 technologies, it is first necessary to precisely define the essence of forecasting specific risk factors, i.e. factors that in the past were the sources of certain types of economic, financial and other crises and that may be such factors in the future. But would such a structured forecasting system, based on a combination of Big Data Analytics and artificial intelligence, be able to forecast unusual events that generate new types of risk, the so-called "black swans"? For example, could it predict the emergence of another unusual event, driven by a hard-to-predict new type of risk, comparable to the 2008 global financial crisis or the 2020 pandemic, or something completely new that has not yet appeared?
What is your opinion on this issue?
Please answer,
I invite everyone to join the discussion,
Thank you very much,
Warm regards,
Dariusz Prokopowicz
  • asked a question related to Predictive Modeling
Question
2 answers
By combining the technologies of quantum computers, Big Data Analytics, artificial intelligence and other Industry 4.0 technologies, is it possible to significantly improve the predictive analyses of various multi-faceted macroprocesses?
By combining the technologies of quantum computers, Big Data Analytics applied to large volumes of data and information extracted from e.g. large numbers of websites and social media sites, cloud computing, satellite analytics, etc., and artificial intelligence in joint applications for the construction of integrated analytical platforms, is it possible to create systems for the multi-criteria analysis of large quantities of quantitative and qualitative data and thus significantly improve predictive analyses of various multi-faceted macro-processes concerning local, regional and global climate change, the state of the biosphere, natural, social, health, economic, financial processes, etc.?
Ongoing technological progress is expanding the technical possibilities for conducting research, for collecting and assembling large amounts of research data, and for their multi-criteria processing using ICT and Industry 4.0 technologies. Before the development of ICT, IT tools, personal computers, etc. in the second half of the 20th century, as part of the third technological revolution, computerised, semi-automated processing of large data sets was very difficult or impossible. As a result, building multi-criteria, big-data models of complex macro-process structures, simulation models and forecasting models was limited or practically impossible. The technological advances made in the current fourth technological revolution and the development of Industry 4.0 have changed a great deal in this regard. The current fourth technological revolution is, among other things, a revolution in the improvement of multi-criteria, computerised analytical techniques based on large data sets. Industry 4.0 technologies, including Big Data Analytics, are used for the multi-criteria processing and analysis of large data sets, and artificial intelligence (AI) can be useful for scaling up the automation of research processes and the multi-faceted processing of the big data obtained from research.
These technological advances are contributing to the improvement of computerised analytical techniques applied to increasingly large data sets. Applying the technologies of the fourth technological revolution, including ICT and Industry 4.0, to multi-criteria analyses and to simulation and forecasting models built on large sets of information and data increases the efficiency of research and analytical processes. Increasingly, research conducted within different scientific disciplines and fields of knowledge relies on computerised analytical tools, including Big Data Analytics in conjunction with other Industry 4.0 technologies.
When these analytical tools are augmented with Internet of Things technology, cloud computing and satellite-based sensing and monitoring techniques, opportunities arise for real-time, multi-criteria analytics of large areas, e.g. of nature, climate and other systems, conducted using satellite technology. When machine learning, deep learning, artificial intelligence, multi-criteria simulation models and digital twins are added to these analytical and research techniques, it becomes possible to create predictive simulations of multi-factor, complex macro-processes realised in real time. The complex, multi-faceted macro-processes whose study is facilitated by the application of new ICT and Industry 4.0 technologies include, on the one hand, multi-factorial natural, climatic and ecological processes and those concerning changes in the state of the environment: environmental pollution, changes in the state of ecosystems and biodiversity, changes in the condition of soils in agricultural fields, changes in the moisture of forested areas, environmental monitoring, deforestation caused by civilisational factors, and so on. On the other hand, they include economic, social and financial processes in the context of the functioning of entire economies, economic regions, continents or the global economy.
Year on year, thanks to technological advances in ICT, including new generations of microprocessors with ever-increasing computing power, the possibilities for efficient, multi-criteria processing of large collections of data and information keep growing. Artificial intelligence can be particularly useful for the selective, precise retrieval of specific, defined types of information and data from many selected types of websites, and for the real-time transfer and processing of these data in database systems organised in cloud computing on Big Data Analytics platforms, accessed by a system managing a continuously updated model of a specific macro-process built with digital twin technology. In addition, the use of supercomputers, including quantum computers with particularly large computational capacities for processing very large data sets, can significantly increase the scale of data and information processed within multi-criteria analyses of natural, climatic, geological, social and economic macro-processes and within the creation of simulation models of them.
In view of the above, I address the following question to the esteemed community of scientists and researchers:
Is it possible, by combining quantum computing, Big Data Analytics of data and information extracted from, inter alia, a large number of websites and social media portals, cloud computing, satellite analytics and artificial intelligence in joint applications for the construction of integrated analytical platforms, to create systems for the multi-criteria analysis of large quantities of quantitative and qualitative data, and thereby significantly improve predictive analyses of various multi-faceted macro-processes concerning local, regional and global climate change, the state of the biosphere, and natural, social, health, economic and financial processes?
By combining the technologies of quantum computers, Big Data Analytics, artificial intelligence and other Industry 4.0 technologies, is it possible to significantly improve the predictive analyses of various multi-faceted macroprocesses?
By combining the technologies of quantum computers, Big Data Analytics, artificial intelligence, is it possible to improve the analysis of macroprocesses?
What do you think about this topic?
What is your opinion on this subject?
Please respond,
I invite you all to discuss,
Thank you very much,
Warm regards,
The above text is entirely my own work written by me on the basis of my research.
I have not used other sources or automatic text generation systems such as ChatGPT in writing this text.
Copyright by Dariusz Prokopowicz
Dariusz Prokopowicz
Relevant answer
Answer
In my opinion, combining the above-mentioned technologies opens up new opportunities to expand research and analytical capabilities, to process large data sets within the framework of Big Data Analytics, and to develop predictive models for various types of macro-processes.
What is your opinion on this topic?
Please answer,
I invite everyone to join the discussion,
Thank you very much,
Best wishes,
Dariusz Prokopowicz
  • asked a question related to Predictive Modeling
Question
4 answers
I would like to construct principal components from 30+ financial ratios for a predictive model. I would then use logistic regression and support vector machines for the predictions, conditioned on the principal components. I have panel data with T << N. For PCA, the data should be iid. I am concerned that the time series are not independent. I have only 10 years of data, which precludes most time series statistical testing. I have seen several peer-reviewed papers constructing principal components using panel data, but the potential problems with using panel data for PCA are not discussed in the papers. The papers all seem to use the standard PCA approach one would normally use with a cross-section but with panel data. I have researched several means of doing a PCA with time series and a couple that use panel data as one of several examples of a general (usually very complicated) PCA procedure, but nothing seems to "fit the bill" for my standard panel dataset. I would greatly appreciate some direction as to where I might go to look for an adequate procedure or suggestions for a procedure that could possibly work. I am an applied econometrician, and it would be difficult for me to translate a complex procedure into code. So, ideally, I would like to find a procedure for which there is existing code (I use SAS and R). Thanks in advance for any insights provided,
Relevant answer
Answer
Principal component analysis (PCA) can be used to analyze panel data. However, the data must be balanced for PCA to be used.
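For what it's worth, the "standard" pooled approach that those papers appear to use is only a few lines in R; the sketch below (with hypothetical data frame and column names) shows that workflow, leaving open the question of whether pooling firm-years is defensible for your panel.

# Sketch: pooled PCA on a (balanced) panel of financial ratios.
# 'panel' is assumed to have columns firm, year, default and ratio_1 ... ratio_30.
ratios <- panel[, grepl("^ratio_", names(panel))]

pca <- prcomp(ratios, center = TRUE, scale. = TRUE)   # standard cross-sectional PCA
summary(pca)                                          # variance explained per component
scores <- pca$x[, 1:5]                                # keep, say, the first five components

# Merge the scores back and use them as predictors, e.g. in a logistic regression
panel_pc <- cbind(panel[, c("firm", "year", "default")], scores)
fit <- glm(default ~ PC1 + PC2 + PC3 + PC4 + PC5,
           data = panel_pc, family = binomial)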
  • asked a question related to Predictive Modeling
Question
22 answers
AI aids in climate change mitigation through various applications, such as optimizing renewable energy systems, analyzing climate data for predictive modeling, and enabling more efficient resource management, leading to reduced greenhouse gas emissions and more sustainable practices.
  • asked a question related to Predictive Modeling
Question
4 answers
Hi. I'm planning to conduct a multinomial logistic regression analysis for my predictive model (3 outcome categories). How can I estimate the sample size? I believe the EPV rule of thumb is suitable only for binary outcomes. Is there any formula/software that I can use?
Relevant answer
Answer
I'd use simulations. You can use any programming language for this. I recommend R, which you could also use later to fit the multinomial models.
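As a rough illustration of the simulation idea (not a validated sample-size routine), you can repeatedly generate data under assumed effect sizes and check how often a multinomial model detects them at a candidate n; the effect sizes below are hypothetical placeholders and nnet::multinom does the fitting.

# Sketch: simulation-based sample-size check for a 3-category outcome.
library(nnet)

sim_once <- function(n) {
  x1 <- rnorm(n)
  x2 <- rbinom(n, 1, 0.4)
  # assumed linear predictors for categories 2 and 3 relative to category 1
  lp2 <- -0.5 + 0.8 * x1 + 0.6 * x2
  lp3 <- -1.0 + 1.2 * x1 - 0.4 * x2
  p <- cbind(1, exp(lp2), exp(lp3))
  p <- p / rowSums(p)
  y <- apply(p, 1, function(pr) sample(1:3, 1, prob = pr))
  d <- data.frame(y = factor(y), x1 = x1, x2 = x2)
  fit <- multinom(y ~ x1 + x2, data = d, trace = FALSE)
  z <- summary(fit)$coefficients / summary(fit)$standard.errors
  abs(z["2", "x1"]) > 1.96        # was the x1 effect for category 2 detected?
}

set.seed(1)
mean(replicate(500, sim_once(300)))   # approximate power at n = 300 under these assumptions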
  • asked a question related to Predictive Modeling
Question
2 answers
I already tried using pvisgam (itsadug library) and although it does include the color key (zlim code) this only works for partial effects. Thanks in advance.
Relevant answer
Answer
Thanks! I ran into the same problem in 2023. Thanks for the discussion.
  • asked a question related to Predictive Modeling
Question
4 answers
How does R programming support machine learning and predictive modelling in research applications? Can you highlight some popular R packages and algorithms that researchers can use for building and evaluating predictive models?
Relevant answer
Answer
To perform predictions, I use gradient boosting models, so I use the gbm package. It depends on the model you want to use: there are also, for example, packages for random forests (randomForest) and extreme gradient boosting (xgboost). caret is a good package, but more for linear or logistic regression, which are more inference-oriented models.
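For what it's worth, a minimal gbm sketch for a binary outcome might look like this (the data frame and column names are hypothetical, and the hyperparameters are just placeholders to tune):

# Sketch: gradient boosting with the gbm package for a binary outcome.
library(gbm)

fit <- gbm(
  default ~ .,                      # 'default' coded 0/1; other columns are predictors
  data              = train_df,     # hypothetical training data frame
  distribution      = "bernoulli",
  n.trees           = 2000,
  interaction.depth = 3,
  shrinkage         = 0.01,
  cv.folds          = 5
)

best_iter <- gbm.perf(fit, method = "cv")              # pick the iteration by CV
pred <- predict(fit, newdata = test_df,
                n.trees = best_iter, type = "response")
summary(fit, n.trees = best_iter)                      # relative variable influence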
  • asked a question related to Predictive Modeling
Question
3 answers
I would kindly like to know which other financial distress prediction models exist besides Altman (1968), Beaver (1966), Kida (1980) and Sherrod (1987).
What else is there?
Relevant answer
Answer
OK,
could you send me the assumptions and application domain of each method?
  • asked a question related to Predictive Modeling
Question
2 answers
Could you illustrate a comparative analysis of the VAN method, Poisson regression, or any other available methods?
Relevant answer
Answer
  1. The electromagnetic field …
  • asked a question related to Predictive Modeling
Question
3 answers
Can you explain it or recommend some papers?
(1) For classification problems, we can use recall, precision, the F1 score, AUC and balanced accuracy (although I don't know what mean/max balanced accuracy means) at the same time to compare different classifiers,
but I haven't seen any paper that evaluates different numerical prediction models this way (forgive my ignorance).
(2) I only know that R-squared, MAE, MAPE and RMSE can be used to assess models, but how should they be combined to evaluate a given case, and should any other metrics be added?
(3) Could you help explain max/mean balanced accuracy, mean/max recall, mean/max accuracy, mean/max AUC and mean/max F1 if you have time? Why use the mean or the max? Aren't these already specific values? Why are they divided into mean and max?
Relevant answer
Answer
There are many different metrics that can be used to evaluate numerical prediction models. Some commonly used metrics include R-squared, mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean square error (RMSE). Each of these metrics has its own strengths and weaknesses, and the choice of metric(s) to use will depend on the specific goals of the analysis.
One paper that discusses evaluating numerical prediction models is "PM10 and PM2.5 real-time prediction models using an interpolated convolutional neural network" by Chae et al. (1). Another paper that discusses evaluating prediction models more generally is "Transparent reporting of multivariable prediction models for individual prognosis or diagnosis: checklist for systematic reviews and meta-analyses (TRIPOD-SRMA)" by Snell et al. (2).
Regarding your question about mean/max balanced accuracy, recall, accuracy, AUC, and F1 score: these are all metrics used to evaluate classification models. The choice between using the mean or max value of these metrics will depend on the specific goals of the analysis. For example, if you are interested in the average performance of a model across multiple classes or multiple runs, you might use the mean value. If you are interested in the best possible performance of a model, you might use the max value.
(1) PM10 and PM2.5 real-time prediction models using an ... - Nature. https://www.nature.com/articles/s41598-021-91253-9.
(2) Transparent reporting of multivariable prediction models for individual .... https://www.bmj.com/content/381/bmj-2022-073538.
(3) [1905.11744] Evaluating time series forecasting models: An empirical .... https://arxiv.org/abs/1905.11744.
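To make the regression metrics above concrete, here is a small sketch that computes them from vectors of actuals and predictions; which combination to report still depends on the goals of the analysis, as noted.

# Sketch: common error metrics for a numerical prediction model.
eval_regression <- function(actual, predicted) {
  err  <- actual - predicted
  mae  <- mean(abs(err))
  rmse <- sqrt(mean(err^2))
  mape <- mean(abs(err / actual)) * 100                  # undefined if any actual == 0
  r2   <- 1 - sum(err^2) / sum((actual - mean(actual))^2)
  c(MAE = mae, RMSE = rmse, MAPE = mape, R2 = r2)
}

# Toy example
eval_regression(actual    = c(10, 12, 15, 20),
                predicted = c(11, 11, 16, 18))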
  • asked a question related to Predictive Modeling
Question
3 answers
There are lots of metrics, such as R-squared, MSE, MAE, MAPE and others. Some papers use R-squared combined with RMSE; some use EVS+MAPE+t-test+NSE. So if I want to evaluate a price prediction model or a conversion rate prediction model, what combination should I choose?
Relevant answer
Dear Ling, I have already answered your question.
We are faced with different metrics and indices, but the choice depends strongly on your data type. You can use a confusion matrix or all of the metrics you mentioned above. You can also use statistical distance functions such as Euclidean distance, Manhattan distance, Minkowski distance and so on. Try defining some of them in your MATLAB or Python code to see which one is more accurate and sensitive in your case.
  • asked a question related to Predictive Modeling
Question
3 answers
I am working on a project to develop a predictive model to forecast tax revenue (a continuous value). Any previous models or experience in the field?
Relevant answer
Answer
The best method for calculating future tax revenue is to use a micro-analytical simulation model using the anonymized individual (panel) data of taxpayers. This allows an estimation accuracy of almost 100% to be achieved.
  • asked a question related to Predictive Modeling
Question
2 answers
I am working on a project to develop a predictive model to predict the probability of student success in completing a program. Any previous model or experience in the field?
Relevant answer
Answer
Yes, I am going to use a logistic regression model. What I want to know is the experience of others in using the model. Was it effective? What lessons were learned? …
  • asked a question related to Predictive Modeling
Question
6 answers
I would like to know what regression algorithm works best with complex data with good accuracy and RMSE score.
Relevant answer
Answer
Maybe you can consider the recursive least squares algorithm (RLS). RLS is the recursive application of the well-known least squares (LS) regression algorithm, so that each new data point is taken into account to modify (correct) a previous estimate of the parameters of some linear (or linearized) correlation thought to model the observed system. The method allows for the dynamic application of LS to time series acquired in real time. As with LS, there may be several correlation equations and a set of dependent (observed) variables. With the recursive least squares algorithm with forgetting factor (RLS-FF), acquired data are weighted according to their age, with increased weight given to the most recent data.
A particularly clear introduction to RLS is found at: Karl J. Åström, Björn Wittenmark, "Computer-Controlled Systems: Theory and Design", Prentice-Hall, 3rd ed., 1997.
Years ago, while investigating adaptive control and energetic optimization of aerobic fermenters, I have applied the RLS-FF algorithm to estimate the parameters from the KLa correlation, used to predict the O2 gas-liquid mass-transfer, hence giving increased weight to most recent data. Estimates were improved by imposing sinusoidal disturbance to air flow and agitation speed (manipulated variables). The power dissipated by agitation was accessed by a torque meter (pilot plant). The proposed (adaptive) control algorithm compared favourably with PID. Simulations assessed the effect of numerically generated white Gaussian noise (2-sigma truncated) and of first order delay. This investigation was reported at (MSc Thesis):
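For readers who want to experiment, a bare-bones sketch of recursive least squares with a forgetting factor is given below; it is a generic, textbook-style implementation with made-up toy data, not the code from the thesis referred to above.

# Sketch: recursive least squares with forgetting factor (RLS-FF).
# X[t, ] is the regressor vector and y[t] the observation at time t.
rls_ff <- function(X, y, lambda = 0.98, delta = 1000) {
  p <- ncol(X)
  theta <- rep(0, p)                 # initial parameter estimate
  P <- diag(delta, p)                # large initial "covariance"
  est <- matrix(NA, nrow(X), p)
  for (t in seq_len(nrow(X))) {
    x <- X[t, ]
    k <- (P %*% x) / as.numeric(lambda + t(x) %*% P %*% x)   # gain vector
    e <- y[t] - sum(x * theta)                               # one-step prediction error
    theta <- theta + as.numeric(k) * e                       # parameter update
    P <- (P - k %*% t(x) %*% P) / lambda                     # covariance update
    est[t, ] <- theta
  }
  list(theta = theta, history = est)
}

# Toy example: slowly drifting linear relationship
set.seed(1)
n <- 200
X <- cbind(1, rnorm(n))
beta <- cbind(1 + 0.01 * (1:n), 2)            # drifting intercept, fixed slope
y <- rowSums(X * beta) + rnorm(n, sd = 0.1)
fit <- rls_ff(X, y, lambda = 0.95)
tail(fit$history, 3)                           # most recent parameter estimates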
  • asked a question related to Predictive Modeling
Question
5 answers
Probabilistic modelling, Forecasting, Renewable energy uncertainty prediction, ARIMA
Relevant answer
Answer
The question is not clear, because the ARIMA model is itself probabilistic, not deterministic. Please clarify it.
  • asked a question related to Predictive Modeling
Question
3 answers
Considering that deep learning models can automatically learn features from data, do they need additional, special feature engineering techniques to attain high accuracy, given that it is challenging to extract relevant features using most feature engineering tools, and that deep neural networks need more processing power and time to learn well when dealing with complex data?
If needed, in what context of application will doing this be required and how would this impact the model performance?
Contrary to the above, for better model performance what would be your recommendation of the most suited type of deep learning algorithm to be implemented for Image Recognition, Natural Language Processing (NLP), and different types of Predictive modeling projects from complex data without the use of additional feature engineering approach on dataset?
Relevant answer
Answer
It is true that deep learning models, especially Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), have the ability to automatically learn features from data. However, feature engineering can still be a valuable approach in some cases to improve the performance of deep learning models. Feature engineering involves transforming the raw data into a more informative representation that can help the model learn more effectively. For example, in computer vision, hand-engineered features such as SIFT and HOG can be used to represent images and provide additional information to the model beyond the raw pixels.
Whether or not additional feature engineering is needed for a specific problem depends on the complexity of the data and the type of problem being solved. For example, if the data is relatively simple and the model is able to learn effective features from the raw data, additional feature engineering may not be necessary. On the other hand, if the data is complex and the model is struggling to learn effective features, additional feature engineering may be necessary to improve the performance of the model.
In terms of recommendations for deep learning algorithms, here is a rough guide:
  • Image Recognition: CNNs are the most commonly used type of deep learning algorithm for image recognition, as they are well-suited to handling image data. ResNet, Inception, and VGG are popular architectures for this task.
  • Natural Language Processing (NLP): RNNs and Transformer models are commonly used for NLP tasks, such as sentiment analysis and language translation. Bidirectional LSTM, GRU, and BERT are popular architectures for this task.
  • Predictive modeling: Deep feedforward neural networks (also known as multilayer perceptrons) can be used for a wide range of predictive modeling tasks, such as regression and classification. The architecture of the network can be adjusted to handle different types of data and different levels of complexity.
It's important to note that these are rough guidelines and the best algorithm for a specific problem will depend on the characteristics of the data and the problem being solved. In practice, it is often necessary to experiment with different algorithms and architectures to find the best solution.
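Purely as an illustration of the last bullet (feed-forward networks for tabular prediction), a minimal sketch with the R interface to Keras might look like the following; x_train/y_train are assumed to be a numeric matrix and vector, and the layer sizes and training settings are placeholders to tune.

# Sketch: small multilayer perceptron for a regression task (R keras interface).
library(keras)

model <- keras_model_sequential() %>%
  layer_dense(units = 64, activation = "relu",
              input_shape = ncol(x_train)) %>%
  layer_dropout(rate = 0.2) %>%
  layer_dense(units = 32, activation = "relu") %>%
  layer_dense(units = 1)                       # single continuous output

model %>% compile(
  optimizer = "adam",
  loss      = "mse",
  metrics   = "mae"
)

history <- model %>% fit(
  x_train, y_train,
  epochs = 50, batch_size = 32,
  validation_split = 0.2, verbose = 0
)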
  • asked a question related to Predictive Modeling
Question
1 answer
Hello, I have created a roadkill prediction model using MaxEnt and have inputted the .asc output into ArcPro. I would now like to perform further analysis using ArcPro 'Optimized hot spot analysis' tool. I have seen this can be done in multiple scientific papers.
I have tried to convert my .asc file into points using 'Raster to points' tool. Unfortunately, this did not work.
Does anyone know a way to transform the .asc file into point data or perform hot spot analysis on an .asc file without transformation.
Thank you for your time!
Relevant answer
Answer
MaxEnt .asc files can be used in ArcGIS Pro to perform hotspot analysis, identifying areas of the high occurrence of species or other phenomena of interest. Here are the steps to perform a hotspot analysis using a MaxEnt .asc file in ArcGIS Pro:
  1. Add the MaxEnt .asc file to the ArcGIS Pro map as a raster layer.
  2. Use the Raster to Point tool in the Spatial Analyst toolbox to convert the raster layer into a point layer, representing the locations of the species of interest.
  3. Use the Point Density tool in the Spatial Analyst toolbox to calculate the density of points within a specified search radius.
  4. Use the Reclassify tool in the Spatial Analyst toolbox to reclassify the density raster into a binary raster, where cells with a high density of points are classified as "hotspots," and cells with low density are classified as "non-hotspots."
  5. Use the Raster to Polygon tool in the Conversion toolbox to convert the binary raster into a polygon layer, representing the hotspot areas.
  6. Symbolize the polygon layer to highlight the hotspot areas.
  7. Perform further analysis as needed, such as calculating the area or perimeter of the hotspot polygons or identifying the habitats or land use types within the hotspot areas.
  • asked a question related to Predictive Modeling
Question
2 answers
I'm looking for long term load histories from real structures to training some AI prediction models. These must be real signals, not a simulation result. The measured quantities can be accelerations (e.g. piezoelectric sensors) or even better strains (e.g. strain gauges). By long-term runs, I mean a length of several months to several years, or many recordings of similar load cycles (e.g. an airplane flight cycle or a typical excavator working day). Maybe this will be the beginning of an interesting collaboration on a scientific article? Anyone know, anyone seen :-)
Relevant answer
Answer
At the moment I am not sure how AI can take over from statistics here. Good information can come from time-series analysis of the data, so I can't see how AI would add more information.
  • asked a question related to Predictive Modeling
Question
10 answers
Hello every body,
I need references that talk about predictive models of corrosion (uniform and localized) of oil and gas metal storage tanks.
Relevant answer
Answer
Dear Dr. Bilal Zerouali ,
as correctly said by Dr. Frank Druyts , the topic is very broad and I suggest you to have a first look at the following, interesting papers:
- Prediction and Modelling of Corrosion in Steel Storage Tank Using Non-destructive Inspection
Mosaad Mohamed Sadawy and Eltohamy Rabie Elsharkawy
Journal of Materials Science and Engineering B, 3 (12), 785-792 (2013)
- Simplified Modelling of the Remaining Useful Lifetime of Atmospheric Storage Tanks in Major Hazard Establishments
Maria Francesca Milazzo, Giuseppa Ancione, Paolo Bragatto, Canio Mennuti
CHEMICAL ENGINEERING TRANSACTIONS, Vol. 83 (2020)
-Corrosion analysis and remaining useful life prediction for storage tank bottom
Yu Feng and Biqing Huang
International Journal of Advanced Robotic Systems (2019) https://doi.org/10.1177/172988141987
-Corrosion rate measurement for steel sheets of a fuel tank shell being in service
Mariusz Maslak and Janusz Siudut
My best regards, Pierluigi Traverso.
  • asked a question related to Predictive Modeling
Question
4 answers
Dear Colleagues, I hope all is well. I have a classification project in which the event outcome is imbalanced (25% of the subjects have the event). The data has almost 90 variables, which become 250 after one-hot encoding. I tried an oversampling technique, and the accuracy and sensitivity are very good on the oversampled training data (excellent, with accuracy up to 95%). However, performance is very poor on the validation set (almost 25%). This happens with logistic regression, random forest, decision tree, XGBoost, gradient boosting and bagging. I wonder whether this could be related to the large number of features (250). Should I run recursive feature elimination with random forest before running all these models on the oversampled data? Would this make a difference? Or is recursive feature elimination with random forest only used at the end to obtain a simplified prediction model?
Relevant answer
Answer
Hi Hatem,
this is really a strange situation and I am interested to know the cause of this problem once you solve it. Coming back to your question, I would recommend the BorutaShap method (based on Boruta algorithm and Shap values) to select the most relevant features/predictors for your classification task. It is available in Python and can also be easily adapted to your application. I would also suggest doing k-fold cross-validation to be sure it is not a matter of random effect in your data or parameters. Good luck!
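A note for R users: the underlying Boruta algorithm is also available in R via the Boruta package (random-forest importance rather than SHAP values); a minimal sketch, with hypothetical data frame and outcome names, is below.

# Sketch: feature selection with the Boruta algorithm in R.
library(Boruta)

set.seed(42)
bor <- Boruta(event ~ ., data = train_df, doTrace = 1, maxRuns = 200)
print(bor)                                              # confirmed / tentative / rejected
selected <- getSelectedAttributes(bor, withTentative = FALSE)

# Refit a simpler model on the selected features only, then assess it with
# k-fold cross-validation as suggested above.
fit <- glm(reformulate(selected, response = "event"),
           data = train_df, family = binomial)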
  • asked a question related to Predictive Modeling
Question
2 answers
Given a business problem, there is no hard and fast rule to determine the exact number of neurons and hidden layers required to build a neural network architecture. The optimal size of the hidden layer in a neural network lies between the size of the output layers and the size of the input. However, here are some common approaches that have the advantage of making a great start to building a neural network architecture –
To address any specific real-world predictive modeling problem, the best way is to start with rough systematic experimentation and find out what would work best for any given dataset based on prior experience working with neural networks on similar real-world problems. Based on the understanding of any given problem domain and one’s experience working with neural networks, one can choose the network configuration. The number of layers and neurons used on similar problems is always a great way to start testing the configuration of a neural network.
It is always advisable to begin with a simple neural network architecture and then go on to increase the complexity of the neural network.
Try working with varying depths of networks and configure deep neural networks only for challenging predictive modeling problems where depth can be beneficial.
Relevant answer
Answer
I completely agree that there is no "one size fits all" Theoretical Solution yet for this problem. That being said, I would also say that since there is no generally accepted Theoretical Solution, you are reduced to finding the "Best Fit" empirically validated answer. As in all problem solution finding, the first step in understanding and validating the solution answer selected is to define what is the criteria that defines your particular "Best Fit". One consideration is compute power available and for how long. This will define the maximum depth and width of the solution neural network layers. Note these criteria are not mutually exclusive. By that I mean you can sometimes trade off depth and width to get an acceptable solution. The problem then becomes a classic tradeoff analysis with many well known solutions. Now the problem becomes one of validation of the generated proposed solutions, and that leads to analysis of what validation tools you have available to "validate". [If you have known "use cases" with acceptable solutions that makes the problem of selection of the "Best Fit" a lot easier, but that all depends on your specific environment.]
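To make the "empirically find the best fit" idea concrete, here is a toy sketch that scores a few candidate hidden-layer sizes on a hold-out set, using the single-hidden-layer nnet package purely for illustration; a real project would use cross-validation and a wider search, possibly over depth as well.

# Sketch: crude empirical search over hidden-layer sizes with nnet.
library(nnet)

set.seed(1)
n <- 500
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- sin(d$x1) + 0.5 * d$x2^2 + rnorm(n, sd = 0.1)    # toy non-linear target

idx   <- sample(n, 0.8 * n)
train <- d[idx, ]; valid <- d[-idx, ]
rmse  <- function(a, p) sqrt(mean((a - p)^2))

sizes  <- c(2, 4, 8, 16, 32)
scores <- sapply(sizes, function(s) {
  fit <- nnet(y ~ x1 + x2, data = train, size = s,
              linout = TRUE, decay = 1e-3, maxit = 500, trace = FALSE)
  rmse(valid$y, predict(fit, valid))
})
data.frame(hidden_units = sizes, valid_RMSE = scores)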
  • asked a question related to Predictive Modeling
Question
3 answers
I am working on predicting some parameters in soil using visible-near infrared reflectance data. My question is 'What is the minimum number of soil samples and their corresponding reflectance data required for generating a prediction model using ANN machine learning algorithm?'.
Relevant answer
Answer
It would surprise me if you can predict pH from spectral data, but it sounds interesting. Could you share some data? I could try and test some models with my software, see: www.lerenisplezant.be/fitting.htm
  • asked a question related to Predictive Modeling
Question
2 answers
For example, how can I calculate the c-index or c-statistic (to investigate my prediction model's performance) at 5-year and 10-year follow-up in a cohort sample?
Relevant answer
  • asked a question related to Predictive Modeling
Question
1 answer
I am using photo-fenton catalyst to degrade different types of Persistent Organic Pollutants. Upon degradation we have analysed the degraded product using GC-MS and LCMS. To further analyse the toxicological effect of degraded product we want to do ecotoxological analysis using ECOSAR predictive model. We are seeking collaboration who can help us on this.
Relevant answer
Answer
I have done quite a bit of work with ECOSAR. What is your specific question?
  • asked a question related to Predictive Modeling
Question
5 answers
I've built a data matrix that contains 7 independent variables. When I performed linear regression using this matrix, I noticed that one of the independent variables is displayed as an exponential function, whereas the rest are plotted linearly.
I'm aware (although not experienced) that it is possible to perform non-linear regression on this single non-linear independent variable, but I'm not sure how to combine the data of the linear and non-linear regression into a single predictive model.
I would be very grateful if anyone could share some insights into this problem!
Relevant answer
Answer
If Y(X) is something like a·b^X, use Z = e^X in your model (it would be linear in Z).
Also have a look at the residual diagnostic plots from the multivariate model.
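If it helps, a minimal sketch of that transform-then-fit idea in R (variable names are hypothetical): once the non-linear predictor is transformed, everything stays within a single ordinary least squares model, and the usual residual diagnostics apply.

# Sketch: handling one exponentially-behaving predictor inside lm().
# Suppose x7 is the predictor that shows an exponential relationship with y.
df$z7 <- exp(df$x7)                  # or log(df$x7), depending on the observed shape

fit <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6 + z7, data = df)
summary(fit)

par(mfrow = c(2, 2))
plot(fit)                            # residual diagnostic plots, as suggested above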
  • asked a question related to Predictive Modeling
Question
1 answer
I want to build an LSTM-based model (where all the outputs are the same features as the input, four inputs, and four outputs) to forecast the next measurement of a sensor node. The scenario is: after receiving the first input (a multivariate data point), the model predicts the second, then gets the second input and predicts the third based on the first and the second values, and so on.
  • asked a question related to Predictive Modeling
Question
1 answer
I am aiming to assess habitat suitability for a fish species. I would like to understand multi-scale habitat variables and be able to create a predictive model/map from this data. It seems like Random Forest modeling would be a great application for this, however, I would like to account for imperfect detection. I have seen that random forest is better at dealing with non-linear effects compared to occupancy models. I have also read that imperfect detection can really obfuscate the predictions from a Random Forest model.
I have seen a few groups using them in 'conjunction' (comparing predicted results from RF and occupancy models), but it never seems like one method actually complements the other. I could be wrong.
Any insight into clear advantages of one over the other for my application would be greatly appreciated. Additionally, if anyone has seen both used in tandem, in a sensible manner that would also be great.
Thanks so much!
Relevant answer
You can overcome imperfect detection using Distance Sampling and therefore use Random Forest :)
  • asked a question related to Predictive Modeling
Question
4 answers
Hello Experts,
Since our targeted species is found only in the 2 km region of the study site, we are planning to use 30 m spatial resolution climate data on our Species Distribution Model. But the problem is that my local weather station is capable of providing 20 km resolution data. On the other hand, if I use WorldClim data that is also 1 km.
My questions are
1. Can I use these downscaled data (from 1 km or 20 km) in my local SDM study, which will be at 30 m resolution?
2. If I downscale, will there be any changes in the variability of the climate data? Is it acceptable to do so?
Please note that I'm new to this field.
Thank you for your valuable time.
  • asked a question related to Predictive Modeling
Question
5 answers
Hello Experts,
My study site is relatively small and the targeted species is found as continuous patches. Do I need to consider Patch size/area in the MaxEnt model?
Does patch size have any meaningful measurable values that can be included in the MaxEnt model?
Thank you.
Relevant answer
Answer
The patch size can not be measured and given a value in the MaxEnt model. However, we can improve the representativeness of the sample sites within the MaxEnt algorithm.
  • asked a question related to Predictive Modeling
Question
4 answers
Hello Experts,
We are at the beginning of building a predictive model of an invasive plant species using MaxEnt. The species is found as a patch over the study area. I am new to using this model and have limited knowledge of it. I have reviewed several papers in which only point locations of present occurrences have been used.
Since my target species occurs as a patch, How can I take the polygonal area of the species where it occurs, instead of point location data?
Or are there any other methods to cover the whole patch of the species into SDM?
Relevant answer
Answer
No, only occurrence points. Keep in mind that bioclimatic or environmental variables are the ones that could potentially represent the species, they are not always the traditional WorldClim ones. To do this you must study the behavior of the species!
I hope I have been useful to you.
  • asked a question related to Predictive Modeling
Question
4 answers
Dear colleagues.
I am thinking about the possibility of developing a predictive model applied to technology transfer processes in healthcare, its main intention being to anticipate the success achieved by the parties involved. Computationally, is this possible?
Relevant answer
Answer
Bondar-Podhurskaya Oksana Hi, do you have any suggestions for methods?
  • asked a question related to Predictive Modeling
Question
9 answers
Predictive models that use ordinary least squares (OLS) for parameter estimation must show residuals with normal distribution and constant variance (homoscedastic).
However, in most scientific articles (in engineering-related areas, at least) I don't see a concern with meeting these assumptions. In your opinion, why does this happen? In the end, the results do not change that much when we make the necessary transformations so that these assumptions are met?
If you have had any experience with this topic, please feel free to share.
Relevant answer
Answer
The OLS is a short cut to the ML solution. It can be derived directly from the assumption that Y|X ~ N(µ(X), σ²), but the solution is correct for any distribution model: just like the ML estimate, the OLS estimate estimates the expected value (of the parameter), and this is independent of the assumed distribution model ("under some mild regularity conditions", e.g. as long as it has a finite expectation and variance).
The difference, therefore, is not in the predicted value, but rather in the uncertainty attributed to this prediction. Depending on the research context, this uncertainty may or may not be relevant. If it is not relevant, then there is no need to invest much mental work in figuring out a "most correct" or "least wrong" distribution model.
When models are really used for prediction, the "model-inherent" uncertainty (determined by the chosen distribution model) associated with a prediction is usually not relevant. What is of much greater relevance and impact here is the difference in predictions between possible alternative models. This is particularly relevant when the predictions are forecasts. Provided there is sufficient data, this pleiotropy of possible alternative models can be addressed by heavily over-parametrized models where the impact of the assumed distributional model approaches zero (such models are nowadays called "deep-learning" models, neural networks, AI, etc.). This is then very much on the side of getting the most correct predictions at the cost of gaining the least amount of insight. But it works to get predictions with good or acceptable positive and negative predictive values.
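A tiny simulation sketch illustrates the point about point estimates: with strongly skewed errors, the OLS slope estimate is still centred on the true value; what the chosen distribution model changes is how the uncertainty of that estimate is described.

# Sketch: OLS slope estimates under skewed (non-normal) errors.
set.seed(123)
true_beta <- 2
slopes <- replicate(2000, {
  x <- runif(100)
  e <- rexp(100) - 1                 # skewed errors with mean zero
  y <- 1 + true_beta * x + e
  coef(lm(y ~ x))["x"]
})
mean(slopes)   # close to 2: the point estimate is essentially unaffected
sd(slopes)     # the sampling spread; describing this uncertainty is where the
               # distribution model matters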
  • asked a question related to Predictive Modeling
Question
4 answers
Respected all,
I am using regression modelling for a crash prediction model. Can anyone tell me what the minimum value of chi-square should be for acceptance in a publication?
Thank you all in advance.
Relevant answer
Answer
I'm afraid your question doesn't make a lot of sense. Could you be more specific, e.g. the kind of regression, the type of DV, and the hypothesis of your chi-square test? These should get us started. Best wishes, David Booth
  • asked a question related to Predictive Modeling
Question
3 answers
A case-control study has been used to construct a prediction model and there is no cohort study to calibrate it.
What solution do you suggest for calibration?
Relevant answer
Answer
Have you used the regression model? For example, logistic regression model?
  • asked a question related to Predictive Modeling
Question
4 answers
While working in both software packages, after loading the training and validation data for the prediction of a single output from several input variables (say 10), the software delivers an explicit mathematical equation for future prediction of the specific parameter but skips some of the input variables (say 2 or 3, or maybe more). What criteria do these software packages use in the background for picking the most influential parameters when providing a mathematical predictive model?
Relevant answer
Answer
First of all, was the fitness (error) zero (0) at the end of the evolution?
If yes, it means that the skipped variables are not important for the data being analyzed.
If not, it can either mean that some variables are not important or that the evolution is stuck in a local optimum.
Note, that for real-world data, it is unlikely to obtain fitness 0 because of noise or other imperfections (in data collection or measurement).
regards,
Mihai
  • asked a question related to Predictive Modeling
Question
3 answers
I have developed a logistic regression based prognostic model in Stata. Is there any way to develop an app using this logistic regression equation (from Stata)?
Most of the resources I found require me to develop the model from scratch in Python/R and then develop the app using streamlit/Shiny etc.
However, I am looking for a resource where I could use the coefficients and intercept values from Stata based model rather than build model from scratch in python.
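To illustrate the kind of approach I am after: hard-coding the coefficients and intercept exported from Stata and computing the predicted probability directly, for example inside a small Shiny app (the coefficient values and predictor names below are made up purely for illustration).

# Sketch: scoring with fixed logistic-regression coefficients taken from Stata output.
library(shiny)

b0 <- -3.2; b_age <- 0.045; b_marker <- 1.10     # hypothetical Stata estimates

predict_risk <- function(age, marker) {
  plogis(b0 + b_age * age + b_marker * marker)   # inverse logit
}

ui <- fluidPage(
  numericInput("age", "Age (years)", value = 60),
  numericInput("marker", "Biomarker level", value = 1.5),
  textOutput("risk")
)

server <- function(input, output) {
  output$risk <- renderText({
    sprintf("Predicted probability: %.1f%%",
            100 * predict_risk(input$age, input$marker))
  })
}

shinyApp(ui, server)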
Relevant answer
Answer
  • asked a question related to Predictive Modeling
Question
11 answers
Dear researchers
When the data set for a particular prediction is imbalanced (the negative class is larger than the positive class, or the negative class is smaller than the positive class), which evaluation parameter is appropriate for evaluating the performance of the predictive model, and why?
These parameters include specificity (SPE), sensitivity (SEN), the receiver operating characteristic (ROC) curve, the area under the curve (AUC or AUROC), accuracy (ACC), and the F-measure (F-M).
Thanks for your guidance
I am waiting for your answer
Relevant answer
Answer
I suggest Precision-Recall Metrics. The F-Measure is a well-known statistic for unbalanced classification.
Traditional model assessment approaches, on the other hand, do not effectively quantify model performance when confronted with unbalanced datasets. Standard classifier techniques, such as Decision Tree and Logistic Regression, favor classes with a large number of occurrences. They are only good at predicting data from the majority of classes.
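For reference, precision, recall and the F-measure can be computed directly from the confusion matrix; a minimal sketch with made-up label vectors is below.

# Sketch: precision, recall and F1 from predicted vs. actual class labels (1 = positive).
prf <- function(actual, predicted) {
  tp <- sum(predicted == 1 & actual == 1)
  fp <- sum(predicted == 1 & actual == 0)
  fn <- sum(predicted == 0 & actual == 1)
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)                    # = sensitivity
  f1        <- 2 * precision * recall / (precision + recall)
  c(precision = precision, recall = recall, F1 = f1)
}

# Toy example with a heavily imbalanced outcome (10% positives)
actual    <- c(rep(0, 90), rep(1, 10))
predicted <- c(rep(0, 85), rep(1, 5), rep(0, 4), rep(1, 6))
prf(actual, predicted)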
  • asked a question related to Predictive Modeling
Question
5 answers
I have the following dataset:
SQ - SEX    - Weight - letter - duration - trail - quantity
1  - male   - 15KG   - abc    - Year 1   - 2     - quantity1
   -        -        -        - Year 2   - 3     - quantity2
2  - female - 17KG   - cde    - Year X   - 4     - quantityx
   -        - 16KG   -        - Year Y   - 6     - quantityy
   -        -        -        - Year Z   - 3     - quantityz
... etc.
I want to build a prediction model that predicts the quantity, but using classic machine learning models (not deep learning ones like LSTM or RNN), i.e. linear regression, SVM, etc., such that:
given individual n at a certain duration (duration A), what will the quantity be?
n - male - 25KG - xlm - 34 - A - ?
What is the best way to treat and pre-process the duration, trail and quantity features before fitting them, so as to preserve their correlation with the target quantity?
Relevant answer
Answer
Aggregation with a rolling window may help you to rearrange your column values accordingly.
  • asked a question related to Predictive Modeling
Question
5 answers
I built a predictive machine learning model that generates the probability of default over the next 2 years for all the companies in a specific country. To train the algorithms I used financial data for all these companies, as well as their NACE codes (domains of activity), and I'm wondering whether I would develop a better model if I somehow segmented the B2B population into segments and ran distinct models on these segments.
Hope you can advise!
Lots of thanks in advance!
Relevant answer
Answer
You can work on different aspects, starting, as cited, with a demographic approach; it could also be a geographic one, depending on what information the dataset includes. You can also try to identify behavioural patterns within your data, or go further by focusing on customer capabilities and the nature of the existing relationships.
  • asked a question related to Predictive Modeling
Question
2 answers
I read some papers of risk prediction models, and generally there are three types of measures of the performance: a) Total model performance (e.g. R^2); b) Discrimination; c) Calibration.
However, many papers on risk prediction models only report discrimination and calibration, and seldom report R^2. What confuses me is whether R^2 is important in risk prediction models, and whether there is a threshold for the minimal acceptable value of R^2 in such models.
Relevant answer
Answer
Thanks Jim for your good answer, the example is really impressive.
Best Regards
Chang
  • asked a question related to Predictive Modeling
Question
6 answers
I previously installed the Minitab software and tried to use it for non-linear regression, but I am having trouble finding good starting values during the curve-fitting process.
(primary models; Gompertz, Logistic, Weibull, Baranyi)
Relevant answer
Answer
Hi,
There are several Shiny-app-based curve-fitting tools that you can use, but the best one would depend on your usage.
https://microrisklab.shinyapps.io/english/ - a great app with most models incorporated into it
https://foodmicrowur.shinyapps.io/biogrowth/ - another great tool that has most of the models incorporated in it and allows simulation-based predictions
IPMP Global Fit https://www.ars.usda.gov/northeast-area/wyndmoor-pa/eastern-regional-research-center/docs/ipmp-global-fit/ - allows one-step fitting of both primary and secondary models, but is limited compared to the previous two suggestions
https://ubarron.shinyapps.io/CardinalFit/ - can be used to fit only the cardinal parameter model
Depending on your usage you can also use the DMFit software.
I would recommend the first two. You will still have to guess the starting values in certain cases, but it is much easier to play around. For optimal results you can take the starting values obtained from these tools and try fitting with the nls function in R, the curve-fitting tool in MATLAB, or Minitab to see how they compare.
Hope this helps.
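As a follow-up to the last suggestion, a minimal nls sketch for a Gompertz-type growth curve in R is shown below; the self-starting SSgompertz model avoids much of the starting-value guesswork, and the data frame and column names are hypothetical.

# Sketch: fitting a Gompertz growth curve with nls().
# 'growth' is assumed to have columns time and logN (e.g. log10 CFU/ml).
fit <- nls(logN ~ SSgompertz(time, Asym, b2, b3), data = growth)   # self-starting
summary(fit)

# With an explicit (modified Gompertz) parameterisation, rough starting values
# can be read off the data:
fit2 <- nls(logN ~ A + C * exp(-exp(-B * (time - M))),
            data  = growth,
            start = list(A = min(growth$logN),          # lower asymptote
                         C = diff(range(growth$logN)),  # total increase
                         B = 0.1,                       # relative growth rate
                         M = mean(growth$time)))        # time of inflection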
  • asked a question related to Predictive Modeling
Question
16 answers
The goal of predictive analysis is to develop predictions for the development of complex, multifaceted processes in various fields of science, industry, economy or other spheres of human activity. In addition, predictive analysis may refer to objectively performing processes such as natural phenomena, climate change, geological, cosmic etc.
Predictive analysis should be based on an analytical methodology that takes into account the most modern prognostic models possible and the large amounts of data necessary to perform the most accurate predictive analysis. In this way, the result of the predictive analysis will be the least subject to the risk of analytical error, i.e. an incorrectly designed forecast.
Predictive analysis can be improved by using computerized modern information technologies, which include computing in the cloud of large data sets stored in Big Data database systems. In the predictive analysis, Business Intelligence analytics and other innovative information technologies typical of the current fourth technological revolution, known as Industry 4.0, can also be used.
The current technological revolution known as Industry 4.0 is motivated by the development of the following factors:
Big Data database technologies, cloud computing, machine learning, the Internet of Things, artificial intelligence, Business Intelligence and other advanced data mining technologies. On the basis of these new technological solutions, innovatively organised analyses of large information sets stored in Big Data database systems and in cloud computing have been developing dynamically in recent years for applications in areas such as machine learning, the Internet of Things, artificial intelligence and Business Intelligence.
To the above-mentioned application examples, one can add predictive analyses in further fields of application of advanced large-data-set analysis, such as Medical Intelligence, Life Science, Green Energy, etc. Processing and multi-criteria analysis of large data sets in Big Data database systems is carried out according to the 4V concept, i.e. Volume (a large amount of data), Value (large values of certain parameters of the analysed information), Velocity (the high speed at which new information arrives) and Variety (a high variety of information).
The advanced information processing and analysis technologies mentioned above are used more and more often for the needs of conducting predictive analyzes concerning, for example, marketing activities of various business entities that advertise their offer on the Internet or analyze the needs in this area reported by other entities, including companies, corporations, institutions financial and public. More and more commercial business entities and financial institutions conduct marketing activities on the Internet, including on social media portals.
More and more public institutions and business entities, including companies, banks and other entities, need to conduct multi-criteria analyzes on large data sets downloaded from the Internet describing the markets on which they operate, as well as contractors and clients with whom they cooperate.
On the other hand, there are already specialised technology companies that offer this type of analytical service, including predictive analysis services, and develop custom reports that are the result of multi-criteria analyses of large data sets obtained from various websites and from entries and comments on social media portals, based on sentiment analysis of the content of Internet users' comments.
Do you agree with my opinion on this matter?
In view of the above, I am asking you the following question:
How can you improve the process of predictive analysis?
Please reply
I invite you to discussion and scientific cooperation
Dear Colleagues and Friends from RG
The key aspects and determinants of the applications of modern computerized information technologies for data processing in Big Data and Business Intelligence database systems for the purpose of conducting predictive analyzes are described in the following publications:
I invite you to discussion and cooperation.
Best wishes
Relevant answer
Answer
Dear Alexander Kolker,
Thank you very much for your answer and pointing to the important aspects of predictive analytics in business and the use of Big Data Analytics in these analyzes.
Thank you very much,
Best regards,
Dariusz Prokopowicz
  • asked a question related to Predictive Modeling
Question
3 answers
We are trying to calibrate/validate our predictive productivity/erosion model (APEX) against multi-year productivity data for Kenya and Namibia. Erosion measurements also needed. FAO datasets do not provide raw values, so are not good for us. We are looking for ground measurements in any location throughout either Kenya or Namibia. Can anyone help? Thanks.
Relevant answer
Answer
This is a good question.
  • asked a question related to Predictive Modeling
Question
3 answers
How can one introduce a covariate into a multivariable logistic prediction model when the event of interest always occurs in individuals exposed to this covariate? Is it possible to determine the adjusted odds ratio with its 95% confidence interval?
Relevant answer
Answer
I attach information to exemplify the problem.
  • asked a question related to Predictive Modeling
Question
5 answers
I almost asked this as a "technical question" but I think it's more of a discussion topic. Let me describe where I get lost in this discussion, and what I'm seeing in practice. For context, I'm a quantitative social scientist, not "real statistician" nor a data scientist per se. I know how to run and interpret model results, and a little statistical theory, but I wouldn't call myself anything more than an applied statistician. So take that into account as you read. The differences between "prediction-based" modeling goals and "inference-based" modeling goals are just starting to crystalize for me, and I think my background is more from the "inference school" (though I wouldn't have thought to call it that until recently). By that I mean, I'm used to doing theoretically-derived regression models that include terms that can be substantively interpreted. We're interested in the regression coefficient or odds ratio more than the overall fit of the model. We want the results to make sense with respect to theory and hypotheses, and provide insight into the data generating (i.e., social/psychological/operational) process. Maybe this is a false dichotomy for some folks, but it's one I've seen in data science intro materials.
The scenario: This has happened to me a few times in the last few years. We're planning a regression analysis project and a younger, sharper, data-science-trained statistician or researcher suggests that we set aside 20% (or some fraction like that) of the full sample as test sample, develop the model on our test sample, and then validate the model on the remaining 80% (validation).
Why I don't get this (or at least struggle with it): My first struggle point is a conceptual/theoretical one. If you use a random subset of your data, shouldn't you get the same results on that data as you would with the whole data (in expectation) for the same reason you would with a random sample from anything? By that I mean, you'd have larger variances and some "significant" results won't be significant due to sample size of course, but shouldn't any "point estimates" (e.g., regression coefficients) be the same since it's a random subset? In other words, shouldn't we see all the same relationships between variables (ignoring significance)? If the modeling is using significance as input to model steps (e.g., decision trees), that could certainly lead to a different final model. But if you're just running a basic regression, why would anyone do this?
There are also some times when a test sample just isn't practical (i.e., a data set of 200 cases). And sometimes it's impractical because there just isn't time to do it. Let's set those aside for the discussion.
Despite my struggles, there are some scenarios where the "test sample" approach makes sense to me. On a recent project we were developing relatively complex models, including machine learning models, and our goal was best prediction across methods. We wanted to choose which model predicted the outcome best. So we used the "test and validate" approach. But I've never used it on a theory/problem-driven study where we're interested in testing hypotheses and interpreting effect sizes (even when I've had tens of thousands of cases in my data file). It always just seems like a step that gets in the way. FWIW, I've been discussing this technique in terms of data science, but I first learned about it when learning factor analysis and latent variable models. The commonality is how "model-heavy" these methods are relative to other kinds of statistical analysis.
So...am I missing something? Being naive? Just old-fashioned?
If I could phrase this as a question, it's "Why should I use test and validation samples in my regression analyses? And please answer with more than 'you might get different results on the two samples' " :)
Thanks! Looking forward to your insights and perspective. Open to enlightenment! :)
Relevant answer
Answer
As was mentioned above by Alexander Kolker , there are actually two scenarios for using your models. If you want to assess the reliability and the operability of your model or algorithm that produces predictions, you need to compare predictions and actuals. The design of this comparison should be as close as possible to what you'd like to implement in practice. There are also different setups that can be used, e.g., the rolling-origin evaluation when you re-estimate coefficients each time a new observation appears. When it gets to multiple objects/time series, the task becomes a bit more difficult as you need to use more sophisticated error metrics to evaluate your models. Please let me recommend our recent work on this topic where we consider the rolling-origin setup:
The key element in model validation is the choice of metrics or indicators to compare alternative models. For regression analysis, if you use the test/validation approach, it is usually MAE or MSE, but see the notes in the article above on how to choose between the MAE and MSE. In the article we also recommend the use of metrics for bias, such as the mean error (ME), median error (MdE), and the Overestimation Percentage corrected (OPc), which is the new metric we proposed. The idea is that if you obtain biased predictions, you can theoretically improve them, so your model is not optimal.
But if you do not want to split your data, you can use information criteria, such as AICc or BIC. This then corresponds to the second scenario described by Alexander Kolker .
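As a very small illustration of the first scenario (comparing predictions with actuals on held-out data), here is a sketch with a random 80/20 split and the accuracy and bias metrics mentioned above; for time-ordered data the random split would be replaced by a rolling-origin scheme, and all variable names are hypothetical.

# Sketch: hold-out evaluation of a regression model with MAE, RMSE and mean error (bias).
set.seed(2024)
n   <- nrow(df)                          # 'df' is a hypothetical analysis data set
idx <- sample(n, size = 0.8 * n)         # 80% for estimation
train <- df[idx, ]; test <- df[-idx, ]

fit  <- lm(y ~ x1 + x2 + x3, data = train)
pred <- predict(fit, newdata = test)

err <- test$y - pred
c(MAE  = mean(abs(err)),
  RMSE = sqrt(mean(err^2)),
  ME   = mean(err))                      # mean error: systematic over/under-prediction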
  • asked a question related to Predictive Modeling
Question
4 answers
Hello everyone, I am currently working on a predictive maintenance model to detect possible failures in a specific system or equipment. I wanted to ask you if you know where I can get real data from a device from a normal operating state to a failed state, such as valves, motors, switches, etc.
Relevant answer