Prediction - Science topic
Questions related to Prediction
I have been studying a particular set of issues in methodology, and looking to see how various texts have addressed this. I have a number of sampling books, but only a few published since 2010, with the latest being Yves Tille, Sampling and Estimation from Finite Populations, 2020, Wiley.
In my early days of survey sampling, William Cochran's Sampling Techniques, 3rd ed, 1977, Wiley, was popular. I would like to know which books are most popularly used today to teach survey sampling (sampling from finite populations).
I posted almost exactly the same message as above to the American Statistical Association's ASA Connect and received a few recommendations, notably Sampling: Design and Analysis, Sharon Lohr, whose 3rd ed, 2022, is published by CRC Press. Also, of note was Sampling Theory and Practice, Wu and Thompson, 2020, Springer.
Any other recommendations would also be appreciated.
Thank you - Jim Knaub
I encountered an unusual observation while constructing a nomogram using the rms package with the Cox proportional hazards model. Specifically, when Karnofsky Performance Status (KPS) is used as the sole predictor, the nomogram points for KPS decrease from high to low. However, when KPS is combined with other variables in a multivariable model, the points for KPS increase from low to high. Additionally, I've noticed that the total points vary from low to high for all variables, while the 1-year survival probability shifts from high to low.
Could anyone help clarify why this directional shift in points occurs? Are there known factors, such as interactions, scaling differences, or confounding effects, that might explain this pattern?
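Not an answer, but a minimal rms sketch on synthetic data (the variable names kps, age, sex, time and status are assumptions, not the asker's data) that shows where to look: if the sign of the KPS coefficient flips between the univariable and the multivariable cph fits, the direction of the KPS point axis in the nomogram flips with it.
library(rms)
library(survival)
# Hypothetical data, for illustration only
set.seed(1)
d <- data.frame(time   = rexp(300, 0.002),
                status = rbinom(300, 1, 0.7),
                kps    = sample(seq(40, 100, 10), 300, replace = TRUE),
                age    = rnorm(300, 60, 10),
                sex    = factor(sample(c("m", "f"), 300, replace = TRUE)))
dd <- datadist(d); options(datadist = "dd")
fit_uni   <- cph(Surv(time, status) ~ kps,             data = d, x = TRUE, y = TRUE, surv = TRUE)
fit_multi <- cph(Surv(time, status) ~ kps + age + sex, data = d, x = TRUE, y = TRUE, surv = TRUE)
# If the sign of the KPS coefficient differs between the two fits (e.g. through confounding
# or correlated predictors), the direction of the KPS point scale in the nomogram differs too.
coef(fit_uni)["kps"]
coef(fit_multi)["kps"]
surv_fun <- Survival(fit_multi)
nom <- nomogram(fit_multi,
                fun      = function(lp) surv_fun(365, lp),
                funlabel = "1-year survival probability")
plot(nom)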
According to event segmentation theory, high prediction error lets us perceive an event boundary. Pattern separation is a mechanism that provides a way to distinguish similar memories/events from each other, while pattern completion is a mechanism that aids in retrieving complete memories/events from partial cues.
Is it fair to assume that a high prediction error causes pattern separation at event boundaries, while a low prediction error causes pattern completion within events?
Any feedback would be amazing
Modern physics, because afterlife prediction is new; more specifically, exact and concrete quantum mechanics.
The afterlife is so unpredictable that empiricism is more accurate than rationalism. https://www.researchgate.net/publication/381108355_Quantum_mechanicsmore_exact_would_predict_the_afterlife_more_accurately_than_relativity_more_theoretical
Of course, I sometimes doubt that the afterlife is eternal salvation for all, so I live and deduce what it might be...
Predicting possible earthquakes on this planet in advance remains a long-pending challenge; with such predictions, our systems could take appropriate corrective action to reduce the loss of human life.
Research is continuously in progress all over the globe, and global society wishes to know the status of this important issue.
What are the possibilities of applying AI-based tools, including ChatGPT and other AI applications in the field of predictive analytics in the context of forecasting economic processes, trends, phenomena?
The ongoing technological advances in ICT and Industry 4.0/5.0, including Big Data Analytics, Data Science, cloud computing, generative artificial intelligence, the Internet of Things, multi-criteria simulation models, digital twins, Blockchain, etc., make it possible to carry out advanced processing of increasingly large volumes of data and information. These technologies contribute to improving analytical processes concerning the operation of business entities, including in the fields of Business Intelligence and economic analysis, as well as in predictive analytics for forecasting economic processes, trends and phenomena. Given the dynamic development of generative artificial intelligence technology over the past few quarters, and the simultaneous successive increases in the computing power of constantly improved microprocessors, the possibilities for improving predictive analytics in the context of forecasting economic processes may also grow.
In view of the above, I address the following question to the esteemed community of scientists and researchers:
What are the possibilities of applying AI-based tools, including ChatGPT and other AI applications for predictive analytics in the context of forecasting economic processes, trends, phenomena?
What are the possibilities of applying AI-based tools in the field of predictive analytics in the context of forecasting economic processes?
And what is your opinion on this topic?
What is your opinion on this issue?
Please answer,
I invite everyone to join the discussion,
Thank you very much,
Best regards,
Dariusz Prokopowicz
The above text is entirely my own work written by me on the basis of my research.
In writing this text I did not use other sources or automatic text generation systems.
Copyright by Dariusz Prokopowicz
Can the application of artificial intelligence and Big Data Analytics technologies help improve system energy security management processes and enhance this security?
Probably yes, if new green technologies and the development of emission-free clean energy are a priority in the energy policy shaped by the government. The efficient application of artificial intelligence and Big Data Analytics technologies can help improve systemic energy security management processes and increase this security. However, it is crucial to combine the functionality of artificial intelligence and Big Data Analytics effectively and to apply these technologies efficiently to: manage the risk of energy emergencies; analyse the determinants shaping the development of the energy sector and energy production; analyse the factors shaping the level of energy security; and forecast future energy production in the context of forecast changes in energy demand and in the energy that can be produced from specific types of sources under specific determinants.
In view of the above, I address the following question to the esteemed community of scientists and researchers:
Can the application of artificial intelligence and Big Data Analytics technologies help improve the processes of systemic energy security management and enhance this security?
Can artificial intelligence and Big Data Analytics help improve systemic energy security management processes?
And what is your opinion on this topic?
What is your opinion on this issue?
Please answer,
I invite everyone to join the discussion,
Thank you very much,
Best wishes,
Dariusz Prokopowicz
The above text is entirely my own work written by me on the basis of my research.
In writing this text I did not use other sources or automatic text generation systems.
Copyright by Dariusz Prokopowicz
Is it possible to build a highly effective forecasting system for future financial and economic crises based on artificial intelligence technology in combination with Data Science analytics, Big Data Analytics, Business Intelligence and/or other Industry 4.0 technologies?
Is it possible to build a highly effective, multi-faceted, intelligent forecasting system for future financial and economic crises based on artificial intelligence technology in combination with Data Science analytics, Big Data Analytics, Business Intelligence and/or other Industry 4.0 technologies as part of a forecasting system for complex, multi-faceted economic processes in such a way as to reduce the scale of the impact of the paradox of a self-fulfilling prediction and to increase the scale of the paradox of not allowing a predicted crisis to occur due to pre-emptive anti-crisis measures applied?
What do you think about the involvement of artificial intelligence in combination with Data Science, Big Data Analytics, Business Intelligence and/or other Industry 4.0 technologies for the development of sophisticated, complex predictive models for estimating current and forward-looking levels of systemic financial, economic risks, debt of the state's public finance system, systemic credit risks of commercially operating financial institutions and economic entities, forecasting trends in economic developments and predicting future financial and economic crises?
Research and development work is already underway to teach artificial intelligence to 'think', i.e. to carry out something like the conscious thought process realised in the human brain. That thinking process, awareness of one's own existence, the ability to think abstractly and critically, and the ability to separate knowledge acquired in the learning process from its processing in abstract, conscious thought are just some of the abilities attributed exclusively to humans. However, as part of technological progress and improvements in artificial intelligence technology, attempts are being made to create "thinking" computers or androids, and in the future there may be attempts to create an artificial consciousness that is a digital creation but functions in a similar way to human consciousness.
At the same time, as part of improving artificial intelligence technology, creating its next generations and teaching artificial intelligence to perform work requiring creativity, systems are being developed to process the ever-increasing amount of data and information stored on Big Data Analytics platform servers and taken, for example, from selected websites. In this way, it may be possible in the future to create "thinking" computers which, based on online access to the Internet and on data downloaded according to the needs of the tasks performed and processed in real time, will be able to develop predictive models and specific forecasts of future processes and phenomena, based on models composed of algorithms resulting from previously applied machine learning.
When such technological solutions become possible, the question arises of how to take into account, in the intelligent, multifaceted forecasting models being built, the long-known paradoxes concerning forecast phenomena, which are to appear only in the future and which are not certain to appear at all. Among the various paradoxes of this kind, two in particular can be pointed out: the paradox of the self-fulfilling prophecy, and the paradox of a predicted crisis not occurring because pre-emptive anti-crisis measures were applied. If these two paradoxes were taken into account within the intelligent, multi-faceted forecasting models being built, their effects could be correlated asymmetrically and inversely.
In view of the above, once artificial intelligence has been appropriately improved by teaching it to "think" and to process huge amounts of data and information in real time in a multi-criteria, creative manner, it may become possible to build a highly effective, multi-faceted, intelligent system for forecasting future financial and economic crises, a system for forecasting complex, multi-faceted economic processes designed to reduce the impact of the self-fulfilling prophecy paradox and to increase the scale of the paradox of a predicted crisis being averted by pre-emptive anti-crisis measures. Multi-criteria processing of large data sets with the involvement of artificial intelligence, Data Science, Big Data Analytics, Business Intelligence and/or other Industry 4.0 technologies makes it possible to operate effectively and increasingly automatically on large sets of data and information, and thus increases the possibility of developing advanced, complex forecasting models for estimating current and future levels of systemic financial and economic risks, the indebtedness of the state's public finance system, and the systemic credit risks of commercially operating financial institutions and economic entities, as well as for forecasting economic trends and predicting future financial and economic crises.
In view of the above, I address the following questions to the esteemed community of scientists and researchers:
Is it possible to build a highly effective, multi-faceted, intelligent forecasting system for future financial and economic crises based on artificial intelligence technology in combination with Data Science, Big Data Analytics, Business Intelligence and/or other Industry 4.0 technologies in a forecasting system for complex, multi-faceted economic processes in such a way as to reduce the scale of the impact of the paradox of the self-fulfilling prophecy and to increase the scale of the paradox of not allowing a forecasted crisis to occur due to pre-emptive anti-crisis measures applied?
What do you think about the involvement of artificial intelligence in combination with Data Science, Big Data Analytics, Business Intelligence and/or other Industry 4.0 technologies to develop advanced, complex predictive models for estimating current and forward-looking levels of systemic financial risks, economic risks, debt of the state's public finance system, systemic credit risks of commercially operating financial institutions and economic entities, forecasting trends in economic developments and predicting future financial and economic crises?
What do you think about this topic?
What is your opinion on this subject?
Please respond,
I invite you all to discuss,
Thank you very much,
Warm regards,
Dariusz Prokopowicz
Hello everyone,
I have a question regarding the use of the ANN tool in Matlab. I'm wondering if there is a specific model or final predictive equation that is generated when utilizing this tool. Any insights would be greatly appreciated. Thank you.
Performance prediction is required to optimally deploy workloads and inputs to a particular machine/accelerator in computing systems. Different predictors (e.g. AI predictors) come with different trade-offs, such as complexity, accuracy, and overheads. Which ones are the best?
In 2007 I did an Internet search for others using cutoff sampling, and found a number of examples, noted at the first link below. However, it was not clear that many used regressor data to estimate model-based variance. Even if a cutoff sample has nearly complete 'coverage' for a given attribute, it is best to estimate the remainder and have some measure of accuracy. Coverage could change. (Some definitions are found at the second link.)
Please provide any examples of work in this area that may be of interest to researchers.
Can artificial intelligence already predict our consumer behaviour and in a short while will it be able to predict which shop we will go to and what we will buy tomorrow?
With the help of artificial intelligence, how can systems for monitoring citizens' consumer behaviour based on GPS geolocalisation and information contained in smartphones be improved?
The lockdowns and national quarantines introduced during the coronavirus (Covid-19) pandemic caused a strong decline in sales and turnover generated in traditionally, physically operating shops and service establishments. The lockdowns imposed on selected service industries and on traditionally operated trade also accelerated e-commerce, the sale of products and services conducted via the Internet. When the coronavirus pandemic was no longer interpreted in terms of high health and economic risk, a significant proportion of traditionally operated trade and physical service establishments returned to traditional business, customer service and sales of products or services. At the same time, new ICT and Industry 4.0 solutions, including artificial intelligence technologies, are being implemented in information systems that support the sales of product and service offerings in both traditional and Internet-based formats, including tools for activating potential consumers, getting customers interested in new product or service offerings, and encouraging customers to visit stationary shops and service establishments.
In this context, start-ups have been developing rapidly over the past few years which, using anonymous mobile user identifiers and the accurate location and Internet-use data available from various applications installed on smartphones, are able to determine precisely where a smartphone user is at any given time and to diagnose whether he or she happens to be making a purchase in a specific stationary shop, or walking past an establishment providing specific services and perhaps considering using them. Where a technology start-up holds data on a specific Internet user drawn from a number of different applications and, on the basis of this data collected on Big Data Analytics processing and analysis platforms, has built an information-rich profile of the interests and purchasing preferences of a kind of digital avatar corresponding to that user, then, combined with analysis of current customer behaviour and GPS-based geolocation, it is able to make real-time predictions about the subsequent behaviour and/or purchasing decisions of individual potential customers of specific product or service offerings. Some technology start-ups conducting this kind of analytics, based on large sets of customer data, on geolocation, on the use of specific apps and social media available on the smartphone, and on knowledge of the psychology of consumer behaviour, are able first to locate consumers in real time relative to specific shops, service establishments, etc., and then to display, on advertising banners appearing in specific applications on the smartphone, information about the current offer, including a price or other promotion for a specific product available in the shop where the Internet user and potential customer is currently located.
Thanks to this type of technological solution, it increasingly happens that when an Internet user with a smartphone is in the vicinity of specific stands, shop shelves or specific shops in a shopping centre and is thinking about buying a specific product, at that very moment an advertisement appears on the smartphone with information on a price or other promotion concerning that particular product or a similar, highly substitutable product. At the point in time when the customer is in a specific shop or part of a shop, online advertisements are displayed on his or her smartphone, e.g. on social media, in the Google ecosystem, in third-party web browsers or in other applications that the potential customer has installed.
When such technological solutions are complemented by artificial intelligence analysing the consumer behaviour of individual customers of different product and service offers, it is possible to create intelligent analytical systems capable of predicting who will visit a specific shop, when they will do so and what they plan to buy in that shop. Statistically, a citizen has several applications installed in his or her smartphone, which provide the technology-based analytical companies with data about their current location. Therefore, thanks to the use of artificial intelligence, it may not be long before Internet users receive messages, see online advertisements displayed on their smartphones showing the products and services they are about to buy or think about tomorrow. Perhaps the artificial intelligence involved in this kind of analytics is already capable of predicting our consumer behaviour in real time and will soon be able to predict which shop we will go to and what we will buy tomorrow.
In view of the above, I would like to address the following question to the esteemed community of scientists and researchers:
With the help of artificial intelligence, how can monitoring systems for citizens' consumer behaviour based on GPS geolocation and information contained in smartphones be improved?
Can artificial intelligence already predict our consumer behaviour and in a few moments will it be able to predict which shop we will go to and what we will buy tomorrow?
Can artificial intelligence already predict our consumer behaviour?
What do you think about this topic?
What is your opinion on this subject?
Please answer,
I invite you all to discuss,
The above text is entirely my own work written by me on the basis of my research.
I have not used other sources or automatic text generation systems such as ChatGPT in writing this text.
Copyright by Dariusz Prokopowicz
Thank you very much,
Best regards,
Dariusz Prokopowicz
A number of people have asked on ResearchGate about acceptable response rates and others have asked about using nonprobability sampling, perhaps without knowing that these issues are highly related. Some ask how many more observations should be requested over the sample size they think they need, implicitly assuming that every observation is at random, with no selection bias, one case easily substituting for another.
This is also related to two different ways of 'approaching' inference: (1) the probability-of-selection-based/design-based approach, and (2) the model-based/prediction-based approach, where "prediction" means estimation for a random variable, not forecasting.
Many may not have heard much about the model-based approach. For that, I suggest the following reference:
Royall(1992), "The model based (prediction) approach to finite population sampling theory." (A reference list is found below, at the end.)
Most people may have heard of random sampling, and especially simple random sampling where selection probabilities are all the same, but many may not be familiar with the fact that all estimation and accuracy assessments would then be based on the probabilities of selection being known and consistently applied. You can't take just any sample and treat it as if it were a probability sample. Nonresponse is therefore more than a problem of replacing missing data with some other data without attention to "representativeness." Missing data may be replaced by imputation, or by weighting or reweighting the sample data to completely account for the population, but results may be degraded too much if this is not applied with caution. Imputation may be accomplished various ways, such as trying to match characteristics of importance between the nonrespondent and a new respondent (a method which I believe has been used by the US Bureau of the Census), or, my favorite, by regression, a method that easily lends itself to variance estimation, though variance in probability sampling is technically different. Weighting can be adjusted by grouping or regrouping members of the population, or just recalculation with a changed number, but grouping needs to be done carefully.
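As an illustration of the regression-imputation idea mentioned above, here is a minimal R sketch on synthetic data (the variable names and the response mechanism are invented for the example): a model is fitted to respondents and used to predict, i.e. impute, the nonrespondents before estimating a total.
# Hypothetical survey: y observed only for respondents, auxiliary x known for everyone
set.seed(1)
n <- 200
x <- rgamma(n, shape = 2, scale = 50)          # auxiliary/regressor data
y <- 3 + 1.2 * x + rnorm(n, sd = 10)           # variable of interest
resp <- rbinom(n, 1, 0.7) == 1                 # response indicator
dat <- data.frame(x, y_obs = ifelse(resp, y, NA))
fit <- lm(y_obs ~ x, data = dat, subset = resp)     # model fitted to respondents only
imputed <- predict(fit, newdata = dat[!resp, ])     # regression imputation for nonrespondents
est_total <- sum(dat$y_obs[resp]) + sum(imputed)    # estimated population total
est_total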
Recently work has been done which uses covariates for either modeling or for forming pseudo-weights for quasi-random sampling, to deal with nonprobability sampling. For reference, see Elliott and Valliant(2017), "Inference for Nonprobability Samples," and Valliant(2019), "Comparing Alternatives for Estimation from Nonprobability Samples."
Thus, methods used for handling nonresponse, and methods used to deal with nonprobability samples are basically the same. Missing data are either imputed, possibly using regression, which is basically also the model-based approach to sampling, working to use an appropriate model for each situation, with TSE (total survey error) in mind, or weighting is done, which attempts to cover the population with appropriate representation, which is mostly a design-based approach.
If I am using it properly, the proverb "Everything old is new again," seems to fit here if you note that in Brewer(2014), "Three controversies in the history of survey sampling," Ken Brewer showed that we have been all these routes before, leading him to have believed in a combined approach. If Ken were alive and active today, I suspect that he might see things going a little differently than he may have hoped in that the probability-of-selection-based aspect is not maintaining as much traction as I think he would have liked. This, even though he first introduced 'modern' survey statistics to the model-based approach in a paper in 1963. Today it appears that there are many cases where probability sampling may not be practical/feasible. On the bright side, I have to say that I do not find it a particularly strong argument that your sample would give you the 'right' answer if you did it infinitely many times when you are doing it once, assuming no measurement error of any kind, and no bias of any kind, so relative standard error estimates there are of great interest, just as relative standard error estimates are important when using a prediction-based approach, and the estimated variance is the estimated variance of the prediction error associated with a predicted total, with model misspecification as a concern. In a probability sample, if you miss an important stratum of the population when doing say a simple random sample because you don't know the population well, you could greatly over- or underestimate a mean or total. If you have predictor data on the population, you will know the population better. (Thus, some combine the two approaches: see Brewer(2002) and Särndal, Swensson, and Wretman(1992).)
..........
So, does anyone have other thoughts on this and/or examples to share for this discussion: Comparison of Nonresponse in Probability Sampling with Nonprobability Sampling?
..........
Thank you.
References:
Brewer, K.R.W. (2002), Combined Survey Sampling Inference: Weighing Basu's Elephants, Arnold: London, and Oxford University Press.
Brewer, K.R.W. (2014), "Three controversies in the history of survey sampling," Survey Methodology, Dec 2013 (Waksberg Award paper):
Elliott, M.R., and Valliant, R. (2017), "Inference for Nonprobability Samples," Statistical Science, 32(2):249-264,
https://www.researchgate.net/publication/316867475_Inference_for_Nonprobability_Samples, where the paper is found at
https://projecteuclid.org/journals/statistical-science/volume-32/issue-2/Inference-for-Nonprobability-Samples/10.1214/16-STS598.full (Project Euclid, open access).
Royall, R.M. (1992), "The model based (prediction) approach to finite population sampling theory," Institute of Mathematical Statistics Lecture Notes - Monograph Series, Volume 17, pp. 225-240. Information is found at
https://www.researchgate.net/publication/254206607_The_model_based_prediction_approach_to_finite_population_sampling_theory, but not the paper.
The paper is available under Project Euclid, open access:
Särndal, C.-E., Swensson, B., and Wretman, J. (1992), Model Assisted Survey Sampling, Springer-Verlag.
Valliant, R. (2019), "Comparing Alternatives for Estimation from Nonprobability Samples," Journal of Survey Statistics and Methodology, Volume 8, Issue 2, April 2020, pp. 231-263, preprint at
I am trying to use machine learning algorithms to predict whether a pipe has broken or not and I also want to predict the time to failure of a particular pipe. So, I need a dataset that contains the pipe installation year, the date of recorded failure for failed pipes and also some other parameters such as pipe length, operating pressure, type of material and pipe diameter among others.
In my country, a dozen or more years ago, there were real winters, with snow and frost following the autumn. In recent years, by contrast, winter has looked like autumn, with no snow and above-zero temperatures. I think that the greenhouse effect, i.e. the warming of the Earth's climate, has already begun. This is also confirmed by the numerous climatic cataclysms and weather anomalies that have appeared in many places on Earth in the current year, 2018. In some parts of the world there are fires covering huge forest areas, for example in Scandinavia, California in the USA, Australia, the Iberian Peninsula and Africa.
In addition, there have been weather anomalies such as snow and floods in October and November in the south of Europe, tornadoes in many places on Earth, and so on.
Perhaps these problems will get worse. It is necessary to improve security systems and anti-crisis services and to improve the prediction of these anomalies and climatic cataclysms, so that people can shelter from, or cope with, an imminent cataclysm. One of the technologies that can help in more precise forecasting of these cataclysms is the processing of large collections of historical and current information on this subject in cloud computing technology and Big Data database systems.
Therefore, I am asking you: Will new data processing technologies in Big Data database systems allow for accurate prediction of climate disasters?
Please answer or comment. I invite you to the discussion.
At the US Energy Information Administration (EIA), for various establishment surveys, Official Statistics have been generated using model-based ratio estimation, particularly the model-based classical ratio estimator. Other uses of ratios have been considered at the EIA and elsewhere as well. Please see
At the bottom of page 19 it says "... on page 104 of Brewer(2002) [Ken Brewer's book on combining design-based and model-based inferences, published under Arnold], he states that 'The classical ratio estimator … is a very simple case of a cosmetically calibrated estimator.'"
Here I would like to hear of any and all uses made of design-based or model-based ratio or regression estimation, including calibration, for any sample surveys, but especially establishment surveys used for official statistics.
Examples of the use of design-based methods, model-based methods, and model-assisted design-based methods are all invited. (How much actual use is the GREG getting, for example?) This is just to see what applications are being made. It may be a good repository of such information for future reference.
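For concreteness, a small synthetic sketch of the model-based classical ratio estimator under a cutoff sample (all numbers invented): the ratio fitted on the sampled units predicts the unobserved remainder, which is then added to the observed part of the total.
set.seed(2)
N <- 500
x <- rgamma(N, shape = 3, scale = 100)          # regressor data, e.g. a prior-period value, known for all units
y <- 0.8 * x * (1 + rnorm(N, sd = 0.1))         # current-period value, observed only in the sample
s <- order(x, decreasing = TRUE)[1:50]          # cutoff sample: the 50 largest units
b_hat <- sum(y[s]) / sum(x[s])                  # ratio estimate of the slope
T_hat <- sum(y[s]) + b_hat * sum(x[-s])         # predicted total = observed part + predicted remainder
c(estimated = T_hat, true = sum(y))             # compare with the true total (known here because the data are simulated)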
Thank you. - Cheers.
I am working on landslide hazard and risk zonation. I trained some landslide conditioning factors in Python/R and SPSS, and I have calculated the ROC/AUC and the confusion matrix of the model. I would like to know how I can generate the final landslide prediction maps from those trained and evaluated machine learning (ML) models.
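One common route (a sketch only, with toy rasters and an illustrative random forest standing in for the trained and evaluated model) is to stack the conditioning-factor rasters with layer names matching the training columns and apply the model pixel-wise with terra::predict; the resulting susceptibility raster is the prediction map.
library(terra)
library(randomForest)
set.seed(10)
# Toy rasters and a toy model; replace with your own factor rasters and trained model
slope    <- rast(matrix(runif(100 * 100, 0, 45), 100, 100))
rainfall <- rast(matrix(runif(100 * 100, 500, 2500), 100, 100))
factors  <- c(slope, rainfall)
names(factors) <- c("slope", "rainfall")          # must match the training column names
train <- data.frame(slope = runif(500, 0, 45), rainfall = runif(500, 500, 2500))
train$landslide <- factor(ifelse(train$slope > 25 & train$rainfall > 1500, "yes", "no"))
rf_model <- randomForest(landslide ~ slope + rainfall, data = train)
# Apply the model to every pixel; the probability layers are named after the class levels
susceptibility <- predict(factors, rf_model, type = "prob", na.rm = TRUE)[["yes"]]
plot(susceptibility)
# writeRaster(susceptibility, "landslide_susceptibility.tif", overwrite = TRUE)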
I use a conditional logit model with income, leisure time and interaction terms of the two variables with other variables (describing individual's characteristics) as independent variables.
After running the regression, I use the predict command to obtain probabilities for each individual and category. These probabilities are then multiplied with the median working hours of the respective categories to compute expected working hours.
The next step is to increase wage by 1%, which increases the variable income by 1% and thus also affects all interaction terms which include the variable income.
After running the modified regression, again I use the predict command and should obtain slightly different probabilities. My problem is now that the probabilities are exactly the same, so that there would be no change in expected working hours, which indicates that something went wrong.
On the attached images with extracts of the two regression outputs one can see that indeed the regression coefficients of the affected variables are very, very similar and that both the value of the R² and the values of the log likelihood iterations are exactly the same. To my mind these observations should explain why probabilities are indeed very similar, but I am wondering why they are exactly the same and what I did possibly wrong. I am replicating a paper where they did the same and where they were able to compute different expected working hours for the different scenarios.
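One possible explanation, sketched below with a plain binary logit in R rather than Stata's clogit (synthetic data, illustrative names): if the model is re-estimated on the rescaled data, the income coefficient simply rescales and the fitted probabilities, log likelihood and pseudo-R² are reproduced exactly. A wage simulation instead keeps the original coefficients and predicts on the modified data, with every interaction term that involves income recomputed first.
set.seed(3)
n <- 500
income  <- rlnorm(n, 3, 0.5)
leisure <- runif(n, 20, 80)
work    <- rbinom(n, 1, plogis(-2 + 0.05 * income + 0.02 * leisure))
d0 <- data.frame(work, income, leisure)
d1 <- transform(d0, income = income * 1.01)        # the 1% income increase
fit0 <- glm(work ~ income + leisure, family = binomial, data = d0)
fit1 <- glm(work ~ income + leisure, family = binomial, data = d1)   # re-estimated on the changed data
max(abs(fitted(fit0) - fitted(fit1)))              # ~0: identical fit, identical likelihood
coef(fit0)["income"] / coef(fit1)["income"]        # ~1.01: the coefficient just rescales
# For the simulation, keep the ORIGINAL coefficients and predict on the modified data:
p_sim <- predict(fit0, newdata = d1, type = "response")
summary(p_sim - fitted(fit0))                      # now the probabilities do change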
I just started trying out the Google Colab version of AlphaFold2 for protein 3D structure prediction via this link:
I am pretty much a newbie, so I am still trying to figure out how best to interpret the results and put them into proper words for a report/presentation. Also, is there a way to download the predicted 3D structure that is displayed?
Thanks in advance.
Like protein metal predictor or simulation programs
I have recently been working on using machine learning for yield prediction, however, I was exploring what inputs would be better at predicting yield. I am confused by only three papers that use historical yields as an input to predict yields for the new year. From the test results this does improve the prediction accuracy substantially. But does this count as data leakage? If not, what is the rationale for doing so? What are the limitations? (It seems that the three papers are from the same team.)
Three papers' links: https://www.sciencedirect.com/science/article/abs/pii/S0034425721001267
I am trying to tweak my machine learning model optimizer, and I would love to test it in the healthcare domain, especially for rare illnesses.
So, does anyone know of any de-identified electronic health records for Epilepsy, Parkinson's or other rare-disease patients (perhaps those treated with warfarin)?
Please guide me on how to get these datasets.
I have already spoken with many research authors, but have had no responses yet.
Hi, I want to predict post-translational modification, specifically phosphorylation. I found lots of websites like Phosida and PhosphoSitePlus. I am just curious whether there is any Python code for phosphorylation prediction. If you have one, could you share the GitHub link?
I have predicted the solubility of a compound using a webserver; the value was 0.00126 mol/L. I want to know whether this value means the compound is soluble in water or not, and whether it would be better to compare it with other compounds on the market.
Thanks
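A rough back-of-the-envelope conversion, assuming a hypothetical molar mass of 400 g/mol (replace with the compound's actual value); solubility class cut-offs differ between tools, so the label below is indicative only.
S_molar <- 0.00126            # predicted aqueous solubility, mol/L
MW      <- 400                # g/mol, assumed here purely for illustration
S_molar * MW                  # ~0.50 g/L, i.e. about 0.5 mg/mL for this assumed molar mass
log10(S_molar)                # logS ~ -2.9; on common logS scales this falls roughly in the
                              # soluble-to-moderately-soluble range, but conventions vary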
I have been doing research on different issues in the Finance and Accounting discipline for about 5 years. It is becoming difficult for me to find topics that could lead to projects, a series of research articles and working papers over the next 5-10 years. There are few journals with up-to-date research articles in line with current and future research demand. Therefore, I am looking for journal(s) that can guide me in designing a research project that can contribute over the next 5-10 years.
Dear colleagues,
I would like to ask anybody who works with neural networks to check my loop for the test sample.
I have 4 sequences (monthly data, 22 values in each sequence, with the goal of predicting prov), and I would like to construct a forecast for each next month using a training sample of 5 months.
That means I need to shift the window by one month each time, with 5 elements:
train<-1:5, train<-2:6, train<-3:7, ..., train<-17:21. So I should get 17 forecasts as the output.
The code (data setup first, then the rolling loop):
library(neuralnet)
# Monthly data: 22 observations of the target (prov) and three predictors
prov <- c(25,22,47,70,59,49,29,40,49,2,6,50,84,33,25,67,89,3,4,7,8,2)
temp <- c(22,23,23,23,25,29,20,27,22,23,23,23,25,29,20,27,20,30,35,50,52,20)
soil <- c(676,589,536,499,429,368,370,387,400,423,676,589,536,499,429,368,370,387,400,423,600,605)
rain <- c(7,8,2,8,6,5,4,9,7,8,2,8,6,5,4,9,5,6,9,2,3,4)
mydata <- data.frame(prov, temp, soil, rain)
# Min-max normalisation, plus the target's min/max for denormalising the predictions later
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}
maxmindf <- as.data.frame(lapply(mydata, normalize))
minprov <- min(mydata$prov)
maxprov <- max(mydata$prov)
denormalize <- function(x, minval, maxval) {
  x * (maxval - minval) + minval
}
# Rolling window: train on 5 consecutive months, predict the next month
window <- 5
d <- nrow(maxmindf)
forecasts <- data.frame(month = (window + 1):d, actual = NA, prediction = NA)
for (i in 1:(d - window)) {                      # i = 1 gives train 1:5, test 6; i = 17 gives train 17:21, test 22
  trainset <- maxmindf[i:(i + window - 1), ]
  testset  <- maxmindf[i + window, , drop = FALSE]
  # Note: with only 5 training rows the network is heavily overparameterised
  nn <- neuralnet(prov ~ temp + soil + rain, data = trainset,
                  hidden = c(3, 2), linear.output = FALSE,
                  threshold = 0.01, stepmax = 1e6)
  # plot(nn)             # uncomment to inspect an individual fit
  # nn$result.matrix
  temp_test  <- testset[, c("temp", "soil", "rain")]
  nn.results <- compute(nn, temp_test)
  # Store the actual value and the prediction for this month, back on the original scale
  forecasts$actual[i]     <- denormalize(testset$prov,                     minprov, maxprov)
  forecasts$prediction[i] <- denormalize(as.numeric(nn.results$net.result), minprov, maxprov)
}
forecasts    # 17 one-step-ahead forecasts, with the window shifted by one month each time
Could you please tell me whether this is the right way to fill trainset and testset inside the loop, and how best to display all the predictions so that the results shift by one month each time with a training sample of 5?
I am very grateful for your answers
I would like to know whether there is a direct relationship between quantum computing technology and artificial intelligence. Can you provide an explanation with examples for better understanding?
Dear colleagues,
I have 400 data points (monthly) and I need to construct a forecast for each next month using a training sample of 50.
That means I need to shift the window by one month each time, with 50 elements:
train<-1:50, train<-2:51, train<-3:52, ..., train<-351:400.
Could you please tell me which function I can write in the program for automatic calculation?
Maybe a for() loop?
I am very grateful for your answers
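A minimal sketch of such a rolling loop in R; the simulated series and the one-lag regression inside the window are only placeholders for your data and your preferred model.
set.seed(4)
y <- as.numeric(arima.sim(list(ar = 0.7), n = 400))   # stand-in for the 400 monthly values
window <- 50
preds  <- rep(NA_real_, length(y))
for (i in 1:(length(y) - window)) {
  tr  <- y[i:(i + window - 1)]                  # training windows 1:50, 2:51, ..., 350:399
  fit <- lm(tr[-1] ~ tr[-window])               # simple AR(1) fitted on the window (placeholder model)
  preds[i + window] <- coef(fit)[1] + coef(fit)[2] * tr[window]   # one-step-ahead forecast
}
plot(y, type = "l"); lines(preds, col = "red")  # rolling forecasts for months 51..400
# The final window 351:400 gives the true out-of-sample forecast for month 401:
tr  <- y[351:400]
fit <- lm(tr[-1] ~ tr[-window])
coef(fit)[1] + coef(fit)[2] * tr[window]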
I want to predict water in my project. I need to know which of them have more advantages.
There are different empirical equations and techniques, such as fuzzy logic, ANN, etc., for predicting blast-induced ground vibration. In addition to these, is there any software for predicting blast-induced ground vibration?
I have the following dataset:
SQ | SEX    | Weight | letter | duration | trail | quantity
1  | male   | 15KG   | abc    | Year 1   | 2     | quantity 1
   |        |        |        | Year 2   | 3     | quantity 2
2  | female | 17KG   | cde    | Year X   | 4     | quantityx
   |        | 16KG   |        | Year Y   | 6     | quantityy
   |        |        |        | Year Z   | 3     | quantityz
... etc.
I want to make a prediction model that predicts the quantity, but using classic machine learning models (not deep learning ones like LSTM or RNN), i.e. linear regression, SVM, etc., such that:
given an individual n at a certain duration (duration A), what will the quantity be?
n - male - 25KG - xlm - 34 - A - ?
What is the best way to treat and pre-process the duration, trail and quantity features before fitting them, so as to preserve their correlation with the target quantity?
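One possible preparation, sketched on a few made-up rows that mirror the table above (all column names and values are illustrative): reshape to one row per individual-duration, encode duration as an ordered number, and one-hot encode the categoricals for models that need a numeric matrix.
dat <- data.frame(
  sq       = c(1, 1, 2, 2, 2),
  sex      = c("male", "male", "female", "female", "female"),
  weight   = c(15, 15, 17, 16, 16),
  letter   = c("abc", "abc", "cde", "cde", "cde"),
  duration = c(1, 2, 1, 2, 3),                 # encode Year 1, Year 2, ... as an ordered number
  trail    = c(2, 3, 4, 6, 3),
  quantity = c(10, 12, 20, 25, 15)             # target (invented values)
)
# Models such as SVM need a purely numeric matrix, so one-hot encode the categoricals:
X <- model.matrix(~ sex + letter + weight + duration + trail, data = dat)[, -1]
y <- dat$quantity
# Linear regression can take the factors directly, e.g.:
# lm(quantity ~ sex + weight + duration + trail, data = dat)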
I am trying to predict peak demand using machine learning techniques. Current articles treat this as a time-series prediction problem and use a 7-day lag to predict peak demand. An ML model I am trying to apply considers new features for this prediction, and I applied it without the week-prior lag values. I was challenged as to why I did not use lag values for a time-series prediction problem like this.
The objective of my project was to evaluate whether adding new features would improve the daily peak demand prediction and assess the effects of the new features. If I use new features to predict daily demand, should I also consider the previous seven days' lags as a new feature? Is it correct to combine several COVID-19 related features with the lag demand for peak demand prediction for an unstable situation like COVID-19?
P.S.:
1. The model I used for prediction is Light Gradient Boosting (LightGBM).
2. The model was trained and tested on data from the COVID-19 period (2020 & 2021).
3. The weekly trends of my target value in 2020 and 2021 are shown in the figures below.
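For reference, a small sketch (made-up daily data, illustrative column names) of adding the 7-day lag alongside a COVID-related feature, so that models with and without the lags can be compared on the same chronological train/test split:
library(dplyr)
library(tidyr)
set.seed(5)
df <- data.frame(date        = seq(as.Date("2020-01-01"), by = "day", length.out = 200),
                 peak_demand = 100 + 10 * sin((1:200) * 2 * pi / 7) + rnorm(200, sd = 3),
                 covid_cases = rpois(200, 50))          # stand-in for a COVID-related feature
df_lagged <- df %>%
  arrange(date) %>%
  mutate(lag1 = lag(peak_demand, 1),
         lag7 = lag(peak_demand, 7)) %>%
  drop_na(lag1, lag7)
head(df_lagged)
# df_lagged can now be fed to LightGBM with and without the lag columns, keeping the same
# chronological split, to measure what the new features add on top of the lagged demand.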
I am using QGIS version 2.8.3 with the MOLUSCE plugin to obtain a land-use prediction map, but I got an error while creating a change map in the Area Change tab.
The link with details is attached.
Dated: 10 June 2020.
Perhaps!
A preliminary reason may be that this year the radiation and greenhouse gas interaction feedback processes on different timescales (one of the main factors in monsoon dynamics), which make monsoon predictability erratic, are not expected to add much uncertainty to the prediction system, owing to the substantial reduction in greenhouse gas emissions. This implies a possible upper hand for the predictive models in the pipeline. Recall that the models' ability to predict the SW monsoon is higher when the initial conditions used are from February, March and April (this year these were the main lockdown months in the world, when the atmosphere was not being invaded by anthropogenic gases) than from months closer to the SW monsoon. On the other side, this can also be a test bed for models that tend to produce near-accurate long-range forecasts from the early-month initial conditions mentioned above.
Overall, it may also be manifested that NATURE can be predicted correctly if it is not disturbed. BUT if we keep disturbing it, then predictability may not be that easy and precise.
If yes, then "commendations" for the accurate predictability of the monsoon system will be higher this year, I think. Good! This may also be attributed to Nature's natural tendency being stronger this year, apart from having well-resolved and improved interannual and climate-system predictability aspects in the modelling systems, etc.
Nature is in NATURAL swing. Enjoy and try to be safe! But we should also be ready for monsoon predictability in the times or years to come, when emissions will again be dumped into the Earth system. That will certainly obstruct prediction. Consistency in the accuracy of prediction should be addressed responsibly.
What’s your take on that!
Let's consider a sales record (invoice) like this:
Gender | Age | Street | Item 1 | Count 1 | Item 2 | Count 2 | ... | Item N | Count N | Total Price (Label)
Male | 22 | S1 | Milk | 2 | Bread | 5 | ... | - | - | 10 $
Female | 10 | S2 | Cofee | 1 | - | - | ... | - | - | 1 $
....
We want to predict the total price of an invoice based on the buyer's demographic information (such as gender, age, job) and also the items bought and their counts. It should be mentioned that we assume we do not know each item's price, and also that prices change over time (so we will also have a date in our dataset).
The main question is how we can use this dataset, which contains transactional data (items) whose combination order is not important. For example, if somebody buys item1 and item2, that is equivalent to somebody else buying item2 and item1; so the values in our item columns should not differ depending on the order in which items are listed.
This dataset contains both multivariate and transactional data. My question is: how can we predict the label more accurately?
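One order-invariant representation is a "bag of items": one quantity column per distinct item, so that listing Milk before Bread or Bread before Milk produces the same row. A small tidyr sketch on two invented invoices:
library(tidyr)
library(dplyr)
sales <- data.frame(
  id     = 1:2,
  gender = c("Male", "Female"),
  age    = c(22, 10),
  item1  = c("Milk", "Coffee"), count1 = c(2, 1),
  item2  = c("Bread", NA),      count2 = c(5, NA),
  total  = c(10, 1)
)
long <- sales %>%
  pivot_longer(cols = c(item1, count1, item2, count2),
               names_to = c(".value", "slot"),
               names_pattern = "(item|count)(\\d)") %>%
  filter(!is.na(item))
wide <- long %>%
  group_by(id, gender, age, total, item) %>%
  summarise(qty = sum(count), .groups = "drop") %>%
  pivot_wider(names_from = item, values_from = qty, values_fill = 0)
wide   # one column per distinct item: "Milk then Bread" and "Bread then Milk" now look identical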
Hi,
I am currently looking for a dataset in which I can get historical weather data (like temperature, precipitation, wind speed) for every day in every city from 2005 to today.
The data will be used for a prediction project.
Where can I find these kinds of data, or anything related?
Thank you very much.
P.S.: To clarify, what I mean is that I have a table with two columns, "date" and "city", and I want to fill the third (or however many it takes) column with the weather information for that date+city combination. A lot of websites provide weather information, but since my dataset is quite large, I need a way to automate the process: either a dataset or a crawler-friendly website with enough information.
I am doing my MS thesis, titled "Time series crop yield estimation using satellite images". Below are my aim and objectives, but my supervisor said the objectives are not correct. I don't know what I should change. Can anyone help me rewrite my objectives?
Aim: The aim of this study is to develop a model for wheat yield prediction using satellite imagery before the harvest time.
Objectives:
1.It is mandatory for the planners of any regime to have an accurate and precise estimate of a crop to cope with the shortage crises of the crop, as Pakistan faced a very serious crisis of wheat’s shortage in 2007
2.An accurate estimate of a crop gives a significant relief to the country’s exchequer in terms of saving foreign exchange
3.The main purpose of this research is, therefore, the scientific construction of a model employing all the information available via remote sensing in order to get a good and trustworthy estimate of wheat crops.
What is the best way to classify a tabular dataset using MATLAB?
How can text data be predicted or categorised using a Convolutional Neural Network? Also, how can deep learning be used for classification of data in a tabular dataset (for example, numerical or textual data)?
Can we use regression or classification for prediction? Which is the best approach?
I have 27 features and I'm trying to predict continuous values. When I calculated the VIF (Variance Inflation Factor), only 8 features had values less than 10, and the remaining features ranged from 10 to 250. Therefore, I am facing a multicollinearity issue.
My work is guided by two aims:
1- ML models should be used to predict the values using regression algorithms.
2- To determine the importance of the features (interpreting the ML models).
A variety of machine learning algorithms have been applied, including Ridge, Lasso, Elastic Net, Random Forest Regressor, Gradient Boosting Regressor, and Multiple Linear Regression.
Random Forest Regressor and Gradient Boosting Regressor show the best performance (lowest RMSE) while using only 10 features (out of 27) selected based on the feature importance results.
As I understand it, if I face multicollinearity issues, I can address them using regularized regression models like LASSO. When I applied Lasso to my model, the evaluation result was not as good as for Random Forest Regressor and Gradient Boosting Regressor. However, none of my coefficients became zero when I examined the feature importance.
Moreover, I want to analyse which feature is affecting my target value and I do not want to omit my features.
I was wondering if anyone could help me determine which of these algorithms would be good to use and why?
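A small sketch (simulated stand-ins for the 27 features and the continuous target) of checking per-feature VIF and of a Lasso fit with a cross-validated penalty. As a side note, multicollinearity mainly distorts coefficient-based interpretation; the predictions of Random Forest or Gradient Boosting are largely unaffected, although their importance scores can be split across correlated features.
library(car)       # vif()
library(glmnet)
set.seed(6)
n <- 300; p <- 27
X <- matrix(rnorm(n * p), n, p)
X[, 2] <- X[, 1] + rnorm(n, sd = 0.05)            # deliberately collinear pair
colnames(X) <- paste0("f", 1:p)
y <- as.numeric(X %*% rnorm(p, sd = 0.3) + rnorm(n))
d <- data.frame(y = y, X)
sort(vif(lm(y ~ ., data = d)), decreasing = TRUE)[1:5]   # the collinear features show large VIFs
cvfit <- cv.glmnet(X, y, alpha = 1)               # Lasso with cross-validated penalty
coef(cvfit, s = "lambda.1se")                     # with enough penalty, some coefficients are exactly zero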
I calculated the Shapley values (using the R xgboost package, based on gradient-boosting regression) for several big actors in the cocoa market and received results which I cannot explain: it seems that the Shapley values increase (as a general trend) for all of them. The same thing happened when I calculated them for other sectors.
Does it make sense? If it does, what stands behind these results?
If not, what could be my mistake?
Thanks a lot for any help!
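For inspecting the trend itself, per-observation SHAP contributions can be extracted directly from an xgboost model and plotted in row order (assuming rows are ordered in time); a toy sketch with invented feature names:
library(xgboost)
set.seed(7)
n <- 500
X <- matrix(rnorm(n * 4), n, 4)
colnames(X) <- c("price", "supply", "demand", "stocks")   # illustrative features
y <- 2 * X[, "price"] + rnorm(n)
bst <- xgboost(data = X, label = y, nrounds = 50,
               objective = "reg:squarederror", verbose = 0)
shap <- predict(bst, X, predcontrib = TRUE)   # one column per feature plus a BIAS column
head(shap)
plot(shap[, "price"], type = "l",
     ylab = "SHAP contribution of 'price'")   # look for a genuine upward trend vs. noise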
Please suggest if any specific software is used.
In my current project, I want to answer if various cognition items (ratio, 30+ of them, may get reduced based on a separate factor analysis) predict moral outrage - in other words, do increases in item 1 (then item 2, item 3, etc) predict increases in outrage in a significant way. Normally, this would be a simple regression. But then I complicated my design, and I'm having a hard time wrapping my head around my potential analyses and whether it will actually answer my stated question, or if I'm over-thinking things.
Currently, I'm considering a set-up where participants will see a random selection of 3 vignettes (out of 5 options) and answer the cognition items and moral outrage about each. This complicates matters because 1) there is now a repeated-measures component that may (or may not?) need to be accounted for, and 2) I'm not sure how my analyses would work if the vignette selection is random (thus, all vignettes will show up the same number of times, but in different combinations to different people). I am anticipating that different vignettes will not be equal in their level of the DV (which is on purpose - I want to see if these patterns are general, not just at very high or very low levels of outrage).
When originally designing this, I had wanted to average the 3 vignette scores together for each subject, treating them as single, averaged item values to use in a multiple regression. But I've been advised by a couple people that this isn't an option, because the variance between the vignettes needs to be accounted for (and the vignettes can't be shown to be equivalent, and thus can't be collapsed down in analysis).
One potential analysis to combat this is a nested, vignette-within-individual multilevel design, where I see if the pattern of cognition items to outrage is consistent between vignettes (level 1) and across subjects (level 2), to account for/examine any vignette-by-cognition/MO pattern interactions. And this makes sense, as MLMs can be used to compare patterns, rather than single scores.
But I can't wrap my head around what part of this set-up/the output I would look at to actually answer my question: generally, which, if any, of these cognition items predicts outrage (regardless of vignette, or across many scenarios)? And can this approach work when the vignettes combinations differ between subjects?
Or is this the incorrect analysis approach and another, simpler one would be more fitting? For example, is the averaging approach workable in another format? What if all vignettes were done by all subjects (more arduous on the subjects, but possible if the strength of the analysis/results would be compromised/overly-complicated)?
Confirmation that my current analysis approach will indeed work, help with what part of the output would answer my actual RQ, or suggestions for an alternative approach, would be appreciated.
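A minimal sketch of the nested set-up in lme4 (all variable names and data invented): the fixed effects for the cognition items address the "does this item predict outrage across vignettes and subjects" question, while random intercepts for subject and vignette absorb level differences, and mixed models tolerate each person seeing only a random 3 of the 5 vignettes. With only five vignettes the vignette variance is estimated from very few units, so treating vignette as a fixed factor is a common alternative.
library(lme4)
set.seed(8)
n_subj <- 100
dat_long <- expand.grid(subject = factor(1:n_subj), vignette = factor(1:3))
dat_long$cog1 <- rnorm(nrow(dat_long))
dat_long$cog2 <- rnorm(nrow(dat_long))
subj_eff <- rnorm(n_subj, sd = 0.5)                               # person-level differences
dat_long$outrage <- 0.5 * dat_long$cog1 +
  subj_eff[as.integer(dat_long$subject)] + rnorm(nrow(dat_long))
m <- lmer(outrage ~ cog1 + cog2 + (1 | subject) + (1 | vignette), data = dat_long)
summary(m)
# The fixed-effect rows for cog1 and cog2 are the part of the output that answers
# "which cognition items predict outrage, regardless of vignette".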
Are you interested in the application of complex systems to the global history of humankind? I'm working on such a project, and I'm interested in discussions with like-minded people.
I published several articles on that in "The Complex Systems" journal (thecomplexsystems.com). A short overview of my work is in my blog (vtorvich.com) and the description of my book "Subsurface History of Humanity: Direction of History" on Amazon.
If artificial intelligence is implemented in online mobile banking, can this banking segment dispense with employing human capital altogether?
Please reply
Best wishes
After 30 years, much will change. 30 years is a long period for the continuation of the current fourth technological revolution, known as Industry 4.0.
The current technological revolution known as Industry 4.0 is motivated by the development of the following factors:
Big Data database technologies, cloud computing, machine learning, Internet of Things, artificial intelligence, Business Intelligence and other advanced data mining technologies.
On the basis of the new technological solutions developed in recent years, processes of innovatively organised analysis of large information sets stored in Big Data database systems and in cloud computing are developing dynamically for the needs of applications in areas such as machine learning, the Internet of Things, artificial intelligence and Business Intelligence.
The development of information processing technology in the era of the current technological revolution defined by Industry 4.0 is determined by the application of new information technologies in the field of e-commerce and e-marketing.
Added to this are further areas of application of advanced technologies for the analysis of large data sets, such as Medical Intelligence, Life Science, Green Energy, etc. Processing and multi-criteria analysis of large data sets in Big Data database systems is carried out according to the V4 concept, i.e. Volume (the amount of data), Value (the high value of specific parameters of the analysed information), Velocity (the high speed at which new information appears) and Variety (the high variety of information).
The advanced information processing and analysis technologies mentioned above are used more and more often for marketing purposes of various business entities that advertise their offer on the Internet or analyze the needs in this area reported by other entities, including companies, corporations, financial and public institutions. More and more commercial business entities and financial institutions conduct marketing activities on the Internet, including on social media portals.
More and more companies, banks and other entities need to conduct multi-criteria analyzes on large data sets downloaded from the Internet describing the markets on which they operate, as well as contractors and clients with whom they cooperate. On the other hand, there are already specialized technology companies that offer this type of analytical services, develop customized reports that are the result of multicriteria analyzes of large data sets obtained from various websites and from entries and comments on social media portals.
Do you agree with my opinion on this matter?
In view of the above, I am asking you the following question:
What are the known futurological visions of technology development until around 2050?
Please reply
I invite you to the discussion
Best wishes
My main goal is to use Neural Networks to forecast Sunspot Numbers. Requesting the option of ANN or RNN seems simple enough. However, which is best to learn and utilize for a complete beginner? If there is a GitHub repository for similar Space Science topics based on Neural Networks, please link me to it. I'd be extremely appreciative.
Will the development of computerized business analytics of large collections of economic information collected in Big Data database systems improve the forecasting of future economic processes?
Please reply
I invite you to the discussion
Thank you very much
Dear Colleagues and Friends from RG
The key aspects and determinants of applications of data processing technologies in Big Data database systems are described in the following publications:
I invite you to discussion and cooperation.
Best wishes
There has been a debate on the topic "Why the sunspot number needs re-examination?". What is the reason behind this controversial topic? Which model is currently the best model to predict the Sunspot Number in Solar Cycle 25?
There is a lot of research on AI-based air pollution forecasting, but very few studies have put forward a reasonable explanation in this regard.
I want to know what the reasons for the performance drop might be.
Is it a problem of data length, or some other issue?
Good day scholars, I have read a lot of articles on LSTM time-series forecasting capabilities.
However, I want to know if LSTM can be used for multi-output time-series forecasting. For example, I have x, y, z variables with 1000 time steps, and I want to use an LSTM to forecast all the variables (x, y, z) at future time steps. Any recommendation or suggestion will be highly appreciated.
Thanks
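In principle a single LSTM can emit all three series at once by giving the output layer three units; a sketch in R keras (assumes the keras/TensorFlow backend is installed; the window length and layer sizes are arbitrary choices, not recommendations):
library(keras)
lookback <- 20    # how many past time steps the model sees per sample
model <- keras_model_sequential() %>%
  layer_lstm(units = 32, input_shape = c(lookback, 3)) %>%   # 3 input variables: x, y, z
  layer_dense(units = 3)                                     # one output per variable
model %>% compile(optimizer = "adam", loss = "mse")
# X: array of shape [samples, lookback, 3] holding past windows of (x, y, z);
# Y: matrix of shape [samples, 3] holding the next (x, y, z) value for each window.
# model %>% fit(X, Y, epochs = 50, batch_size = 32, validation_split = 0.2)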
Stock market prediction is a vibrant and exciting topic around the globe, both because of its potential to make money through its seemingly magical prediction possibilities and because of the rewarding academic recognition associated with it.
But what is the feasibility of having a FUTURES* (derivative) prediction mechanism in place?
There is wide-ranging literature out there featuring Stocks, Indices and options. Why are there no articles related to futures market prediction, if there is a possibility let me know your insights.
Futures are restricted by their expiration. But, if that can be predicted, the chance of earning a handsome return is wide open as per the limited knowledge I possess.
Apparently, on the financial markets and in macroeconomic determinants of the economic situation in particular sectors and entire economies of developed countries, there are symptoms that suggest a high probability of economic slowdown from 2020 in individual countries and, consequently, in the entire global economy.
Therefore, I am asking you: Do you know the forecasts of the global economic development that would suggest a high probability of deceleration (or possibly acceleration) of economic growth from 2020 in individual countries and, consequently, in the entire global economy?
What are the symptoms of potential changes in the financial markets and / or the scope of macroeconomic determinants of the economic situation in particular sectors and entire economies?
If you know of results of prognostic research in this area, please send links to websites or scientific publications in which this type of prognostic issue is addressed.
I wish you the best in New Year 2019.
Best wishes
Hi everyone, lately by using all the nice tools and benefits that Artificial Intelligence can offer as a technology, I am searching for possible applications in the Automotive Business.
Below are some simple examples of basic scenarios that I have discovered already and I would like to enhance them or discover new ones, that could help Automotive Industries on taking proper business actions:
1. Based on historical CRM Opportunities, taking into account the Lead Source (TV, Web, Phone), Customer Gender, Customer Age, Customer Geographical Area, Customer Follow-Up Times, Model of Interest and Model Price, predict the possibility of converting the opportunity into an Invoice.
2. Based on historical Service Document Turnover (Service Quote -> Service Schedule -> Service Order -> Service Invoice), predict the possibility of a new open (un-invoiced) Document.
3. Based on historical Vehicle Document Turnover (Vehicle Quote -> Vehicle Order -> Vehicle Invoice), predict the possibility of a new open (un-invoiced) Document.
4. Based on the historical clocking of the time technicians spent fixing vehicles, taking into account Model Code, Vehicle Mileage, Job Qualification Code, Parts Number and Labour Number, predict the expected workshop load for the upcoming schedule based on open Service Schedules.
What do you think?
I am trying to predict the occurrence of individual aquatic plants (48 species) with Random Forest (RF) models, using six explanatory variables. The datasets are highly unbalanced: presences range from a minimum of about 2.5% up to 25% (of 2000 observations). Not surprisingly, the accuracy (~70%) and Cohen's kappa (~0.2) are not very satisfactory. Moreover, the True Negative (TN) rate is high (~80%) while the True Positive (TP) rate is low (~15%). I have tried multiple things, from changing the cut-off to 40-45%, which works somewhat (still not satisfactory), to subsampling my dataset (also down-sampling), building an RF model with 50 trees, repeating this 20 times and combining these 20 RF models into one RF model (somewhat circular reasoning, as this is what down-sampling does), but this results in similar performance. Changing the mtry, the node size (85-100% of the smallest class) or the maximum number of observations ending in the terminal node (0-15% of the smallest class) also does not improve the performance. The latter two do "smooth" the patterns, but do not improve performance or the distinction between TN and TP. The best option seems to be setting the cut-off to 45%, the node size to 90% and the maximum obs to 10%.
First, my guess is that the low performance is of course due to the unbalanced dataset, where the pattern of absences is simply better captured than that of the presences. However, I cannot resolve this with the data I currently have (am I sure that I cannot resolve it? not really), which would mean I need more data (I want more anyhow). Second, TNs are easier to predict in general. For example, fish need water: if there is no water, the model predicts no fish (easy peasy). However, if there is water the model predicts fish, yet the presence of water does not necessarily mean there are fish. For aquatic plants, if flow velocity is > 0.5 m/s, vascular plant species are often absent and mosses are present; yet if flow velocity is < 0.5 m/s, this does not mean vascular plants are present or mosses are absent. Third, the predictor variables may not be suitable, and in general the species seem to be distributed widely along the gradients of these predictors (you do not need an ML model to tell you this if you look at the boxplots). Moreover, correlations between predictors are also present (not an issue for prediction, but an issue for inference); for some species this is more apparent than for others, and some species occur everywhere along these gradients. Although this idea seems to float around, relatively few articles actually discuss it (excluding articles addressing the high misclassification rates of Trophic Macrophyte Indices in general):
Even using different model types does not really work (SVM, KNN, GLM, [binomial]). Naive Bayes seem to work, but the prior ends up extremely low for some species thus the model hardly predicts presence. However I turn or twist (organize) the data, I cannot obtain a satisfactory prediction. Are there any statistic or machine learning experts who have any tips or tricks to improve model performance, besides increasing the datasets?
P.S. Perhaps I should start a contest on Kaggle.
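One more knob that is sometimes worth trying in randomForest itself is stratified, balanced bootstrap sampling via strata and sampsize, so that each tree sees as many presences as absences, as an alternative to moving the cut-off afterwards. A sketch on noise predictors (so the numbers themselves are meaningless; only the sampling options matter):
library(randomForest)
set.seed(9)
n <- 2000
pres <- factor(rbinom(n, 1, 0.05), labels = c("absent", "present"))   # ~5% presences
X <- data.frame(flow = rnorm(n), depth = rnorm(n), light = rnorm(n),
                nutrients = rnorm(n), substrate = rnorm(n), shading = rnorm(n))  # placeholder predictors
n_min <- min(table(pres))
rf <- randomForest(x = X, y = pres,
                   ntree = 500,
                   strata = pres,
                   sampsize = c(n_min, n_min))   # each tree is grown on a balanced bootstrap sample
rf$confusion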
Hello all,
I am a new user of Python and Machine learning!
Hereunder, I am trying to explain my data and model and then ask my question.
I have a couple of independent variables: Ambient temperature, Solar intensity, PCM melting temperature (PCM is a kind of material that we glue to the back of the PV panel in this experiment) and, Wind speed. My only dependent variable is the power output of a photovoltaic panel.
All independent variables change during a day (from 8 AM to 5 PM) as "Time" passes. For instance, ambient temperature increases from 8 AM to 3 PM and gradually drops from 3 PM to 5 PM.
My question is: can I consider Time (which is defined in hours, e.g. 8, 9, ..., 13, 14, ..., 17) as another independent variable when using machine learning techniques (in Python) like linear regression, Bayesian linear regression and SVM in order to predict the behaviour of the system?
I think because time here shows its effects on temperatures and solar intensity directly, I can disregard "time" as another independent variable.
I am quite confused here. Any help and suggestion would be much appreciated.
Thanks a lot.
Here is the situation: I am trying to predict the energy consumption (load) of households using artificial intelligence (machine learning) techniques.
Problem: The data are only available for 40% of the households. Is it possible to predict the energy consumption of the remaining 60% of households based on the available data (features) for the 40%?
Dear All!
Is there software with which I can make NMR predictions for compounds in deuterated acetonitrile, acetone or methanol? In MestReNova I can only make predictions in chloroform, DMSO or water.
Thank you so much for your help!
Dear researchers,
Any recommendations for a FREE online webserver/software for metabolomic approaches and dermal toxicity prediction?
Preferably one that comes with guidance on how to interpret the results it generates,
because I would like to produce a report and will have to interpret the results myself.
Thank you.
This is about a 3-class classification problem in which the classes occur with almost equal probability in the test data, i.e. around 33% of the time each. Training a model yields an accuracy of 45-48% on out-of-sample test data. Is this result significant in terms of prediction? Here accuracy is computed as the percentage of correctly identified classes out of all cases. In similar problems modelled as 2-class classification, the maximum accuracy reported in the literature is around 69%, but in the present case the classes are "up", "down" and "no-change" instead of just "up" and "down".
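Whether 45-48% beats the ~33% chance level can be checked with a simple binomial test; a sketch assuming, purely for illustration, 300 test cases of which 140 were classified correctly:
n_test    <- 300     # hypothetical test-set size
n_correct <- 140     # ~46.7% accuracy
binom.test(n_correct, n_test, p = 1/3, alternative = "greater")
# A small p-value indicates the accuracy is significantly above the 33% chance level;
# whether it is practically useful (e.g. for trading) is a separate question.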