
# Stochastic Optimization - Science topic

Questions related to Stochastic Optimization
Question
In robust optimization, random variables are modeled as uncertain parameters belonging to a convex uncertainty set and the decision-maker protects the system against the worst case within that set.
In the context of nonlinear multi-stage max-min robust optimization problems:
Which robustness models are best suited, e.g., strict robustness, cardinality-constrained robustness, adjustable robustness, light robustness, regret robustness, or recoverable robustness?
How can max-min robust optimization problems be solved efficiently without linearization or approximation? Which algorithms apply?
How to approach nested robust optimization problems?
For example, the problem can be security-constrained AC optimal power flow.
To tractably reformulate robust nonlinear constraints, you can use the Fenchel duality scheme proposed by Ben-Tal, den Hertog, and Vial in
"Deriving Robust Counterparts of Nonlinear Uncertain Inequalities"
Also, you can use Affine Decision Rules to deal with the multi-stage decision making structure. Check for example: "Optimality of Affine Policies in Multistage Robust Optimization" by Bertsimas, Iancu and Parrilo.
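To make the worst-case reasoning concrete, here is a small numerical sketch of my own (a toy box uncertainty set, not the Fenchel-duality scheme from the cited paper): the inner maximization of a robust constraint over the box has a closed form, which sampled perturbations can only approach from below.

```python
import numpy as np

# Illustrative sketch: for a constraint a^T x <= b with box uncertainty
# a = a0 + delta, |delta_i| <= r_i, the inner maximization
# max_delta (a0 + delta)^T x has the closed form a0^T x + sum_i r_i*|x_i|,
# so the robust counterpart needs no scenario sampling.
rng = np.random.default_rng(0)
a0 = np.array([1.0, -2.0, 0.5])
r = np.array([0.3, 0.1, 0.2])    # uncertainty radii
x = np.array([2.0, 1.0, -4.0])   # a candidate decision

worst_case = a0 @ x + r @ np.abs(x)   # exact worst case over the box

# Monte Carlo check: sampled perturbations never exceed the closed form
samples = a0 + rng.uniform(-1, 1, size=(100_000, 3)) * r
sampled_max = (samples @ x).max()
```

For more general convex uncertainty sets the inner maximization becomes a support function, which is where the Fenchel duality machinery of the cited paper takes over.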
Question
When the process simulation fails to converge for a particular set of decision variables, what would be the sequence of code needed to close the unconverged simulation and reopen the saved file before calling the simulation file with a new set of decision variables?
Also, when an error message appears during the simulation, the remaining VBA code cannot be executed. How do I use a "GoTo" statement to skip the remaining VBA code and continue the optimization by calling the simulation file with a new set of decision variables?
Can anyone help me with a similar VBA script that I can adapt to my code?
Attached here is the code for the interaction between Excel and Aspen Plus. Any ideas or comments about the code that address the issue above would be highly appreciated.
Thanks
In general, if your simulation process is preceded by the generation of a population of feasible solutions, starting from an optimization model (single- or multi-objective) adequate to your task, then you will simulate only feasible solutions, so your simulation will not fail.
Question
When accounting for uncertainty in demand for humanitarian logistics planning, one of the most common approaches is stochastic optimization, in which the demand is generally assumed to follow a certain distribution (usually normal or uniform).
My question here is, how to identify these distributions when there is no historical data (as is the case in most disasters)?
Thank you very much for all your replies; this is exactly the dilemma: while some suggest estimating these distributions, others question whether that is even possible. But as Christopher Paulus Imanto suggested, a normal distribution would do no harm when there is absolutely no other way to estimate the nature of the distribution.
Thanks again, this helps.
Question
It seems that by solving the stationary form of the forward Fokker-Planck equation we can find the equilibrium solution of a stochastic differential equation.
Is the above statement true? Is it a conventional way to find the equilibrium solution of an SDE? And do SDEs always have an equilibrium solution?
The question about "equilibrium solutions", and probably many other questions concerning the Fokker-Planck equation, is answered in the book by H. Risken, The Fokker-Planck Equation: Methods of Solution and Applications. Search there for the keyword detailed balance, as suggested above. The question about the stability of SDEs is discussed, for example, here: http://www6.cityu.edu.hk/ma/ws2010/doc/mao_notes.pdf .
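To illustrate the point numerically (a standard textbook example, not taken from the references above): for the Ornstein-Uhlenbeck SDE the stationary Fokker-Planck solution is Gaussian, and a long simulation of the SDE reproduces it. Note also that not every SDE has a stationary solution; plain Brownian motion, for instance, does not.

```python
import numpy as np

# Toy illustration: for the Ornstein-Uhlenbeck SDE
#   dX = -theta*X dt + sigma dW,
# setting dp/dt = 0 in the forward Fokker-Planck equation gives the
# stationary density p(x) ~ exp(-theta*x^2 / sigma^2): a Gaussian with
# mean 0 and variance sigma^2 / (2*theta).
theta, sigma = 1.5, 0.8
stationary_var = sigma**2 / (2 * theta)

# A long Euler-Maruyama simulation of the SDE should reproduce that variance.
rng = np.random.default_rng(1)
n_paths, dt, n_steps = 20_000, 0.01, 2_000
x = np.zeros(n_paths)
for _ in range(n_steps):
    x += -theta * x * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)

empirical_mean, empirical_var = x.mean(), x.var()
```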
Question
It would be really good if the suggested journal did not spend much time on revision cycles: I submitted this algorithm to the journal Applied Soft Computing a year ago, and after six revision cycles they rejected it with no real reasons given.
Hello, I suggest IJIETAP, Scopus indexed.
Question
In process control in engineering, we often need to control a system under a performance index (optimal control), where the system is exposed to uncertainty (parameter uncertainty, disturbances, or noise), and sometimes we need constraints on the state of the system.
There are two approaches: robust optimal control and stochastic optimal control.
When we use robust optimal control (because bounds on the uncertainty are known), we consider the worst-case scenario, and hard constraints on the states can be satisfied. I think this is a practical approach.
On the other hand, when we cannot specify bounds on the uncertainty but the probability distribution of the uncertainty is known, we must use stochastic optimal control. In this case hard constraints cannot be enforced, and we should use chance constraints, meaning each constraint is satisfied with some level of probability.
Now my question is: is such a definition practical in real-world applications, and is it really applied in industry?
Most constraints exist for safety. For example, we want the temperature of a boiler to be bounded; it is dangerous if the temperature is only bounded with some probability. So I want to know: is the chance constraint a practical concept in real-world engineering applications?
Dear Dr. Nieuwenhuis, Dr. Lafifi, and Dr. Mahrouf,
What I want to know here is this: I believe the chance constraint is accepted by theorists, but what about experimentalists? Do they really use it?
Does a chance constraint really have a meaning for an engineer?
Is it really used in process control, or in engineering more broadly, nowadays?
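For what it is worth, here is a small sketch of what a chance constraint buys computationally (the standard Gaussian case, with illustrative numbers of my own): under Gaussian uncertainty the chance constraint is equivalent to a hard constraint with a quantile-based safety margin, which is one reason it is tractable in practice.

```python
import numpy as np
from scipy.stats import norm

# Sketch: with Gaussian uncertainty a ~ N(mu, Sigma), the chance constraint
#   P(a^T x <= b) >= 1 - eps
# has the deterministic equivalent
#   mu^T x + z_{1-eps} * sqrt(x^T Sigma x) <= b,
# where z_{1-eps} is the standard normal quantile. A "constraint satisfied
# with 95% probability" is thus a hard constraint with a safety margin.
rng = np.random.default_rng(2)
mu = np.array([1.0, 2.0])
Sigma = np.array([[0.5, 0.1], [0.1, 0.3]])
x = np.array([1.0, 1.0])
eps = 0.05

z = norm.ppf(1 - eps)
b = mu @ x + z * np.sqrt(x @ Sigma @ x)   # tightest b satisfying the constraint

# Monte Carlo check: a^T x <= b should hold in ~95% of draws
a = rng.multivariate_normal(mu, Sigma, size=200_000)
coverage = np.mean(a @ x <= b)
```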
Question
A question for GAMS and/or SDDP experts!
I have two Benders-based LP models, static and dynamic (based on SDDP), for a power plant with storage. Given the capacities of the plant, both models yield the same objective value. However, if I add the generation expansion problem on top as the Benders master, the duals generated by the dynamic model for energy storage (in particular) are wrong.
If I calculate the dual manually by increasing the storage capacity by one unit, the objective changes by the correct dual value (which I know from my static model) and not by the number generated by GAMS.
GAMS does not calculate duals.  Instead, it passes the model to a solver and returns the duals that are returned from the solver.  There are three possibilities if the duals on two different closely related runs are different while the primal objective is the same.
1. There may exist alternative duals.  This is highly likely in the case of degeneracy and for many (especially network-like) structures; there may even exist many different primal solutions all with the same objective function value.
2. Another possibility is, of course, a bug in the solver that is used to solve this particular model.  Does the primal-dual solution returned by GAMS satisfy the optimality conditions?  If not, send it to GAMS and ask them if they can report this to the solver developers.
3. There may be numerical issues leading to abnormal solver behavior.  Look into your model, make sure there are no scaling issues for instance, if you want to exclude this possibility.
To sort things out, I would check whether the primal and dual solutions returned by GAMS satisfy the primal and dual optimality conditions.  One easy way to do this, is to use the EXAMINER as the GAMS solver.  EXAMINER will call the actual LP solver to solve the model and then test the solution for primal feasibility, dual feasibility, complementarity, etc. and report back to you if the solution is primal and dual feasible.  See https://www.gams.com/latest/docs/solvers/examiner/index.html for more details.
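The manual RHS-perturbation check described in the question can also be scripted. A minimal sketch with `scipy.optimize.linprog` standing in for GAMS (an illustrative LP, not the storage model): for a constraint with a unique dual, the reported marginal should match the finite-difference sensitivity of the optimal objective to that constraint's right-hand side.

```python
from scipy.optimize import linprog

# Maximize x + y (i.e., minimize -x - y) subject to
#   x <= 1,  y <= 1,  x + y <= 1.5
c = [-1.0, -1.0]
A_ub = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b_ub = [1.0, 1.0, 1.5]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, method="highs")
dual = res.ineqlin.marginals[2]        # reported dual of x + y <= 1.5

# Finite-difference check: relax the third RHS a little and re-solve
eps = 0.01
res_eps = linprog(c, A_ub=A_ub, b_ub=[1.0, 1.0, 1.5 + eps], method="highs")
fd_dual = (res_eps.fun - res.fun) / eps
```

If `dual` and `fd_dual` disagree on your model, that points to alternative duals (degeneracy) or a solver issue, exactly as discussed above.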
Question
The modeling of these processes requires specifying, for each operation of the production process, an interval of authorized duration.
Considering the strength of Petri nets in modeling synchronization, parallelism, conflicts, and resource sharing, this tool is seen as a promising research direction for the modeling and evaluation of robustness.
I'd never heard of them, but a quick look at Wikipedia describes this as a way of distributing data and analysis across a network of processors, perhaps to optimize the overall load and production time. Please correct me on anything here. And thanks for mentioning this particular field; I intend to have more of a look at the topic, since I am interested in distributed network operations and processing.
Question
Are there versions of CMA-ES specifically designed for high-dimensional search spaces? Are there any implementations available (preferably in MATLAB)?
you can check this:
Loshchilov, I. (2014, July). A computationally efficient limited memory CMA-ES for large scale optimization. In Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation (pp. 397-404). ACM.
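For intuition about why dimension hurts, here is a minimal sketch (a plain (1+1)-ES with the classic 1/5th success rule, not the LM-CMA-ES of the cited paper): full CMA-ES maintains an n-by-n covariance matrix, with O(n^2) memory and update cost, which is exactly what limited-memory variants avoid in high dimension.

```python
import numpy as np

# Minimal (1+1)-ES with 1/5th-success-rule step-size control -- the kind of
# adaptation mechanism CMA-ES generalizes with a full covariance matrix.
def one_plus_one_es(f, x0, sigma=0.3, iters=4000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    for _ in range(iters):
        y = x + sigma * rng.standard_normal(x.size)
        fy = f(y)
        if fy < fx:          # success: accept and enlarge the step
            x, fx = y, fy
            sigma *= 1.1
        else:                # failure: shrink the step
            sigma *= 0.97
    return x, fx

sphere = lambda z: float(np.sum(z * z))
x_best, f_best = one_plus_one_es(sphere, np.ones(20))
```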
Question
I know that polynomial chaos expansion can deal with many distributions, such as the normal, beta, and gamma distributions. But if they occur simultaneously in the equations, how can I handle that case?
Hi,
PCE is an alternative to the Monte Carlo method and was initially created to deal with Gaussian variables, but it can also be used with non-Gaussian random variables. The question is whether your variables are independent or not. There are techniques for correlated non-Gaussian random variables.
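For the independent case, here is a small sketch of the mixed-basis (Wiener-Askey) construction, with an illustrative model output whose exact expansion is known: pair each input with its matched orthogonal family and take tensor products of the basis polynomials.

```python
import numpy as np
from numpy.polynomial import hermite_e as He, laguerre as La
from math import factorial

# Sketch for *independent* inputs: probabilists' Hermite for X ~ N(0,1),
# Laguerre for Y ~ Exp(1) (a Gamma special case); basis functions are the
# tensor products He_i(X) * L_j(Y).
rng = np.random.default_rng(3)
N = 400_000
X = rng.standard_normal(N)
Y = rng.exponential(1.0, N)

g = X**2 + X * Y   # toy model output; exact expansion is known here

def he(i, x):      # probabilists' Hermite He_i, squared norm = i!
    return He.hermeval(x, [0.0] * i + [1.0])

def la(j, y):      # Laguerre L_j, orthonormal w.r.t. the Exp(1) weight
    return La.lagval(y, [0.0] * j + [1.0])

# Projection: c_ij = E[g * He_i(X) * L_j(Y)] / i!  (estimated by Monte Carlo)
coef = {(i, j): float(np.mean(g * he(i, X) * la(j, Y)) / factorial(i))
        for i in range(3) for j in range(2)}
# Exact answer: x^2 = He_2 + 1 and x*y = He_1 * (L_0 - L_1), so
# c_00 = 1, c_10 = 1, c_20 = 1, c_11 = -1, all others 0.
```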
Question
1. Decay steps? e.g., 200
2. Base decay? e.g., 0.99
3. SGD batch size? e.g., 128
The SGD batch size is used when the training set is too big: you randomly select training instances, so that, for example, each update uses 128 instances. The whole purpose is to speed up the learning process, and it actually works.
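For the first two parameters, the usual convention (assuming an exponential decay schedule, with a hypothetical initial rate `lr0`) fits the example numbers directly:

```python
# Sketch of how the three numbers above usually fit together: an exponential
# learning-rate decay schedule (hypothetical initial rate lr0 = 0.1),
#   lr(step) = lr0 * base_decay ** (step / decay_steps)
# With decay_steps = 200 and base_decay = 0.99, the rate shrinks by 1%
# every 200 SGD updates; batch_size = 128 samples feed each update.
def decayed_lr(step, lr0=0.1, base_decay=0.99, decay_steps=200):
    return lr0 * base_decay ** (step / decay_steps)
```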
Question
I am interested in optimizing for the fittest metabolic network topology that survives under evolutionary pressure. Can anyone help me with an evolutionary optimization algorithm? Actually, I am having trouble setting up the multi-objective fitness function.
Stochastic optimization
genetic algorithm
Pareto optimization
True multi-objective optimization considers two or more fitness functions simultaneously in order to determine a set of Pareto-optimal solutions a posteriori. This should not be confused with approaches that construct a single fitness function by assigning relative weights to multiple criteria based on what the researcher thinks is important a priori.
I highly recommend that you read the following book:
Multi-Objective Optimization using Evolutionary Algorithms
by Kalyanmoy Deb
ISBN: 978-0-471-87339-6
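A minimal sketch of the a-posteriori filtering step that the answer above describes (illustrative candidate set of my own; both objectives minimized): keep every candidate that no other candidate dominates, instead of collapsing objectives with weights.

```python
import numpy as np

# Keep every candidate that no other candidate dominates
# (both objectives are minimized here).
def pareto_front(points):
    pts = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(pts):
        dominated = any(
            np.all(q <= p) and np.any(q < p)   # q at least as good, better somewhere
            for j, q in enumerate(pts) if j != i
        )
        if not dominated:
            keep.append(i)
    return keep

# Example: minimize (cost, weight); (3, 4) is dominated by (2, 3)
candidates = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0)]
front = pareto_front(candidates)
```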
Question
A simple function, 1/x, is not defined at x = 0, and this extends to its first and second derivatives. From what angle can we view the optimal behaviour of this function? A source for a comprehensive discussion of this type of function in the area of optimization would be appreciated.
Kamil, the students don't even need to know the terms "infimum" and "supremum" to understand the example that I gave.  You can just use "min" if you want.  You could also draw a picture of the feasible region (which is the region lying above the hyperbola y = 1/x), so that students can see what is happening visually.  The point is simply that, when nonlinear functions are involved, you have to be careful when trying to optimise.  All the students need is a familiarity with drawing a function of a single variable.
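A tiny numerical illustration of that example: over the region y >= 1/x with x > 0, the infimum of y is 0, but no feasible point attains it, so the "minimum" simply does not exist.

```python
# Over the feasible region y >= 1/x, x > 0, the smallest feasible y at a
# given x is 1/x. These values shrink toward 0 as x grows, but every one of
# them stays strictly positive -- the infimum 0 is never attained.
smallest_feasible_y = [1.0 / x for x in (1, 10, 100, 1_000, 1_000_000)]
```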
By the way, I'm afraid that I don't see any connection with fuzzy logic.  Fuzzy logic is concerned with performing inference when the membership of entities to discrete classes is uncertain.  It has nothing to do with either analysis or optimisation.
Question
My proposed answer is that they know what it is that they want to know. They do not know that the optimal statistic exists, let alone that it is linear! Irving Fisher told them that estimation is a choice of functions, and so it is a problem in stochastic optimal control, or simply the calculus of variations. Instead of doing the work, they massage the data set until it tells them exactly what the paymasters wish for it to tell. Thus they pride themselves on being their master's voice. Apologies to the author of RCA Victor's logo.
I ran this by two econometricians who are economics Nobel acolytes. They told me, essentially, the only objection they had to econometrics (they advocate that no one should teach econometrics!). Their reason is that the "results" do not confirm the economic prejudices of either one of them. Can you help?
Dear Prof. Mohamed,
It is my honor to say, on my own behalf, that not only econometricians but also economists have to know the theory, and also the ways to test theories. But every economic theory, and econometric theory too, has assumptions, and that is the point.
And, day by day, researchers in both areas try to go further by relaxing an assumption and testing it.
Best,
Canh
Question
I'm looking for a good way to approximate a multivariate log-normal distribution by a discrete distribution with finite state space. The discrete distribution should have the same first two moments.
Thanks for your help!
Have a look at doi: 10.1016/j.ejor.2014.07.049 for a method to generate a distribution with N support points and N probability weights, to match a given mean vector, covariance matrix, average third moment and average fourth moment. If you can bear with using optimisation, there are other moment-matching options available in the literature.
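If the first two moments really are enough, a simple sample-based sketch (my own, not the method from the cited DOI, which also matches third and fourth moments) is to draw N support points and correct them affinely so the equal-weight discrete distribution matches the target mean and covariance exactly. One caveat: the affine shift can push some support points negative.

```python
import numpy as np

# Draw N multivariate-lognormal points, then apply an affine correction so
# the discrete distribution (equal weights 1/N) matches the exact lognormal
# mean vector and covariance matrix.
rng = np.random.default_rng(4)
mu = np.array([0.0, 0.2])
S = np.array([[0.20, 0.05], [0.05, 0.10]])   # covariance of log(X)

# Exact lognormal moments
m = np.exp(mu + np.diag(S) / 2)               # mean vector
C = np.outer(m, m) * (np.exp(S) - 1.0)        # covariance matrix

N = 500
X = np.exp(rng.multivariate_normal(mu, S, size=N))    # raw lognormal sample
Xc = X - X.mean(axis=0)
L_emp = np.linalg.cholesky(np.cov(Xc, rowvar=False))  # empirical factor
L_tgt = np.linalg.cholesky(C)                         # target factor
Y = m + Xc @ np.linalg.inv(L_emp).T @ L_tgt.T         # corrected support points
```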
Question
I want to develop a stochastic optimization model that handles inventories of a just-in-time setting.
I would recommend that you use stochastic dynamic programming; the example on page 51 of the attached text (Dynamic Programming and Stochastic Control, by Dimitri Bertsekas) is a very good one.
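Here is a toy sketch of the kind of finite-horizon stochastic DP Bertsekas describes, on a small inventory problem with random demand (illustrative costs and demand distribution of my own, not the book's example):

```python
# Finite-horizon stochastic DP for a toy inventory problem:
# state = stock on hand, action = order quantity, demand is random,
# unmet demand is lost and penalized.
T = 4                                  # planning horizon
MAX_INV = 6                            # storage capacity
demand = {0: 0.2, 1: 0.5, 2: 0.3}      # demand distribution per period
c_order, c_hold, c_short = 1.0, 0.5, 4.0

V = [dict() for _ in range(T + 1)]     # value functions per period
policy = [dict() for _ in range(T)]    # optimal order quantity per state
for s in range(MAX_INV + 1):
    V[T][s] = 0.0                      # no terminal cost

for t in range(T - 1, -1, -1):         # backward recursion
    for s in range(MAX_INV + 1):
        best_a, best_cost = None, float("inf")
        for a in range(MAX_INV - s + 1):          # cannot exceed capacity
            cost = c_order * a
            for d, p in demand.items():           # expectation over demand
                left = s + a - d
                stage = c_hold * max(left, 0) + c_short * max(-left, 0)
                cost += p * (stage + V[t + 1][max(left, 0)])
            if cost < best_cost:
                best_a, best_cost = a, cost
        policy[t][s], V[t][s] = best_a, best_cost
```

With linear ordering and convex holding/shortage costs, the optimal policy comes out base-stock: order up to a fixed level, so the order quantity is nonincreasing in the current stock.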
Question
When I use ACO to detect image edges, there are more than nine parameters that need to be set. Following previous methods, I chose an experimental procedure to determine these parameters, but this is not rigorous due to the lack of mathematical derivation. How can I remedy this deficiency? Thank you!
The problem is suited for a genetic algorithm, with the parameters as the genome. The tricky part is evaluation of fitness.

For a sample image, create a black-and-white line image where the lines are black pixels and everything else is white. You can obtain this BW image manually (precision is not so much an issue) or with another edge-finding algorithm.

To evaluate an ant (its fitness), let it start on the original image, possibly at a place where the BW image has a black pixel, meaning there is a line. Now let the ant follow the edge. How well this is done can be measured: for example, there must always be a black pixel of the BW image within the scope of the ant (say a 5×5 area), and the ant must move away from pixels it has already visited, except at the end of an edge, where it should stop.

Regards,
Joachim
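A minimal GA sketch along those lines (the fitness below is a stand-in: distance to a hypothetical known-good parameter vector; in the real setup it would be the ant-following score on the BW reference image described above):

```python
import numpy as np

# Minimal GA for tuning a 9-parameter vector. The fitness is a stand-in
# (negative squared distance to a hypothetical "good" parameter vector).
rng = np.random.default_rng(5)
N_PARAMS, POP, GENS = 9, 40, 60
target = rng.uniform(0, 1, N_PARAMS)          # hypothetical good parameters

def fitness(genome):                          # higher is better
    return -float(np.sum((genome - target) ** 2))

pop = rng.uniform(0, 1, (POP, N_PARAMS))
for _ in range(GENS):
    scores = np.array([fitness(g) for g in pop])
    order = np.argsort(scores)[::-1]
    elite = pop[order[: POP // 4]]            # truncation selection
    # offspring: uniform crossover between random elite parents + mutation
    ma = elite[rng.integers(len(elite), size=POP)]
    pa = elite[rng.integers(len(elite), size=POP)]
    mask = rng.random((POP, N_PARAMS)) < 0.5
    pop = np.where(mask, ma, pa) + rng.normal(0, 0.02, (POP, N_PARAMS))
    pop[0] = elite[0]                          # elitism: keep the best as-is

best = max(pop, key=fitness)
```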
Question
A data center's load is basically the server workload and the cooling load associated with it. In the literature there are ways to model the IT load (delay-tolerant and delay-sensitive) and also the cooling load, but how can we use those models to capture the uncertainty of the load? Is the uncertainty expressed by specifying a range of demand, or can it be handled as a stochastic optimization over different scenarios?
Question
NONANTICIPATIVITY
As far as I know, decision making under uncertainty can often be formalized as a stochastic problem. I have seen a constraint called "nonanticipativity" in most papers on stochastic programming.
I would like to know that
1) What is its concept and role?
2) Is it essential for all stochastic optimization problems, or just for some special stochastic problems?
3) Can I ignore it?
I would greatly appreciate any help.
Regards,
There are two main approaches to solving multi-stage stochastic programs: Benders decomposition, which decomposes the problem by time-stage, and Lagrangian decomposition, which decomposes it by scenario instead.
The non-anticipativity constraints appear in the latter approach.  Basically, you have two copies of each variable, one representing the value it takes before you know the realisation of some random parameter, and the other representing the value it takes after you know the value.  In reality, the values of the two variables must be equal, since you have to fix the value of the variable before you know the realisation.  So non-anticipativity constraints are just equations stating that two variables must take equal values.
Now, by relaxing all of the non-anticipativity constraints in Lagrangian fashion, you obtain a relaxation that decomposes into one independent subproblem per scenario.  This gives a bound for your original problem.  Then, standard techniques (such as the subgradient method) can be used to solve the Lagrangian dual, i.e., to find a collection of Lagrangian multipliers that give you the best bound.
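A tiny two-scenario sketch of this mechanism (a toy model of my own, with quadratic scenario costs so each relaxed subproblem has a closed-form minimizer): give each scenario its own copy of the first-stage variable, add the nonanticipativity constraint x1 = x2, relax it with a multiplier, and let a subgradient loop restore consensus.

```python
# Two equally likely scenarios with data d_s; scenario cost 0.5*p_s*(x - d_s)^2.
# Copies x1, x2 of the first-stage decision; nonanticipativity: x1 = x2.
# Relaxing it with multiplier lam makes the two subproblems independent.
d = [1.0, 3.0]
prob = [0.5, 0.5]
lam, step = 0.0, 0.4

for _ in range(200):
    # closed-form minimizers of the relaxed Lagrangian
    #   0.5*p1*(x1-d1)^2 + 0.5*p2*(x2-d2)^2 + lam*(x1 - x2)
    x1 = d[0] - lam / prob[0]
    x2 = d[1] + lam / prob[1]
    lam += step * (x1 - x2)    # subgradient step on the violation x1 - x2

na_gap = abs(x1 - x2)          # should vanish: consensus at the mean of d
```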
Question
Part of my problem requires solving a finite-horizon, discrete-time MDP where the state in each slot is i.i.d. and does not depend on the action. Are there any simple policies that obtain the optimal solution? What if we add a total cost constraint? Thanks!
Unfortunately, I have little experience with stopping problems other than the basic one, so I am a bit out of my comfort zone. But for your problem, with T and P proportionally going to infinity, perhaps you can do something with the quantile function of X. One solution could be a policy where you transmit when X > Q(1 - P/T), where Q is the quantile function of X. Then you will on average consume P/T resources, and the expected value is E(X | X > Q(1 - P/T)). However, this is just quick guesswork; I have no idea if it is optimal.
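A quick Monte Carlo sanity check of that threshold policy, with X exponential so the quantile function is analytic (my choice of distribution, purely illustrative):

```python
import numpy as np

# Threshold policy: transmit when X > Q(1 - P/T), with X ~ Exp(1).
# For Exp(1): Q(1 - a) = -log(a), and E[X | X > q] = q + 1 (memorylessness).
rng = np.random.default_rng(6)
P_over_T = 0.1
q = -np.log(P_over_T)            # the quantile threshold Q(1 - P/T)
X = rng.exponential(1.0, 500_000)

transmit = X > q
use_rate = transmit.mean()       # fraction of slots used; should be ~ P/T
avg_reward = X[transmit].mean()  # empirical E[X | X > q]
```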
Question
Are there research papers (e.g., variations of Q-learning) on reinforcement learning in partially observable, model-free environments? I am interested in knowing the future research directions as well as the challenges of this area, and how useful the theory of stochastic approximation can be here.
Another question: from the sequence of observations and actions alone, how close can I get to the optimal deterministic policy of the MDP underlying the POMDP?
POMDPs are a bit tricky, but here are some papers, which you might find interesting:
Point-based value iteration: An anytime algorithm for POMDP
Perseus: Randomized Point-based Value Iteration for POMDPs
Reinforcement Learning with Limited Reinforcement: Using Bayes Risk for Active Learning in POMDPs
Question
Hi there!
I have a sample of 2500 data points, each with 9 attributes. I divided the set 75% / 25% for training and testing (random selection for testing). In the SGB model I used 0.05 as the learning rate (shrinkage factor) and 0.5 as the sub-sample fraction for bagging. Each tree has 15 terminal nodes, and all feature (attribute) interactions are allowed. By growing 20,000 trees (i.e., iterating 20,000 times) sequentially, I get an R2 (R-squared) of 99.7 on the training data and 98.8 on the testing data. In 10-fold cross-validation I get an R2 of 97.6.
As I have used a very low learning rate and the bagging concept, I assume that the accuracy I am getting is not due to over-fitting, and the MSE vs. number-of-trees graph decreases gradually without any spikes.
But as I am iterating 20,000 times and getting this much accuracy, I am a little bit confused about the over-fitting concept. Please tell me whether my approach and understanding are correct.
Thank you.
That's a tricky (if not even cunning) detail.
What kind of data are you testing?
Question
There are methods, such as the sample average approximation method and the progressive hedging algorithm, for solving mixed-integer stochastic optimization problems.
Dear Hossein,
from your explanation I understand that all your variables can be chosen in dependence on the realization of the data, which I regard as a wait-and-see problem. This means, in principle, that you could solve the problem after knowing the data.
I cannot see how expected value or variance can enter the objective this way. Your solutions, as you describe them, should depend on the current realization, not on all possible ones. So decomposition of a two- or multi-stage stochastic problem does not help with what you are interested in.
A possible question might be: what is the distribution of the optimal value / optimal solution set, depending on the distributions of the data? I am not aware of known answers to this type of question.
Wanting to solve your problem for all possible realizations of the data is the same as being able to solve the fully parametric optimization problem. To solve the above-mentioned distribution problem, the best thing would be an analytic expression for the solution, but I do not know how to obtain one, at least for nonlinear problems.
Question
Stochastic Gradient Descent (SGD) is fast for optimizing many convex objectives. But why does it fail to produce sparse solutions? Is there an intuitive explanation?
To further clarify my question: while Coordinate Descent is "naturally" suitable for producing sparse solutions, why does GD fail to have this ability before any fix added to it?
Or, what's the difference between CD and GD that makes one sparsity-inducing while the other not?
Assume that you are using SGD to optimize the LASSO objective function, that is sum of squared errors plus the L1 norm of the solution.
The L1 term gives you sparsity of the optimal solution. Using SGD, you will indeed converge to the sparse solution, but only in the limit of infinitely many iterations.
The problem here is that you want each single intermediate solution generated during the optimization to be also sparse. While this is a reasonable wish, it does not come for free. It is an additional requirement that you have to add specifically in any optimization algorithm you use, not only SGD.
For SGD and GD, you can obtain sparse intermediate solutions using the proximal operator of the L1 term, instead of using its gradient. See, for example,
FISTA for GD
or, FOBOS and RDA for SGD
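A small sketch contrasting the two updates on a toy lasso problem (illustrative data of my own; the proximal step is the ISTA update): the proximal operator of the L1 term is soft-thresholding, which sets small coordinates exactly to zero, whereas a plain (sub)gradient step on the L1 term only shrinks them.

```python
import numpy as np

# Proximal operator of t*||.||_1: soft-thresholding (exact zeros possible)
def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# Toy lasso: min 0.5*||A x - b||^2 + lam*||x||_1, with a truly sparse signal
rng = np.random.default_rng(7)
n, p = 50, 10
A = rng.standard_normal((n, p))
x_true = np.zeros(p)
x_true[:2] = [3.0, -2.0]
b = A @ x_true + 0.01 * rng.standard_normal(n)
lam = 5.0
step = 1.0 / np.linalg.norm(A, 2) ** 2        # 1/L for the smooth part

x_prox = np.zeros(p)   # proximal gradient (ISTA) iterate
x_subg = np.zeros(p)   # plain subgradient iterate
for _ in range(500):
    grad_p = A.T @ (A @ x_prox - b)
    x_prox = soft_threshold(x_prox - step * grad_p, step * lam)  # ISTA step
    grad_s = A.T @ (A @ x_subg - b) + lam * np.sign(x_subg)      # subgradient
    x_subg = x_subg - step * grad_s

n_zero_prox = int(np.sum(x_prox == 0.0))   # exact zeros from soft-thresholding
n_zero_subg = int(np.sum(x_subg == 0.0))   # subgradient iterates only shrink
```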