User:Johnsy/Advanced Modelling in Biology/Optimization

Revision as of 04:37, 19 May 2008

Introduction To Optimization

General Formulation

The general formulation for any optimization problem is this:

  • Minimize [math]\displaystyle{ f(x) }[/math], our cost/objective function, subject to a series of constraints:
    • [math]\displaystyle{ g_i(x) = 0 }[/math]
    • [math]\displaystyle{ h_i(x) \le 0 }[/math]
  • Where [math]\displaystyle{ x }[/math] is the variable being optimized (it can be multidimensional) and g and h are our constraints

What do we mean when we say [math]\displaystyle{ f(x) }[/math]?

  • [math]\displaystyle{ f(x) }[/math] is a function in the general sense, which can be continuous or discontinuous
  • Think of [math]\displaystyle{ f(x) }[/math] as a black box. For any input given, we will receive an output
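This black-box view can be sketched as follows (Python; the particular f, the constraint h, and the grid bounds are all assumptions for illustration). We minimize f(x) = (x − 3)² subject to h(x) = x − 2 ≤ 0 simply by evaluating the box on a grid of feasible points:

```python
def f(x):
    """Objective: a black box we can only evaluate point by point."""
    return (x - 3) ** 2

def h(x):
    """Inequality constraint: the point x is feasible when h(x) <= 0."""
    return x - 2

best_x, best_f = None, float("inf")
for i in range(-500, 501):
    x = i / 100                       # grid point in [-5, 5]
    if h(x) <= 0 and f(x) < best_f:   # keep the best feasible point seen
        best_x, best_f = x, f(x)

print(best_x, best_f)   # the constrained minimum sits on the boundary x = 2
```

Note that nothing here used derivatives or continuity of f — only the ability to evaluate it, which is the sense in which f is a black box.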

Standard Optimization vs. Combinatorial Optimization

  • Standard optimization (continuous) deals with functions where the output is continuous (both variables and functions are continuous)
  • Combinatorial optimization deals with problems where the set of solutions is discrete (the fact that it is discrete does not mean that it is finite). It is important to note that calculus does not apply to these systems, making them more difficult

Traveling Salesman Problem

The traveling salesman problem is a classical combinatorial optimization problem formulated as such:

  • A salesman has to visit N cities. In what sequence should he travel through the cities to minimize the total distance travelled?
    • You can see that there are N! possible sequences for the N cities

Possible ways of solving this problem:

  • Heuristics - methods/algorithms that tend to give the correct answer but are not guaranteed to find the optimal solution (e.g. many informed guesses as to the correct path)
  • Complete Enumeration - going through all N! possible solutions to find the optimal path. Here the computational time grows with the number of paths, or in big-O notation: O(N!)
    • O(N!) growth is known as combinatorial explosion; the traveling salesman problem is NP-hard (NP = non-deterministic polynomial time)
    • Some problems have been proven to be solvable only by complete enumeration. Other methods do not guarantee the optimal solution for the problem.
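Complete enumeration can be sketched as follows (Python; the four-city instance is made up for illustration). We fix the starting city and try every ordering of the remaining cities — fine for N = 4, hopeless for large N:

```python
from itertools import permutations
from math import dist

cities = [(0, 0), (0, 1), (1, 1), (1, 0)]   # corners of a unit square

def tour_length(order):
    """Length of the closed tour visiting the cities in this order."""
    return sum(dist(cities[order[i]], cities[order[(i + 1) % len(order)]])
               for i in range(len(order)))

# fix city 0 as the start and enumerate all (N-1)! orderings of the rest
best = min(((0,) + rest for rest in permutations(range(1, len(cities)))),
           key=tour_length)
print(best, tour_length(best))   # the optimal tour walks the perimeter
```

For this instance the optimal tour is the square's perimeter, with length 4; any tour crossing a diagonal is longer.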

Dimensionality

Recall that for a one-dimensional system, to find the minimum of a continuous, differentiable function [math]\displaystyle{ f(x) }[/math], we look to the derivative such that:

[math]\displaystyle{ \frac{df}{dx} = 0 }[/math]

To ensure that we have a minimum, we want:

[math]\displaystyle{ \frac{d^2f}{dx^2} \gt 0 }[/math]

A zero first derivative tells us that the monotonicity of the function changes there, while a positive second derivative guarantees a minimum. For a two-dimensional system, [math]\displaystyle{ f(x,y) }[/math], we look to the gradient and also to the matrix of second derivatives (the Hessian) such that:

[math]\displaystyle{ \nabla f = 0 }[/math]

And that:

[math]\displaystyle{ H= \begin{pmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} \\ \frac{\partial^2 f}{\partial x \partial y} & \frac{\partial^2 f}{\partial y^2} \end{pmatrix} \gt 0 }[/math]

The condition above is known as positive definiteness, and it guarantees that we have a minimum. H is also positive definite if and only if:

[math]\displaystyle{ x^THx \gt 0, \forall x \neq 0 }[/math]

For symmetric matrices (and the Hessian is usually symmetric), H is positive definite exactly when both eigenvalues are positive. Positive eigenvalues correspond to the function being convex in both principal directions. Recall that if one eigenvalue is positive and the other negative, the point is a saddle point.
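A minimal check of these conditions (Python sketch; the example Hessian is an assumption) computes the two eigenvalues of a symmetric 2×2 matrix directly from its trace and determinant:

```python
from math import sqrt

def eigenvalues_2x2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]]."""
    tr, det = a + c, a * c - b * b
    root = sqrt(tr * tr / 4 - det)   # discriminant is real for symmetric matrices
    return tr / 2 - root, tr / 2 + root

# Hessian of f(x, y) = x^2 + x*y + y^2:  fxx = 2, fxy = 1, fyy = 2
lo, hi = eigenvalues_2x2(2, 1, 2)
print(lo, hi)   # both eigenvalues positive, so H is positive definite
```

Here both eigenvalues (1 and 3) are positive, so this f is convex and its critical point at the origin is a minimum.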

Steepest Descent/Newton Method for Optimization

General formulation:

  • We wish to minimize [math]\displaystyle{ f(x) }[/math], assuming its Hessian is positive definite.

Methods for approaching the problem:

  • Matlab's fsolve() solves nonlinear systems and can be applied to the condition [math]\displaystyle{ \nabla f = 0 }[/math] (Matlab's fminunc() minimizes directly)
  • ode45() if we can define the function to be minimized in terms of a potential function
    • Let [math]\displaystyle{ u(x) }[/math] be our energy function. If we integrate [math]\displaystyle{ \frac{dx}{dt} = -\nabla u }[/math] with respect to time, then we are assured that the energy decreases along all trajectories.
    • We must supply ode45() with an initial condition, and with this method we are guaranteed a minimum, but not necessarily the global minimum. Only if the function is convex are we guaranteed the global minimum; hence convex functions are easier to optimize than non-convex functions.

Convex functions - functions whose Hessian is positive definite at every point x (such a function has only one extremum). Convexity depends on the choice of variables: if we can redefine the variables so that our function becomes convex, optimization becomes much easier.

In the computer, steepest descent is performed by the Euler method of integration given our function:

[math]\displaystyle{ \frac{dx}{dt} = - \nabla u }[/math]

The Euler method expands out the differential to give:

[math]\displaystyle{ x_{t+1} = x_t - (\nabla u)\delta t }[/math]

The computer adapts [math]\displaystyle{ \delta t }[/math] to reach the minimum more efficiently, driving [math]\displaystyle{ \nabla u }[/math] down until successive iterates differ by less than a given tolerance:

[math]\displaystyle{ \parallel x_{t+1} - x_t \parallel \lt \varepsilon }[/math]
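The Euler update and stopping rule above can be sketched as follows (Python; the energy u(x, y) = (x − 1)² + (y + 2)², the step size, and the tolerance are assumptions):

```python
def grad_u(x, y):
    """Gradient of the energy u(x, y) = (x - 1)^2 + (y + 2)^2."""
    return 2 * (x - 1), 2 * (y + 2)

x, y, dt, eps = 5.0, 5.0, 0.1, 1e-8          # initial condition, step, tolerance
while True:
    gx, gy = grad_u(x, y)
    x_new, y_new = x - gx * dt, y - gy * dt  # x_{t+1} = x_t - (grad u) * dt
    step = ((x_new - x) ** 2 + (y_new - y) ** 2) ** 0.5
    x, y = x_new, y_new
    if step < eps:                           # || x_{t+1} - x_t || < epsilon
        break

print(x, y)   # converges to the unique minimum at (1, -2)
```

Because this u is convex (its Hessian is positive definite everywhere), any initial condition leads to the same minimum; for non-convex energies the end point depends on where we start.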

Possible problems with steepest descent:

  • If the minimum sits in a very narrow well, the initial condition must be very close to the minimum for the method to find it
  • If the function has several minima of various depths, then many initial conditions must be tried to have any guarantee of finding the global minimum.
  • Convergence of steepest descent describes how fast you reach the solution. Very flat but convex functions take relatively long to converge, because the variation in the function is too small to suggest a good step size. To correct for flatness we can use conjugate gradient methods, which do not always step against the gradient but take steps out of the direction of the gradient (similar to going up the ridges of a valley) before again taking steps in the direction of the gradient.
  • Even for a convex function, steepest descent only reaches the minimum exactly in infinite time (and remember that this only works for continuous functions).

Combinatorial Optimization with Greedy Algorithms

For discrete sets in combinatorial problems, we can use greedy algorithms, which take a starting condition, move to the nearest neighbor, and repeat, minimizing the distance at each step. Remember that without complete enumeration this will not necessarily give the best solution, and it may get stuck paying large distances later in the tour. The algorithm minimizes locally in an attempt to minimize globally; it is no longer an O(N!) problem, making it computationally far cheaper than complete enumeration.
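A nearest-neighbour greedy tour can be sketched as follows (Python; the coordinates are made up for illustration). At each step we jump to the closest unvisited city — locally optimal moves with no global guarantee:

```python
from math import dist

def greedy_tour(cities, start=0):
    """Nearest-neighbour heuristic: returns the visiting order as indices."""
    unvisited = set(range(len(cities))) - {start}
    tour = [start]
    while unvisited:
        # greedy step: move to the closest city not yet visited
        nxt = min(unvisited, key=lambda j: dist(cities[tour[-1]], cities[j]))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour

cities = [(0, 0), (0, 1), (1, 1), (1, 0)]    # corners of a unit square
tour = greedy_tour(cities)
length = sum(dist(cities[tour[i]], cities[tour[(i + 1) % 4]]) for i in range(4))
print(tour, length)
```

On this symmetric instance the greedy tour happens to be optimal; on instances with clustered cities it can leave a distant city for last and pay a long final leg.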

Simulated Annealing

This method attempts to correct for the presence of several minima and is a heuristic algorithm for non-convex problems derived from materials physics. We slightly adapt the Newton method by including a random term which will cause our function to sometimes go against the gradient.

[math]\displaystyle{ x_{t+1} = x_t - (\nabla u)\delta t + \alpha w(0,\sigma) }[/math]
  • [math]\displaystyle{ w(0,\sigma) }[/math] is a random number drawn from a normal distribution with mean zero and standard deviation σ.
  • α is our control parameter (e.g. temperature) that determines the "jumpiness" of the step. We usually begin with a high temperature and decrease α slowly, allowing a complete exploration of the energy function; if we could decrease the temperature infinitely slowly, we would be guaranteed to reach the global minimum. The adjustment of α over time is known as the cooling schedule and is set arbitrarily. Some energy functions require an initially slow cooling period followed by rapid cooling after a certain number of time steps; for more nearly convex functions it is possible to cool quickly at the beginning and slowly towards the end. Several cooling schedules are usually tried to see which one obtains the best results.
  • Any move that decreases the energy is always accepted; a move that increases the energy is accepted with a probability given by the Boltzmann distribution:
[math]\displaystyle{ p = e^{\frac{-\Delta E}{kT}} }[/math]

Pseudocode for the Simulated Annealing Algorithm (can be used for both standard or combinatorial optimization problems):

  1. Select an initial state x0 at random and evaluate the energy u(x0).
  2. Select the next state x1 at random and evaluate the energy u(x1).
  3. Accept the change if the cost function is decreased
  4. Otherwise, accept the change with a probability [math]\displaystyle{ p = e^{\frac{-\Delta E}{kT}} }[/math]
  5. Decrease T according to the cooling schedule
  6. Repeat steps 2 - 5 until we reach the desired number of time steps
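The steps above can be sketched in one dimension (Python; the double-well energy, the proposal width, and the cooling schedule are all assumptions):

```python
import math
import random

random.seed(0)

def u(x):
    """Double-well energy: minima at x = -1 and x = +1, barrier at x = 0."""
    return (x * x - 1) ** 2

x, T = 5.0, 2.0                  # step 1: initial state and initial temperature
best_x, best_u = x, u(x)
for _ in range(2000):
    x_new = x + random.gauss(0, 0.5)     # step 2: propose the next state
    dE = u(x_new) - u(x)
    # steps 3-4: always accept a decrease; accept an increase with
    # probability exp(-dE / T) (Boltzmann)
    if dE < 0 or random.random() < math.exp(-dE / T):
        x = x_new
        if u(x) < best_u:
            best_x, best_u = x, u(x)
    T *= 0.995                   # step 5: geometric cooling schedule

print(best_x, best_u)   # the best state found sits near one of the two minima
```

At high temperature the chain hops over the barrier at x = 0; as T decreases it settles into one well, which is how the random term lets the search escape local minima that would trap plain steepest descent.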

Genetic/Evolutionary Algorithms

This method is only for combinatorial problems and was inspired by genetics. Evolutionary algorithms are generally faster and get around some of the problems found in simulated annealing. Instead of minimizing an energy function, we maximize a "fitness" when we use evolutionary algorithms.

We first define our function u in state space with a discrete set of solutions.

  1. Start out with a population of randomly generated solutions (initial guesses) of size P
  2. Reproduction - take random pairs of parents and create their offspring (generating a total of P + P/2 solutions)
  3. Mutation & Crossover - Mutate each solution with a given probability and perform crossovers also with a given probability (where entire portions of the solution get moved to a different position)
  4. Rank the population according to their fitness u(x)
  5. Kill off the bottom P/2, again leaving a population of size P.
  6. Repeat from the reproduction step until the desired number of time steps is reached.
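The loop above can be sketched on a toy problem (Python; maximizing the number of 1-bits in a 20-bit string, with the population size, mutation rate, and generation count all chosen arbitrarily for illustration):

```python
import random

random.seed(1)
BITS, P = 20, 8

def fitness(sol):
    """Count of 1-bits: the quantity we maximize."""
    return sum(sol)

def crossover(a, b):
    """Single-point crossover: splice a prefix of one parent onto the other."""
    cut = random.randrange(1, BITS)
    return a[:cut] + b[cut:]

def mutate(sol, rate=0.05):
    """Flip each bit independently with the given probability."""
    return [bit ^ (random.random() < rate) for bit in sol]

# step 1: random initial population of size P
pop = [[random.randint(0, 1) for _ in range(BITS)] for _ in range(P)]
for _ in range(50):
    # steps 2-3: random pairs of parents produce P/2 mutated offspring
    kids = [mutate(crossover(*random.sample(pop, 2))) for _ in range(P // 2)]
    # steps 4-5: rank all P + P/2 solutions, kill off the bottom P/2
    pop = sorted(pop + kids, key=fitness, reverse=True)[:P]

print(fitness(pop[0]))   # best fitness after 50 generations
```

Because the parents survive ranking unchanged, the best fitness never decreases from one generation to the next; mutation and crossover supply the variation that selection then filters.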