[AIM] Components of Heuristics and Hill Climbing

Day 1

Outline

  • Benchmark function optimisation
  • Components of heuristic search
  • Representation
  • Evaluation
  • Neighbourhoods
  • Basic Hill Climbing/Local Search Methods
  • Strengths and weaknesses of hill climbing

Benchmark function optimisation

Benchmark function: a function that can be used to test the performance of an optimisation approach on a related problem.

Why use benchmark functions for optimisation?

Benchmark functions serve as a testbed for comparing the performance of heuristic optimisation algorithms:
  • The global minimum should be known
  • The function should be easy to compute
  • The function should be recognised to have certain characteristics (e.g. separable vs. non-separable)

Classification of benchmark functions

  • Continuity (differentiability): discontinuous vs. continuous
  • Dimensionality
  • Modality: unimodal vs. multimodal with few local minima vs. multimodal with an exponential number of local minima
  • Separability
Explanation:
1. What is meant by modality?
A mode is a peak value. In Wikipedia, the definition is as follows:
In mathematics, unimodality means possessing a unique mode. More generally, unimodality means there is only a single highest value, somehow defined, of some mathematical object.
Therefore, unimodal means there is only one peak (which, for minimisation, might be the single lowest value).
2. What is meant by continuity or differentiability?
They are actually different properties, but a differentiable function must be continuous.
In Wikipedia:
In calculus (a branch of mathematics), a differentiable function of one real variable is a function whose derivative exists at each point in its domain. As a result, the graph of a differentiable function must have a (non-vertical) tangent line at each point in its domain, be relatively smooth, and cannot contain any breaks, bends, or cusps.

Main Components of Heuristic (search) optimisation methods

  • Representation (encoding) of candidate solutions
  • Evaluation function
  • Initialisation (e.g. random)
  • Neighbourhood relation (move operators)
  • Search process (guideline)
  • Mechanism for escaping from local optima

Representation

We have different problems, such as the 0/1 knapsack problem, personnel rostering, DNA sequencing, parameter control, …
We need a different representation depending on the problem:
  • Binary encoding: the most common choice for the knapsack problem
  • Integer representation: assignment problems, such as personnel rostering or timetabling
  • Permutation encoding: used for ordering problems such as the TSP and sequencing problems. Unlike an integer representation, a permutation cannot repeat a value; it can only reorder the values
  • Value encoding: values can be numbers, characters, or words; used in DNA sequencing and planning
  • Nonlinear encoding: tree encoding, as used in genetic programming
  • Special encoding: for example, random-key encoding (a sketch of several encodings follows below)
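As a rough illustration (the problem sizes and instances below are made up for the example), the different encodings might look like this in code:

```python
import random

# Binary encoding: one bit per item of a hypothetical 0/1 knapsack instance
n_items = 10
knapsack_solution = [random.randint(0, 1) for _ in range(n_items)]

# Integer encoding: assign each of 7 shifts to one of 3 employees (toy rostering example)
n_shifts, n_employees = 7, 3
roster_solution = [random.randrange(n_employees) for _ in range(n_shifts)]

# Permutation encoding: visiting order of 5 cities in a toy TSP instance
n_cities = 5
tsp_solution = list(range(n_cities))
random.shuffle(tsp_solution)          # each city appears exactly once

# Random-key encoding: real numbers whose sort order decodes to a permutation
keys = [random.random() for _ in range(n_cities)]
decoded_permutation = sorted(range(n_cities), key=lambda i: keys[i])
```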

Evaluation Function

Also referred to as objective, cost, fitness, penalty, etc.
  • Indicates the quality of a given solution, distinguishing between better and worse individuals
  • Serves as a major link between the algorithm and the problem being solved
  • Evaluation functions could be computationally expensive

Delta evaluation

Key idea: calculate the effect of the differences between the current search position s and a neighbour s' on the evaluation function value.
  • Evaluation function values often consist of independent contributions from solution components; hence f(s') can be computed efficiently from f(s) using only the components in which s and s' differ.
  • For example, in the TSP (delta evaluation of a move, sketched below)
  • Likewise in function optimisation
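A minimal sketch of delta evaluation for a symmetric TSP under a 2-opt move (reversing the segment between two cut points): only two edges are removed and two are added, so the new tour length can be obtained from the old one in constant time. The distance matrix and tour below are illustrative assumptions.

```python
def tour_length(tour, dist):
    """Full O(n) evaluation: sum of edge lengths around the cycle."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def two_opt_delta(tour, dist, i, j):
    """Change in tour length if the segment tour[i+1..j] is reversed (i < j).
    Only two edges are removed and two are added, so this is O(1)."""
    n = len(tour)
    a, b = tour[i], tour[i + 1]          # edge (a, b) is removed
    c, d = tour[j], tour[(j + 1) % n]    # edge (c, d) is removed
    return (dist[a][c] + dist[b][d]) - (dist[a][b] + dist[c][d])

# Toy 4-city instance (symmetric distance matrix, purely illustrative)
dist = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 3],
        [10, 4, 3, 0]]
tour = [0, 1, 2, 3]
delta = two_opt_delta(tour, dist, 0, 2)   # f(s') = f(s) + delta
print(tour_length(tour, dist), delta)
```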

Neighbourhood

A neighbourhood of a solution x is the set of solutions that can be reached from x by a simple operator (move operator/heuristic).

Examples of neighbourhood

Binary representation

Flipping one bit of the solution
1 0 0 0 1 1 0 0 0 0 -> 0 0 0 0 1 1 0 0 0 0
Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different.
Size: if the binary vector is of size n then the neighbourhood size is n.
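A small sketch of enumerating the bit-flip neighbourhood; a vector of length n yields exactly n neighbours, each at Hamming distance 1 from the original:

```python
def bit_flip_neighbours(solution):
    """Yield every solution obtained by flipping exactly one bit."""
    for i in range(len(solution)):
        neighbour = solution[:]            # copy the current solution
        neighbour[i] = 1 - neighbour[i]    # flip bit i
        yield neighbour

s = [1, 0, 0, 0, 1, 1, 0, 0, 0, 0]
print(len(list(bit_flip_neighbours(s))))   # 10 neighbours for a length-10 vector
```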

Integer representation

Neighbourhood: a discrete value is replaced by any other character of the alphabet.
5 7 6 6 4 3 8 4 2 -> 7 7 6 6 4 3 8 4 2
Size: if the alphabet is of size k then the neighbourhood size is (k-1)n
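The analogous sketch for an integer representation: each of the n positions can be replaced by any of the k-1 other symbols, giving (k-1)*n neighbours:

```python
def integer_neighbours(solution, k):
    """Yield every solution in which one position takes a different value from {0, ..., k-1}."""
    for i in range(len(solution)):
        for v in range(k):
            if v != solution[i]:
                neighbour = solution[:]
                neighbour[i] = v
                yield neighbour

s = [5, 7, 6, 6, 4, 3, 8, 4, 2]
print(len(list(integer_neighbours(s, k=9))))   # (9 - 1) * 9 = 72 neighbours
```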

Permutation

Adjacent pairwise interchange: 5 1 4 3 2 -> 1 5 4 3 2
Size: if permutation is of size n then the neighbourhood size is n-1
Insertion operator: take an arbitrary job in the permutation and insert it in another position 5 1 4 3 2 -> 1 4 5 3 2
Size: if the permutation is of size n then the neighbourhood size is n*(n-1)
Exchange operator: arbitrary selected two elements are swapped
5 1 4 3 2 -> 3 1 4 5 2
Inversion operator: arbitrary select two elements and invert the sequence between them
5 1 4 3 2 -> 5 3 4 1 2
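A compact sketch of these four permutation move operators, reproducing the examples above:

```python
def adjacent_interchange(p, i):
    """Swap the elements at positions i and i+1."""
    q = p[:]
    q[i], q[i + 1] = q[i + 1], q[i]
    return q

def insertion(p, i, j):
    """Remove the element at position i and insert it at position j of the shortened list."""
    q = p[:]
    q.insert(j, q.pop(i))
    return q

def exchange(p, i, j):
    """Swap the elements at two arbitrary positions i and j."""
    q = p[:]
    q[i], q[j] = q[j], q[i]
    return q

def inversion(p, i, j):
    """Reverse the segment between positions i and j (inclusive)."""
    q = p[:]
    q[i:j + 1] = reversed(q[i:j + 1])
    return q

p = [5, 1, 4, 3, 2]
print(adjacent_interchange(p, 0))  # [1, 5, 4, 3, 2]
print(insertion(p, 0, 2))          # [1, 4, 5, 3, 2]
print(exchange(p, 0, 3))           # [3, 1, 4, 5, 2]
print(inversion(p, 1, 3))          # [5, 3, 4, 1, 2]
```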

Summary: representation, evaluation and neighbourhood

  • Choosing an appropriate encoding to represent a candidate solution is crucial in heuristic optimisation
  • Initialisation could influence the performance of an optimisation algorithm
  • Evaluation function guides the search process and fast evaluation is important
  • Benchmarks are useful for performance comparison of different algorithms

Search paradigms

  • Mutational (diversification/exploration): processes a given candidate solution and generates a solution which is not guaranteed to be better than the input
  • Hill climbing (intensification/exploitation): processes a given candidate solution and generates a solution of better or equal quality

Hill Climbing Algorithm


Initial starting points may be chosen:
  • randomly
  • use a constructive heuristic
  • according to some regular pattern
  • based on other information (e.g. results of a prior search)
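A minimal skeleton of the hill-climbing loop itself (best-improvement variant, written here for maximisation); evaluate and neighbours are placeholders for a problem-specific evaluation function and move operator, e.g. the bit-flip neighbourhood sketched earlier:

```python
def hill_climb(initial, evaluate, neighbours, max_iters=1000):
    """Move to the best neighbour until no neighbour improves the current solution."""
    current, current_value = initial, evaluate(initial)
    for _ in range(max_iters):
        best, best_value = None, current_value
        for candidate in neighbours(current):
            value = evaluate(candidate)
            if value > best_value:            # maximisation; flip the comparison to minimise
                best, best_value = candidate, value
        if best is None:                      # no improving neighbour: local optimum
            return current, current_value
        current, current_value = best, best_value
    return current, current_value

# Example: maximise the number of 1s (OneMax) with the bit-flip neighbourhood sketched earlier
# solution, value = hill_climb([0] * 10, evaluate=sum, neighbours=bit_flip_neighbours)
```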

Simple Hill Climbing Heuristics (classification)

Simple hill climbing, examining all neighbours:
  • Best improvement (steepest descent/ascent)
  • First improvement (gradient descent/ascent)
Stochastic hill climbing (randomly chosen neighbours):
  • Random selection / random mutation hill climbing
Random-restart (shotgun) hill climbing is built on top of hill climbing: it repeatedly restarts the hill climber from a new random starting solution and returns the best solution found (see the sketch below).
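A sketch of the random-restart wrapper, assuming a binary representation and the hill_climb and bit_flip_neighbours sketches given earlier in this post:

```python
import random

def random_restart_hill_climbing(n_restarts, n_bits, evaluate):
    """Run hill climbing from several random binary starting points and keep the best result.
    Relies on the hill_climb and bit_flip_neighbours sketches above."""
    best, best_value = None, float('-inf')
    for _ in range(n_restarts):
        start = [random.randint(0, 1) for _ in range(n_bits)]
        solution, value = hill_climb(start, evaluate, bit_flip_neighbours)
        if value > best_value:
            best, best_value = solution, value
    return best, best_value
```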

Best improvement

Flip one bit at a time; regardless of whether the flip improves the solution, return to the current solution before flipping the next bit.
After all bits have been tried, move to the best resulting neighbour.
MAX-SAT as an example:
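A sketch of one best-improvement pass over a bit string; the evaluation function is left abstract (for MAX-SAT it would count the number of satisfied clauses):

```python
def best_improvement_step(solution, evaluate):
    """Flip every bit of the current solution in turn (each flip starts again
    from the unchanged solution) and return the single best neighbour found."""
    best, best_value = solution, evaluate(solution)
    for i in range(len(solution)):
        neighbour = solution[:]
        neighbour[i] = 1 - neighbour[i]      # flip bit i only
        value = evaluate(neighbour)
        if value > best_value:               # remember the best flip seen so far
            best, best_value = neighbour, value
    return best, best_value
```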

First improvement

Flip one bit at a time; if the flip improves the solution, accept it, otherwise reject it and restore the bit.
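The corresponding first-improvement pass, again with an abstract evaluation function:

```python
def first_improvement_step(solution, evaluate):
    """Flip bits from left to right and accept the first flip that improves the solution."""
    current_value = evaluate(solution)
    for i in range(len(solution)):
        neighbour = solution[:]
        neighbour[i] = 1 - neighbour[i]
        value = evaluate(neighbour)
        if value > current_value:            # improvement found: accept it and stop
            return neighbour, value
    return solution, current_value           # no improving flip: local optimum
```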

Davis’s bit hill climbing

Generate a random permutation of the bit positions and flip the bits in that order. The acceptance criterion is the same as for first improvement.
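A sketch of Davis's bit hill climbing under the description above (random visiting order, accepting only improving flips):

```python
import random

def davis_bit_hill_climbing(solution, evaluate):
    """Visit the bit positions in a random permutation order, accepting each flip
    only if it improves the solution (same criterion as first improvement)."""
    current, current_value = solution[:], evaluate(solution)
    order = list(range(len(solution)))
    random.shuffle(order)                    # random order in which bits are tried
    for i in order:
        neighbour = current[:]
        neighbour[i] = 1 - neighbour[i]
        value = evaluate(neighbour)
        if value > current_value:
            current, current_value = neighbour, value
    return current, current_value
```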

Random mutation hill-climbing

Bit positions to flip are drawn at random (a random integer is generated for each step, so the same bit may be chosen more than once).
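A sketch of random mutation hill climbing; a common formulation also accepts equal-quality moves, which is assumed here:

```python
import random

def random_mutation_hill_climbing(solution, evaluate, n_iters=1000):
    """At each step flip a bit chosen uniformly at random (repeats allowed);
    keep the flip only if the result is at least as good (assumed acceptance rule)."""
    current, current_value = solution[:], evaluate(solution)
    for _ in range(n_iters):
        i = random.randrange(len(current))   # bit position drawn at random
        neighbour = current[:]
        neighbour[i] = 1 - neighbour[i]
        value = evaluate(neighbour)
        if value >= current_value:
            current, current_value = neighbour, value
    return current, current_value
```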

Hill Climbing vs Random Walk

  • A hill-climbing method exploits the best available solution for possible improvement but neglects exploring a large portion of the search space.
  • A random walk (which samples new points from the search space with equal probability, e.g. by random bit flips) explores the search space thoroughly but misses exploiting promising regions.

Weaknesses of hill climbing

  • Local optimum: all neighbouring states are worse or the same; the algorithm halts even though the solution may be far from satisfactory.
  • Plateau (neutral space/shoulder): all neighbouring states are the same as the current state, i.e. the evaluation function is essentially flat; the search degenerates into a random walk.
  • Ridge/valley: the search may oscillate from side to side, making little progress. In each case the algorithm reaches a point at which no progress is being made; an obvious remedy is to restart from a different starting point.
  • As a result, hill climbing may not find the optimal solution and may get stuck at a local optimum.
