Evolutionary Computation | Vamshi Jandhyala

Basic Evolutionary Processes

The first thing to note is that there are at least two possible interpretations of the term evolutionary system. It is frequently used in a very general sense to describe a system that changes incrementally over time. The second sense, and the one in this post, is the narrower use of the term in biology, namely, to mean a $\textbf{Darwinian evolutionary system}$ . One way of characterizing a Darwinian evolutionary system is to identify a set of core components that constitues such a system. The core components are:

one or more populations of individuals competing for limited resources,
the notion of dynamically changing populations due to the birth and death of individuals,
a concept of fitness which reflects the ability of an individual to survive and reproduce, and
a concept of variational inheritance: offspring closely resemble their parents, but are not identical.

Such a characterization leads naturally to the view of an evolutionary system as a process that, given particular initial conditions, follows a trajectory over time through a complex evolutionary state space. One can then study various aspects of these processes such as their convergence properties, their sensitivity to initial conditions, their transient behavior, and so on. Depending on one’s goals and interests, various components of such a system may be fixed or themselves subject to evolutionary pressures.

Evolutionary systems/algorithms as parallel adaptive search

It does not take much imagination to interpret an evolutionary system as a parallel adaptive search procedure not unlike a swarm of ants exploring a landscape in search of food. Initial random individual movement gives way to more focused exploration, not as the result of some pre-planned group search procedure, but rather through dynamic reorganization as clues regarding food locations are encountered. Mathematically, one can view individuals in the population as representing sample points in a search space that provide clues about the location of regions of high fitness. The simulated evolutionary dynamics produce an adaptive, fitness-biased exploration of the search space. When the evolutionary process is terminated, the results of that search process (e.g., the best point found) can be viewed as the “answer” to the search problem.

In order to be applied to a particular problem, this abstract notion of an evolutionary algorithm(EA)-based parallel adaptive search procedure must be instantiated by a series of key design decisions involving:

deciding what an individual in the population represents,
providing a means for computing the fitness of an individual,
deciding how children (new search points) are generated from parents (current search points),
specifying population sizes and dynamics,
defining a termination criterion for stopping the evolutionary process, and
returning an answer.

Representation of an individual

In the simplest representation, the individuals in the population represent potential solutions to the problem at hand, and the fitness of an individual is defined in terms of the quality of the solution it represents or in terms of its proximity to a solution. There are, of course, other possible scenarios. The population as a whole could represent a solution. For example, each individual could represent a decision rule, and the population could represent the current set of rules being evaluated collectively for its decision-making effectiveness. Alternatively, we might develop a hybrid system in which individuals in the population represent the initial conditions for a problem-specific local search procedure, such as the initial weights used by an artificial neural network backpropagation procedure.

For simplicity, the focus of this post will be on on EAs that search solution spaces. However, having committed to searching solution spaces still leaves open the question of how to represent that space internally in an EA.

There are two primary approaches one might take in choosing a representation: a phenotypic approach in which individuals represent solutions internally exactly as they are represented externally, and a genotypic approach in which individuals internally represent solutions encoded in a universal representation language. Both approaches have their advantages and disadvantages. A phenotypic approach generally allows for more exploitation of problem-specific properties, but at the expense of more EA software development time. A genotypic approach encourages rapid prototyping of new applications, but makes it more difficult to take advantage of domain knowledge.

Fixed-Length Linear Objects

The simplest and most natural internal representation for an EA involves individuals that consist of fixed-length vectors of genes. Hence, solution spaces that are defined as $N$ dimensional parameter spaces are the simplest and easiest to represent internally in an EA since solutions are described by fixed-length vectors of parameters, and simple internal representations are obtained by considering each parameter a “gene”. In this case the only decision involves whether individual parameters are internally represented phenotypically (i.e., as is) or encoded genotypically (e.g., as binary strings). There are other representationa possible using $\textbf{nonlinear and variable}$ length objects but then a lot of care is required while devising reproductive operators.

Reproductive Operators

It was noted in the previous section that closely tied to EA design decisions about representation are the choices made for reproductive operators. If our EA is searching solution spaces, we need reproductive operators that use the current population to generate interesting new solution candidates. With simple EAs we have two basic strategies for doing so:

perturb an existing solution (mutation) and/or
hybridize existing solutions (recombination).

There is no a priori reason to choose one or the other. In fact, these two reproductive strategies are quite complementary in nature, and an EA that uses both is generally more robust than an EA using either one alone. However, the specific form that these reproductive operators take depends heavily on the choice made for representation.

EA theory provides some additional high-level guidance to help the practitioner. First, EA theory tells us that a smoothly running EA engine has reproductive operators that exhibit high fitness correlation between parents and offspring. That is, reproductive operators are not expected to work magic with low fitness parents, nor are they expected to produce (on average) poor quality offspring from high fitness parents. While fitness correlation does not give specific advice on how to construct a particular reproductive operator, it can be used to compare the effectiveness of candidate operators. Operators that achieve good fitness correlation are those that effectively manipulate semantically meaningful building blocks. This is why the choices of representation and operators are so closely coupled. Finally, in order for mutation and recombination operators to manipulate building blocks effectively, the building blocks must not in general be highly $\textbf{epistatic}$ ; that is, they must not interact too strongly with each other with respect to their effects on the viability of an individual and its fitness.

Mutation

The classic one-parent reproductive mechanism is mutation that operates by cloning a parent and then providing some variation by modifying one or more genes in the offspring’s genome. The amount of variation is controlled by specifying how many genes are to be modified and the manner in which genes are to be modified. These two aspects interact in the following way. It is often the case that there is a natural distance metric associated with the values that a gene can take on, such as Euclidean distance for the real-valued parameter landscapes. Using this metric, one can then quantify the exploration level of mutation operators based on the number of genes modified and the amount of change they make to a gene’s value.

Recombination

The classic two-parent reproductive mechanism is recombination in which subcomponents of the parents’ genomes are cloned and reassembled to create an offspring genome. For simple fixed-length linear genome representations, the recombination operators have traditionally taken the form of $\textbf{crossover}$ operators, in which the crossover points mark the linear subsegments on the parents’ genomes to be copied and reassembled. So, for example, a $1-$ point crossover operator would produce an offspring by randomly selecting a crossover point between genes $i$ and $i + 1$ , and then copying genes $1...i$ from parent $1$ and genes $i + 1...L$ from parent $2$ . Similarly, a $2-$ point crossover operator randomly selects two crossover points and copies segments one and three from parent $1$ and segment two from parent $2$ . For these kinds of reproductive operators, the amount of variation introduced when producing children is dependent on two factors: how many crossover points there are and how similar the parents are to each other. The interesting implication of this is that, unlike mutation, the amount of variation introduced by crossover operating on fixed-length linear genomes diminishes over time as selection makes the population more homogeneous. This dependency on the contents of the population makes it much more difficult to estimate the level of crossover-induced variation. What can be calculated is the variation due to the number of crossover points. One traditional way of doing this is to calculate the “disruptiveness” of a crossover operator. This is done by calculating the probability that a child will inherit a set of $K$ genes from one of its parents. Increasing the number of crossover points increases variation and simultaneously reduces the likelihood that a set of $K$ genes will be passed on to a child. Even the $1-$ and $2-$ point crossover operators described above, which are the ones used in canonical GAs, can be shown to introduce adequate amounts of variation when the parent population is fairly heterogeneous. One of the difficulties with these traditional crossover operators is that, by always choosing a fixed number of crossover points, they introduce a (generally undesirable) distance bias in that genes close together on a genome are more likely to be inherited as a group than if they are widely separated. A solution to this dilemma is to make the number of crossover points a stochastic variable as well. One method for achieving this is to imagine flipping a coin at each gene position. If it comes up heads, the child inherits that gene from parent 1; otherwise, the child inherits that gene from parent $2$ . Hence, a coin flip sequence of $TTHHHH$ corresponds to a $1-$ point crossover, while a sequence of $HHTTTH$ corresponds to a $2-$ point crossover operation. Depending on the coin flip sequence, anywhere from zero to $L − 1$ crossover points can be generated. If the probability of heads is $0.5$ , then the average number of crossover points generated is $L/2$ and has been dubbed “uniform crossover”. The key feature of uniform crossover is that it can be shown to have no distance bias. Hence, the location of a set of genes on a genome does not affect its heritability. However, uniform crossover can be shown to have a much higher level of disruption than $1-$ or $2-$ point crossover, and for many situations its level of disruption (variation) is too high. Fortunately, the level of disruption can be controlled without introducing a distance bias simply by varying from $0.0$ (no disruption) to $0.5$ (maximum disruption) the probability of a heads occurring. This unbiased crossover operator with tunable disruption has been dubbed “parameterized uniform crossover” and is widely used in practice.

Population Size

Parent Population Size

Intuitively, the parent population size can be viewed as a measure of the degree of parallel search an EA supports, since the parent population is the basis for generating new search points. For simple landscapes only small amounts of parallelism are required. However, for more complex, multi-peaked landscapes, populations of $100s$ or even $1000s$ of parents are frequently required.

Offspring Population Size

By contrast, the offspring population size plays quite different roles in an EA. One important role relates the exploration/exploitation balance that is critical for good EA search behavior. The current parent population reflects where in the solution space an EA is focusing its search, based on the feedback from earlier generations. The number of offspring generated is a measure of how long one is willing to continue to use the current parent population as the basis for generating new offspring without integrating the newly generated high-fitness offspring back into the parent population.

Selection

The simple EAs all maintain a population of size $m$ by repeatedly:

using the current population as a source of parents to produce $n$ offspring, and
reducing the expanded population from $m + n$ to m individuals.

Regardless of the particular values of $m$ and $n$ , both steps involve selecting a subset of individuals from a given set. In step one, the required number of parents are selected to produce $n$ offspring. In step two, $m$ individuals are selected to survive. So far, we have seen several examples of the two basic categories of selection mechanisms: $\textbf{deterministic}$ and $\textbf{stochastic}$ selection methods. With deterministic methods, each individual in the selection pool is assigned a fixed number that corresponds to the number of times they will be selected. With stochastic selection mechanisms, individuals in the selection pool are assigned a fixed probability $p_i$ of being chosen. So, for example, in a GA parents are selected stochastically using a fitness-proportional probability distribution. Stochastic selection can be used to add “noise” to EA-based problem solvers in a way that improves their “robustness” by decreasing the likelihood of converging to a sub-optimal solution.

However, more important than whether selection is stochastic or deterministic is how a particular selection mechanism distributes selection pressure over the selection pool of candidate individuals. The various selection schemes can be ranked according to selection pressure strength. The ranking from weakest to strongest is:

uniform
fitness-proportional
linear ranking and binary tournament
nonlinear ranking and tournaments
truncation

Survival Selection

So far there has not been any discussion regarding exactly which individuals are competing for survival. The answer differs depending on whether a particular EA implements an $\textbf{overlapping}$ or $\textbf{non-overlapping}$ generation model. With non-overlapping models, the entire parent population dies off each generation and the offspring only compete with each other for survival. In non-overlapping models, if the offspring population size $n$ is significantly larger than the parent population size $m$ then competition for survival increases. Hence, we see another role that increasing offspring population size plays, namely, amplifying non-uniform survival selection pressure. However, a much more significant effect on selection pressure occurs when using an EA with an overlapping-generation model. In this case, parents and offspring compete with each other for survival. The combination of a larger selection pool $(m + n)$ and the fact that, as evolution proceeds, the $m$ parents provide stronger and stronger competition, results in a significant increase in selection pressure over a non-overlapping version of the same EA.

Convergence and Stopping Criteria

There are a number of EA properties that one can potentially use as indicators of convergence and stopping criteria. Ideally, of course, we want an EA to stop when it “finds the answer”. For some classes of search problems (e.g., constraint satisfaction problems) it is easy to detect that an answer has been found. But for most problems (e.g., global optimization) there is no way of knowing for sure. Rather, the search process is terminated on the basis of other criteria (e.g., convergence of the algorithm) and the best solution encountered during the search process is returned as “the answer”. The most obvious way to detect convergence in an EA is recognizing when an EA has reached a fixed point in the sense that no further changes in the population will occur. The difficulty with this is that only the simplest EAs converge to a static fixed point in finite time. Almost every EA of sufficient complexity to be of use as a problem solver converges in the limit as the number of generations approaches infinity to a probability distribution over population states. To the observer, this appears as a sort of “punctuated equilibrium” in which, as evolution proceeds, an EA will appear to have converged and then exhibit a sudden improvement in fitness. So, in practice, we need to be able to detect when an EA has converged in the sense that a “law of diminishing returns” has set in. As we saw earlier, from a dynamical systems point of view homogeneous populations are basins of attraction from which it is difficult for EAs to escape. Hence, one useful measure of convergence is the degree of homogeneity of the population. This provides direct evidence of how focused the EA search is at any particular time, and allows one to monitor over time an initially broad and diverse population that, under selection pressure, becomes increasingly more narrow and focused. By choosing an appropriate measure of population homogeneity (e.g., spatial dispersion, entropy, etc.) one typically observes a fairly rapid decrease in homogeneity and then a settling into a steady state as the slope of the homogeneity measure approaches zero. A simpler, but often just as effective measure is to monitor the best-so-far objective fitness during an EA run. Best-so-far curves are typically the mirror image of homogeneity curves in that best-so-far curves rise rapidly and then flatten out. A “diminishing returns” signal can be triggered if little or no improvement in global objective fitness is observed for g generations (typically, 10–20). Both of these measures (population homogeneity and global objective fitness improvements) are fairly robust, problem-independent measures of convergence. However, for problem domains that are computationally intensive (e.g., every fitness evaluation involves running a war game simulation), it may be the case that one cannot afford to wait until convergence is signaled. Instead, one is given a fixed computational budget (e.g., a maximum of 10,000 simulations) and runs an EA until convergence or until the computational budget is exceeded.

Returning an Answer

A final benefit of having a simple EA search solution spaces is that the process of returning an answer is fairly straightforward: whenever the stopping criterion is met, the individual with the best objective fitness encountered during the evolutionary run is returned. Of course, there is nothing to prohibit one from returning the $K$ most fit solutions encountered if that is perceived as desirable. This raises the issue of how an EA “remembers” the most fit $K$ individuals encountered during an entire evolutionary run. One solution is to use an EA with an elitist policy that guarantees that the best $K$ individuals will never be deleted from the population. At the other end of the spectrum are “generational” EAs in which parents only survive for one generation. For such EAs the best solutions encountered over an entire run may not be members of the population at the time the stopping criterion is met, and so an additional “memo pad” is required to record the list of the top $K$ solutions encountered. Does it matter which of these two approaches one uses? In general, the answer is yes. The more elitist an EA is, the faster it converges, but at the increased risk of not finding the best solution. In computer science terms, increasing the degree of elitism increases the “greediness” of the algorithm. If we reduce the degree of elitism, we slow down the rate of convergence and increase the probability of finding the best solution. The difficulty, of course, is that if we slow down convergence too much, the answer returned when the stopping criterion is met may be worse than those obtained with more elitist EAs. As noted earlier, appropriate rates of convergence are quite problem-specific and often are determined experimentally.

Generic Genetic Algorithm implementation in Python

$\textbf{Parent selection}$ : All parents are selected once.

$\textbf{Reproductive Operator}$ : One point crossover.

$\textbf{Reproduction}$ : Parents are sorted by fitness and each of the adjacent $m/2$ pairs in the sorted population are used for recombination with $1-$ point crossover reproductive operator.

$\textbf{Survival selection}$ : Non-overlapping i.e. parent population is completely replaced by child population.

from typing import TypeVar, Generic
from collections.abc import Iterable
from dataclasses import dataclass
from abc import ABC, abstractmethod

@dataclass
class Chromosome:
    genes: Iterable[Generic[T]]
    size: int
    fitness: float = None
    age: int = 0

class Problem(ABC):
    @abstractmethod
    def genotype(self):
        pass
    
    @abstractmethod
    def terminate(self):
        pass
      
    @abstractmethod
    def fitness_fun(self):
        pass

class Crossover:
    @staticmethod
    def one_point_crossover(x, y):
        from random import randint
        size = len(x.genes)
        cx_point = randint(0, size)
        (h1, t1), (h2, t2) = np.split(x.genes,[cx_point]), np.split(y.genes,[cx_point])
        return (Chromosome(genes = np.concatenate([h1, t2]), size = size), 
                Chromosome(genes = np.concatenate([h2, t1]), size = size))

class Selection:
    @staticmethod
    def elite(population, fitness_fun, n):
        return sorted(population, fitness_fun, reverse=True)[:n]
        
    @staticmethod
    def random(population, n):
        from random import choices
        return choices(population, k=n)

class GeneticAlgorithm:
    def __init__(self, problem):
        self.problem = problem
        self.population = None

    def initialize(self):
        return [self.problem.genotype() for _ in range(self.problem.population_size)]

    def evaluate(self):
      for i in range(len(self.population)):
          self.population[i].fitness = self.problem.fitness_fun(self.population[i])
          self.population[i].age += 1
      self.population.sort(key=lambda x:x.fitness, reverse=True)
      
    def reproduce(self):
        def chunks(lst, n):
            for i in range(0, len(lst), n):
                yield lst[i:i + n]

        new_population = []
        for x,y in chunks(self.population, 2):
            new_population.extend(list(self.problem.crossover(x,y)))
        self.population = new_population
 
    def run(self):
        self.population = self.initialize()
        while(not self.problem.terminate(self.population)):
            self.reproduce()
            self.evaluate()
        return self.population[0]

Applications

The N-Queens problem

Arrange $N$ queens on a $N\times N$ chess board so that none of the queens conflict with another. This problem, known as $N$ -queens, is a fundamental constraint satisfaction problem

Solution

$\textbf{Initial Population}$ : A random permutation of $0,\dots, N-1$ indicating the position of each queen in a row.

$\textbf{Fitness function}$ : $N - \text{number of diagonal clashes}$

$\textbf{Parent selection}$ : All parents are selected once.

$\textbf{Reproductive Operator}$ : One point crossover.

$\textbf{Reproduction}$ : Parents are sorted by fitness and each of the adjacent $m/2$ pairs in the sorted population are used for recombination with $1-$ point crossover reproductive operator.

$\textbf{Survival selection}$ : Non-overlapping i.e. parent population is completely replaced by child population.

$\textbf{Termination condition}$ : $fitness = N$ .

class NQueens(Problem):
    def __init__(self, l, population_size, crossover):
        self.population_size = population_size
        self.crossover = crossover
        self.l = l

    def fitness_fun(self, chromosome):
        diag_clashes = 0
        for i in range(chromosome.size):
            for j in range(chromosome.size):
                if i != j:
                  dx = abs(i - j)
                  dy = abs(chromosome.genes[i] - chromosome.genes[j])
                  diag_clashes += 1 if dx == dy else 0
        return len(np.unique(chromosome.genes)) - diag_clashes

    def genotype(self):
        genes = np.random.permutation(self.l)
        chromosome = Chromosome(genes=genes, size=self.l)
        chromosome.fitness = self.fitness_fun(chromosome)
        return chromosome

    def terminate(self, population):
        if population[0].fitness == self.l:
            return True
        else:
            return False

ga_n_queens = GeneticAlgorithm(NQueens(8, 1000, Crossover.one_point_crossover))
print(ga_n_queens.run())

References

Evolutionary Computation: A Unified Approach by Kenneth A De Jong

Are You Clever Enough?

Simulated Annealing