Genetic algorithms (referred to hereafter as GAs) are a family of search algorithms that seek optimal solutions to problems by applying the principles of natural selection and evolution. GAs can be applied to almost any optimization problem and are especially useful for problems where other calculus-based techniques do not work, such as when the objective function has many local optima, when it is not differentiable or continuous, or when solution elements are constrained to be integers or sequences. In most cases GAs require more computation than specialized techniques that take advantage of specific problem structure or characteristics. However, for optimization problems with no such techniques available, GAs provide a robust general method of solution. The current GA implementation in IML is experimental, and will be further developed and tested in later SAS releases.
In general, GAs use the following procedure to search for an optimum solution:
An initial population of solutions is randomly generated, and an objective function value is evaluated for each member of the solution population.
A new solution population is generated from the current solution population. First, individual members are chosen stochastically to parent the next generation, such that those that are the "fittest" (have the best objective function values) are more likely to be picked. This process is called selection. Each chosen solution is either copied directly to the next generation or passed to a crossover operator, according to a user-specified crossover probability. The crossover operator combines two or more parents to produce new offspring solutions for the next generation. A fraction of the next-generation solutions, selected according to a user-specified mutation probability, is then passed to a mutation operator that introduces random variations into the solutions.
The crossover and mutation operators are commonly called genetic operators. The crossover operator passes characteristics from each parent to the offspring, especially those characteristics the parents have in common. It is selection and crossover that distinguish GAs from a purely random search and direct the algorithm toward finding an optimum. Mutation is designed to ensure diversity in the search, to prevent premature convergence to a local optimum.
As the final step in regeneration, the current population is replaced by the new solutions generated by selection, crossover, and mutation. The objective function values are then evaluated for the new generation. A common variation on this approach that is supported in IML is to pass one or more of the best solutions from the current population on to the next population unchanged. This often leads to faster convergence and ensures that the best solution generated at any time during the optimization is never lost.
After regeneration, the process checks a stopping criterion, such as the number of iterations or some other convergence criterion. If the stopping criterion is not met, the algorithm loops back to the regeneration step.
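The regeneration loop just described (selection, crossover, mutation, elitism, and a stopping check) can be sketched in a few lines. The following Python sketch is illustrative only: IML's GA routines implement these steps internally, and every name below (`evolve`, the averaging crossover, the Gaussian mutation) is a hypothetical choice for a simple real-vector minimization, not the IML implementation.

```python
import random

def evolve(objective, pop, n_gen=100, p_cross=0.8, p_mut=0.05, n_elite=1):
    """Minimize `objective` over a population of real vectors.
    Illustrative sketch of the GA regeneration loop; not IML code."""
    for _ in range(n_gen):
        scored = sorted(pop, key=objective)
        elites = scored[:n_elite]             # best solutions pass unchanged
        new_pop = list(elites)
        while len(new_pop) < len(pop):
            # tournament selection: the fitter of two random members parents
            p1 = min(random.sample(scored, 2), key=objective)
            p2 = min(random.sample(scored, 2), key=objective)
            if random.random() < p_cross:     # crossover: average the parents
                child = [(a + b) / 2 for a, b in zip(p1, p2)]
            else:                             # otherwise copy a parent directly
                child = list(p1)
            if random.random() < p_mut:       # mutation: random perturbation
                i = random.randrange(len(child))
                child[i] += random.gauss(0, 0.1)
            new_pop.append(child)
        pop = new_pop                         # replace the current population
    return min(pop, key=objective)
```

Because the elite members are copied before crossover and mutation are applied, the best objective value found can never worsen from one generation to the next, which is the point of the elitist variation described above.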
Although GAs have been demonstrated to work well for a variety of problems, there is no guarantee of convergence to a global optimum. Also, the convergence of GAs can be sensitive to the choice of genetic operators, mutation probability, and selection criteria, so that some initial experimentation and fine-tuning of these parameters is often required.
In the traditional formulation of GAs, the parameter set to be searched is mapped into finite-length bit strings, and the genetic operators applied to these strings, or chromosomes, are based on biological processes. While there is a theoretical basis for the effectiveness of GAs formulated in this way (Goldberg 1989), in practice most problems do not fit naturally into this paradigm. Modern research has shown that optimizations can be set up using the natural solution domain (for example, a real vector or integer sequence) and applying crossover and mutation operators analogous to the traditional genetic operators, but more appropriate to the natural formulation of the problem (Michalewicz 1996). This latter approach is sometimes called evolutionary computing. IML implements the evolutionary computing approach because it makes it much easier to formulate practical problems with realistic constraints. Throughout this documentation, the term "genetic algorithm" should be interpreted in the sense of evolutionary computing.
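The advantage of operating in the natural solution domain can be seen in two small operator sketches. Both functions below are hypothetical illustrations, not IML operators: a swap mutation applied to an integer sequence always yields another valid permutation (something a raw bit flip on an encoded string cannot guarantee), and a blend crossover on real vectors produces a child whose components lie between the parents'.

```python
import random

def swap_mutation(seq, rng=random):
    """Mutation in the natural domain of an integer sequence: swap two
    positions, so the result is still a valid permutation."""
    out = list(seq)
    i, j = rng.sample(range(len(out)), 2)
    out[i], out[j] = out[j], out[i]
    return out

def blend_crossover(x, y, rng=random):
    """Crossover in the natural domain of real vectors: each child
    component is a random convex combination of the parent components."""
    return [a + rng.random() * (b - a) for a, b in zip(x, y)]
```

Operators like these respect the constraints of the problem by construction, which is why the natural-domain formulation makes realistic constraints easier to handle than a bit-string encoding.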
IML provides a flexible framework for implementing GAs, enabling you to write your own modules for the genetic operators and objective function, as well as providing some standard genetic operators that you can specify. This framework also enables you to introduce some variations to the usual GA, such as adapting the optimization parameters during the optimization, or incorporating some problem-specific local optimizations into the process to enhance convergence.
An IML program to do GA optimization is structured differently from a program doing nonlinear optimization with the NLP routines. With the NLP routines, generally a single call is made in which the user specifies the objective and optimization parameters, and that call runs the optimization process to completion. In contrast, to perform a GA optimization you use separate calls to the GA routines to specify the problem encoding (GASETUP), genetic operators (GASETMUT and GASETCRO), objective function (GASETOBJ), and selection criteria (GASETSEL). You then call the GAINIT routine to initialize the problem population. After that, you advance the optimization process by calling GAREGEN (for the regeneration step) within an IML loop. Within the loop you can use GAGETMEM and GAGETVAL calls to retrieve population members and objective function values for examination. This strategy enables you to monitor the convergence of the GA, adjust optimization parameters with GA routine calls within the loop, and exit the loop when the GA is not producing further improvement in the objective function. The next section explains the optimization parameters in more detail and gives guidance on how they should be set.
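The program shape described above (set up the problem, initialize a population, then regenerate inside a loop while inspecting the best value) can be mirrored in a language-neutral sketch. The IML routine names in the text are the real interface; the `GA` class and `run` loop below are hypothetical Python stand-ins meant only to show the monitoring pattern: advance one generation at a time, track the best objective value, adapt a parameter (here the mutation probability) when progress stalls, and exit early when no further improvement appears.

```python
import random

class GA:
    """Hypothetical stand-in for the GA state that IML manages internally
    (not the IML API): holds a population and advances it one
    regeneration at a time."""
    def __init__(self, objective, pop, p_mut=0.1):
        self.obj, self.pop, self.p_mut = objective, pop, p_mut

    def regen(self):
        ranked = sorted(self.pop, key=self.obj)
        half = ranked[:len(ranked) // 2]          # select the fitter half
        children = []
        for p1, p2 in zip(half, reversed(half)):  # pair parents up
            child = [(a + b) / 2 for a, b in zip(p1, p2)]
            if random.random() < self.p_mut:      # occasional perturbation
                k = random.randrange(len(child))
                child[k] += random.gauss(0, 0.1)
            children.append(child)
        self.pop = half + children                # replace the population

    def best(self):
        b = min(self.pop, key=self.obj)
        return b, self.obj(b)

def run(ga, max_gen=200, stall_limit=25, tol=1e-8):
    """Monitoring loop in the style the text describes: regenerate,
    inspect the best value each iteration, and stop when improvement
    stalls for `stall_limit` consecutive generations."""
    best_val, stall = float("inf"), 0
    for _ in range(max_gen):
        ga.regen()
        _, val = ga.best()
        if val < best_val - tol:
            best_val, stall = val, 0
        else:
            stall += 1
            ga.p_mut = min(1.0, ga.p_mut * 1.5)   # adapt mutation when stalled
            if stall >= stall_limit:
                break
    return best_val
```

Keeping the regeneration step inside a user-controlled loop, rather than behind a single opaque call, is what makes this kind of mid-run adjustment and early exit possible.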