Forecasting with Genetic Programming (in Python)
In the last months I’ve been investigating various methodologies to perform an effective forecasting, especially for energy demand time series. Given the various talks about Genetic Programming I attended in the last Evo* conferences (EUROGP conference is among them), I wanted to implement a forecasting method using Genetic Programming and my first choice was pyevolve framework, an amazing simple and powerful (and updated!) library for Evolutionary Computation written in Python.
I was surprised that the first (working) implementation came out only after twenty minutes, using the simple tutorial the author Christian Perone provides in his blog.
Everything starts in this way:
from pyevolve import * import pylab import math |
Then you define a main function, where I put the data load routine from a .csv file
def main_run(): # Load data from file.csv myReader = csv.reader(open('filename.csv', 'rb'), delimiter = ',') for row in myReader: # do something |
Then with the following simple lines you define a classical GP program setting the maximum trees depth and selecting the fitness function (eval_func in this case):
genome = GTree.GTreeGP() genome.setParams(max_depth=3, method="ramped") genome.evaluator.set(eval_func) ga = GSimpleGA.GSimpleGA(genome) |
Then there is the most important part, the definition of terminals and functions:
ga.setParams(gp_terminals = ['x1', 'x2','x3','x4','x5','x6','x7', 'ephemeral:random.random()'], gp_function_prefix = "gp") |
Here I defined ‘x1‘ … ‘x7‘ as , thus I added a constant, a random float between 0 and 1 (you can use any kind of random function within random python package). For the functions is very easy, all the python functions whose name is starting with the specified prefix are considered GP functions (in my case I have gp_add, gp_mul and so on for other arithmetic operators).
Now you can end with the last settings:
ga.setMinimax(Consts.minimaxType["minimize"]) ga.setGenerations(1000) ga.setMutationRate(0.08) ga.setCrossoverRate(0.1) ga.setPopulationSize(2000) ga.evolve(freq_stats=5) best = ga.bestIndividual() best.writeDotImage("tree_pi.jpg") print best |
where you set evolutionary parameters (crossover, mutation etc), statistics console outputs, the plot of the best individuals and also the output to a png file of the tree of the best individual.
Now you can define the GP functions that as stated above whose name starts with ‘gp’:
def gp_add(a, b): return a+b def gp_sub(a, b): return a-b def gp_mul(a, b): return a*b |
And finally the fitness function:
def eval_func(chromosome): code_comp = chromosome.getCompiledCode() err = 0; for i in xrange(8, len(target)-1): x1 = target[i-1] x2 = target[i-2] # ... x7 = target[i-7] evaluated = eval(code_comp) err += (target[i] - evaluated) return (abs(err)) |
Here you set as error function the sum of the absolute prediction errors, you ‘compile’ the GP tree as python code and then for all the time series samples (that I defined as a list surprisingly called target) the for loop evaluates the error between target value and obtained value computing in the end the error.
At this point you can run your code and, et voilà, you obtain a predictor for your forecasting problem. Now you can really start to do research getting an accurate predictor tuning the algorithm parameters, selecting the right variables and constant as the GP functions. To get more information about pyevolve check also the specific Google group.