Among the new features that will be introduced in PHP 5.5, probably the most exciting one is the concept of generators.
What are Generators?
Let's first look at what Wikipedia has to say about generators:
In computer science, a generator is a special routine that can be used to control the iteration behaviour of a loop.
A generator is very similar to a function that returns an array, in that a generator has parameters, can be called, and generates a sequence of values.
So basically a generator is a special function that creates a sequence of values and is used in loops. How, then, does it differ from a regular function returning an array?
There are cases where you cannot generate all values at once (think of infinite sequences or memory limitations) and this is where generators come in very handy as they produce the values on demand.
Generators in other languages
Generators first appeared back in 1975 in a language called CLU, which was created at MIT. CLU also introduced other features, such as multiple return values and multiple assignment, that influenced modern languages like Ruby, Python and Go.
First, let's start off by converting the above Python script into its PHP equivalent:
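As a minimal sketch of what such a PHP generator might look like, assuming a function named fib() that yields the Fibonacci numbers without an upper bound (the variable names and the ten-value cut-off in the consumer are illustrative assumptions):

```php
<?php
// Sketch of a Fibonacci generator; variable names are illustrative.
function fib()
{
    $a = 0;
    $b = 1;
    while (true) {
        yield $a;          // hand the current number to the caller
        $next = $a + $b;   // compute the following number on demand
        $a = $b;
        $b = $next;
    }
}

// The generator never terminates on its own, so the consumer
// decides when to stop. Here we keep the first ten values.
$first = array();
foreach (fib() as $number) {
    $first[] = $number;
    if (count($first) === 10) {
        break;
    }
}

echo implode(', ', $first), "\n";
```

Because the values are produced one at a time, the endless while loop is not a problem: nothing is computed beyond what the foreach loop actually asks for.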
The fib() function implements a generator that produces the Fibonacci numbers one by one. For each computed value it uses the yield keyword with the value as parameter. PHP 5.5 automatically treats every function that contains a yield statement as a generator.
Generators are implemented as a class, Generator, that implements the Iterator interface.
The initial call to a generator function will return an object of the Generator class. This can easily be tested:
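One way to check this, sketched with a hypothetical generator function called counter() (the name and the yielded values are assumptions made only for this illustration):

```php
<?php
// Hypothetical generator function used only for this check.
function counter()
{
    yield 1;
    yield 2;
}

// Calling the function does not run its body yet;
// it only creates the Generator object.
$gen = counter();

$isGenerator   = $gen instanceof Generator;
$isIterator    = $gen instanceof Iterator;
$isTraversable = $gen instanceof Traversable;

var_dump($isGenerator, $isIterator, $isTraversable);

// Since the object is an Iterator, existing iterator helpers accept it.
$values = iterator_to_array($gen);
var_dump($values);
```

All three instanceof checks should report true, and iterator_to_array() consumes the generator just like any other iterator.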
Iterators have been available in PHP for quite some time, and implementing Generators in the same manner makes them automatically usable in places where you earlier would have used an Iterator. So we now know how a Generator is implemented, but what about that yield keyword?
The yield keyword behaves much like return. It passes a value back to the caller of the function, but instead of completely removing the function from the stack, its state is saved for re-entry. Whenever the generator is resumed (in this case by a call to next() on the Generator object - remember, it implements the Iterator interface), execution is passed back to the point of the last yield.
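The hand-over between caller and generator can be made visible with a small trace. This is only a sketch: the steps() function and the $log array it writes to are hypothetical names introduced for the illustration.

```php
<?php
// Hypothetical generator that records when each part of its body runs.
function steps(array &$log)
{
    $log[] = 'generator: started';
    yield 'first';
    $log[] = 'generator: resumed after first yield';
    yield 'second';
    $log[] = 'generator: resumed after second yield';
}

$log = array();
$gen = steps($log);             // no generator code has run yet

$value = $gen->current();       // runs the body up to the first yield
$log[] = 'caller: got ' . $value;

$gen->next();                   // resumes after the first yield
$value = $gen->current();
$log[] = 'caller: got ' . $value;

$gen->next();                   // resumes and runs to the end
$log[] = 'caller: valid = ' . var_export($gen->valid(), true);

echo implode("\n", $log), "\n";
```

The log entries alternate between generator and caller, which shows that the function body is suspended at each yield and continued from exactly that point on the next call to next().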
In PHP, unlike other languages, Iterators always consist of a key and a value. Yield supports yielding keys using a syntax similar to that used in foreach loops and associative arrays: yield $key => $value;. Internally, generators need to generate a key if none was explicitly yielded. The behaviour of arrays was adopted here: By default the keys start with the integer value 0 and auto-increment by one. If an explicitly yielded key is larger than the current key, it is used as the new starting point for auto-generated keys.
Other keys, like strings, do not affect the mechanism of auto-incremented keys.
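The key rules described above can be observed with a short sketch. The keyed() function and its values are hypothetical and chosen only to exercise automatic, explicit integer and string keys:

```php
<?php
// Hypothetical generator mixing automatic and explicit keys.
function keyed()
{
    yield 'a';            // automatic key 0
    yield 5 => 'b';       // explicit integer key 5
    yield 'c';            // automatic key continues after 5
    yield 'name' => 'd';  // string key, leaves the counter untouched
    yield 'e';            // next automatic key
}

$keys = array();
foreach (keyed() as $key => $value) {
    $keys[] = $key;
    echo var_export($key, true), ' => ', $value, "\n";
}
```

The keys seen by the foreach loop are 0, 5, 6, 'name' and 7, mirroring how PHP arrays assign the next integer index.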
To test the performance of generators against other common methods of iteration, Nikita Popov wrote a micro-benchmark that measured the execution time of a simple loop with one hundred, ten thousand and one million iterations. The methods tested are an iterator implementation, the range function, a self-generated array and generators (here xrange).
His initial tests showed that generators consistently outperform all other methods by a factor of at least 1.5 given a high number of iterations (>= 10000).
I ran his benchmark on my Thinkpad x61s with a self-compiled PHP 5.5 Alpha 1 on Ubuntu 12.04 and used memory_get_peak_usage() to get the maximum amount of memory used for each test. The tests were separated into multiple files to ensure that the memory usage really comes from a single method.
The tests were run multiple times, which showed that the results obtained are consistent over time. The results in the listing above come from a single run and confirm the numbers obtained by Nikita Popov. They also show that generators are memory efficient in that their memory usage is O(1). The memory usage of range and urange (the self-generated array) scales linearly and quickly lets PHP run into its memory limit.
Note: The benchmark done here is still very simple and results in real use cases can vary (e.g. iterating over objects instead of numbers).
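To get a feeling for the memory difference without the full benchmark, a rough sketch along the following lines may help. It is not the benchmark described above: the xrange() signature is an assumption, and instead of memory_get_peak_usage() in separate files it compares memory_get_usage() deltas inside a single script, so the absolute numbers will differ between PHP versions and machines.

```php
<?php
// Assumed generator counterpart to range(); the signature is illustrative.
function xrange($start, $end, $step = 1)
{
    for ($i = $start; $i <= $end; $i += $step) {
        yield $i;
    }
}

$n = 100000;

// Memory needed to hold the complete array built by range().
$before = memory_get_usage();
$array = range(1, $n);
$arrayBytes = memory_get_usage() - $before;

// Memory needed for the generator object, which holds no values yet.
$before = memory_get_usage();
$gen = xrange(1, $n);
$genBytes = memory_get_usage() - $before;

// Both variants produce the same sequence.
$sumArray = 0;
foreach ($array as $value) {
    $sumArray += $value;
}
$sumGen = 0;
foreach ($gen as $value) {
    $sumGen += $value;
}

printf("range():  %d bytes\n", $arrayBytes);
printf("xrange(): %d bytes\n", $genBytes);
```

The array variant has to allocate storage for every element up front, so its footprint grows with $n, while the generator only keeps its current state.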
Generators seem to outperform Iterators and arrays for big data sets in both speed and memory efficiency. They are also much faster to implement, since you do not have to write a class that implements the Iterator interface. Personally, this is one of the few features in PHP 5.5 I am really excited about, but given the slow adoption rate of new PHP versions by most web hosts, I fear that it will take a long time until we see them used in popular projects. PHP 5.5 itself is roughly scheduled for March 2013 (one year after PHP 5.4 was released), so implementation details might change until then.