Generators and Coroutines are very powerful tools in Python that can help simplify logic, speed up data-intensive programs, or provide flexible and reusable APIs. In this post, we will explore three main concepts in Python: Generators, Coroutines and Co-Generators.
Generators
Generators in Python are objects that contain some sort of internal state, and know how to produce the “next” value in a sequence.
Before we dig into how generators work, we should talk about what problems they can help solve! By using generators you can:
- Iterate over data structures in a way that decouples your logic from the data structure
- Replace callbacks with iteration: perform work, and yield a value whenever you want to report back to the caller
- Process data in small chunks so that only a small portion of the data is ever loaded into memory (lazy evaluation; see the sketch below)
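As a quick taste of that last point, a generator expression lets you scan a file of arbitrary size while holding only one line in memory at a time. A minimal sketch, assuming a hypothetical log file app.log exists:

with open('app.log') as f:
    # The generator expression yields matching lines one at a time;
    # the full file is never loaded into memory.
    error_lines = (line for line in f if 'ERROR' in line)
    print(sum(1 for _ in error_lines))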
Generators provide many methods, but the ones that we will focus on are `__iter__` and `__next__`. `__next__` allows you to call `value = next(some_generator)`; this call tells the generator to update its internal state and give you the next value. `__iter__` allows your generator to implement the iterator interface so that you can iterate over your generator using the `for element in some_generator` syntax (usually, if you already implement `__next__`, your `__iter__` will just return `self`; otherwise you can have an object create a new iterator object and return that).
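Before building our own, note that every built-in container already plays this game: `iter()` asks an object for an iterator via `__iter__`, and `next()` advances it via `__next__`:

it = iter([1, 2, 3])  # the list's __iter__ returns a fresh iterator
print(next(it))       # 1 -- each next() call advances the iterator's state
print(next(it))       # 2
print(next(it))       # 3; one more next() would raise StopIteration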
Generators using classes
Let’s define a simple program that will use a generator to produce Fibonacci numbers. In this example, we will implement the generator class ourselves.
class FibonacciGenerator:
    def __init__(self, n1=0, n2=1, max_iters=100):
        self.max_iters = max_iters
        self.current_iter = 0
        self.n1 = n1
        self.n2 = n2

    def __next__(self):
        if self.current_iter < self.max_iters:
            self.current_iter += 1
            # Compute the next Fibonacci number and shift the window forward.
            sum_ = self.n1 + self.n2
            self.n1 = self.n2
            self.n2 = sum_
            return sum_
        else:
            # Signal to the caller that the generator is exhausted.
            raise StopIteration

    def __iter__(self):
        return self
In `__init__`, we set the current iteration count, the maximum number of iterations, and the first two Fibonacci numbers `n1` and `n2`. In the `__next__` method, we check whether we are under the maximum number of iterations and, if so, compute the next Fibonacci number, update `n1` and `n2`, and return the new number. `__iter__` is very simple: we can just return the object itself, since the `FibonacciGenerator` class already implements `__next__`.
We can then use this class to easily compute and iterate over Fibonacci numbers. We can exhaust all of the numbers by invoking `[e for e in gen]`. If we try to get another value after the generator has been used up, a `StopIteration` exception will be raised.
gen = FibonacciGenerator(max_iters=10)
nums = [e for e in gen]
print(nums)

try:
    v = next(gen)
except Exception as e:
    print("failed")
Generators using the Yield keyword
As seen above, we can implement a generator manually using a class. However, this requires a lot of boilerplate and somewhat obfuscates what the generator is actually doing, especially when the internal state is more complex than a few integers. Fortunately, Python can create a generator instance for us directly from function code when we use the `yield` keyword! Let's implement our Fibonacci number generator using `yield`.
def fibonacci_generator(n1=0, n2=1, max_iters=100):
    for i in range(max_iters):
        sum_ = n1 + n2
        n1 = n2
        n2 = sum_
        yield sum_
Looking at the code above, it is already much simpler and clearer than the class-based example. When `yield` is used in a function like this, Python automatically turns a call to the function into a generator instance, and the `yield` keyword acts somewhat like a return statement. More specifically, when `next(generator)` is called, the function runs as expected until it encounters the `yield` keyword; the yielded value is returned to the caller, and the function pauses until the caller invokes `next(generator)` again.
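Because the body only runs on demand, nothing stops us from writing a generator that never terminates and letting the caller decide how many values to take. A small sketch (the name `fibonacci_forever` is our own, not from the example above):

import itertools

def fibonacci_forever(n1=0, n2=1):
    # An unbounded generator: each value is computed only when requested.
    while True:
        n1, n2 = n2, n1 + n2
        yield n2

print(list(itertools.islice(fibonacci_forever(), 5)))  # [1, 2, 3, 5, 8]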
We can examine the generator and see that its behavior is identical to our class-based example:
gen = fibonacci_generator(max_iters=10)
nums = [e for e in gen]
print(gen)
print(nums)

try:
    v = next(gen)
except Exception as e:
    print("failed")
Example of using a generator to process data structures
Now that we’ve seen the details of implementing and invoking generators, we can look at an example of using a generator to traverse a data structure while keeping the processing logic separate.
Let’s say that we have some data stored in a binary tree. Any logic that we want to perform on the tree would normally mix our “business” logic and our traversal logic in the same place. Alternatively, we can implement a generator that traverses the tree node by node and yields the value at each step. We can then implement sum, min and max operations efficiently, without needing access to the internals of the traversal.
class Node:
    def __init__(self, val, l, r):
        self.val = val
        self.l = l
        self.r = r

def traverse_tree(root):
    # Pre-order traversal: yield the current node, then recurse left and right.
    yield root.val
    if root.l is not None:
        for e in traverse_tree(root.l):
            yield e
    if root.r is not None:
        for e in traverse_tree(root.r):
            yield e

if __name__ == '__main__':
    a = Node(1, None, None)
    b = Node(2, None, None)
    c = Node(4, None, None)
    d = Node(8, None, None)
    e = Node(-5, None, None)
    a.l = b
    a.r = c
    b.l = d
    c.l = e

    all_vals = [e for e in traverse_tree(a)]
    print(all_vals)

    # The min/max logic never needs to know how the tree is traversed.
    max_ = a.val
    min_ = a.val
    for val in traverse_tree(a):
        if val > max_:
            max_ = val
        if val < min_:
            min_ = val
    print(max_, min_)
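As an aside, since Python 3.3 the recursive loops in `traverse_tree` can be written more compactly with the `yield from` statement, which delegates iteration to a sub-generator:

def traverse_tree(root):
    yield root.val
    if root.l is not None:
        yield from traverse_tree(root.l)  # delegate to the left subtree
    if root.r is not None:
        yield from traverse_tree(root.r)  # delegate to the right subtree

This version behaves identically to the loop-based one above.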
Coroutines
Coroutines share a lot of similarities with generators, but they provide a few extra methods and differ a bit in how the `yield` keyword is used. In essence, coroutines consume values sent by the caller, instead of returning values to the caller. In terms of technical details, the main differences are:
- Coroutines use `send(val)` instead of `__next__()`. The coroutine then has access to the value that was sent.
- Coroutines need to be “primed”: you must advance a coroutine to its first `yield` before you can start sending values into it (calling `send()` with a value on an unprimed coroutine raises a `TypeError`).
- Like generators, coroutines are suspended at the `yield` keyword. This can lead to unintuitive behavior if you are not expecting it.
Let’s implement a simple coroutine that accepts values and prints them. The key change here is that we will access the value sent by the caller with `val = (yield)`.
def simple_coroutine(max_iters=5):
    for i in range(max_iters):
        print(f'before yield i = {i}')
        val = (yield)
        print(f"after yield, val = {val}, i = {i}")
To use the coroutine, we need to “prime” it by calling either `coroutine.send(None)` or `next(coroutine)` (these two statements are equivalent). Once the coroutine has been primed, we can drive it in much the same manner as a generator, with the difference that `coroutine.send(val)` is used in place of `next(generator)`. Coroutines even fail the same way as generators if exhausted: a `StopIteration` is raised.
coroutine = simple_coroutine()

# need to prime the coroutine
print('before priming')
next(coroutine)

print('before send')
coroutine.send('Dummy val a')
print('main thread after a')
coroutine.send('Dummy val b')
print('main thread after b')
coroutine.send('Dummy val c')
print('main thread after c')
coroutine.send('Dummy val d')

try:
    coroutine.send('Dummy val e')
except StopIteration:
    print("failed")
The Yield keyword
The `yield` keyword can seem to behave unintuitively, but we can break down exactly what it does to understand the underlying model.
- The function runs until it encounters the `yield` keyword; the function is then suspended.
- If you `yield value`, then `value` is returned to the caller of `send(...)` or `next()`.
- The function then waits until the next time `send(...)` or `next()` is called. If a value is sent, you can access it with `value = (yield)`.
Additionally, we can combine generator and coroutine syntax into `send_val = yield return_val`. This implies that we can have objects which are both generators and coroutines.
Co-Generators
I like to call objects that are both generators and coroutines “Co-Generators”, as the name helps disambiguate how the object should be interacted with. We can now implement a co-generator which will both accept and yield values.
When dealing with co-generators, it is key to understand that the statement `sent_val = yield return_val` executes in two distinct stages. Upon `next(cogenerator)` or `cogenerator.send(None)`, the function executes up to the `yield return_val` expression, which immediately returns the value to the caller. The function is then suspended until the next call to `cogenerator.send(some_val)`, which takes the value, passes it into the function, and assigns it to `sent_val`. This means that you can have some external code run after `return_val` is yielded but before `sent_val` is assigned!
Below we can see an example of a co-generator that accepts and yields values, with several print statements that will execute between the two stages of the yield evaluation.
def complex_cogenerator(max_iters=5):
    print('start of cogenerator')
    for i in range(max_iters):
        print(f'start of loop, i={i}')
        val = yield i
        print(f'end of loop, i={i}, val={val}')
    print('end of cogenerator')
    yield None

if __name__ == '__main__':
    print('start of main')
    co_gen = complex_cogenerator()
    print('after cogenerator creation')
    v = next(co_gen)
    print(f'after cogenerator priming, v={v}')
    while v is not None:
        print(f'main thread before send, v={v}')
        v = co_gen.send('Dummy val a')
        print(f'main thread after send, v={v}')
When running this example, we will note a few things:
- No logic runs when `complex_cogenerator()` is called. In fact, this call behaves like an initializer rather than an actual function call.
- The co-generator needs to be primed before it can be used, but after priming we can iterate over the co-generator using a `while` loop.
- The order of execution of the print statements is non-obvious, but it makes sense when accounting for the two-step execution of the `a = yield b` statement (traced below).
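For reference, the first few prints appear in this order when the example runs; the middle pattern then repeats until the co-generator yields None and the while loop exits:

start of main
after cogenerator creation
start of cogenerator
start of loop, i=0
after cogenerator priming, v=0
main thread before send, v=0
end of loop, i=0, val=Dummy val a
start of loop, i=1
main thread after send, v=1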
Conclusion
In this article, we took a look at generator classes, generators using `yield`, coroutines using `yield`, and how to combine generators and coroutines into co-generators.
Hopefully you will be able to leverage this knowledge to build better abstractions around your data structures for more flexible, robust and performant code.