Generators

An iterator is an object that provides sequential access to the elements of a container (e.g., a list, tuple, or dictionary) without requiring all elements to be processed or copied into memory at once. Instead, it reads from the existing container directly, returning one element at a time (lazy evaluation). The built-in next() function advances the iterator and returns the next item (it does this by calling the iterator's __next__() method). If there are no more items to return, it raises a StopIteration exception.

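Do not confuse an iterator with an iterable. An iterable is any object we can loop over with a for loop (such as a list, tuple, or string), and for containers like lists, iter() returns a fresh iterator each time it is called. An iterator keeps track of its current position and can be traversed only once. Every iterator is also an iterable (its __iter__() returns the iterator itself), but a list, for example, is an iterable that is not an iterator.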
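The difference is easy to check directly. A minimal sketch with a small list (the name it is illustrative):

```python
l = ["1", "2", "3"]

it = iter(l)              # a list is iterable: iter() returns a new iterator
print(it is l)            # False
print(iter(it) is it)     # True: an iterator returns itself from iter()

# A list is not an iterator, so next() cannot be called on it directly
try:
    next(l)
except TypeError as error:
    print("TypeError:", error)

# An iterator can be consumed only once
print(list(it))           # ['1', '2', '3']
print(list(it))           # []
print(list(l))            # ['1', '2', '3']: the list itself can be iterated again
```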


l = ["1", "2", "3"]
iter_l = iter(l) # iter() is a built-in function that returns an iterator for an iterable object

# Iterating through the list
try:
    while True:
        print(next(iter_l))
except StopIteration:
    pass

# Restarting the iterator
iter_l = iter(l)

if len(l) > 0:
    first_element = next(iter_l) # retrieving the first element ("1")
    l.remove(first_element) # removing it from the list; the old iterator is now out of sync (its next item would be "3", skipping "2")

# Displaying the list after removal
iter_l = iter(l)
try:
    while True:
        print(next(iter_l))
except StopIteration:
    pass
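The example above creates a fresh iterator after removing the element because the old one no longer matches the list. A list iterator keeps an internal index, so after one next() call it points at position 1; removing "1" shifts "2" into position 0, and the old iterator skips it. A minimal sketch (the names first and remaining are illustrative):

```python
l = ["1", "2", "3"]
iter_l = iter(l)

first = next(iter_l)   # "1"; the iterator now points at index 1
l.remove(first)        # the list becomes ["2", "3"], so "2" moves to index 0

# The old iterator continues from index 1, so "2" is skipped
remaining = list(iter_l)
print(remaining)       # ['3']
```

The safe pattern is the one used above: finish modifying the list first, then create a new iterator.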

We can avoid the StopIteration exception by passing a second argument to next(): a default value that is returned, instead of raising the exception, once the iterator is exhausted.


l = ["1", "2", "3"]
iter_l = iter(l)
while True:
    value = next(iter_l, "End")
    if value == "End":
        break
    print(value)
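Comparing against a string such as "End" works only as long as that string never appears in the data. A common alternative, sketched below, is to pass a unique sentinel object created with object() and compare by identity; the names sentinel and values are illustrative.

```python
l = ["1", "End", "3"]
iter_l = iter(l)

sentinel = object()  # a unique object that cannot appear in the data
values = []
while True:
    value = next(iter_l, sentinel)
    if value is sentinel:  # identity check, so a real "End" item does not stop the loop
        break
    values.append(value)

print(values)  # ['1', 'End', '3']
```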

We can create a custom iterable by defining a class that implements the __iter__() and __next__() dunder methods, which makes its instances iterators. The example below is a simple implementation of the Iterator design pattern.


class MyList:
    def __init__(self, values):
        self.values = values
        self.index = 0 # starting position, so next() also works before iter() is called

    def __iter__(self):
        self.index = 0 # resetting the iterator to the starting position
        return self # returning the instance itself makes the object its own iterator

    def __next__(self):
        # Checking if there are still elements left
        if self.index < len(self.values):
            value = self.values[self.index]
            self.index += 1
            return value
        else:
            raise StopIteration

my_list = MyList(["1", "2", "3"])

# Using our Iterator in a loop
for item in my_list:
    print(item)
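One limitation of the MyList class above: because __iter__() returns self, the object stores a single position, so two loops over the same object at the same time interfere with each other. A common fix is to separate the iterable from the iterator. The sketch below shows both behaviors with a nested loop; the class names SharedStateList, MyListIterator, IndependentList and the helper nested_pairs are hypothetical.

```python
class SharedStateList:
    """Same design as MyList: __iter__() returns self."""
    def __init__(self, values):
        self.values = values

    def __iter__(self):
        self.index = 0
        return self

    def __next__(self):
        if self.index < len(self.values):
            value = self.values[self.index]
            self.index += 1
            return value
        raise StopIteration


class MyListIterator:
    """Holds the position for a single pass over the values."""
    def __init__(self, values):
        self.values = values
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.index < len(self.values):
            value = self.values[self.index]
            self.index += 1
            return value
        raise StopIteration


class IndependentList:
    """Iterable that hands out a fresh iterator on every iter() call."""
    def __init__(self, values):
        self.values = values

    def __iter__(self):
        return MyListIterator(self.values)


def nested_pairs(iterable):
    # Two simultaneous loops over the same object
    return [(a, b) for a in iterable for b in iterable]


print(nested_pairs(SharedStateList(["a", "b"])))
# [('a', 'a'), ('a', 'b')]: the inner loop exhausts the shared position
print(nested_pairs(IndependentList(["a", "b"])))
# [('a', 'a'), ('a', 'b'), ('b', 'a'), ('b', 'b')]
```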

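A generator is a more convenient way to create an iterator. Instead of writing a class with __iter__() and __next__(), we write an ordinary function that contains one or more yield statements. Calling such a function does not execute its body; it returns a generator object, which implements the iterator protocol automatically. (A return statement inside a generator simply ends the iteration.)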

When a yield statement executes, the generator pauses, hands the yielded value to the caller (e.g., a for loop or a call to next()), and waits. The state of the function, including its local variables and the point of execution, is preserved, so the next request for a value resumes execution right after that yield. This lets a generator produce a sequence of values lazily, one at a time, without storing the entire sequence in memory, which makes generators very memory-efficient. Unlike a regular function, which terminates after returning a single value with return, a generator can yield multiple values over time.


def gen(n):
    for i in range(n):
        yield i
    
for x in gen(5):
    print(x)
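Driving the same gen() by hand shows each step of the protocol. A minimal sketch; the size comparison at the end uses sys.getsizeof(), which measures only the object itself (for the list, its array of references), so treat it as a rough illustration of the memory claim rather than a benchmark:

```python
import sys

def gen(n):
    for i in range(n):
        yield i

g = gen(3)                # calling the function does not run its body yet
print(type(g).__name__)   # generator
print(iter(g) is g)       # True: a generator object is its own iterator
print(next(g))            # 0
print(next(g))            # 1
print(next(g))            # 2

try:
    next(g)               # the loop inside gen() has ended, so the generator finishes
except StopIteration:
    print("exhausted")

# The generator object stays small regardless of n,
# while the list holds a reference to every value at once
print(sys.getsizeof(gen(1_000_000)))        # small, independent of n
print(sys.getsizeof(list(gen(1_000_000))))  # grows with n
```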

The two examples below yield the same values, but they differ in what happens after the last value is yielded. In the second program, the last line printed is 4, not y. Each call to next() runs the generator only until the next yield, so after the fifth call the generator is paused right after yielding 4 and never resumes to execute the final print("y"). In the first example, the for loop keeps calling next() until the generator raises StopIteration; that sixth call resumes the generator, which prints "y", leaves its loop, and finishes.


def generator(n):
    for x in range(n):
        print("x")
        yield x
        print("y")

for y in generator(5):
    print(y)


def generator(n):
    for x in range(n):
        print("x")
        yield x
        print("y")

x = generator(5)
for i in range(5):
    print(next(x))
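To confirm that the generator in the second program is paused rather than finished, the sketch below records the events in a list instead of printing them (the log parameter is an addition for illustration) and calls next() one more time than there are values.

```python
def generator(n, log):
    for x in range(n):
        log.append("x")
        yield x
        log.append("y")

log = []
g = generator(2, log)
print(next(g))  # 0
print(next(g))  # 1
print(log)      # ['x', 'y', 'x']: paused right after yielding 1, no final 'y' yet

try:
    next(g)     # resumes the generator: it appends 'y', leaves the loop and finishes
except StopIteration:
    print("exhausted")

print(log)      # ['x', 'y', 'x', 'y']
```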

Practical examples

Generators are useful when we need to loop through a sequence or a large amount of data without storing all of it at once. They fit cases where we do not need the items before or after the current one, for example, when each item is processed right away.


def read_large_file(file_path):
    with open(file_path, "r") as file:
        for line in file:
            yield line

def process_line(line):
    if "x" in line:
        print(line, end="") # each line read from the file already ends with a newline

for line in read_large_file("huge_file.txt"):
    process_line(line)
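Because each generator pulls one item at a time from the previous one, generators can be chained into a pipeline in which only one line is held in memory at any moment. In the sketch below, strip_lines() and lines_containing() are hypothetical helpers, and a small temporary file stands in for huge_file.txt so the code runs on its own.

```python
import os
import tempfile

def read_large_file(file_path):
    with open(file_path, "r") as file:
        for line in file:
            yield line

def strip_lines(lines):
    # Removing the trailing newline from each line
    for line in lines:
        yield line.rstrip("\n")

def lines_containing(lines, text):
    # Passing through only the lines that contain the given text
    for line in lines:
        if text in line:
            yield line

# Creating a small sample file so the sketch is self-contained
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha\nxylophone\nbeta\nbox\n")
    path = tmp.name

try:
    # list() is used only to show the result; a for loop would stay fully lazy
    matches = list(lines_containing(strip_lines(read_large_file(path)), "x"))
    print(matches)  # ['xylophone', 'box']
finally:
    os.remove(path)
```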


def powers_of_2(n):
    p = 1
    for i in range(n):
        yield p
        p *= 2
    
for x in powers_of_2(8):
    print(x)
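Since a generator computes each value only when asked, it does not need an upper bound at all. A minimal sketch, assuming we want an unbounded variant of powers_of_2 (the name powers_of_2_forever is hypothetical); itertools.islice() lazily takes just the first n values:

```python
from itertools import islice

def powers_of_2_forever():
    p = 1
    while True:  # no upper bound: values are produced only on request
        yield p
        p *= 2

first_eight = list(islice(powers_of_2_forever(), 8))
print(first_eight)  # [1, 2, 4, 8, 16, 32, 64, 128]
```

Calling list(powers_of_2_forever()) directly would never finish, which is exactly the risk described in the next paragraph.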

Unpacking a generator into a function call with * (for example, into a function that accepts *args) can be risky and memory-intensive. In the example below, when we call func(*g), Python consumes the whole generator g and collects its values into the args tuple before the function body runs. If the generator produces a large sequence, all of its elements end up in memory at once; if it is infinite, the unpacking never finishes and memory usage keeps growing until the program runs out of memory (possibly crashing with a MemoryError). Either way, this defeats the generator's usual lazy, one-at-a-time behavior.


def generator():
    for x in range(10):
        yield x

def func(*args):
    print(args)

g = generator()
func(*g)
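The sketch below shows two consequences of func(*g) and a lazier alternative: unpacking builds the full tuple and leaves the generator exhausted, while a function that accepts the iterable itself (the hypothetical total(), which does the same job as the built-in sum()) consumes it one item at a time.

```python
def generator():
    for x in range(10):
        yield x

def func(*args):
    return args  # by the time the body runs, args is already a full tuple

def total(items):
    # Accepts any iterable and consumes it one item at a time
    result = 0
    for item in items:
        result += item
    return result

g = generator()
print(func(*g))            # (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
print(list(g))             # []: the unpacking exhausted the generator
print(total(generator()))  # 45, computed without building a tuple
```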

Addition to the lesson - custom lists

Earlier, I mentioned creating custom iterables by implementing the __iter__() and __next__() dunder methods, but this is only worth doing when we need a fully custom iterable. If we only need a list with all the built-in list methods plus some additional functionality, we can inherit from list. A custom list like this can serve as a building block for the Observer design pattern.


class MyList(list):
    def __init__(self, values=None):
        super().__init__(values or []) # initializing with values or an empty list
        self.history = [] # keeping track of modifications

    def append(self, item):
        self.history.append(f"Appended {item}") # logging the change
        super().append(item) # calling the original inherited append method

    def remove(self, item):
        if item in self:
            self.history.append(f"Removed {item}") # logging removal
            super().remove(item)
        else:
            print(f"Item {item} not in list")

    def show_history(self):
        return self.history

my_list = MyList([1, 2, 3])
print("Initial list:", my_list)
my_list.append(4)
print("Updated list:", my_list)
my_list.remove(2)
print("After removing 2:", my_list)
print("Accessing first item:", my_list[0])
print("Modification history:", my_list.show_history())
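One caveat with inheriting from list: the overridden append() and remove() are not called by the other built-in mutating operations. Methods such as insert() and extend(), the += operator, and slice assignment are implemented separately and change the list without touching history. The sketch below is a trimmed, hypothetical variant of the class above that adds an extend() override and shows insert() slipping past the log; each operation we want to track needs its own override.

```python
class MyList(list):
    def __init__(self, values=None):
        super().__init__(values or [])
        self.history = []

    def append(self, item):
        self.history.append(f"Appended {item}")
        super().append(item)

    def extend(self, items):
        items = list(items)  # materializing once so a generator argument is not consumed twice
        for item in items:
            self.history.append(f"Appended {item}")
        super().extend(items)

my_list = MyList([1, 2])
my_list.insert(0, 0)       # not overridden: the change is not logged
my_list.extend([3, 4])     # overridden: both items are logged
print(my_list)             # [0, 1, 2, 3, 4]
print(my_list.history)     # ['Appended 3', 'Appended 4']
```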