Functional Python Programming

Pairing up items from a sequence

A common restructuring requirement is to make start-stop pairs out of points in a sequence. Given a sequence, $S = \{s_0, s_1, s_2, \ldots, s_n\}$, we also want to create a paired sequence, $\hat{S} = \{(s_0, s_1), (s_1, s_2), \ldots, (s_{n-1}, s_n)\}$. The first and second items form a pair. The second and third items form the next pair. When doing time-series analysis, we may be combining more widely separated values. In this example, the pairs are immediately adjacent values.

A paired sequence will allow us to use each pair to compute distances from point to point using a trivial application of a haversine function. This technique is also used to convert a path of points into a series of line segments in a graphics application.
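As a rough illustration of the point-to-point distance idea, the following sketch applies the standard haversine formula to each adjacent pair of a short path. The `haversine()` signature, the `Point` alias, the sample coordinates, and the mean Earth radius of 6371 km are all assumptions made for this example; they are not taken from the text.

```python
from math import asin, cos, radians, sin, sqrt
from typing import List, Tuple

Point = Tuple[float, float]  # hypothetical alias: (latitude, longitude) in degrees

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius in kilometres


def haversine(p1: Point, p2: Point, radius: float = EARTH_RADIUS_KM) -> float:
    """Great-circle distance between two (lat, lon) points, in the units of radius."""
    lat1, lon1 = map(radians, p1)
    lat2, lon2 = map(radians, p2)
    d_lat = lat2 - lat1
    d_lon = lon2 - lon1
    a = sin(d_lat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(d_lon / 2) ** 2
    return 2 * radius * asin(sqrt(a))


# Made-up path along the equator, one degree of longitude per step.
path: List[Point] = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]

# Pair each point with its successor, then measure each leg.
paired = list(zip(path, path[1:]))
distances = [haversine(start, stop) for start, stop in paired]
```

Each entry in `distances` is the length of one leg, so the pairing step and the distance computation stay separate.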

Why pair up items? Why not do something such as this:

begin = next(iterable)
for end in iterable:
    compute_something(begin, end)
    begin = end

This, clearly, will process each leg of the data as a begin-end pair. However, the processing function and the loop that restructures the data are tightly bound, making reuse more complex than necessary. The algorithm for pairing is hard to test in isolation because it's bound to the compute_something() function.

This combined function also limits our ability to reconfigure the application. There's no easy way to inject an alternative implementation of the compute_something() function. Additionally, we've got a piece of explicit state, the begin variable, which makes life potentially complex. If we try to add features to the body of the loop, we can easily fail to set the begin variable correctly when a point is dropped from consideration. A filter() function introduces an if statement that can lead to an error in updating the begin variable.
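A small sketch of this hazard, assuming made-up integer data and a hypothetical rule that drops negative points. The function names are illustrative only. Both versions also assume that the first point is valid.

```python
from typing import Iterable, List, Tuple


def combined_buggy(points: Iterable[int]) -> List[Tuple[int, int]]:
    """Filtering added to the loop body, with begin updated unconditionally."""
    it = iter(points)
    begin = next(it)
    result = []
    for end in it:
        if end >= 0:
            result.append((begin, end))
        begin = end  # bug: a rejected point still becomes the next begin
    return result


def combined_fixed(points: Iterable[int]) -> List[Tuple[int, int]]:
    """Same loop, but begin only advances when the point is kept."""
    it = iter(points)
    begin = next(it)
    result = []
    for end in it:
        if end >= 0:
            result.append((begin, end))
            begin = end
    return result


data = [1, 2, -1, 3, 4]
buggy = combined_buggy(data)   # contains a pair that starts at the dropped point
fixed = combined_fixed(data)   # pairs only the points that were kept
```

The two functions differ only in the indentation of one assignment, which is the kind of slip the combined design invites.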

We achieve better reuse by separating this simple pairing function. This, in the long run, is one of our goals. If we build up a library of helpful primitives such as this pairing function, we can tackle problems more quickly and confidently.

There are many ways to pair up the points along the route to create start and stop information for each leg. We'll look at a few here and then revisit this in Chapter 5, Higher-Order Functions, and again in Chapter 7, The Itertools Module. Creating pairs can be done in a purely functional way using recursion.

The following code is one version of a function to pair up the points along a route:

from typing import Any, Iterator, Tuple
Item_Iter = Iterator[Any]
Pairs_Iter = Iterator[Tuple[float, float]]
def pairs(iterator: Item_Iter) -> Pairs_Iter:
    def pair_from(
            head: Any,
            iterable_tail: Item_Iter) -> Pairs_Iter:
        try:
            nxt = next(iterable_tail)
        except StopIteration:
            return  # Python 3.7+ turns a leaked StopIteration into RuntimeError
        yield head, nxt
        yield from pair_from(nxt, iterable_tail)
    try:
        return pair_from(next(iterator), iterator)
    except StopIteration:
        return iter([])  # empty input: no pairs

The essential work is done by the internal pair_from() function. This works with the item at the head of an iterator plus the iterator object itself. It pops the next item from the iterator, yields the pair formed by the head and that item, and then invokes itself recursively to yield any additional pairs.

The type hints state the parameter, iterator, must be of type Item_Iter. The result is of the Pairs_Iter type, an iterator over two-tuples, where each item is a float type. These are hints used by the mypy program to check that our code is likely to work. The type hint declarations are contained in the typing module.

The input must be an iterator that responds to the next() function. To work with a collection object, the iter() function must be used explicitly to create an iterator from the collection.
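The difference between a collection and an iterator can be seen in a few lines. The route below is made-up sample data, and the variable names are illustrative.

```python
route = [(37.5, -76.3), (37.8, -76.2), (38.3, -76.4)]  # made-up points

# A list is iterable, but it is not an iterator, so next() rejects it.
try:
    next(route)  # type: ignore[call-overload]
    list_supports_next = True
except TypeError:
    list_supports_next = False

# iter() builds an iterator that does respond to next().
route_iter = iter(route)
first = next(route_iter)
remaining = list(route_iter)  # the iterator remembers that one item was consumed
```

Because the iterator carries its own position, a function that takes the first item with next() can hand the same object on to a loop or a recursive call that consumes the rest.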

We've invoked the pair_from() function from the pairs() function. The pairs() function ensures that the initialization is handled properly by getting the initial item from the iterator argument. In the rare case of an empty iterator, the initial call to next() will raise a StopIteration exception; this situation will create an empty iterable.

Recursion in a generator function must consume the iterable produced by the recursive call and yield each of its values. This can be done with an explicit for loop or, more simply, with a yield from statement. If we try to use a simpler-looking return pair_from(nxt, iterable_tail) statement, we'll see that it does not properly consume the iterable and yield all of the values. For this, use yield from recursive_iter(args). Something like return recursive_iter(args) will return only a generator object; it doesn't evaluate the function to return the generated values.
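The following sketch contrasts the three forms on a small, made-up input. The function names are hypothetical. Each one catches StopIteration explicitly, on the assumption that the code runs on Python 3.7 or later, where a StopIteration escaping a generator body is reported as a RuntimeError.

```python
from typing import Any, Iterator, Tuple

Pair = Tuple[Any, Any]


def pair_from_yield(head: Any, tail: Iterator[Any]) -> Iterator[Pair]:
    try:
        nxt = next(tail)
    except StopIteration:
        return
    yield head, nxt
    yield from pair_from_yield(nxt, tail)  # consumes the recursive generator


def pair_from_loop(head: Any, tail: Iterator[Any]) -> Iterator[Pair]:
    try:
        nxt = next(tail)
    except StopIteration:
        return
    yield head, nxt
    for pair in pair_from_loop(nxt, tail):  # explicit equivalent of yield from
        yield pair


def pair_from_return(head: Any, tail: Iterator[Any]) -> Iterator[Pair]:
    try:
        nxt = next(tail)
    except StopIteration:
        return
    yield head, nxt
    # The recursive call only creates a generator object; nothing iterates it.
    return pair_from_return(nxt, tail)


with_yield_from = list(pair_from_yield(1, iter([2, 3, 4])))
with_for_loop = list(pair_from_loop(1, iter([2, 3, 4])))
with_return = list(pair_from_return(1, iter([2, 3, 4])))
```

Only the first pair comes out of the return version, because the generator object it returns is never iterated.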

Our strategy for performing tail-call optimization is to replace the recursion with a generator expression. We can clearly optimize this recursion into a simple for loop. The following code is another version of a function to pair up the points along a route:

from typing import Iterator, Tuple, TypeVar
T_ = TypeVar('T_')
Pairs_Iter = Iterator[Tuple[T_, T_]]
def legs(lat_lon_iter: Iterator[T_]) -> Pairs_Iter:
    try:
        begin = next(lat_lon_iter)
    except StopIteration:
        return  # empty input: no pairs (avoids RuntimeError on Python 3.7+)
    for end in lat_lon_iter:
        yield begin, end
        begin = end

This version is quite fast and free from stack limits. It's independent of any particular type of sequence, as it will pair up anything emitted by a sequence generator. As there's no processing function inside the loop, we can reuse the legs() function as needed.

The type variable, T_, created with the TypeVar function, is used to clarify precisely how the legs() function restructures the data. The hint says that the input type is preserved on output. The input type is an Iterator of some arbitrary type, T_; the output will include tuples of the same type, T_. No other conversion is implied by the function.
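As a sketch of what this preservation of type means in practice, the example below feeds strings and integers through the same pairing function and injects different processing functions afterwards. This variant of legs() returns quietly on an empty iterator. The apply_to_legs() helper and the sample data are assumptions made for illustration.

```python
from typing import Callable, Iterator, List, Tuple, TypeVar

T_ = TypeVar("T_")
R_ = TypeVar("R_")


def legs(items: Iterator[T_]) -> Iterator[Tuple[T_, T_]]:
    """Yield adjacent pairs; the item type passes through unchanged."""
    try:
        begin = next(items)
    except StopIteration:
        return  # empty input produces no pairs
    for end in items:
        yield begin, end
        begin = end


def apply_to_legs(func: Callable[[T_, T_], R_], items: Iterator[T_]) -> List[R_]:
    """Hypothetical helper: apply any two-argument function to each leg."""
    return [func(begin, end) for begin, end in legs(items)]


letters = list(legs(iter("abcd")))                               # pairs of str
steps = apply_to_legs(lambda a, b: b - a, iter([1, 4, 9, 16]))   # int differences
joined = apply_to_legs(lambda a, b: a + b, iter("abcd"))         # str concatenation
```

The pairing logic is written once, and the processing function is chosen at the call site.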

The begin and end variables maintain the state of the computation. The use of stateful variables doesn't fit the ideal of using immutable objects for functional programming. The optimization is important. It's also invisible to users of the function, making it a Pythonic functional hybrid.

We can think of this function as one that yields the following kind of sequence of pairs:

list[0:2], list[1:3], list[2:4], ..., list[-2:]

Another view of this function is as follows:

zip(list, list[1:])

While informative, this zip-based version only works for sequence objects, because it relies on slicing. The legs() and pairs() functions work for any iterator, including one created from a sequence object with the iter() function.
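One possible way to keep the zip() view while accepting any iterable is to duplicate the input with itertools.tee() and advance one copy by a single item. The pairs_zip() name is hypothetical, and this is a sketch of that idea rather than the approach taken in the text. On Python 3.10 and later, itertools.pairwise() offers similar behavior directly.

```python
from itertools import tee
from typing import Iterable, Iterator, Tuple, TypeVar

T_ = TypeVar("T_")


def pairs_zip(iterable: Iterable[T_]) -> Iterator[Tuple[T_, T_]]:
    """Pair adjacent items of any iterable without slicing."""
    first, second = tee(iterable, 2)
    next(second, None)  # skip one item; the default avoids StopIteration on empty input
    return zip(first, second)


from_list = list(pairs_zip([1, 2, 3, 4]))
from_generator = list(pairs_zip(n * n for n in range(4)))  # no slicing available here
from_empty = list(pairs_zip([]))
```

The generator argument has no slicing support, so zip(list, list[1:]) would not work on it, while the tee-based version handles it the same way as a list.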