Evolution of the sort in Python and the Role of functools.cmp_to_key
In this article, we will explore the evolution of sorting in Python from Python 2 to 3 and look into the cmp_to_key function from functools.
In Python, the sort
method and the sorted
callable are commonly used for sorting operations. sort
is a list method that modifies the list in place, whereas sorted takes an iterable as its first argument and returns a sorted list containing the iterable’s elements.
Both of these use the Timsort algorithm under the hood to perform the sorting operation.
In this article, we will explore the evolution of sorting in Python from Python 2 to 3 and look into the cmp_to_key
function from functools
.
Background
Below are the type annotations for sort
and sorted
, from the Python typeshed library.
@overload
def sorted(
__iterable: Iterable[SupportsRichComparisonT],
*,
key: None = None,
reverse: bool = False
) -> list[SupportsRichComparisonT]:
...
@overload
def sorted(
__iterable: Iterable[_T],
*,
key: Callable[[_T], SupportsRichComparison],
reverse: bool = False
) -> list[_T]:
...
class list(MutableSequence[_T]):
@overload
def sort(
self: list[SupportsRichComparisonT], *, key: None = None, reverse: bool = False
) -> None:
...
@overload
def sort(
self, *, key: Callable[[_T], SupportsRichComparison], reverse: bool = False
) -> None:
...
Both sort
and sorted
accept an optional keyword-only argument key
. It should be a callable and if supplied, will be used to calculate the comparison key.
From Python docs:
key specifies a function of one argument that is used to extract a comparison key from each list element (for example,
key=str.lower
). The key corresponding to each item in the list is calculated once and then used for the entire sorting process.
Now let’s take a look at the Python 2 docs for the sorted built-in:
sorted(iterable[, cmp[, key[, reverse]]])
Return a new sorted list from the items in iterable.
The optional arguments cmp, key, and reverse have the same meaning as those for the list.sort() method (described in section Mutable Sequence Types).
cmp specifies a custom comparison function of two arguments (iterable elements),
which should return a negative, zero or positive number depending on whether the first argument is considered smaller than, equal to, or larger than the second argument.
...
There is an additional argument cmp
compared to what we have seen in Python 3.
The docs also mention:
In general, the key and reverse conversion processes are much faster than specifying an equivalent cmp function. This is because cmp is called multiple times for each list element while key and reverse touch each element only once. Use functools.cmp_to_key() to convert an old-style cmp function to a key function.
The cmp
argument was removed in Python 3, and the cmp_to_key helper to convert comparison functions to new-style key functions has been added to the functools module.
Usage in the Wild
If we have a comparison function (which takes two arguments and returns their comparison result), this can be converted to a sort key by using cmp_to_key
function provided in the functools
module.
The below example from the pretty_midi, a MIDI processing library illustrates the use of cmp_to_key
.
class PrettyMIDI(object):
"""A container for MIDI data in an easily-manipulable format."""
...
def write(self, filename):
"""Write the MIDI data out to a .mid file."""
def event_compare(event1, event2): # <--- (1)
"""Compares two events for sorting.
Events are sorted by tick time ascending. Events with the same tick
time ares sorted by event type. Some events are sorted by
additional values. For example, Note On events are sorted by pitch
then velocity, ensuring that a Note Off (Note On with velocity 0)
will never follow a Note On with the same pitch.
Parameters
----------
event1, event2 : mido.Message
Two events to be compared.
"""
secondary_sort = {
"set_tempo": lambda e: (1 * 256 * 256),
"time_signature": lambda e: (2 * 256 * 256),
"key_signature": lambda e: (3 * 256 * 256),
"lyrics": lambda e: (4 * 256 * 256),
"text_events": lambda e: (5 * 256 * 256),
"program_change": lambda e: (6 * 256 * 256),
"pitchwheel": lambda e: ((7 * 256 * 256) + e.pitch),
"control_change": lambda e: (
(8 * 256 * 256) + (e.control * 256) + e.value
),
"note_off": lambda e: ((9 * 256 * 256) + (e.note * 256)),
"note_on": lambda e: ((10 * 256 * 256) + (e.note * 256) + e.velocity),
"end_of_track": lambda e: (11 * 256 * 256),
}
# If the events have the same tick, and both events have types
# which appear in the secondary_sort dictionary, use the dictionary
# to determine their ordering.
if (
event1.time == event2.time
and event1.type in secondary_sort
and event2.type in secondary_sort
):
return (secondary_sort[event1.type](event1) -
secondary_sort[event2.type](event2))
# Otherwise, just return the difference of their ticks.
return event1.time - event2.time
# Create track 0 with timing information
timing_track = mido.MidiTrack()
... # Code processing the timing_track
# Sort the (absolute-tick-timed) events.
timing_track.sort(key=functools.cmp_to_key(event_compare)) # <--- (2)
...
The event_compare
function (pointed out in comment (1)) provides a specific comparison logic for MIDI events, ensuring they are sorted correctly by their tick time and type and additional criteria for certain event types. Then, functools.cmp_to_key
is used to convert this comparison function into a key function ( pointed out in comment (2)).
Should You Use It
As cmp_to_key
was aimed to help transition from Python 2 style cmp
functions to key
functions, its usage is not very common in newer projects. However, it can come in handy when we already have a comparison function defined in our code to compare two objects and need to sort an iterable of these objects. In such a case, cmp_to_key will allow us to reuse our comparison logic (defined in our comparison function) without defining a new key function.
Behind the Scenes
Below is the source code of cmp_to_key
from the functools module
def cmp_to_key(mycmp):
"""Convert a cmp= function into a key= function"""
class K(object):
__slots__ = ['obj']
def __init__(self, obj):
self.obj = obj
def __lt__(self, other):
return mycmp(self.obj, other.obj) < 0
def __gt__(self, other):
return mycmp(self.obj, other.obj) > 0
def __eq__(self, other):
return mycmp(self.obj, other.obj) == 0
def __le__(self, other):
return mycmp(self.obj, other.obj) <= 0
def __ge__(self, other):
return mycmp(self.obj, other.obj) >= 0
__hash__ = None
return K
cmp_to_key
intelligently maps the original comparison function to a key function suitable for sorting by implementing all the comparison dunders.
It returns a new callable (class K) with all the comparison dunders, which will be called during the sorting process to calculate the sort key. Since the sorting process compares each of the calculated keys, the comparison dunders will get invoked, and they will return based on the result from the original comparison function.
For example, during sorting, when K object
< another K object
needs to be checked, Python will actually call the lt
dunder method on K. This method simply calls the original cmp function to compare the obj
attributes of the two K instances.
I share interesting Python snippets 🐍 like this from open-source projects illustrating Python language features in my newsletter, “Python in the Wild”.
Subscribe to the newsletter on Substack or Linkedin to receive new Pythonic posts to your email 💌🚀.