itertools Tricks I Actually Use

itertools has a lot of stuff. Here are the ones I actually reach for.

Why bother with itertools?

Two main reasons: memory efficiency and cleaner code. When you're working with large datasets, itertools functions return iterators instead of building whole lists in memory. That log file with 10 million lines? You can process it without loading the entire thing into RAM.

Plus, once you know these patterns, your code becomes way more readable. chain(*lists) is clearer than concatenating by hand. groupby(data, key) beats a bunch of if statements tracking the previous value.

Here's a concrete example. Say you're processing CSV files and need to find duplicate entries:

# Memory-hungry approach
with open('huge_file.csv') as f:
    all_lines = f.readlines()  # Loads the entire file into memory
    seen = set()
    for line in all_lines:
        if line in seen:
            print(f"Duplicate: {line.strip()}")
        seen.add(line)

# Memory-efficient with itertools
from itertools import islice

with open('huge_file.csv') as f:
    data = islice(f, 1, None)  # Skip the header line
    seen = set()
    for line in data:
        # Same processing, but the file is read line by line
        if line in seen:
            print(f"Duplicate: {line.strip()}")
        seen.add(line)
The second approach reads the file one line at a time instead of buffering all 10 million lines up front. For a 5GB file, that's the difference between your script running or crashing.

chain: flatten iterables

from itertools import chain

all_items = list(chain(list1, list2, list3))
# same as list1 + list2 + list3 but works with any iterable

The real power move is chain.from_iterable() for nested structures:

# You have a list of lists
nested = [[1, 2], [3, 4], [5, 6]]

# Don't do this
flat = []
for sublist in nested:
    flat.extend(sublist)

# Do this instead
from itertools import chain
flat = list(chain.from_iterable(nested))

Works great when you're combining results from multiple API calls or flattening query results.
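
To make that concrete, here's a rough sketch of the API case (fetch_page and handle are hypothetical stand-ins for your own client code):

from itertools import chain

# Each call returns one page of records (hypothetical helper)
pages = (fetch_page(n) for n in range(1, 6))

# One flat, lazy stream of records across all pages
for record in chain.from_iterable(pages):
    handle(record)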

groupby: group consecutive items

from itertools import groupby

# Data must be sorted by the key first!
data = sorted(items, key=lambda x: x['category'])
for category, group in groupby(data, key=lambda x: x['category']):
    print(f'{category}: {list(group)}')

Here's a real example I used recently - grouping log entries by date:

from itertools import groupby
from datetime import datetime

# Parse log lines with timestamps
logs = [
    {'timestamp': '2025-01-15', 'message': 'Started'},
    {'timestamp': '2025-01-15', 'message': 'Error occurred'},
    {'timestamp': '2025-01-16', 'message': 'Restarted'},
    {'timestamp': '2025-01-16', 'message': 'All clear'},
]

# Group by date
for date, entries in groupby(logs, key=lambda x: x['timestamp']):
    messages = [entry['message'] for entry in entries]
    print(f"{date}: {len(messages)} events")
    print(f"  {', '.join(messages)}")

Remember: groupby only groups consecutive items. Always sort first unless your data is already ordered.
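
Here's a quick demonstration of why the sort matters - with unsorted data, the same key shows up in multiple groups:

from itertools import groupby

data = ['a', 'b', 'a', 'a', 'b']

print([(k, len(list(g))) for k, g in groupby(data)])
# [('a', 1), ('b', 1), ('a', 2), ('b', 1)]  <-- 'a' appears twice

print([(k, len(list(g))) for k, g in groupby(sorted(data))])
# [('a', 3), ('b', 2)]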

Common groupby gotcha

The most common mistake with groupby is consuming the iterator twice. The groups are iterators, and they get exhausted:

from itertools import groupby

data = [{'type': 'A', 'val': 1}, {'type': 'A', 'val': 2}, {'type': 'B', 'val': 3}]
data.sort(key=lambda x: x['type'])

for key, group in groupby(data, key=lambda x: x['type']):
    # This exhausts the iterator
    count = len(list(group))
    # This gets nothing - group is already consumed!
    values = [item['val'] for item in group]
    print(f"{key}: {count} items, values: {values}")

# Output: A: 2 items, values: []  <-- values is empty!

The fix is to convert to a list once:

for key, group in groupby(data, key=lambda x: x['type']):
    items = list(group)  # Convert to list first
    count = len(items)
    values = [item['val'] for item in items]
    print(f"{key}: {count} items, values: {values}")

# Output: A: 2 items, values: [1, 2]

pairwise: sliding windows

Python 3.10 added pairwise() and it's perfect for comparing adjacent elements:

from itertools import pairwise

prices = [100, 105, 103, 108, 102]
for prev, curr in pairwise(prices):
    change = curr - prev
    print(f"{prev} -> {curr}: {'+' if change > 0 else ''}{change}")

Great for detecting transitions, calculating deltas, or validating sequences are in order.
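
For example, checking that a sequence is in order becomes a one-liner:

from itertools import pairwise

readings = [1, 2, 2, 5, 9]
is_sorted = all(a <= b for a, b in pairwise(readings))
# True - every adjacent pair is in order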

Before Python 3.10, you'd have to write your own:

# The old way - note this only works on sequences like lists;
# calling iter() twice on a generator returns the same iterator,
# so the pairs come out wrong
def manual_pairwise(sequence):
    a, b = iter(sequence), iter(sequence)
    next(b, None)
    return zip(a, b)

# Or using itertools.tee
from itertools import tee

def pairwise_tee(iterable):
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)

If you're stuck on Python 3.9 or earlier, the tee version is the standard recipe from the itertools docs.

takewhile and dropwhile: conditional iteration

These let you slice iterables based on conditions instead of indices:

from itertools import takewhile, dropwhile

numbers = [1, 3, 5, 8, 10, 2, 4]

# Take elements while they're less than 10
small = list(takewhile(lambda x: x < 10, numbers))
# [1, 3, 5, 8]

# Drop elements while they're odd, keep the rest
rest = list(dropwhile(lambda x: x % 2 == 1, numbers))
# [8, 10, 2, 4]

I use these for processing data until some condition changes - like reading config lines until you hit a section marker.
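
Here's roughly what that config case looks like (the file name and the section marker are just for illustration):

from itertools import takewhile

with open('settings.conf') as f:
    lines = (line.strip() for line in f)
    # Read everything up to the first section header like [database]
    header = list(takewhile(lambda line: not line.startswith('['), lines))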

accumulate: running totals

accumulate() is like a running reduce:

from itertools import accumulate

daily_sales = [100, 150, 200, 175]
running_total = list(accumulate(daily_sales))
# [100, 250, 450, 625]

# Works with custom operations too
from operator import mul
running_product = list(accumulate([2, 3, 4], mul))
# [2, 6, 24]

Perfect for cumulative metrics, running averages, or tracking state over a sequence.
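
Running maximums work the same way - pass any two-argument function:

from itertools import accumulate

temps = [21, 25, 23, 28, 26]
record_highs = list(accumulate(temps, max))
# [21, 25, 25, 28, 28]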

islice: slice any iterable

from itertools import islice

# Get the first 10 items from a generator
first_10 = list(islice(huge_generator, 10))

# Skip 5, take the next 10 - note that islice consumes the generator,
# so this picks up wherever the previous call left off
middle = list(islice(huge_generator, 5, 15))

Unlike regular slicing, this works on any iterable - even ones that don't support indexing like generators or file objects.

Here's a real scenario: paginating API results without loading everything:

from itertools import islice

def fetch_all_users():
    """Generator that yields users from a paginated API"""
    page = 1
    while True:
        response = api.get(f'/users?page={page}')
        if not response['users']:
            break
        yield from response['users']
        page += 1

# Get users 50-74 without keeping the earlier ones in memory
# (islice still fetches them, it just doesn't store them)
users_50_to_75 = list(islice(fetch_all_users(), 50, 75))

# Or process in chunks of 100
def process_in_batches(generator, batch_size=100):
    while True:
        batch = list(islice(generator, batch_size))
        if not batch:
            break
        yield batch

for batch in process_in_batches(fetch_all_users()):
    bulk_update(batch)

This pattern saved me when I had to migrate 500k user records from one system to another. The old system's API was slow, but processing in batches kept memory usage constant.

zip_longest: zip without truncation

Regular zip() stops at the shortest iterable. Sometimes you want to keep going:

from itertools import zip_longest

names = ['Alice', 'Bob', 'Charlie']
scores = [85, 92]

# Regular zip loses Charlie
list(zip(names, scores))  # [('Alice', 85), ('Bob', 92)]

# zip_longest keeps everyone
list(zip_longest(names, scores, fillvalue=0))
# [('Alice', 85), ('Bob', 92), ('Charlie', 0)]
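
A classic trick built on zip_longest is the "grouper" recipe from the itertools docs - chunking a flat sequence into fixed-size rows (on 3.12+, batched covers this):

from itertools import zip_longest

letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
rows = list(zip_longest(*[iter(letters)] * 3, fillvalue='-'))
# [('a', 'b', 'c'), ('d', 'e', 'f'), ('g', '-', '-')]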

combinations and permutations

Need all pairs? All arrangements?

from itertools import combinations, permutations

items = ['A', 'B', 'C']

# All unique pairs (order doesn't matter)
list(combinations(items, 2))
# [('A', 'B'), ('A', 'C'), ('B', 'C')]

# All arrangements (order matters)
list(permutations(items, 2))
# [('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')]

Be careful - these grow fast. 10 items choosing 5 is 252 combinations.
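
If you're not sure how big the output will get, check the count before generating anything:

import math

math.comb(10, 5)   # 252 combinations
math.perm(10, 5)   # 30240 permutations
math.comb(50, 10)  # 10272278170 - don't materialize this as a list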

Here's a practical example - testing all pairs of settings to find incompatible combinations:

from itertools import combinations

settings = ['debug_mode', 'cache_enabled', 'compression', 'verbose_logging']

# Test all pairs for conflicts
for setting1, setting2 in combinations(settings, 2):
    if conflicts(setting1, setting2):
        print(f"Warning: {setting1} conflicts with {setting2}")

# Or generating test cases
from itertools import product

browsers = ['chrome', 'firefox', 'safari']
screen_sizes = ['mobile', 'tablet', 'desktop']
themes = ['light', 'dark']

# All combinations (Cartesian product)
test_cases = list(product(browsers, screen_sizes, themes))
# 3 * 3 * 2 = 18 test cases

for browser, size, theme in test_cases:
    run_test(browser, size, theme)

The difference: combinations gives you unique pairs from one list. product gives you all combinations across multiple lists (like nested for loops).
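
product also takes a repeat argument, which is handy when every "slot" draws from the same options - like enumerating every on/off state of a set of flags:

from itertools import product

# Every on/off combination for three feature flags
for flags in product([True, False], repeat=3):
    print(flags)
# (True, True, True), (True, True, False), ... 8 total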

batched: split into chunks (Python 3.12+)

from itertools import batched

for batch in batched(items, 100):
    process_batch(batch)

This one's new and I use it constantly for processing data in batches. API rate limits? Batch your requests. Bulk database inserts? Batch them. Processing a huge file? Batch it.

If you're on Python 3.11 or earlier, here's the equivalent using islice:

from itertools import islice

def batched(iterable, n):
    """Batch data into lists of length n. The last batch may be shorter.
    (The real itertools.batched yields tuples instead of lists.)"""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, n))
        if not batch:
            return
        yield batch

# Now you can use it like the built-in
for batch in batched(range(10), 3):
    print(batch)
# [0, 1, 2]
# [3, 4, 5]
# [6, 7, 8]
# [9]

I keep this snippet in my utils because I constantly need to batch database operations:

from itertools import batched  # or the custom version above

def bulk_insert_users(users):
    """Insert users in batches to avoid overwhelming the database"""
    for batch in batched(users, 1000):
        db.executemany(
            "INSERT INTO users (name, email) VALUES (%s, %s)",
            [(u.name, u.email) for u in batch]
        )
        db.commit()  # Commit after each batch

Real-world use case: data processing pipeline

Here's how these combine in practice. I recently processed server logs to find request patterns:

from itertools import islice, groupby, chain

def process_logs(log_file):
    # Parse lines, skip first 10 (headers)
    parsed = (parse_line(line) for line in islice(log_file, 10, None))

    # Group by hour
    by_hour = groupby(parsed, key=lambda x: x['timestamp'].hour)

    # Get top requests per hour
    for hour, requests in by_hour:
        top = sorted(requests, key=lambda x: x['duration'], reverse=True)[:5]
        yield (hour, top)

# Chain results from multiple files
all_results = chain.from_iterable(
    process_logs(open(f)) for f in log_files
)

Memory efficient, readable, and it handles gigabytes of logs without breaking a sweat.

itertools vs list comprehensions

When should you reach for itertools instead of a list comprehension?

Use itertools when:
- Working with large datasets (memory matters)
- You need lazy evaluation (process as you go)
- The pattern has a name (groupby, chain, etc. are clearer than manual code)
- Chaining multiple operations on iterables

Use list comprehensions when:
- The data fits in memory easily
- You need random access to results
- The transformation is simple and obvious
- You're doing simple filtering or mapping

Often I'll use itertools to process data and list comprehensions to transform it: [transform(x) for x in islice(data, 100)].
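
In practice that split looks something like this - lazy filtering and slicing with itertools, then an eager comprehension for the part you actually keep (events.log and clean_row are hypothetical):

from itertools import islice

with open('events.log') as f:
    errors = (line for line in f if 'ERROR' in line)
    first_hundred = [clean_row(line) for line in islice(errors, 100)]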

Performance deep dive

Let's talk numbers. Here's a benchmark comparing different approaches to flattening a list:

import timeit
from itertools import chain

nested = [[i] * 100 for i in range(1000)]

# Method 1: Manual loop
def manual_flatten(nested):
    result = []
    for sublist in nested:
        result.extend(sublist)
    return result

# Method 2: List comprehension
def list_comp_flatten(nested):
    return [item for sublist in nested for item in sublist]

# Method 3: chain.from_iterable
def chain_flatten(nested):
    return list(chain.from_iterable(nested))

# Time each one (timeit returns total seconds for `number` runs)
for fn in (manual_flatten, list_comp_flatten, chain_flatten):
    total = timeit.timeit(lambda: fn(nested), number=100)
    print(f"{fn.__name__}: {total / 100 * 1000:.1f}ms")

# Results on my machine:
# manual_flatten: 8.2ms
# list_comp_flatten: 6.8ms
# chain_flatten: 5.1ms

The itertools version is fastest and most readable. But the real win is memory:

# Memory comparison with a huge dataset
import sys

huge_nested = [[i] * 10000 for i in range(10000)]

# List comprehension builds entire result in memory
result = [item for sublist in huge_nested for item in sublist]
print(sys.getsizeof(result))  # ~800MB

# chain returns an iterator - almost no memory
from itertools import chain
lazy_result = chain.from_iterable(huge_nested)
print(sys.getsizeof(lazy_result))  # 48 bytes!

# Process it lazily - one item at a time, constant memory
for i, item in enumerate(lazy_result):
    process(item)
    if i % 1_000_000 == 0:
        print(f"processed {i:,} items")

When you're dealing with production data, this difference matters.
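
If you want to check that kind of claim on your own data, the standard library's tracemalloc will show you the peak memory of each approach - a rough sketch:

import tracemalloc
from itertools import chain

nested = [[i] * 100 for i in range(10000)]

tracemalloc.start()
flat = [item for sublist in nested for item in sublist]
print("list comp peak:", tracemalloc.get_traced_memory()[1])
del flat

tracemalloc.reset_peak()
total = sum(chain.from_iterable(nested))  # consumed one item at a time
print("chain peak:", tracemalloc.get_traced_memory()[1])
tracemalloc.stop()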

Combining itertools functions

The real power comes from combining these tools. Here are some patterns I use regularly:

Pattern 1: Sliding window aggregation

from itertools import tee

def sliding_window(iterable, size):
    """Generate a sliding window over any iterable"""
    # tee() gives independent iterators even for one-shot generators;
    # calling iter() repeatedly on a generator would not
    iterators = tee(iterable, size)
    for i, it in enumerate(iterators):
        for _ in range(i):
            next(it, None)
    return zip(*iterators)

# Calculate moving average
prices = [100, 102, 98, 105, 103, 99, 101]
for window in sliding_window(prices, 3):
    avg = sum(window) / len(window)
    print(f"Window {window}: avg = {avg:.2f}")

# Or a plain generator for a memory-efficient running average
def moving_average(values, window_size):
    """Memory-efficient moving average"""
    window = []
    for value in values:
        window.append(value)
        if len(window) > window_size:
            window.pop(0)
        yield sum(window) / len(window)

averages = list(moving_average(prices, 3))

Pattern 2: Grouping with multiple keys

from itertools import groupby
from operator import itemgetter

# Group by multiple fields
transactions = [
    {'date': '2025-01-15', 'category': 'food', 'amount': 20},
    {'date': '2025-01-15', 'category': 'transport', 'amount': 5},
    {'date': '2025-01-15', 'category': 'food', 'amount': 15},
    {'date': '2025-01-16', 'category': 'food', 'amount': 25},
]

# Sort by date, then category
transactions.sort(key=itemgetter('date', 'category'))

# Group by date first
for date, day_trans in groupby(transactions, key=itemgetter('date')):
    day_list = list(day_trans)
    print(f"\n{date}:")

    # Then group by category within each day
    day_list.sort(key=itemgetter('category'))
    for category, cat_trans in groupby(day_list, key=itemgetter('category')):
        total = sum(t['amount'] for t in cat_trans)
        print(f"  {category}: ${total}")

Pattern 3: Filtering and transforming pipelines

from itertools import takewhile, batched  # batched needs 3.12+, or use the recipe above

def process_data_pipeline(filename):
    """A real-world data processing pipeline"""
    with open(filename) as f:
        # Skip comments and empty lines
        lines = (line.strip() for line in f)
        lines = (line for line in lines if line and not line.startswith('#'))

        # Take only the data section (until we hit a separator)
        data_section = takewhile(lambda x: x != '---', lines)

        # Parse and validate
        records = (parse_record(line) for line in data_section)
        valid_records = (r for r in records if r.is_valid())

        # Process in batches of 100
        for batch in batched(valid_records, 100):
            yield batch

# Use it
for batch in process_data_pipeline('data.txt'):
    database.bulk_insert(batch)

Testing itertools-heavy code

Itertools can make testing tricky because you're working with iterators. Here are some tips:

from itertools import chain, islice
import pytest

def test_with_itertools():
    # Problem: iterators get exhausted
    data = chain([1, 2], [3, 4])
    assert list(data) == [1, 2, 3, 4]
    assert list(data) == []  # Oops! Already consumed

# Solution 1: Recreate the iterator for each use
def make_data():
    return chain([1, 2], [3, 4])

def test_recreate():
    assert list(make_data()) == [1, 2, 3, 4]
    assert list(make_data()) == [1, 2, 3, 4]

# Solution 2: Use a fixture (defined at module level, not inside a test)
@pytest.fixture
def test_data():
    return chain([1, 2], [3, 4])

def test_something(test_data):
    assert list(test_data) == [1, 2, 3, 4]

# Testing infinite iterators
from itertools import count, cycle

def test_infinite_iterators():
    # Don't forget to limit them!
    counter = count(start=1, step=2)
    assert list(islice(counter, 5)) == [1, 3, 5, 7, 9]

    # Or use takewhile
    from itertools import takewhile
    counter = count()
    assert list(takewhile(lambda x: x < 5, counter)) == [0, 1, 2, 3, 4]

More hidden gems

Here are a few more itertools functions that come in handy:

cycle: repeat forever

from itertools import cycle, islice

# Rotate through options
colors = cycle(['red', 'green', 'blue'])
for i, color in zip(range(7), colors):
    print(f"Item {i}: {color}")

# Output:
# Item 0: red
# Item 1: green
# Item 2: blue
# Item 3: red
# Item 4: green
# ...

# Practical use: round-robin task assignment
from itertools import cycle

workers = ['worker1', 'worker2', 'worker3']
worker_pool = cycle(workers)

for task in tasks:
    worker = next(worker_pool)
    assign_task(task, worker)

repeat: repeat a value

from itertools import repeat

# Repeat a value n times
list(repeat('X', 5))  # ['X', 'X', 'X', 'X', 'X']

# Or infinitely (be careful!)
counter = 0
for x in repeat('tick'):
    print(x)
    counter += 1
    if counter > 3:
        break

# Useful with map
list(map(pow, range(5), repeat(2)))  # [0, 1, 4, 9, 16]
# Same as: [x**2 for x in range(5)]

compress: filter with a mask

from itertools import compress

data = ['A', 'B', 'C', 'D', 'E']
selectors = [1, 0, 1, 0, 1]

list(compress(data, selectors))  # ['A', 'C', 'E']

# Practical use: applying filter results
users = fetch_users()
is_active = [u.last_login > cutoff_date for u in users]
active_users = list(compress(users, is_active))

When NOT to use itertools

Don't reach for itertools when:

1. You need random access

# Bad - converting to list defeats the purpose
from itertools import chain
data = list(chain(list1, list2))
print(data[5])  # Why not just use list1 + list2?

# Good - just concatenate
data = list1 + list2
print(data[5])

2. Debugging complex chains

# Hard to debug
result = chain.from_iterable(
    groupby(
        takewhile(lambda x: x < 100,
                 filter(lambda x: x % 2 == 0, data)),
        key=lambda x: x // 10
    )
)

# Easier to debug - break it down
evens = (x for x in data if x % 2 == 0)
under_100 = takewhile(lambda x: x < 100, evens)
sorted_data = sorted(under_100)
grouped = groupby(sorted_data, key=lambda x: x // 10)
result = chain.from_iterable(grouped)

3. The data is tiny

# Overkill for 5 items
from itertools import chain
all_items = list(chain(few_items1, few_items2))

# Just do this
all_items = few_items1 + few_items2

Wrapping up

itertools is one of those libraries that doesn't feel essential until you learn it, then you wonder how you lived without it. Start with the basics - chain, islice, groupby - and add more as you encounter the use cases.

The key insight: itertools lets you work with data as a stream rather than a collection. That mindset shift - thinking about processing data as it flows rather than loading it all first - will make you a better Python developer.

And here's a secret: once you're comfortable with itertools, you'll find yourself writing more generator functions. They compose beautifully with these tools and make your code both faster and clearer. That's when Python really starts to feel elegant.
