Speed Up Slow Functions with lru_cache

If a function is slow and you call it with the same arguments repeatedly, cache the results. This is called memoization, and Python's functools.lru_cache makes it trivial.

What is memoization?

Memoization is a fancy word for "remember what you already computed." If you call a function with the same inputs multiple times, why recalculate the answer each time? Just store the result after the first call and return it instantly on subsequent calls.

It's like asking someone for directions to the grocery store. The first time they think about it and tell you. The second time you ask? They don't re-plan the route, they just repeat what they said before.

The basic pattern

from functools import lru_cache

@lru_cache
def expensive_function(n):
    # some slow computation - for illustration, a brute-force sum of squares
    result = sum(i * i for i in range(n))
    return result

First call with n=5? Does the computation. Second call with n=5? Returns the cached result instantly.

How LRU cache works

LRU stands for "Least Recently Used." When the cache fills up, it evicts the least recently used items to make room for new ones. Think of it like your closet: if you haven't worn something in a while and you need space, that's the first thing to go.

The cache keeps track of both the function arguments and their results. When you call the function, it checks: "Have I seen these exact arguments before?" If yes, return the cached result. If no, run the function and cache the result.
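
To make this concrete, here's a rough sketch of what a memoizing wrapper does under the hood - a plain dictionary keyed by the arguments, with no size limit or LRU eviction, so it's the core idea rather than what lru_cache literally does:

from functools import wraps

def memoize(func):
    cache = {}  # maps argument tuples to previously computed results

    @wraps(func)
    def wrapper(*args):
        if args in cache:        # seen these exact arguments before?
            return cache[args]   # yes: return the stored result
        result = func(*args)     # no: compute it...
        cache[args] = result     # ...remember it...
        return result            # ...and hand it back

    return wrapper

lru_cache layers a size bound and least-recently-used eviction on top of this basic idea.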

Real example: Fibonacci

# Without cache: O(2^n) - unusably slow
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)

# With cache: O(n) - fast
@lru_cache
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)

Without caching, fib(35) takes seconds. With caching? Milliseconds. The difference is dramatic because recursive functions often recompute the same values many times.

Real-world use cases

Expensive API calls

@lru_cache(maxsize=128)
def fetch_weather(city):
    # Hit external API - slow and rate-limited
    return requests.get(f"https://api.weather.com/{city}").json()

If multiple users request the same city's weather within a short time, you hit the API once and serve cached results to everyone else.

Database lookups

@lru_cache(maxsize=256)
def get_user_permissions(user_id):
    # Database query - relatively slow
    return db.query("SELECT * FROM permissions WHERE user_id = ?", user_id)

User permissions don't change often. Cache them and save database round trips. In a web application handling hundreds of requests per second, this one decorator can cut permission-related database load dramatically: if every request checks permissions a few times, all but the first check per user becomes an in-memory lookup instead of a query.

Complex calculations

@lru_cache(maxsize=512)
def calculate_shipping_cost(origin_zip, dest_zip, weight_oz, carrier):
    # Complex calculation involving distance, zones, weight brackets
    distance = geocode_distance(origin_zip, dest_zip)
    zone = get_shipping_zone(distance)
    rate = carrier_rate_table(carrier, zone, weight_oz)
    return rate * 1.1  # Add handling fee

# During checkout, multiple users might ship similar packages
# Cache saves recalculating the same routes repeatedly

E-commerce sites might see the same shipping calculations thousands of times per day. Caching these saves both computation time and makes checkout faster for customers.

Configuration loading

@lru_cache(maxsize=1)
def load_config():
    # Parse YAML/JSON from disk
    with open("config.yaml") as f:
        return yaml.safe_load(f)

Read the config file once, cache it forever (or until you clear the cache).
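
If the config file can change while the app is running, you can drop the cached copy explicitly. A small sketch using the load_config function above:

config = load_config()     # first call reads and parses the file
config = load_config()     # cached - no disk access

load_config.cache_clear()  # config.yaml was edited; forget the cached copy
config = load_config()     # re-reads the file from disk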

Debugging with cache_info()

Want to see if your cache is actually helping? Use cache_info():

@lru_cache(maxsize=100)
def slow_function(x):
    return x ** 2

slow_function(5)
slow_function(5)
slow_function(10)

print(slow_function.cache_info())
# CacheInfo(hits=1, misses=2, maxsize=100, currsize=2)

This tells you: 1 hit (cache returned a result), 2 misses (had to compute), max cache size is 100, currently storing 2 results.

Python 3.9+: @cache vs @lru_cache

Python 3.9 added @cache as a simpler alternative:

from functools import cache

@cache
def expensive_function(n):
    return n ** 2

The difference? @cache is just @lru_cache(maxsize=None) - an unbounded cache that never evicts items. Use it when you know the number of unique inputs is limited and you want maximum speed.

Thread safety

Good news: lru_cache is thread-safe out of the box - its internal bookkeeping is protected, so concurrent calls from multiple threads won't corrupt the cache. One caveat: it doesn't hold a lock while your function runs, so two threads that miss on the same arguments at the same time may both execute the computation before one result lands in the cache.
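
Here's a quick sketch of that caveat in action: several threads race on a cold cache, and the slow body may run more than once (slow_square and the call counter are purely illustrative):

import threading
import time
from functools import lru_cache

call_count = 0

@lru_cache(maxsize=None)
def slow_square(n):
    global call_count
    call_count += 1  # rough count of how often the body actually runs
    time.sleep(0.1)  # simulate slow work
    return n * n

threads = [threading.Thread(target=slow_square, args=(7,)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(call_count)                # often > 1: threads that raced past the cold cache each computed
print(slow_square.cache_info())  # the cache itself stays consistent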

The typed parameter

By default, lru_cache treats 3 and 3.0 as the same:

@lru_cache
def add_one(x):
    return x + 1

add_one(3)    # computes and caches
add_one(3.0)  # returns cached result, doesn't recompute

If you want to treat different types as different cache entries:

@lru_cache(typed=True)
def add_one(x):
    return x + 1

add_one(3)    # computes and caches
add_one(3.0)  # computes separately, caches separately

Most of the time you don't need typed=True, but it's useful if your function has type-dependent behavior.

Common patterns

Caching property computations

class DataProcessor:
    def __init__(self, data):
        self.data = data

    @lru_cache(maxsize=1)
    def expensive_analysis(self):
        # Process self.data - takes a while
        return analyze(self.data)  # analyze() stands in for the real work

Note: For instance methods, you might want to use @cached_property from functools instead, which is designed for this use case.

Recursive functions

We saw Fibonacci, but this pattern works for any recursive algorithm:

@lru_cache(maxsize=None)
def count_paths(x, y):
    # Count monotone lattice paths from (0, 0) to (x, y)
    if x == 0 and y == 0:
        return 1
    paths = 0
    if x > 0:
        paths += count_paths(x - 1, y)
    if y > 0:
        paths += count_paths(x, y - 1)
    return paths

Dynamic programming problems are perfect for lru_cache. Longest common subsequence, edit distance, coin change - all become trivial:

@lru_cache(maxsize=None)
def min_coins(amount, coins):
    \"\"\"Find minimum coins needed to make amount.\"\"\"
    if amount == 0:
        return 0
    if amount < 0:
        return float('inf')

    min_count = float('inf')
    for coin in coins:
        count = min_coins(amount - coin, coins) + 1
        min_count = min(min_count, count)

    return min_count

# Without cache: exponential time
# With cache: polynomial time
result = min_coins(100, (1, 5, 10, 25))

Preprocessing and normalization

Sometimes you want to cache after normalizing inputs:

@lru_cache(maxsize=256)
def search_case_insensitive(query):
    # Only ever called with normalized (lowercased, stripped) queries
    return database.search(query)

def search(query):
    # Normalize to lowercase before caching
    return search_case_insensitive(query.lower().strip())

# These all hit the same cache entry
search("Python")
search("python")
search("  PYTHON  ")

This pattern is useful for user inputs where you want to maximize cache hits despite variations in formatting.

Chaining cached functions

You can build pipelines where each stage is cached:

@lru_cache(maxsize=100)
def fetch_raw_data(source_id):
    return api.get_data(source_id)

@lru_cache(maxsize=100)
def parse_data(source_id):
    raw = fetch_raw_data(source_id)
    return json.loads(raw)

@lru_cache(maxsize=100)
def transform_data(source_id):
    parsed = parse_data(source_id)
    return apply_transformations(parsed)

# Each stage is cached independently
# If you clear fetch_raw_data cache, parsed and transformed remain cached

This gives you fine-grained control over cache invalidation for multi-stage pipelines.

When NOT to use caching

Functions with side effects

# BAD - don't cache this
@lru_cache
def send_email(user_id, message):
    email_service.send(user_id, message)

If you call this twice with the same arguments, the second call does nothing (returns the cached None). The user doesn't get a second email. This is almost never what you want.

Time-sensitive data

# BAD for real-time data
@lru_cache
def get_stock_price(ticker):
    return api.get_price(ticker)

Stock prices change constantly. Caching them means serving stale data. If you must cache time-sensitive data, clear the cache periodically or use a TTL-based cache (which lru_cache doesn't support natively - you'd need a different library).

Non-hashable arguments

# This won't work - lists aren't hashable
@lru_cache
def process_items(items_list):
    return sum(items_list)

You'll get a TypeError: unhashable type: 'list'. Convert to tuples if you need to cache functions that take sequences.
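
A common workaround is a thin wrapper that converts the sequence to a tuple before hitting the cache. A sketch (the helper name is illustrative):

from functools import lru_cache

@lru_cache(maxsize=128)
def _sum_items_cached(items_tuple):
    return sum(items_tuple)

def process_items(items_list):
    # Lists aren't hashable, tuples are - normalize before the cached call
    return _sum_items_cached(tuple(items_list))

process_items([1, 2, 3])  # works
process_items([1, 2, 3])  # same tuple key, so this one is a cache hit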

Memory considerations

Every cache entry uses memory. An unbounded cache (maxsize=None) will grow forever, which can cause memory issues.

# Could use gigabytes of RAM if called with many unique inputs
@lru_cache(maxsize=None)
def process_large_data(data_id):
    return load_and_process_huge_dataset(data_id)

Set appropriate maxsize values based on your memory constraints and access patterns. Start conservative (100-500) and tune based on cache_info() stats.

Cache invalidation: The hard problem

Phil Karlton famously said: "There are only two hard things in Computer Science: cache invalidation and naming things." Let's tackle the first one.

Sometimes you need to clear the cache. Maybe the underlying data changed, or you're running tests, or you just want to start fresh.

Clearing the entire cache

@lru_cache(maxsize=128)
def get_user_data(user_id):
    return db.fetch_user(user_id)

# User updates their profile
user.update_profile()
get_user_data.cache_clear()  # Wipe the whole cache

cache_clear() removes all cached entries. It's simple but brutal - you lose all cached data, even for users who didn't change.

The problem with selective invalidation

Here's what you might want but can't do:

# This doesn't exist - wishful thinking
get_user_data.cache_clear(user_id=123)  # Only clear one user

lru_cache doesn't support invalidating specific entries. It's all or nothing. If you need fine-grained control, you have a few options:

  1. Use a more sophisticated caching library like cachetools or joblib
  2. Roll your own cache using a dictionary (see the sketch after this list)
  3. Structure your code so you can clear the entire cache when needed
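
Here's what option 2 looks like at its simplest: a hand-rolled dictionary cache where you can drop a single entry. This is only a sketch - no eviction, no thread safety, and db.fetch_user stands in for your real data access:

_user_cache = {}

def get_user_data(user_id):
    if user_id not in _user_cache:
        _user_cache[user_id] = db.fetch_user(user_id)
    return _user_cache[user_id]

def invalidate_user(user_id):
    # Remove just this user's entry; everyone else stays cached
    _user_cache.pop(user_id, None)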

Time-based invalidation

Since lru_cache doesn't have TTL (time-to-live) support, you can combine it with timestamps:

import time
from functools import lru_cache

_cache_timestamp = time.time()

@lru_cache(maxsize=128)
def get_data_with_timestamp(key, timestamp):
    # timestamp parameter ensures cache is invalidated when it changes
    return fetch_expensive_data(key)

def get_data(key):
    global _cache_timestamp
    # If cache is older than 5 minutes, invalidate
    if time.time() - _cache_timestamp > 300:
        get_data_with_timestamp.cache_clear()
        _cache_timestamp = time.time()
    return get_data_with_timestamp(key, _cache_timestamp)

This is a bit hacky, but it works: every five minutes the wrapper clears the cache and bumps the timestamp, so stale entries are dropped and the next calls recompute fresh results.

Understanding maxsize: The Goldilocks problem

Choosing the right maxsize is an art. Too small and you'll have constant cache evictions (low hit rate). Too large and you waste memory.

How to choose maxsize

Start by asking: "How many unique inputs will this function see?"

# User lookup - thousands of users exist, but only the most active need to stay cached
@lru_cache(maxsize=500)
def get_user(user_id):
    return db.fetch_user(user_id)

# Config file - exactly one input (no arguments)
@lru_cache(maxsize=1)
def get_config():
    return load_config_file()

# Math function - could be infinite inputs
@lru_cache(maxsize=1000)
def calculate_prime_factors(n):
    return factorize(n)

Measuring cache effectiveness

The hit ratio is your key metric:

@lru_cache(maxsize=100)
def expensive_func(x):
    return x ** 2

# After running your application...
info = expensive_func.cache_info()
hit_ratio = info.hits / (info.hits + info.misses)
print(f"Cache hit ratio: {hit_ratio:.2%}")

A good hit ratio is typically above 80%. If you're seeing 50% or less, either increase maxsize or your access pattern isn't cache-friendly.

The cost of eviction

When the cache fills up, lru_cache has to do extra work to track least recently used items and evict them. For very high-throughput functions, this overhead can matter.

# High cache churn - evictions happen constantly
@lru_cache(maxsize=10)
def process_request(request_id):
    # Called with thousands of unique IDs per second
    return handle_request(request_id)

If maxsize is too small relative to your working set, you might actually slow things down. Monitor with cache_info() - if currsize equals maxsize and you have lots of misses, increase the size.

Common gotchas and how to avoid them

Gotcha 1: Mutable default arguments

This classic Python footgun gets worse with caching:

# DANGEROUS
@lru_cache
def process_items(items=[]):
    items.append("processed")
    return len(items)

process_items()  # Returns 1; the shared default list is now ["processed"]
process_items()  # Also returns 1 - a cache hit, so the lurking mutation goes unnoticed

Never use mutable default arguments with cached functions. Use None and create new objects inside:

# SAFE
@lru_cache
def process_items(items=None):
    # Cached arguments must be hashable, so callers pass a tuple (or nothing)
    items = list(items) if items is not None else []
    items.append("processed")  # works on a fresh list; inputs are never mutated
    return len(items)

Gotcha 2: Caching methods with self

Instance methods include self as the first argument, which means the cache holds a reference to the instance:

class DataProcessor:
    def __init__(self, data):
        self.data = data

    @lru_cache(maxsize=128)
    def process(self, key):
        return expensive_operation(self.data, key)

# Problem: cache holds references to DataProcessor instances
# They won't be garbage collected even when you're done with them
processor1 = DataProcessor(big_data)
processor1.process("key1")  # Cached with processor1
processor2 = DataProcessor(other_data)
processor2.process("key1")  # Different cache entry!

Each instance gets its own cache entries, and instances can't be garbage collected while cached. For instance methods, prefer @cached_property or implement your own caching strategy.
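
One workaround is to build the cache per instance in __init__, so the cached results live on the instance and go away with it. A sketch, with expensive_operation again standing in for the real work:

from functools import lru_cache

class DataProcessor:
    def __init__(self, data):
        self.data = data
        # Each instance gets its own cache, tied to the instance's lifetime
        self.process = lru_cache(maxsize=128)(self._process)

    def _process(self, key):
        return expensive_operation(self.data, key)

The cache still references the instance, but because the instance owns the cache, the whole cycle can be garbage collected once you drop the processor.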

Gotcha 3: Large objects in cache

# Memory hog
@lru_cache(maxsize=1000)
def load_dataset(dataset_id):
    # Returns a 100MB numpy array
    return np.load(f"dataset_{dataset_id}.npy")

With maxsize=1000, you could theoretically cache 100GB of data. Be mindful of what you're caching. If objects are large, use a much smaller maxsize or don't cache at all.

Gotcha 4: Cache pollution in tests

@lru_cache(maxsize=128)
def get_config():
    return load_config()

def test_with_custom_config():
    # Oops, the real config is already cached
    with mock.patch('myapp.load_config', return_value=test_config):  # 'myapp' = whichever module defines load_config
        result = get_config()  # Returns cached production config!
    assert result == test_config  # FAILS

Always clear caches in test setup:

def setup_function():
    get_config.cache_clear()

def test_with_custom_config():
    with mock.patch('myapp.load_config', return_value=test_config):
        result = get_config()
    assert result == test_config  # PASSES

Or use setUp in unittest, fixtures in pytest, etc.
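
For example, a pytest autouse fixture keeps every test starting from a cold cache (a sketch assuming the get_config function above):

import pytest

@pytest.fixture(autouse=True)
def clear_config_cache():
    get_config.cache_clear()  # cold cache before each test
    yield
    get_config.cache_clear()  # and clean up afterwards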

Advanced patterns

Combining with other decorators

Order matters when stacking decorators:

# CORRECT: cache the property lookup
from functools import cached_property

class DataAnalyzer:
    @cached_property
    def expensive_property(self):
        return analyze_data(self.data)

# For regular functions with multiple decorators:
from functools import wraps, lru_cache

def log_calls(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        print(f"Calling {func.__name__}")
        return func(*args, **kwargs)
    return wrapper

# CORRECT: cache is innermost (closest to function)
@log_calls
@lru_cache(maxsize=128)
def compute(x):
    return x ** 2

# WRONG: cache hits return before reaching log_calls, so only misses get logged
@lru_cache(maxsize=128)
@log_calls
def compute(x):
    return x ** 2

Put lru_cache closest to the actual function so the cache stores the raw computation and outer decorators like log_calls still see every call, cache hits included.

Partial function caching

Sometimes you want to cache based on only some arguments:

# Cache by user_id only, ignore timestamp
@lru_cache(maxsize=128)
def _get_user_cached(user_id):
    return db.fetch_user(user_id)

def get_user(user_id, timestamp=None):
    # timestamp is ignored for caching purposes
    return _get_user_cached(user_id)

This is useful when some arguments are just metadata that don't affect the result.

Building a custom cache decorator

If lru_cache doesn't fit your needs, you can build your own:

from functools import wraps
import time

def ttl_cache(seconds=300, maxsize=128):
    \"\"\"Cache with time-to-live expiration.\"\"\"
    def decorator(func):
        cache = {}
        timestamps = {}

        @wraps(func)
        def wrapper(*args, **kwargs):
            key = str(args) + str(kwargs)
            now = time.time()

            # Check if cached and not expired
            if key in cache:
                if now - timestamps[key] < seconds:
                    return cache[key]
                else:
                    del cache[key]
                    del timestamps[key]

            # Compute and cache
            result = func(*args, **kwargs)
            cache[key] = result
            timestamps[key] = now

            # Evict oldest if over maxsize
            if len(cache) > maxsize:
                oldest = min(timestamps.items(), key=lambda x: x[1])[0]
                del cache[oldest]
                del timestamps[oldest]

            return result

        return wrapper
    return decorator

@ttl_cache(seconds=60, maxsize=100)
def get_stock_price(ticker):
    return api.fetch_price(ticker)

This is simplified (not thread-safe, inefficient eviction), but shows the concept. For production, use a library like cachetools.

Performance benchmarks: Show me the numbers

Let's quantify the speedup with some real examples:

Fibonacci (recursive)

import time

def fib_no_cache(n):
    if n < 2:
        return n
    return fib_no_cache(n-1) + fib_no_cache(n-2)

@lru_cache(maxsize=None)
def fib_cached(n):
    if n < 2:
        return n
    return fib_cached(n-1) + fib_cached(n-2)

# Benchmark fib(35)
start = time.time()
result = fib_no_cache(35)
uncached_time = time.time() - start

start = time.time()
result = fib_cached(35)
cached_time = time.time() - start

print(f"Uncached: {uncached_time:.2f}s")  # ~3.5 seconds
print(f"Cached: {cached_time:.4f}s")       # ~0.0001 seconds
print(f"Speedup: {uncached_time/cached_time:.0f}x")  # ~35,000x faster!

That's not a typo. Caching can give you a 35,000x speedup for recursive algorithms.

API calls

import requests
from time import time

@lru_cache(maxsize=100)
def fetch_user_cached(user_id):
    return requests.get(f"https://api.example.com/users/{user_id}").json()

# First call: ~200ms (network latency)
start = time()
fetch_user_cached(123)
print(f"First call: {(time()-start)*1000:.0f}ms")

# Second call: ~0.01ms (cached)
start = time()
fetch_user_cached(123)
print(f"Cached call: {(time()-start)*1000:.2f}ms")

# 20,000x speedup

Database queries

@lru_cache(maxsize=256)
def get_user_permissions_cached(user_id):
    # Assume this takes ~10ms
    return db.execute("SELECT * FROM permissions WHERE user_id = ?", user_id)

# Simulating a request that checks permissions 10 times
# Without cache: 10 * 10ms = 100ms
# With cache: 10ms + 9 * 0.01ms = ~10ms
# 10x speedup just for a single request

Debugging cache issues

Inspecting cache contents

cache_info() tells you stats, but what if you want to see actual cached values?

@lru_cache(maxsize=128)
def compute(x):
    return x ** 2

compute(5)
compute(10)

# lru_cache doesn't expose its stored values - cache_info() is the supported
# way to see what's going on
print(compute.cache_info())
# CacheInfo(hits=0, misses=2, maxsize=128, currsize=2)

If you need to inspect the actual cached values, consider cachetools instead, which hands you the cache object so you can look inside it.
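
For comparison, here's roughly what inspection looks like with cachetools, where you own the cache object and can simply look inside it (the exact key shape comes from cachetools' default key function, so treat it as approximate):

from cachetools import LRUCache, cached

cache = LRUCache(maxsize=128)

@cached(cache)
def compute(x):
    return x ** 2

compute(5)
compute(10)

print(len(cache))   # 2
print(dict(cache))  # roughly {(5,): 25, (10,): 100}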

Finding cache-related performance issues

import cProfile

@lru_cache(maxsize=100)
def expensive_func(x):
    return sum(range(x))

# Profile to see cache effectiveness
cProfile.run('[expensive_func(i % 50) for i in range(10000)]')

If you see lots of time in the actual function rather than the cache lookup, your cache might be too small or your hit rate too low.

Logging cache hits and misses

from functools import lru_cache, wraps

def logged_cache(maxsize=128):
    def decorator(func):
        cached_func = lru_cache(maxsize=maxsize)(func)

        @wraps(func)
        def wrapper(*args, **kwargs):
            before = cached_func.cache_info()
            result = cached_func(*args, **kwargs)
            after = cached_func.cache_info()

            if after.hits > before.hits:
                print(f"Cache HIT: {func.__name__}{args}")
            else:
                print(f"Cache MISS: {func.__name__}{args}")

            return result

        wrapper.cache_info = cached_func.cache_info
        wrapper.cache_clear = cached_func.cache_clear
        return wrapper
    return decorator

@logged_cache(maxsize=100)
def compute(x):
    return x ** 2

compute(5)  # Cache MISS: compute(5,)
compute(5)  # Cache HIT: compute(5,)

Comparing lru_cache with alternatives

Understanding when to use lru_cache versus other caching solutions helps you make better architectural decisions.

lru_cache vs Redis/Memcached

In-process memory caching with lru_cache is fast but limited to a single process:

# lru_cache: Single process, microsecond lookup
@lru_cache(maxsize=1000)
def get_product(product_id):
    return db.fetch_product(product_id)

# Redis: Shared across processes/servers, millisecond lookup
import json
import redis
r = redis.Redis()

def get_product_redis(product_id):
    cached = r.get(f"product:{product_id}")
    if cached:
        return json.loads(cached)
    product = db.fetch_product(product_id)
    r.setex(f"product:{product_id}", 300, json.dumps(product))
    return product

Use lru_cache when caching within a single process is sufficient. Use Redis when you need to share cache across multiple servers or want persistence across application restarts. For many web applications, combining both gives the best performance - lru_cache for ultra-fast local lookups, Redis for shared data.
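
A minimal sketch of that two-tier idea, reusing the db and Redis setup from above (note the local layer never expires on its own, so this only suits data that changes rarely):

import json
from functools import lru_cache

import redis

r = redis.Redis()

# Tier 1: lru_cache keeps the hottest products in-process (no network hop)
@lru_cache(maxsize=1000)
def get_product_two_tier(product_id):
    # Tier 2: shared Redis cache with a 5-minute TTL
    cached = r.get(f"product:{product_id}")
    if cached is not None:
        return json.loads(cached)
    # Tier 3: the database, the slowest path
    product = db.fetch_product(product_id)
    r.setex(f"product:{product_id}", 300, json.dumps(product))
    return product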

lru_cache vs manual dictionary caching

You could implement caching with a simple dictionary:

# Manual approach - more code, more bugs
_cache = {}
def get_user(user_id):
    if user_id not in _cache:
        _cache[user_id] = db.fetch_user(user_id)
    return _cache[user_id]

# lru_cache approach - one line
@lru_cache(maxsize=128)
def get_user(user_id):
    return db.fetch_user(user_id)

Manual caching requires handling thread safety, eviction policy, and cache statistics yourself. lru_cache gives you all this for free. Only roll your own when you need features lru_cache doesn't provide, like TTL or custom eviction logic.

lru_cache vs cachetools

The cachetools library extends lru_cache with more options:

from cachetools import TTLCache, cached

# Cache with TTL (time-to-live)
cache = TTLCache(maxsize=100, ttl=300)

@cached(cache)
def get_weather(city):
    return api.fetch_weather(city)

# Automatically expires after 5 minutes

Use cachetools when you need TTL expiration, different eviction policies (LFU, RR), or more control over the cache object. Stick with lru_cache for simplicity when LRU eviction is sufficient.

Real-world production tips

Don't over-optimize

Not every function needs caching. Profile first, optimize second. Add lru_cache where it actually matters:

# Probably not worth caching - too fast already
@lru_cache  # Overkill
def add_numbers(a, b):
    return a + b

# Worth caching - measurably slow
@lru_cache(maxsize=128)
def calculate_recommendations(user_id):
    # Complex algorithm taking 100ms+
    return recommendation_engine.compute(user_id)

The overhead of checking the cache, hashing arguments, and managing the LRU data structure is tiny but not zero. For functions that execute in microseconds, caching might actually be slower than just recomputing. Use profiling tools like cProfile or line_profiler to identify actual bottlenecks before adding caching.
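
One quick way to sanity-check whether caching a trivial function pays off is to time both versions directly. A sketch using timeit (exact numbers vary by machine):

import timeit
from functools import lru_cache

def add_numbers(a, b):
    return a + b

cached_add = lru_cache(maxsize=None)(add_numbers)

print(timeit.timeit(lambda: add_numbers(2, 3), number=1_000_000))
print(timeit.timeit(lambda: cached_add(2, 3), number=1_000_000))
# If the cached version isn't clearly faster, skip the decorator here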

Monitor cache effectiveness in production

Log cache stats periodically:

import logging
from functools import lru_cache

@lru_cache(maxsize=500)
def get_product_data(product_id):
    return db.fetch_product(product_id)

# In your monitoring/metrics code
def log_cache_stats():
    info = get_product_data.cache_info()
    hit_rate = info.hits / (info.hits + info.misses) if info.hits + info.misses > 0 else 0
    logging.info(f"Product cache - Hit rate: {hit_rate:.2%}, Size: {info.currsize}/{info.maxsize}")

# Call this periodically (e.g., every minute)

This helps you tune maxsize based on real traffic patterns.

Handle cache warming

For critical paths, pre-populate the cache on startup:

@lru_cache(maxsize=1000)
def get_product_category(category_id):
    return db.fetch_category(category_id)

def warm_cache():
    # Warm up cache with most common categories
    common_categories = [1, 2, 3, 5, 8, 13, 21]
    for cat_id in common_categories:
        get_product_category(cat_id)

# Call during application startup
warm_cache()

This ensures your first real users get cached responses instead of slow cold cache misses.

Bottom line

lru_cache is one of the easiest performance wins in Python. Add one decorator, get potentially massive speedups. Just remember:

  1. Only cache pure functions (same inputs = same outputs, no side effects)
  2. Use maxsize wisely - too small and you get constant evictions, too large and you waste memory
  3. Check cache_info() to verify it's actually helping
  4. Clear caches in tests to avoid pollution
  5. Be careful with instance methods - they prevent garbage collection
  6. Don't cache time-sensitive data or functions with side effects
  7. Monitor hit rates in production and tune accordingly

The best part? You can add caching incrementally. Find a slow function, add @lru_cache, measure the impact. If it helps, keep it. If not, remove it. No big refactoring needed.

Happy caching!
