Stop Using os.path

pathlib has been in Python since 3.4 (that's 2014, folks). It's better than os.path in every way. Please use it.

I see code reviews all the time with os.path.join and os.path.exists. It's not wrong, but there's a much nicer way. And yet, somehow, os.path refuses to die. I think it's because people learned Python with os.path and never got the memo that there's a better option.

Consider this your memo.

The old way

import os

base = '/home/user'
path = os.path.join(base, 'subdir', 'file.txt')
if os.path.exists(path):
    with open(path) as f:
        content = f.read()

It works, but there's a lot of os.path everywhere. And mixing strings with file operations always feels a bit... fragile.

The pathlib way

from pathlib import Path

base = Path('/home/user')
path = base / 'subdir' / 'file.txt'
if path.exists():
    content = path.read_text()

The / operator joins paths. It's so much cleaner. And path.read_text() handles opening and closing the file for you.

When I first saw the / operator for paths, I thought it was too clever. But after using it, I can't go back. It reads like English: "base path, then subdir, then file." Compare that to os.path.join(base, 'subdir', 'file.txt') which reads like "call this function with these arguments."

Path objects vs strings

The key insight is that a Path is an object, not a string. It knows it's a path and has methods that make sense for paths.

path = Path('/home/user/projects/app/main.py')

path.name        # 'main.py'
path.stem        # 'main' (filename without extension)
path.suffix      # '.py' (just the extension)
path.parent      # Path('/home/user/projects/app')
path.parts       # ('/', 'home', 'user', 'projects', 'app', 'main.py')

Compare that to the os.path way:

path = '/home/user/projects/app/main.py'

os.path.basename(path)                    # 'main.py'
os.path.splitext(os.path.basename(path))[0]  # 'main'
os.path.splitext(path)[1]                 # '.py'
os.path.dirname(path)                     # '/home/user/projects/app'

Which would you rather read?

The pathlib way is not just shorter. It's more discoverable. With os.path, you have to know the function exists. With pathlib, your editor's autocomplete will show you all the available methods. Type path. and see what's available.

Reading and writing files

pathlib makes simple file operations trivial:

# Reading
content = path.read_text()           # Read as string
data = path.read_bytes()             # Read as bytes
lines = path.read_text().splitlines()  # Read lines

# Writing
path.write_text('Hello, world!')
path.write_bytes(b'\x00\x01\x02')

No more with open(path, 'r') as f: for simple read/write operations.

Now, you still need with open() for more complex operations like streaming large files or appending. But for "read this config file" or "write this JSON," pathlib is perfect.
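For instance, streaming a big file line by line still wants a file handle. A minimal sketch, using a throwaway file (the name is made up):

```python
from pathlib import Path

# A throwaway file standing in for a big log
log = Path('app.log')
log.write_text('line one\nline two\nline three\n')

# path.open() returns the same file object the built-in open() would
matches = []
with log.open() as f:
    for line in f:  # streams line by line; the whole file is never in memory
        if 'two' in line:
            matches.append(line.strip())

log.unlink()  # clean up the example file
```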

One gotcha: write_text() and write_bytes() will overwrite the file. There's no append mode. If you need to append, use the traditional:

with path.open('a') as f:
    f.write('appended content\n')

Notice you can still use path.open() instead of open(str(path)). It's the same as the built-in open(), just called on the Path object.

Finding files with glob

# Find all Python files in a directory
for py_file in Path('src').glob('*.py'):
    print(py_file)

# Find recursively with **
for py_file in Path('src').glob('**/*.py'):
    print(py_file)

# rglob is shorthand for recursive glob
for py_file in Path('src').rglob('*.py'):
    print(py_file)

No more os.walk or glob.glob with weird path joining.
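For contrast, here's a sketch of the os.walk dance that rglob('*.py') replaces, run against a throwaway directory tree:

```python
import os
import tempfile
from pathlib import Path

# Build a tiny example tree to walk
root = Path(tempfile.mkdtemp())
(root / 'pkg').mkdir()
(root / 'pkg' / 'a.py').write_text('')
(root / 'readme.txt').write_text('')

# The old way: walk, filter, and join by hand
py_files = []
for dirpath, dirnames, filenames in os.walk(root):
    for name in filenames:
        if name.endswith('.py'):
            py_files.append(os.path.join(dirpath, name))
```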

Here's a real-world example. Say you want to find all test files, but skip the __pycache__ directories:

test_files = [
    f for f in Path('tests').rglob('test_*.py')
    if '__pycache__' not in f.parts
]

The .parts attribute is a tuple of path components, making it easy to check if something is in the path. Much cleaner than string matching.

Creating directories

# Create a directory (fails if exists)
Path('new_dir').mkdir()

# Create with parents, don't fail if exists
Path('new/nested/dir').mkdir(parents=True, exist_ok=True)

The parents=True, exist_ok=True combo is my go-to. It means "make this directory and any missing parent directories, and don't complain if it already exists." Perfect for ensuring a directory exists before writing to it:

output_dir = Path('output/results/2025')
output_dir.mkdir(parents=True, exist_ok=True)
(output_dir / 'data.json').write_text('{}')

Compare to the os.path equivalent:

import os
output_dir = 'output/results/2025'
if not os.path.exists(output_dir):
    os.makedirs(output_dir)
with open(os.path.join(output_dir, 'data.json'), 'w') as f:
    f.write('{}')

Not terrible, but the pathlib version flows better.

Checking file types

path.exists()     # Does it exist?
path.is_file()    # Is it a file?
path.is_dir()     # Is it a directory?
path.is_symlink() # Is it a symlink?

Pro tip: is_file() and is_dir() return False if the path doesn't exist. So you can often skip the exists() check:

# Instead of this:
if path.exists() and path.is_file():
    content = path.read_text()

# Just do this:
if path.is_file():
    content = path.read_text()

One gotcha: symlinks. If you have a symlink to a file, both is_file() and is_symlink() return True. Usually that's what you want, but if you need to know if something is a regular file (not a symlink), check both:

if path.is_file() and not path.is_symlink():
    print('Regular file')

Changing extensions

path = Path('data.json')
csv_path = path.with_suffix('.csv')  # data.csv

One gotcha: with_suffix() replaces the last suffix, it doesn't append. So if you want a backup named data.json.bak, this is wrong:

path = Path('data.json')
path.with_suffix('.bak')  # data.bak - the .json is gone!

If you want to add a suffix, build the new name from path.name:

backup_path = path.parent / f'{path.name}.bak'  # data.json.bak
# Or:
backup_path = path.with_name(f'{path.name}.bak')

with_suffix() is great for format conversion though:

# Convert image format
jpg_path = Path('photo.png')
webp_path = jpg_path.with_suffix('.webp')  # photo.webp

# Change data format
json_path = Path('data.json')
yaml_path = json_path.with_suffix('.yaml')  # data.yaml

Working with home and current directory

home = Path.home()   # /home/user or /Users/user
cwd = Path.cwd()     # Current working directory

config = Path.home() / '.config' / 'myapp' / 'settings.toml'

These are class methods, so you call them on Path itself, not on an instance. Super handy for config files:

# Load config from home directory
config_path = Path.home() / '.myapp' / 'config.json'
if config_path.is_file():
    config = json.loads(config_path.read_text())

# Save logs relative to current directory
log_dir = Path.cwd() / 'logs'
log_dir.mkdir(exist_ok=True)

Compare to the os.path way:

import os
config_path = os.path.join(os.path.expanduser('~'), '.myapp', 'config.json')

You have to remember expanduser() expands ~ to the home directory. With pathlib, it's just Path.home().
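That said, pathlib can expand ~ too when a path already contains it: Path.expanduser() is the instance-method counterpart of Path.home(). A quick sketch (the .myapp path is hypothetical):

```python
from pathlib import Path

# '~' is replaced by the actual home directory
config = Path('~/.myapp/config.json').expanduser()
```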

Making paths absolute or relative

path = Path('relative/path')
path.resolve()    # Convert to absolute path, resolves symlinks
path.absolute()   # Also converts to absolute, but doesn't resolve symlinks

# Get one path relative to another
Path('/home/user/docs/notes.txt').relative_to('/home/user')  # Path('docs/notes.txt')

The difference between resolve() and absolute() trips people up. Use resolve() most of the time. It gives you the real, canonical path with symlinks resolved:

# If you're in /home/user/projects
path = Path('src/main.py')
path.resolve()  # /home/user/projects/src/main.py

# If src is actually a symlink to /opt/app/src:
path.resolve()  # /opt/app/src/main.py (follows symlinks)
path.absolute() # /home/user/projects/src/main.py (doesn't follow)

relative_to() is great for making paths relative to a base directory:

base = Path('/home/user/projects/myapp')
file = Path('/home/user/projects/myapp/src/utils/helpers.py')
rel = file.relative_to(base)  # src/utils/helpers.py

But watch out - it raises an error if the path isn't actually relative to the base:

Path('/etc/config').relative_to('/home/user')  # ValueError!

Python 3.9 added is_relative_to() to check first:

if file.is_relative_to(base):
    rel = file.relative_to(base)

Listing directory contents

This is where pathlib really shines compared to os.listdir:

# The old way
import os

for item in os.listdir('/some/dir'):
    full_path = os.path.join('/some/dir', item)
    if os.path.isfile(full_path):
        print(f'File: {item}')

With pathlib:

for item in Path('/some/dir').iterdir():
    if item.is_file():
        print(f'File: {item.name}')

Notice how iterdir() gives you Path objects, not strings. You get all the Path methods immediately. No manual joining, no extra conversions.

Even better, you can filter as you go:

# Get only files
files = [f for f in Path('/some/dir').iterdir() if f.is_file()]

# Get only directories
dirs = [d for d in Path('/some/dir').iterdir() if d.is_dir()]

# Get files matching a pattern
py_files = list(Path('/some/dir').glob('*.py'))

File metadata with stat()

Need to check file size, modification time, or permissions? pathlib has you covered:

path = Path('large_file.dat')

# Get all stats
stats = path.stat()
stats.st_size      # File size in bytes
stats.st_mtime     # Last modification time
stats.st_atime     # Last access time
stats.st_ctime     # Metadata change time on Unix, creation time on Windows

# Helpful shortcuts
path.stat().st_size / (1024 * 1024)  # Size in MB

Compare to the os module:

import os
stats = os.stat('large_file.dat')

Okay, that's about the same. But pathlib keeps everything in one place. Plus you get nice helpers:

# Get owner and group (Unix/Linux/Mac)
path.owner()   # Username of file owner
path.group()   # Group name

# Quick file size check
if path.stat().st_size > 1_000_000:
    print('File is bigger than 1MB')

The real win is consistency. Everything goes through the Path object. No switching between modules.
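A pattern that pairs nicely with stat(): converting the raw st_mtime timestamp into a datetime you can actually read. A sketch with a throwaway file:

```python
from datetime import datetime
from pathlib import Path

path = Path('example.txt')  # throwaway file for the example
path.write_text('hello')

# st_mtime is a Unix timestamp; datetime makes it human-readable
modified = datetime.fromtimestamp(path.stat().st_mtime)
print(f'{path.name} last modified {modified:%Y-%m-%d %H:%M}')

path.unlink()  # clean up
```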

Deleting files and directories

# Delete a file
path = Path('temp.txt')
path.unlink()  # Raises FileNotFoundError if it doesn't exist

# Don't fail if missing (Python 3.8+)
path.unlink(missing_ok=True)

# Delete an empty directory
Path('empty_dir').rmdir()  # Raises if not empty or doesn't exist

# Delete a directory tree (careful!)
import shutil
shutil.rmtree(Path('directory_to_delete'))

Yes, pathlib doesn't have a built-in "delete directory and all contents" method. You still need shutil.rmtree() for that. But at least you can pass it a Path object.

The unlink() name is weird if you're not familiar with Unix terminology. It's the same as rm - it removes a file. Why not delete() or remove()? Because pathlib follows POSIX terminology. Get used to it.
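A common cleanup pattern combines glob() with unlink(). A sketch against a hypothetical cache directory:

```python
from pathlib import Path

cache = Path('cache')  # hypothetical cache directory
cache.mkdir(exist_ok=True)
(cache / 'a.tmp').write_text('')
(cache / 'keep.txt').write_text('')

# Delete every .tmp file, leave everything else alone
removed = 0
for tmp_file in cache.glob('*.tmp'):
    tmp_file.unlink()
    removed += 1
```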

Renaming and moving files

old_path = Path('old_name.txt')
new_path = Path('new_name.txt')

# Rename/move the file
old_path.rename(new_path)

This works for both renaming (same directory) and moving (different directory). The rename() method returns the new path, which is handy:

# Move and keep reference to new location
old = Path('data/temp.json')
new = old.rename('data/processed/temp.json')
content = new.read_text()

One gotcha: rename() will overwrite the destination if it exists. If you want to be safe, check first:

if not new_path.exists():
    old_path.rename(new_path)
else:
    print(f'{new_path} already exists!')

For an atomic replace on the same filesystem, use replace():

old_path.replace(new_path)  # Overwrites atomically

The one gotcha: str() conversion

Some older libraries expect string paths. Just wrap it:

path = Path('/some/file.txt')
old_library.do_thing(str(path))

Most modern libraries accept Path objects directly though. Anything that uses os.fspath() under the hood will work with Path objects. This includes:

  • open() (built-in)
  • shutil functions
  • subprocess functions
  • Most modern third-party libraries

The only time you really need str() is with older code that does string operations on paths (checking if it ends with a certain string, etc.). But that code should probably be using pathlib anyway.
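If you're writing your own functions, accepting both is easy: convert at the boundary with Path() (or os.fspath()). A sketch (describe is a made-up helper):

```python
import os
from pathlib import Path

def describe(path: "str | os.PathLike") -> str:
    p = Path(path)  # accepts str, Path, or anything with __fspath__
    return f'{p.name} ({p.suffix or "no extension"})'

describe('notes.txt')          # works with a plain string
describe(Path('src/main.py'))  # and with a Path object
```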

Common patterns and real-world examples

Let's look at some real patterns you'll use constantly.

Find the project root from anywhere in the code

def find_project_root():
    """Walk up from current file until we find pyproject.toml or .git"""
    current = Path(__file__).parent
    for parent in [current, *current.parents]:
        if (parent / 'pyproject.toml').exists():
            return parent
        if (parent / '.git').is_dir():
            return parent
    raise FileNotFoundError('Could not find project root')

# Use it
project_root = find_project_root()
config = project_root / 'config' / 'settings.json'

The .parents attribute is an iterable of all parent directories. Combined with [current, *current.parents], you get "this directory and all parents up to the root."
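To see what .parents actually holds, on a POSIX system:

```python
from pathlib import Path

p = Path('/home/user/projects/app/main.py')
parents = list(p.parents)
# [PosixPath('/home/user/projects/app'), PosixPath('/home/user/projects'),
#  PosixPath('/home/user'), PosixPath('/home'), PosixPath('/')]
```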

Process all files of a certain type

# Convert all markdown files to HTML
docs_dir = Path('docs')
for md_file in docs_dir.rglob('*.md'):
    html_file = md_file.with_suffix('.html')
    html_content = convert_markdown(md_file.read_text())
    html_file.write_text(html_content)
    print(f'Converted {md_file.name}')

Safely work with temporary files

import tempfile

# Create a temp file; delete=False so it survives the with block
with tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='.json') as f:
    temp_path = Path(f.name)

temp_path.write_text('{"temp": true}')  # write after the handle is closed

try:
    # Do something with temp_path
    process_file(temp_path)
finally:
    temp_path.unlink(missing_ok=True)

Or use tempfile.TemporaryDirectory() for a whole directory:

with tempfile.TemporaryDirectory() as tmpdir:
    tmp = Path(tmpdir)
    (tmp / 'file1.txt').write_text('content1')
    (tmp / 'file2.txt').write_text('content2')
    # Directory and contents automatically deleted when exiting

Build file paths from components safely

# User input - might contain .. or other malicious paths
user_input = request.args.get('file')

# Ensure it stays within allowed directory
base_dir = Path('/var/www/uploads').resolve()
requested_file = (base_dir / user_input).resolve()

if base_dir in requested_file.parents:
    # Safe - file is within base_dir
    return requested_file.read_bytes()
else:
    # User tried to escape the directory!
    raise PermissionError('Access denied')

Using resolve() prevents directory traversal attacks by converting everything to absolute paths, then checking if the base is in the parents.
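On Python 3.9+, the same containment check reads even better with is_relative_to(). A sketch with a hypothetical upload root:

```python
from pathlib import Path

base_dir = Path('/var/www/uploads')  # hypothetical upload root
safe = (base_dir / 'report.pdf').resolve()
evil = (base_dir / '../../etc/passwd').resolve()

safe.is_relative_to(base_dir)   # True - stays inside the upload root
evil.is_relative_to(base_dir)   # False - escaped via ..
```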

Backup files with timestamps

import shutil
from datetime import datetime

def backup_file(path: Path) -> Path:
    """Create a timestamped backup of a file"""
    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    backup = path.with_name(f'{path.stem}_{timestamp}{path.suffix}')
    shutil.copy2(path, backup)
    return backup

# Usage
config = Path('config.json')
backup = backup_file(config)
print(f'Backed up to {backup}')
# config_20251221_143052.json

Performance considerations

"But is pathlib slower than os.path?" I hear you ask.

For most cases: you won't notice. Path creation and basic operations are plenty fast. If you're doing millions of path operations in a tight loop, maybe there's a tiny overhead. But if that's your bottleneck, you have bigger problems.

I ran some quick benchmarks on common operations:

# Path joining - pathlib is ~2x slower
os.path.join('a', 'b', 'c')  # ~100ns
Path('a') / 'b' / 'c'        # ~200ns

# Checking existence - about the same
os.path.exists('file.txt')   # ~1000ns
Path('file.txt').exists()    # ~1100ns

That's nanoseconds. Unless you're doing this millions of times, the readability gain is worth the tiny performance cost.
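If you want numbers from your own machine, timeit makes the comparison a couple of lines (absolute results will vary):

```python
import timeit

join_time = timeit.timeit("os.path.join('a', 'b', 'c')",
                          setup='import os', number=100_000)
path_time = timeit.timeit("Path('a') / 'b' / 'c'",
                          setup='from pathlib import Path', number=100_000)

print(f'os.path.join: {join_time:.4f}s   Path /: {path_time:.4f}s')
```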

When you might still use os.path

Look, I'm not saying os.path should be deleted from the standard library. There are a few cases where you might reach for it:

  1. Legacy codebases: If you're in a massive codebase that uses os.path everywhere, don't rewrite it all. Use pathlib for new code and gradually migrate.

  2. Maximum performance: If you're writing a high-performance tool that does millions of path operations, the overhead might matter. Profile first though.

  3. Python 2 compatibility: If you're stuck supporting Python 2 (I'm sorry), you can't use pathlib. But Python 2 has been dead since 2020, so...

That's it. Those are the only reasons.

Making the switch

If you're sold on pathlib (you should be), here's how to migrate:

  1. New files: Just start using it. from pathlib import Path at the top.

  2. Existing code: Replace as you touch it. Don't do a massive find-replace, but when you're editing a function, convert it.

  3. Common conversions:

# Before → After
os.path.join(a, b)              → Path(a) / b
os.path.exists(p)               → Path(p).exists()
os.path.isfile(p)               → Path(p).is_file()
os.path.isdir(p)                → Path(p).is_dir()
os.path.basename(p)             → Path(p).name
os.path.dirname(p)              → Path(p).parent
os.path.splitext(p)             → (Path(p).stem, Path(p).suffix)
os.path.abspath(p)              → Path(p).resolve()
os.path.expanduser('~')         → Path.home()
os.makedirs(p)                  → Path(p).mkdir(parents=True, exist_ok=True)
open(p, 'r').read()             → Path(p).read_text()

  4. Type hints: Use Path in type hints, but accept str | Path if you want to be flexible:
from pathlib import Path
from typing import Union

def process_file(path: Union[str, Path]) -> None:
    path = Path(path)  # Convert to Path if it's a string
    content = path.read_text()
    # ...

Or in Python 3.10+:

def process_file(path: str | Path) -> None:
    path = Path(path)
    # ...

Bottom line

It's 2025. Python 3.4 came out in 2014. pathlib has been around for over a decade. It's time.

Stop importing os.path. Use pathlib. Your code will be cleaner, more readable, and less error-prone. Your coworkers will thank you. Your future self will thank you.

There's really no downside. The learning curve is minimal - if you know os.path, you'll pick up pathlib in an afternoon. The code is better. The errors are clearer. The methods are discoverable.

Just use pathlib.
