
Refactoring the libp2p Test Framework: A Fresh Start

Tags: testing, libp2p, interoperability, performance

by Dave Grantham

The libp2p ecosystem spans multiple programming languages, transports, and protocols. Testing interoperability across this diverse landscape has always been challenging. Today, we're announcing a complete rewrite of the test-plans repository that fundamentally improves how we test libp2p implementations.

Why a Complete Rewrite?

The original test framework was built with TypeScript, Docker Compose, and various npm dependencies. While functional, it presented several challenges:

More importantly, 2026 marks a pivotal year for libp2p research efforts focused on scaling and optimization. As we push the boundaries of what's possible with peer-to-peer networking, we need a test framework that can keep pace. Researchers investigating new transport protocols, scaling strategies, and AI-driven dynamic protocols require fast feedback loops, reproducible experiments, and the ability to iterate quickly on implementations across multiple languages. The old framework simply couldn't support the velocity and rigor that this research demands.

We set out to address these issues with a clear set of goals.

The 10 Primary Goals

1. Cross-Platform Support

The new framework runs natively on Linux, macOS, and Windows (via WSL). We've eliminated platform-specific code paths and ensured consistent behavior across all environments. A developer on macOS can reproduce the exact test that failed in CI on Linux.

2. Minimal Dependencies

We reduced dependencies to the essentials:

No Node.js. No npm. No Python. No pip. Just standard tools available on any development machine.

3. Reproducible Testing in CI/CD and Local Environments

The framework is optimized for both CI pipelines and local development: you can run the same commands locally that CI runs and get identical results, which keeps feedback loops short and iteration fast. The snapshot capability captures the entire test setup (inputs, Docker images, and environment variables) into a downloadable artifact that can be unpacked locally and re-run with a single command, exactly reproducing the test pass that was executed on the CI/CD infrastructure. This local reproducibility greatly speeds up debugging and compatibility fixes, and it helps optimization work by recreating locally the same conditions that produced the measured results in CI.
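As a sketch of that flow (the file names and archive layout here are assumptions for illustration, not the framework's actual artifact format):

```shell
# CI side: bundle the captured inputs into a downloadable artifact.
mkdir -p snapshot && printf 'iterations: 5\n' > snapshot/inputs.yaml
tar -czf perf-snapshot.tar.gz snapshot

# Local side: unpack and replay. run.sh reads inputs.yaml at startup
# and applies the same configuration as the original CI run.
tar -xzf perf-snapshot.tar.gz
cat snapshot/inputs.yaml
```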

4. Follow CI/CD and Programming Conventions

We adhere to standard patterns: clear exit codes, structured logging to stderr, machine-readable output to stdout, and conventional command-line arguments. The barrier to entry is low for anyone familiar with shell scripting.
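A tiny stand-alone illustration of these conventions (not the framework's actual code): logs go to stderr, machine-readable results go to stdout, and the exit code reports success, so any shell or CI step can pipe and check the output.

```shell
# Demo of the stream/exit-code convention described above.
run_demo() {
  echo 'level=info msg="starting test"' >&2   # structured log: stderr
  echo '{"test":"perf","status":"pass"}'      # machine-readable result: stdout
  return 0                                    # conventional exit code
}

results=$(run_demo 2>errors.log)              # stdout captured, stderr archived
echo "exit=$? results=$results"
```

Because logs and results use separate streams, a CI job can archive errors.log while parsing stdout, with no interleaving.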

5. Code Reusability via Shared Library

The lib/ directory contains 19 reusable shell scripts (roughly 4,000 lines) that provide common functionality:

Each test suite (perf, transport, hole-punch) imports these libraries, ensuring consistency and reducing duplication.

6. Aggressive Caching

Three levels of caching dramatically improve performance:

Cache Type          Miss       Hit        Speedup
Test matrix         2-5s       50-200ms   10-100x
GitHub snapshots    5-30s      1-2s       5-15x
Docker images       30-300s    0.1s       300-3000x

The test matrix cache uses a content-addressed key computed from images.yaml and all filter parameters. Change a filter, get a new key. Same filters, same cached matrix.
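As a rough sketch of how such a key can be computed (the actual derivation lives in the framework's scripts and may differ), hash the contents of images.yaml together with the active filter flags:

```shell
# Stand-in images.yaml so the sketch is self-contained; the real file
# lists implementations, versions, and build settings.
printf 'images:\n  - rust-v0.56\n  - go-v0.41\n' > images.yaml

# Content-addressed key: hash the file plus the filter string. Any change
# to either input yields a new key; identical inputs reuse the cached matrix.
filters='--impl-select ~rust --impl-ignore experimental'
key=$( { cat images.yaml; printf '%s' "$filters"; } | sha256sum | cut -c1-16 )
echo "cache key: $key"
```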

7. Fine-Grained Filtering

The two-stage filtering model provides precise control:

Stage 1 (SELECT): Narrow from the complete list

./run.sh --impl-select "~rust|~go"  # Only rust and go implementations

Stage 2 (IGNORE): Remove from selected set

./run.sh --impl-ignore "experimental"  # Exclude experimental versions
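The two stages compose: SELECT narrows the full list, then IGNORE prunes the selected set. A minimal stand-alone sketch of that semantics over a made-up implementation list (names hypothetical):

```shell
# Hypothetical implementation list standing in for the full matrix.
impls="rust-v0.56
rust-v0.55-experimental
go-v0.41
js-v1.2"

selected=$(printf '%s\n' "$impls" | grep -E 'rust|go')       # Stage 1: SELECT
final=$(printf '%s\n' "$selected" | grep -v 'experimental')  # Stage 2: IGNORE
printf '%s\n' "$final"
```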

Filter dimensions include:

Aliases make common patterns easy:

./run.sh --impl-select "~rust"  # Expands to rust-v0.56|rust-v0.55|rust-v0.54|...
./run.sh --impl-ignore "!~rust"  # Everything NOT matching rust (negation)

8. YAML Configuration with Comments

All configuration uses YAML files with extensive comments:

Human-readable configuration lowers the barrier to understanding and modification.

9. Local and Remote Test Applications with Patching

Testing local changes doesn't require forking repositories. The patching strategy lets you:

  1. Clone an implementation locally
  2. Make your changes
  3. Generate a patch file
  4. Reference it in images.yaml

The framework downloads the upstream snapshot, applies your patch, and builds the image. See our Local Testing Strategies guide for details.
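Under the hood this is plain git tooling. Here is a hypothetical walk-through of steps 1-3, using a throwaway local repo in place of a real implementation checkout (step 4, referencing the patch from images.yaml, depends on that file's schema and is omitted):

```shell
# Step 1: clone an implementation (simulated here with a fresh repo).
git init -q impl && cd impl
echo "original" > transport.rs
git add . && git -c user.email=a@b -c user.name=a commit -qm "baseline"

# Step 2: make your changes.
echo "patched" > transport.rs

# Step 3: generate a patch file for images.yaml to reference.
git diff > ../my-change.patch
cd .. && grep -q "patched" my-change.patch && echo "patch ready"
```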

10. Docker for Arbitrary Network Layouts

Each test suite uses Docker to create isolated, reproducible network environments:

The hole-punch tests create five containers per test with three networks, simulating realistic NAT traversal scenarios.

What Changed: By the Numbers

Between commits f58b7472 and d6e5bea1:

The result is a simpler, more maintainable codebase that's easier to understand and extend.

Test Suites

Performance Benchmarking (perf/)

Measures the overhead that libp2p introduces:

Baseline tests against iperf, raw QUIC, and HTTPS establish reference points for measuring libp2p overhead.

Transport Interoperability (transport/)

Verifies cross-implementation compatibility:

Tests run in parallel (default: CPU core count) for fast feedback on large test matrices.

Hole-Punch NAT Traversal (hole-punch/)

Tests the DCUtR protocol for establishing direct connections through NAT:

Each test gets unique subnets calculated from the test key, enabling parallel execution without network conflicts.
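The derivation isn't spelled out above, but one way to map a test key to a unique subnet (an assumed scheme, for illustration only) is to hash the key and use two hash bytes as the middle octets:

```shell
# Derive a collision-resistant /24 from a test key: hash the key, then
# map the first two hash bytes to octets 2 and 3 of a 10.x.y.0/24 range.
test_key="perf-rust-v0.56-vs-go-v0.41"
hash=$(printf '%s' "$test_key" | sha256sum | cut -c1-4)
oct2=$(( 0x$(printf '%s' "$hash" | cut -c1-2) ))
oct3=$(( 0x$(printf '%s' "$hash" | cut -c3-4) ))
subnet="10.${oct2}.${oct3}.0/24"
echo "$subnet"
```

Because different test keys hash to different octet pairs with high probability, parallel tests get disjoint subnets without any central coordination.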

Implementation Coverage

The test suite covers implementations in:

In total, the suite covers 40+ implementation variations across different versions and configurations.

Getting Started

Check Dependencies

cd perf
./run.sh --check-deps

List Available Implementations

./run.sh --list-images

Preview Test Selection

./run.sh --impl-select "~rust" --list-tests

Run Tests

# Performance tests with rust implementations
cd perf
./run.sh --impl-select "~rust" --iterations 5

# Transport interoperability
cd transport
./run.sh --impl-select "~rust|~go"

# Hole-punch tests
cd hole-punch
./run.sh --impl-select "~rust"

Create Reproducible Snapshots

./run.sh --impl-select "~rust" --snapshot

The snapshot captures everything needed to reproduce the test run.

Reproducibility with inputs.yaml

Every test run generates an inputs.yaml file capturing:

To reproduce a previous run:

cp /srv/cache/test-run/perf-abc12345/inputs.yaml ./
./run.sh

The framework reads inputs.yaml at startup and applies the same configuration.

Future Work

Resources

We believe this rewrite significantly improves the developer experience for testing libp2p implementations. The combination of cross-platform support, powerful filtering, reproducibility, and comprehensive documentation makes it easier than ever to ensure your libp2p implementation works correctly with the rest of the ecosystem.

Try it out, and let us know what you think!