Statistical Inference in Large Multi-way Networks

Package Documentation

polyads estimates multi-way gravity models with arbitrary fixed-effect structures (two-way, three-way, four-way, and beyond). It is designed for count data on sparse networks where standard PPML breaks down.

Installation

pip install polyads

Or install the latest development version from GitHub:

pip install git+https://github.com/lucasresenderc/polyads.git

Two Workflows

Estimation always requires a sparse DataFrame of indices and counts. The main design choice is how covariates are supplied:

1. Features in memory

Use when the full feature array X fits in RAM. Pass a dense D-dimensional array of shape (n₁, n₂, …, p) to fit(..., X=X). This is the simplest path for moderate problem sizes.

2. Features on the fly

Use when X is too large to fit in memory or even compute. Pass a callback eval_X that maps an index tuple to its feature vector; features are computed only when needed.

In both cases, df lists the cells entering the likelihood (index columns plus a count column). You never need to store the full D-way tensor of outcomes in memory.

Case 1 — Features in Memory

import numpy as np
import pandas as pd
from polyads.model import PolyadEstimator
from polyads.data import generate_data

# Synthetic three-way gravity data
beta_true = np.array([1.0, -0.5])
df, X = generate_data(
    seed=1,
    n_ds=(100, 100, 100),
    c=-5,
    shape=np.inf,
    beta=beta_true,
    groups=[[0, 1], [0, 2], [1, 2]],
)

columns = df.columns.tolist()
estimator = PolyadEstimator(use_tqdm=True)
estimator.fit(
    df=df,
    indices=columns[:-1],
    values=columns[-1],
    beta_init=np.zeros(2),
    X=X,
)

estimator.summary()

Case 2 — Features via `eval_X`

For sparse networks for which covariates do not fit in memory (or would require too much time to compute), provide a function that returns the covariates for any index encountered during estimation:

def compute_features(indices):
    i, j, t = indices
    return np.array([
        np.log(distance[i, j]),
        fta_indicator[i, j, t],
        border[i, j],
    ])

estimator.fit(
    df=df,
    indices=['i', 'j', 't'],
    values='y',
    beta_init=np.zeros(3),
    eval_X=compute_features,  # pass eval_X instead of X
)

Use X or eval_X, not both. The callback must be pure and fast enough to be called many times.

Examples by Dimension

Two-way (trade)

# Bilateral trade: log λ_ij = β'X_ij + u_i + v_j
estimator.fit(
    df=df,
    indices=['exporter', 'importer'],
    values='flow',
    beta_init=np.zeros(p),
    X=features,
)

Three-way (panel)

# Trade panel: log λ_ijt = β'X_ijt + u_ij + v_it + w_jt
estimator.fit(
    df=df,
    indices=['exporter', 'importer', 'year'],
    values='flow',
    beta_init=np.zeros(p),
    X=features,
)

Data Format

df is a long-format table: one row per cell in the support (non-negative edges), with index columns and a count column (non-negative integers). Zeros may appear among the stored rows; the full N-sized tensor need not be constructed.

df = pd.DataFrame({
    'i1': [0, 0, 1, ...],
    'i2': [0, 1, 0, ...],
    'y':  [5, 0, 3, ...],
})

Case 1: X has shape (n₁, n₂, …, p) (e.g., (n₁, n₂, n₃, p) for three-way models).
Case 2: omit X and supply eval_X(indices) → ℝ^p instead.

Polyads vs. PPML

Method	Complexity	Best for	Pros	Cons
Polyads	O(\|E\|²)	D ≥ 3, sparse networks	Valid inference; Faster for sparse networks	Quadratic in \|E\|; costly when dense
PPML	O(N)	D = 2, dense panels	Linear runtime for dense networks	Biased CIs for D ≥ 3; requires all edges (incl. zeros)

Citation

@misc{resende2025polyads, title={Statistical Inference in Large Multi-way Networks}, author={Lucas Resende and Guillaume Lecué and Lionel Wilner and Philippe Choné}, year={2025}, eprint={2512.02203}, archivePrefix={arXiv}, primaryClass={econ.EM}, url={https://arxiv.org/abs/2512.02203}, }

Full README on GitHub Install from PyPI

Statistical Inference in Large Multi-way Networks

Paper

Paper Presentation

Python Package Demo

Package Documentation

Installation

Two Workflows

1. Features in memory

2. Features on the fly

Case 1 — Features in Memory

Case 2 — Features via `eval_X`

Examples by Dimension

Two-way (trade)

Three-way (panel)

Data Format

Polyads vs. PPML

Citation

Statistical Inference in Large Multi-way Networks

Paper

Paper Presentation

Python Package Demo

Package Documentation

Installation

Two Workflows

1. Features in memory

2. Features on the fly

Case 1 — Features in Memory

Case 2 — Features via eval_X

Examples by Dimension

Two-way (trade)

Three-way (panel)

Data Format

Polyads vs. PPML

Citation

Case 2 — Features via `eval_X`