Paper

We propose a new method to estimate structural parameters in multi-way networks while controlling for rich structures of fixed effects. The method is based on a series of classification tasks and is agnostic to both the number and structure of fixed effects. In contrast to full maximum likelihood approaches, our estimator does not suffer from the incidental parameter problem. For sparsely connected networks, it is also computationally faster than PPML. We provide empirical evidence that our estimator yields more reliable confidence intervals than PPML and its bias-correction strategies. These improvements hold even under model misspecification and are more pronounced in sparse settings. While PPML remains competitive in dense, low-dimensional data, our approach offers a robust alternative for multi-way models that scales efficiently with sparsity. The method is applied to study the causal effect of a policy reform on spatial accessibility to health care in France.

Paper Presentation

A short overview of the method.

Video coming soon

Python Package Demo

A demonstration of the Python package.

Video coming soon

Package Documentation

polyads estimates multi-way gravity models with arbitrary fixed-effect structures (two-way, three-way, four-way, and beyond). It is designed for count data on sparse networks where standard PPML breaks down.

Installation

pip install polyads

Or install the latest development version from GitHub:

pip install git+https://github.com/lucasresenderc/polyads.git

Two Workflows

Estimation always requires a sparse DataFrame of indices and counts. The main design choice is how covariates are supplied:

1. Features in memory

Use when the full feature array X fits in RAM. Pass a dense D-dimensional array of shape (n₁, n₂, …, p) to fit(..., X=X). This is the simplest path for moderate problem sizes.

2. Features on the fly

Use when X is too large to fit in memory or even compute. Pass a callback eval_X that maps an index tuple to its feature vector; features are computed only when needed.

In both cases, df lists the cells entering the likelihood (index columns plus a count column). You never need to store the full D-way tensor of outcomes in memory.

Case 1 — Features in Memory

import numpy as np
import pandas as pd
from polyads.model import PolyadEstimator
from polyads.data import generate_data

# Synthetic three-way gravity data
beta_true = np.array([1.0, -0.5])
df, X = generate_data(
    seed=1,
    n_ds=(100, 100, 100),
    c=-5,
    shape=np.inf,
    beta=beta_true,
    groups=[[0, 1], [0, 2], [1, 2]],
)

columns = df.columns.tolist()
estimator = PolyadEstimator(use_tqdm=True)
estimator.fit(
    df=df,
    indices=columns[:-1],
    values=columns[-1],
    beta_init=np.zeros(2),
    X=X,
)

estimator.summary()

Case 2 — Features via eval_X

For sparse networks for which covariates do not fit in memory (or would require too much time to compute), provide a function that returns the covariates for any index encountered during estimation:

def compute_features(indices):
    i, j, t = indices
    return np.array([
        np.log(distance[i, j]),
        fta_indicator[i, j, t],
        border[i, j],
    ])

estimator.fit(
    df=df,
    indices=['i', 'j', 't'],
    values='y',
    beta_init=np.zeros(3),
    eval_X=compute_features,  # pass eval_X instead of X
)

Use X or eval_X, not both. The callback must be pure and fast enough to be called many times.

Examples by Dimension

Two-way (trade)

# Bilateral trade: log λ_ij = β'X_ij + u_i + v_j
estimator.fit(
    df=df,
    indices=['exporter', 'importer'],
    values='flow',
    beta_init=np.zeros(p),
    X=features,
)

Three-way (panel)

# Trade panel: log λ_ijt = β'X_ijt + u_ij + v_it + w_jt
estimator.fit(
    df=df,
    indices=['exporter', 'importer', 'year'],
    values='flow',
    beta_init=np.zeros(p),
    X=features,
)

Data Format

df is a long-format table: one row per cell in the support (non-negative edges), with index columns and a count column (non-negative integers). Zeros may appear among the stored rows; the full N-sized tensor need not be constructed.

df = pd.DataFrame({
    'i1': [0, 0, 1, ...],
    'i2': [0, 1, 0, ...],
    'y':  [5, 0, 3, ...],
})

Case 1: X has shape (n₁, n₂, …, p) (e.g., (n₁, n₂, n₃, p) for three-way models).
Case 2: omit X and supply eval_X(indices) → ℝ^p instead.

Polyads vs. PPML

Method Complexity Best for Pros Cons
Polyads O(|E|²) D ≥ 3, sparse networks Valid inference; Faster for sparse networks Quadratic in |E|; costly when dense
PPML O(N) D = 2, dense panels Linear runtime for dense networks Biased CIs for D ≥ 3; requires all edges (incl. zeros)

Citation

@misc{resende2025polyads, title={Statistical Inference in Large Multi-way Networks}, author={Lucas Resende and Guillaume Lecué and Lionel Wilner and Philippe Choné}, year={2025}, eprint={2512.02203}, archivePrefix={arXiv}, primaryClass={econ.EM}, url={https://arxiv.org/abs/2512.02203}, }
← Back to homepage