Register Any Point:
Scaling 3D Point Cloud Registration by Flow Matching

¹University of Bonn  ²Stanford University

Single-stage multi-view point cloud registration across scales via flow matching-based generation

Overview

[Teaser figure]

Our method for scalable multi-view point cloud registration. To register multiple unposed point clouds, prior work typically first performs correspondence matching and then optimizes a pose graph (top-left). In contrast, we introduce a single-stage model that directly generates the registered point cloud via flow matching in Euclidean space (top-right), bypassing the need for explicit correspondence matching and pose graph optimization. Our model generalizes across diverse point cloud data from object-centric, indoor, and outdoor scenarios at scan, sub-map, and map levels (bottom).

Abstract

Point cloud registration aligns multiple unposed point clouds into a common frame and is a core step in 3D reconstruction and robot localization. In this work, we cast registration as conditional generation: a learned continuous, point-wise velocity field transports noisy points to a registered scene, from which the pose of each view is recovered. Unlike previous methods, which match correspondences to estimate the transformation between a pair of point clouds and then optimize over the pairwise transformations for multi-view registration, our model directly generates the registered point cloud. With a lightweight local feature extractor and test-time rigidity enforcement, our approach achieves state-of-the-art results on pairwise and multi-view registration benchmarks, particularly under low overlap, and generalizes across scales and sensor modalities. It further supports downstream tasks including relocalization, multi-robot SLAM, and multi-session map merging.
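To make the generative formulation concrete, the sketch below shows conditional flow matching with straight-line probability paths and Euler-step sampling in PyTorch. It is a minimal toy under stated assumptions, not the paper's implementation: `VelocityNet`, its conditioning interface, and all hyperparameters are hypothetical stand-ins for the diffusion transformer and local-feature conditioning described in the Method section.

```python
import torch

# Hypothetical velocity network: predicts a per-point velocity v(x_t, t | cond).
# The actual model is a diffusion transformer conditioned on local features.
class VelocityNet(torch.nn.Module):
    def __init__(self, dim=3, hidden=128, cond_dim=32):
        super().__init__()
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(dim + 1 + cond_dim, hidden),
            torch.nn.SiLU(),
            torch.nn.Linear(hidden, dim),
        )

    def forward(self, x_t, t, cond):
        # x_t: (B, N, 3), t: (B, 1), cond: (B, N, cond_dim)
        t_feat = t[:, None, :].expand(-1, x_t.shape[1], -1)
        return self.mlp(torch.cat([x_t, t_feat, cond], dim=-1))

def fm_loss(model, x1, cond):
    """Flow-matching loss on a linear path from Gaussian noise x0 to target x1."""
    x0 = torch.randn_like(x1)                       # noise source
    t = torch.rand(x1.shape[0], 1, device=x1.device)
    x_t = (1 - t[:, None]) * x0 + t[:, None] * x1   # straight-line interpolant
    target_v = x1 - x0                              # its constant velocity
    return ((model(x_t, t, cond) - target_v) ** 2).mean()

@torch.no_grad()
def sample(model, cond, n_points, steps=50):
    """Integrate the learned ODE from noise to a registered point cloud."""
    x = torch.randn(cond.shape[0], n_points, 3, device=cond.device)
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((cond.shape[0], 1), i * dt, device=cond.device)
        x = x + dt * model(x, t, cond)              # explicit Euler step
    return x
```

Training regresses the network onto the constant velocity of each straight noise-to-data path; at inference, integrating the learned field transports a noise cloud to the registered scene.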

Method

[Method overview figure]

Overview of our approach to multi-view point cloud registration. Starting from unposed point clouds, we sample points together with their local features. A diffusion transformer with alternating-attention blocks performs conditional flow matching, generating the aggregated point cloud from Gaussian noise. Finally, we recover each view's transformation from the aggregated point cloud via SVD and apply it to the original unposed point cloud to obtain the registered point clouds.
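The pose-recovery step is the classical orthogonal Procrustes (Kabsch) solution: given a view's sampled input points and their generated counterparts, which correspond by construction since each generated point originates from a known input point, the best-fit rigid transform has a closed form via SVD. A minimal sketch with illustrative names:

```python
import torch

def recover_rigid_transform(src, dst):
    """Best-fit rotation R and translation t with R @ src + t ~= dst
    (orthogonal Procrustes / Kabsch). src, dst: (N, 3) corresponding points."""
    src_c = src - src.mean(dim=0)                 # center both point sets
    dst_c = dst - dst.mean(dim=0)
    H = src_c.T @ dst_c                           # 3x3 cross-covariance
    U, S, Vt = torch.linalg.svd(H)
    D = torch.eye(3, device=src.device)           # reflection guard:
    D[2, 2] = torch.sign(torch.linalg.det(Vt.T @ U.T))
    R = Vt.T @ D @ U.T                            # proper rotation, det(R) = +1
    t = dst.mean(dim=0) - R @ src.mean(dim=0)
    return R, t
```

The sign correction on the last singular direction prevents the SVD from returning a reflection instead of a proper rotation. Applying each view's recovered (R, t) to its full original point cloud then yields the registered point clouds.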

Acknowledgments

This work builds upon Rectified Point Flow. We also thank the authors of BUFFER-X and GARF.