Utilizing TabPFN for Multi-Instance Data with Scarce Labels

Research output: Chapter in Book/Report/Conference proceeding › Conference paper › peer-review

Abstract

Tabular data is abundant in critical applications such as science, healthcare, finance, and energy, making advances in tabular learning highly influential for the research community. However, heavy-industry applications often present a special class of tabular regression problems that is not commonly studied. These multi-instance single-target tabular problems originate from the difficulty and cost of taking regular measurements during a production process. In this setting, we have to deal with high-dimensional inputs in combination with scarce labels. While foundation models such as TabPFN show strong results on suitable datasets, their applicability and performance on multi-instance single-target data are limited by memory and runtime constraints as the number of instances grows. In this paper, we propose a cluster-based dimensionality reduction that compresses multi-instance measurements by splitting them according to the most relevant cluster constructed from the training set. This approach reduces computational overhead while preserving predictive performance, enabling inference on multi-instance datasets. Our experiments demonstrate that the proposed method extends the practical reach of TabPFN, achieving improved performance across multiple datasets.
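The abstract does not spell out the compression step, so the following is only a plausible sketch of the general idea: pool all training instances, cluster them, and summarize each multi-instance bag by per-cluster statistics so the input size no longer grows with the number of instances. All names (`compress`, the synthetic data, the cluster count `k`) are illustrative assumptions, and `Ridge` stands in for TabPFN, which the paper applies to the compressed features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge

# Hypothetical multi-instance data: each "bag" holds many instance
# measurements but carries a single regression target.
rng = np.random.default_rng(0)
n_bags, n_instances, n_features = 40, 50, 4
X_bags = rng.normal(size=(n_bags, n_instances, n_features))
y = X_bags.mean(axis=(1, 2)) + 0.1 * rng.normal(size=n_bags)

# Step 1 (assumed): cluster the pooled training instances.
k = 3
km = KMeans(n_clusters=k, n_init=10, random_state=0)
km.fit(X_bags.reshape(-1, n_features))

def compress(bags):
    """Compress each bag into a fixed-length vector of per-cluster means.

    Instances are split by their nearest training cluster; each bag is
    summarized by the mean of its instances in every cluster (zeros for
    empty clusters), so the feature count is k * n_features regardless
    of how many instances a bag contains.
    """
    d = bags.shape[-1]
    out = np.zeros((len(bags), k * d))
    for i, bag in enumerate(bags):
        labels = km.predict(bag)
        for c in range(k):
            members = bag[labels == c]
            if len(members):
                out[i, c * d:(c + 1) * d] = members.mean(axis=0)
    return out

X_comp = compress(X_bags)  # shape: (n_bags, k * n_features)

# Stand-in regressor; the paper instead feeds TabPFN the compressed rows.
model = Ridge().fit(X_comp, y)
preds = model.predict(X_comp)
```

The key property this sketch illustrates is the fixed output width: a bag with 50 instances and a bag with 5,000 both compress to `k * n_features` columns, which is what keeps memory and runtime bounded for a transformer-based model like TabPFN.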
Original language: English
Title of host publication: EurIPS, AITD Workshop
Publication status: Published - 2025
Event: EurIPS 2025 Workshop, AITD 2025: AI for Tabular Data - Copenhagen, Denmark
Duration: 6 Dec 2025 → 6 Dec 2025

Conference

Conference: EurIPS 2025 Workshop, AITD 2025
Country/Territory: Denmark
City: Copenhagen
Period: 6/12/25 → 6/12/25

Fields of Expertise

  • Information, Communication & Computing
