Skip to main navigation Skip to search Skip to main content

Learning robust representations for visual reinforcement learning via task-relevant mask sampling

Research output: Contribution to journalArticlepeer-review

Abstract

Humans excel at isolating relevant information from noisy data to predict the behavior of dynamic systems, effectively disregarding non-informative, temporally-correlated noise. In contrast, existing visual reinforcement learning algorithms face challenges in generating noise-free predictions within high-dimensional, noise-saturated environments, especially when trained on world models featuring realistic background noise extracted from natural video streams. We propose Task Relevant Mask Sampling (TRMS), a novel approach for identifying task-specific and reward-relevant masks. TRMS utilizes existing segmentation models as a masking prior, which is subsequently followed by a mask selector that dynamically identifies subset of masks at each timestep, selecting those most probable to contribute to task-specific rewards. To mitigate the high computational cost associated with these masking priors, a lightweight student network is trained in parallel. This network learns to perform masking independently and replaces the Segment Anything Model~(SAM)-based teacher network after a brief initial phase (<10-25% of total training). TRMS enhances the generalization capabilities of Soft Actor-Critic agents under distractions, achieves better performance on the RL-Vigen benchmark, which includes challenging variants of the DeepMind Control Suite, Dexterous Manipulation and Quadruped Locomotion tasks.
Original languageEnglish
Number of pages25
JournalTransactions on Machine Learning Research
Publication statusPublished - 2025

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Artificial Intelligence

Fields of Expertise

  • Information, Communication & Computing

Fingerprint

Dive into the research topics of 'Learning robust representations for visual reinforcement learning via task-relevant mask sampling'. Together they form a unique fingerprint.

Cite this