CMU researchers are presenting 194 papers at the Fourteenth International Conference on Learning Representations (ICLR 2026), held from April 23 to April 27 at the Riocentro Convention and Event Center in Rio de Janeiro, Brazil. Here is a quick overview of the areas our researchers are working on:
Here are our most frequent collaborator institutions:
Table of Contents
Oral Papers
Poster Papers
Applications
Computer Vision
Deep Learning
General Machine Learning
Optimization
Reinforcement Learning
Social Aspects
Theory
Uncategorized
Oral Papers
EditBench: Evaluating LLM Abilities to Perform Real-World Instructed Code Edits
Authors: Wayne Chi (CMU), Valerie Chen (Carnegie Mellon University), Ryan Shar (Apple), Aditya Mittal (Carnegie Mellon University), Jenny Liang (School of Computer Science, Carnegie Mellon University), Wei-Lin Chiang (UC Berkeley / LMSYS), Anastasios Angelopoulos (University of California Berkeley), Ion Stoica, Graham Neubig (Carnegie Mellon University), Ameet Talwalkar (University of California-Los Angeles), Chris Donahue (CMU / Google DeepMind)
This work introduces EditBench, a new benchmark for testing how well AI models can edit existing code based on user instructions. Unlike prior benchmarks, it uses real-world coding tasks and contexts, including things like the surrounding code and cursor position. The benchmark includes 545 diverse problems, and results show that most models struggle—only a few achieve strong performance. The study also finds that having more realistic context significantly impacts how well models perform, highlighting the importance of evaluating code-editing in real-world settings.
UALM: Unified Audio Language Model for Understanding, Generation and Reasoning
Authors: Jinchuan Tian (Carnegie Mellon University), Sang-gil Lee (NVIDIA), Zhifeng Kong (NVIDIA), Sreyan Ghosh (NVIDIA), Arushi Goel (NVIDIA), Chao-Han Huck Yang (NVIDIA Research), Wenliang Dai (NVIDIA), Zihan Liu (NVIDIA), Hanrong Ye (NVIDIA), Shinji Watanabe (Carnegie Mellon University), Mohammad Shoeybi (NVIDIA), Bryan Catanzaro (NVIDIA), Rafael Valle (NVIDIA), Wei Ping (NVIDIA)
This paper introduces the Unified Audio Language Model (UALM), a single model designed to handle audio understanding, text-to-audio generation, and multimodal reasoning together. Instead of treating these as separate tasks, UALM learns to both interpret and generate audio, achieving performance comparable to specialized state-of-the-art models. The authors also show that combining text and audio during the model’s reasoning process improves its ability to handle complex tasks. Overall, the work demonstrates a step toward more general AI systems that can reason across both language and sound.
Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents
Authors: Yueqi Song (CMU), Ketan Ramaneti (Amazon), Zaid Sheikh (Carnegie Mellon University), Ziru Chen (Ohio State University, Columbus), Boyu Gou (Ohio State University, Columbus), Tianbao Xie (University of Hong Kong), Yiheng Xu (University of Hong Kong), Danyang Zhang (Shanghai Jiao Tong University), Apurva Gandhi (Carnegie Mellon University), Fan Yang (Fujitsu), Joseph Liu (School of Computer Science, Carnegie Mellon University), Tianyue Ou (Carnegie Mellon University), Zhihao Yuan (Carnegie Mellon University), Frank F Xu (Carnegie Mellon University), Shuyan Zhou (Facebook), Xingyao Wang (All Hands AI), Xiang Yue (Carnegie Mellon University), Tao Yu (University of Hong Kong), Huan Sun (Ohio State University), Yu Su (Ohio State University), Graham Neubig (Carnegie Mellon University)
This work introduces the Agent Data Protocol (ADP), a standardized format for representing training data for AI agents. The authors argue that the main challenge isn’t a lack of data, but that existing datasets are fragmented across different formats and tools. ADP acts as a common “interlingua,” making it easier to combine diverse data sources—like coding, browsing, and tool use—into a single training pipeline. By converting 13 datasets into this unified format, the authors show that models trained on the combined data achieve improved performance.
MotionStream: Real-Time Video Generation with Interactive Motion Controls
Authors: Joonghyuk Shin (Seoul National University), Zhengqi Li (Google), Richard Zhang (Adobe), Jun-Yan Zhu (Carnegie Mellon University), Jaesik Park (Seoul National University), Eli Shechtman (Adobe), Xun Huang (Adobe Research)
This paper introduces MotionStream, a system for generating videos in real time based on motion and text inputs. Unlike prior methods that take minutes to produce a video, MotionStream can stream results at up to 29 frames per second on a single GPU. The key idea is to train a fast, causal model that can generate video continuously, using techniques that prevent quality from degrading over long sequences. As a result, users can interactively control motion—like drawing paths or moving a camera—and see the video update instantly.
OpenThoughts: Data Recipes for Reasoning Models
Authors: Etash Guha (Stanford University, Anthropic), Ryan Marten (Harbor), Sedrick Keh (Toyota Research Institute), Negin Raoof (University of California, Berkeley), Georgios Smyrnis (University of Texas, Austin), Hritik Bansal (University of California, Los Angeles), Marianna Nezhurina (Juelich Supercomputing Center, LAION, Tuebingen University), Jean Mercat (Toyota Research Institute (TRI)), Trung Vu (Google), Zayne Sprague (New York University), Ashima Suvarna (UCLA), Benjamin Feuer (Stanford University), Leon Liangyu Chen (Stanford University), Zaid Khan (University of North Carolina at Chapel Hill), Eric Frankel (Department of Computer Science, University of Washington), Sachin Grover (Arizona State University), Caroline Choi, Niklas Muennighoff (Stanford University), Shiye Su (Stanford University), Wanjia Zhao (Stanford University), John Yang (Princeton University), Shreyas Pimpalgaonkar (New York University), Kartik Sharma (Georgia Institute of Technology), Charlie Ji (University of California, Berkeley), Yichuan Deng (Department of Computer Science, University of Washington), Sarah Pratt (University of Washington), Vivek Ramanujan (Department of Computer Science, University of Washington), Jon Saad-Falcon (Computer Science Department, Stanford University), Stutee Acharya (University of South Florida), Jeffrey Li (Carnegie Mellon University), Achal Dave (Anthropic), Alon Albalak (SynthLabs), Kushal Arora (McGill University), Blake Wulfe (Toyota Research Institute), Chinmay Hegde (New York University), Greg Durrett (New York University), Sewoong Oh (University of Washington), Mohit Bansal (UNC Chapel Hill), Saadia Gabriel (University of Washington), Aditya Grover (UCLA), Kai-Wei Chang (University of Virginia Main Campus), Vaishaal Shankar (Apple), Aaron Gokaslan (Cornell University), Mike Merrill, Tatsunori Hashimoto (Stanford University), Yejin Choi (Stanford University / NVIDIA), Jenia Jitsev (LAION; Juelich Supercomputing Center, Research Center Juelich), Reinhard Heckel (Technical University Munich), Maheswaran Sathiamoorthy (University of Southern California), Alex Dimakis (Electrical Engineering & Computer Science Department, University of California, Berkeley), Ludwig Schmidt (University of Washington / Stanford / Anthropic)
This work introduces the OpenThoughts project, which aims to create high-quality, open-source datasets for training reasoning-focused AI models. The authors show that models trained on their public data can match or exceed the performance of strong existing systems that rely on private datasets. By carefully studying and improving their data generation process, they build larger and better datasets that significantly boost performance across math, coding, and science benchmarks. Overall, the project demonstrates that open data alone can be enough to train highly capable reasoning models.
Mamba-3: Improved Sequence Modeling using State Space Principles
Authors: Aakash Sunil Lahoti (Carnegie Mellon University), Kevin Li (Carnegie Mellon University), Berlin Chen (Princeton University), Caitlin Wang (Princeton University), Aviv Bick (Carnegie Mellon University), Zico Kolter (Carnegie Mellon University), Tri Dao (Princeton University), Albert Gu (Cartesia AI / CMU)
This paper introduces Mamba-3, a new model designed to make AI inference faster and more efficient without sacrificing performance. While many efficient alternatives to Transformers reduce computation, they often struggle with tasks like tracking long-term information; Mamba-3 addresses this with improved state modeling and a more expressive update mechanism. The model also uses a multi-input, multi-output design to boost accuracy without slowing down generation. Overall, Mamba-3 shows that it’s possible to improve both efficiency and capability at the same time, pushing forward the tradeoff between speed and performance.
Overcoming Joint Intractability with Lossless Hierarchical Speculative Decoding
Authors: Yuxuan Zhou (Independent Researcher), Fei Huang (Alibaba Group), Heng Li (Carnegie Mellon University), Fengyi Wu (University of Washington), Tianyu Wang (University of Washington), Jianwei Zhang (Alibaba Group), Junyang Lin (Alibaba Group), Zhi-Qi Cheng (University of Washington)
This paper introduces Hierarchical Speculative Decoding (HSD), a new method to speed up large language model inference by improving the verification step in speculative decoding while preserving exact output distributions. It addresses the challenge of “joint intractability” in sequence-level verification by organizing resampling into a hierarchy that redistributes probability mass across branches, enabling more tokens to be accepted at once. The approach is theoretically proven to be lossless and empirically shows consistent speed improvements across models and benchmarks, outperforming prior tokenwise and blockwise verification methods. Overall, HSD offers a practical and general way to accelerate decoding without sacrificing fidelity, achieving state-of-the-art efficiency when integrated into existing frameworks.
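For readers unfamiliar with the tokenwise baseline that the summary above compares against, here is a minimal sketch of the standard speculative sampling accept/resample rule for a single drafted token. This is not HSD and not the authors' code; the function names `verify_token` and `speculative_sample` and the toy three-token distributions are assumptions made purely for illustration.

```python
import random


def verify_token(p, q, drafted, rng):
    """Standard tokenwise check for one drafted token (illustrative, not HSD).

    p: target-model probabilities over the vocabulary
    q: draft-model probabilities over the same vocabulary
    drafted: index of the token that was sampled from q
    Returns (token, accepted).
    """
    # Accept the drafted token with probability min(1, p/q).
    if rng.random() < min(1.0, p[drafted] / q[drafted]):
        return drafted, True
    # On rejection, resample from the renormalized residual max(p - q, 0).
    residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
    total = sum(residual)
    weights = [r / total for r in residual]
    return rng.choices(range(len(p)), weights=weights)[0], False


def speculative_sample(p, q, rng):
    """Draft one token from q, then verify it against p."""
    drafted = rng.choices(range(len(q)), weights=q)[0]
    token, _ = verify_token(p, q, drafted, rng)
    return token


if __name__ == "__main__":
    rng = random.Random(0)
    p = [0.5, 0.3, 0.2]  # hypothetical target distribution
    q = [0.2, 0.2, 0.6]  # hypothetical draft distribution
    n = 20000
    counts = [0, 0, 0]
    for _ in range(n):
        counts[speculative_sample(p, q, rng)] += 1
    # Empirical frequencies should be close to p, not q.
    print([round(c / n, 3) for c in counts])
```

"Lossless" in this setting means the accepted-or-resampled token is distributed exactly according to the target model, which is why the empirical frequencies track `p` even though every draft comes from `q`.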
Distributional Equivalence in Linear Non-Gaussian Latent-Variable Cyclic Causal Models: Characterization and Learning
Authors: Haoyue Dai (Carnegie Mellon University), Immanuel Albrecht (FernUniversität in Hagen), Peter Spirtes (Carnegie Mellon University), Kun Zhang (Carnegie Mellon University & MBZUAI)
This paper studies causal discovery in linear non-Gaussian models with latent variables and cycles, focusing on when different causal graphs are observationally indistinguishable. It provides the first general characterization of distributional equivalence in this setting, introducing new tools—especially edge rank constraints—to describe when two models generate the same observed data. Building on this theory, the authors derive practical graphical criteria and transformations to enumerate all equivalent models and propose an algorithm to recover the entire equivalence class from data. Overall, the work removes the need for strong structural assumptions and offers a general, principled framework for latent-variable causal discovery.
Revela: Dense Retriever Learning via Language Modeling
Authors: Fengyu Cai (Technische Universität Darmstadt), Tong Chen (University of Washington), Xinran Zhao (Carnegie Mellon University), Sihao Chen (Microsoft), Hongming Zhang (Tencent AI Lab Seattle), Sherry Wu (Carnegie Mellon University), Iryna Gurevych (Technical University of Darmstadt / Mohamed bin Zayed University of Artificial Intelligence), Heinz Koeppl (TU Darmstadt)
This paper introduces Revela, a self-supervised framework for training dense retrievers by leveraging language modeling objectives instead of relying on annotated query-document pairs. It augments next-token prediction with an in-batch attention mechanism that allows documents to attend to each other, enabling the retriever to learn cross-document relationships jointly with a language model. Experiments across domain-specific, reasoning-intensive, and general benchmarks show that Revela matches or surpasses supervised and API-based retrievers while using significantly less data and compute. Overall, the work demonstrates a scalable and efficient alternative for retriever learning directly from raw text with strong generalization across domains.
Latent Particle World Models: Self-supervised Object-centric Stochastic Dynamics Modeling
Authors: Tal Daniel (Carnegie Mellon University), Carl Qi (University of Texas at Austin), Dan Haramati (Brown University), Amir Zadeh (Lambda), Chuan Li (Lambda Labs), Aviv Tamar (Technion), Deepak Pathak (Carnegie Mellon University), David Held (Carnegie Mellon University)
This paper introduces the Latent Particle World Model (LPWM), a self-supervised, object-centric world model that learns to decompose scenes into latent particles (e.g., keypoints, masks, and object attributes) directly from raw video without supervision. It proposes a novel per-particle latent action mechanism that models stochastic dynamics, enabling the system to capture complex multi-object interactions and generate diverse future predictions. The model is trained end-to-end and supports flexible conditioning on actions, language, and goal images, achieving state-of-the-art performance on both real-world and synthetic video prediction tasks. Beyond video modeling, LPWM also demonstrates strong potential for decision-making applications such as imitation learning by leveraging its learned latent dynamics.
LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts
Authors: Siyuan Wang (Shanghai Jiao Tong University), Gaokai Zhang (Carnegie Mellon University), Li Lyna Zhang (Microsoft Research Asia), Ning Shang (Microsoft), Fan Yang (Microsoft Research), Dongyao Chen (Shanghai Jiaotong University), Mao Yang (Peking University)
The authors introduce LoongRL, a reinforcement learning framework designed to improve long-context reasoning in large language models by training them on challenging, synthesized tasks. They propose KeyChain, a data construction method that embeds hidden question chains within long documents, forcing models to perform multi-step planning, retrieval, and reasoning rather than relying on shortcuts. Through RL training, models develop an emergent “plan–retrieve–reason–recheck” reasoning pattern that generalizes from shorter (16K) to much longer (128K) contexts. Experiments show that LoongRL significantly boosts long-context reasoning performance while maintaining strong short-context abilities, achieving results comparable to much larger models.
Exchangeability of GNN Representations with Applications to Graph Retrieval
Authors: Kartik Nair (Carnegie Mellon University), Indradyumna Roy (IIT Bombay, Aalto University), Soumen Chakrabarti (IIT Bombay), Anirban Dasgupta (IIT Gandhinagar), Abir De (Indian Institute of Technology Bombay)
This paper introduces the concept of exchangeability in graph neural networks (GNNs), showing that the dimensions of learned node embeddings are statistically interchangeable due to random initialization and permutation-invariant training. This property implies that embedding components share identical distributions, enabling simplifications in how graph similarities are computed. Leveraging this insight, the authors approximate complex transportation-based graph distances using simpler Euclidean operations on sorted embedding values. They further propose GRAPHHASH, a locality-sensitive hashing framework that enables efficient and scalable graph retrieval, achieving strong performance compared to existing methods.
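The idea of replacing a transport-style distance with an operation on sorted values can be illustrated with a well-known one-dimensional fact: for two equal-sized sets of scalars, the cheapest one-to-one matching under absolute difference is obtained by sorting both sets and pairing them in order. The sketch below checks this on a toy example. It is only an illustration of that general property, not the paper's method, which operates on learned multi-dimensional embeddings; the function names and the input values are hypothetical.

```python
from itertools import permutations


def sorted_l1_distance(x, y):
    """L1 distance between two equal-length lists after sorting each one."""
    return sum(abs(a - b) for a, b in zip(sorted(x), sorted(y)))


def brute_force_matching_cost(x, y):
    """Minimum total |x_i - y_pi(i)| over all one-to-one matchings.

    Exponential in len(x); used here only to check the sorted shortcut.
    """
    return min(
        sum(abs(a - b) for a, b in zip(x, perm))
        for perm in permutations(y)
    )


if __name__ == "__main__":
    x = [9, 1, 4]   # hypothetical values along one embedding dimension
    y = [5, 10, 0]
    print(sorted_l1_distance(x, y))         # 3
    print(brute_force_matching_cost(x, y))  # 3
```

Sorting costs O(n log n), whereas solving a general assignment problem is considerably more expensive, which is the kind of saving that makes sorted-value approximations attractive for retrieval at scale.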
Poster Papers
Applications
TusoAI: Agentic Optimization for Scientific Methods
Authors: Alistair Turcan (School of Computer Science, Carnegie Mellon University), Kexin Huang (Stanford University), Lei Li (School of Computer Science, Carnegie Mellon University), Martin J. Zhang (Carnegie Mellon University)
AutoLibra: Agent Metric Induction from Open-Ended Human Feedback
Authors: Hao Zhu (Carnegie Mellon University), Phil Cuvin (Stanford University), Xinkai Yu (University of Pennsylvania), Charlotte Yan (Stanford University), Jason Zhang (Stanford University), Diyi Yang (Stanford University)
DistMLIP: A Distributed Inference Platform for Machine Learning Interatomic Potentials
Authors: Kevin Han (Carnegie Mellon University), Bowen Deng (UC Berkeley), Amir Barati Farimani (Carnegie Mellon University), Gerbrand Ceder (University of California, Berkeley)
Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning
Authors: Ganlin Yang (University of Science and Technology of China), Tianyi Zhang (Zhejiang University; Shanghai Artificial Intelligence Laboratory), Haoran Hao (Carnegie Mellon University), Weiyun Wang (Fudan University), Yibin Liu (Northeastern University), Dehui Wang (Shanghai Jiaotong University), Guanzhou Chen (Shanghai AI Laboratory, Shanghai Jiaotong University), Zijian Cai (Shenzhen University), Junting Chen (National University of Singapore), Weijie Su (University of Science and Technology of China), Wengang Zhou (University of Science and Technology of China), Yu Qiao (Shanghai Artificial Intelligence Laboratory), Jifeng Dai (Tsinghua University), Jiangmiao Pang (Shanghai AI Laboratory), Gen Luo (Shanghai AI Laboratory), Wenhai Wang (Shanghai AI Laboratory), Yao Mu (Shanghai Jiao Tong University), Zhi Hou (Shanghai Artificial Intelligence Laboratory)
DataMIL: Selecting Data for Robot Imitation Learning with Datamodels
Authors: Shivin Dass (University of Texas at Austin), Alaa Khaddaj (OpenAI), Logan Engstrom (Massachusetts Institute of Technology), Aleksander Madry (OpenAI), Andrew Ilyas (Carnegie Mellon University), Roberto Martín-Martín (University of Texas at Austin)
TimeSeriesExamAgent: Creating Time Series Reasoning Benchmarks at Scale
Authors: Malgorzata Gwiazda (Technical University of Munich), Yifu Cai (Millennium Management LLC), Mononito Goswami (Carnegie Mellon University), Arjun Choudhry (Georgia Institute of Technology), Artur Dubrawski (Carnegie Mellon University)
MetaVLA: Unified Meta Co-Training for Efficient Embodied Adaptation
TSPulse: Tiny Pre-Trained Models with Disentangled Representations for Rapid Time-Series Analysis
Authors: Vijay Ekambaram (IBM), Subodh Kumar (International Business Machines), Arindam Jati (International Business Machines (IBM)), Sumanta Mukherjee (International Business Machines), Tomoya Sakai (International Business Machines), Pankaj Dayama (International Business Machines), Wesley Gifford (IBM Research), Jayant Kalagnanam (Carnegie Mellon University)
RobotArena $\infty$: Scalable Robot Benchmarking via Real-to-Sim Translation
Authors: Yash Jangir (Carnegie Mellon University), Yidi Zhang, Kashu Yamazaki (Carnegie Mellon University), Chenyu Zhang (Peking University), Kuan-Hsun Tu (National Taiwan University), Tsung-Wei Ke (Department of Computer Science and Information Engineering, National Taiwan University), Lei Ke (Carnegie Mellon University), Yonatan Bisk (Carnegie Mellon University), Katerina Fragkiadaki (CMU)
Generalizable End-to-End Tool-Use RL with Synthetic CodeGym
Authors: Weihua Du (Tsinghua University), Hailei Gong (Huawei Technologies Ltd.), Zhan Ling (UC San Diego), Kang Liu (ByteDance Inc.), Lingfeng Shen (Johns Hopkins University), Xuesong Yao (ByteDance Inc.), Yufei Xu (ByteDance Inc.), Dingyuan Shi (ByteDance Inc.), Yiming Yang (Carnegie Mellon University), Jiecao Chen (ByteDance Inc.)
WearVox: An Egocentric Multichannel Voice Assistant Benchmark for Wearables
Authors: Zhaojiang Lin (Meta), Yong Xu (Meta), Kai Sun (Meta), Jing Zheng (Ant Group), Yin Huang (Facebook), Surya Appini (Meta), Krish Narang (Facebook), Renjie Tao (Facebook), Ishan Jain (Facebook), Siddhant Arora (Carnegie Mellon University), Ruizhi Li (Facebook), Yiteng Huang (Facebook), Kaushik Patnaik (Apple), Wenfang Xu (Meta Platforms, Inc.), Suwon Shon (ASAPP), Yue Liu (Meta), Ahmed Aly (Facebook), Anuj Kumar (Meta), Florian Metze (Carnegie Mellon University), Xin Dong (Facebook)
A tale of two tails: Preferred and anti-preferred natural stimuli in visual cortex
Authors: Rabia Gondur (Cold Spring Harbor Laboratory), Patricia Stan (Carnegie Mellon University), Matthew A Smith (Carnegie Mellon University), Benjamin Cowley (Cold Spring Harbor Laboratory)
FictionalQA: A Dataset for Studying Memorization and Knowledge Acquisition
Authors: John Kirchenbauer (University of Maryland, College Park), Natjanan Mongkolsupawan (Carnegie Mellon University), Yuxin Wen (University of Maryland), Tom Goldstein (University of Maryland), Daphne Ippolito (School of Engineering and Applied Science, University of Pennsylvania)
Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing
Authors: Justin Lin (Computer Science Department, Stanford University), Eliot Jones (Gray Swan), Donovan Jasper (Stanford University), Ethan Ho (Stanford University), Anna Wu (Computer Science Department, Stanford University), Arnold Yang (Stanford University), Neil Perry (Princeton University), Andy Zou (Carnegie Mellon University), Matt Fredrikson (University of Wisconsin, Madison), Zico Kolter (Carnegie Mellon University), Percy Liang (Stanford University), Dan Boneh (Stanford University), Daniel Ho (Stanford University)
Bound by semanticity: universal laws governing the generalization-identification tradeoff
Authors: Marco Nurisso (Polytechnic University of Turin), Jesseba Fernando (Northeastern University), Raj Deshpande (Northeastern University London), Alan Perotti (Intesa Sanpaolo AI Research), Raja Marjieh (Princeton University), Steven Frankland (Dartmouth College), Richard Lewis (Carnegie Mellon University), Taylor Webb (University of California, Los Angeles), Declan Campbell (Princeton University), Francesco Vaccarino (Politecnico di Torino), Jonathan Cohen (Princeton University), Giovanni Petri (Network Science Institute, Northeastern University London)
Zero-shot Forecasting by Simulation Alone
Authors: Boris Oreshkin (Amazon), Mayank Jauhari (Amazon), Ravi Kiran Selvam (Amazon), Malcolm Wolff (Amazon), Wenhao Pan (University of Washington), Shankar Ramasubramanian (Amazon), Kin Gutierrez (Carnegie Mellon University), Tatiana Konstantinova (Amazon), Andres Potapczynski (New York University), Mengfei Cao (Amazon.com), Dmitry Efimov (Amazon), Michael W Mahoney (University of California Berkeley), Andrew Gordon Wilson (New York University)
Self-Improving Vision-Language-Action Models with Data Generation via Residual RL
Authors: Wenli Xiao (Carnegie Mellon University), Haotian Lin (Carnegie Mellon University), Andy Peng (University of California, Berkeley), Haoru Xue (University of California, Berkeley), Tairan He (NVIDIA), Zhengyi Luo (Carnegie Mellon University), Yuqi Xie (NVIDIA), Fengyuan Hu (NVIDIA), Jim Fan (NVIDIA), Guanya Shi (Carnegie Mellon University), Yuke Zhu (NVIDIA / UT-Austin)
Improving Attributed Long-form Question Answering with Intent Awareness
Authors: Xinran Zhao (Carnegie Mellon University), Aakanksha Naik (Allen Institute for Artificial Intelligence), Jay DeYoung (Allen Institute for Artificial Intelligence), Joseph Chee Chang (Allen Institute for Artificial Intelligence), Jena Hwang (Allen Institute for Artificial Intelligence), Sherry Wu (Carnegie Mellon University), Varsha Kishore (Cornell University)
RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems
Authors: Yuxiao Qu (Carnegie Mellon University), Anikait Singh (Stanford University), Yoonho Lee (Stanford University), Amrith Setlur (Carnegie Mellon University), Russ Salakhutdinov (CMU), Chelsea Finn (Stanford University, Physical Intelligence), Aviral Kumar (University of California Berkeley)
Measuring LLM Novelty As The Frontier Of Original And High-Quality Output
Authors: Vishakh Padmakumar (Stanford University), Chen Yueh-Han (New York University), Jane Pan (New York University), Valerie Chen (Carnegie Mellon University), He He (New York University)
BFM-Zero: A Promptable Behavioral Foundation Model for Humanoid Control Using Unsupervised Reinforcement Learning
Authors: Yitang Li, Zhengyi Luo (Carnegie Mellon University), Tonghe Zhang (Carnegie Mellon University), Cunxi Dai (Carnegie Mellon University), Anssi Kanervisto (Microsoft Research), Andrea Tirinzoni (Meta, FAIR), Haoyang Weng (Tsinghua University), Kris Kitani (Carnegie Mellon University), Mateusz Guzek (Meta AI), Ahmed Touati (Meta AI Research), Alessandro Lazaric (Facebook), Matteo Pirotta (Meta), Guanya Shi (Carnegie Mellon University)
CaTS: Calibrated Test-Time Scaling for Efficient LLM Reasoning
Authors: Chengsong Huang (Washington University, Saint Louis), Langlin Huang (Washington University, Saint Louis), Jixuan Leng (Carnegie Mellon University), Jiacheng Liu (NVIDIA), Jiaxin Huang (Washington University in St. Louis)
Real-Time Reasoning Agents in Evolving Environments
Authors: Yule Wen (Tsinghua University), Yixin Ye (Shanghai Jiaotong University), Yanzhe Zhang (Georgia Institute of Technology), Diyi Yang (Stanford University), Hao Zhu (Carnegie Mellon University)
ExpertLongBench: Benchmarking Language Models on Expert-Level Long-Form Generation Tasks with Structured Checklists
Authors: Jie Ruan (University of Michigan – Ann Arbor), Inderjeet Nair (University of Michigan – Ann Arbor), Shuyang Cao (Bloomberg), Amy Liu (University of Michigan), Sheza Munir (University of Toronto), Micah Pollens-Dempsey (University of Michigan – Ann Arbor), Yune-Ting Chiang (University of Michigan – Ann Arbor), Lucy Kates (University of Michigan – Ann Arbor), Nicholas David (University of Michigan – Ann Arbor), Sihan Chen (Carnegie Mellon University), Ruxin Yang (University of Michigan – Ann Arbor), Yuqian Yang (University of Michigan – Ann Arbor), Jihyun Gump (University of Michigan – Ann Arbor), Tessa Bialek (University of Michigan Law School), Vivek Sankaran (University of Michigan – Ann Arbor), Margo Schlanger (University of Michigan – Ann Arbor), Lu Wang (University of Michigan)
From Reproduction to Replication: Evaluating Research Agents with Progressive Code Masking
Authors: Gyeongwon J Kim (Carnegie Mellon University), Alex Wilf (Carnegie Mellon University), Louis-Philippe Morency (Carnegie Mellon University), Daniel Fried (Carnegie Mellon University)
PRISM: Enhancing PRotein Inverse Folding through Fine-Grained Retrieval on Structure-Sequence Multimodal Representations
Authors: Sazan Mahbub (Carnegie Mellon University School of Computer Science), Souvik Kundu (Intel), Eric P Xing (CMU)
MAPSS: Manifold-based Assessment of Perceptual Source Separation
Contact-guided Real2Sim from Monocular Video with Planar Scene Primitives
Authors: Zihan Wang (Amazon), Jiashun Wang (School of Computer Science, Carnegie Mellon University), Jeff Tan (Carnegie Mellon University), Yiwen Zhao (School of Computer Science, Carnegie Mellon University), Jessica Hodgins (RAI Institute), Shubham Tulsiani (Carnegie Mellon University), Deva Ramanan (School of Computer Science, Carnegie Mellon University)
The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution
Authors: Junlong Li (The Hong Kong University of Science and Technology), Wenshuo Zhao (Zhejiang University), Jian Zhao (Beijing University of Posts and Telecommunications), Weihao Zeng (Hong Kong University of Science and Technology), Haoze Wu (Zhejiang University), Xiaochen Wang, Rui Ge (Shanghai Jiaotong University), Yuxuan Cao (HKUST), Yuzhen Huang (HKUST), Wei Liu (HKUST), Junteng Liu (HKUST), Zhaochen Su (The Hong Kong University of Science and Technology), Yiyang Guo (Fudan University), Fan Zhou (Shanghai Jiao Tong University), Lueyang Zhang (The Hong Kong University of Science and Technology), Juan Michelini (Universidad de la República), Xingyao Wang (All Hands AI), Xiang Yue (Carnegie Mellon University), Shuyan Zhou (Facebook), Graham Neubig (Carnegie Mellon University), Junxian He (HKUST)
SAC Flow: Sample-Efficient Reinforcement Learning of Flow-Based Policies via Velocity-Reparameterized Sequential Modeling
Authors: Yixian Zhang (Tsinghua University), Shu-ang Yu (Tsinghua University), Tonghe Zhang (Carnegie Mellon University), Mo Guang (Li Auto Inc.), Haojia Hui (Li Auto Inc.), Kaiwen Long (Li Auto Inc.), Yu Wang (Tsinghua University), Chao Yu (Tsinghua University), Wenbo Ding (Tsinghua University)
Computer Vision
Multi-Object System Identification from Videos
Authors: Chunjiang Liu (Carnegie Mellon University), Xiaoyuan Wang (Carnegie Mellon University), Qingran Lin (Georgia Institute of Technology), Albert Xiao (Carnegie Mellon University), Haoyu Chen (Harvard University), Shizheng Wen (ETHZ – ETH Zurich), Hao Zhang (UIUC), Lu Qi (Insta360), Ming-Hsuan Yang (Google DeepMind), Laszlo A. Jeni (Carnegie Mellon University), Min Xu (Carnegie Mellon University), Yizhou Zhao (Snap Inc.)
FantasyWorld: Geometry-Consistent World Modeling via Unified Video and 3D Prediction
Authors: Yixiang Dai (Carnegie Mellon University), Fan Jiang (AMAP, Alibaba), Chiyu Wang (Alibaba Group), Mu Xu (Alibaba Group), Yonggang Qi (Beijing University of Posts and Telecommunications)
Learning an Image Editing Model without Image Editing Pairs
Controllable Video Generation with Provable Disentanglement
Authors: Yifan Shen (Mohamed bin Zayed University of Artificial Intelligence), Peiyuan Zhu (Mohamed bin Zayed University of Artificial Intelligence), Zijian Li (Mohamed bin Zayed University of Artificial Intelligence), Shaoan Xie (Carnegie Mellon University), Namrata Deka (Carnegie Mellon University), Zongfang Liu (Zhejiang University), Zeyu Tang (Stanford University), Guangyi Chen (MBZUAI&CMU), Kun Zhang (Carnegie Mellon University & MBZUAI)
Virtual Community: An Open World for Humans, Robots, and Society
Authors: Qinhong Zhou (University of Massachusetts at Amherst), Hongxin Zhang (UMass Amherst), Xiangye Lin (University of Massachusetts at Amherst), Zheyuan Zhang (Johns Hopkins University), Yutian Chen (Carnegie Mellon University), Wenjun Liu (University of Massachusetts at Amherst), Zunzhe Zhang (Tsinghua University), Sunli Chen (University of Massachusetts at Amherst), Lixing Fang (University of Massachusetts at Amherst), Qiushi Lyu (University of Illinois, Urbana-Champaign), Xinyu Sun (South China University of Technology), Jincheng Yang (University of Maryland, College Park), Zeyuan Wang (Tsinghua University), Bao Dang (University of Massachusetts at Amherst), Zhehuan Chen (Peking University), Daksha Ladia (University of Massachusetts Amherst), Quang Dang (University of Massachusetts at Amherst), Jiageng Liu (University of Massachusetts at Amherst), Chuang Gan (MIT-IBM Watson AI Lab)
Faster Vision Transformers with Adaptive Patches
Authors: Rohan Choudhury, JungEun Kim (General Robotics), Jinhyung Park (Carnegie Mellon University), Eunho Yang (Korea Advanced Institute of Science & Technology), Laszlo A. Jeni (Carnegie Mellon University), Kris Kitani (Carnegie Mellon University)
Transductive Visual Programming: Evolving Tool Libraries from Experience for Spatial Reasoning
VINCIE: Unlocking In-context Image Editing from Video
Authors: Leigang Qu (National University of Singapore), Feng Cheng (ByteDance Seed), Ziyan Yang (ByteDance Inc.), Qi Zhao (ByteDance Inc.), Shanchuan Lin (ByteDance), Yichun Shi, Yicong Li (National University of Singapore), Wenjie Wang (University of Science and Technology of China), Tat-Seng Chua (National University of Singapore), Lu Jiang (Carnegie Mellon University)
RF-DETR: Neural Architecture Search for Real-Time Detection Transformers
Authors: Isaac Robinson (Roboflow), Peter Robicheaux (Roboflow), Matvei Popov (Roboflow, Inc), Deva Ramanan (School of Computer Science, Carnegie Mellon University), Neehar Peri (Carnegie Mellon University)
lmgame-Bench: How Good are LLMs at Playing Games?
Authors: Lanxiang Hu (University of California, San Diego), Mingjia Huo (University of California, San Diego), Yuxuan Zhang (University of California, San Diego), Haoyang Yu (University of California San Diego), Eric P Xing (CMU), Ion Stoica, Tajana Rosing (University of California, San Diego), Haojian Jin, Hao Zhang (University of California, San Diego)
ASCIIEval: Benchmarking Models’ Visual Perception in Text Strings via ASCII Art
Authors: Qi Jia (Shanghai Artificial Intelligence Laboratory), Xiang Yue (Carnegie Mellon University), Shanshan Huang (Guangzhou University), Ziheng Qin (Facebook), Yizhu Liu (Meituan), Bill Yuchen Lin (xAI), Yang You (National University of Singapore), Guangtao Zhai (Shanghai Jiao Tong University)
SpinBench: Perspective and Rotation as a Lens on Spatial Reasoning in VLMs
Authors: Yuyou Zhang (Carnegie Mellon University), Radu Corcodel (Mitsubishi Electric Research Labs), Chiori Hori (Mitsubishi Electric Research Labs), Anoop Cherian (Australian National University), Ding Zhao (Carnegie Mellon University)
SpineBench: A Clinically Salient, Level-Aware Benchmark Powered by the SpineMed-450k Corpus
Authors: Ming Zhao (Jilin University), Wenhui Dong (Nanjing University), Yang Zhang (Chinese People’s Liberation Army General Hospital), wangyou (University of the Chinese Academy of Sciences), Zhonghao Zhang (Ningxia University), Zian Zhou (Zhejiang University), Yunzhi Guan (Fudan University), Liukun Xu (Nanjing Medical University), Wei Peng (Stanford University), Zhaoyang Gong (Fudan University), Zhicheng Zhang (Chinese People’s Liberation Army General Hospital), Dachuan Li (Fudan University), Xiaosheng Ma (Fudan University), Yuli Ma (Peking University), Jianing Ni (Carnegie Mellon University), Changjiang Jiang (Ant Group), Lixia Tian (Beijing Jiaotong University), Chen Qixin (Zhejiang University), Xia Kaishun (Zhejiang University of Technology), Pingping Liu (Jilin University), Tongshun Zhang (Jilin University), Zhiqiang Liu (Huazhong University of Science and Technology), Zhongan Bi (Zhejiang Lab), Chenyang Si (Nanyang Technological University), Tiansheng Sun (Chinese People’s Liberation Army General Hospital), Caifeng Shan (Nanjing University)
SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training
Authors: Jianyi Wang (Nanyang Technological University), Shanchuan Lin (ByteDance), Zhijie Lin (Zhejiang University), Yuxi Ren (ByteDance Inc.), Meng Wei (ByteDance Inc.), Zongsheng Yue (Xi’an Jiaotong University), Shangchen Zhou (Nanyang Technological University), Hao Chen (ByteDance Inc.), Yang Zhao (Bytedance Inc.), Ceyuan Yang (ByteDance), Xuefeng Xiao (ByteDance), Chen Change Loy (Nanyang Technological University), Lu Jiang (Carnegie Mellon University)
Mixture of Contexts for Long Video Generation
Authors: Shengqu Cai (Stanford University), Ceyuan Yang (ByteDance), Lvmin Zhang (Stanford University), Yuwei Guo (The Chinese University of Hong Kong), Junfei Xiao (Johns Hopkins University), Ziyan Yang (ByteDance Inc.), Yinghao Xu (Stanford University), Zhenheng Yang (Tiktok), Alan Yuille (Johns Hopkins University), Leonidas Guibas (Stanford University), Maneesh Agrawala (Stanford University), Lu Jiang (Carnegie Mellon University), Gordon Wetzstein (Stanford University)
pySpatial: Generating 3D Visual Programs for Zero-Shot Spatial Reasoning
Authors: Zhanpeng Luo (University of Pittsburgh), Ce Zhang (Carnegie Mellon University), Silong Yong (Department of Automation, Tsinghua University), Cunxi Dai (Carnegie Mellon University), Qianwei Wang (University of Michigan – Ann Arbor), Haoxi Ran (Carnegie Mellon University), Guanya Shi (Carnegie Mellon University), Katia Sycara (Carnegie Mellon University), Yaqi Xie (Carnegie Mellon University)
Sharp Monocular View Synthesis in Less Than a Second
Authors: Lars Mescheder (Apple), Wei Dong (Apple), Shiwei Li (Apple), Xuyang Bai (Apple), Marcel Santos (Apple), Peiyun Hu (Carnegie Mellon University), Bruno Lecouat (Telecom ParisTech), Mingmin Zhen (Apple), Amaël Delaunoy (Apple), Tian Fang (Hong Kong University of Science and Technology), Yanghai Tsin (Apple), Stephan Richter (Apple), Vladlen Koltun (Apple)
S2GO: Streaming Sparse Gaussian Occupancy
Authors: Jinhyung Park (Carnegie Mellon University), Chensheng Peng (University of California, Berkeley), Yihan Hu (Applied Intuition), Wenzhao Zheng (UC Berkeley), Kris Kitani (Carnegie Mellon University), Wei Zhan (University of California Berkeley)
Captain Cinema: Towards Short Movie Generation
Authors: Junfei Xiao (Johns Hopkins University), Ceyuan Yang (ByteDance), Lvmin Zhang (Stanford University), Shengqu Cai (Stanford University), Yang Zhao (Bytedance Inc.), Yuwei Guo (The Chinese University of Hong Kong), Gordon Wetzstein (Stanford University), Maneesh Agrawala (Stanford University), Alan Yuille (Johns Hopkins University), Lu Jiang (Carnegie Mellon University)
Deep Learning
Chimera: State Space Models Beyond Sequences
Authors: Aakash Sunil Lahoti (Carnegie Mellon University), Tanya Marwah (CMU), Albert Gu (Cartesia AI / CMU)
VisCoder2: Building Multi-Language Visualization Coding Agents
Authors: Yuansheng Ni (University of Waterloo), Songcheng Cai (University of Waterloo), Xiangchao Chen (University of Waterloo), Jiarong Liang (University of Waterloo), Zhiheng Lyu (University of Hong Kong), Jiaqi Deng (Korea Advanced Institute of Science & Technology), Kai Zou (NetMind.AI), Ping Nie (Peking University), Fei Yuan (Shanghai Artificial Intelligence Laboratory), Xiang Yue (Carnegie Mellon University), Wenhu Chen (University of Waterloo)
Time-To-Inconsistency: A Survival Analysis of Large Language Model Robustness to Adversarial Attacks
Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data
Authors: Syeda Nahida Akter (Carnegie Mellon University), Shrimai Prabhumoye (NVIDIA), Eric Nyberg (Carnegie Mellon University), Mostofa Patwary (NVIDIA), Mohammad Shoeybi (NVIDIA), Yejin Choi (Stanford University / NVIDIA), Bryan Catanzaro (NVIDIA)
InT: Self-Proposed Interventions Enable Credit Assignment in LLM Reasoning
Authors: Matthew Yang (Carnegie Mellon University), Hao Bai (University of Illinois at Urbana-Champaign), Ian Wu (Carnegie Mellon University), Gene Yang (Carnegie Mellon University), Amrith Setlur (Carnegie Mellon University), Aviral Kumar (University of California Berkeley)
Dynamic Chunking for End-to-End Hierarchical Sequence Modeling
Authors: Sukjun Hwang (Carnegie Mellon University), Brandon Wang (onepot), Albert Gu (Cartesia AI / CMU)
e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs
Authors: Amrith Setlur (Carnegie Mellon University), Matthew Yang (Carnegie Mellon University), Charlie Snell (University of California, Berkeley), Jeremiah Greer (Oumi AI PBC), Ian Wu (Carnegie Mellon University), Virginia Smith (Carnegie Mellon University), Max Simchowitz (Massachusetts Institute of Technology), Aviral Kumar (University of California Berkeley)
FSA: An Alternative Efficient Implementation of Native Sparse Attention Kernel
Authors: Ran Yan (The Hong Kong University of Science and Technology), Youhe Jiang (University of Cambridge), Zhuoming Chen (Carnegie Mellon University), Haohui Mai (Hong Kong University of Science and Technology), Beidi Chen (Carnegie Mellon University), Binhang Yuan (HKUST)
In Good GRACES: Principled Teacher Selection for Knowledge Distillation
Authors: Guo (), Songlin Yang (ShanghaiTech University), Tarushii Goel (Massachusetts Institute of Technology), Eric P Xing (CMU), Tri Dao (Princeton University), Yoon Kim (MIT)
Generalized Parallel Scaling with Interdependent Generations
Authors: Harry Dong (Carnegie Mellon University), David Brandfonbrener (NYU), Eryk Helenowski (Facebook), Yun He (Meta), Mrinal Kumar (Facebook), Han Fang (Meta GenAI), Yuejie Chi (Carnegie Mellon University), Karthik Abinav Sankararaman (Facebook)
KnowledgeSmith: Uncovering Knowledge Updating in LLMs with Model Editing and Unlearning
Authors: Yinyi Luo (Carnegie Mellon University), Zhexian Zhou (Carnegie Mellon University), Hao Chen (Google DeepMind), Kai Qiu (Carnegie Mellon University), Marios Savvides (Carnegie Mellon University), Yixuan Li (University of Wisconsin, Madison), Jindong Wang (William & Mary)
General Machine Learning
VideoJudge: Bootstrapping Enables Scalable Supervision of MLLM-as-a-Judge for Video Understanding
Authors: Abdul Waheed (Maharaja Agrasen Institute of Technology, New Delhi), Zhen Wu (Carnegie Mellon University), Dareen Alharthi (LTI CMU), Seungone Kim (Carnegie Mellon University), Bhiksha Raj (Carnegie Mellon University)
Score-based Greedy Search for Structure Identification of Partially Observed Causal Models
Authors: Xinshuai Dong (CMU), Ignavier Ng (Carnegie Mellon University), Haoyue Dai (Carnegie Mellon University), Jiaqi Sun (Carnegie Mellon University), Xiangchen Song (Carnegie Mellon University), Peter Spirtes (Carnegie Mellon University), Kun Zhang (Carnegie Mellon University & MBZUAI)
On Code-Induced Reasoning in LLMs
Authors: Abdul Waheed (Maharaja Agrasen Institute of Technology, New Delhi), Zhen Wu (Carnegie Mellon University), Carolyn Rose (School of Computer Science, Carnegie Mellon University), Daphne Ippolito (School of Engineering and Applied Science, University of Pennsylvania)
Conditional Independent Component Analysis for Estimating Causal Structure with Latent Variables
Authors: Yewei Xia (Fudan University), Zhengming Chen (Guangdong University of Technology), Haoyue Dai (Carnegie Mellon University), Fuhong Wang (Guangdong University of Technology), Yixin Ren (Fudan University), Yiqing Li (MBZUAI), Kun Zhang (Carnegie Mellon University & MBZUAI), Shuigeng Zhou (Fudan University)
Ambig-SWE: Interactive Agents to Overcome Underspecificity in Software Engineering
Navigating the Accuracy-Size Trade-Off with Flexible Model Merging
Authors: Akash Dhasade (EPFL), Divyansh Jhunjhunwala (Amazon), Milos Vujasinovic (EPFL), Gauri Joshi (Carnegie Mellon University), Anne-Marie Kermarrec (School of Computer and Communication Sciences, EPFL – EPF Lausanne)
Dual Perspectives on Non-Contrastive Self-Supervised Learning
Authors: Jean Ponce (NYU/ENS-PSL), Basile Terver (AMI Labs), Martial Hebert (Carnegie Mellon University), Michael Arbel (INRIA)
Multiple-Prediction-Powered Inference
Authors: Charlie Cowen-Breen (Massachusetts Institute of Technology), Alekh Agarwal (Google), Stephen Bates (Massachusetts Institute of Technology), William W. Cohen (Carnegie Mellon University), Jacob Eisenstein (Google), Amir Globerson (Google), Adam Fisch (Google DeepMind)
Command-V: Training-Free Representation Finetuning Transfer
Authors: Barry Wang (Carnegie Mellon University), Avi Schwarzschild (Carnegie Mellon University), Alexander Robey (Carnegie Mellon University), Ali Payani (Cisco Systems), Charles Fleming (Cisco), Mingjie Sun (School of Computer Science, Carnegie Mellon University), Daphne Ippolito (School of Engineering and Applied Science, University of Pennsylvania)
SurvHTE-Bench: A Benchmark for Heterogeneous Treatment Effect Estimation in Survival Analysis
Authors: Shahriar Noroozizadeh (Carnegie Mellon University), Xiaobin Shen (Carnegie Mellon University), Jeremy Weiss (National Library of Medicine), George H. Chen (Carnegie Mellon University)
Prompt-MII: Meta-Learning Instruction Induction for LLMs
Authors: Emily Xiao (Carnegie Mellon University), Yixiao Zeng (XPeng Motors / Carnegie Mellon University), Ada Chen (Carnegie Mellon University), Chin-Jou Li (Language Technologies Institute, Carnegie Mellon University), Amanda Bertsch (Carnegie Mellon University), Graham Neubig (Carnegie Mellon University)
Optimization
Decentralized Nonconvex Optimization under Heavy-Tailed Noise: Normalization and Optimal Convergence
Authors: Shuhua Yu (Carnegie Mellon University), Dusan Jakovetic (University of Novi Sad Faculty of Sciences), Soummya Kar (Carnegie Mellon University)
FrontierCO: Real-World and Large-Scale Evaluation of Machine Learning Solvers for Combinatorial Optimization
Authors: Shengyu Feng (Carnegie Mellon University), Weiwei Sun (Carnegie Mellon University), Shanda Li (Carnegie Mellon University), Ameet Talwalkar (University of California-Los Angeles), Yiming Yang (Carnegie Mellon University)
Gen-DFL: Decision-Focused Generative Learning for Robust Decision Making
Authors: Prince Wang (Carnegie Mellon University), Shuyi Chen (Carnegie Mellon University), Jinhao Liang (University of Virginia, Charlottesville), Ferdinando Fioretto (University of Virginia, Charlottesville), Shixiang Zhu (Carnegie Mellon University)
Reinforcement Learning
Jackpot: Align Actor-Policy Distribution for scalable and stable RL for LLM
Improving Human-AI Coordination through Online Adversarial Training and Generative Models
Authors: Paresh Chaudhary (Department of Computer Science, University of Washington), Yancheng Liang (Department of Computer Science, University of Washington), Daphne Chen (Carnegie Mellon University), Simon Du (University of Washington), Natasha Jaques (University of Washington, Google DeepMind)
From Curiosity to Caution: Mitigating Reward Hacking for Best-of-$N$ with Pessimism
Authors: Zhuohao Yu (Peking University), Steven Wu (Carnegie Mellon University), Adam Block (Columbia University)
Online Decision Making with Generative Action Sets
TRIM: Hybrid Inference via Targeted Stepwise Routing in Multi-Step Reasoning Tasks
Authors: Vansh Kapoor (School of Computer Science, Carnegie Mellon University), Aman Gupta (Carnegie Mellon University), Hao Chen (Amazon), Anurag Beniwal (Elevenlabs), Jing Huang (Stanford), Aviral Kumar (University of California Berkeley)
Hierarchical Entity-centric Reinforcement Learning with Factored Subgoal Diffusion
Authors: Dan Haramati (Brown University), Carl Qi (University of Texas at Austin
Oral Papers
EditBench: Evaluating LLM Abilities to Perform Real-World Instructed Code Edits
Authors: Wayne Chi (CMU), Valerie Chen (Carnegie Mellon University), Ryan Shar (Apple), Aditya Mittal (CMU, Carnegie Mellon University), Jenny Liang (School of Computer Science, Carnegie Mellon University), Wei-Lin Chiang (UC Berkeley / LMSYS), Anastasios Angelopoulos (University of California Berkeley), Ion Stoica (), Graham Neubig (Carnegie Mellon University), Ameet Talwalkar (University of California-Los Angeles), Chris Donahue (CMU / Google DeepMind)
This work introduces EditBench, a new benchmark for testing how well AI models can edit existing code based on user instructions. Unlike prior benchmarks, it uses real-world coding tasks and contexts, including things like the surrounding code and cursor position. The benchmark includes 545 diverse problems, and results show that most models struggle—only a few achieve strong performance. The study also finds that having more realistic context significantly impacts how well models perform, highlighting the importance of evaluating code-editing in real-world settings.
UALM: Unified Audio Language Model for Understanding, Generation and Reasoning
Authors: Jinchuan Tian (CMU, Carnegie Mellon University), Sang-gil Lee (NVIDIA), Zhifeng Kong (NVIDIA), Sreyan Ghosh (Nvidia), Arushi Goel (NVIDIA), Chao-Han Huck Yang (NVIDIA Research), Wenliang Dai (NVIDIA), Zihan Liu (Nvidia), Hanrong Ye (NVIDIA), Shinji Watanabe (Carnegie Mellon University), Mohammad Shoeybi (NVIDIA), Bryan Catanzaro (NVIDIA), Rafael Valle (NVIDIA), Wei Ping (Nvidia)
This paper introduces the Unified Audio Language Model (UALM), a single model designed to handle audio understanding, text-to-audio generation, and multimodal reasoning together. Instead of treating these as separate tasks, UALM learns to both interpret and generate audio, achieving performance comparable to specialized state-of-the-art models. The authors also show that combining text and audio during the model’s reasoning process improves its ability to handle complex tasks. Overall, the work demonstrates a step toward more general AI systems that can reason across both language and sound.
Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents
Authors: Yueqi Song (CMU), Ketan Ramaneti (Amazon), Zaid Sheikh (Carnegie Mellon University), Ziru Chen (Ohio State University, Columbus), Boyu Gou (Ohio State University, Columbus), Tianbao Xie (the University of Hong Kong, University of Hong Kong), Yiheng Xu (University of Hong Kong), Danyang Zhang (Shanghai Jiao Tong University), Apurva Gandhi (Carnegie Mellon University), Fan Yang (Fujitsu), Joseph Liu (School of Computer Science, Carnegie Mellon University), Tianyue Ou (Carnegie Mellon University), Zhihao Yuan (Carnegie Mellon University), Frank F Xu (Carnegie Mellon University), Shuyan Zhou (Facebook), Xingyao Wang (All Hands AI), Xiang Yue (Carnegie Mellon University), Tao Yu (University of Hong Kong), Huan Sun (Ohio State University), Yu Su (Ohio State University), Graham Neubig (Carnegie Mellon University)
This work introduces the Agent Data Protocol (ADP), a standardized format for representing training data for AI agents. The authors argue that the main challenge isn’t a lack of data, but that existing datasets are fragmented across different formats and tools. ADP acts as a common “interlingua,” making it easier to combine diverse data sources—like coding, browsing, and tool use—into a single training pipeline. By converting 13 datasets into this unified format, the authors show that models trained on the combined data achieve improved performance.
MotionStream: Real-Time Video Generation with Interactive Motion Controls
Authors: Joonghyuk Shin (Seoul National University), Zhengqi Li (Google), Richard Zhang (Adobe), Jun-Yan Zhu (Carnegie Mellon University), Jaesik Park (Seoul National University), Eli Shechtman (Adobe), Xun Huang (Adobe Research)
This paper introduces MotionStream, a system for generating videos in real time based on motion and text inputs. Unlike prior methods that take minutes to produce a video, MotionStream can stream results at up to 29 frames per second on a single GPU. The key idea is to train a fast, causal model that can generate video continuously, using techniques that prevent quality from degrading over long sequences. As a result, users can interactively control motion—like drawing paths or moving a camera—and see the video update instantly.
OpenThoughts: Data Recipes for Reasoning Models
Authors: Etash Guha (Stanford University, Anthropic), Ryan Marten (Harbor), Sedrick Keh (Toyota Research Institute), Negin Raoof (University of California, Berkeley), Georgios Smyrnis (University of Texas, Austin), Hritik Bansal (University of California, Los Angeles), Marianna Nezhurina (Juelich Supercomputing Center, LAION, Tuebingen University), Jean Mercat (Toyota Research Institute (TRI)), Trung Vu (Google), Zayne Sprague (New York University), Ashima Suvarna (UCLA), Benjamin Feuer (Stanford University), Leon Liangyu Chen (Stanford University), Zaid Khan (University of North Carolina at Chapel Hill), Eric Frankel (Department of Computer Science, University of Washington), Sachin Grover (Arizona State University), Caroline Choi (None), Niklas Muennighoff (Stanford University), Shiye Su (Stanford University), Wanjia Zhao (Stanford University), John Yang (Princeton University), Shreyas Pimpalgaonkar (New York University), Kartik sharma (Georgia Institute of Technology), Charlie Ji (University of California, Berkeley), Yichuan Deng (Department of Computer Science, University of Washington), Sarah Pratt (University of Washington), Vivek Ramanujan (Department of Computer Science, University of Washington), Jon Saad-Falcon (Computer Science Department, Stanford University), Stutee Acharya (University of South Florida), Jeffrey Li (Carnegie Mellon University), Achal Dave (Anthropic), Alon Albalak (SynthLabs), Kushal Arora (McGill University), Blake Wulfe (Toyota Research Institute), Chinmay Hegde (New York University), Greg Durrett (New York University), Sewoong Oh (University of Washington), Mohit Bansal (UNC Chapel Hill), Saadia Gabriel (University of Washington), Aditya Grover (UCLA), Kai-Wei Chang (University of Virginia Main Campus), Vaishaal Shankar (Apple), Aaron Gokaslan (Cornell University), Mike Merrill (None), Tatsunori Hashimoto (Stanford University), Yejin Choi (Stanford University / NVIDIA), Jenia Jitsev (LAION; Juelich Supercomputing Center, Research 
Center Juelich), Reinhard Heckel (Technical University Munich), Maheswaran Sathiamoorthy (University of Southern California), Alex Dimakis (Electrical Engineering & Computer Science Department, University of California, Berkeley), Ludwig Schmidt (University of Washington / Stanford / Anthropic)
This work introduces the OpenThoughts project, which aims to create high-quality, open-source datasets for training reasoning-focused AI models. The authors show that models trained on their public data can match or exceed the performance of strong existing systems that rely on private datasets. By carefully studying and improving their data generation process, they build larger and better datasets that significantly boost performance across math, coding, and science benchmarks. Overall, the project demonstrates that open data alone can be enough to train highly capable reasoning models.
Mamba-3: Improved Sequence Modeling using State Space Principles
Authors: Aakash Sunil Lahoti (CMU, Carnegie Mellon University), Kevin Li (Carnegie Mellon University), Berlin Chen (Princeton University), Caitlin Wang (Princeton University), Aviv Bick (Carnegie Mellon University), Zico Kolter (Carnegie Mellon University), Tri Dao (Princeton University), Albert Gu (Cartesia AI CMU)
This paper introduces Mamba-3, a new model designed to make AI inference faster and more efficient without sacrificing performance. While many efficient alternatives to Transformers reduce computation, they often struggle with tasks like tracking long-term information; Mamba-3 addresses this with improved state modeling and a more expressive update mechanism. The model also uses a multi-input, multi-output design to boost accuracy without slowing down generation. Overall, Mamba-3 shows that it’s possible to improve both efficiency and capability at the same time, pushing forward the tradeoff between speed and performance.
Overcoming Joint Intractability with Lossless Hierarchical Speculative Decoding
Authors: Yuxuan Zhou (Independent Researcher), Fei Huang (Alibaba Group), Heng Li (Carnegie Mellon University), Fengyi Wu (University of Washington), Tianyu Wang (University of Washington), Jianwei Zhang (Alibaba Group), Junyang Lin (Alibaba Group), Zhi-Qi Cheng (University of Washington)
This paper introduces Hierarchical Speculative Decoding (HSD), a new method to speed up large language model inference by improving the verification step in speculative decoding while preserving exact output distributions. It addresses the challenge of “joint intractability” in sequence-level verification by organizing resampling into a hierarchy that redistributes probability mass across branches, enabling more tokens to be accepted at once. The approach is theoretically proven to be lossless and empirically shows consistent speed improvements across models and benchmarks, outperforming prior tokenwise and blockwise verification methods. Overall, HSD offers a practical and general way to accelerate decoding without sacrificing fidelity, achieving state-of-the-art efficiency when integrated into existing frameworks.
Distributional Equivalence in Linear Non-Gaussian Latent-Variable Cyclic Causal Models: Characterization and Learning
Authors: Haoyue Dai (Carnegie Mellon University), Immanuel Albrecht (FernUniversität in Hagen), Peter Spirtes (Carnegie Mellon University), Kun Zhang (Carnegie Mellon University & MBZUAI)
This paper studies causal discovery in linear non-Gaussian models with latent variables and cycles, focusing on when different causal graphs are observationally indistinguishable. It provides the first general characterization of distributional equivalence in this setting, introducing new tools—especially edge rank constraints—to describe when two models generate the same observed data. Building on this theory, the authors derive practical graphical criteria and transformations to enumerate all equivalent models and propose an algorithm to recover the entire equivalence class from data. Overall, the work removes the need for strong structural assumptions and offers a general, principled framework for latent-variable causal discovery.
Revela: Dense Retriever Learning via Language Modeling
Authors: Fengyu Cai (Technische Universität Darmstadt), Tong Chen (University of Washington), Xinran Zhao (Carnegie Mellon University), Sihao Chen (Microsoft), Hongming Zhang (Tencent AI Lab Seattle), Sherry Wu (Carnegie Mellon University), Iryna Gurevych (Technical University of Darmstadt / Mohamed bin Zayed University of Artificial Intelligence), Heinz Koeppl (TU Darmstadt)
This paper introduces Revela, a self-supervised framework for training dense retrievers by leveraging language modeling objectives instead of relying on annotated query-document pairs. It augments next-token prediction with an in-batch attention mechanism that allows documents to attend to each other, enabling the retriever to learn cross-document relationships jointly with a language model. Experiments across domain-specific, reasoning-intensive, and general benchmarks show that Revela matches or surpasses supervised and API-based retrievers while using significantly less data and compute. Overall, the work demonstrates a scalable and efficient alternative for retriever learning directly from raw text with strong generalization across domains.
Latent Particle World Models: Self-supervised Object-centric Stochastic Dynamics Modeling
Authors: Tal Daniel (Carnegie Mellon University), Carl Qi (University of Texas at Austin), Dan Haramati (Brown University), Amir Zadeh (Lambda), Chuan Li (Lambda Labs), Aviv Tamar (Technion), Deepak Pathak (Carnegie Mellon University), David Held (Carnegie Mellon University)
This paper introduces the Latent Particle World Model (LPWM), a self-supervised, object-centric world model that learns to decompose scenes into latent particles (e.g., keypoints, masks, and object attributes) directly from raw video without supervision. It proposes a novel per-particle latent action mechanism that models stochastic dynamics, enabling the system to capture complex multi-object interactions and generate diverse future predictions. The model is trained end-to-end and supports flexible conditioning on actions, language, and goal images, achieving state-of-the-art performance on both real-world and synthetic video prediction tasks. Beyond video modeling, LPWM also demonstrates strong potential for decision-making applications such as imitation learning by leveraging its learned latent dynamics.
LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts
Authors: Siyuan Wang (Shanghai Jiao Tong University), Gaokai Zhang (Carnegie Mellon University), Li Lyna Zhang (Microsoft Research Asia), Ning Shang (Microsoft), Fan Yang (Microsoft Research), Dongyao Chen (Shanghai Jiaotong University), Mao Yang (Peking University)
The authors introduce LoongRL, a reinforcement learning framework designed to improve long-context reasoning in large language models by training them on challenging, synthesized tasks. They propose KeyChain, a data construction method that embeds hidden question chains within long documents, forcing models to perform multi-step planning, retrieval, and reasoning rather than relying on shortcuts. Through RL training, models develop an emergent “plan–retrieve–reason–recheck” reasoning pattern that generalizes from shorter (16K) to much longer (128K) contexts. Experiments show that LoongRL significantly boosts long-context reasoning performance while maintaining strong short-context abilities, achieving results comparable to much larger models.
Exchangeability of GNN Representations with Applications to Graph Retrieval
Authors: Kartik Nair (Carnegie Mellon University), Indradyumna Roy (IIT Bombay, Aalto University), Soumen Chakrabarti (IIT Bombay), Anirban Dasgupta (IIT Gandhinagar), Abir De (Indian Institute of Technology Bombay)
This paper introduces the concept of exchangeability in graph neural networks (GNNs), showing that the dimensions of learned node embeddings are statistically interchangeable due to random initialization and permutation-invariant training. This property implies that embedding components share identical distributions, enabling simplifications in how graph similarities are computed. Leveraging this insight, the authors approximate complex transportation-based graph distances using simpler Euclidean operations on sorted embedding values. They further propose GRAPHHASH, a locality-sensitive hashing framework that enables efficient and scalable graph retrieval, achieving strong performance compared to existing methods.
Poster Papers
Applications
TusoAI: Agentic Optimization for Scientific Methods
Authors: Alistair Turcan (School of Computer Science, Carnegie Mellon University), Kexin Huang (Stanford University), Lei Li (School of Computer Science, Carnegie Mellon University), Martin J. Zhang (Carnegie Mellon University)
AutoLibra: Agent Metric Induction from Open-Ended Human Feedback
Authors: Hao Zhu (Carnegie Mellon University), Phil Cuvin (Stanford University), Xinkai Yu (University of Pennsylvania, University of Pennsylvania), Charlotte Yan (Stanford University), Jason Zhang (Stanford University), Diyi Yang (Stanford University)
DistMLIP: A Distributed Inference Platform for Machine Learning Interatomic Potentials
Authors: Kevin Han (Carnegie Mellon University), Bowen Deng (UC Berkeley), Amir Barati Farimani (CMU, Carnegie Mellon University), Gerbrand Ceder (University of California, Berkeley)
Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning
Authors: Ganlin Yang (University of Science and Technology of China), Tianyi Zhang (Zhejiang University; Shanghai Artificial Intelligence Laboratory), Haoran Hao (Carnegie Mellon University), Weiyun Wang (Fudan University), Yibin Liu (Northeastern University), Dehui Wang (Shanghai Jiaotong University), Guanzhou Chen (Shanghai AI Laboratory, Shanghai Jiaotong University), Zijian Cai (Shenzhen University), Junting Chen (national university of singaore, National University of Singapore), Weijie Su (University of Science and Technology of China), Wengang Zhou (University of Science and Technology of China), Yu Qiao (Shanghai Aritifcal Intelligence Laboratory), Jifeng Dai (Tsinghua University, Tsinghua University), Jiangmiao Pang (Shanghai AI Laboratory), Gen Luo (Shanghai AI Laboratory), Wenhai Wang (Shanghai AI Laboratory), Yao Mu (Shanghai Jiao Tong University), Zhi Hou (Shanghai Artificial Intelligence Laboratory)
DataMIL: Selecting Data for Robot Imitation Learning with Datamodels
Authors: Shivin Dass (University of Texas at Austin), Alaa Khaddaj (OpenAI), Logan Engstrom (Massachusetts Institute of Technology), Aleksander Madry (OpenAI), Andrew Ilyas (Carnegie Mellon University), Roberto Martín-Martín (University of Texas at Austin)
TimeSeriesExamAgent: Creating Time Series Reasoning Benchmarks at Scale
Authors: Malgorzata Gwiazda (Technical University of Munich), Yifu Cai (Millennium Management LLC), Mononito Goswami (Carnegie Mellon University), Arjun Choudhry (Georgia Institute of Technology), Artur Dubrawski (Carnegie Mellon University)
MetaVLA: Unified Meta Co-Training for Efficient Embodied Adaptation
TSPulse: Tiny Pre-Trained Models with Disentangled Representations for Rapid Time-Series Analysis
Authors: Vijay Ekambaram (IBM), Subodh Kumar (International Business Machines), Arindam Jati (International Business Machines (IBM)), Sumanta Mukherjee (International Business Machines), Tomoya Sakai (International Business Machines), Pankaj Dayama (International Business Machines), Wesley Gifford (IBM Research), Jayant Kalagnanam (Carnegie Mellon University)
RobotArena $\infty$: Scalable Robot Benchmarking via Real-to-Sim Translation
Authors: Yash Jangir (Carnegie Mellon University), Yidi Zhang (), Kashu Yamazaki (CMU, Carnegie Mellon University), Chenyu Zhang (Peking University), Kuan-Hsun Tu (National Taiwan University), Tsung-Wei Ke (Department of computer science and informational engineering, National Taiwan University), Lei Ke (Carnegie Mellon University), Yonatan Bisk (Carnegie Mellon University), Katerina Fragkiadaki (CMU)
Generalizable End-to-End Tool-Use RL with Synthetic CodeGym
Authors: Weihua Du (Tsinghua University), Hailei Gong (Huawei Technologies Ltd.), Zhan Ling (UC San Diego), Kang Liu (ByteDance Inc.), Lingfeng Shen (Johns Hopkins University), Xuesong Yao (ByteDance Inc.), Yufei Xu (ByteDance Inc.), Dingyuan Shi (ByteDance Inc.), Yiming Yang (Carnegie Mellon University), Jiecao Chen (ByteDance Inc.)
WearVox: An Egocentric Multichannel Voice Assistant Benchmark for Wearables
Authors: Zhaojiang Lin (Meta), YONG XU (Meta), Kai Sun (Meta), Jing Zheng (Ant Group), Yin Huang (Facebook), Surya Appini (Meta), Krish Narang (Facebook), Renjie Tao (Facebook), Ishan Jain (Facebook), Siddhant Arora (Carnegie Mellon University), Ruizhi Li (Facebook), Yiteng Huang (Facebook), Kaushik Patnaik (Apple), Wenfang Xu (Meta Platforms, Inc.), Suwon Shon (ASAPP), Yue Liu (Meta), Ahmed Aly (Facebook), Anuj Kumar (Meta), Florian Metze (Carnegie Mellon University), Xin Dong (Facebook)
A tale of two tails: Preferred and anti-preferred natural stimuli in visual cortex
Authors: Rabia Gondur (Cold Spring Harbor Laboratory), Patricia Stan (CMU, Carnegie Mellon University), Matthew A Smith (Carnegie Mellon University), Benjamin Cowley (Cold Spring Harbor Laboratory)
FictionalQA: A Dataset for Studying Memorization and Knowledge Acquisition
Authors: John Kirchenbauer (University of Maryland, College Park), Natjanan Mongkolsupawan (Carnegie Mellon University), Yuxin Wen (University of Maryland), Tom Goldstein (University of Maryland), Daphne Ippolito (School of Engineering and Applied Science, University of Pennsylvania)
Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing
Authors: Justin Lin (Computer Science Department, Stanford University), Eliot Jones (Gray Swan), Donovan Jasper (Stanford University), Ethan Ho (Stanford University), Anna Wu (Computer Science Department, Stanford University), Arnold Yang (Stanford University), Neil Perry (Princeton University), Andy Zou (CMU, Carnegie Mellon University), Matt Fredrikson (University of Wisconsin, Madison), Zico Kolter (Carnegie Mellon University), Percy Liang (Stanford University), Dan Boneh (Stanford University), Daniel Ho (Stanford University)
Bound by semanticity: universal laws governing the generalization-identification tradeoff
Authors: Marco Nurisso (Polytechnic University of Turin), Jesseba Fernando (Northeastern University), Raj Deshpande (Northeastern University London), Alan Perotti (Intesa Sanpaolo AI Research), Raja Marjieh (Princeton University), Steven Frankland (Dartmouth College), Richard Lewis (Carnegie Mellon University), Taylor Webb (University of California, Los Angeles), Declan Campbell (Princeton University), Francesco Vaccarino (Politecnico di Torino), Jonathan Cohen (Princeton University), Giovanni Petri (Network Science Institute, Northeastern University London)
Zero-shot Forecasting by Simulation Alone
Authors: Boris Oreshkin (Amazon), Mayank Jauhari (Amazon), Ravi Kiran Selvam (Amazon), Malcolm Wolff (Amazon), Wenhao Pan (University of Washington), Shankar Ramasubramanian (Amazon), KIN GUTIERREZ (Carnegie Mellon University), Tatiana Konstantinova (Amazon), Andres Potapczynski (New York University), Mengfei Cao (Amazon.com), Dmitry Efimov (Amazon), Michael W Mahoney (University of California Berkeley), Andrew Gordon Wilson (New York University)
Self-Improving Vision-Language-Action Models with Data Generation via Residual RL
Authors: Wenli Xiao (Carnegie Mellon University), Haotian Lin (CMU, Carnegie Mellon University), Andy Peng (University of California, Berkeley), Haoru Xue (University of California, Berkeley), Tairan He (NVIDIA), Zhengyi Luo (Carnegie Mellon University), Yuqi Xie (NVIDIA), Fengyuan Hu (NVIDIA), Jim Fan (NVIDIA), Guanya Shi (CMU, Carnegie Mellon University), Yuke Zhu (NVIDIA / UT-Austin)
Improving Attributed Long-form Question Answering with Intent Awareness
Authors: Xinran Zhao (CMU, Carnegie Mellon University), Aakanksha Naik (Allen Institute for Artificial Intelligence), Jay DeYoung (Allen Institute for Artificial Intelligence), Joseph Chee Chang (Allen Institute for Artificial Intelligence), Jena Hwang (Allen Institute for Artificial Intelligence), Sherry Wu (Carnegie Mellon University), Varsha Kishore (Cornell University)
RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems
Authors: Yuxiao Qu (Carnegie Mellon University), Anikait Singh (Stanford University), Yoonho Lee (Stanford University), Amrith Setlur (Carnegie Mellon University), Russ Salakhutdinov (CMU), Chelsea Finn (Stanford University, Physical Intelligence), Aviral Kumar (University of California Berkeley)
Measuring LLM Novelty As The Frontier Of Original And High-Quality Output
Authors: Vishakh Padmakumar (Stanford University), Chen Yueh-Han (New York University), Jane Pan (New York University), Valerie Chen (Carnegie Mellon University), He He (New York University)
BFM-Zero: A Promptable Behavioral Foundation Model for Humanoid Control Using Unsupervised Reinforcement Learning
Authors: Yitang Li (), Zhengyi Luo (Carnegie Mellon University), Tonghe Zhang (Carnegie Mellon University), Cunxi Dai (Carnegie Mellon University), Anssi Kanervisto (Microsoft Research), Andrea Tirinzoni (Meta, FAIR), Haoyang Weng (Tsinghua University), Kris Kitani (Carnegie Mellon University), Mateusz Guzek (Meta AI), Ahmed Touati (Meta AI Research), Alessandro Lazaric (Facebook), Matteo Pirotta (Meta), Guanya Shi (CMU, Carnegie Mellon University)
CaTS: Calibrated Test-Time Scaling for Efficient LLM Reasoning
Authors: Chengsong Huang (Washington University, Saint Louis), Langlin Huang (Washington University, Saint Louis), Jixuan Leng (Carnegie Mellon University), Jiacheng Liu (NVIDIA), Jiaxin Huang (Washington University in St. Louis)
Real-Time Reasoning Agents in Evolving Environments
Authors: Yule Wen (Tsinghua University), Yixin Ye (Shanghai Jiaotong University), Yanzhe Zhang (Georgia Institute of Technology), Diyi Yang (Stanford University), Hao Zhu (Carnegie Mellon University)
ExpertLongBench: Benchmarking Language Models on Expert-Level Long-Form Generation Tasks with Structured Checklists
Authors: Jie Ruan (University of Michigan – Ann Arbor), Inderjeet Nair (University of Michigan – Ann Arbor), Shuyang Cao (Bloomberg), Amy Liu (University of Michigan), Sheza Munir (University of Toronto), Micah Pollens-Dempsey (University of Michigan – Ann Arbor), Yune-Ting Chiang (University of Michigan – Ann Arbor), Lucy Kates (University of Michigan – Ann Arbor), Nicholas David (University of Michigan – Ann Arbor), Sihan Chen (Carnegie Mellon University), Ruxin Yang (University of Michigan – Ann Arbor), Yuqian Yang (University of Michigan – Ann Arbor), Jihyun Gump (University of Michigan – Ann Arbor), Tessa Bialek (University of Michigan Law School), Vivek Sankaran (University of Michigan – Ann Arbor), Margo Schlanger (University of Michigan – Ann Arbor), Lu Wang (University of Michigan)
From Reproduction to Replication: Evaluating Research Agents with Progressive Code Masking
Authors: Gyeongwon J Kim (Carnegie Mellon University), Alex Wilf (Carnegie Mellon University), Louis-Philippe Morency (Carnegie Mellon University), Daniel Fried (Carnegie Mellon University)
PRISM: Enhancing PRotein Inverse Folding through Fine-Grained Retrieval on Structure-Sequence Multimodal Representations
Authors: Sazan Mahbub (Carnegie Mellon University School of Computer Science), Souvik Kundu (Intel), Eric P Xing (CMU)
MAPSS: Manifold-based Assessment of Perceptual Source Separation
Contact-guided Real2Sim from Monocular Video with Planar Scene Primitives
Authors: Zihan Wang (Amazon), Jiashun Wang (School of Computer Science, Carnegie Mellon University), Jeff Tan (Carnegie Mellon University), Yiwen Zhao (School of Computer Science, Carnegie Mellon University), Jessica Hodgins (RAI Institute), Shubham Tulsiani (Carnegie Mellon University), Deva Ramanan (School of Computer Science, Carnegie Mellon University)
The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution
Authors: Junlong Li (The Hong Kong University of Science and Technology), Wenshuo Zhao (Zhejiang University), Jian Zhao (Beijing University of Posts and Telecommunications), Weihao Zeng (Hong Kong University of Science and Technology), Haoze Wu (Zhejiang University), Xiaochen Wang (None), Rui Ge (Shanghai Jiaotong University), Yuxuan Cao (HKUST), Yuzhen Huang (HKUST), Wei Liu (HKUST), Junteng LIU (HKUST), Zhaochen Su (The Hong Kong University of Science and Technology), Yiyang Guo (Fudan University), FAN ZHOU (Shanghai Jiao Tong University), Lueyang Zhang (The Hong Kong University of Science and Technology), Juan Michelini (Universidad de la República), Xingyao Wang (All Hands AI), Xiang Yue (Carnegie Mellon University), Shuyan Zhou (Facebook), Graham Neubig (Carnegie Mellon University), Junxian He (HKUST)
SAC Flow: Sample-Efficient Reinforcement Learning of Flow-Based Policies via Velocity-Reparameterized Sequential Modeling
Authors: Yixian Zhang (Tsinghua University), Shu-ang Yu (Tsinghua University), Tonghe Zhang (Carnegie Mellon University), Mo Guang (Li Auto Inc.), Haojia Hui (Li Auto Inc.), Kaiwen Long (Li Auto Inc.), Yu Wang (Tsinghua Univ.), Chao Yu (Tsinghua University), Wenbo Ding (Tsinghua University)
Computer Vision
Multi-Object System Identification from Videos
Authors: Chunjiang Liu (Carnegie Mellon University), Xiaoyuan Wang (Carnegie Mellon University), Qingran Lin (Georgia Institute of Technology), Albert Xiao (Carnegie Mellon University), Haoyu Chen (Harvard University), Shizheng Wen (ETHZ – ETH Zurich), Hao Zhang (UIUC), Lu Qi (Insta360), Ming-Hsuan Yang (Google DeepMind), Laszlo A. Jeni (Carnegie Mellon University), Min Xu (Carnegie Mellon University), Yizhou Zhao (Snap Inc.)
FantasyWorld: Geometry-Consistent World Modeling via Unified Video and 3D Prediction
Authors: Yixiang Dai (Carnegie Mellon University), Fan Jiang (AMAP, Alibaba), Chiyu Wang (Alibaba Group), Mu Xu (Alibaba Group), Yonggang Qi (Beijing University of Posts and Telecommunications)
Learning an Image Editing Model without Image Editing Pairs
Controllable Video Generation with Provable Disentanglement
Authors: Yifan Shen (Mohamed bin Zayed University of Artificial Intelligence), Peiyuan Zhu (Mohamed bin Zayed University of Artificial Intelligence), Zijian Li (Mohamed bin Zayed University of Artificial Intelligence), Shaoan Xie (Carnegie Mellon University), Namrata Deka (Carnegie Mellon University), Zongfang Liu (Zhejiang University), Zeyu Tang (Stanford University), Guangyi Chen (MBZUAI&CMU), Kun Zhang (Carnegie Mellon University & MBZUAI)
Virtual Community: An Open World for Humans, Robots, and Society
Authors: Qinhong Zhou (University of Massachusetts at Amherst), Hongxin Zhang (UMass Amherst), Xiangye Lin (University of Massachusetts at Amherst), Zheyuan Zhang (Johns Hopkins University), Yutian Chen (Carnegie Mellon University), Wenjun Liu (University of Massachusetts at Amherst), Zunzhe Zhang (Tsinghua University), Sunli Chen (University of Massachusetts at Amherst), Lixing Fang (University of Massachusetts at Amherst), Qiushi Lyu (University of Illinois, Urbana-Champaign), Xinyu Sun (South China University of Technology), Jincheng Yang (University of Maryland, College Park), Zeyuan Wang (Tsinghua University), Bao Dang (University of Massachusetts at Amherst), Zhehuan Chen (Peking University), Daksha Ladia (University of Massachusetts Amherst), Quang Dang (University of Massachusetts at Amherst), Jiageng Liu (University of Massachusetts at Amherst), Chuang Gan (MIT-IBM Watson AI Lab)
Faster Vision Transformers with Adaptive Patches
Authors: Rohan Choudhury (None), JungEun Kim (General Robotics), Jinhyung Park (Carnegie Mellon University), Eunho Yang (Korea Advanced Institute of Science & Technology), Laszlo A. Jeni (Carnegie Mellon University), Kris Kitani (Carnegie Mellon University)
Transductive Visual Programming: Evolving Tool Libraries from Experience for Spatial Reasoning
VINCIE: Unlocking In-context Image Editing from Video
Authors: Leigang Qu (National University of Singapore), Feng Cheng (ByteDance Seed), Ziyan Yang (ByteDance Inc.), Qi Zhao (ByteDance Inc.), Shanchuan Lin (ByteDance), Yichun Shi (None), Yicong Li (National University of Singapore), Wenjie Wang (University of Science and Technology of China), Tat-Seng Chua (National University of Singapore), Lu Jiang (Carnegie Mellon University)
RF-DETR: Neural Architecture Search for Real-Time Detection Transformers
Authors: Isaac Robinson (Roboflow), Peter Robicheaux (Roboflow), Matvei Popov (Roboflow, Inc), Deva Ramanan (School of Computer Science, Carnegie Mellon University), Neehar Peri (Carnegie Mellon University)
lmgame-Bench: How Good are LLMs at Playing Games?
Authors: Lanxiang Hu (University of California, San Diego), Mingjia Huo (University of California, San Diego), Yuxuan Zhang (University of California, San Diego), Haoyang Yu (University of California San Diego), Eric P Xing (CMU), Ion Stoica (), Tajana Rosing (University of California, San Diego), Haojian Jin (None), Hao Zhang (University of California, San Diego)
ASCIIEval: Benchmarking Models’ Visual Perception in Text Strings via ASCII Art
Authors: Qi Jia (Shanghai Artificial Intelligence Laboratory), Xiang Yue (Carnegie Mellon University), Shanshan Huang (Guangzhou University), Ziheng Qin (Facebook), Yizhu Liu (Meituan), Bill Yuchen Lin (xAI), Yang You (National University of Singapore), Guangtao Zhai (Shanghai Jiao Tong University)
SpinBench: Perspective and Rotation as a Lens on Spatial Reasoning in VLMs
Authors: Yuyou Zhang (CMU, Carnegie Mellon University), Radu Corcodel (Mitsubishi Electric Research Labs), Chiori Hori (Mitsubishi Electric Research Labs), Anoop Cherian (Australian National University), DING ZHAO (Carnegie Mellon University)
SpineBench: A Clinically Salient, Level-Aware Benchmark Powered by the SpineMed-450k Corpus
Authors: Ming Zhao (Jilin University), Wenhui Dong (Nanjing University), Yang Zhang (Chinese People’s Liberation Army General Hospital), wangyou (University of the Chinese Academy of Sciences), Zhonghao Zhang (Ningxia University), Zian Zhou (Zhejiang University), YUNZHI GUAN (Fudan University), Liukun Xu (Nanjing Medical University), Wei Peng (Stanford University), Zhaoyang Gong (Fudan University), Zhicheng Zhang (Chinese People’s Liberation Army General Hospital), Dachuan Li (Fudan University), Xiaosheng Ma (Fudan University), Yuli Ma (Peking University), Jianing Ni (Carnegie Mellon University), Changjiang Jiang (Ant Group), Lixia Tian (Beijing Jiaotong University), Chen Qixin (Zhejiang University), Xia Kaishun (Zhejiang University of Technology), Pingping Liu (Jilin University), Tongshun Zhang (Jilin University), Zhiqiang Liu (Huazhong University of Science and Technology), Zhongan Bi (Zhejiang Lab), Chenyang Si (Nanyang Technological University), Tiansheng Sun (Chinese People’s Liberation Army General Hospital), Caifeng Shan (Nanjing University)
SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training
Authors: Jianyi Wang (Nanyang Technological University), Shanchuan Lin (ByteDance), Zhijie Lin (Zhejiang University), Yuxi Ren (ByteDance Inc.), Meng Wei (ByteDance Inc.), Zongsheng Yue (Xi’an Jiaotong University), Shangchen Zhou (Nanyang Technological University), Hao Chen (ByteDance Inc.), Yang Zhao (Bytedance Inc.), Ceyuan Yang (ByteDance), Xuefeng Xiao (ByteDance), Chen Change Loy (Nanyang Technological University), Lu Jiang (Carnegie Mellon University)
Mixture of Contexts for Long Video Generation
Authors: Shengqu Cai (Stanford University), Ceyuan Yang (ByteDance), Lvmin Zhang (Stanford University), Yuwei Guo (The Chinese University of Hong Kong), Junfei Xiao (Johns Hopkins University), Ziyan Yang (ByteDance Inc.), Yinghao Xu (Stanford University), Zhenheng Yang (Tiktok), Alan Yuille (Johns Hopkins University), Leonidas Guibas (Stanford University), Maneesh Agrawala (Stanford University), Lu Jiang (Carnegie Mellon University), Gordon Wetzstein (Stanford University)
pySpatial: Generating 3D Visual Programs for Zero-Shot Spatial Reasoning
Authors: Zhanpeng Luo (University of Pittsburgh), Ce Zhang (Carnegie Mellon University), Silong Yong (Department of Automation, Tsinghua University), Cunxi Dai (Carnegie Mellon University), Qianwei Wang (University of Michigan – Ann Arbor), Haoxi Ran (Carnegie Mellon University), Guanya Shi (CMU, Carnegie Mellon University), Katia Sycara (Carnegie Mellon University), Yaqi Xie (Carnegie Mellon University)
Sharp Monocular View Synthesis in Less Than a Second
Authors: Lars Mescheder (Apple), Wei Dong (Apple), Shiwei Li (Apple), Xuyang BAI (Apple), Marcel Santos (Apple), Peiyun Hu (Carnegie Mellon University), Bruno Lecouat (Telecom ParisTech), Mingmin Zhen (Apple), Amaël Delaunoy (Apple), Tian Fang (Hong Kong University of Science and Technology), Yanghai Tsin (Apple), Stephan Richter (Apple), Vladlen Koltun (Apple)
S2GO: Streaming Sparse Gaussian Occupancy
Authors: Jinhyung Park (Carnegie Mellon University), Chensheng Peng (University of California, Berkeley), yihan hu (Applied Intuition), Wenzhao Zheng (UC Berkeley), Kris Kitani (Carnegie Mellon University), Wei Zhan (University of California Berkeley)
Captain Cinema: Towards Short Movie Generation
Authors: Junfei Xiao (Johns Hopkins University), Ceyuan Yang (ByteDance), Lvmin Zhang (Stanford University), Shengqu Cai (Stanford University), Yang Zhao (Bytedance Inc.), Yuwei Guo (The Chinese University of Hong Kong), Gordon Wetzstein (Stanford University), Maneesh Agrawala (Stanford University), Alan Yuille (Johns Hopkins University), Lu Jiang (Carnegie Mellon University)
Deep Learning
Chimera: State Space Models Beyond Sequences
Authors: Aakash Sunil Lahoti (CMU, Carnegie Mellon University), Tanya Marwah (CMU), (None), Albert Gu (Cartesia AI CMU)
VisCoder2: Building Multi-Language Visualization Coding Agents
Authors: Yuansheng Ni (University of Waterloo), Songcheng Cai (University of Waterloo), Xiangchao Chen (University of Waterloo), Jiarong Liang (University of Waterloo), Zhiheng LYU (University of Hong Kong), Jiaqi Deng (Korea Advanced Institute of Science & Technology), Kai Zou (NetMind.AI), PING NIE (Peking University), Fei Yuan (Shanghai Artificial Intelligence Laboratory), Xiang Yue (Carnegie Mellon University), Wenhu Chen (University of Waterloo)
Time-To-Inconsistency: A Survival Analysis of Large Language Model Robustness to Adversarial Attacks
Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data
Authors: Syeda Nahida Akter (Carnegie Mellon University), Shrimai Prabhumoye (NVIDIA), Eric Nyberg (Carnegie Mellon University), Mostofa Patwary (NVIDIA), Mohammad Shoeybi (NVIDIA), Yejin Choi (Stanford University / NVIDIA), Bryan Catanzaro (NVIDIA)
InT: Self-Proposed Interventions Enable Credit Assignment in LLM Reasoning
Authors: Matthew Yang (Carnegie Mellon University), Hao Bai (University of Illinois at Urbana-Champaign), Ian Wu (Carnegie Mellon University), Gene Yang (Carnegie Mellon University), Amrith Setlur (Carnegie Mellon University), Aviral Kumar (University of California Berkeley)
Dynamic Chunking for End-to-End Hierarchical Sequence Modeling
Authors: Sukjun Hwang (Carnegie Mellon University), Brandon Wang (onepot), Albert Gu (Cartesia AI CMU)
e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs
Authors: Amrith Setlur (Carnegie Mellon University), Matthew Yang (Carnegie Mellon University), Charlie Snell (University of California, Berkeley), Jeremiah Greer (Oumi AI PBC), Ian Wu (Carnegie Mellon University), Virginia Smith (Carnegie Mellon University), Max Simchowitz (Massachusetts Institute of Technology), Aviral Kumar (University of California Berkeley)
FSA: An Alternative Efficient Implementation of Native Sparse Attention Kernel
Authors: Ran Yan (The Hong Kong University of Science and Technology), YOUHE JIANG (University of Cambridge), Zhuoming Chen (Carnegie Mellon University), Haohui Mai (Hong Kong University of Science and Technology), Beidi Chen (CMU, Carnegie Mellon University), Binhang Yuan (HKUST)
In Good GRACES: Principled Teacher Selection for Knowledge Distillation
Authors: Guo (), Songlin Yang (ShanghaiTech University), Tarushii Goel (Massachusetts Institute of Technology), Eric P Xing (CMU), Tri Dao (Princeton University), Yoon Kim (MIT)
Generalized Parallel Scaling with Interdependent Generations
Authors: Harry Dong (Carnegie Mellon University), David Brandfonbrener (NYU), Eryk Helenowski (Facebook), Yun He (Meta), Mrinal Kumar (Facebook), Han Fang (Meta GenAI), Yuejie Chi (Carnegie Mellon University), Karthik Abinav Sankararaman (Facebook)
KnowledgeSmith: Uncovering Knowledge Updating in LLMs with Model Editing and Unlearning
Authors: Yinyi Luo (Carnegie Mellon University), Zhexian Zhou (CMU, Carnegie Mellon University), Hao Chen (Google DeepMind), Kai Qiu (CMU, Carnegie Mellon University), Marios Savvides (Carnegie Mellon University), Yixuan Li (University of Wisconsin, Madison), Jindong Wang (William & Mary)
General Machine Learning
VideoJudge: Bootstrapping Enables Scalable Supervision of MLLM-as-a-Judge for Video Understanding
Authors: Abdul Waheed (Maharaja Agrasen Institute of Technology, New Delhi), Zhen Wu (Carnegie Mellon University), Dareen Alharthi (LTI CMU), Seungone Kim (Carnegie Mellon University), Bhiksha Raj (Carnegie Mellon University)
Score-based Greedy Search for Structure Identification of Partially Observed Causal Models
Authors: Xinshuai Dong (CMU), Ignavier Ng (Carnegie Mellon University), Haoyue Dai (Carnegie Mellon University), Jiaqi Sun (CMU, Carnegie Mellon University), Xiangchen Song (Carnegie Mellon University), Peter Spirtes (Carnegie Mellon University), Kun Zhang (Carnegie Mellon University & MBZUAI)
On Code-Induced Reasoning in LLMs
Authors: Abdul Waheed (Maharaja Agrasen Institute of Technology, New Delhi), Zhen Wu (Carnegie Mellon University), Carolyn Rose (School of Computer Science, Carnegie Mellon University), Daphne Ippolito (School of Engineering and Applied Science, University of Pennsylvania)
Conditional Independent Component Analysis for Estimating Causal Structure with Latent Variables
Authors: Yewei Xia (Fudan University), Zhengming Chen (Guangdong University of Technology), Haoyue Dai (Carnegie Mellon University), Fuhong Wang (Guangdong University of Technology), Yixin Ren (Fudan University), Yiqing Li (MBZUAI), Kun Zhang (Carnegie Mellon University & MBZUAI), Shuigeng Zhou (Fudan University)
Ambig-SWE: Interactive Agents to Overcome Underspecificity in Software Engineering
Navigating the Accuracy-Size Trade-Off with Flexible Model Merging
Authors: Akash Dhasade (EPFL), Divyansh Jhunjhunwala (Amazon), Milos Vujasinovic (EPFL), Gauri Joshi (Carnegie Mellon University), Anne-Marie Kermarrec (School of Computer and Communication Sciences, EPFL – EPF Lausanne)
Dual Perspectives on Non-Contrastive Self-Supervised Learning
Authors: Jean Ponce (NYU/ENS-PSL), Basile Terver (AMI Labs), Martial Hebert (Carnegie Mellon University), Michael Arbel (INRIA)
Multiple-Prediction-Powered Inference
Authors: Charlie Cowen-Breen (Massachusetts Institute of Technology), Alekh Agarwal (Google), Stephen Bates (Massachusetts Institute of Technology), William W. Cohen (Carnegie Mellon University), Jacob Eisenstein (Google), Amir Globerson (Google), Adam Fisch (Google DeepMind)
Command-V: Training-Free Representation Finetuning Transfer
Authors: Barry Wang (Carnegie Mellon University), Avi Schwarzschild (Carnegie Mellon University), Alexander Robey (CMU, Carnegie Mellon University), Ali Payani (Cisco Systems), Charles Fleming (Cisco), Mingjie Sun (School of Computer Science, Carnegie Mellon University), Daphne Ippolito (School of Engineering and Applied Science, University of Pennsylvania)
SurvHTE-Bench: A Benchmark for Heterogeneous Treatment Effect Estimation in Survival Analysis
Authors: Shahriar Noroozizadeh (Carnegie Mellon University), Xiaobin Shen (Carnegie Mellon University), Jeremy Weiss (National Library of Medicine), George H. Chen (Carnegie Mellon University)
Prompt-MII: Meta-Learning Instruction Induction for LLMs
Authors: Emily Xiao (Carnegie Mellon University), Yixiao Zeng (XPeng Motors / Carnegie Mellon University), Ada Chen (CMU, Carnegie Mellon University), Chin-Jou Li (Language Technologies Institute, Carnegie Mellon University), Amanda Bertsch (Carnegie Mellon University), Graham Neubig (Carnegie Mellon University)
Optimization
Decentralized Nonconvex Optimization under Heavy-Tailed Noise: Normalization and Optimal Convergence
Authors: Shuhua Yu (Carnegie Mellon University), Dusan Jakovetic (University of Novi Sad Faculty of Sciences), Soummya Kar (CMU, Carnegie Mellon University)
FrontierCO: Real-World and Large-Scale Evaluation of Machine Learning Solvers for Combinatorial Optimization
Authors: Shengyu Feng (Carnegie Mellon University), Weiwei Sun (Carnegie Mellon University), Shanda Li (Carnegie Mellon University), Ameet Talwalkar (University of California-Los Angeles), Yiming Yang (Carnegie Mellon University)
Gen-DFL: Decision-Focused Generative Learning for Robust Decision Making
Authors: Prince Wang (Carnegie Mellon University), Shuyi Chen (Carnegie Mellon University), Jinhao Liang (University of Virginia, Charlottesville), Ferdinando Fioretto (University of Virginia, Charlottesville), Shixiang Zhu (Carnegie Mellon University)
Reinforcement Learning
Jackpot: Align Actor-Policy Distribution for scalable and stable RL for LLM
Improving Human-AI Coordination through Online Adversarial Training and Generative Models
Authors: Paresh Chaudhary (Department of Computer Science, University of Washington), Yancheng Liang (Department of Computer Science, University of Washington), Daphne Chen (Carnegie Mellon University), Simon Du (University of Washington), Natasha Jaques (University of Washington, Google DeepMind)
From Curiosity to Caution: Mitigating Reward Hacking for Best-of-$N$ with Pessimism
Authors: Zhuohao Yu (Peking University), Steven Wu (Carnegie Mellon University), Adam Block (Columbia University)
Online Decision Making with Generative Action Sets