Brandon B. May

Staff Applied Scientist — AI Robotics

Embodied AI · World Foundation Models · VLAs

About

Most robotic systems are brittle, failing the moment reality deviates from their training data. General robotics demands models that can first anticipate the physical world, then act on it. As a Staff Applied Scientist at General Motors, I build foundation models that teach robots to perceive, reason about, and manipulate objects well enough to do real work.

My career has traced an arc from sensing to understanding to acting: math and physics at Skidmore, imaging science at RIT, computational imaging and 3D vision at MITRE, perception and generative AI at STR, and foundation models for robotic manipulation at the Robotics & AI Institute (RAI). Want to talk shop? Reach out.

Publications

SIMIFY: Generative Real-to-Sim Enables Multi-Object Spatial and Physical Reasoning

Brandon B. May et al.

arXiv, March 2026

3D Reconstruction Manipulation Robot Learning Simulation

TL;DR: A training-free, test-time framework that reconstructs simulation-ready assets from a single RGB-D image using 3D generative and vision-language models, then launches thousands of parallel physics rollouts with evolutionary search to optimize object arrangements for language-specified tasks. Achieves 67% success on real-robot hardware, surpassing baselines using off-the-shelf foundation models.

Real-is-Sim: Bridging the Sim-to-Real Gap with a Dynamic Digital Twin for Real-World Robot Policy Evaluation

Jad Abou-Chakra, Lingfeng Sun, Krishan Rana, Brandon B. May, Karl Schmeckpeper, Niko Sünderhauf, Maria Vittoria Minniti, Laura Herlant

ICRA 2026

Manipulation Robot Learning Simulation

arXiv Website

TL;DR: We invert the traditional sim-to-real paradigm by building a dynamic digital twin powered by an Embodied Gaussian simulator that synchronizes with the real world at 60Hz. Instead of training on real hardware or adapting simulations post-hoc, policies always execute on a virtual robot while the physical robot mirrors its joint states. Continuous real-world measurements correct the simulation on the fly, and we validate the approach on long-horizon manipulation tasks like PushT, showing tight alignment between virtual evaluations and physical results.

Learning Equivariant Neural-Augmented Object Dynamics from Few Interactions

Sergio Orozco, Brandon B. May, Tushar Kusnur, George Konidaris, Laura Herlant

RINO @ CoRL 2025 Best Extended Abstract

Manipulation Robot Learning Simulation World Models

OpenReview

TL;DR: PIEGraph learns physically grounded dynamics for rigid and deformable objects from just a few minutes of human interaction data. It combines a physics-informed spring-mass prior with an action-conditioned equivariant graph neural network, maintaining physical plausibility (no interpenetration, shape preservation) where standard particle-based models break down. Demonstrated on ropes, cloth, stuffed animals, and rigid bodies for robotic planning.

Theia: Distilling Diverse Vision Foundation Models for Robot Learning

Jinghuan Shang, Karl Schmeckpeper, Brandon B. May, Maria Vittoria Minniti, Tarik Kelestemur, David Watkins, Laura Herlant

CoRL 2024

Computer Vision Foundation Models Robot Learning

arXiv Website Code

TL;DR: Theia distills multiple off-the-shelf vision foundation models (DINOv2, CLIP, SAM, Depth-Anything, and more) into a single compact model optimized for robot learning. The result outperforms any individual teacher model while being smaller and requiring less training data. We also find that higher entropy in feature norm distributions correlates with better downstream robot performance, offering a practical proxy for representation quality. Pre-trained models available on Hugging Face.

Salient Conditional Diffusion for Defending Against Backdoor Attacks

Brandon B. May, N. Joseph Tatro, Piyush Kumar et al.

Backdoor Attacks & Defenses @ ICLR 2023 Spotlight

Adversarial ML Diffusion Models

arXiv

TL;DR: Sancdifi defends against backdoor attacks by using a denoising diffusion model to degrade and recover images, with saliency-based conditioning that concentrates the diffusion on the most important regions. This strips backdoor triggers while preserving legitimate features. Crucially, it works as a black-box defense with no access to the target model's internals.

Comprehensive Dataset of Synthetic and Manipulated Overhead Imagery for Development and Evaluation of Forensic Tools

Brandon B. May, Kirill Trapeznikov, Shengbang Fang, Matthew C. Stamm

IH&MMSec 2023 Best Paper

Computer Vision Diffusion Models

arXiv Website

TL;DR: We release a first-of-its-kind dataset of real, fully synthetic, and partially manipulated overhead imagery for forensic research. The synthetic images are generated from a custom diffusion model trained across multiple zoom levels and data sources, enabling research into detecting and localizing manipulated satellite imagery.

Explainable Face Recognition

Jonathan R. Williford, Brandon B. May, Jeffrey Byrne

ECCV 2020

Computer Vision Explainability

arXiv Website Code

TL;DR: We introduce the first comprehensive benchmark for explainable face recognition, including “the inpainting game,” a standardized evaluation protocol of 3,648 triplets where facial features are synthetically modified to create ground truth. We propose two new attention methods, subtree EBP and DISE (Density-based Input Sampling for Explanation), which significantly outperform prior techniques at revealing which facial regions drive a network's matching decisions.

Interested in collaborating or just want to say hello?

LinkedIn Email