Evaluation of agentic AI systems: generation quality, agentic behaviors, and safety.
Senior Machine Learning Researcher, Apple · Seattle, WA
My research addresses a central problem in modern AI: how to rigorously evaluate agentic systems once they move beyond single-turn answers into long-horizon, multi-step tool use with error recovery. In this regime the conventional axes of quality (correctness, completeness, relevance) are necessary but no longer sufficient. I develop compound measures of agent behavior, including user utility, user-perceived defects, and the evaluation of multi-tool planning, and pursue a complementary line on cost- and latency-aware routing that reserves frontier-model capacity for the hardest tasks while holding generation quality invariant. Together these threads aim to make agentic systems measurably more trustworthy, efficient, and accountable at production scale.
I pursue this work as a Senior Machine Learning Researcher in Apple's Human-Centered AI organization, where I scale agentic evaluation for Apple Media Products. Earlier, as an Applied Scientist at Amazon, I built LLM-driven conversational systems, large-scale insight generation, and multi-modal fraud detection across the Returns & Recommerce and Seller Partner Services organizations. I hold an M.S. in Computer Science from the University of Illinois at Chicago, where I was advised by Xinhua Zhang and worked at the intersection of few-shot machine learning and computer vision.
Citation counts via Google Scholar (266 total citations; h-index 4, as of June 2026).
A survey of supervised, weakly-supervised, and unsupervised approaches to predicting dense depth from a single RGB image, comparing methods and outlining open directions.
A survey of action localization in video (determining what action is performed, and when and where), covering algorithms, datasets, and the most promising directions.
An exploration of pre-processing, features, and memory-network methods for classifying sentiment toward a specific aspect within a sentence.
A comparison of neural and universal style-transfer approaches, focused on real-time transfer and generalization to unseen styles.
Peer reviewer for workshops at flagship machine-learning venues (ICML, KDD).
Scaling agentic evaluations for Apple Media Products.
LLMs for conversational returns, large-scale insight generation, and multi-modal fraud detection; evaluation of generative-AI listings and Seller Assistant.
Image classification and segmentation for automotive damage assessment.
Machine learning & computer vision; advised by Xinhua Zhang. Thesis: Invariant Kernels for Few-shot Learning (Outstanding Thesis Award).