Aditya Chinchure

Publications

Spotlight: Identifying and Localizing Video Generation Errors Using VLMs

Aditya Chinchure, Sahithya Ravi, Pushkar Shukla, Vered Shwartz, Leonid Sigal

Jun 01, 2026

Spotlight is a benchmark for evaluating whether VLMs can detect, localize, and explain fine-grained errors in high-fidelity text-to-video generations, with 600 videos and over 1,600 annotated error localizations.

ECCV 2026

SPIKE-RL: Video-LLMs meet Bayesian Surprise

Sahithya Ravi, Aditya Chinchure, Raymond Ng, Leonid Sigal, Vered Shwartz

Apr 01, 2026

SPIKE and SPIKE-RL use Bayesian Surprise to identify memorable moments in videos and guide surprise-weighted frame sampling, improving Video-LLM performance across downstream benchmarks.

ICLR 2026

Position: World Models must live in Parallel Worlds

Sahithya Ravi*, Aditya Chinchure*, Pushkar Shukla, Vered Shwartz, Leonid Sigal (*equal contribution)

Dec 01, 2025

We argue that world models need counterfactual simulation: the ability to reason across alternative realities and what-if scenarios, enabling safer, more capable, and more creative agents in novel environments.

NeurIPS 2025 LAW Workshop

Mitigate One, Skew Another? Tackling Intersectional Biases in Text-to-Image Models

Pushkar Shukla*, Aditya Chinchure*, Emily Diana, Alexander Tolbert, Kartik Hosanagar, Vineeth Balasubramanian, Leonid Sigal, Matthew Turk (*equal contribution)

Sep 01, 2025

The biases exhibited by text-to-image (TTI) models are often treated as independent, though in reality they may be deeply interrelated: fixing bias along one dimension can inadvertently affect another. We measure and quantify such interactions, and propose InterMit, an intersectional bias mitigation algorithm that leverages these insights to achieve superior results with fewer steps.

EMNLP 2025

Black Swan: Abductive and Defeasible Video Reasoning in Unpredictable Events

Aditya Chinchure*, Sahithya Ravi*, Raymond Ng, Vered Shwartz, Boyang Li, Leonid Sigal (*equal contribution)

Jun 01, 2025

Black Swan is a benchmark for evaluating vision-language models' (VLMs) commonsense reasoning capabilities in videos, particularly in abductive and defeasible reasoning tasks. It focuses on atypical events, requiring models to reason about unexpected occurrences and adapt their hypotheses based on new information.

CVPR 2025

TIBET: Identifying and Evaluating Biases in Text-to-Image Generative Models

Aditya Chinchure*, Pushkar Shukla*, Gaurav Bhatt, Kiri Salij, Kartik Hosanagar, Leonid Sigal, Matthew Turk (*equal contribution)

Jul 09, 2024

We propose a general approach to study and quantify a broad spectrum of biases, for any text-to-image (TTI) model and any prompt, using counterfactual reasoning. Unlike other works that evaluate generated images on a predefined set of bias axes, our approach automatically identifies potential biases that might be relevant to the given prompt, and measures those biases.

ECCV 2024

From Local Concepts to Universals: Evaluating the Multicultural Understanding of Vision-Language Models

Mehar Bhatia, Sahithya Ravi*, Aditya Chinchure*, Eunjeong Hwang, Vered Shwartz (*equal contribution)

Jun 28, 2024

We introduce the GlobalRG benchmark, comprising two challenging tasks, retrieval across universals and cultural visual grounding, to test VL models' cultural inclusivity.

EMNLP 2024

VLC-BERT: Visual Question Answering with Contextualized Commonsense Knowledge

Sahithya Ravi*, Aditya Chinchure*, Leonid Sigal, Renjie Liao, Vered Shwartz (*equal contribution)

Oct 24, 2022

We present a new Vision-Language-Commonsense transformer model, VLC-BERT, that incorporates contextualized knowledge using Commonsense Transformer (COMET) to solve Visual Question Answering (VQA) tasks that require commonsense reasoning.

WACV 2023

Academic Projects

DE-TensoRF: Data-efficient and fast NeRFs

Apr 28, 2023

Developed DE-TensoRF, a model that can render 3D objects from as few as 3 images in under 15 minutes on a single GPU. The project earned the highest grade in our class and led to a collaboration with Dr. Helge Rhodin’s research group.

VisualCOMET+: Visual Commonsense Generation & its incorporation into a Multimodal Topic Modeling algorithm

Dec 09, 2022

Developed an extension to VisualCOMET to generate general-purpose commonsense knowledge from images. Using the generated knowledge, showed improvements in the coherence and diversity scores of a novel topic modelling algorithm.

VL-BERT-Graph: Graph-enhanced Transformers for Referring Expressions Comprehension

Apr 28, 2022

Incorporated Graph Neural Networks into a visual-linguistic Transformer.

Investigating extensions to VLC-BERT and comparing it with GPT-3

Apr 20, 2022

This project extends VLC-BERT with pointer generator networks and object detection models. Furthermore, we compare the performance of VLC-BERT with GPT-3 on the OK-VQA dataset.

Learning faster Genetic Algorithms with dynamic mutation power

Dec 03, 2021

This project modifies the genetic algorithm (GA) to use dynamic mutation power, solving the Lunar Lander environment on OpenAI Gym in 30 generations.

A Summary of Recent Text Summarization Techniques

Dec 03, 2020

In this project paper, we surveyed text summarization models by evaluating existing extractive and abstractive models. We studied the metrics and datasets used to evaluate the latest models and evaluated upcoming abstractive techniques. Finally, we highlighted future pathways for text summarization and suggested areas for improvement.

Other Projects

Universal Machine Learning API

Apr 28, 2022

A powerful Python API template, built on Flask, for plug-and-play use with machine learning models.