
VLC-BERT: Visual Question Answering with Contextualized Commonsense Knowledge
Oct 23, 2022We present a new Vision-Language-Commonsense transformer model, VLC-BERT, that incorporates contextualized knowledge using Commonsense Transformer (COMET) to solve Visual Question Answering (VQA) tasks that require commonsense reasoning.
WACV 2023