1811.00982
Cited By
v1
v2 (latest)
The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale
2 November 2018
Alina Kuznetsova
H. Rom
N. Alldrin
J. Uijlings
Ivan Krasin
Jordi Pont-Tuset
Shahab Kamali
S. Popov
Matteo Malloci
Alexander Kolesnikov
Tom Duerig
V. Ferrari
ObjD
VLM
Papers citing
"The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale"
50 / 356 papers shown
Title
Towards Unbiased Multi-label Zero-Shot Learning with Pyramid and Semantic Attention
Ziming Liu
Song Guo
Jingcai Guo
Yuanyuan Xu
Fushuo Huo
122
23
0
07 Mar 2022
Unpaired Image Captioning by Image-level Weakly-Supervised Visual Concept Recognition
Peipei Zhu
Tianlin Li
Yong Luo
Zhenglong Sun
Wei-Shi Zheng
Yaowei Wang
Chen Chen
102
12
0
07 Mar 2022
Attribute Descent: Simulating Object-Centric Datasets on the Content Level and Beyond
Yue Yao
Liang Zheng
Xiaodong Yang
Milind Naphade
Tom Gedeon
87
17
0
28 Feb 2022
Speciesist bias in AI -- How AI applications perpetuate discrimination and unfair outcomes against animals
Thilo Hagendorff
L. Bossert
Yip Fai Tse
P. Singer
FaML
77
40
0
22 Feb 2022
Fairness Indicators for Systematic Assessments of Visual Feature Extractors
Priya Goyal
Adriana Romero Soriano
C. Hazirbas
Levent Sagun
Nicolas Usunier
EGVM
75
31
0
15 Feb 2022
Using Social Media Images for Building Function Classification
E. J. Hoffmann
Karam Abdulahhad
Xiao Xiang Zhu
47
30
0
15 Feb 2022
Object-Guided Day-Night Visual Localization in Urban Scenes
Assia Benbihi
Cédric Pradalier
Ondřej Chum
39
4
0
09 Feb 2022
Recent Trends in 2D Object Detection and Applications in Video Event Recognition
Prithwish Jana
Partha Pratim Mohanta
38
1
0
07 Feb 2022
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Peng Wang
An Yang
Rui Men
Junyang Lin
Shuai Bai
Zhikang Li
Jianxin Ma
Chang Zhou
Jingren Zhou
Hongxia Yang
MLLM
ObjD
260
884
0
07 Feb 2022
Keyword localisation in untranscribed speech using visually grounded speech models
Kayode Olaleye
Dan Oneaţă
Herman Kamper
63
7
0
02 Feb 2022
Deep Learning Approaches on Image Captioning: A Review
Taraneh Ghandi
H. Pourreza
H. Mahyar
VLM
136
101
0
31 Jan 2022
MVPTR: Multi-Level Semantic Alignment for Vision-Language Pre-Training via Multi-Stage Learning
Zejun Li
Zhihao Fan
Huaixiao Tou
Jingjing Chen
Zhongyu Wei
Xuanjing Huang
86
18
0
29 Jan 2022
RelTR: Relation Transformer for Scene Graph Generation
Yuren Cong
M. Yang
Bodo Rosenhahn
ViT
181
145
0
27 Jan 2022
CrossRectify: Leveraging Disagreement for Semi-supervised Object Detection
Cheng Ma
Xingjia Pan
QiXiang Ye
Fan Tang
Weiming Dong
Changsheng Xu
103
15
0
26 Jan 2022
Visual Identification of Problematic Bias in Large Label Spaces
Alex Bauerle
Aybuke Turker
Ken Burke
Osman Aka
Timo Ropinski
Christina Greer
Mani Varadarajan
63
1
0
17 Jan 2022
CLIP-Event: Connecting Text and Images with Event Structures
Manling Li
Ruochen Xu
Shuohang Wang
Luowei Zhou
Xudong Lin
Chenguang Zhu
Michael Zeng
Heng Ji
Shih-Fu Chang
VLM
CLIP
77
127
0
13 Jan 2022
SparseDet: Improving Sparsely Annotated Object Detection with Pseudo-positive Mining
Saksham Suri
Sai Saketh Rambhatla
Rama Chellappa
Abhinav Shrivastava
ObjD
92
12
0
12 Jan 2022
Detecting Twenty-thousand Classes using Image-level Supervision
Xingyi Zhou
Rohit Girdhar
Armand Joulin
Philipp Krähenbühl
Ishan Misra
CLIP
VLM
131
621
0
07 Jan 2022
Equalized Focal Loss for Dense Long-Tailed Object Detection
Yue Liu
Yongqiang Yao
Jingru Tan
Gang Zhang
F. Yu
Jianwei Lu
Ye Luo
100
99
0
07 Jan 2022
LaTr: Layout-Aware Transformer for Scene-Text VQA
Ali Furkan Biten
Ron Litman
Yusheng Xie
Srikar Appalaraju
R. Manmatha
ViT
127
102
0
23 Dec 2021
HODOR: High-level Object Descriptors for Object Re-segmentation in Video Learned from Static Images
A. Athar
Jonathon Luiten
Alexander Hermans
Deva Ramanan
Bastian Leibe
VOS
148
27
0
16 Dec 2021
Simple and Robust Loss Design for Multi-Label Learning with Missing Labels
Youcai Zhang
Y. Cheng
Xinyu Huang
Fei Wen
Rui Feng
Yaqian Li
Yandong Guo
VLM
58
34
0
13 Dec 2021
Holistic Interpretation of Public Scenes Using Computer Vision and Temporal Graphs to Identify Social Distancing Violations
Gihan Chanaka Jayatilaka
Jameel Hassan
Suren Sritharan
J. B. Senanayaka
H. Weligampola
Roshan Godaliyadda
Parakrama Ekanayake
Vijitha Herath
Janaka Ekanayake
S. Dharmaratne
102
6
0
13 Dec 2021
Injecting Semantic Concepts into End-to-End Image Captioning
Zhiyuan Fang
Jianfeng Wang
Xiaowei Hu
Lin Liang
Zhe Gan
Lijuan Wang
Yezhou Yang
Zicheng Liu
ViT
VLM
86
91
0
09 Dec 2021
Visual Persuasion in COVID-19 Social Media Content: A Multi-Modal Characterization
Mesut Erhan Unal
Adriana Kovashka
Wen-Ting Chung
Yu-Ru Lin
65
4
0
05 Dec 2021
Optimization of phase-only holograms calculated with scaled diffraction calculation through deep neural networks
Yoshiyuki Ishii
Tomoyoshi Shimobaba
David Blinder
Tobias Birnbaum
P. Schelkens
Takashi Kakue
T. Ito
33
11
0
02 Dec 2021
Object-Aware Cropping for Self-Supervised Learning
Shlok Kumar Mishra
Anshul B. Shah
Ankan Bansal
Abhyuday N. Jagannatha
Janit Anjaria
Abhishek Sharma
David Jacobs
Dilip Krishnan
SSL
108
24
0
01 Dec 2021
Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets
Marcella Cornia
Lorenzo Baraldi
G. Fiameni
Rita Cucchiara
109
12
0
24 Nov 2021
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion
Chenfei Wu
Jian Liang
Lei Ji
Fan Yang
Yuejian Fang
Daxin Jiang
Nan Duan
ViT
VGen
88
296
0
24 Nov 2021
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling
Zhengyuan Yang
Zhe Gan
Jianfeng Wang
Xiaowei Hu
Faisal Ahmed
Zicheng Liu
Yumao Lu
Lijuan Wang
146
117
0
23 Nov 2021
Class-agnostic Object Detection with Multi-modal Transformer
Muhammad Maaz
H. Rasheed
Salman Khan
Fahad Shahbaz Khan
Rao Muhammad Anwer
Ming-Hsuan Yang
173
97
0
22 Nov 2021
Achieving Human Parity on Visual Question Answering
Ming Yan
Haiyang Xu
Chenliang Li
Junfeng Tian
Bin Bi
...
Ji Zhang
Songfang Huang
Fei Huang
Luo Si
Rong Jin
63
13
0
17 Nov 2021
INTERN: A New Learning Paradigm Towards General Vision
Jing Shao
Siyu Chen
Yangguang Li
Kun Wang
Zhen-fei Yin
...
F. Yu
Junjie Yan
Dahua Lin
Xiaogang Wang
Yu Qiao
110
34
0
16 Nov 2021
Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts
Yan Zeng
Xinsong Zhang
Hang Li
VLM
CLIP
98
308
0
16 Nov 2021
A Survey of Visual Transformers
Yang Liu
Yao Zhang
Yixin Wang
Feng Hou
Jin Yuan
Jiang Tian
Yang Zhang
Zhongchao Shi
Jianping Fan
Zhiqiang He
3DGS
ViT
203
356
0
11 Nov 2021
Resource-Efficient Federated Learning
A. Abdelmoniem
Atal Narayan Sahu
Marco Canini
Suhaib A. Fahmy
FedML
91
57
0
01 Nov 2021
Multi-label Classification with Partial Annotations using Class-aware Selective Loss
Emanuel Ben-Baruch
T. Ridnik
Itamar Friedman
Avi Ben-Cohen
Nadav Zamir
Asaf Noy
Lihi Zelnik-Manor
73
40
0
21 Oct 2021
Noisy Annotation Refinement for Object Detection
Jiafeng Mao
Qing Yu
Yoko Yamakata
Kiyoharu Aizawa
NoLa
122
11
0
20 Oct 2021
EBJR: Energy-Based Joint Reasoning for Adaptive Inference
Mohammad Akbari
Amin Banitalebi-Dehkordi
Yong Zhang
BDL
MQ
85
7
0
20 Oct 2021
The World of an Octopus: How Reporting Bias Influences a Language Model's Perception of Color
Cory Paik
Stéphane Aroca-Ouellette
Alessandro Roncone
Katharina Kann
69
34
0
15 Oct 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
434
1,115
0
13 Oct 2021
Inferring Offensiveness In Images From Natural Language Supervision
P. Schramowski
Kristian Kersting
48
2
0
08 Oct 2021
FooDI-ML: a large multi-language dataset of food, drinks and groceries images and descriptions
David Amat Olóndriz
Ponç Puigdevall
A. S. Palau
VLM
97
7
0
05 Oct 2021
PASS: An ImageNet replacement for self-supervised pretraining without humans
Yuki M. Asano
Christian Rupprecht
Andrew Zisserman
Andrea Vedaldi
VLM
SSL
100
58
0
27 Sep 2021
PETA: Photo Albums Event Recognition using Transformers Attention
Tamar Glaser
Emanuel Ben-Baruch
Gilad Sharir
Nadav Zamir
Asaf Noy
Lihi Zelnik-Manor
ViT
48
2
0
26 Sep 2021
Visual Scene Graphs for Audio Source Separation
Moitreya Chatterjee
Jonathan Le Roux
Narendra Ahuja
A. Cherian
105
37
0
24 Sep 2021
Discovering and Validating AI Errors With Crowdsourced Failure Reports
Ángel Alexander Cabrera
Abraham J. Druck
Jason I. Hong
Adam Perer
HAI
109
57
0
23 Sep 2021
Pix2seq: A Language Modeling Framework for Object Detection
Ting Chen
Saurabh Saxena
Lala Li
David J. Fleet
Geoffrey E. Hinton
MLLM
ViT
VLM
307
351
0
22 Sep 2021
Deep Joint Source-Channel Coding for Multi-Task Network
Mengyang Wang
Zhicong Zhang
Jiahui Li
Mengyao Ma
Xiaopeng Fan
116
29
0
13 Sep 2021
Panoptic Narrative Grounding
Cristina González
Nicolás Ayobi
Isabela Hernández
José Hernández
Jordi Pont-Tuset
Pablo Arbeláez
146
23
0
10 Sep 2021