STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset

2 May 2017
Yuya Yoshikawa
Yutaro Shigeto
A. Takeuchi
    3DV

Papers citing "STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset"

50 / 61 papers shown
Title
A Multimodal Recaptioning Framework to Account for Perceptual Diversity in Multilingual Vision-Language Modeling
Kyle Buettner
Jacob Emmerson
Adriana Kovashka
25
0
0
19 Apr 2025
A Benchmark for Multi-Lingual Vision-Language Learning in Remote Sensing Image Captioning
Qing Zhou
Tao Yang
Junyu Gao
W. Ni
Junzheng Wu
Qi Wang
53
0
0
06 Mar 2025
No Culture Left Behind: ArtELingo-28, a Benchmark of WikiArt with Captions in 28 Languages
Youssef Mohamed
Runjia Li
Ibrahim Said Ahmad
Kilichbek Haydarov
Philip Torr
Kenneth Church
Mohamed Elhoseiny
VLM
38
7
0
06 Nov 2024
Quantifying the Gaps Between Translation and Native Perception in Training for Multimodal, Multilingual Retrieval
Kyle Buettner
Adriana Kovashka
VLM
42
4
0
02 Oct 2024
FoodMLLM-JP: Leveraging Multimodal Large Language Models for Japanese Recipe Generation
Yuki Imajuku
Yoko Yamakata
Kiyoharu Aizawa
39
1
0
27 Sep 2024
Cross-Lingual and Cross-Cultural Variation in Image Descriptions
Uri Berger
Edoardo M. Ponti
31
0
0
25 Sep 2024
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation
Yuxuan Wang
Yijun Liu
Fei Yu
Chen Huang
Kexin Li
Zhiguo Wan
Wanxiang Che
VLM
CoGe
35
5
0
01 Jul 2024
Improving the Consistency in Cross-Lingual Cross-Modal Retrieval with 1-to-K Contrastive Learning
Zhijie Nie
Richong Zhang
Zhangchi Feng
Hailang Huang
Xudong Liu
40
1
0
26 Jun 2024
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus
Matthieu Futeral
A. Zebaze
Pedro Ortiz Suarez
Julien Abadji
Rémi Lacroix
Cordelia Schmid
Rachel Bawden
Benoît Sagot
39
3
0
13 Jun 2024
Image captioning in different languages
Emiel van Miltenburg
VLM
41
0
0
31 May 2024
Constructing Multilingual Visual-Text Datasets Revealing Visual Multilingual Ability of Vision Language Models
Jesse Atuhurra
Iqra Ali
Tatsuya Hiraoka
Hidetaka Kamigaito
Tomoya Iwakura
Taro Watanabe
44
1
0
29 Mar 2024
A Gaze-grounded Visual Question Answering Dataset for Clarifying Ambiguous Japanese Questions
Shun Inadumi
Seiya Kawano
Akishige Yuguchi
Yasutomo Kawanishi
Koichiro Yoshino
36
1
0
26 Mar 2024
KTVIC: A Vietnamese Image Captioning Dataset on the Life Domain
Anh-Cuong Pham
Van-Quang Nguyen
Thi-Hong Vuong
Quang-Thuy Ha
29
1
0
16 Jan 2024
CL2CM: Improving Cross-Lingual Cross-Modal Retrieval via Cross-Lingual Knowledge Transfer
Yabing Wang
Fan Wang
Jianfeng Dong
Hao Luo
VLM
32
9
0
14 Dec 2023
JaSPICE: Automatic Evaluation Metric Using Predicate-Argument Structures for Image Captioning Models
Yuiga Wada
Kanta Kaneda
Komei Sugiura
25
4
0
07 Nov 2023
Semantic and Expressive Variation in Image Captions Across Languages
Andre Ye
Sebastin Santy
Jena D. Hwang
Amy X. Zhang
Ranjay Krishna
VLM
61
3
0
22 Oct 2023
NLLB-CLIP -- train performant multilingual image retrieval model on a budget
Alexander Visheratin
VLM
32
18
0
04 Sep 2023
Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language Representations
Gregor Geigle
Radu Timofte
Goran Glavas
VLM
MLLM
36
5
0
14 Jun 2023
Document Understanding Dataset and Evaluation (DUDE)
Jordy Van Landeghem
Rubèn Pérez Tito
Łukasz Borchmann
Michal Pietruszka
Pawel Józiak
...
Bertrand Ackaert
Ernest Valveny
Matthew Blaschko
Sien Moens
Tomasz Stanislawek
VGen
24
53
0
15 May 2023
Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation
Zhiwei Zhang
Yuliang Liu
MLLM
30
0
0
10 Mar 2023
A Large-Scale Multilingual Study of Visual Constraints on Linguistic Selection of Descriptions
Uri Berger
Lea Frermann
Gabriel Stanovsky
Omri Abend
52
1
0
09 Feb 2023
Universal Multimodal Representation for Language Understanding
Zhuosheng Zhang
Kehai Chen
Rui Wang
Masao Utiyama
Eiichiro Sumita
Z. Li
Hai Zhao
SSL
19
21
0
09 Jan 2023
X$^2$-VLM: All-In-One Pre-trained Model For Vision-Language Tasks
Yan Zeng
Xinsong Zhang
Hang Li
Jiawei Wang
Jipeng Zhang
Wangchunshu Zhou
VLM
MLLM
34
14
0
22 Nov 2022
GLAMI-1M: A Multilingual Image-Text Fashion Dataset
Vaclav Kosar
A. Hoskovec
Milan Šulc
Radek Bartyzal
VLM
32
3
0
17 Nov 2022
Multilingual Multimodality: A Taxonomical Survey of Datasets, Techniques, Challenges and Opportunities
Khyathi Raghavi Chandu
A. Geramifard
40
3
0
30 Oct 2022
MaXM: Towards Multilingual Visual Question Answering
Soravit Changpinyo
Linting Xue
Michal Yarom
Ashish V. Thapliyal
Idan Szpektor
J. Amelot
Xi Chen
Radu Soricut
33
8
0
12 Sep 2022
Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training
Yan Zeng
Wangchunshu Zhou
Ao Luo
Ziming Cheng
Xinsong Zhang
VLM
29
30
0
01 Jun 2022
Generalizing Multimodal Pre-training into Multilingual via Language Acquisition
Liang Zhang
Anwen Hu
Qin Jin
VLM
33
5
0
29 May 2022
Recent Advances in Neural Text Generation: A Task-Agnostic Survey
Chen Tang
Frank Guerin
Chenghua Lin
AI4CE
OOD
28
19
0
06 Mar 2022
IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages
Emanuele Bugliarello
Fangyu Liu
Jonas Pfeiffer
Siva Reddy
Desmond Elliott
Edoardo Ponti
Ivan Vulić
MLLM
VLM
ELM
50
62
0
27 Jan 2022
Visually Grounded Reasoning across Languages and Cultures
Fangyu Liu
Emanuele Bugliarello
Edoardo Ponti
Siva Reddy
Nigel Collier
Desmond Elliott
VLM
LRM
111
168
0
28 Sep 2021
Towards Zero-shot Cross-lingual Image Retrieval and Tagging
Pranav Aggarwal
Ritiz Tambi
Ajinkya Kale
VLM
13
6
0
15 Sep 2021
Bornon: Bengali Image Captioning with Transformer-based Deep learning approach
Faisal Muhammad Shah
Mayeesha Humaira
Md Abidur Rahman Khan Jim
Amit Saha Ami
Shimul Paul
29
17
0
11 Sep 2021
MURAL: Multimodal, Multitask Retrieval Across Languages
Aashi Jain
Mandy Guo
Krishna Srinivasan
Ting-Li Chen
Sneha Kudugunta
Chao Jia
Yinfei Yang
Jason Baldridge
VLM
115
52
0
10 Sep 2021
Egocentric Image Captioning for Privacy-Preserved Passive Dietary Intake Monitoring
Jianing Qiu
Frank P.-W. Lo
Xiao Gu
M. Jobarteh
Wenyan Jia
...
M. McCrory
Edward Sazonov
Mingui Sun
Gary Frost
Benny Lo
EgoV
38
18
0
01 Jul 2021
Grounding 'Grounding' in NLP
Khyathi Raghavi Chandu
Yonatan Bisk
A. Black
30
51
0
04 Jun 2021
Backretrieval: An Image-Pivoted Evaluation Metric for Cross-Lingual Text Representations Without Parallel Corpora
Mikhail Fain
Niall Twomey
Danushka Bollegala
19
2
0
11 May 2021
UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training
Mingyang Zhou
Luowei Zhou
Shuohang Wang
Yu Cheng
Linjie Li
Zhou Yu
Jingjing Liu
MLLM
VLM
31
89
0
01 Apr 2021
LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval
Siqi Sun
Yen-Chun Chen
Linjie Li
Shuohang Wang
Yuwei Fang
Jingjing Liu
VLM
38
82
0
16 Mar 2021
Improved Bengali Image Captioning via deep convolutional neural network based encoder-decoder model
Mohammad Faiyaz Khan
S. M. S. Shifath
Md. Saiful Islam
VLM
33
18
0
14 Feb 2021
MSVD-Turkish: A Comprehensive Multimodal Dataset for Integrated Vision and Language Research in Turkish
Begum Citamak
Ozan Caglayan
Menekse Kuyu
Erkut Erdem
Aykut Erdem
Pranava Madhyastha
Lucia Specia
25
8
0
13 Dec 2020
Towards Zero-shot Cross-lingual Image Retrieval
Pranav Aggarwal
Ajinkya Kale
VLM
19
25
0
24 Nov 2020
Curious Case of Language Generation Evaluation Metrics: A Cautionary Tale
Ozan Caglayan
Pranava Madhyastha
Lucia Specia
ELM
39
35
0
26 Oct 2020
A Corpus for English-Japanese Multimodal Neural Machine Translation with Comparable Sentences
Andrew C. Merritt
Chenhui Chu
Yuki Arase
17
5
0
17 Oct 2020
Denoising Large-Scale Image Captioning from Alt-text Data using Content Selection Models
Khyathi Raghavi Chandu
Piyush Sharma
Soravit Changpinyo
Ashish V. Thapliyal
Radu Soricut
DiffM
VLM
32
3
0
10 Sep 2020
M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training
Minheng Ni
Haoyang Huang
Lin Su
Edward Cui
Taroon Bharti
Lijuan Wang
Jianfeng Gao
Dongdong Zhang
Nan Duan
29
7
0
04 Jun 2020
Captioning Images Taken by People Who Are Blind
Danna Gurari
Yinan Zhao
Meng Zhang
Nilavra Bhattacharya
22
181
0
20 Feb 2020
UIT-ViIC: A Dataset for the First Evaluation on Vietnamese Image Captioning
Q. Lam
Q. Le
Kiet Van Nguyen
Ngan Luu-Thuy Nguyen
25
19
0
01 Feb 2020
Multimodal Machine Translation through Visuals and Speech
U. Sulubacak
Ozan Caglayan
Stig-Arne Gronroos
Aku Rouhe
Desmond Elliott
Lucia Specia
Jörg Tiedemann
49
73
0
28 Nov 2019
Bootstrapping Disjoint Datasets for Multilingual Multimodal Representation Learning
Ákos Kádár
Grzegorz Chrupała
A. Alishahi
Desmond Elliott
21
1
0
09 Nov 2019