ResearchTrend.AI
Ask2Loc: Learning to Locate Instructional Visual Answers by Asking Questions

22 April 2025
Chang Zong
Bin Li
Shoujun Zhou
Jian Wan
Lei Zhang

Papers citing "Ask2Loc: Learning to Locate Instructional Visual Answers by Asking Questions"

38 / 38 papers shown
RAG-Adapter: A Plug-and-Play RAG-enhanced Framework for Long Video Understanding
Xichen Tan
Yunfan Ye
Yuanjing Luo
Qian Wan
Fang Liu
Zhiping Cai
VLM
116
1
0
11 Mar 2025
Learning Musical Representations for Music Performance Question Answering
Xingjian Diao
Chunhui Zhang
Tingxuan Wu
Ming Cheng
Z. Ouyang
Weiyi Wu
Jiang Gui
134
12
0
10 Feb 2025
MedRAG: Enhancing Retrieval-augmented Generation with Knowledge Graph-Elicited Reasoning for Healthcare Copilot
Xuejiao Zhao
Siyan Liu
Su-Yin Yang
Chunyan Miao
278
14
0
06 Feb 2025
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
Benjamin Warner
Antoine Chaffin
Benjamin Clavié
Orion Weller
Oskar Hallström
...
Tom Aarsen
Nathan Cooper
Griffin Adams
Jeremy Howard
Iacopo Poli
162
130
0
18 Dec 2024
R-Bot: An LLM-based Query Rewrite System
Zhaoyan Sun
Xuanhe Zhou
Guoliang Li
114
6
0
02 Dec 2024
Does Prompt Formatting Have Any Impact on LLM Performance?
Jia He
Mukund Rungta
David Koleczek
Arshdeep Sekhon
Franklin X Wang
Sadid Hasan
LLMAG LRM
102
59
0
15 Nov 2024
Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models
Haibo Wang
Zhiyang Xu
Yu Cheng
Shizhe Diao
Yufan Zhou
Yixin Cao
Qifan Wang
Weifeng Ge
Lifu Huang
91
26
0
04 Oct 2024
ChatVTG: Video Temporal Grounding via Chat with Video Dialogue Large Language Models
Mengxue Qu
Xiaodong Chen
Wu Liu
Alicia Li
Yao Zhao
88
18
0
01 Oct 2024
Training-free Video Temporal Grounding using Large-scale Pre-trained Models
Minghang Zheng
Xinhao Cai
Qingchao Chen
Yuxin Peng
Yang Liu
70
5
0
29 Aug 2024
HybridRAG: Integrating Knowledge Graphs and Vector Retrieval Augmented Generation for Efficient Information Extraction
Bhaskarjit Sarmah
Benika Hall
Rohan Rao
Sunil Patel
Stefano Pasquali
Dhagash Mehta
101
52
0
09 Aug 2024
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models
Feng Li
Renrui Zhang
Hao Zhang
Yuanhan Zhang
Bo Li
Wei Li
Zejun Ma
Chunyuan Li
MLLM VLM
132
233
0
10 Jul 2024
Hello Again! LLM-powered Personalized Agent for Long-term Dialogue
Hao Li
Chenghao Yang
An Zhang
Yang Deng
Xiang Wang
Tat-Seng Chua
LLMAG
172
33
0
09 Jun 2024
Evaluating Very Long-Term Conversational Memory of LLM Agents
A. Maharana
Dong-Ho Lee
Sergey Tulyakov
Mohit Bansal
Francesco Barbieri
Yuwei Fang
LLMAG
86
81
0
27 Feb 2024
Instruction-tuned Language Models are Better Knowledge Learners
Zhengbao Jiang
Zhiqing Sun
Weijia Shi
Pedro Rodriguez
Chunting Zhou
Graham Neubig
Xi Lin
Wen-tau Yih
Srinivasan Iyer
KELM
92
41
0
20 Feb 2024
Synthetic Dialogue Dataset Generation using LLM Agents
Yelaman Abdullin
Diego Mollá Aliod
B. Ofoghi
John Yearwood
Qingyang Li
65
35
0
30 Jan 2024
Let the LLMs Talk: Simulating Human-to-Human Conversational QA via Zero-Shot LLM-to-LLM Interactions
Zahra Abbasiantaeb
Yifei Yuan
Evangelos Kanoulas
Mohammad Aliannejadi
101
67
0
05 Dec 2023
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Bin Lin
Yang Ye
Bin Zhu
Jiaxi Cui
Munan Ning
Peng Jin
Li-ming Yuan
VLM MLLM
371
711
0
16 Nov 2023
MM-VID: Advancing Video Understanding with GPT-4V(ision)
Kevin Qinghong Lin
Faisal Ahmed
Linjie Li
Chung-Ching Lin
E. Azarnasab
...
Lin Liang
Zicheng Liu
Yumao Lu
Ce Liu
Lijuan Wang
MLLM
86
65
0
30 Oct 2023
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Akari Asai
Zeqiu Wu
Yizhong Wang
Avirup Sil
Hannaneh Hajishirzi
RALM
281
782
0
17 Oct 2023
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models
Muhammad Maaz
H. Rasheed
Salman Khan
Fahad Shahbaz Khan
MLLM
148
661
0
08 Jun 2023
RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting
Lei Shu
Liangchen Luo
Jayakumar Hoskere
Yun Zhu
Canoee Liu
Simon Tong
Jindong Chen
Lei Meng
KELM LRM
95
51
0
25 May 2023
Query Rewriting for Retrieval-Augmented Large Language Models
Xinbei Ma
Yeyun Gong
Pengcheng He
Hai Zhao
Nan Duan
KELM LRM
109
115
0
23 May 2023
Active Retrieval Augmented Generation
Zhengbao Jiang
Frank F. Xu
Luyu Gao
Zhiqing Sun
Qian Liu
Jane Dwivedi-Yu
Yiming Yang
Jamie Callan
Graham Neubig
RALM
102
294
0
11 May 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM MLLM
447
4,668
0
30 Jan 2023
VindLU: A Recipe for Effective Video-and-Language Pretraining
Feng Cheng
Xizi Wang
Jie Lei
David J. Crandall
Joey Tianyi Zhou
Gedas Bertasius
VLM
125
81
0
09 Dec 2022
GPT-3-driven pedagogical agents for training children's curious question-asking skills
Rania Abdelghani
Yen-Hsiang Wang
Xingdi Yuan
Tong Wang
Pauline Lucas
Hélène Sauzéon
Pierre-Yves Oudeyer
118
106
0
25 Nov 2022
Visual Answer Localization with Cross-modal Mutual Knowledge Transfer
Yixuan Weng
Bin Li
104
6
0
26 Oct 2022
Learning to Locate Visual Answer in Video Corpus Using Question
Bin Li
Yixuan Weng
Bin Sun
Shutao Li
145
5
0
11 Oct 2022
Learning to Retrieve Videos by Asking Questions
Avinash Madasu
Junier Oliva
Gedas Bertasius
VGen
85
16
0
11 May 2022
Towards Visual-Prompt Temporal Answering Grounding in Medical Instructional Video
Bin Li
Yixuan Weng
Bin Sun
Shutao Li
147
33
0
13 Mar 2022
A Dataset for Medical Instructional Video Classification and Question Answering
D. Gupta
Kush Attal
Dina Demner-Fushman
108
33
0
30 Jan 2022
Hierarchical Modeling for Task Recognition and Action Segmentation in Weakly-Labeled Instructional Videos
Reza Ghoddoosian
S. Sayed
V. Athitsos
76
15
0
12 Oct 2021
Natural Language Video Localization: A Revisit in Span-based Question Answering Framework
Hao Zhang
Aixin Sun
Wei Jing
Liangli Zhen
Qiufeng Wang
Rick Siow Mong Goh
221
87
0
26 Feb 2021
Span-based Localizing Network for Natural Language Video Localization
Hao Zhang
Aixin Sun
Wei Jing
Qiufeng Wang
110
316
0
29 Apr 2020
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh
Lysandre Debut
Julien Chaumond
Thomas Wolf
311
7,575
0
02 Oct 2019
Cross-task weakly supervised learning from instructional videos
Dimitri Zhukov
Jean-Baptiste Alayrac
R. G. Cinbis
David Fouhey
Ivan Laptev
Josef Sivic
SSL
180
250
0
19 Mar 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM SSL SSeg
1.9K
95,531
0
11 Oct 2018
To Find Where You Talk: Temporal Sentence Localization in Video with Attention Based Location Regression
Yitian Yuan
Tao Mei
Wenwu Zhu
95
333
0
19 Apr 2018