ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2306.04618
  4. Cited By
Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis,
  and LLMs Evaluations

Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations

7 June 2023
Lifan Yuan
Yangyi Chen
Ganqu Cui
Hongcheng Gao
Fangyuan Zou
Xingyi Cheng
Heng Ji
Zhiyuan Liu
Maosong Sun
ArXivPDFHTML

Papers citing "Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations"

50 / 70 papers shown
Title
Robo-Troj: Attacking LLM-based Task Planners
Robo-Troj: Attacking LLM-based Task Planners
Mohaiminul Al Nahian
Zainab Altaweel
David Reitano
Sabbir Ahmed
Saumitra Lohokare
Shiqi Zhang
Adnan Siraj Rakin
AAML
68
0
0
23 Apr 2025
Adversarial Training of Reward Models
Adversarial Training of Reward Models
Alexander Bukharin
Haifeng Qian
Shengyang Sun
Adithya Renduchintala
Soumye Singhal
Zihan Wang
Oleksii Kuchaiev
Olivier Delalleau
T. Zhao
AAML
32
0
0
08 Apr 2025
Cognitive Debiasing Large Language Models for Decision-Making
Cognitive Debiasing Large Language Models for Decision-Making
Yougang Lyu
Shijie Ren
Yue Feng
Zihan Wang
Z. Chen
Z. Z. Ren
Maarten de Rijke
43
0
0
05 Apr 2025
Simple yet Effective Node Property Prediction on Edge Streams under Distribution Shifts
Simple yet Effective Node Property Prediction on Edge Streams under Distribution Shifts
Jongha Lee
Taehyung Kwon
Heechan Moon
Kijung Shin
AI4TS
46
0
0
01 Apr 2025
Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems?
Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems?
Kai Yan
Yufei Xu
Zhengyin Du
Xuesong Yao
Zihan Wang
Xiaowen Guo
Jiecao Chen
ReLM
ELM
LRM
95
4
0
01 Apr 2025
Contextual Metric Meta-Evaluation by Measuring Local Metric Accuracy
Contextual Metric Meta-Evaluation by Measuring Local Metric Accuracy
Athiya Deviyani
Fernando Diaz
39
0
0
25 Mar 2025
COPA: Comparing the Incomparable to Explore the Pareto Front
COPA: Comparing the Incomparable to Explore the Pareto Front
Adrián Javaloy
Antonio Vergari
Isabel Valera
67
0
0
18 Mar 2025
Robustness and Cybersecurity in the EU Artificial Intelligence Act
Robustness and Cybersecurity in the EU Artificial Intelligence Act
Henrik Nolte
Miriam Rateike
Michèle Finck
38
1
0
22 Feb 2025
The Philosophical Foundations of Growing AI Like A Child
The Philosophical Foundations of Growing AI Like A Child
Dezhi Luo
Yijiang Li
Hokin Deng
ReLM
LRM
48
2
0
15 Feb 2025
Out-of-Distribution Detection using Synthetic Data Generation
Out-of-Distribution Detection using Synthetic Data Generation
Momin Abbas
Muneeza Azmat
R. Horesh
Mikhail Yurochkin
47
1
0
05 Feb 2025
Improving Your Model Ranking on Chatbot Arena by Vote Rigging
Improving Your Model Ranking on Chatbot Arena by Vote Rigging
Rui Min
Tianyu Pang
Chao Du
Qian Liu
Minhao Cheng
Min Lin
AAML
57
4
0
29 Jan 2025
Turbulence: Systematically and Automatically Testing Instruction-Tuned Large Language Models for Code
Turbulence: Systematically and Automatically Testing Instruction-Tuned Large Language Models for Code
Shahin Honarvar
Mark van der Wilk
Alastair Donaldson
80
6
0
28 Jan 2025
A Transformer-based Autoregressive Decoder Architecture for Hierarchical Text Classification
A Transformer-based Autoregressive Decoder Architecture for Hierarchical Text Classification
Younes Yousef
Lukas Galke
A. Scherp
49
0
0
23 Jan 2025
Fine-Grained Appropriate Reliance: Human-AI Collaboration with a Multi-Step Transparent Decision Workflow for Complex Task Decomposition
Fine-Grained Appropriate Reliance: Human-AI Collaboration with a Multi-Step Transparent Decision Workflow for Complex Task Decomposition
Gaole He
Patrick Hemmer
Michael Vossing
Max Schemmer
U. Gadiraju
47
0
0
19 Jan 2025
Generalizing Trust: Weak-to-Strong Trustworthiness in Language Models
Martin Pawelczyk
Lillian Sun
Zhenting Qi
Aounon Kumar
Himabindu Lakkaraju
49
1
0
03 Jan 2025
Controlling Out-of-Domain Gaps in LLMs for Genre Classification and Generated Text Detection
Controlling Out-of-Domain Gaps in LLMs for Genre Classification and Generated Text Detection
D. Roussinov
Serge Sharoff
Nadezhda Puchnina
32
0
0
31 Dec 2024
VideoICL: Confidence-based Iterative In-context Learning for
  Out-of-Distribution Video Understanding
VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding
Kangsan Kim
G. Park
Youngwan Lee
Woongyeong Yeo
Sung Ju Hwang
97
3
0
03 Dec 2024
Ño' Matters: Out-of-Distribution Detection in Multimodality Long
  Dialogue
Ño' Matters: Out-of-Distribution Detection in Multimodality Long Dialogue
Rena Gao
Xuetong Wu
Siwen Luo
Caren Han
Feng Liu
OODD
47
0
0
31 Oct 2024
Fine-Tuning Pre-trained Language Models for Robust Causal Representation
  Learning
Fine-Tuning Pre-trained Language Models for Robust Causal Representation Learning
Jialin Yu
Yuxiang Zhou
Yulan He
Nevin L. Zhang
Ricardo Silva
38
0
0
18 Oct 2024
Gamified crowd-sourcing of high-quality data for visual fine-tuning
Gamified crowd-sourcing of high-quality data for visual fine-tuning
Shashank Yadav
Rohan Tomar
Garvit Jain
Chirag Ahooja
Shubham Chaudhary
Charles Elkan
33
0
0
05 Oct 2024
In-Context Transfer Learning: Demonstration Synthesis by Transferring
  Similar Tasks
In-Context Transfer Learning: Demonstration Synthesis by Transferring Similar Tasks
Dingzirui Wang
Xuanliang Zhang
Qiguang Chen
Longxu Dou
Xiao Xu
...
Qingfu Zhu
Wanxiang Che
Binhua Li
Fei Huang
Yongbin Li
48
0
0
02 Oct 2024
Broadening Access to Simulations for End-Users via Large Language
  Models: Challenges and Opportunities
Broadening Access to Simulations for End-Users via Large Language Models: Challenges and Opportunities
Philippe J. Giabbanelli
Jose J. Padilla
Ameeta Agrawal
32
2
0
03 Sep 2024
Seeing Through Their Eyes: Evaluating Visual Perspective Taking in
  Vision Language Models
Seeing Through Their Eyes: Evaluating Visual Perspective Taking in Vision Language Models
Gracjan Góral
Alicja Ziarko
Michal Nauman
Maciej Wołczyk
LRM
33
2
0
02 Sep 2024
Can Large Language Models Understand Symbolic Graphics Programs?
Can Large Language Models Understand Symbolic Graphics Programs?
Zeju Qiu
Weiyang Liu
Haiwen Feng
Zhen Liu
Tim Z. Xiao
Katherine M. Collins
J. Tenenbaum
Adrian Weller
Michael J. Black
Bernhard Schölkopf
48
11
0
15 Aug 2024
LocalValueBench: A Collaboratively Built and Extensible Benchmark for
  Evaluating Localized Value Alignment and Ethical Safety in Large Language
  Models
LocalValueBench: A Collaboratively Built and Extensible Benchmark for Evaluating Localized Value Alignment and Ethical Safety in Large Language Models
Achintya Gopal
Nicholas Wai Long Lau
Eva Adelina Susanto
Chi Lok Yu
Aditya Paul
ELM
25
7
0
27 Jul 2024
AgentPeerTalk: Empowering Students through Agentic-AI-Driven Discernment
  of Bullying and Joking in Peer Interactions in Schools
AgentPeerTalk: Empowering Students through Agentic-AI-Driven Discernment of Bullying and Joking in Peer Interactions in Schools
Aditya Paul
Chi Lok Yu
Eva Adelina Susanto
Nicholas Wai Long Lau
Gwenyth Isobel Meadows
LLMAG
35
3
0
27 Jul 2024
SAFT: Towards Out-of-Distribution Generalization in Fine-Tuning
SAFT: Towards Out-of-Distribution Generalization in Fine-Tuning
Bac Nguyen
Stefan Uhlich
Fabien Cardinaux
Lukas Mauch
Marzieh Edraki
Aaron Courville
OODD
CLL
VLM
57
3
0
03 Jul 2024
When Search Engine Services meet Large Language Models: Visions and
  Challenges
When Search Engine Services meet Large Language Models: Visions and Challenges
Haoyi Xiong
Jiang Bian
Yuchen Li
Xuhong Li
Mengnan Du
Shuaiqiang Wang
Dawei Yin
Sumi Helal
53
29
0
28 Jun 2024
$\texttt{MoE-RBench}$: Towards Building Reliable Language Models with
  Sparse Mixture-of-Experts
MoE-RBench\texttt{MoE-RBench}MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts
Guanjie Chen
Xinyu Zhao
Tianlong Chen
Yu Cheng
MoE
76
5
0
17 Jun 2024
Evaluating the Generalization Ability of Quantized LLMs: Benchmark,
  Analysis, and Toolbox
Evaluating the Generalization Ability of Quantized LLMs: Benchmark, Analysis, and Toolbox
Yijun Liu
Yuan Meng
Fang Wu
Shenhao Peng
Hang Yao
Chaoyu Guan
Chen Tang
Xinzhu Ma
Zhi Wang
Wenwu Zhu
MQ
62
7
0
15 Jun 2024
Adversarial Evasion Attack Efficiency against Large Language Models
Adversarial Evasion Attack Efficiency against Large Language Models
João Vitorino
Eva Maia
Isabel Praça
AAML
43
2
0
12 Jun 2024
A Survey on Medical Large Language Models: Technology, Application,
  Trustworthiness, and Future Directions
A Survey on Medical Large Language Models: Technology, Application, Trustworthiness, and Future Directions
Lei Liu
Xiaoyan Yang
Junchi Lei
Xiaoyang Liu
Yue Shen
...
Peng Wei
Jinjie Gu
Zhixuan Chu
Zhan Qin
Kui Ren
LM&MA
AILaw
46
14
0
06 Jun 2024
Understanding Linear Probing then Fine-tuning Language Models from NTK
  Perspective
Understanding Linear Probing then Fine-tuning Language Models from NTK Perspective
Akiyoshi Tomihari
Issei Sato
30
4
0
27 May 2024
Cross-Platform Hate Speech Detection with Weakly Supervised Causal
  Disentanglement
Cross-Platform Hate Speech Detection with Weakly Supervised Causal Disentanglement
Paras Sheth
Tharindu Kumarage
Raha Moraffah
Amanat Chadha
Huan Liu
34
1
0
17 Apr 2024
Comprehensive Reassessment of Large-Scale Evaluation Outcomes in LLMs: A
  Multifaceted Statistical Approach
Comprehensive Reassessment of Large-Scale Evaluation Outcomes in LLMs: A Multifaceted Statistical Approach
Kun Sun
Rong Wang
Anders Sogaard
37
3
0
22 Mar 2024
A Survey on Evaluation of Out-of-Distribution Generalization
A Survey on Evaluation of Out-of-Distribution Generalization
Han Yu
Jiashuo Liu
Xingxuan Zhang
Jiayun Wu
Peng Cui
OOD
47
8
0
04 Mar 2024
LoRETTA: Low-Rank Economic Tensor-Train Adaptation for
  Ultra-Low-Parameter Fine-Tuning of Large Language Models
LoRETTA: Low-Rank Economic Tensor-Train Adaptation for Ultra-Low-Parameter Fine-Tuning of Large Language Models
Yifan Yang
Jiajun Zhou
Ngai Wong
Zheng Zhang
29
7
0
18 Feb 2024
Improving Black-box Robustness with In-Context Rewriting
Improving Black-box Robustness with In-Context Rewriting
Kyle O'Brien
Nathan Ng
Isha Puri
Jorge Mendez
Hamid Palangi
Yoon Kim
Marzyeh Ghassemi
Tom Hartvigsen
52
6
0
13 Feb 2024
The Risk of Federated Learning to Skew Fine-Tuning Features and
  Underperform Out-of-Distribution Robustness
The Risk of Federated Learning to Skew Fine-Tuning Features and Underperform Out-of-Distribution Robustness
Mengyao Du
Miao Zhang
Yuwen Pu
Kai Xu
Shouling Ji
Quanjun Yin
40
1
0
25 Jan 2024
Stability Analysis of ChatGPT-based Sentiment Analysis in AI Quality
  Assurance
Stability Analysis of ChatGPT-based Sentiment Analysis in AI Quality Assurance
Tinghui Ouyang
AprilPyone Maungmaung
Koichi Konishi
Yoshiki Seo
Isao Echizen
AI4MH
28
5
0
15 Jan 2024
Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language
  Model Systems
Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems
Tianyu Cui
Yanling Wang
Chuanpu Fu
Yong Xiao
Sijia Li
...
Junwu Xiong
Xinyu Kong
Zujie Wen
Ke Xu
Qi Li
63
57
0
11 Jan 2024
A Survey on Large Language Model (LLM) Security and Privacy: The Good,
  the Bad, and the Ugly
A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly
Yifan Yao
Jinhao Duan
Kaidi Xu
Yuanfang Cai
Eric Sun
Yue Zhang
PILM
ELM
52
476
0
04 Dec 2023
ViStruct: Visual Structural Knowledge Extraction via Curriculum Guided
  Code-Vision Representation
ViStruct: Visual Structural Knowledge Extraction via Curriculum Guided Code-Vision Representation
Yangyi Chen
Xingyao Wang
Manling Li
Derek Hoiem
Heng Ji
30
11
0
22 Nov 2023
Whispers of Doubt Amidst Echoes of Triumph in NLP Robustness
Whispers of Doubt Amidst Echoes of Triumph in NLP Robustness
Ashim Gupta
Rishanth Rajendhran
Nathan Stringham
Vivek Srikumar
Ana Marasović
AAML
31
3
0
16 Nov 2023
Data Similarity is Not Enough to Explain Language Model Performance
Data Similarity is Not Enough to Explain Language Model Performance
Gregory Yauney
Emily Reif
David M. Mimno
47
6
0
15 Nov 2023
How Abilities in Large Language Models are Affected by Supervised
  Fine-tuning Data Composition
How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition
Guanting Dong
Hongyi Yuan
Keming Lu
Chengpeng Li
Mingfeng Xue
Dayiheng Liu
Wei Wang
Zheng Yuan
Chang Zhou
Jingren Zhou
LRM
CLL
34
121
0
09 Oct 2023
CRAFT: Customizing LLMs by Creating and Retrieving from Specialized
  Toolsets
CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets
Lifan Yuan
Yangyi Chen
Xingyao Wang
Yi R. Fung
Hao Peng
Heng Ji
LLMAG
KELM
38
58
0
29 Sep 2023
Foundation Metrics for Evaluating Effectiveness of Healthcare
  Conversations Powered by Generative AI
Foundation Metrics for Evaluating Effectiveness of Healthcare Conversations Powered by Generative AI
Mahyar Abbasian
Elahe Khatibi
Iman Azimi
David Oniani
Zahra Shakeri Hossein Abad
...
Bryant Lin
Olivier Gevaert
Li-Jia Li
Ramesh C. Jain
Amir M. Rahmani
LM&MA
ELM
AI4MH
43
66
0
21 Sep 2023
How to Handle Different Types of Out-of-Distribution Scenarios in
  Computational Argumentation? A Comprehensive and Fine-Grained Field Study
How to Handle Different Types of Out-of-Distribution Scenarios in Computational Argumentation? A Comprehensive and Fine-Grained Field Study
Andreas Waldis
Yufang Hou
Iryna Gurevych
30
2
0
15 Sep 2023
Making Pre-trained Language Models both Task-solvers and
  Self-calibrators
Making Pre-trained Language Models both Task-solvers and Self-calibrators
Yangyi Chen
Xingyao Wang
Heng Ji
20
0
0
21 Jul 2023
12
Next