2210.05528
Cited By
Model Cascading: Towards Jointly Improving Efficiency and Accuracy of NLP Systems
11 October 2022
Neeraj Varshney
Chitta Baral
Papers citing
"Model Cascading: Towards Jointly Improving Efficiency and Accuracy of NLP Systems"
30 / 30 papers shown
Title
Bi-directional Model Cascading with Proxy Confidence
David Warren
Mark Dras
49
0
0
27 Apr 2025
EllieSQL: Cost-Efficient Text-to-SQL with Complexity-Aware Routing
Yizhang Zhu
Runzhi Jiang
Boyan Li
Nan Tang
Yuyu Luo
34
2
0
28 Mar 2025
I Know What I Don't Know: Improving Model Cascades Through Confidence Tuning
Stephan Rabanser
Nathalie Rauschmayr
Achin Kulshrestha
Petra Poklukar
Wittawat Jitkrittum
Sean Augenstein
Congchao Wang
Federico Tombari
42
0
0
26 Feb 2025
Harnessing Multiple Large Language Models: A Survey on LLM Ensemble
Zhijun Chen
Jingzheng Li
Pengpeng Chen
Zhuoran Li
Kai Sun
Yuankai Luo
Qianren Mao
Dingqi Yang
Hailong Sun
Philip S. Yu
ELM
55
5
0
25 Feb 2025
Translate Smart, not Hard: Cascaded Translation Systems with Quality-Aware Deferral
António Farinhas
Nuno M. Guerreiro
Sweta Agrawal
Ricardo Rei
André F. T. Martins
53
0
0
18 Feb 2025
A Unified Approach to Routing and Cascading for LLMs
Jasper Dekoninck
Maximilian Baader
Martin Vechev
60
2
0
17 Feb 2025
M2R2: Mixture of Multi-Rate Residuals for Efficient Transformer Inference
Nikhil Bhendawade
Mahyar Najibi
Devang Naik
Irina Belousova
MoE
85
0
0
04 Feb 2025
What is the Role of Small Models in the LLM Era: A Survey
Lihu Chen
Gaël Varoquaux
ALM
63
23
0
10 Sep 2024
Cascade-Aware Training of Language Models
Congchao Wang
Sean Augenstein
Keith Rush
Wittawat Jitkrittum
Harikrishna Narasimhan
A. S. Rawat
A. Menon
Alec Go
36
4
0
29 May 2024
Faster Cascades via Speculative Decoding
Harikrishna Narasimhan
Wittawat Jitkrittum
A. S. Rawat
Seungyeon Kim
Neha Gupta
A. Menon
Sanjiv Kumar
LRM
44
6
0
29 May 2024
Synergy and Diversity in CLIP: Enhancing Performance Through Adaptive Backbone Ensembling
Cristian Rodriguez-Opazo
Ehsan Abbasnejad
Damien Teney
Edison Marrese-Taylor
Hamed Damirchi
Anton Van Den Hengel
VLM
43
1
0
27 May 2024
Language Model Cascades: Token-level uncertainty and beyond
Neha Gupta
Harikrishna Narasimhan
Wittawat Jitkrittum
A. S. Rawat
A. Menon
Sanjiv Kumar
UQLM
53
42
0
15 Apr 2024
Online Cascade Learning for Efficient Inference over Streams
Lunyiu Nie
Zhimin Ding
Erdong Hu
Christopher M. Jermaine
Swarat Chaudhuri
35
4
0
07 Feb 2024
Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning
Murong Yue
Jie Zhao
Min Zhang
Liang Du
Ziyu Yao
LRM
35
56
0
04 Oct 2023
Can NLP Models 'Identify', 'Distinguish', and 'Justify' Questions that Don't have a Definitive Answer?
Ayushi Agarwal
Nisarg Patel
Neeraj Varshney
Mihir Parmar
Pavan Mallina
Aryan Bhavin Shah
Srihari Sangaraju
Tirth Patel
Nihar Thakkar
Chitta Baral
ELM
26
3
0
08 Sep 2023
Making Pre-trained Language Models both Task-solvers and Self-calibrators
Yangyi Chen
Xingyao Wang
Heng Ji
20
0
0
21 Jul 2023
When Does Confidence-Based Cascade Deferral Suffice?
Wittawat Jitkrittum
Neha Gupta
A. Menon
Harikrishna Narasimhan
A. S. Rawat
Sanjiv Kumar
22
18
0
06 Jul 2023
Finding the SWEET Spot: Analysis and Improvement of Adaptive Inference in Low Resource Settings
Daniel Rotem
Michael Hassid
Jonathan Mamou
Roy Schwartz
25
5
0
04 Jun 2023
Selectively Answering Ambiguous Questions
Jeremy R. Cole
Michael J.Q. Zhang
D. Gillick
Julian Martin Eisenschlos
Bhuwan Dhingra
Jacob Eisenstein
UQLM
26
26
0
24 May 2023
Efficient Prompting via Dynamic In-Context Learning
Wangchunshu Zhou
Yuchen Eleanor Jiang
Ryan Cotterell
Mrinmaya Sachan
29
19
0
18 May 2023
A Unified Evaluation Framework for Novelty Detection and Accommodation in NLP with an Instantiation in Authorship Attribution
Neeraj Varshney
Himanshu Gupta
Eric Robertson
Bin Liu
Chitta Baral
27
1
0
08 May 2023
Post-Abstention: Towards Reliably Re-Attempting the Abstained Instances in QA
Neeraj Varshney
Chitta Baral
39
13
0
02 May 2023
Batch Prompting: Efficient Inference with Large Language Model APIs
Zhoujun Cheng
Jungo Kasai
Tao Yu
LRM
11
72
0
19 Jan 2023
Can Open-Domain QA Reader Utilize External Knowledge Efficiently like Humans?
Neeraj Varshney
Man Luo
Chitta Baral
RALM
21
11
0
23 Nov 2022
Unsupervised Natural Language Inference Using PHL Triplet Generation
Neeraj Varshney
Pratyay Banerjee
Tejas Gokhale
Chitta Baral
23
9
0
16 Oct 2021
Towards Zero-Label Language Learning
Zirui Wang
Adams Wei Yu
Orhan Firat
Yuan Cao
SyDa
188
102
0
19 Sep 2021
The Lottery Ticket Hypothesis for Pre-trained BERT Networks
Tianlong Chen
Jonathan Frankle
Shiyu Chang
Sijia Liu
Yang Zhang
Zhangyang Wang
Michael Carbin
156
345
0
23 Jul 2020
Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference
Timo Schick
Hinrich Schütze
258
1,589
0
21 Jan 2020
Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
Sheng Shen
Zhen Dong
Jiayu Ye
Linjian Ma
Z. Yao
A. Gholami
Michael W. Mahoney
Kurt Keutzer
MQ
236
576
0
12 Sep 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
299
6,984
0
20 Apr 2018