Model Cascading: Towards Jointly Improving Efficiency and Accuracy of NLP Systems

11 October 2022
Neeraj Varshney
Chitta Baral

Papers citing "Model Cascading: Towards Jointly Improving Efficiency and Accuracy of NLP Systems"

Bi-directional Model Cascading with Proxy Confidence
David Warren
Mark Dras
49
0
0
27 Apr 2025
EllieSQL: Cost-Efficient Text-to-SQL with Complexity-Aware Routing
Yizhang Zhu
Runzhi Jiang
Boyan Li
Nan Tang
Yuyu Luo
34
2
0
28 Mar 2025
I Know What I Don't Know: Improving Model Cascades Through Confidence Tuning
Stephan Rabanser
Nathalie Rauschmayr
Achin Kulshrestha
Petra Poklukar
Wittawat Jitkrittum
Sean Augenstein
Congchao Wang
Federico Tombari
42
0
0
26 Feb 2025
Harnessing Multiple Large Language Models: A Survey on LLM Ensemble
Zhijun Chen
Jingzheng Li
Pengpeng Chen
Zhuoran Li
Kai Sun
Yuankai Luo
Qianren Mao
Dingqi Yang
Hailong Sun
Philip S. Yu
ELM
55
5
0
25 Feb 2025
Translate Smart, not Hard: Cascaded Translation Systems with Quality-Aware Deferral
António Farinhas
Nuno M. Guerreiro
Sweta Agrawal
Ricardo Rei
André F. T. Martins
53
0
0
18 Feb 2025
A Unified Approach to Routing and Cascading for LLMs
Jasper Dekoninck
Maximilian Baader
Martin Vechev
60
2
0
17 Feb 2025
M2R2: Mixture of Multi-Rate Residuals for Efficient Transformer Inference
Nikhil Bhendawade
Mahyar Najibi
Devang Naik
Irina Belousova
MoE
85
0
0
04 Feb 2025
What is the Role of Small Models in the LLM Era: A Survey
Lihu Chen
Gaël Varoquaux
ALM
63
23
0
10 Sep 2024
Cascade-Aware Training of Language Models
Congchao Wang
Sean Augenstein
Keith Rush
Wittawat Jitkrittum
Harikrishna Narasimhan
A. S. Rawat
A. Menon
Alec Go
36
4
0
29 May 2024
Faster Cascades via Speculative Decoding
Harikrishna Narasimhan
Wittawat Jitkrittum
A. S. Rawat
Seungyeon Kim
Neha Gupta
A. Menon
Sanjiv Kumar
LRM
44
6
0
29 May 2024
Synergy and Diversity in CLIP: Enhancing Performance Through Adaptive Backbone Ensembling
Cristian Rodriguez-Opazo
Ehsan Abbasnejad
Damien Teney
Edison Marrese-Taylor
Hamed Damirchi
Anton Van Den Hengel
VLM
43
1
0
27 May 2024
Language Model Cascades: Token-level uncertainty and beyond
Neha Gupta
Harikrishna Narasimhan
Wittawat Jitkrittum
A. S. Rawat
A. Menon
Sanjiv Kumar
UQLM
53
42
0
15 Apr 2024
Online Cascade Learning for Efficient Inference over Streams
Lunyiu Nie
Zhimin Ding
Erdong Hu
Christopher M. Jermaine
Swarat Chaudhuri
35
4
0
07 Feb 2024
Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning
Murong Yue
Jie Zhao
Min Zhang
Liang Du
Ziyu Yao
LRM
35
56
0
04 Oct 2023
Can NLP Models 'Identify', 'Distinguish', and 'Justify' Questions that Don't have a Definitive Answer?
Ayushi Agarwal
Nisarg Patel
Neeraj Varshney
Mihir Parmar
Pavan Mallina
Aryan Bhavin Shah
Srihari Sangaraju
Tirth Patel
Nihar Thakkar
Chitta Baral
ELM
26
3
0
08 Sep 2023
Making Pre-trained Language Models both Task-solvers and Self-calibrators
Yangyi Chen
Xingyao Wang
Heng Ji
20
0
0
21 Jul 2023
When Does Confidence-Based Cascade Deferral Suffice?
Wittawat Jitkrittum
Neha Gupta
A. Menon
Harikrishna Narasimhan
A. S. Rawat
Sanjiv Kumar
22
18
0
06 Jul 2023
Finding the SWEET Spot: Analysis and Improvement of Adaptive Inference in Low Resource Settings
Daniel Rotem
Michael Hassid
Jonathan Mamou
Roy Schwartz
25
5
0
04 Jun 2023
Selectively Answering Ambiguous Questions
Jeremy R. Cole
Michael J.Q. Zhang
D. Gillick
Julian Martin Eisenschlos
Bhuwan Dhingra
Jacob Eisenstein
UQLM
26
26
0
24 May 2023
Efficient Prompting via Dynamic In-Context Learning
Wangchunshu Zhou
Yuchen Eleanor Jiang
Ryan Cotterell
Mrinmaya Sachan
29
19
0
18 May 2023
A Unified Evaluation Framework for Novelty Detection and Accommodation in NLP with an Instantiation in Authorship Attribution
Neeraj Varshney
Himanshu Gupta
Eric Robertson
Bin Liu
Chitta Baral
27
1
0
08 May 2023
Post-Abstention: Towards Reliably Re-Attempting the Abstained Instances in QA
Neeraj Varshney
Chitta Baral
39
13
0
02 May 2023
Batch Prompting: Efficient Inference with Large Language Model APIs
Zhoujun Cheng
Jungo Kasai
Tao Yu
LRM
11
72
0
19 Jan 2023
Can Open-Domain QA Reader Utilize External Knowledge Efficiently like Humans?
Neeraj Varshney
Man Luo
Chitta Baral
RALM
21
11
0
23 Nov 2022
Unsupervised Natural Language Inference Using PHL Triplet Generation
Neeraj Varshney
Pratyay Banerjee
Tejas Gokhale
Chitta Baral
23
9
0
16 Oct 2021
Towards Zero-Label Language Learning
Zirui Wang
Adams Wei Yu
Orhan Firat
Yuan Cao
SyDa
188
102
0
19 Sep 2021
The Lottery Ticket Hypothesis for Pre-trained BERT Networks
Tianlong Chen
Jonathan Frankle
Shiyu Chang
Sijia Liu
Yang Zhang
Zhangyang Wang
Michael Carbin
156
345
0
23 Jul 2020
Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference
Timo Schick
Hinrich Schütze
258
1,589
0
21 Jan 2020
Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
Sheng Shen
Zhen Dong
Jiayu Ye
Linjian Ma
Z. Yao
A. Gholami
Michael W. Mahoney
Kurt Keutzer
MQ
236
576
0
12 Sep 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
299
6,984
0
20 Apr 2018