arXiv: 2306.11670
GIO: Gradient Information Optimization for Training Dataset Selection

20 June 2023
Dante Everaert
Christopher Potts

Papers citing "GIO: Gradient Information Optimization for Training Dataset Selection"

28 / 28 papers shown
MixMin: Finding Data Mixtures via Convex Minimization
Anvith Thudi
Evianne Rovers
Yangjun Ruan
Tristan Thrush
Chris J. Maddison
101
0
0
14 Feb 2025
Improving Pretraining Data Using Perplexity Correlations
Tristan Thrush
Christopher Potts
Tatsunori Hashimoto
92
22
0
09 Sep 2024
Data Selection for Language Models via Importance Resampling
Sang Michael Xie
Shibani Santurkar
Tengyu Ma
Percy Liang
123
196
0
06 Feb 2023
Beyond neural scaling laws: beating power law scaling via data pruning
Ben Sorscher
Robert Geirhos
Shashank Shekhar
Surya Ganguli
Ari S. Morcos
100
444
0
29 Jun 2022
Dataset Pruning: Reducing Training Data by Examining Generalization Influence
Shuo Yang
Zeke Xie
Hanyu Peng
Minjing Xu
Mingming Sun
P. Li
DD
228
114
0
19 May 2022
ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction
Keshav Santhanam
Omar Khattab
Jon Saad-Falcon
Christopher Potts
Matei A. Zaharia
112
417
0
02 Dec 2021
NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework
Xingcheng Yao
Yanan Zheng
Xiaocong Yang
Zhilin Yang
59
45
0
07 Nov 2021
The Perils of Using Mechanical Turk to Evaluate Open-Ended Text Generation
Marzena Karpinska
Nader Akoury
Mohit Iyyer
281
108
0
14 Sep 2021
Deep Learning on a Data Diet: Finding Important Examples Early in Training
Mansheej Paul
Surya Ganguli
Gintare Karolina Dziugaite
121
462
0
15 Jul 2021
What's in the Box? A Preliminary Analysis of Undesirable Content in the Common Crawl Corpus
A. Luccioni
J. Viviano
86
118
0
06 May 2021
Training data-efficient image transformers & distillation through attention
Hugo Touvron
Matthieu Cord
Matthijs Douze
Francisco Massa
Alexandre Sablayrolles
Hervé Jégou
ViT
389
6,813
0
23 Dec 2020
NeuSpell: A Neural Spelling Correction Toolkit
Sai Muralidhar Jayanthi
Danish Pruthi
Graham Neubig
KELM, LRM
82
67
0
21 Oct 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
904
42,520
0
28 May 2020
MPNet: Masked and Permuted Pre-training for Language Understanding
Kaitao Song
Xu Tan
Tao Qin
Jianfeng Lu
Tie-Yan Liu
111
1,138
0
20 Apr 2020
MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
Wenhui Wang
Furu Wei
Li Dong
Hangbo Bao
Nan Yang
Ming Zhou
VLM
186
1,284
0
25 Feb 2020
Scalable and Generalizable Social Bot Detection through Data Selection
Kai-Cheng Yang
Onur Varol
Pik-Mai Hui
Filippo Menczer
68
327
0
20 Nov 2019
CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data
Guillaume Wenzek
Marie-Anne Lachaux
Alexis Conneau
Vishrav Chaudhary
Francisco Guzmán
Armand Joulin
Edouard Grave
119
658
0
01 Nov 2019
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
M. Lewis
Yinhan Liu
Naman Goyal
Marjan Ghazvininejad
Abdel-rahman Mohamed
Omer Levy
Veselin Stoyanov
Luke Zettlemoyer
AIMat, VLM
266
10,880
0
29 Oct 2019
BatchBALD: Efficient and Diverse Batch Acquisition for Deep Bayesian Active Learning
Andreas Kirsch
Joost R. van Amersfoort
Y. Gal
FedML
89
629
0
19 Jun 2019
fairseq: A Fast, Extensible Toolkit for Sequence Modeling
Myle Ott
Sergey Edunov
Alexei Baevski
Angela Fan
Sam Gross
Nathan Ng
David Grangier
Michael Auli
VLM, FaML
132
3,156
0
01 Apr 2019
Impact of Data Pruning on Machine Learning Algorithm Performance
Arun Thundyill Saseendran
Lovish Setia
V. Chhabria
D. Chakraborty
A. Roy
30
6
0
11 Jan 2019
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
808
132,725
0
12 Jun 2017
Spelling Correction as a Foreign Language
Yingbo Zhou
U. Porwal
Roberto Konow
45
21
0
21 May 2017
Deep Bayesian Active Learning with Image Data
Y. Gal
Riashat Islam
Zoubin Ghahramani
BDL, UQCV
75
1,739
0
08 Mar 2017
Billion-scale similarity search with GPUs
Jeff Johnson
Matthijs Douze
Hervé Jégou
257
3,741
0
28 Feb 2017
Deep Residual Learning for Image Recognition
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
MedIm
2.3K
194,641
0
10 Dec 2015
Neural Machine Translation of Rare Words with Subword Units
Rico Sennrich
Barry Haddow
Alexandra Birch
238
7,765
0
31 Aug 2015
Adam: A Method for Stochastic Optimization
Diederik P. Kingma
Jimmy Ba
ODL
2.1K
150,433
0
22 Dec 2014