arXiv: 2402.01613
Nomic Embed: Training a Reproducible Long Context Text Embedder


2 February 2024
Zach Nussbaum
John X. Morris
Brandon Duderstadt
Andriy Mulyar

Papers citing "Nomic Embed: Training a Reproducible Long Context Text Embedder"

50 / 73 papers shown
The Future is Sparse: Embedding Compression for Scalable Retrieval in Recommender Systems
Petr Kasalický
Martin Spišák
Vojtěch Vančura
Daniel Bohuněk
Rodrigo Alves
Pavel Kordík
12
0
0
16 May 2025
S-DAT: A Multilingual, GenAI-Driven Framework for Automated Divergent Thinking Assessment
J. Haase
P. Hanel
Sebastian Pokutta
LRM
21
0
0
14 May 2025
DRA-GRPO: Exploring Diversity-Aware Reward Adjustment for R1-Zero-Like Training of Large Language Models
Xiwen Chen
Wenhui Zhu
Peijie Qiu
Xuanzhao Dong
Hao Wang
Haiyu Wu
Huayu Li
Aristeidis Sotiras
Yunhong Wang
Abolfazl Razi
ALM
42
0
0
14 May 2025
Hakim: Farsi Text Embedding Model
Mehran Sarmadi
Morteza Alikhani
Erfan Zinvandi
Zahra Pourbahman
VLM
28
0
0
13 May 2025
Leveraging Vision-Language Models for Visual Grounding and Analysis of Automotive UI
Benjamin Raphael Ernhofer
Daniil Prokhorov
Jannica Langner
Dominik Bollmann
39
0
0
09 May 2025
Griffin: Towards a Graph-Centric Relational Database Foundation Model
Yanbo Wang
Xiyuan Wang
Quan Gan
Minjie Wang
Qibin Yang
David Wipf
Muhan Zhang
88
0
0
08 May 2025
R&B: Domain Regrouping and Data Mixture Balancing for Efficient Foundation Model Training
Albert Ge
Tzu-Heng Huang
John Cooper
Avi Trost
Ziyi Chu
Satya Sai Srinath Namburi GNVV
Ziyang Cai
Kendall Park
Nicholas Roberts
Frederic Sala
53
0
0
01 May 2025
Optimization of embeddings storage for RAG systems using quantization and dimensionality reduction techniques
Naamán Huerga-Pérez
Rubén Álvarez
Rubén Ferrero-Guillén
Alberto Martínez-Gutiérrez
Javier Díez-González
MQ
24
0
0
30 Apr 2025
MIEB: Massive Image Embedding Benchmark
Chenghao Xiao
Isaac Chung
Imene Kerboua
Jamie Stirling
Xin Zhang
Márton Kardos
Roman Solomatin
Noura Al Moubayed
K. Enevoldsen
Niklas Muennighoff
VLM
37
0
0
14 Apr 2025
Out of Style: RAG's Fragility to Linguistic Variation
Tianyu Cao
Neel Bhandari
Akhila Yerukola
Akari Asai
Maarten Sap
27
0
0
11 Apr 2025
SemEval-2025 Task 5: LLMs4Subjects -- LLM-based Automated Subject Tagging for a National Technical Library's Open-Access Catalog
Jennifer D’Souza
Sameer Sadruddin
Holger Israel
Mathias Begoin
Diana Slawig
62
5
0
09 Apr 2025
Can LLM-Driven Hard Negative Sampling Empower Collaborative Filtering? Findings and Potentials
Chu Zhao
Enneng Yang
Yuting Liu
Jianzhe Zhao
G. Guo
Xingwei Wang
28
0
0
07 Apr 2025
Toward a digital twin of U.S. Congress
Hayden Helm
Tianyi Chen
Harvey McGuinness
Paige Lee
Brandon Duderstadt
Carey E. Priebe
29
0
0
04 Apr 2025
Advancing Semantic Caching for LLMs with Domain-Specific Embeddings and Synthetic Data
Waris Gill
Justin Cechmanek
Tyler Hutcherson
Srijith Rajamohan
Jen Agarwal
Muhammad Ali Gulzar
Manvinder Singh
Benoit Dion
35
0
0
03 Apr 2025
InteractiveSurvey: An LLM-based Personalized and Interactive Survey Paper Generation System
Zhiyuan Wen
Jiannong Cao
Zian Wang
Beichen Guo
Ruosong Yang
Shuaiqi Liu
36
0
0
31 Mar 2025
SemEval-2025 Task 9: The Food Hazard Detection Challenge
Korbinian Randl
John Pavlopoulos
Aron Henriksson
Tony Lindgren
Juli Bakagianni
35
2
0
25 Mar 2025
Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation
Tiansheng Wen
Yifei Wang
Zequn Zeng
Zhong Peng
Yudi Su
Xinyang Liu
Bo Chen
Hongwei Liu
Stefanie Jegelka
Chenyu You
CLL
68
3
0
03 Mar 2025
Granite Embedding Models
Parul Awasthy
Aashka Trivedi
Yulong Li
Mihaela A. Bornea
David D. Cox
...
Sukriti Sharma
Avirup Sil
Kate Soule
Arafat Sultan
Radu Florian
RALM
67
1
0
27 Feb 2025
DRAMA: Diverse Augmentation from Large Language Models to Smaller Dense Retrievers
Xueguang Ma
Xi Lin
Barlas Oğuz
Jimmy Lin
Wen-tau Yih
Xilun Chen
RALM
85
3
0
25 Feb 2025
LettuceDetect: A Hallucination Detection Framework for RAG Applications
Adam Kovacs
Gábor Recski
45
2
0
24 Feb 2025
Enhancing Domain-Specific Retrieval-Augmented Generation: Synthetic Data Generation and Evaluation using Reasoning Models
Aryan Jadon
Avinash Patil
Shashank Kumar
SyDa
52
1
0
21 Feb 2025
RAS: Retrieval-And-Structuring for Knowledge-Intensive LLM Generation
Pengcheng Jiang
Lang Cao
Ruike Zhu
Minhao Jiang
Yunyi Zhang
Jiashuo Sun
J. Han
RALM
85
0
0
16 Feb 2025
Mind the Gap! Choice Independence in Using Multilingual LLMs for Persuasive Co-Writing Tasks in Different Languages
Shreyan Biswas
Alexander Erlei
U. Gadiraju
105
4
0
13 Feb 2025
Knowledge Graph-Guided Retrieval Augmented Generation
Xiangrong Zhu
Yuexiang Xie
Yi Liu
Yaliang Li
Wei Hu
RALM
45
0
0
08 Feb 2025
Consistent estimation of generative model representations in the data kernel perspective space
Aranyak Acharyya
M. Trosset
Carey E. Priebe
Hayden Helm
DiffM
65
3
0
20 Jan 2025
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
Benjamin Warner
Antoine Chaffin
Benjamin Clavié
Orion Weller
Oskar Hallström
...
Tom Aarsen
Nathan Cooper
Griffin Adams
Jeremy Howard
Iacopo Poli
88
79
0
18 Dec 2024
LLMs are Also Effective Embedding Models: An In-depth Overview
Chongyang Tao
Tao Shen
Shen Gao
Junshuo Zhang
Zhen Li
Zhengwei Tao
Shuai Ma
83
7
0
17 Dec 2024
Experimenting with Multi-modal Information to Predict Success of Indian IPOs
Sohom Ghosh
Arnab Maji
N Harsha Vardhan
S. Naskar
69
0
0
08 Dec 2024
CoRNStack: High-Quality Contrastive Data for Better Code Retrieval and Reranking
Tarun Suresh
R. Reddy
Yifei Xu
Zach Nussbaum
Andriy Mulyar
Brandon Duderstadt
Heng Ji
89
3
0
01 Dec 2024
ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance & Efficiency on a Specific Domain
Ali Shiraee Kasmaee
Mohammad Khodadad
Mohammad Arshi Saloot
Nick Sherck
Stephen Dokas
H. Mahyar
Soheila Samiee
ELM
180
0
0
30 Nov 2024
Length-Induced Embedding Collapse in Transformer-based Models
Yuqi Zhou
Sunhao Dai
Zhanshuo Cao
Xiao Zhang
Jun Xu
47
0
0
31 Oct 2024
HyQE: Ranking Contexts with Hypothetical Query Embeddings
Weichao Zhou
Jiaxin Zhang
Hilaf Hasson
Anu Singh
Wenchao Li
RALM
25
1
0
20 Oct 2024
The Large Language Model GreekLegalRoBERTa
Vasileios Saketos
D. Pantazi
Manolis Koubarakis
AILaw
34
0
0
10 Oct 2024
Do You Know What You Are Talking About? Characterizing Query-Knowledge Relevance For Reliable Retrieval Augmented Generation
Zhuohang Li
Jiaxin Zhang
Chao Yan
Kamalika Das
Sricharan Kumar
Murat Kantarcioglu
Bradley Malin
RALM
21
1
0
10 Oct 2024
Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval
Pengcheng Jiang
Cao Xiao
Minhao Jiang
Parminder Bhatia
Taha A. Kass-Hout
Jimeng Sun
Jiawei Han
RALM
AI4MH
43
4
0
06 Oct 2024
Contextual Document Embeddings
John X. Morris
Alexander M. Rush
19
7
0
03 Oct 2024
Better Instruction-Following Through Minimum Bayes Risk
Ian Wu
Patrick Fernandes
Amanda Bertsch
Seungone Kim
Sina Pakazad
Graham Neubig
48
9
0
03 Oct 2024
ASAG2024: A Combined Benchmark for Short Answer Grading
Gérôme Meyer
Philip Breuer
Jonathan Fürst
ELM
19
1
0
27 Sep 2024
Towards Data-Centric RLHF: Simple Metrics for Preference Dataset Comparison
Judy Hanwen Shen
Archit Sharma
Jun Qin
42
4
0
15 Sep 2024
Unleashing Worms and Extracting Data: Escalating the Outcome of Attacks against RAG-based Inference in Scale and Severity Using Jailbreaking
Stav Cohen
Ron Bitton
Ben Nassi
44
4
0
12 Sep 2024
Ruri: Japanese General Text Embeddings
Hayato Tsukagoshi
Ryohei Sasano
24
1
0
12 Sep 2024
Late Chunking: Contextual Chunk Embeddings Using Long-Context Embedding Models
Michael Gunther
Isabelle Mohr
Daniel James Williams
Bo Wang
Han Xiao
27
9
0
07 Sep 2024
Understanding Generative AI Content with Embedding Models
Max Vargas
Reilly Cannon
A. Engel
Anand D. Sarwate
Tony Chiang
52
3
0
19 Aug 2024
A New Pipeline For Generating Instruction Dataset via RAG and Self Fine-Tuning
Chih-Wei Song
Yu-Kai Lee
Yin-Te Tsai
SyDa
ALM
32
4
0
12 Aug 2024
BioRAG: A RAG-LLM Framework for Biological Question Reasoning
Chengrui Wang
Qingqing Long
Meng Xiao
Xunxin Cai
Chengjun Wu
Zhen Meng
Xuezhi Wang
Yuanchun Zhou
44
26
0
02 Aug 2024
mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval
Xin Zhang
Yanzhao Zhang
Dingkun Long
Wen Xie
Ziqi Dai
...
Pengjun Xie
Fei Huang
Meishan Zhang
Wenjie Li
Min Zhang
42
73
0
29 Jul 2024
Enhancing Code Translation in Language Models with Few-Shot Learning via Retrieval-Augmented Generation
Manish Bhattarai
Javier E. Santos
Shawn Jones
Ayan Biswas
Boian Alexandrov
Dan O’Malley
37
9
0
29 Jul 2024
Embedding And Clustering Your Data Can Improve Contrastive Pretraining
Luke Merrick
13
3
0
26 Jul 2024
APTNESS: Incorporating Appraisal Theory and Emotion Support Strategies for Empathetic Response Generation
Yuxuan Hu
Minghuan Tan
Chenwei Zhang
Zixuan Li
Xiaodan Liang
Min Yang
Chengming Li
Xiping Hu
27
1
0
23 Jul 2024
ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities
Peng-Tao Xu
Ming-Yu Liu
Xianchao Wu
Zihan Liu
Mohammad Shoeybi
Bryan Catanzaro
RALM
52
14
0
19 Jul 2024