ResearchTrend.AI
The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only

1 June 2023
Guilherme Penedo
Quentin Malartic
Daniel Hesslow
Ruxandra-Aimée Cojocaru
Alessandro Cappelli
Hamza Alobeidli
B. Pannier
Ebtesam Almazrouei
Julien Launay

Papers citing "The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only"

50 / 587 papers shown
Universal Vulnerabilities in Large Language Models: Backdoor Attacks for In-context Learning
Shuai Zhao
Meihuizi Jia
Anh Tuan Luu
Fengjun Pan
Jinming Wen
AAML
31
36
0
11 Jan 2024
Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems
Tianyu Cui
Yanling Wang
Chuanpu Fu
Yong Xiao
Sijia Li
...
Junwu Xiong
Xinyu Kong
Zujie Wen
Ke Xu
Qi Li
63
57
0
11 Jan 2024
TeleChat Technical Report
Zhongjiang He
Zihan Wang
Xinzhan Liu
Shixuan Liu
Yitong Yao
...
Zilu Huang
Sishi Xiong
Yuxiang Zhang
Chao Wang
Shuangyong Song
AI4MH
LRM
ALM
66
3
0
08 Jan 2024
LightHouse: A Survey of AGI Hallucination
Feng Wang
LRM
HILM
VLM
32
3
0
08 Jan 2024
MERBench: A Unified Evaluation Benchmark for Multimodal Emotion Recognition
Zheng Lian
Guoying Zhao
Yong Ren
Hao Gu
Haiyang Sun
Lan Chen
Bin Liu
Jianhua Tao
26
12
0
07 Jan 2024
PIXAR: Auto-Regressive Language Modeling in Pixel Space
Yintao Tai
Xiyang Liao
Alessandro Suglia
Antonio Vergari
MLLM
26
7
0
06 Jan 2024
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
DeepSeek-AI
Xiao Bi
Deli Chen
Guanting Chen
...
Yao Zhao
Shangyan Zhou
Shunfeng Zhou
Qihao Zhu
Yuheng Zou
LRM
ALM
139
309
0
05 Jan 2024
Introducing Bode: A Fine-Tuned Large Language Model for Portuguese Prompt-Based Task
Gabriel Lino Garcia
P. H. Paiola
Luis Henrique Morelli
Giovani Candido
Arnaldo Cândido Júnior
D. Jodas
Luis C. S. Afonso
I. R. Guilherme
B. Penteado
João Paulo Papa
24
11
0
05 Jan 2024
PLLaMa: An Open-source Large Language Model for Plant Science
Xianjun Yang
Junfeng Gao
Wenxin Xue
Erik Alexandersson
38
19
0
03 Jan 2024
Quokka: An Open-source Large Language Model ChatBot for Material Science
Xianjun Yang
Stephen D. Wilson
Linda R. Petzold
OSLM
37
2
0
02 Jan 2024
Taking the Next Step with Generative Artificial Intelligence: The Transformative Role of Multimodal Large Language Models in Science Education
Arne Bewersdorff
Christian Hartmann
Marie Hornberger
Kathrin Seßler
Maria Bannert
Enkelejda Kasneci
Gjergji Kasneci
Xiaoming Zhai
Claudia Nerdel
29
29
0
01 Jan 2024
DocLLM: A layout-aware generative language model for multimodal document understanding
Dongsheng Wang
Natraj Raman
Mathieu Sibue
Zhiqiang Ma
Petr Babkin
Simerjot Kaur
Yulong Pei
Armineh Nourbakhsh
Xiaomo Liu
VLM
22
52
0
31 Dec 2023
Olapa-MCoT: Enhancing the Chinese Mathematical Reasoning Capability of LLMs
Shaojie Zhu
Zhaobin Wang
Chengxiang Zhuo
Hui Lu
Bo Hu
Zang Li
LRM
32
0
0
29 Dec 2023
LLM Factoscope: Uncovering LLMs' Factual Discernment through Inner States Analysis
Jinwen He
Yujia Gong
Kai-xiang Chen
Zijin Lin
Chengán Wei
Yue Zhao
32
3
0
27 Dec 2023
PersianLLaMA: Towards Building First Persian Large Language Model
Mohammad Amin Abbasi
A. Ghafouri
Mahdi Firouzmandi
Hassan Naderi
B. Minaei-Bidgoli
27
9
0
25 Dec 2023
EcomGPT-CT: Continual Pre-training of E-commerce Large Language Models with Semi-structured Data
Shirong Ma
Shen Huang
Shulin Huang
Xiaobin Wang
Yangning Li
Hai-Tao Zheng
Pengjun Xie
Fei Huang
Yong-jia Jiang
48
6
0
25 Dec 2023
YAYI 2: Multilingual Open-Source Large Language Models
Yin Luo
Qingchao Kong
Nan Xu
Jia Cao
Bao Hao
...
Zhaoxin Yu
Zhengda Luo
Wenji Mao
Lei Wang
Dajun Zeng
ALM
OSLM
45
7
0
22 Dec 2023
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen
Jiannan Wu
Wenhai Wang
Weijie Su
Guo Chen
...
Bin Li
Ping Luo
Tong Lu
Yu Qiao
Jifeng Dai
VLM
MLLM
176
943
0
21 Dec 2023
Fine-tuning Large Language Models for Adaptive Machine Translation
Yasmin Moslem
Rejwanul Haque
Andy Way
28
25
0
20 Dec 2023
Turning English-centric LLMs Into Polyglots: How Much Multilinguality Is Needed?
Tannon Kew
Florian Schottmann
Rico Sennrich
LRM
34
36
0
20 Dec 2023
ALMANACS: A Simulatability Benchmark for Language Model Explainability
Edmund Mills
Shiye Su
Stuart J. Russell
Scott Emmons
56
7
0
20 Dec 2023
Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model
Shraman Pramanick
Guangxing Han
Rui Hou
Sayan Nag
Ser-Nam Lim
Nicolas Ballas
Qifan Wang
Rama Chellappa
Amjad Almahairi
VLM
MLLM
48
29
0
19 Dec 2023
Curated LLM: Synergy of LLMs and Data Curation for tabular augmentation in low-data regimes
Nabeel Seedat
Nicolas Huynh
B. V. Breugel
M. Schaar
30
25
0
19 Dec 2023
Paloma: A Benchmark for Evaluating Language Model Fit
Ian H. Magnusson
Akshita Bhagia
Valentin Hofmann
Luca Soldaini
A. Jha
...
Iz Beltagy
Hanna Hajishirzi
Noah A. Smith
Kyle Richardson
Jesse Dodge
134
21
0
16 Dec 2023
WordScape: a Pipeline to extract multilingual, visually rich Documents with Layout Annotations from Web Crawl Data
Maurice Weber
Carlo Siebenschuh
Rory Butler
Anton Alexandrov
Valdemar Thanner
...
Haris Jabbar
Ian Foster
Bo-wen Li
Rick L. Stevens
Ce Zhang
21
4
0
15 Dec 2023
LLM-MARS: Large Language Model for Behavior Tree Generation and NLP-enhanced Dialogue in Multi-Agent Robot Systems
Artem Lykov
Maria Dronova
Nikolay Naglov
Mikhail Litvinov
Sergei Satsevich
Artem Bazhenov
Vladimir Berman
Aleksei Shcherbak
Dzmitry Tsetserukou
LLMAG
LM&Ro
29
14
0
14 Dec 2023
LLM360: Towards Fully Transparent Open-Source LLMs
Zhengzhong Liu
Aurick Qiao
Willie Neiswanger
Hongyi Wang
Bowen Tan
...
Zhiting Hu
Mark Schulze
Preslav Nakov
Timothy Baldwin
Eric Xing
49
70
0
11 Dec 2023
NLLG Quarterly arXiv Report 09/23: What are the most influential current AI Papers?
Ran Zhang
Aida Kostikova
Christoph Leiter
Jonas Belouadi
Daniil Larionov
Yanran Chen
Vivian Fresen
Steffen Eger
42
0
0
09 Dec 2023
GPT-4V with Emotion: A Zero-shot Benchmark for Generalized Emotion Recognition
Zheng Lian
Guoying Zhao
Haiyang Sun
Kang Chen
Zhuofan Wen
Hao Gu
Bin Liu
Jianhua Tao
31
29
0
07 Dec 2023
Towards Measuring Representational Similarity of Large Language Models
Max Klabunde
Mehdi Ben Amor
Michael Granitzer
Florian Lemmerich
42
2
0
05 Dec 2023
Building Trustworthy NeuroSymbolic AI Systems: Consistency, Reliability, Explainability, and Safety
Manas Gaur
Amit P. Sheth
26
17
0
05 Dec 2023
A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly
Yifan Yao
Jinhao Duan
Kaidi Xu
Yuanfang Cai
Eric Sun
Yue Zhang
PILM
ELM
44
476
0
04 Dec 2023
Zero- and Few-Shots Knowledge Graph Triplet Extraction with Large Language Models
Andrea Papaluca
Daniel Krefl
Sergio Mendez Rodriguez
Artem Lenskiy
Hanna Suominen
26
2
0
04 Dec 2023
Unnatural Error Correction: GPT-4 Can Almost Perfectly Handle Unnatural Scrambled Text
Qi Cao
Takeshi Kojima
Yutaka Matsuo
Yusuke Iwasawa
17
18
0
30 Nov 2023
Zero-shot Conversational Summarization Evaluations with small Large Language Models
R. Manuvinakurike
Saurav Sahay
Sangeeta Manepalli
L. Nachman
ELM
LM&MA
30
0
0
29 Nov 2023
Taiwan LLM: Bridging the Linguistic Divide with a Culturally Aligned Language Model
Yen-Ting Lin
Yun-Nung Chen
40
20
0
29 Nov 2023
ChatGPT's One-year Anniversary: Are Open-Source Large Language Models Catching up?
Hailin Chen
Fangkai Jiao
Xingxuan Li
Chengwei Qin
Mathieu Ravaut
Ruochen Zhao
Caiming Xiong
Chenyu You
ELM
CLL
AI4MH
LRM
ALM
85
27
0
28 Nov 2023
The Claire French Dialogue Dataset
Julie Hunter
Jérôme Louradour
Virgile Rennard
Ismail Harrando
Guokan Shang
Jean-Pierre Lorré
29
1
0
28 Nov 2023
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization
Zhiyuan Zhao
Bin Wang
Linke Ouyang
Xiao-wen Dong
Jiaqi Wang
Conghui He
MLLM
VLM
32
106
0
28 Nov 2023
A Survey of the Evolution of Language Model-Based Dialogue Systems
Hongru Wang
Lingzhi Wang
Yiming Du
Liang Chen
Jing Zhou
Yufei Wang
Kam-Fai Wong
LRM
67
21
0
28 Nov 2023
MEDITRON-70B: Scaling Medical Pretraining for Large Language Models
Zeming Chen
Alejandro Hernández Cano
Angelika Romanou
Antoine Bonnet
Kyle Matoba
...
Axel Marmet
Syrielle Montariol
Mary-Anne Hartley
Martin Jaggi
Antoine Bosselut
LM&MA
AI4MH
MedIm
53
179
0
27 Nov 2023
YUAN 2.0: A Large Language Model with Localized Filtering-based Attention
Shaohua Wu
Xudong Zhao
Shenling Wang
Jiangang Luo
Lingjun Li
...
Wei Wang
Tong Yu
Rongguo Zhang
Jiahua Zhang
Chao Wang
OSLM
53
6
0
27 Nov 2023
Towards Vision Enhancing LLMs: Empowering Multimodal Knowledge Storage and Sharing in LLMs
Yunxin Li
Baotian Hu
Wei Wang
Xiaochun Cao
Min Zhang
24
4
0
27 Nov 2023
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
A. Blattmann
Tim Dockhorn
Sumith Kulal
Daniel Mendelevitch
Maciej Kilian
...
Zion English
Vikram S. Voleti
Adam Letts
Varun Jampani
Robin Rombach
VGen
180
1,029
0
25 Nov 2023
PrivateLoRA For Efficient Privacy Preserving LLM
Yiming Wang
Yu Lin
Xiaodong Zeng
Guannan Zhang
66
11
0
23 Nov 2023
Oasis: Data Curation and Assessment System for Pretraining of Large Language Models
Tong Zhou
Yubo Chen
Pengfei Cao
Kang Liu
Jun Zhao
Shengping Liu
29
3
0
21 Nov 2023
AcademicGPT: Empowering Academic Research
Shufa Wei
Xiaolong Xu
Xianbiao Qi
Xi Yin
Jun Xia
...
Chihao Dai
Lihua Wang
Xiaohui Liu
Lei Zhang
Yutao Xie
LM&MA
47
3
0
21 Nov 2023
MAFALDA: A Benchmark and Comprehensive Study of Fallacy Detection and Classification
Chadi Helwe
Tom Calamai
Pierre-Henri Paris
Chloé Clavel
Fabian M. Suchanek
23
1
0
16 Nov 2023
P^3SUM: Preserving Author's Perspective in News Summarization with Diffusion Language Models
Yuhan Liu
Shangbin Feng
Xiaochuang Han
Vidhisha Balachandran
Chan Young Park
Sachin Kumar
Yulia Tsvetkov
DiffM
44
2
0
16 Nov 2023
You don't need a personality test to know these models are unreliable: Assessing the Reliability of Large Language Models on Psychometric Instruments
Bangzhao Shu
Lechen Zhang
Minje Choi
Lavinia Dunagan
Lajanugen Logeswaran
Moontae Lee
Dallas Card
David Jurgens
27
33
0
16 Nov 2023