A Primer in BERTology: What we know about how BERT works

27 February 2020

Papers citing "A Primer in BERTology: What we know about how BERT works"

50 / 224 papers shown

Jekyll-and-Hyde Tipping Point in an AI's Behavior Neil F. Johnson Frank Yingjie Huo 46 0 0 29 Apr 2025
Deep Learning with Pretrained 'Internal World' Layers: A Gemma 3-Based Modular Architecture for Wildfire Prediction Ayoub Jadouli Chaker El Amrani KELM AI4TS 81 0 0 20 Apr 2025
Statistical Deficiency for Task Inclusion Estimation Loïc Fosse Frédéric Béchet Benoit Favre Géraldine Damnati Gwénolé Lecorvé Maxime Darrin Philippe Formont Pablo Piantanida 136 0 0 07 Mar 2025
A Survey of Model Architectures in Information Retrieval Zhichao Xu Fengran Mo Zhiqi Huang Crystina Zhang Puxuan Yu Bei Wang Jimmy J. Lin Vivek Srikumar KELM 3DV 56 2 0 21 Feb 2025
Integrating Language Models for Enhanced Network State Monitoring in DRL-Based SFC Provisioning Parisa Fard Moshiri Murat Arda Onsu Poonam Lohan Burak Kantarci Emil Janulewicz 39 0 0 16 Feb 2025
The Geometry of Tokens in Internal Representations of Large Language Models Karthik Viswanathan Yuri Gardinazzi Giada Panerai Alberto Cazzaniga Matteo Biagetti AIFin 94 4 0 17 Jan 2025
Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models Michael Toker Ido Galil Hadas Orgad Rinon Gal Yoad Tewel Gal Chechik Yonatan Belinkov DiffM 54 2 0 12 Jan 2025
Pixology: Probing the Linguistic and Visual Capabilities of Pixel-based Language Models Kushal Tatariya Vladimir Araujo Thomas Bauwens Miryam de Lhoneux VLM 33 0 0 15 Oct 2024
Ada-K Routing: Boosting the Efficiency of MoE-based LLMs Tongtian Yue Longteng Guo Jie Cheng Xuange Gao J. Liu MoE 36 0 0 14 Oct 2024
What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages Nadav Borenstein Anej Svete R. Chan Josef Valvoda Franz Nowak Isabelle Augenstein Eleanor Chodroff Ryan Cotterell 42 11 0 06 Jun 2024
Exploring Multilingual Large Language Models for Enhanced TNM classification of Radiology Report in lung cancer staging Hidetoshi Matsuo Mizuho Nishio Takaaki Matsunaga Koji Fujimoto Takamichi Murakami LM&MA 42 5 0 05 Jun 2024
Standards for Belief Representations in LLMs Daniel A. Herrmann B. Levinstein 39 7 0 31 May 2024
Are queries and keys always relevant? A case study on Transformer wave functions Riccardo Rende Luciano Loris Viteritti 24 5 0 29 May 2024
PhilHumans: Benchmarking Machine Learning for Personal Health Vadim Liventsev Vivek Kumar Allmin Pradhap Singh Susaiyah Zixiu "Alex" Wu Ivan Rodin ... Milan Petkovic Diego Reforgiato Recupero Ehud Reiter Daniele Riboni Raymond Sterling AI4MH LM&MA 34 0 0 04 May 2024
ViTHSD: Exploiting Hatred by Targets for Hate Speech Detection on Vietnamese Social Media Texts Cuong Nhat Vo Khanh Bao Huynh Son T. Luu Trong-Hop Do 45 1 0 30 Apr 2024
Large language models and linguistic intentionality J. Grindrod 38 5 0 15 Apr 2024
Transformers for molecular property prediction: Lessons learned from the past five years Afnan Sultan Jochen Sieg M. Mathea Andrea Volkamer AI4CE 29 10 0 05 Apr 2024
CSEPrompts: A Benchmark of Introductory Computer Science Prompts Md. Nishat Raihan Dhiman Goswami Sadiya Sayara Chowdhury Puspo Christian D. Newman Tharindu Ranasinghe Marcos Zampieri ELM 41 2 0 03 Apr 2024
Toward Informal Language Processing: Knowledge of Slang in Large Language Models Zhewei Sun Qian Hu Rahul Gupta Richard Zemel Yang Xu 38 1 0 02 Apr 2024
A Glitch in the Matrix? Locating and Detecting Language Model Grounding with Fakepedia Giovanni Monea Maxime Peyrard Martin Josifoski Vishrav Chaudhary Jason Eisner Emre Kiciman Hamid Palangi Barun Patra Robert West KELM 51 12 0 04 Dec 2023
Uncovering Intermediate Variables in Transformers using Circuit Probing Michael A. Lepori Thomas Serre Ellie Pavlick 75 7 0 07 Nov 2023
Setting the Trap: Capturing and Defeating Backdoors in Pretrained Language Models through Honeypots Ruixiang Tang Jiayi Yuan Yiming Li Zirui Liu Rui Chen Xia Hu AAML 36 13 0 28 Oct 2023
Codebook Features: Sparse and Discrete Interpretability for Neural Networks Alex Tamkin Mohammad Taufeeque Noah D. Goodman 32 27 0 26 Oct 2023
Kiki or Bouba? Sound Symbolism in Vision-and-Language Models Morris Alper Hadar Averbuch-Elor 33 10 0 25 Oct 2023
Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models Yifan Hou Jiaoda Li Yu Fei Alessandro Stolfo Wangchunshu Zhou Guangtao Zeng Antoine Bosselut Mrinmaya Sachan LRM 30 40 0 23 Oct 2023
Bridging Information-Theoretic and Geometric Compression in Language Models Emily Cheng Corentin Kervadec Marco Baroni 34 16 0 20 Oct 2023
The Temporal Structure of Language Processing in the Human Brain Corresponds to The Layered Hierarchy of Deep Language Models Ariel Goldstein Eric Ham Mariano Schain Samuel A. Nastase Zaid Zada ... Avinatan Hassidim O. Devinsky A. Flinker Omer Levy Uri Hasson AI4CE 15 10 0 11 Oct 2023
Why bother with geometry? On the relevance of linear decompositions of Transformer embeddings Timothee Mickus Raúl Vázquez 20 2 0 10 Oct 2023
Recurrent Neural Language Models as Probabilistic Finite-state Automata Anej Svete Ryan Cotterell 32 2 0 08 Oct 2023
Language Models Represent Space and Time Wes Gurnee Max Tegmark 33 141 0 03 Oct 2023
BenLLMEval: A Comprehensive Evaluation into the Potentials and Pitfalls of Large Language Models on Bengali NLP M. Kabir Mohammed Saidul Islam Md Tahmid Rahman Laskar Mir Tafseer Nayeem M Saiful Bari Enamul Hoque LM&MA 24 15 0 22 Sep 2023
Feature Engineering in Learning-to-Rank for Community Question Answering Task Nafis Sajid Md Rashidul Hasan Muhammad Ibrahim 21 3 0 14 Sep 2023
A Comparative Analysis of Pretrained Language Models for Text-to-Speech M. G. Moya Panagiota Karanasou S. Karlapati Bastian Schnell Nicole Peinelt Alexis Moinet Thomas Drugman 37 3 0 04 Sep 2023
A User-Centered Evaluation of Spanish Text Simplification Adrian de Wynter Anthony Hevia Si-Qing Chen 28 0 0 15 Aug 2023
Intelligent Assistant Language Understanding On Device Cecilia Aas Hisham Abdelsalam Irina Belousova Shruti Bhargava Jianpeng Cheng ... John Torr Marco Del Vecchio Jay Wacker Jason D. Williams Hong-ye Yu 13 2 0 07 Aug 2023
Generative Models as a Complex Systems Science: How can we make sense of large language model behavior? Ari Holtzman Peter West Luke Zettlemoyer AI4CE 30 14 0 31 Jul 2023
Exploring Anisotropy and Outliers in Multilingual Language Models for Cross-Lingual Semantic Sentence Similarity Katharina Hämmerl Alina Fastowski Jindrich Libovický Alexander M. Fraser 20 6 0 01 Jun 2023
A Method for Studying Semantic Construal in Grammatical Constructions with Interpretable Contextual Embedding Spaces Gabriella Chronis Kyle Mahowald K. Erk 18 8 0 29 May 2023
Plug-and-Play Document Modules for Pre-trained Models Chaojun Xiao Zhengyan Zhang Xu Han Chi-Min Chan Yankai Lin Zhiyuan Liu Xiangyang Li Zhonghua Li Zhao Cao Maosong Sun KELM 22 5 0 28 May 2023
Structural Ambiguity and its Disambiguation in Language Model Based Parsers: the Case of Dutch Clause Relativization G. Wijnholds M. Moortgat 10 3 0 24 May 2023
Automatic Readability Assessment for Closely Related Languages Joseph Marvin Imperial E. Kochmar 22 8 0 22 May 2023
Token-wise Decomposition of Autoregressive Language Model Hidden States for Analyzing Model Predictions Byung-Doh Oh William Schuler 29 2 0 17 May 2023
Explaining black box text modules in natural language with language models Chandan Singh Aliyah R. Hsu Richard Antonello Shailee Jain Alexander G. Huth Bin-Xia Yu Jianfeng Gao MILM 26 46 0 17 May 2023
Idioms, Probing and Dangerous Things: Towards Structural Probing for Idiomaticity in Vector Space Filip Klubicka Vasudevan Nedumpozhimana John D. Kelleher 33 4 0 27 Apr 2023
Integrating Image Features with Convolutional Sequence-to-sequence Network for Multilingual Visual Question Answering T. M. Thai Son T. Luu 37 0 0 22 Mar 2023
An Overview on Language Models: Recent Developments and Outlook Chengwei Wei Yun Cheng Wang Bin Wang C.-C. Jay Kuo 25 42 0 10 Mar 2023
STA: Self-controlled Text Augmentation for Improving Text Classifications Congcong Wang Gonzalo Fiz Pontiveros Steven Derby Tri Kurniawan Wijaya 40 3 0 24 Feb 2023
A Scalable Space-efficient In-database Interpretability Framework for Embedding-based Semantic SQL Queries P. Kudva R. Bordawekar Apoorva Nitsure 12 0 0 23 Feb 2023
Mask-guided BERT for Few Shot Text Classification Wenxiong Liao Zheng Liu Haixing Dai Zihao Wu Yiyang Zhang ... Dajiang Zhu Tianming Liu Sheng R. Li Xiang Li Hongmin Cai VLM 47 39 0 21 Feb 2023
Dynamic Named Entity Recognition Tristan Luiggi Laure Soulier Vincent Guigue Siwar Jendoubi Aurélien Baelde 28 0 0 16 Feb 2023