Seq-VCR: Preventing Collapse in Intermediate Transformer Representations for Enhanced Reasoning
arXiv:2411.02344 (v2, latest), 4 November 2024
Md Rifat Arefin, G. Subbaraj, Nicolas Angelard-Gontier, Yann LeCun, Irina Rish, Ravid Shwartz-Ziv, C. Pal

Papers citing "Seq-VCR: Preventing Collapse in Intermediate Transformer Representations for Enhanced Reasoning" (27 papers shown)
  • Does Representation Matter? Exploring Intermediate Layers in Large Language Models. Oscar Skean, Md Rifat Arefin, Yann LeCun, Ravid Shwartz-Ziv. 12 Dec 2024.
  • Dissecting Multiplication in Transformers: Insights into LLMs. Luyu Qiu, Jianing Li, Chi Su, C. Zhang, Lei Chen. 22 Jul 2024.
  • From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step. Yuntian Deng, Yejin Choi, Stuart M. Shieber. 23 May 2024.
  • Let's Think Dot by Dot: Hidden Computation in Transformer Language Models. Jacob Pfau, William Merrill, Samuel R. Bowman. 24 Apr 2024.
  • The Impact of Reasoning Step Length on Large Language Models. Mingyu Jin, Qinkai Yu, Dong Shu, Haiyan Zhao, Wenyue Hua, Yanda Meng, Yongfeng Zhang, Jundong Li. 10 Jan 2024.
  • The Linear Representation Hypothesis and the Geometry of Large Language Models. Kiho Park, Yo Joong Choe, Victor Veitch. 07 Nov 2023.
  • Implicit Chain of Thought Reasoning via Knowledge Distillation. Yuntian Deng, Kiran Prasad, Roland Fernandez, P. Smolensky, Vishrav Chaudhary, Stuart M. Shieber. 02 Nov 2023.
  • Think before you speak: Training Language Models With Pause Tokens. Sachin Goyal, Ziwei Ji, A. S. Rawat, A. Menon, Sanjiv Kumar, Vaishnavh Nagarajan. 03 Oct 2023.
  • GPT Can Solve Mathematical Problems Without a Calculator. Zhiyong Yang, Ming Ding, Qingsong Lv, Zhihuan Jiang, Zehai He, Yuyi Guo, Jinfeng Bai, Jie Tang. 06 Sep 2023.
  • Variance-Covariance Regularization Improves Representation Learning. Jiachen Zhu, Katrina Evtimova, Yubei Chen, Ravid Shwartz-Ziv, Yann LeCun. 23 Jun 2023.
  • Representational Strengths and Limitations of Transformers. Clayton Sanford, Daniel J. Hsu, Matus Telgarsky. 05 Jun 2023.
  • Let's Verify Step by Step. Hunter Lightman, V. Kosaraju, Yura Burda, Harrison Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, K. Cobbe. 31 May 2023.
  • Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective. Guhao Feng, Bohang Zhang, Yuntian Gu, Haotian Ye, Di He, Liwei Wang. 24 May 2023.
  • DiME: Maximizing Mutual Information by a Difference of Matrix-Based Entropies. Oscar Skean, J. Hoyos-Osorio, A. Brockmeier, L. S. Giraldo. 19 Jan 2023.
  • The Parallelism Tradeoff: Limitations of Log-Precision Transformers. William Merrill, Ashish Sabharwal. 02 Jul 2022.
  • Self-Consistency Improves Chain of Thought Reasoning in Language Models. Xuezhi Wang, Jason W. Wei, Dale Schuurmans, Quoc Le, Ed H. Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou. 21 Mar 2022.
  • Information Theory with Kernel Methods. Francis R. Bach. 17 Feb 2022.
  • Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou. 28 Jan 2022.
  • Understanding Dimensional Collapse in Contrastive Self-supervised Learning. Li Jing, Pascal Vincent, Yann LeCun, Yuandong Tian. 18 Oct 2021.
  • VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning. Adrien Bardes, Jean Ponce, Yann LeCun. 11 May 2021.
  • Barlow Twins: Self-Supervised Learning via Redundancy Reduction. Jure Zbontar, Li Jing, Ishan Misra, Yann LeCun, Stéphane Deny. 04 Mar 2021.
  • Bootstrap your own latent: A new approach to self-supervised Learning. Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre Harvey Richemond, ..., M. G. Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, Michal Valko. 13 Jun 2020.
  • Language Models are Few-Shot Learners. Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei. 28 May 2020.
  • COMET: Commonsense Transformers for Automatic Knowledge Graph Construction. Antoine Bosselut, Hannah Rashkin, Maarten Sap, Chaitanya Malaviya, Asli Celikyilmaz, Yejin Choi. 12 Jun 2019.
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. 11 Oct 2018.
  • Attention Is All You Need. Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin. 12 Jun 2017.
  • Measures of Entropy from Data Using Infinitely Divisible Kernels. Luis G. Sanchez Giraldo, M. Rao, José C. Príncipe. 11 Nov 2012.