Arch-LLM: Taming LLMs for Neural Architecture Generation via Unsupervised Discrete Representation Learning

28 March 2025
Deshani Geethika Poddenige
Sachith Seneviratne
Damith A. Senanayake
Mahesan Niranjan
PN Suganthan
Saman K. Halgamuge
Abstract

Unsupervised representation learning has been widely explored across various modalities, including neural architectures, where it plays a key role in downstream applications such as Neural Architecture Search (NAS). These methods typically learn an unsupervised representation space before generating or sampling architectures for the downstream search. A common approach uses Variational Autoencoders (VAEs) to map discrete architectures onto a continuous representation space; however, sampling from these spaces often yields a high percentage of invalid or duplicate neural architectures. This could be due to the unnatural mapping of an inherently discrete architectural space onto a continuous one, which highlights the need for a robust discrete representation of these architectures. To address this, we introduce a Vector Quantized Variational Autoencoder (VQ-VAE) to learn a discrete latent space that is more naturally aligned with discrete neural architectures. In contrast to VAEs, VQ-VAEs (i) map each architecture to a discrete code sequence and (ii) allow the prior to be learned by any generative model rather than being assumed to be a normal distribution. We then represent these architecture latent codes as numerical sequences and train a text-to-text model, leveraging a Large Language Model, to learn and generate sequences representing architectures. We evaluate our method on Inception- and ResNet-like cell-based search spaces, namely NAS-Bench-101 and NAS-Bench-201. Compared to VAE-based methods, our approach improves the generation of valid and unique architectures by over 80% on NAS-Bench-101 and over 8% on NAS-Bench-201. Finally, we demonstrate the applicability of our method to NAS using a sequence-modeling-based NAS algorithm.
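
For readers unfamiliar with the quantization step the abstract relies on, the sketch below illustrates, in a generic way, how a VQ-VAE maps a continuous encoder output to a sequence of discrete code indices via nearest-codebook lookup. This is not the authors' implementation; the module name, codebook size, embedding dimension, and loss weighting are illustrative assumptions.

```python
# Minimal sketch (assumed hyperparameters, not the paper's code) of the
# vector-quantization step that turns continuous encoder outputs into a
# discrete code sequence, which could then be serialized for a text-to-text model.
import torch
import torch.nn as nn


class VectorQuantizer(nn.Module):
    def __init__(self, num_codes: int = 128, code_dim: int = 16, beta: float = 0.25):
        super().__init__()
        # Learnable codebook of discrete embeddings (sizes are illustrative).
        self.codebook = nn.Embedding(num_codes, code_dim)
        nn.init.uniform_(self.codebook.weight, -1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta  # commitment-loss weight

    def forward(self, z_e: torch.Tensor):
        # z_e: encoder output of shape (batch, seq_len, code_dim)
        flat = z_e.reshape(-1, z_e.shape[-1])
        # Squared Euclidean distance from each latent to every codebook vector.
        dists = (flat.pow(2).sum(1, keepdim=True)
                 - 2 * flat @ self.codebook.weight.t()
                 + self.codebook.weight.pow(2).sum(1))
        indices = dists.argmin(dim=1)              # discrete code ids
        z_q = self.codebook(indices).view_as(z_e)  # quantized latents
        # Codebook loss + commitment loss (standard VQ-VAE objective).
        loss = ((z_q - z_e.detach()).pow(2).mean()
                + self.beta * (z_e - z_q.detach()).pow(2).mean())
        # Straight-through estimator so gradients flow back to the encoder.
        z_q = z_e + (z_q - z_e).detach()
        return z_q, indices.view(z_e.shape[:-1]), loss
```

In a pipeline like the one described in the abstract, the returned integer indices could be rendered as a plain numerical sequence (for example, space-separated code ids) and used as training targets for a text-to-text language model; that serialization choice is an assumption here, not a detail taken from the paper.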

@article{poddenige2025_2503.22063,
  title={Arch-LLM: Taming LLMs for Neural Architecture Generation via Unsupervised Discrete Representation Learning},
  author={Deshani Geethika Poddenige and Sachith Seneviratne and Damith Senanayake and Mahesan Niranjan and PN Suganthan and Saman Halgamuge},
  journal={arXiv preprint arXiv:2503.22063},
  year={2025}
}