LLM-BT-Terms: Back-Translation as a Framework for Terminology Standardization and Dynamic Semantic Embedding
The rapid growth of English technical terms challenges traditional expert-driven standardization, especially in fast-evolving fields such as AI and quantum computing, where manual methods struggle to ensure multilingual consistency. We propose \textbf{LLM-BT}, a back-translation framework powered by large language models (LLMs) that automates terminology verification and standardization via cross-lingual semantic alignment. Our contributions are: \textbf{(1) Term-Level Consistency Validation:} Using English $\to$ intermediate language $\to$ English back-translation, LLM-BT achieves high term consistency across models (e.g., GPT-4, DeepSeek, Grok), with case studies showing over 90\% exact or semantic matches. \textbf{(2) Multi-Path Verification Workflow:} A novel ``Retrieve--Generate--Verify--Optimize'' pipeline integrates serial (e.g., EN $\to$ ZH-CN $\to$ ZH-TW $\to$ EN) and parallel (e.g., EN $\to$ Chinese/Portuguese $\to$ EN) BT routes. BLEU scores and term accuracy indicate strong cross-lingual robustness (BLEU of 0.45; Portuguese term accuracy of 100\%). \textbf{(3) Back-Translation as Semantic Embedding:} BT is reconceptualized as a form of dynamic semantic embedding that reveals latent trajectories of meaning. Unlike static embeddings, LLM-BT provides transparent, path-based embeddings shaped by the evolution of the underlying models. Together, these contributions turn back-translation into an active engine for multilingual terminology standardization and enable human--AI collaboration: machines ensure semantic fidelity, while humans guide cultural interpretation. This infrastructure supports terminology governance across scientific and technological fields worldwide.
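The multi-path workflow can be made concrete with a short sketch. The Python below illustrates the serial (EN $\to$ ZH-CN $\to$ ZH-TW $\to$ EN) and parallel (EN $\to$ Chinese/Portuguese $\to$ EN) BT routes together with a term-level consistency check; llm_translate is a hypothetical placeholder for whatever LLM API is used (GPT-4, DeepSeek, Grok), and the string-similarity check merely stands in for semantic matching. It is an illustrative sketch under those assumptions, not the authors' implementation.

from difflib import SequenceMatcher

def llm_translate(text: str, src: str, tgt: str) -> str:
    """Placeholder: call an LLM of choice (GPT-4, DeepSeek, Grok, ...) to
    translate `text` from language `src` to language `tgt`."""
    raise NotImplementedError("wire this to an LLM translation API")

def back_translate(term: str, route: list[str]) -> str:
    """Push `term` through a serial BT route such as
    ['en', 'zh-cn', 'zh-tw', 'en'], translating one hop at a time and
    returning the recovered English text."""
    text = term
    for src, tgt in zip(route, route[1:]):
        text = llm_translate(text, src, tgt)
    return text

def term_consistency(original: str, recovered: str, threshold: float = 0.9) -> str:
    """Classify a BT result as an exact match, a (proxy) semantic match, or a
    mismatch. String similarity is only a stand-in here; a real system would
    compare embeddings or ask an LLM judge."""
    if original.strip().lower() == recovered.strip().lower():
        return "exact"
    score = SequenceMatcher(None, original.lower(), recovered.lower()).ratio()
    return "semantic" if score >= threshold else "mismatch"

# Serial and parallel routes from the abstract; language codes are illustrative.
SERIAL_ROUTE = ["en", "zh-cn", "zh-tw", "en"]
PARALLEL_ROUTES = [["en", "zh-cn", "en"], ["en", "pt", "en"]]

def verify_term(term: str) -> dict[str, str]:
    """Run one term through every route and report per-route consistency."""
    results = {"serial": term_consistency(term, back_translate(term, SERIAL_ROUTE))}
    for route in PARALLEL_ROUTES:
        results["->".join(route)] = term_consistency(term, back_translate(term, route))
    return results

A term that returns as an ``exact'' or ``semantic'' match on every route counts toward the over-90\% agreement reported above; disagreement across routes flags the term for human review, reflecting the human--AI division of labor the abstract describes.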
@article{weigang2025_2506.08174,
  title={LLM-BT-Terms: Back-Translation as a Framework for Terminology Standardization and Dynamic Semantic Embedding},
  author={Li Weigang and Pedro Carvalho Brom},
  journal={arXiv preprint arXiv:2506.08174},
  year={2025}
}