ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2202.09509
25
8

PETCI: A Parallel English Translation Dataset of Chinese Idioms

19 February 2022
Kenan Tang
ArXivPDFHTML
Abstract

Idioms are an important language phenomenon in Chinese, but idiom translation is notoriously hard. Current machine translation models perform poorly on idiom translation, while idioms are sparse in many translation datasets. We present PETCI, a parallel English translation dataset of Chinese idioms, aiming to improve idiom translation by both human and machine. The dataset is built by leveraging human and machine effort. Baseline generation models show unsatisfactory abilities to improve translation, but structure-aware classification models show good performance on distinguishing good translations. Furthermore, the size of PETCI can be easily increased without expertise. Overall, PETCI can be helpful to language learners and machine translation systems.

View on arXiv
Comments on this paper