ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2312.11312
17
0

APE-then-QE: Correcting then Filtering Pseudo Parallel Corpora for MT Training Data Creation

18 December 2023
Akshay Batheja
S. Deoghare
Diptesh Kanojia
Pushpak Bhattacharyya
ArXivPDFHTML
Abstract

Automatic Post-Editing (APE) is the task of automatically identifying and correcting errors in the Machine Translation (MT) outputs. We propose a repair-filter-use methodology that uses an APE system to correct errors on the target side of the MT training data. We select the sentence pairs from the original and corrected sentence pairs based on the quality scores computed using a Quality Estimation (QE) model. To the best of our knowledge, this is a novel adaptation of APE and QE to extract quality parallel corpus from the pseudo-parallel corpus. By training with this filtered corpus, we observe an improvement in the Machine Translation system's performance by 5.64 and 9.91 BLEU points, for English-Marathi and Marathi-English, over the baseline model. The baseline model is the one that is trained on the whole pseudo-parallel corpus. Our work is not limited by the characteristics of English or Marathi languages; and is language pair-agnostic, given the necessary QE and APE data.

View on arXiv
Comments on this paper