Innamark: A Whitespace Replacement Information-Hiding Method

18 February 2025

Abstract

Large language models (LLMs) have gained significant popularity in recent years. Differentiating between a text written by a human and one generated by an LLM has become almost impossible. Information-hiding techniques such as digital watermarking or steganography can help by embedding information inside text in a form that is unlikely to be noticed. However, existing techniques, such as linguistic-based or format-based methods, either change the semantics or cannot be applied to pure, unformatted text. In this paper, we introduce a novel method for information hiding called Innamark, which can conceal any byte-encoded sequence within a sufficiently long cover text. The method is implemented as a multi-platform library in the Kotlin programming language and is accompanied by a command-line tool and a web interface. By substituting conventional whitespace characters with visually similar Unicode whitespace characters, our proposed scheme preserves the semantics of the cover text without changing the number of characters. Furthermore, we propose a specified structure for secret messages that enables configurable compression, encryption, hashing, and error correction. An experimental benchmark on a dataset of 1,000,000 Wikipedia articles compares ten algorithms. The results demonstrate the robustness of our proposed Innamark method in various applications and the imperceptibility of its watermarks to humans. We discuss the limits to the embedding capacity and robustness of the algorithm and how these could be addressed in future work.
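
The whitespace-substitution idea described in the abstract can be illustrated with a minimal Python sketch. This is not the Innamark library or its API: the four-character alphabet, the function names `embed` and `extract`, and the need to pass the payload length to `extract` are all assumptions made for illustration, and the sketch omits the compression, encryption, hashing, and error-correction structure the paper proposes. It also assumes the cover text contains only regular spaces (U+0020) as whitespace.

```python
# Hypothetical 2-bit alphabet: the index in this list is a base-4 digit.
# These code points are an illustrative assumption, not Innamark's actual alphabet.
ALPHABET = ["\u0020", "\u2004", "\u2009", "\u205f"]


def embed(cover: str, payload: bytes) -> str:
    """Replace regular spaces in `cover` so that they encode `payload`."""
    # Split each byte into four base-4 digits, most significant first.
    digits = [(byte >> shift) & 0b11 for byte in payload for shift in (6, 4, 2, 0)]
    capacity = cover.count(" ")
    if len(digits) > capacity:
        raise ValueError(f"cover has {capacity} spaces, payload needs {len(digits)}")
    pending = iter(digits)
    out = []
    for ch in cover:
        if ch == " ":
            # Spaces beyond the end of the payload stay regular spaces (digit 0).
            out.append(ALPHABET[next(pending, 0)])
        else:
            out.append(ch)
    return "".join(out)


def extract(marked: str, n_bytes: int) -> bytes:
    """Read `n_bytes` of payload back from the whitespace in `marked`."""
    digits = [ALPHABET.index(ch) for ch in marked if ch in ALPHABET]
    if len(digits) < 4 * n_bytes:
        raise ValueError("not enough whitespace characters for the requested length")
    result = bytearray()
    for i in range(0, 4 * n_bytes, 4):
        a, b, c, d = digits[i:i + 4]
        result.append((a << 6) | (b << 4) | (c << 2) | d)
    return bytes(result)
```

With this assumed alphabet each space carries two bits, so a cover text needs at least four spaces per payload byte. Because every replacement is a single code point, the marked text has the same number of characters as the cover, which matches the property stated in the abstract.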

@article{hellmeier2025_2502.12710,
  title={Innamark: A Whitespace Replacement Information-Hiding Method},
  author={Malte Hellmeier and Hendrik Norkowski and Ernst-Christoph Schrewe and Haydar Qarawlus and Falk Howar},
  journal={arXiv preprint arXiv:2502.12710},
  year={2025}
}
