Cropper: Vision-Language Model for Image Cropping through In-Context Learning

14 August 2024
Seung Hyun Lee
Junjie Ke
Yinxiao Li
Junfeng He
Steven Hickson
Katie Datsenko
Sangpil Kim
Ming-Hsuan Yang
Irfan Essa
Feng Yang
Abstract

The goal of image cropping is to identify visually appealing crops in an image. Conventional methods are trained on specific datasets and fail to adapt to new requirements. Recent breakthroughs in large vision-language models (VLMs) enable visual in-context learning without explicit training. However, downstream tasks with VLMs remain underexplored. In this paper, we propose an effective approach to leverage VLMs for image cropping. First, we propose an efficient prompt retrieval mechanism for image cropping to automate the selection of in-context examples. Second, we introduce an iterative refinement strategy to progressively enhance the predicted crops. The proposed framework, which we refer to as Cropper, is applicable to a wide range of cropping tasks, including free-form cropping, subject-aware cropping, and aspect ratio-aware cropping. Extensive experiments demonstrate that Cropper significantly outperforms state-of-the-art methods across several benchmarks.
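As an illustration only, the sketch below mirrors the two ideas named in the abstract: retrieving in-context examples for the prompt, and iteratively refining the predicted crop. It is a minimal interpretation based solely on the abstract; the example bank, feature embeddings, similarity measure, and the vlm_predict callable are hypothetical stand-ins, not the authors' implementation or any specific VLM API.

# Minimal sketch (assumptions noted above); Python, no external dependencies.
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Crop = Tuple[int, int, int, int]  # (x, y, width, height)


@dataclass
class Example:
    """An image/crop pair kept in a small retrieval bank (hypothetical)."""
    feature: Sequence[float]   # precomputed embedding of the example image
    crop: Crop                 # reference crop for that image


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb + 1e-8)


def retrieve_examples(query_feature: Sequence[float],
                      bank: List[Example],
                      k: int = 4) -> List[Example]:
    """Prompt retrieval: pick the k bank examples most similar to the query image."""
    ranked = sorted(bank,
                    key=lambda e: cosine_similarity(query_feature, e.feature),
                    reverse=True)
    return ranked[:k]


def refine_crop(vlm_predict: Callable[[List[Example], Crop], Crop],
                examples: List[Example],
                initial_crop: Crop,
                num_iters: int = 3) -> Crop:
    """Iterative refinement: feed the previous prediction back to the VLM."""
    crop = initial_crop
    for _ in range(num_iters):
        crop = vlm_predict(examples, crop)
    return crop


if __name__ == "__main__":
    # Toy usage with a dummy predictor standing in for the VLM call.
    bank = [Example(feature=[1.0, 0.0], crop=(10, 10, 200, 150)),
            Example(feature=[0.0, 1.0], crop=(50, 40, 300, 200))]
    query_feature = [0.9, 0.1]

    def dummy_vlm(examples: List[Example], prev_crop: Crop) -> Crop:
        # Placeholder: nudge the crop toward the top retrieved example.
        x, y, w, h = prev_crop
        ex, ey, ew, eh = examples[0].crop
        return ((x + ex) // 2, (y + ey) // 2, (w + ew) // 2, (h + eh) // 2)

    shots = retrieve_examples(query_feature, bank, k=1)
    print(refine_crop(dummy_vlm, shots, initial_crop=(0, 0, 400, 300)))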

@article{lee2025_2408.07790,
  title={Cropper: Vision-Language Model for Image Cropping through In-Context Learning},
  author={Seung Hyun Lee and Jijun Jiang and Yiran Xu and Zhuofang Li and Junjie Ke and Yinxiao Li and Junfeng He and Steven Hickson and Katie Datsenko and Sangpil Kim and Ming-Hsuan Yang and Irfan Essa and Feng Yang},
  journal={arXiv preprint arXiv:2408.07790},
  year={2025}
}