KptLLM: Unveiling the Power of Large Language Model for Keypoint Comprehension

4 November 2024
Jie Yang
Wang Zeng
Sheng Jin
Lumin Xu
Wentao Liu
Chen Qian
Ruimao Zhang
    MLLM

Papers citing "KptLLM: Unveiling the Power of Large Language Model for Keypoint Comprehension"

46 / 46 papers shown
GREAT: Geometry-Intention Collaborative Inference for Open-Vocabulary 3D Object Affordance Grounding
Yawen Shao
Wei-dong Zhai
Yuhang Yang
Hongchen Luo
Yang Cao
Zheng-jun Zha
152
1
0
29 Nov 2024
LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control
Jianzhu Guo
Dingyun Zhang
Xiaoqiang Liu
Zhizhou Zhong
Yuan Zhang
Pengfei Wan
Di Zhang
VGen
120
68
0
03 Jul 2024
What matters when building vision-language models?
Hugo Laurençon
Léo Tronchon
Matthieu Cord
Victor Sanh
VLM
103
177
0
03 May 2024
PerceptionGPT: Effectively Fusing Visual Perception into LLM
Renjie Pi
Lewei Yao
Jiahui Gao
Jipeng Zhang
Tong Zhang
MLLM
75
36
0
11 Nov 2023
From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models
Dongsheng Jiang
Yuchen Liu
Songlin Liu
Jin'e Zhao
Hao Zhang
Zhen Gao
Xiaopeng Zhang
Jin Li
Hongkai Xiong
MLLM, VLM
72
39
0
13 Oct 2023
Ferret: Refer and Ground Anything Anywhere at Any Granularity
Haoxuan You
Haotian Zhang
Zhe Gan
Xianzhi Du
Bowen Zhang
Zirui Wang
Liangliang Cao
Shih-Fu Chang
Yinfei Yang
ObjD, MLLM, VLM
113
328
0
11 Oct 2023
Neural Interactive Keypoint Detection
Jie Yang
Ailing Zeng
Feng Li
Siyi Liu
Ruimao Zhang
Lei Zhang
3DV, 3DH
69
14
0
20 Aug 2023
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Shilong Zhang
Pei Sun
Shoufa Chen
Min Xiao
Wenqi Shao
Wenwei Zhang
Yu Liu
Kai-xiang Chen
Ping Luo
MLLM, VLM
154
238
0
07 Jul 2023
Kosmos-2: Grounding Multimodal Large Language Models to the World
Zhiliang Peng
Wenhui Wang
Li Dong
Y. Hao
Shaohan Huang
Shuming Ma
Furu Wei
MLLM, ObjD, VLM
121
764
0
26 Jun 2023
DetGPT: Detect What You Need via Reasoning
Renjie Pi
Jiahui Gao
Shizhe Diao
Boyao Wang
Hanze Dong
...
Lewei Yao
Jianhua Han
Hang Xu
Lingpeng Kong
Tong Zhang
LRM, LM&Ro
72
98
0
23 May 2023
Boosting Human-Object Interaction Detection with Text-to-Image Diffusion Model
Jie Yang
Bing Li
Fengyu Yang
Ailing Zeng
Lei Zhang
Ruimao Zhang
VLM, DiffM
105
17
0
20 May 2023
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
Wenliang Dai
Junnan Li
Dongxu Li
A. M. H. Tiong
Junqi Zhao
Weisheng Wang
Boyang Albert Li
Pascale Fung
Steven C. H. Hoi
MLLM, VLM
145
2,098
0
11 May 2023
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Qinghao Ye
Haiyang Xu
Guohai Xu
Jiabo Ye
Ming Yan
...
Junfeng Tian
Qiang Qi
Ji Zhang
Feiyan Huang
Jingren Zhou
VLM, MLLM
295
956
0
27 Apr 2023
HumanSD: A Native Skeleton-Guided Diffusion Model for Human Image Generation
Xu Ju
Ailing Zeng
Chenchen Zhao
Jianan Wang
Lei Zhang
Qian Xu
DiffM
72
92
0
09 Apr 2023
Sigmoid Loss for Language Image Pre-Training
Xiaohua Zhai
Basil Mustafa
Alexander Kolesnikov
Lucas Beyer
CLIP, VLM
257
1,200
0
27 Mar 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG, MLLM
1.5K
14,761
0
15 Mar 2023
Adding Conditional Control to Text-to-Image Diffusion Models
Lvmin Zhang
Anyi Rao
Maneesh Agrawala
AI4CE
184
4,180
1
10 Feb 2023
Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation
Jie Yang
Ailing Zeng
Siyi Liu
Feng Li
Ruimao Zhang
Lei Zhang
88
58
0
03 Feb 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM, MLLM
432
4,656
0
30 Jan 2023
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM, VLM
418
3,610
0
29 Apr 2022
ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation
Yufei Xu
Jing Zhang
Qiming Zhang
Dacheng Tao
ViT
92
538
0
26 Apr 2022
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Gaurav Mishra
...
Kathy Meier-Hellstern
Douglas Eck
J. Dean
Slav Petrov
Noah Fiedel
PILM, LRM
535
6,301
0
05 Apr 2022
AP-10K: A Benchmark for Animal Pose Estimation in the Wild
Hang Yu
Yufei Xu
Jing Zhang
Wei Zhao
Ziyu Guan
Dacheng Tao
86
113
0
28 Aug 2021
Human Pose Regression with Residual Log-likelihood Estimation
Jiefeng Li
Siyuan Bian
Ailing Zeng
Can Wang
Bo Pang
Wentao Liu
Cewu Lu
71
197
0
23 Jul 2021
LoRA: Low-Rank Adaptation of Large Language Models
J. E. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRL, AI4TS, AI4CE, ALM, AIMat
502
10,526
0
17 Jun 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP, VLM
1.0K
29,926
0
26 Feb 2021
InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image
Gyeongsik Moon
Shoou-I Yu
He Wen
Takaaki Shiratori
Kyoung Mu Lee
3DH
86
293
0
21 Aug 2020
Differentiable Hierarchical Graph Grouping for Multi-Person Pose Estimation
Sheng Jin
Wentao Liu
Enze Xie
Wenhai Wang
Chen Qian
Wanli Ouyang
Ping Luo
3DH
94
126
0
23 Jul 2020
Whole-Body Human Pose Estimation in the Wild
Sheng Jin
Lumin Xu
Jin Xu
Can Wang
Wentao Liu
Chen Qian
Wanli Ouyang
Ping Luo
3DH
204
247
0
23 Jul 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
904
42,463
0
28 May 2020
HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation
Bowen Cheng
Bin Xiao
Jingdong Wang
Humphrey Shi
Thomas S. Huang
Lei Zhang
3DH
81
676
0
27 Aug 2019
Single-Stage Multi-Person Pose Machines
Xuecheng Nie
Jianfeng Zhang
Shuicheng Yan
Jiashi Feng
3DH
165
221
0
24 Aug 2019
Multi-person Articulated Tracking with Spatial and Temporal Embeddings
Sheng Jin
Wentao Liu
Wanli Ouyang
Chen Qian
124
75
0
21 Mar 2019
Deep High-Resolution Representation Learning for Human Pose Estimation
Ke Sun
Bin Xiao
Dong Liu
Jingdong Wang
3DV
138
4,063
0
25 Feb 2019
DeepFashion2: A Versatile Benchmark for Detection, Pose Estimation, Segmentation and Re-Identification of Clothing Images
Yuying Ge
Ruimao Zhang
Lingyun Wu
Xiaogang Wang
Xiaoou Tang
Ping Luo
57
350
0
23 Jan 2019
CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark
Jiefeng Li
Can Wang
Hao Zhu
Yihuan Mao
Haoshu Fang
Cewu Lu
70
512
0
02 Dec 2018
Look at Boundary: A Boundary-Aware Face Alignment Algorithm
Wayne Wu
Chen Qian
Shuo Yang
Quan Wang
Yici Cai
Qiang-feng Zhou
CVBM, 3DV
75
439
0
26 May 2018
Simple Baselines for Human Pose Estimation and Tracking
Bin Xiao
Haiping Wu
Yichen Wei
3DH, VOT
128
1,793
0
17 Apr 2018
Cascaded Pyramid Network for Multi-Person Pose Estimation
Yilun Chen
Zhicheng Wang
Yuxiang Peng
Zhiqiang Zhang
Gang Yu
Jian Sun
144
1,428
0
20 Nov 2017
Compositional Human Pose Regression
Xiao Sun
Jiaxiang Shang
Shuang Liang
Yichen Wei
3DH
104
531
0
01 Apr 2017
Prototypical Networks for Few-shot Learning
Jake C. Snell
Kevin Swersky
R. Zemel
305
8,164
0
15 Mar 2017
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Chelsea Finn
Pieter Abbeel
Sergey Levine
OOD
833
11,961
0
09 Mar 2017
Stacked Hourglass Networks for Human Pose Estimation
Alejandro Newell
Kaiyu Yang
Jia Deng
3DH
119
5,037
0
22 Mar 2016
Convolutional Pose Machines
S. Wei
V. Ramakrishna
T. Kanade
Yaser Sheikh
3DV, SSL
112
2,749
0
30 Jan 2016
Microsoft COCO: Common Objects in Context
Tsung-Yi Lin
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
ObjD
437
43,875
0
01 May 2014
DeepPose: Human Pose Estimation via Deep Neural Networks
Alexander Toshev
Christian Szegedy
3DH
186
2,781
0
17 Dec 2013