Efficient LLM Training and Serving with Heterogeneous Context Sharding among Attention Heads

Papers citing "Efficient LLM Training and Serving with Heterogeneous Context Sharding among Attention Heads"