ICML 2024
Meta Info
Homepage: https://icml.cc/Conferences/2024
Papers
Large Language Models (LLMs)
Serving LLMs
HexGen: Generative Inference of Foundation Model over Heterogeneous Decentralized Environment [Personal Notes] [arXiv] [Code]
HKUST & ETH & CMU
Support asymmetric tensor model parallelism and pipeline parallelism under the heterogeneous setting (i.e., each pipeline stage can be assigned a different number of layers and a different tensor model parallel degree).
Propose a heuristic-based evolutionary algorithm to search for the optimal parallelism layout (see the sketch below).
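A minimal sketch of the kind of asymmetric layout and evolutionary search described above. The GPU speeds, the cost model, the mutation rules, and all names below are hypothetical illustrations for intuition, not HexGen's actual implementation.

```python
# Toy illustration: search over asymmetric (layers, TP degree) layouts
# with a simple evolutionary loop. Everything here is hypothetical.
import random
from dataclasses import dataclass

GPU_SPEED = [1.0, 0.6, 0.4]   # hypothetical relative throughput of the
                              # heterogeneous device group hosting each stage
TOTAL_LAYERS = 32             # layers of the model to place
NUM_STAGES = len(GPU_SPEED)

@dataclass
class Stage:
    num_layers: int   # layers assigned to this pipeline stage
    tp_degree: int    # tensor model parallel degree within the stage

def random_layout() -> list[Stage]:
    """Sample a layout: each stage gets its own layer count and TP degree."""
    cuts = sorted(random.sample(range(1, TOTAL_LAYERS), NUM_STAGES - 1))
    sizes = [b - a for a, b in zip([0] + cuts, cuts + [TOTAL_LAYERS])]
    return [Stage(s, random.choice([1, 2, 4])) for s in sizes]

def cost(layout: list[Stage]) -> float:
    """Toy cost: throughput is bounded by the slowest pipeline stage; TP
    speeds a stage up but pays a small communication penalty."""
    per_stage = [
        st.num_layers / (st.tp_degree * GPU_SPEED[i]) + 0.2 * (st.tp_degree - 1)
        for i, st in enumerate(layout)
    ]
    return max(per_stage)

def mutate(layout: list[Stage]) -> list[Stage]:
    """Perturb one stage: move a layer to its neighbor or change its TP degree."""
    new = [Stage(st.num_layers, st.tp_degree) for st in layout]
    i = random.randrange(NUM_STAGES)
    if random.random() < 0.5:
        j = (i + 1) % NUM_STAGES
        if new[i].num_layers > 1:
            new[i].num_layers -= 1
            new[j].num_layers += 1
    else:
        new[i].tp_degree = random.choice([1, 2, 4])
    return new

def evolve(generations: int = 200, population: int = 16) -> list[Stage]:
    """Keep the fittest half of the population and refill it with mutants."""
    pool = [random_layout() for _ in range(population)]
    for _ in range(generations):
        pool.sort(key=cost)
        survivors = pool[: population // 2]
        pool = survivors + [mutate(random.choice(survivors))
                            for _ in range(population - len(survivors))]
    return min(pool, key=cost)

if __name__ == "__main__":
    best = evolve()
    print([(st.num_layers, st.tp_degree) for st in best], round(cost(best), 2))
```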
APIServe: Efficient API Support for Large-Language Model Inferencing [arXiv]
UCSD
Speculative decoding
Online Speculative Decoding [arXiv]
UC Berkeley & UCSD & Sisu Data & SJTU
Video generation