ICML 2024
Homepage:
Serving LLMs
HexGen: Generative Inference of Foundation Model over Heterogeneous Decentralized Environment
HKUST & ETH & CMU
Support asymmetric tensor model parallelism and pipeline parallelism in heterogeneous settings (i.e., each pipeline-parallel stage can be assigned a different number of layers and a different tensor model parallel degree).
Propose a heuristic-based evolutionary algorithm to search for the optimal parallel layout.
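To make the asymmetric-layout idea concrete, here is a minimal sketch of an evolutionary search over per-stage layer counts. It is not HexGen's algorithm: it assumes the device groups and their tensor-parallel degrees are already fixed, uses a hypothetical cost model (a stage's time is its layer count divided by tensor-parallel degree times a relative per-GPU speed, ignoring communication), and evolves only the layer split with single-layer mutations and elitist selection. The names `evolve_split`, `pipeline_cost`, and `mutate` are illustrative.

```python
import random
from typing import Sequence, Tuple

# Each stage group: (tp_degree, per_gpu_speed). Speeds are hypothetical relative numbers.
Group = Tuple[int, float]
Split = Tuple[int, ...]


def pipeline_cost(split: Split, groups: Sequence[Group]) -> float:
    """Toy cost model: stage time = layers / (tp_degree * per-GPU speed),
    ignoring communication; pipeline throughput is bounded by the slowest stage."""
    return max(n / (tp * speed) for n, (tp, speed) in zip(split, groups))


def mutate(split: Split, rng: random.Random) -> Split:
    """Move one layer between two random stages, keeping every stage non-empty."""
    layers = list(split)
    src, dst = rng.sample(range(len(layers)), 2)
    if layers[src] > 1:
        layers[src] -= 1
        layers[dst] += 1
    return tuple(layers)


def evolve_split(num_layers: int, groups: Sequence[Group], pop_size: int = 16,
                 generations: int = 200, seed: int = 0) -> Split:
    """(mu + lambda)-style evolutionary search over per-stage layer counts.
    Assumes at least two stages and num_layers >= number of stages."""
    rng = random.Random(seed)
    k = len(groups)
    uniform = [num_layers // k] * k
    uniform[-1] += num_layers - sum(uniform)  # remainder goes to the last stage
    population = [tuple(uniform)]
    while len(population) < pop_size:
        population.append(mutate(population[-1], rng))
    for _ in range(generations):
        children = [mutate(rng.choice(population), rng) for _ in range(pop_size)]
        # Elitist selection: keep the pop_size cheapest distinct layouts.
        population = sorted(set(population + children),
                            key=lambda s: pipeline_cost(s, groups))[:pop_size]
    return population[0]


if __name__ == "__main__":
    # Hypothetical cluster: TP=4 on fast GPUs, TP=2 on fast GPUs, TP=2 on half-speed GPUs.
    groups = [(4, 1.0), (2, 1.0), (2, 0.5)]
    best = evolve_split(28, groups)
    print(best, pipeline_cost(best, groups))
```

Because selection is elitist, the returned split is never worse than the uniform starting split. Under this toy model no split can beat 28 layers / 7 units of capacity = 4.0, which the proportional split (16, 8, 4) attains; a real search would also have to choose the device grouping and tensor-parallel degrees and account for inter-stage communication.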
MuxServe: Flexible Spatial-Temporal Multiplexing for LLM Serving
CUHK & Shanghai AI Lab & HUST & SJTU & PKU & UC Berkeley & UCSD
Colocate multiple LLMs based on their popularity to multiplex memory resources.
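As a rough illustration of popularity-aware colocation, the sketch below greedily places models onto GPUs: the most popular models go first, each onto the least-loaded GPU that still has room for its weights, so popular models tend to end up sharing a GPU with unpopular ones. This is a hypothetical simplification, not MuxServe's placement algorithm: it checks only weight memory, uses request rate as the popularity measure, and does not model KV-cache sharing or compute partitioning. `Model`, `Gpu`, and `place` are illustrative names, and all sizes and rates are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Model:
    name: str
    weight_gb: float  # memory needed for weights (hypothetical numbers)
    req_rate: float   # popularity, e.g. requests per second (hypothetical)


@dataclass
class Gpu:
    mem_gb: float
    models: List[Model] = field(default_factory=list)

    def free_gb(self) -> float:
        return self.mem_gb - sum(m.weight_gb for m in self.models)

    def load(self) -> float:
        return sum(m.req_rate for m in self.models)


def place(models: List[Model], gpus: List[Gpu]) -> Dict[str, int]:
    """Greedy popularity-aware colocation: most popular models first, each onto
    the least-loaded GPU that still has room for its weights."""
    placement: Dict[str, int] = {}
    for m in sorted(models, key=lambda x: x.req_rate, reverse=True):
        candidates = [i for i, g in enumerate(gpus) if g.free_gb() >= m.weight_gb]
        if not candidates:
            raise ValueError(f"no GPU has room for {m.name}")
        best = min(candidates, key=lambda i: gpus[i].load())
        gpus[best].models.append(m)
        placement[m.name] = best
    return placement


if __name__ == "__main__":
    models = [Model("a", 30, 10.0), Model("b", 30, 8.0),
              Model("c", 14, 1.0), Model("d", 14, 0.5)]
    gpus = [Gpu(80), Gpu(80)]
    print(place(models, gpus))  # {'a': 0, 'b': 1, 'c': 1, 'd': 1}
```

Here the two popular models land on different GPUs, and the two rarely used ones are packed next to `b`, which balances request load (10 vs. 9.5 req/s) while letting the low-traffic models share the leftover memory.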
APIServe: Efficient API Support for Large-Language Model Inferencing
UCSD
Benchmark
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
UC Berkeley
Speculative decoding
Online Speculative Decoding
UC Berkeley & UCSD & Sisu Data & SJTU
Video generation
VideoPoet: A Large Language Model for Zero-Shot Video Generation
Google & CMU
Employ a decoder-only transformer architecture that processes multimodal inputs, including images, videos, text, and audio.
The pre-trained LLM is adapted to a range of video generation tasks.
Image retrieval
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions
OSU & Google DeepMind
Enable multimodal-to-image, image-to-image, and text-to-image retrieval.