
ICML 2024

Meta Info

Homepage: https://icml.cc/Conferences/2024

Papers

Large Language Models (LLMs)

  • Serving LLMs

    • HexGen: Generative Inference of Foundation Model over Heterogeneous Decentralized Environment [Personal Notes] [arXiv] [Code]

      • HKUST & ETH & CMU

      • Support asymmetric tensor model parallelism and pipeline parallelism in heterogeneous settings (i.e., each pipeline stage can be assigned a different number of layers and a different tensor model parallel degree).

        • Propose a heuristic-based evolutionary algorithm to search for the optimal layout (a toy version of such a search is sketched after this list).

    • MuxServe: Flexible Spatial-Temporal Multiplexing for LLM Serving [arXiv] [Code]

      • CUHK & Shanghai AI Lab & HUST & SJTU & PKU & UC Berkeley & UCSD

      • Colocate multiple LLMs on shared GPUs according to their popularity to multiplex memory resources (see the colocation sketch after this list).

    • APIServe: Efficient API Support for Large-Language Model Inferencing [arXiv]

      • UCSD

  • Benchmark

  • Speculative decoding

  • Video generation

    • VideoPoet: A Large Language Model for Zero-Shot Video Generation [Paper] [Homepage]

      • Google & CMU

      • Employ a decoder-only transformer architecture that processes multimodal inputs, including images, videos, text, and audio (see the sketch after this list).

      • The pre-trained LLM is adapted to a range of video generation tasks.

  • Image retrieval
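
For the HexGen entry above: a minimal sketch of a heuristic evolutionary search that assigns layer counts and tensor model parallel (TP) degrees to heterogeneous pipeline stages. This is not HexGen's actual algorithm; the stage speeds, GPU counts, cost model, and mutation scheme below are hypothetical placeholders used only to illustrate searching over asymmetric layouts.

```python
# Toy evolutionary search over asymmetric pipeline/TP layouts (illustrative only).
import random

N_LAYERS = 32
STAGE_SPEEDS = [1.0, 0.7, 0.5]   # assumed relative GPU speed of each pipeline stage
STAGE_GPUS = [4, 2, 2]           # GPUs available per stage (upper bound on TP degree)

def tp_choices(gpus):
    # Candidate TP degrees: powers of two that fit on the stage's GPUs.
    return [d for d in (1, 2, 4, 8) if d <= gpus]

def random_layout():
    # Cut the layer range into one contiguous chunk per stage, pick a TP degree per stage.
    cuts = sorted(random.sample(range(1, N_LAYERS), len(STAGE_SPEEDS) - 1))
    layers = [b - a for a, b in zip([0] + cuts, cuts + [N_LAYERS])]
    tp = [random.choice(tp_choices(g)) for g in STAGE_GPUS]
    return layers, tp

def cost(layout):
    # Toy cost model: pipeline throughput is limited by the slowest stage;
    # TP speeds a stage up sub-linearly (0.8 scaling exponent, assumed).
    layers, tp = layout
    stage_time = [l / (speed * t ** 0.8)
                  for l, speed, t in zip(layers, STAGE_SPEEDS, tp)]
    return max(stage_time)

def mutate(layout):
    layers, tp = [list(x) for x in layout]
    i, j = random.sample(range(len(layers)), 2)
    if layers[i] > 1:                    # move one layer between two stages
        layers[i] -= 1
        layers[j] += 1
    k = random.randrange(len(tp))        # re-draw one stage's TP degree
    tp[k] = random.choice(tp_choices(STAGE_GPUS[k]))
    return layers, tp

def evolve(pop_size=32, generations=200):
    pop = [random_layout() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        survivors = pop[: pop_size // 2]           # keep the cheaper half
        pop = survivors + [mutate(random.choice(survivors)) for _ in survivors]
    return min(pop, key=cost)

best = evolve()
print("layers per stage:", best[0], "TP degrees:", best[1], "cost:", round(cost(best), 3))
```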
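
For the MuxServe entry above: a toy colocation pass that packs popular and less-popular models onto shared GPUs so that per-GPU request rates stay balanced while model weights (plus an assumed KV-cache headroom) fit in memory. This is a simplified illustration of popularity-aware colocation, not MuxServe's actual spatial-temporal multiplexing policy; all model sizes, request rates, and memory figures are made up.

```python
# Greedy popularity-aware colocation of LLMs onto shared GPUs (illustrative only).
GPU_MEMORY_GB = 80

models = [  # (name, weight memory in GB, requests per second) -- hypothetical values
    ("llm-a", 26, 45.0),
    ("llm-b", 26, 5.0),
    ("llm-c", 13, 30.0),
    ("llm-d", 13, 2.0),
    ("llm-e", 13, 1.0),
]

def colocate(models, n_gpus=2, kv_headroom_gb=8):
    """Assign each model (most popular first) to the GPU with the lowest total
    request rate that still has room for its weights plus KV-cache headroom."""
    gpus = [{"mem": GPU_MEMORY_GB, "rate": 0.0, "models": []} for _ in range(n_gpus)]
    for name, mem, rate in sorted(models, key=lambda m: -m[2]):
        need = mem + kv_headroom_gb
        candidates = [g for g in gpus if g["mem"] >= need]
        if not candidates:
            raise RuntimeError(f"no GPU can host {name}")
        target = min(candidates, key=lambda g: g["rate"])   # least-loaded GPU
        target["mem"] -= need
        target["rate"] += rate
        target["models"].append(name)
    return gpus

for i, gpu in enumerate(colocate(models)):
    print(f"GPU {i}: {gpu['models']} total rate={gpu['rate']} req/s, free mem={gpu['mem']} GB")
```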
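
For the VideoPoet entry above: a minimal decoder-only transformer (PyTorch) that models one shared token sequence in which tokens from different modalities are simply concatenated. The tokenizers, vocabulary split, and model sizes here are assumptions for illustration; this is not the VideoPoet architecture or code.

```python
# Decoder-only transformer over a unified multimodal token sequence (illustrative only).
import torch
import torch.nn as nn

class TinyDecoderOnlyLM(nn.Module):
    def __init__(self, vocab_size=16384, d_model=256, n_heads=4, n_layers=4, max_len=1024):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        # tokens: (batch, seq) integer ids drawn from one shared multimodal vocabulary
        b, t = tokens.shape
        pos = torch.arange(t, device=tokens.device)
        x = self.tok_emb(tokens) + self.pos_emb(pos)
        causal = nn.Transformer.generate_square_subsequent_mask(t).to(tokens.device)
        x = self.blocks(x, mask=causal)          # causal self-attention over the sequence
        return self.head(x)                      # next-token logits over all modalities

# Hypothetical usage: text tokens followed by video tokens form a single sequence.
text_tokens = torch.randint(0, 1000, (1, 16))      # assumed text tokenizer output
video_tokens = torch.randint(1000, 9000, (1, 64))  # assumed video tokenizer output
seq = torch.cat([text_tokens, video_tokens], dim=1)
logits = TinyDecoderOnlyLM()(seq)
print(logits.shape)  # (1, 80, 16384)
```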
