# ICML 2024

## Meta Info

Homepage: <https://icml.cc/Conferences/2024>

## Papers

### Large Language Models (LLMs)

* Serving LLMs
  * HexGen: Generative Inference of Foundation Model over Heterogeneous Decentralized Environment \[[Personal Notes](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2023/hexgen)] \[[arXiv](https://arxiv.org/abs/2311.11514)] \[[Code](https://github.com/Relaxed-System-Lab/HexGen)]
    * HKUST & ETH & CMU
    * Support *asymmetric* tensor model parallelism and pipeline parallelism in *heterogeneous* settings (i.e., each pipeline-parallel stage can be assigned a different number of layers and a different tensor-model-parallel degree).
      * Propose *a heuristic-based evolutionary algorithm* to search for the optimal layout.
  * MuxServe: Flexible Spatial-Temporal Multiplexing for LLM Serving \[[arXiv](https://arxiv.org/abs/2404.02015)] \[[Code](https://github.com/hao-ai-lab/MuxServe)]
    * CUHK & Shanghai AI Lab & HUST & SJTU & PKU & UC Berkeley & UCSD
    * Colocate LLMs based on their popularity to multiplex memory resources.
  * APIServe: Efficient API Support for Large-Language Model Inferencing \[[arXiv](https://arxiv.org/abs/2402.01869)]
    * UCSD
* Benchmark
  * Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference \[[arXiv](https://arxiv.org/abs/2403.04132)] \[[Demo](https://chat.lmsys.org)]
    * UC Berkeley
* Speculative decoding
  * Online Speculative Decoding \[[arXiv](https://arxiv.org/abs/2310.07177)]
    * UC Berkeley & UCSD & Sisu Data & SJTU
* Video generation
  * VideoPoet: A Large Language Model for Zero-Shot Video Generation \[[Paper](https://proceedings.mlr.press/v235/kondratyuk24a.html)] \[[Homepage](https://sites.research.google/videopoet/)]
    * Google & CMU
    * Employ a decoder-only transformer architecture that processes multimodal inputs, including images, videos, text, and audio.
    * The pre-trained LLM is adapted to a range of video generation tasks.
* Image retrieval
  * MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions \[[Paper](https://proceedings.mlr.press/v235/zhang24an.html)] \[[Homepage](https://open-vision-language.github.io/MagicLens/)] \[[Code](https://github.com/google-deepmind/magiclens)]
    * OSU & Google DeepMind
    * Enable multimodality-to-image, image-to-image, and text-to-image retrieval.
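
To make the HexGen note above more concrete, an asymmetric layout can be pictured as a list of pipeline stages, each with its own layer count and tensor-parallel degree. The sketch below is only an illustration under that reading of the note: `Stage`, `validate_layout`, and the example numbers are hypothetical and are not taken from the HexGen codebase, and the evolutionary search itself is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stage:
    """One pipeline-parallel stage (hypothetical type, not HexGen's API)."""
    num_layers: int  # transformer layers assigned to this stage
    tp_degree: int   # tensor-model-parallel degree, i.e. GPUs used by this stage


def validate_layout(stages: List[Stage], total_layers: int, total_gpus: int) -> int:
    """Check that a layout covers every layer and fits the GPU budget.

    Returns the number of GPUs the layout uses.
    """
    assigned = sum(s.num_layers for s in stages)
    if assigned != total_layers:
        raise ValueError(f"layout assigns {assigned} layers, expected {total_layers}")
    gpus = sum(s.tp_degree for s in stages)
    if gpus > total_gpus:
        raise ValueError(f"layout needs {gpus} GPUs, only {total_gpus} available")
    return gpus


# Asymmetric example: stages differ in both layer count and TP degree.
layout = [Stage(num_layers=20, tp_degree=4),
          Stage(num_layers=12, tp_degree=2),
          Stage(num_layers=8, tp_degree=1)]
gpus_used = validate_layout(layout, total_layers=40, total_gpus=8)
```

A search procedure such as the one the paper proposes would explore many candidate layouts of this shape; a validity check like the one above would only filter out infeasible candidates.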

## References

* [Google DeepMind at ICML 2024, 2024/07/19](https://deepmind.google/discover/blog/google-deepmind-at-icml-2024/)
