# 2023

- [HexGen: Generative inference of foundation model over heterogeneous decentralized environment](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2023/hexgen.md)
- [High-throughput generative inference of large language models with a single GPU](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2023/flexgen.md): An offloading framework for high-throughput LLM inference.