# Miscellaneous

- [arXiv](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv.md): A free distribution service and an open-access archive.
  - [2024](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2024.md)
    - [Efficiently programming large language models using SGLang](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2024/sglang.md): LLM Inference
  - [2023](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2023.md)
    - [HexGen: Generative inference of foundation model over heterogeneous decentralized environment](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2023/hexgen.md)
    - [High-throughput generative inference of large language models with a single GPU](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2023/flexgen.md): An offloading framework for high-throughput LLM inference.
  - [2022](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2022.md)
    - [DisaggRec: Architecting disaggregated systems for large-scale personalized recommendation](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2022/disaggrec.md): #deep\_learning\_recommender\_system #memory\_disaggregation #total\_cost\_of\_ownership #RDMA
    - [A case for disaggregation of ML data processing](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2022/tf-data.md)
    - [Singularity: Planet-scale, preemptive and elastic scheduling of AI workloads](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2022/singularity.md): Live GPU job migration.
    - [Aryl: An elastic cluster scheduler for deep learning](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2022/aryl.md)
  - [2016](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2016.md)
    - [Wide & deep learning for recommender systems](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2016/wide-and-deep-learning-for-recommender-systems.md): A recommender system with a wide & deep model (WDL).
    - [Training deep nets with sublinear memory cost](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2016/training-deep-nets-with-sublinear-memory-cost.md): Reduce memory cost to store intermediate results and gradients.
- [MSR Technical Report](https://paper.lingyunyang.com/reading-notes/miscellaneous/msr-technical-report.md)
  - [2011](https://paper.lingyunyang.com/reading-notes/miscellaneous/msr-technical-report/2011.md)
    - [Heuristics for vector bin packing](https://paper.lingyunyang.com/reading-notes/miscellaneous/msr-technical-report/2011/heuristics-for-vector-bin-packing.md)