> For the complete documentation index, see [llms.txt](https://paper.lingyunyang.com/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://paper.lingyunyang.com/reading-notes.md).

# Reading Notes

- [Conference](https://paper.lingyunyang.com/reading-notes/conference.md)
- [SOSP 2026](https://paper.lingyunyang.com/reading-notes/conference/sosp-2026.md)
- [OSDI 2026](https://paper.lingyunyang.com/reading-notes/conference/osdi-2026.md)
- [SDCs in the Wild: Characterizing and Diagnosing SDC-defective GPUs in Production LLM Training](https://paper.lingyunyang.com/reading-notes/conference/osdi-2026/sdchunter.md)
- [Safeguarding LLM Training at Scale: Online SDC Detection and Insights from 35 Million GPU Hours](https://paper.lingyunyang.com/reading-notes/conference/osdi-2026/aegis.md)
- [ICML 2026](https://paper.lingyunyang.com/reading-notes/conference/icml-2026.md)
- [ISCA 2026](https://paper.lingyunyang.com/reading-notes/conference/isca-2026.md)
- [KernelEvolve: Scaling Agentic Kernel Coding for Heterogeneous AI Accelerators at Meta](https://paper.lingyunyang.com/reading-notes/conference/isca-2026/kernelevolve.md): #kernel\_generation #heterogeneous\_accelerators #dlrm
- [CAIS 2026](https://paper.lingyunyang.com/reading-notes/conference/cais-2026.md)
- [MLSys 2026](https://paper.lingyunyang.com/reading-notes/conference/mlsys-2026.md)
- [NSDI 2026](https://paper.lingyunyang.com/reading-notes/conference/nsdi-2026.md)
- [EuroSys 2026](https://paper.lingyunyang.com/reading-notes/conference/eurosys-2026.md)
- [ASPLOS 2026](https://paper.lingyunyang.com/reading-notes/conference/asplos-2026.md)
- [FAST 2026](https://paper.lingyunyang.com/reading-notes/conference/fast-2026.md)
- [Sharpen the Spec, Cut the Code: A Case for Generative File System with SYSSPEC](https://paper.lingyunyang.com/reading-notes/conference/fast-2026/sysspec.md): #generative\_file\_system #llm\_for\_systems #specification
- [HPCA 2026](https://paper.lingyunyang.com/reading-notes/conference/hpca-2026.md)
- [PPoPP 2026](https://paper.lingyunyang.com/reading-notes/conference/ppopp-2026.md)
- [SC 2025](https://paper.lingyunyang.com/reading-notes/conference/sc-2025.md)
- [SOSP 2025](https://paper.lingyunyang.com/reading-notes/conference/sosp-2025.md)
- [Jenga: Effective Memory Management for Serving LLM with Heterogeneity](https://paper.lingyunyang.com/reading-notes/conference/sosp-2025/jenga.md)
- [SIGCOMM 2025](https://paper.lingyunyang.com/reading-notes/conference/sigcomm-2025.md)
- [ICML 2025](https://paper.lingyunyang.com/reading-notes/conference/icml-2025.md)
- [ATC 2025](https://paper.lingyunyang.com/reading-notes/conference/atc-2025.md)
- [OSDI 2025](https://paper.lingyunyang.com/reading-notes/conference/osdi-2025.md)
- [ISCA 2025](https://paper.lingyunyang.com/reading-notes/conference/isca-2025.md)
- [SIGMETRICS 2025](https://paper.lingyunyang.com/reading-notes/conference/sigmetrics-2025.md)
- [HotOS 2025](https://paper.lingyunyang.com/reading-notes/conference/hotos-2025.md)
- [MLSys 2025](https://paper.lingyunyang.com/reading-notes/conference/mlsys-2025.md)
- [NSDI 2025](https://paper.lingyunyang.com/reading-notes/conference/nsdi-2025.md)
- [ASPLOS 2025](https://paper.lingyunyang.com/reading-notes/conference/asplos-2025.md)
- [EuroSys 2025](https://paper.lingyunyang.com/reading-notes/conference/eurosys-2025.md)
- [HPCA 2025](https://paper.lingyunyang.com/reading-notes/conference/hpca-2025.md)
- [PPoPP 2025](https://paper.lingyunyang.com/reading-notes/conference/ppopp-2025.md)
- [NeurIPS 2024](https://paper.lingyunyang.com/reading-notes/conference/neurips-2024.md)
- [SoCC 2024](https://paper.lingyunyang.com/reading-notes/conference/socc-2024.md)
- [HotNets 2024](https://paper.lingyunyang.com/reading-notes/conference/hotnets-2024.md)
- [SC 2024](https://paper.lingyunyang.com/reading-notes/conference/sc-2024.md)
- [SOSP 2024](https://paper.lingyunyang.com/reading-notes/conference/sosp-2024.md)
- [VLDB 2024](https://paper.lingyunyang.com/reading-notes/conference/vldb-2024.md)
- [SIGCOMM 2024](https://paper.lingyunyang.com/reading-notes/conference/sigcomm-2024.md)
- [ICML 2024](https://paper.lingyunyang.com/reading-notes/conference/icml-2024.md)
- [ATC 2024](https://paper.lingyunyang.com/reading-notes/conference/atc-2024.md)
- [OSDI 2024](https://paper.lingyunyang.com/reading-notes/conference/osdi-2024.md)
- [ISCA 2024](https://paper.lingyunyang.com/reading-notes/conference/isca-2024.md)
- [CVPR 2024](https://paper.lingyunyang.com/reading-notes/conference/cvpr-2024.md)
- [MLSys 2024](https://paper.lingyunyang.com/reading-notes/conference/mlsys-2024.md)
- [ASPLOS 2024](https://paper.lingyunyang.com/reading-notes/conference/asplos-2024.md)
- [SpotServe: Serving generative large language models on preemptible instances](https://paper.lingyunyang.com/reading-notes/conference/asplos-2024/spotserve.md)
- [EuroSys 2024](https://paper.lingyunyang.com/reading-notes/conference/eurosys-2024.md)
- [Orion: Interference-aware, fine-grained GPU sharing for ML applications](https://paper.lingyunyang.com/reading-notes/conference/eurosys-2024/orion-interference-aware-fine-grained-gpu-sharing-for-ml-applications.md)
- [Just-In-Time Checkpointing: Low Cost Error Recovery from Deep Learning Training Failures](https://paper.lingyunyang.com/reading-notes/conference/eurosys-2024/jit-checkpointing.md)
- [NSDI 2024](https://paper.lingyunyang.com/reading-notes/conference/nsdi-2024.md)
- [NeurIPS 2023](https://paper.lingyunyang.com/reading-notes/conference/neurips-2023.md)
- [SC 2023](https://paper.lingyunyang.com/reading-notes/conference/sc-2023.md)
- [Interference-aware multiplexing for deep learning in GPU clusters: A middleware approach](https://paper.lingyunyang.com/reading-notes/conference/sc-2023/iadeep.md)
- [SoCC 2023](https://paper.lingyunyang.com/reading-notes/conference/socc-2023.md)
- [SOSP 2023](https://paper.lingyunyang.com/reading-notes/conference/sosp-2023.md)
- [UGache: A unified GPU cache for embedding-based deep learning](https://paper.lingyunyang.com/reading-notes/conference/sosp-2023/ugache.md): #DLRM\_inference #GPU\_embedding\_cache
- [SIGCOMM 2023](https://paper.lingyunyang.com/reading-notes/conference/sigcomm-2023.md)
- [HotChips 2023](https://paper.lingyunyang.com/reading-notes/conference/hotchips-2023.md)
- [ICML 2023](https://paper.lingyunyang.com/reading-notes/conference/icml-2023.md)
- [ATC 2023](https://paper.lingyunyang.com/reading-notes/conference/atc-2023.md)
- [Accelerating Distributed MoE Training and Inference with Lina](https://paper.lingyunyang.com/reading-notes/conference/atc-2023/lina.md)
- [SmartMoE: Efficiently Training Sparsely-Activated Models ...](https://paper.lingyunyang.com/reading-notes/conference/atc-2023/smartmoe.md)
- [Beware of Fragmentation: Scheduling GPU-Sharing Workloads with Fragmentation Gradient Descent](https://paper.lingyunyang.com/reading-notes/conference/atc-2023/fgd.md)
- [OSDI 2023](https://paper.lingyunyang.com/reading-notes/conference/osdi-2023.md)
- [HotOS 2023](https://paper.lingyunyang.com/reading-notes/conference/hotos-2023.md)
- [SIGMOD 2023](https://paper.lingyunyang.com/reading-notes/conference/sigmod-2023.md)
- [ISCA 2023](https://paper.lingyunyang.com/reading-notes/conference/isca-2023.md)
- [MLSys 2023](https://paper.lingyunyang.com/reading-notes/conference/mlsys-2023.md)
- [EuroSys 2023](https://paper.lingyunyang.com/reading-notes/conference/eurosys-2023.md)
- [NSDI 2023](https://paper.lingyunyang.com/reading-notes/conference/nsdi-2023.md)
- [Shepherd: Serving DNNs in the wild](https://paper.lingyunyang.com/reading-notes/conference/nsdi-2023/shepherd.md): #model\_serving\_system #mixed-integer\_linear\_programming #workload\_unpredictability
- [Understanding RDMA microarchitecture resources for performance isolation](https://paper.lingyunyang.com/reading-notes/conference/nsdi-2023/husky.md): #RDMA #performance\_isolation #test\_suite #RNIC #virtual\_machine #RDMA\_microarchitecture\_resource
- [Skyplane: Optimizing transfer cost and throughput using cloud-aware overlays](https://paper.lingyunyang.com/reading-notes/conference/nsdi-2023/skyplane.md)
- [Shockwave: Fair and efficient cluster scheduling for dynamic adaptation in machine learning](https://paper.lingyunyang.com/reading-notes/conference/nsdi-2023/shockwave.md)
- [ASPLOS 2023](https://paper.lingyunyang.com/reading-notes/conference/asplos-2023.md)
- [TPP: Transparent page placement for CXL-enabled tiered-memory](https://paper.lingyunyang.com/reading-notes/conference/asplos-2023/tpp.md): #CXL #memory\_management #tiered\_memory #Linux\_kernel
- [EVStore: Storage and caching capabilities for scaling embedding tables in deep recommendation system](https://paper.lingyunyang.com/reading-notes/conference/asplos-2023/evstore.md): #deep\_learning\_recommender\_system #embedding\_lookup #recommendation\_inference #cache
- [Lucid: A non-intrusive, scalable and interpretable scheduler for deep learning training jobs](https://paper.lingyunyang.com/reading-notes/conference/asplos-2023/lucid.md): #deep\_learning\_training\_workloads #cluster\_scheduler #system\_interpretability #ML\_for\_System #decision\_tree #generalized\_additive\_model
- [SC 2022](https://paper.lingyunyang.com/reading-notes/conference/sc-2022.md)
- [SoCC 2022](https://paper.lingyunyang.com/reading-notes/conference/socc-2022.md)
- [ESCHER: Expressive scheduling with ephemeral resources](https://paper.lingyunyang.com/reading-notes/conference/socc-2022/escher.md): #ephemeral\_resources #scheduling\_flexibility #scheduling\_requirements #Kubernetes #Ray
- [Serving unseen deep learning model with near-optimal configurations: A fast adaptive search approach](https://paper.lingyunyang.com/reading-notes/conference/socc-2022/falcon.md)
- [SIGCOMM 2022](https://paper.lingyunyang.com/reading-notes/conference/sigcomm-2022.md)
- [Multi-resource interleaving for deep learning training](https://paper.lingyunyang.com/reading-notes/conference/sigcomm-2022/multi-resource-interleaving-for-deep-learning-training.md): #deep\_learning\_training\_workloads #multi\_resource\_scheduler #multi\_resource\_interleaving #PyTorch #iterative\_process #blossom\_algorithm
- [ATC 2022](https://paper.lingyunyang.com/reading-notes/conference/atc-2022.md)
- [PilotFish: Harvesting Free Cycles of Cloud Gaming with Deep Learning Training](https://paper.lingyunyang.com/reading-notes/conference/atc-2022/pilotfish.md): Resource manager which co-locates cloud gaming and DL training to improve GPU utilization.
- [Memory Harvesting in Multi-GPU Systems with Hierarchical Unified Virtual Memory](https://paper.lingyunyang.com/reading-notes/conference/atc-2022/memharvester.md): GPU memory manager that harvests temporarily available memory from neighboring GPUs.
- [Whale: Efficient Giant Model Training over Heterogeneous GPUs](https://paper.lingyunyang.com/reading-notes/conference/atc-2022/whale.md): Distributed training framework for large models.
- [DVABatch: Diversity-aware Multi-Entry Multi-Exit Batching for Efficient Processing of DNN Service...](https://paper.lingyunyang.com/reading-notes/conference/atc-2022/dvabatch.md): DNN inference batching system that reduces latency and improves throughput.
- [Serving Heterogeneous Machine Learning Models on Multi-GPU Servers with Spatio-Temporal Sharing](https://paper.lingyunyang.com/reading-notes/conference/atc-2022/gpulet.md): DNN inference scheduling framework to improve GPU utilization under SLO constraints.
- [SOTER: Guarding Black-box Inference for General Neural Networks at the Edge](https://paper.lingyunyang.com/reading-notes/conference/atc-2022/soter.md): Secure DNN inference system to ensure model confidentiality, low latency, and high accuracy with integrity protection.
- [Direct access, high-performance memory disaggregation with DirectCXL](https://paper.lingyunyang.com/reading-notes/conference/atc-2022/directcxl.md)
- [OSDI 2022](https://paper.lingyunyang.com/reading-notes/conference/osdi-2022.md)
- [Orca: A distributed serving system for transformer-based generative models](https://paper.lingyunyang.com/reading-notes/conference/osdi-2022/orca.md): #distributed\_serving\_system #batch\_serving #selective\_batching #transformer-based\_model #iteration-level\_scheduling
- [Microsecond-scale preemption for concurrent GPU-accelerated DNN inferences](https://paper.lingyunyang.com/reading-notes/conference/osdi-2022/reef.md): #deep\_learning\_inference\_system #GPU\_kernel\_preemption #co-location
- [Looking beyond GPUs for DNN scheduling on multi-tenant clusters](https://paper.lingyunyang.com/reading-notes/conference/osdi-2022/synergy.md): #deep\_learning\_training\_workloads #resource\_scheduler #homogeneous\_cluster
- [IPDPS 2022](https://paper.lingyunyang.com/reading-notes/conference/ipdps-2022.md)
- [DGSF: Disaggregated GPUs for serverless functions](https://paper.lingyunyang.com/reading-notes/conference/ipdps-2022/dgsf.md): Transparently enables serverless functions to use GPUs through CUDA APIs.
- [EuroSys 2022](https://paper.lingyunyang.com/reading-notes/conference/eurosys-2022.md)
- [Slashing the disaggregation tax in heterogeneous data centers with FractOS](https://paper.lingyunyang.com/reading-notes/conference/eurosys-2022/slashing-the-disaggregation-tax-in-heterogeneous-data-centers-with-fractos.md): #rCUDA #distributed\_OS #disaggregated\_system #GPU\_adaptor #device\_adaptor
- [NSDI 2022](https://paper.lingyunyang.com/reading-notes/conference/nsdi-2022.md)
- [SoCC 2021](https://paper.lingyunyang.com/reading-notes/conference/socc-2021.md)
- [ATC 2021](https://paper.lingyunyang.com/reading-notes/conference/atc-2021.md)
- [Zico: Efficient GPU memory sharing for concurrent DNN training](https://paper.lingyunyang.com/reading-notes/conference/atc-2021/zico.md): Reduces system-wide GPU memory consumption for co-located DNN training jobs.
- [OSDI 2021](https://paper.lingyunyang.com/reading-notes/conference/osdi-2021.md)
- [Pollux: Co-adaptive cluster scheduling for goodput-optimized deep learning](https://paper.lingyunyang.com/reading-notes/conference/osdi-2021/pollux.md)
- [SOSP 2021](https://paper.lingyunyang.com/reading-notes/conference/sosp-2021.md)
- [HeMem: Scalable Tiered Memory Management for Big Data Applications and Real NVM](https://paper.lingyunyang.com/reading-notes/conference/sosp-2021/hemem.md)
- [EuroSys 2021](https://paper.lingyunyang.com/reading-notes/conference/eurosys-2021.md)
- [Take it to the limit: Peak prediction-driven resource overcommitment in datacenters](https://paper.lingyunyang.com/reading-notes/conference/eurosys-2021/peak-oracle.md)
- [HotOS 2021](https://paper.lingyunyang.com/reading-notes/conference/hotos-2021.md)
- [From cloud computing to sky computing](https://paper.lingyunyang.com/reading-notes/conference/hotos-2021/sky-computing.md)
- [NSDI 2021](https://paper.lingyunyang.com/reading-notes/conference/nsdi-2021.md)
- [OSDI 2020](https://paper.lingyunyang.com/reading-notes/conference/osdi-2020.md)
- [A unified architecture for accelerating distributed DNN training in heterogeneous GPU/CPU clusters](https://paper.lingyunyang.com/reading-notes/conference/osdi-2020/byteps.md): #communication\_framework #parameter\_server #all-reduce #RoCEv2 #heterogeneous\_environment #distributed\_deep\_learning\_training
- [HiveD: Sharing a GPU cluster for deep learning with guarantees](https://paper.lingyunyang.com/reading-notes/conference/osdi-2020/hived.md)
- [ATC 2020](https://paper.lingyunyang.com/reading-notes/conference/atc-2020.md)
- [Serverless in the wild: Characterizing and optimizing the serverless workload](https://paper.lingyunyang.com/reading-notes/conference/atc-2020/serverless-in-the-wild-characterizing-and-optimizing-the-serverless-workload.md): #serverless #Function\_as\_a\_Service #FaaS #trace\_analysis #reduce\_cold\_start\_invocations
- [EuroSys 2020](https://paper.lingyunyang.com/reading-notes/conference/eurosys-2020.md)
- [ASPLOS 2020](https://paper.lingyunyang.com/reading-notes/conference/asplos-2020.md)
- [MLSys 2020](https://paper.lingyunyang.com/reading-notes/conference/mlsys-2020.md)
- [SoCC 2020](https://paper.lingyunyang.com/reading-notes/conference/socc-2020.md)
- [Elastic Parameter Server Load Distribution in Deep Learning Clusters](https://paper.lingyunyang.com/reading-notes/conference/socc-2020/elastic-parameter-server-load-distribution-in-deep-learning-clusters.md)
- [HPDC 2020](https://paper.lingyunyang.com/reading-notes/conference/hpdc-2020.md)
- [KubeShare: A framework to manage GPUs as first-class and shared resources in container cloud](https://paper.lingyunyang.com/reading-notes/conference/hpdc-2020/kubeshare.md)
- [CLUSTER 2019](https://paper.lingyunyang.com/reading-notes/conference/cluster-2019.md)
- [EuroSys 2019](https://paper.lingyunyang.com/reading-notes/conference/eurosys-2019.md)
- [NSDI 2019](https://paper.lingyunyang.com/reading-notes/conference/nsdi-2019.md)
- [IWQoS 2019](https://paper.lingyunyang.com/reading-notes/conference/iwqos-2019.md)
- [Who limits the resource efficiency of my datacenter: An analysis of Alibaba datacenter traces](https://paper.lingyunyang.com/reading-notes/conference/iwqos-2019/who-limits-the-resource-efficiency-of-my-datacenter.md): Trace analysis of Alibaba production clusters, which co-locate different workloads to improve resource efficiency.
- [SIGCOMM 2018](https://paper.lingyunyang.com/reading-notes/conference/sigcomm-2018.md)
- [Revisiting network support for RDMA](https://paper.lingyunyang.com/reading-notes/conference/sigcomm-2018/irn.md): An improved RoCE NIC (IRN) design that makes a few simple changes to the RoCE NIC for better handling of packet losses.
- [OSDI 2018](https://paper.lingyunyang.com/reading-notes/conference/osdi-2018.md)
- [Ray: A distributed framework for emerging AI applications](https://paper.lingyunyang.com/reading-notes/conference/osdi-2018/ray.md)
- [EuroSys 2018](https://paper.lingyunyang.com/reading-notes/conference/eurosys-2018.md)
- [Medea: Scheduling of long running applications in shared production clusters](https://paper.lingyunyang.com/reading-notes/conference/eurosys-2018/medea.md)
- [ISPA/IUCC/BDCloud/SocialCom/SustainCom 2018](https://paper.lingyunyang.com/reading-notes/conference/ispa-iucc-bdcloud-socialcom-sustaincom-2018.md)
- [GaiaGPU: Sharing GPUs in container clouds](https://paper.lingyunyang.com/reading-notes/conference/ispa-iucc-bdcloud-socialcom-sustaincom-2018/gaiagpu.md)
- [SoCC 2017](https://paper.lingyunyang.com/reading-notes/conference/socc-2017.md)
- [SLAQ: Quality-driven scheduling for distributed machine learning](https://paper.lingyunyang.com/reading-notes/conference/socc-2017/slaq.md)
- [ASPLOS 2017](https://paper.lingyunyang.com/reading-notes/conference/asplos-2017.md)
- [Neurosurgeon: Collaborative intelligence between the cloud and mobile edge](https://paper.lingyunyang.com/reading-notes/conference/asplos-2017/neurosurgeon.md): #graph\_partitioning #cloud-edge\_collaboration #prediction\_model #computation\_offloading
- [NSDI 2017](https://paper.lingyunyang.com/reading-notes/conference/nsdi-2017.md)
- [Clipper: A low-latency online prediction serving system](https://paper.lingyunyang.com/reading-notes/conference/nsdi-2017/clipper.md)
- [CLUSTER 2014](https://paper.lingyunyang.com/reading-notes/conference/cluster-2014.md)
- [Evaluating job packing in warehouse-scale computing](https://paper.lingyunyang.com/reading-notes/conference/cluster-2014/evaluating-job-packing.md)
- [Journal](https://paper.lingyunyang.com/reading-notes/journal.md)
- [IEEE Transactions on Cloud Computing](https://paper.lingyunyang.com/reading-notes/journal/tcc.md)
- [2021](https://paper.lingyunyang.com/reading-notes/journal/tcc/tcc-2021.md)
- [Gemini: Enabling multi-tenant GPU sharing based on kernel burst estimation](https://paper.lingyunyang.com/reading-notes/journal/tcc/tcc-2021/gemini-enabling-multi-tenant-gpu-sharing-based-on-kernel-burst-estimation.md): #GPU\_sharing #GPU\_time\_sharing #API\_remoting #kernel\_burst
- [ACM Computing Surveys](https://paper.lingyunyang.com/reading-notes/journal/csur.md)
- [2017](https://paper.lingyunyang.com/reading-notes/journal/csur/csur-2017.md)
- [GPU virtualization and scheduling methods: A comprehensive survey](https://paper.lingyunyang.com/reading-notes/journal/csur/csur-2017/gpu-virtualization-survey.md)
- [ACM SIGCOMM Computer Communication Review (CCR)](https://paper.lingyunyang.com/reading-notes/journal/ccr.md)
- [2021](https://paper.lingyunyang.com/reading-notes/journal/ccr/2021.md)
- [Data-driven Networking Research: models for academic collaboration with industry](https://paper.lingyunyang.com/reading-notes/journal/ccr/2021/data-driven-networking-research.md)
- [2007](https://paper.lingyunyang.com/reading-notes/journal/ccr/2007.md)
- [How to Read a Paper](https://paper.lingyunyang.com/reading-notes/journal/ccr/2007/how-to-read-a-paper.md)
- [Communications of the ACM](https://paper.lingyunyang.com/reading-notes/journal/communications-of-the-acm.md)
- [2015](https://paper.lingyunyang.com/reading-notes/journal/communications-of-the-acm/2015.md)
- [Why Google stores billions of lines of code in a single repository](https://paper.lingyunyang.com/reading-notes/journal/communications-of-the-acm/2015/why-google-stores-billions-of-lines-of-code-in-a-single-repository.md)
- [Miscellaneous](https://paper.lingyunyang.com/reading-notes/miscellaneous.md)
- [arXiv](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv.md): A free distribution service and an open-access archive.
- [2024](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2024.md)
- [Efficiently programming large language models using SGLang](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2024/sglang.md): LLM Inference
- [2023](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2023.md)
- [HexGen: Generative inference of foundation model over heterogeneous decentralized environment](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2023/hexgen.md)
- [High-throughput generative inference of large language models with a single GPU](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2023/flexgen.md): An offloading framework for high-throughput LLM inference.
- [2022](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2022.md)
- [DisaggRec: Architecting disaggregated systems for large-scale personalized recommendation](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2022/disaggrec.md): #deep\_learning\_recommender\_system #memory\_disaggregation #total\_cost\_of\_ownership #RDMA
- [A case for disaggregation of ML data processing](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2022/tf-data.md)
- [Singularity: Planet-scale, preemptive and elastic scheduling of AI workloads](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2022/singularity.md): Live GPU job migration.
- [Aryl: An elastic cluster scheduler for deep learning](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2022/aryl.md)
- [2016](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2016.md)
- [Wide & deep learning for recommender systems](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2016/wide-and-deep-learning-for-recommender-systems.md): A recommender system with a wide & deep model (WDL).
- [Training deep nets with sublinear memory cost](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2016/training-deep-nets-with-sublinear-memory-cost.md): Reduces the memory cost of storing intermediate results and gradients.
- [MSR Technical Report](https://paper.lingyunyang.com/reading-notes/miscellaneous/msr-technical-report.md)
- [2011](https://paper.lingyunyang.com/reading-notes/miscellaneous/msr-technical-report/2011.md)
- [Heuristics for vector bin packing](https://paper.lingyunyang.com/reading-notes/miscellaneous/msr-technical-report/2011/heuristics-for-vector-bin-packing.md)