# Reading Notes

- [Conference](/reading-notes/conference.md)
  - [OSDI 2026](/reading-notes/conference/osdi-2026.md)
  - [MLSys 2026](/reading-notes/conference/mlsys-2026.md)
  - [NSDI 2026](/reading-notes/conference/nsdi-2026.md)
  - [EuroSys 2026](/reading-notes/conference/eurosys-2026.md)
  - [ASPLOS 2026](/reading-notes/conference/asplos-2026.md)
  - [HPCA 2026](/reading-notes/conference/hpca-2026.md)
  - [PPoPP 2026](/reading-notes/conference/ppopp-2026.md)
  - [SC 2025](/reading-notes/conference/sc-2025.md)
  - [SOSP 2025](/reading-notes/conference/sosp-2025.md)
  - [SIGCOMM 2025](/reading-notes/conference/sigcomm-2025.md)
  - [ICML 2025](/reading-notes/conference/icml-2025.md)
  - [ATC 2025](/reading-notes/conference/atc-2025.md)
  - [OSDI 2025](/reading-notes/conference/osdi-2025.md)
  - [ISCA 2025](/reading-notes/conference/isca-2025.md)
  - [SIGMETRICS 2025](/reading-notes/conference/sigmetrics-2025.md)
  - [HotOS 2025](/reading-notes/conference/hotos-2025.md)
  - [MLSys 2025](/reading-notes/conference/mlsys-2025.md)
  - [NSDI 2025](/reading-notes/conference/nsdi-2025.md)
  - [ASPLOS 2025](/reading-notes/conference/asplos-2025.md)
  - [EuroSys 2025](/reading-notes/conference/eurosys-2025.md)
  - [HPCA 2025](/reading-notes/conference/hpca-2025.md)
  - [PPoPP 2025](/reading-notes/conference/ppopp-2025.md)
  - [NeurIPS 2024](/reading-notes/conference/neurips-2024.md)
  - [SoCC 2024](/reading-notes/conference/socc-2024.md)
  - [HotNets 2024](/reading-notes/conference/hotnets-2024.md)
  - [SC 2024](/reading-notes/conference/sc-2024.md)
  - [SOSP 2024](/reading-notes/conference/sosp-2024.md)
  - [VLDB 2024](/reading-notes/conference/vldb-2024.md)
  - [SIGCOMM 2024](/reading-notes/conference/sigcomm-2024.md)
  - [ICML 2024](/reading-notes/conference/icml-2024.md)
  - [ATC 2024](/reading-notes/conference/atc-2024.md)
  - [OSDI 2024](/reading-notes/conference/osdi-2024.md)
  - [ISCA 2024](/reading-notes/conference/isca-2024.md)
  - [CVPR 2024](/reading-notes/conference/cvpr-2024.md)
  - [MLSys 2024](/reading-notes/conference/mlsys-2024.md)
  - [ASPLOS 2024](/reading-notes/conference/asplos-2024.md)
    - [SpotServe: Serving generative large language models on preemptible instances](/reading-notes/conference/asplos-2024/spotserve.md)
  - [EuroSys 2024](/reading-notes/conference/eurosys-2024.md)
    - [Orion: Interference-aware, fine-grained GPU sharing for ML applications](/reading-notes/conference/eurosys-2024/orion-interference-aware-fine-grained-gpu-sharing-for-ml-applications.md)
  - [NSDI 2024](/reading-notes/conference/nsdi-2024.md)
  - [NeurIPS 2023](/reading-notes/conference/neurips-2023.md)
  - [SC 2023](/reading-notes/conference/sc-2023.md)
    - [Interference-aware multiplexing for deep learning in GPU clusters: A middleware approach](/reading-notes/conference/sc-2023/iadeep.md)
  - [SoCC 2023](/reading-notes/conference/socc-2023.md)
  - [SOSP 2023](/reading-notes/conference/sosp-2023.md)
    - [UGache: A unified GPU cache for embedding-based deep learning](/reading-notes/conference/sosp-2023/ugache.md): #DLRM\_inference #GPU\_embedding\_cache
  - [SIGCOMM 2023](/reading-notes/conference/sigcomm-2023.md)
  - [HotChips 2023](/reading-notes/conference/hotchips-2023.md)
  - [ICML 2023](/reading-notes/conference/icml-2023.md)
  - [ATC 2023](/reading-notes/conference/atc-2023.md)
    - [Accelerating Distributed MoE Training and Inference with Lina](/reading-notes/conference/atc-2023/lina.md)
    - [SmartMoE: Efficiently Training Sparsely-Activated Models ...](/reading-notes/conference/atc-2023/smartmoe.md)
    - [Beware of Fragmentation: Scheduling GPU-Sharing Workloads with Fragmentation Gradient Descent](/reading-notes/conference/atc-2023/fgd.md)
  - [OSDI 2023](/reading-notes/conference/osdi-2023.md)
  - [HotOS 2023](/reading-notes/conference/hotos-2023.md)
  - [SIGMOD 2023](/reading-notes/conference/sigmod-2023.md)
  - [ISCA 2023](/reading-notes/conference/isca-2023.md)
  - [MLSys 2023](/reading-notes/conference/mlsys-2023.md)
  - [EuroSys 2023](/reading-notes/conference/eurosys-2023.md)
  - [NSDI 2023](/reading-notes/conference/nsdi-2023.md)
    - [Shepherd: Serving DNNs in the wild](/reading-notes/conference/nsdi-2023/shepherd.md): #model\_serving\_system #mixed-integer\_linear\_programming #workload\_unpredictability
    - [Understanding RDMA microarchitecture resources for performance isolation](/reading-notes/conference/nsdi-2023/husky.md): #RDMA #performance\_isolation #test\_suite #RNIC #virtual\_machine #RDMA\_microarchitecture\_resource
    - [Skyplane: Optimizing transfer cost and throughput using cloud-aware overlays](/reading-notes/conference/nsdi-2023/skyplane.md)
    - [Shockwave: Fair and efficient cluster scheduling for dynamic adaptation in machine learning](/reading-notes/conference/nsdi-2023/shockwave.md)
  - [ASPLOS 2023](/reading-notes/conference/asplos-2023.md)
    - [TPP: Transparent page placement for CXL-enabled tiered-memory](/reading-notes/conference/asplos-2023/tpp.md): #CXL #memory\_management #tiered\_memory #Linux\_kernel
    - [EVStore: Storage and caching capabilities for scaling embedding tables in deep recommendation systems](/reading-notes/conference/asplos-2023/evstore.md): #deep\_learning\_recommender\_system #embedding\_lookup #recommendation\_inference #cache
    - [Lucid: A non-intrusive, scalable and interpretable scheduler for deep learning training jobs](/reading-notes/conference/asplos-2023/lucid.md): #deep\_learning\_training\_workloads #cluster\_scheduler #system\_interpretability #ML\_for\_System #decision\_tree #generalized\_additive\_model
  - [SC 2022](/reading-notes/conference/sc-2022.md)
  - [SoCC 2022](/reading-notes/conference/socc-2022.md)
    - [ESCHER: Expressive scheduling with ephemeral resources](/reading-notes/conference/socc-2022/escher.md): #ephemeral\_resources #scheduling\_flexibility #scheduling\_requirements #Kubernetes #Ray
    - [Serving unseen deep learning model with near-optimal configurations: A fast adaptive search approach](/reading-notes/conference/socc-2022/falcon.md)
  - [SIGCOMM 2022](/reading-notes/conference/sigcomm-2022.md)
    - [Multi-resource interleaving for deep learning training](/reading-notes/conference/sigcomm-2022/multi-resource-interleaving-for-deep-learning-training.md): #deep\_learning\_training\_workloads #multi\_resource\_scheduler #multi\_resource\_interleaving #PyTorch #iterative\_process #blossom\_algorithm
  - [ATC 2022](/reading-notes/conference/atc-2022.md)
    - [PilotFish: Harvesting Free Cycles of Cloud Gaming with Deep Learning Training](/reading-notes/conference/atc-2022/pilotfish.md): Resource manager that co-locates cloud gaming and DL training to improve GPU utilization.
    - [Memory Harvesting in Multi-GPU Systems with Hierarchical Unified Virtual Memory](/reading-notes/conference/atc-2022/memharvester.md): GPU memory manager that harvests temporarily available memory from neighboring GPUs.
    - [Whale: Efficient Giant Model Training over Heterogeneous GPUs](/reading-notes/conference/atc-2022/whale.md): Distributed training framework for large models.
    - [DVABatch: Diversity-aware Multi-Entry Multi-Exit Batching for Efficient Processing of DNN Service...](/reading-notes/conference/atc-2022/dvabatch.md): DNN inference batching system that reduces latency and improves throughput.
    - [Serving Heterogeneous Machine Learning Models on Multi-GPU Servers with Spatio-Temporal Sharing](/reading-notes/conference/atc-2022/gpulet.md): DNN inference scheduling framework to improve GPU utilization under SLO constraints.
    - [SOTER: Guarding Black-box Inference for General Neural Networks at the Edge](/reading-notes/conference/atc-2022/soter.md): Secure DNN inference system to ensure model confidentiality, low latency, and high accuracy with integrity protection.
    - [Direct access, high-performance memory disaggregation with DirectCXL](/reading-notes/conference/atc-2022/directcxl.md)
  - [OSDI 2022](/reading-notes/conference/osdi-2022.md)
    - [Orca: A distributed serving system for transformer-based generative models](/reading-notes/conference/osdi-2022/orca.md): #distributed\_serving\_system #batch\_serving #selective\_batching #transformer-based\_model #iteration-level\_scheduling
    - [Microsecond-scale preemption for concurrent GPU-accelerated DNN inferences](/reading-notes/conference/osdi-2022/reef.md): #deep\_learning\_inference\_system #GPU\_kernel\_preemption #co-location
    - [Looking beyond GPUs for DNN scheduling on multi-tenant clusters](/reading-notes/conference/osdi-2022/synergy.md): #deep\_learning\_training\_workloads #resource\_scheduler #homogeneous\_cluster
  - [IPDPS 2022](/reading-notes/conference/ipdps-2022.md)
    - [DGSF: Disaggregated GPUs for serverless functions](/reading-notes/conference/ipdps-2022/dgsf.md): Transparently enable serverless functions to use GPUs through CUDA APIs.
  - [EuroSys 2022](/reading-notes/conference/eurosys-2022.md)
    - [Slashing the disaggregation tax in heterogeneous data centers with FractOS](/reading-notes/conference/eurosys-2022/slashing-the-disaggregation-tax-in-heterogeneous-data-centers-with-fractos.md): #rCUDA #distributed\_OS #disaggregated\_system #GPU\_adaptor #device\_adaptor
  - [NSDI 2022](/reading-notes/conference/nsdi-2022.md)
  - [SoCC 2021](/reading-notes/conference/socc-2021.md)
  - [ATC 2021](/reading-notes/conference/atc-2021.md)
    - [Zico: Efficient GPU memory sharing for concurrent DNN training](/reading-notes/conference/atc-2021/zico.md): Reduce the system-wide GPU memory consumption for co-located DNN training jobs.
  - [OSDI 2021](/reading-notes/conference/osdi-2021.md)
    - [Pollux: Co-adaptive cluster scheduling for goodput-optimized deep learning](/reading-notes/conference/osdi-2021/pollux.md)
  - [SOSP 2021](/reading-notes/conference/sosp-2021.md)
    - [HeMem: Scalable Tiered Memory Management for Big Data Applications and Real NVM](/reading-notes/conference/sosp-2021/hemem.md)
  - [EuroSys 2021](/reading-notes/conference/eurosys-2021.md)
    - [Take it to the limit: Peak prediction-driven resource overcommitment in datacenters](/reading-notes/conference/eurosys-2021/peak-oracle.md)
  - [HotOS 2021](/reading-notes/conference/hotos-2021.md)
    - [From cloud computing to sky computing](/reading-notes/conference/hotos-2021/sky-computing.md)
  - [NSDI 2021](/reading-notes/conference/nsdi-2021.md)
  - [OSDI 2020](/reading-notes/conference/osdi-2020.md)
    - [A unified architecture for accelerating distributed DNN training in heterogeneous GPU/CPU clusters](/reading-notes/conference/osdi-2020/byteps.md): #communication\_framework #parameter\_server #all-reduce #RoCEv2 #heterogeneous\_environment #distributed\_deep\_learning\_training
    - [HiveD: Sharing a GPU cluster for deep learning with guarantees](/reading-notes/conference/osdi-2020/hived.md)
  - [ATC 2020](/reading-notes/conference/atc-2020.md)
    - [Serverless in the wild: Characterizing and optimizing the serverless workload](/reading-notes/conference/atc-2020/serverless-in-the-wild-characterizing-and-optimizing-the-serverless-workload.md): #serverless #Function\_as\_a\_Service #FaaS #trace\_analysis #reduce\_cold\_start\_invocations
  - [EuroSys 2020](/reading-notes/conference/eurosys-2020.md)
  - [ASPLOS 2020](/reading-notes/conference/asplos-2020.md)
  - [MLSys 2020](/reading-notes/conference/mlsys-2020.md)
  - [SoCC 2020](/reading-notes/conference/socc-2020.md)
    - [Elastic Parameter Server Load Distribution in Deep Learning Clusters](/reading-notes/conference/socc-2020/elastic-parameter-server-load-distribution-in-deep-learning-clusters.md)
  - [HPDC 2020](/reading-notes/conference/hpdc-2020.md)
    - [KubeShare: A framework to manage GPUs as first-class and shared resources in container cloud](/reading-notes/conference/hpdc-2020/kubeshare.md)
  - [CLUSTER 2019](/reading-notes/conference/cluster-2019.md)
  - [EuroSys 2019](/reading-notes/conference/eurosys-2019.md)
  - [NSDI 2019](/reading-notes/conference/nsdi-2019.md)
  - [IWQoS 2019](/reading-notes/conference/iwqos-2019.md)
    - [Who limits the resource efficiency of my datacenter: An analysis of Alibaba datacenter traces](/reading-notes/conference/iwqos-2019/who-limits-the-resource-efficiency-of-my-datacenter.md): Trace analysis of Alibaba production clusters, which co-locate different workloads to improve resource efficiency.
  - [SIGCOMM 2018](/reading-notes/conference/sigcomm-2018.md)
    - [Revisiting network support for RDMA](/reading-notes/conference/sigcomm-2018/irn.md): An improved RoCE NIC (IRN) design that makes a few simple changes to the RoCE NIC for better handling of packet losses.
  - [OSDI 2018](/reading-notes/conference/osdi-2018.md)
    - [Ray: A distributed framework for emerging AI applications](/reading-notes/conference/osdi-2018/ray.md)
  - [EuroSys 2018](/reading-notes/conference/eurosys-2018.md)
    - [Medea: Scheduling of long running applications in shared production clusters](/reading-notes/conference/eurosys-2018/medea.md)
  - [ISPA/IUCC/BDCloud/SocialCom/SustainCom 2018](/reading-notes/conference/ispa-iucc-bdcloud-socialcom-sustaincom-2018.md)
    - [GaiaGPU: Sharing GPUs in container clouds](/reading-notes/conference/ispa-iucc-bdcloud-socialcom-sustaincom-2018/gaiagpu.md)
  - [SoCC 2017](/reading-notes/conference/socc-2017.md)
    - [SLAQ: Quality-driven scheduling for distributed machine learning](/reading-notes/conference/socc-2017/slaq.md)
  - [ASPLOS 2017](/reading-notes/conference/asplos-2017.md)
    - [Neurosurgeon: Collaborative intelligence between the cloud and mobile edge](/reading-notes/conference/asplos-2017/neurosurgeon.md): #graph\_partitioning #cloud-edge\_collaboration #prediction\_model #computation\_offloading
  - [NSDI 2017](/reading-notes/conference/nsdi-2017.md)
    - [Clipper: A low-latency online prediction serving system](/reading-notes/conference/nsdi-2017/clipper.md)
  - [CLUSTER 2014](/reading-notes/conference/cluster-2014.md)
    - [Evaluating job packing in warehouse-scale computing](/reading-notes/conference/cluster-2014/evaluating-job-packing.md)
- [Journal](/reading-notes/journal.md)
  - [IEEE Transactions on Cloud Computing](/reading-notes/journal/tcc.md)
    - [2021](/reading-notes/journal/tcc/tcc-2021.md)
      - [Gemini: Enabling multi-tenant GPU sharing based on kernel burst estimation](/reading-notes/journal/tcc/tcc-2021/gemini-enabling-multi-tenant-gpu-sharing-based-on-kernel-burst-estimation.md): #GPU\_sharing #GPU\_time\_sharing #API\_remoting #kernel\_burst
  - [ACM Computing Surveys](/reading-notes/journal/csur.md)
    - [2017](/reading-notes/journal/csur/csur-2017.md)
      - [GPU virtualization and scheduling methods: A comprehensive survey](/reading-notes/journal/csur/csur-2017/gpu-virtualization-survey.md)
  - [ACM SIGCOMM Computer Communication Review (CCR)](/reading-notes/journal/ccr.md)
    - [2021](/reading-notes/journal/ccr/2021.md)
      - [Data-driven Networking Research: models for academic collaboration with industry](/reading-notes/journal/ccr/2021/data-driven-networking-research.md)
    - [2007](/reading-notes/journal/ccr/2007.md)
      - [How to Read a Paper](/reading-notes/journal/ccr/2007/how-to-read-a-paper.md)
  - [Communications of the ACM](/reading-notes/journal/communications-of-the-acm.md)
    - [2015](/reading-notes/journal/communications-of-the-acm/2015.md)
      - [Why Google stores billions of lines of code in a single repository](/reading-notes/journal/communications-of-the-acm/2015/why-google-stores-billions-of-lines-of-code-in-a-single-repository.md)
- [Miscellaneous](/reading-notes/miscellaneous.md)
  - [arXiv](/reading-notes/miscellaneous/arxiv.md): A free distribution service and an open-access archive.
    - [2024](/reading-notes/miscellaneous/arxiv/2024.md)
      - [Efficiently programming large language models using SGLang](/reading-notes/miscellaneous/arxiv/2024/sglang.md): LLM inference.
    - [2023](/reading-notes/miscellaneous/arxiv/2023.md)
      - [HexGen: Generative inference of foundation model over heterogeneous decentralized environment](/reading-notes/miscellaneous/arxiv/2023/hexgen.md)
      - [High-throughput generative inference of large language models with a single GPU](/reading-notes/miscellaneous/arxiv/2023/flexgen.md): An offloading framework for high-throughput LLM inference.
    - [2022](/reading-notes/miscellaneous/arxiv/2022.md)
      - [DisaggRec: Architecting disaggregated systems for large-scale personalized recommendation](/reading-notes/miscellaneous/arxiv/2022/disaggrec.md): #deep\_learning\_recommender\_system #memory\_disaggregation #total\_cost\_of\_ownership #RDMA
      - [A case for disaggregation of ML data processing](/reading-notes/miscellaneous/arxiv/2022/tf-data.md)
      - [Singularity: Planet-scale, preemptive and elastic scheduling of AI workloads](/reading-notes/miscellaneous/arxiv/2022/singularity.md): Live GPU job migration.
      - [Aryl: An elastic cluster scheduler for deep learning](/reading-notes/miscellaneous/arxiv/2022/aryl.md)
    - [2016](/reading-notes/miscellaneous/arxiv/2016.md)
      - [Wide & deep learning for recommender systems](/reading-notes/miscellaneous/arxiv/2016/wide-and-deep-learning-for-recommender-systems.md): A recommender system with a wide & deep model (WDL).
      - [Training deep nets with sublinear memory cost](/reading-notes/miscellaneous/arxiv/2016/training-deep-nets-with-sublinear-memory-cost.md): Reduce memory cost to store intermediate results and gradients.
  - [MSR Technical Report](/reading-notes/miscellaneous/msr-technical-report.md)
    - [2011](/reading-notes/miscellaneous/msr-technical-report/2011.md)
      - [Heuristics for vector bin packing](/reading-notes/miscellaneous/msr-technical-report/2011/heuristics-for-vector-bin-packing.md)
