SIGCOMM 2024
Meta Info
Homepage: https://conferences.sigcomm.org/sigcomm/2024/
Paper list
Papers
Large Language Models (LLMs)
Systems/Networking for LLM
- CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving [Paper] [arXiv] [Code] [Video]
  - UChicago & Microsoft & Stanford
  - CacheGen: A context-loading module for LLM systems.
    - Use a custom tensor encoder to encode a KV cache into a more compact bitstream representation with negligible decoding overhead.
    - Adapt the compression level of different parts of a KV cache to cope with changes in available bandwidth.
  - Objective: Reduce the network delay of fetching the KV cache → Lower time to first token (TTFT).
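The bandwidth adaptation described above can be illustrated with a minimal sketch. It is not CacheGen's actual algorithm: the level names, chunk sizes, and the even split of the remaining delay budget are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    name: str
    size_bytes: int  # encoded size of one KV-cache chunk at this level (hypothetical)


# Ordered from highest quality (largest) to lowest quality (smallest).
LEVELS = [
    Level("high", 8_000_000),
    Level("medium", 4_000_000),
    Level("low", 2_000_000),
]


def pick_level(bandwidth_bps, budget_s, levels=LEVELS):
    """Return the highest-quality level whose transfer time fits in budget_s."""
    for level in levels:
        if level.size_bytes * 8 / bandwidth_bps <= budget_s:
            return level
    return levels[-1]  # nothing fits: fall back to the most compressed level


def plan_stream(bandwidth_trace_bps, total_budget_s, levels=LEVELS):
    """Pick one level per chunk, splitting the remaining budget evenly."""
    plan = []
    remaining = total_budget_s
    n = len(bandwidth_trace_bps)
    for i, bw in enumerate(bandwidth_trace_bps):
        per_chunk_budget = remaining / (n - i)
        level = pick_level(bw, per_chunk_budget, levels)
        plan.append(level.name)
        remaining -= level.size_bytes * 8 / bw  # time spent sending this chunk
    return plan


# Bandwidth drops for the second chunk, so only that chunk is compressed harder.
print(plan_stream([100e6, 20e6, 100e6], total_budget_s=2.4))
```

Choosing the level per chunk rather than per cache is what lets the sender react to a bandwidth dip without lowering the quality of the whole context.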
 
- Alibaba HPN: A Data Center Network for Large Language Model Training [Paper] [Video]
  - Alibaba Cloud
  - Experience Track
  - Characteristics of LLM training
    - Produce a small number of periodic, bursty flows (e.g., 400Gbps) on each host.
    - Require GPUs to complete iterations in synchronization, which makes training more sensitive to single points of failure.
  - Alibaba High-Performance Network (HPN): Introduce a 2-tier, dual-plane architecture capable of interconnecting 15K GPUs within one Pod.
    - Benefits: Eliminate hash polarization; simplify optimal path selection.
 
 
- RDMA over Ethernet for Distributed Training at Meta Scale [Paper] [Blog]
  - Meta
  - Experience Track
  - Deploy a combination of centralized traffic engineering and an Enhanced ECMP (Equal-Cost Multi-Path) scheme to achieve optimal load distribution for training workloads.
  - Design receiver-driven traffic admission via the collective library → Co-tune the collective library configuration and the underlying network configuration.
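To make the ECMP point concrete, here is a minimal sketch of hash-based path selection. It shows plain ECMP over a 5-tuple, plus an optional `extra_fields` argument that stands in for hashing on an additional per-flow header field. The choice of a queue-pair number as that field, the SHA-256 hash, and all function names are assumptions for illustration and do not describe Meta's switch configuration.

```python
import hashlib


def ecmp_path(five_tuple, num_paths, extra_fields=()):
    """Map a flow to one of num_paths equal-cost paths by hashing its headers."""
    key = repr(tuple(five_tuple) + tuple(extra_fields)).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_paths


def link_loads(flows, num_paths):
    """Sum the offered load (Gbps) that hashing places on each path."""
    loads = [0.0] * num_paths
    for five_tuple, extra_fields, gbps in flows:
        loads[ecmp_path(five_tuple, num_paths, extra_fields)] += gbps
    return loads


# Four RoCEv2 flows between the same two hosts (UDP, destination port 4791).
# Their 5-tuples are identical, so plain ECMP would pin them all to one path;
# the hypothetical queue-pair number is the only field that tells them apart.
five_tuple = ("10.0.0.1", "10.0.1.1", 17, 49152, 4791)
flows = [(five_tuple, (qp,), 100.0) for qp in range(4)]

print(link_loads(flows, num_paths=4))
```

Training traffic consists of a few very large flows, so the hash has little entropy to work with. That is why hashing alone is paired with centralized traffic engineering in the notes above.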
 
 
LLMs for Networking
- NetLLM: Adapting Large Language Models for Networking [Paper]
  - CUHK-Shenzhen & Tsinghua SIGS & UChicago
  - NetLLM: Empower the LLM to process multimodal data in networking and generate task-specific answers.
  - Study three networking-related use cases: viewport prediction, adaptive bitrate streaming, and cluster job scheduling.
 
 
Distributed Training
- Crux: GPU-Efficient Communication Scheduling for Deep Learning Training [Paper] [Dataset]
  - Alibaba Cloud
  - Observation: Communication contention among different deep learning training (DLT) jobs seriously degrades overall GPU computation utilization → Low efficiency of the training cluster.
  - Crux: A communication scheduler
    - Objective: Mitigate communication contention among DLT jobs → Maximize GPU computation utilization.
    - Designs: Reduce the GPU utilization problem to a flow optimization problem; GPU intensity-aware communication scheduling; prioritize the DLT flows with high GPU computation intensity.
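The prioritization idea can be sketched as follows. The definition of GPU intensity used here (GPU work per iteration divided by communication time per iteration), the `Job` fields, and the mapping onto a fixed number of priority classes are assumptions for illustration, not the paper's exact formulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    gpu_work: float   # GPU-seconds of computation per iteration (hypothetical)
    comm_time: float  # seconds of communication per iteration (hypothetical)


def gpu_intensity(job):
    """GPU work that is blocked for each second of this job's communication."""
    return job.gpu_work / job.comm_time


def assign_priorities(jobs, num_classes):
    """Give the most GPU-intensive jobs the highest priority (class 0).

    Switches expose only a few priority queues, so ranks beyond the last
    class share the lowest one.
    """
    ranked = sorted(jobs, key=gpu_intensity, reverse=True)
    return {job.name: min(rank, num_classes - 1) for rank, job in enumerate(ranked)}


jobs = [
    Job("A", gpu_work=64.0, comm_time=0.5),  # intensity 128
    Job("B", gpu_work=8.0, comm_time=0.4),   # intensity 20
    Job("C", gpu_work=16.0, comm_time=0.1),  # intensity 160
]
print(assign_priorities(jobs, num_classes=2))
```

Under contention, delaying a low-intensity job's flows idles fewer GPU-seconds than delaying a high-intensity job's flows, which is the intuition behind ranking by intensity.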
 
 
- Accelerating Model Training in Multi-cluster Environments with Consumer-grade GPUs [Paper]
  - KAIST & UC Irvine & VMware Research
  - StellaTrain: Cache-aware gradient compression; a CPU-based sparse optimizer.
  - Adapt training configurations to fluctuating network bandwidth → Enable co-training using on-premises and cloud clusters.
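As a rough illustration of bandwidth-adaptive gradient compression, the sketch below uses generic top-k sparsification and sizes k from the available bandwidth. It is a stand-in technique, not StellaTrain's cache-aware implementation, and the function names and the 8-bytes-per-entry cost (4-byte index plus 4-byte value) are assumptions.

```python
def choose_k(bandwidth_bps, interval_s, size, bytes_per_entry=8):
    """Number of (index, value) pairs that fit in one sync interval."""
    budget_bytes = bandwidth_bps * interval_s / 8
    k = int(budget_bytes // bytes_per_entry)
    return max(1, min(k, size))  # send at least one entry, at most all


def top_k_sparsify(grad, k):
    """Keep the k largest-magnitude entries as sorted indices plus values."""
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    indices = sorted(order[:k])
    return indices, [grad[i] for i in indices]


def densify(indices, values, size):
    """Rebuild a dense gradient; entries that were dropped become zero."""
    out = [0.0] * size
    for i, v in zip(indices, values):
        out[i] = v
    return out


grad = [0.1, -2.0, 0.05, 1.5, -0.3]
k = choose_k(bandwidth_bps=1_000, interval_s=0.128, size=len(grad))
indices, values = top_k_sparsify(grad, k)
print(k, indices, values)
print(densify(indices, values, len(grad)))
```

When bandwidth rises, `choose_k` returns a larger k and more of the gradient is sent. Practical systems usually also accumulate the dropped entries locally so they are not lost, which this sketch omits.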
 
Data Processing
- Turbo: Efficient Communication Framework for Large-scale Data Processing Cluster [Paper]
  - Tencent & FDU & NVIDIA & THU
  - Experience Track
  - Network throughput & scalability: A dynamic block-level flowlet transmission mechanism; a non-blocking communication middleware.
  - System reliability: Use an external shuffle service, with TCP serving as a backup.
  - Integrated into Apache Spark.
 
Data Transfers
- An exabyte a day: Throughput-oriented, Large-scale, Managed Data Transfers with Effingo [Paper]
  - Google
  - Experience Track
  - Effingo: A copy system integrated with resource management and authorization systems.
    - Per-cluster deployments → Limit failure domains to individual clusters.
    - Separation from the bandwidth management layer (BwE) → A modular design that reduces dependencies.
 
 