PPoPP 2026
Meta Info
Homepage: https://ppopp26.sigplan.org/
Paper list: https://ppopp26.sigplan.org/track/PPoPP-2026-papers
Papers
LLM
LLM training
CCL-D: A High-Precision Diagnostic System for Slow and Hang Anomalies in Large-Scale Model Training
ICT, CAS & Ant Group
COCCL: A Collective Communication Library Supporting Easy Integration and Configuration of Customized Compression for Scalable LLM Training
ICT, CAS & CUHK-SZ
Elastor: Elastic and Efficient Model Partitioning and Checkpointing for Fault-tolerant Distributed Training
PKU
LLM inference
JanusQuant: Accurate and Efficient 2-bit KV Cache Quantization for Long-context Inference
WHU
Laser: Unlocking Layer-Level Scheduling for Efficient Multi-SLO LLM Serving
SYSU
High-Throughput Non-Uniformly Quantized 3-bit LLM Inference
CUHK & HKUST
Accelerating Sparse Transformer Inference on GPU
CUP-Beijing & BUAA
Diffusion Models
Difflow: A Data-Characteristic-Aware Serving System for Diffusion Models
THU
MixFusion: A Patch-Level Parallel Serving System for Mixed-Resolution Diffusion Models
UWaterloo & CMU & Rice
GNN
APERTURE: Algorithm-System Co-Optimization for Temporal Graph Network Inference
BUAA
ElasGNN: An Elastic Training Framework for Distributed GNN Training
BUAA
TAC: Cache-based System for Accelerating Billion-Scale GNN Training on Multi-GPU Platform
UCAS
Sparse Matrix
ASM-SpMM: Unleashing the Potential of Arm SME for Sparse Matrix Multiplication Acceleration
SYSU
Exploiting Efficient Mapping and Pipelined Execution for Accelerating SpMV on Tensor Cores
BUAA
VDHA: Vector-Driven Hash Aggregation for Sparse Matrix–Sparse Vector Multiplication on GPUs
THU
Quantization
RoMeo: Mitigating Dual-dimensional Outliers with Rotated Mixed Precision Quantization [Artifact]
THU
Cache Management
Cacheman: A Comprehensive Last-Level Cache Management System for Multi-tenant Clouds
Alibaba Cloud
Misc
Scaling GPU-to-CPU Migration for Efficient Distributed Execution on CPU Clusters
GaTech
zBuffer: Zero-Copy and Metadata-Free Serialization for Fast RPC with Scatter-Gather Reflection
XMU & Alibaba & SJTU
Acronyms
LLM: Large Language Model
GNN: Graph Neural Network
SpMM: Sparse Matrix-Matrix Multiplication
SpMV: Sparse Matrix-Vector Multiplication
RPC: Remote Procedure Call