PPoPP 2026

Meta Info

Homepage: https://ppopp26.sigplan.org/

Paper list: https://ppopp26.sigplan.org/track/PPoPP-2026-papers

Papers

LLM

  • LLM training

    • CCL-D: A High-Precision Diagnostic System for Slow and Hang Anomalies in Large-Scale Model Training

      • ICT, CAS & Ant Group

    • COCCL: A Collective Communication Library Supporting Easy Integration and Configuration of Customized Compression for Scalable LLM Training

      • ICT, CAS & CUHK-SZ

    • Elastor: Elastic and Efficient Model Partitioning and Checkpointing for Fault-tolerant Distributed Training

      • PKU

    • HelixPipe: Efficient Distributed Training of Long Sequence Transformers with Attention Parallel Pipeline Parallelism [arXiv] [Code]

      • NUS

  • LLM inference

    • JanusQuant: Accurate and Efficient 2-bit KV Cache Quantization for Long-context Inference

      • WHU

    • Laser: Unlocking Layer-Level Scheduling for Efficient Multi-SLO LLM Serving

      • SYSU

    • High-Throughput Non-Uniformly Quantized 3-bit LLM Inference

      • CUHK & HKUST

    • Accelerating Sparse Transformer Inference on GPU

      • CUP-Beijing & BUAA

  • Attention

    • FlashAttention-T: Towards Fully Tensorized Attention by Exploiting Tensor-Vector Parallelism

      • USTC & ICT, CAS

    • MetaAttention: A Unified and Performant Attention Framework Across Hardware Backends [arXiv] [Code]

      • SJTU, IPADS & PKU & MSRA

Diffusion Models

  • Difflow: A Data-Characteristic-Aware Serving System for Diffusion Models

    • THU

  • MixFusion: A Patch-Level Parallel Serving System for Mixed-Resolution Diffusion Models

    • UWaterloo & CMU & Rice

GNN

  • APERTURE: Algorithm-System Co-Optimization for Temporal Graph Network Inference

    • BUAA

  • ElasGNN: An Elastic Training Framework for Distributed GNN Training

    • BUAA

  • TAC: Cache-based System for Accelerating Billion-Scale GNN Training on Multi-GPU Platform

    • UCAS

Sparse Matrix

  • ASM-SpMM: Unleashing the Potential of Arm SME for Sparse Matrix Multiplication Acceleration

    • SYSU

  • Exploiting Efficient Mapping and Pipelined Execution for Accelerating SpMV on Tensor Cores

    • BUAA

  • VDHA: Vector-Driven Hash Aggregation for Sparse Matrix–Sparse Vector Multiplication on GPUs

    • THU

Quantization

  • RoMeo: Mitigating Dual-dimensional Outliers with Rotated Mixed Precision Quantization [Artifact]

    • THU

Cache Management

  • Cacheman: A Comprehensive Last-Level Cache Management System for Multi-tenant Clouds

    • Alibaba Cloud

Misc

  • Scaling GPU-to-CPU Migration for Efficient Distributed Execution on CPU Clusters

    • GaTech

  • zBuffer: Zero-Copy and Metadata-Free Serialization for Fast RPC with Scatter-Gather Reflection

    • XMU & Alibaba & SJTU

Acronyms

  • LLM: Large Language Model

  • KV: Key-Value (cache)

  • SLO: Service-Level Objective

  • GNN: Graph Neural Network

  • SME: Scalable Matrix Extension (Arm)

  • SpMM: Sparse Matrix-Matrix Multiplication

  • SpMV: Sparse Matrix-Vector Multiplication

  • RPC: Remote Procedure Call
