ISCA 2024

Meta Info

Homepage: https://iscaconf.org/isca2024/

Paper list: https://www.iscaconf.org/isca2024/program/

Papers

Large Language Models (LLMs)

  • Splitwise: Efficient Generative LLM Inference Using Phase Splitting

    • Microsoft

    • Best Paper Award

  • MECLA: Memory-Compute-Efficient LLM Accelerator with Scaling Sub-matrix Partition

  • Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization

  • LLMCompass: Enabling Efficient Hardware Design for Large Language Model Inference

  • ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching

Mixture-of-Experts (MoEs)

  • Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference

    • MSRA

    • Introduces a pre-gating function that alleviates the dynamic nature of sparse expert activation, thereby addressing the large memory footprint of MoE inference (sketched below).

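A minimal sketch of the pre-gating idea, with hypothetical module and parameter names rather than the paper's implementation: the router inside block i scores the experts that block i+1 will need, so those experts can be copied from host memory to the GPU while block i is still computing.

```python
import torch
import torch.nn as nn


class PreGatedMoEBlock(nn.Module):
    """One MoE block whose router selects experts for the *next* block."""

    def __init__(self, d_model, n_experts, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.experts = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(n_experts)]
        )
        self.next_router = nn.Linear(d_model, n_experts)  # the pre-gate

    def forward(self, x, active, prefetch_next):
        # 1. Pre-gate: pick the experts the next block will need and start
        #    moving them to the GPU now (prefetch_next stands in for an
        #    asynchronous host-to-GPU copy that overlaps the compute below).
        scores = self.next_router(x.mean(dim=(0, 1)))
        next_active = torch.topk(scores, self.top_k).indices.tolist()
        prefetch_next(next_active)

        # 2. Run only the experts that were pre-selected one block earlier;
        #    they are already GPU-resident, so there is no stall on migration.
        y = sum(self.experts[i](x) for i in active) / max(len(active), 1)
        return y, next_active
```
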
Recommendation Models

  • Heterogeneous Acceleration Pipeline for Recommendation System Training [arXiv]

    • UBC & GaTech

    • Hotline: a runtime framework for heterogeneous (CPU + GPU) recommendation model training.

    • Uses CPU main memory for non-popular embeddings and GPU HBM for popular embeddings.

    • Fragments each mini-batch into popular and non-popular micro-batches (μ-batches); see the sketch below.

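A minimal sketch of the fragmentation step, with hypothetical names. It assumes each training sample is a list of embedding IDs and that a sample counts as popular only when all of its IDs hit the GPU-resident popular set; the paper's exact criterion may differ.

```python
def fragment_minibatch(samples, popular_ids):
    """Split a mini-batch into popular and non-popular micro-batches."""
    popular_ubatch, nonpopular_ubatch = [], []
    for sample in samples:
        if all(eid in popular_ids for eid in sample):
            popular_ubatch.append(sample)      # embeddings served from GPU HBM
        else:
            nonpopular_ubatch.append(sample)   # needs gathers from CPU main memory
    return popular_ubatch, nonpopular_ubatch


# Popular IDs would come from access-frequency profiling.
popular_ids = {0, 1, 2, 3}
samples = [[0, 1], [2, 7], [3, 0], [9, 4]]
hot, cold = fragment_minibatch(samples, popular_ids)
# hot  == [[0, 1], [3, 0]]  -> trained on the GPU right away
# cold == [[2, 7], [9, 4]]  -> their non-popular embeddings are gathered from
#                              CPU memory and pipelined to the GPU afterwards
```
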
Diffusion Models

  • Cambricon-D: Full-Network Differential Acceleration for Diffusion Models

    • ICT, CAS

    • The first processor design targeting diffusion model acceleration.

    • Mitigates the additional memory accesses incurred by differential computing while preserving its reduced computation (see the sketch below).

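A minimal sketch of differential computing across adjacent denoising steps, for illustration only; Cambricon-D's dataflow, quantization, and handling of nonlinear layers are hardware mechanisms not shown here.

```python
import torch
import torch.nn.functional as F

conv_w = torch.randn(16, 8, 3, 3)
x_prev = torch.randn(1, 8, 32, 32)                   # activations at step t-1
x_curr = x_prev + 0.01 * torch.randn_like(x_prev)    # step t: nearly identical

y_prev = F.conv2d(x_prev, conv_w, padding=1)

# Convolution is linear, so conv(x_t) = conv(x_{t-1}) + conv(delta); the delta
# is small and can be processed in low precision, which is the compute saving.
delta = x_curr - x_prev
y_curr = y_prev + F.conv2d(delta, conv_w, padding=1)

assert torch.allclose(y_curr, F.conv2d(x_curr, conv_w, padding=1), atol=1e-4)

# The catch: x_{t-1} / y_{t-1} must be kept around and refetched, and nonlinear
# layers break the identity above. These additional memory accesses are what
# the paper targets.
```
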
Video Analytics

  • DaCapo: Accelerating Continuous Learning in Autonomous Systems for Video Analytics

Accelerators

  • Intel Accelerator Ecosystem: An SoC-Oriented Perspective

    • Intel

    • Industry Session
