ISCA 2024
Meta Info
Homepage: https://iscaconf.org/isca2024/
Paper list: https://www.iscaconf.org/isca2024/program/
Papers
Large Language Models (LLMs)
Splitwise: Efficient Generative LLM Inference Using Phase Splitting
Microsoft
Best Paper Award
MECLA: Memory-Compute-Efficient LLM Accelerator with Scaling Sub-matrix Partition
Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization
LLMCompass: Enabling Efficient Hardware Design for Large Language Model Inference
ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching
Mixture-of-Experts (MoEs)
Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference
MSRA
Proposes a pre-gating function that alleviates the dynamic nature of sparse expert activation, which in turn addresses the large memory footprint of MoEs.
Recommendation Models
Heterogeneous Acceleration Pipeline for Recommendation System Training [arXiv]
UBC & GaTech
Hotline: a runtime framework.
Utilize CPU main memory for non-popular embeddings and GPUs’ HBM for popular embeddings.
Fragment a mini-batch into popular and non-popular micro-batches (μ-batches).
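The popular / non-popular split above can be sketched as follows. This is a minimal illustration, not Hotline's implementation. The names `fragment_minibatch` and `popular_ids`, the sample layout (a list of embedding IDs per sample), and the rule that a sample counts as popular only when every lookup hits a popular embedding are all assumptions made for the example.

```python
def fragment_minibatch(batch, popular_ids):
    """Split a mini-batch into popular and non-popular micro-batches.

    batch:       list of samples, each a list of embedding IDs (assumed layout)
    popular_ids: set of embedding IDs assumed to be resident in GPU HBM
    """
    popular, non_popular = [], []
    for sample in batch:
        # Assumption: a sample is "popular" only if all of its lookups
        # can be served from the popular (GPU-resident) embeddings.
        if all(emb_id in popular_ids for emb_id in sample):
            popular.append(sample)
        else:
            non_popular.append(sample)
    return popular, non_popular


# Toy data: IDs 1-3 are treated as popular, everything else as non-popular.
batch = [[1, 2], [2, 9], [3], [7, 8]]
popular_ids = {1, 2, 3}
popular_mb, non_popular_mb = fragment_minibatch(batch, popular_ids)
```

In this sketch the popular micro-batch could run entirely from GPU memory, while the non-popular micro-batch would first need its embeddings fetched from CPU main memory.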
Diffusion Models
Cambricon-D: Full-Network Differential Acceleration for Diffusion Models
ICT, CAS
The first processor design to address Diffusion Model acceleration.
Mitigates the additional memory accesses while retaining the reduced computation that differential computing provides.
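The idea behind differential computing for a linear layer can be illustrated with a small sketch. For a linear operator W, W·x_t = W·x_{t-1} + W·(x_t − x_{t-1}), so the output for a new input can be obtained from the previous output plus the result on the delta, which is typically small when consecutive inputs are similar. The code below only demonstrates that identity on integer toy values; it is not Cambricon-D's dataflow, and the function names are hypothetical.

```python
def matvec(w, x):
    """Plain matrix-vector product on Python lists."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def differential_matvec(w, prev_out, prev_x, x):
    """Update a linear layer's output from the previous output and the input delta."""
    # The delta between consecutive inputs is what actually gets multiplied.
    delta = [a - b for a, b in zip(x, prev_x)]
    delta_out = matvec(w, delta)
    # Linearity: W @ x == W @ prev_x + W @ (x - prev_x)
    return [p + d for p, d in zip(prev_out, delta_out)]


# Toy weights and two similar consecutive inputs (illustrative values only).
w = [[2, -1, 0], [1, 3, 1]]
x_prev = [10, 20, 30]
x_next = [11, 20, 29]

y_prev = matvec(w, x_prev)
y_next = differential_matvec(w, y_prev, x_prev, x_next)
```

The identity holds only for linear operators; nonlinear functions need the full input values rather than the delta.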
Video Analytics
DaCapo: Accelerating Continuous Learning in Autonomous Systems for Video Analytics
Accelerators
Intel Accelerator Ecosystem: An SoC-Oriented Perspective
Intel
Industry Session