# ISCA 2024

## Meta Info

Homepage: <https://iscaconf.org/isca2024/>

Paper list: <https://www.iscaconf.org/isca2024/program/>

## Papers

### Large Language Models (LLMs)

* Splitwise: Efficient Generative LLM Inference Using Phase Splitting
  * Microsoft
  * **Best Paper Award**
* MECLA: Memory-Compute-Efficient LLM Accelerator with Scaling Sub-matrix Partition
* Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization
* LLMCompass: Enabling Efficient Hardware Design for Large Language Model Inference
* ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching

### Mixture-of-Experts (MoEs)

* Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference
  * MSRA
  * A pre-gating function alleviates the dynamic nature of sparse expert activation -> addresses the large memory footprint of MoE inference.
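
  A minimal sketch of the pre-gating idea, assuming top-1 routing and a hypothetical gate matrix `pre_gate_w` (names and dimensions are illustrative, not from the paper): the *next* layer's gating decision is computed one layer early, so the selected expert's weights can be prefetched while the current layer is still executing.

  ```python
  import numpy as np

  rng = np.random.default_rng(0)

  NUM_EXPERTS, D = 4, 8  # hypothetical: 4 experts, hidden size 8
  pre_gate_w = rng.standard_normal((D, NUM_EXPERTS))  # gate for the *next* layer

  def softmax(x):
      e = np.exp(x - x.max(-1, keepdims=True))
      return e / e.sum(-1, keepdims=True)

  def pre_gate(hidden):
      # Evaluate the next layer's gating function one layer early,
      # returning the expert whose weights should be prefetched
      # from host memory before that layer runs.
      scores = softmax(hidden @ pre_gate_w)
      return int(np.argmax(scores, axis=-1)[0])

  hidden = rng.standard_normal((1, D))
  expert_to_prefetch = pre_gate(hidden)  # overlap this fetch with compute
  ```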

### Recommendation Models

* Heterogeneous Acceleration Pipeline for Recommendation System Training \[[arXiv](https://arxiv.org/abs/2204.05436)]
  * UBC & GaTech
  * **Hotline**: a runtime framework that schedules popular and non-popular embedding accesses across GPU and CPU memory.
  * Utilize CPU main memory for non-popular embeddings and GPUs’ HBM for popular embeddings.
  * Fragment a mini-batch into popular and non-popular micro-batches (μ-batches).
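
  A minimal sketch of the micro-batch fragmentation step, assuming each sample is a list of embedding IDs and a precomputed set of popular IDs (the helper name and data layout are assumptions for illustration): samples whose embeddings all reside in GPU HBM go to the popular μ-batch; the rest require CPU-memory lookups.

  ```python
  def split_minibatch(minibatch, popular_ids):
      # Fragment a mini-batch into popular / non-popular micro-batches.
      # Popular embeddings live in GPU HBM; non-popular embeddings are
      # served from CPU main memory.
      popular, nonpopular = [], []
      for sample in minibatch:  # sample = list of embedding IDs
          if set(sample) <= popular_ids:
              popular.append(sample)
          else:
              nonpopular.append(sample)
      return popular, nonpopular

  pop, nonpop = split_minibatch([[1, 2], [1, 9], [3]], popular_ids={1, 2, 3})
  # pop -> [[1, 2], [3]]; nonpop -> [[1, 9]]
  ```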

### Diffusion Models

* Cambricon-D: Full-Network Differential Acceleration for Diffusion Models
  * ICT, CAS
  * The first processor design targeting diffusion model acceleration.
  * Mitigates the extra memory accesses incurred by differential computing while preserving its reduced computation.
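
  A minimal sketch of the differential-computing idea for a linear operator (dimensions and noise scale are illustrative assumptions): consecutive diffusion timesteps produce similar activations, so the layer can operate on the small delta and accumulate into the previous output instead of recomputing on the full activation. For linear ops this is exact; the cost is storing the previous activations, which is the extra memory traffic the paper targets.

  ```python
  import numpy as np

  rng = np.random.default_rng(1)
  D = 8
  W = rng.standard_normal((D, D))  # a linear layer's weights

  x_prev = rng.standard_normal(D)
  y_prev = W @ x_prev              # output at the previous timestep

  # Consecutive denoising steps yield similar inputs.
  x_curr = x_prev + 0.01 * rng.standard_normal(D)

  # Differential computing: apply the layer to the (small) delta and
  # accumulate, rather than recomputing on the full activation.
  delta = x_curr - x_prev
  y_curr = y_prev + W @ delta

  assert np.allclose(y_curr, W @ x_curr)  # exact for linear ops
  ```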

### Video Analytics

* DaCapo: Accelerating Continuous Learning in Autonomous Systems for Video Analytics

### Accelerators

* Intel Accelerator Ecosystem: An SoC-Oriented Perspective
  * Intel
  * Industry Session
