# ISCA 2024

## Meta Info

Homepage: <https://iscaconf.org/isca2024/>

Paper list: <https://www.iscaconf.org/isca2024/program/>

## Papers

### Large Language Models (LLMs)

* Splitwise: Efficient Generative LLM Inference Using Phase Splitting
  * Microsoft
  * **Best Paper Award**
* MECLA: Memory-Compute-Efficient LLM Accelerator with Scaling Sub-matrix Partition
* Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization
* LLMCompass: Enabling Efficient Hardware Design for Large Language Model Inference
* ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching

### Mixture-of-Experts (MoEs)

* Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference
  * MSRA
  * A pre-gating function alleviates the dynamic nature of sparse expert activation, which lets the system address the large memory footprint of MoE models.
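
A minimal sketch of the pre-gating idea, assuming that the gate evaluated in one MoE block selects the experts for the *next* block, so those experts can be fetched while the current block is still computing. The function names, the score lists, and the `prefetch_for_next` field are all hypothetical and only illustrate the scheduling; they are not the paper's implementation.

```python
def top_k_experts(scores, k):
    """Return the indices of the k highest-scoring experts, in ascending order."""
    ranked = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)
    return sorted(ranked[:k])


def pre_gated_schedule(first_block_scores, pre_gate_scores, k):
    """Build a per-block plan of which experts run and which are prefetched.

    first_block_scores: gate scores for block 0 (no earlier block can pre-gate it).
    pre_gate_scores[i]: scores computed inside block i for the experts of block i + 1.
    """
    schedule = []
    active = top_k_experts(first_block_scores, k)
    for block, next_scores in enumerate(pre_gate_scores):
        # Experts for the next block are known now, so their weights can be
        # moved to the accelerator while this block is still executing.
        prefetch = top_k_experts(next_scores, k)
        schedule.append({"block": block, "run": active, "prefetch_for_next": prefetch})
        active = prefetch
    # The last block runs the experts chosen by its predecessor.
    schedule.append({"block": len(pre_gate_scores), "run": active, "prefetch_for_next": []})
    return schedule


plan = pre_gated_schedule(
    first_block_scores=[0.1, 0.9, 0.3, 0.2],
    pre_gate_scores=[[0.5, 0.1, 0.2, 0.7], [0.3, 0.4, 0.1, 0.05]],
    k=2,
)
for step in plan:
    print(step)
```

Because the expert choice is known one block ahead, only the selected experts need to be resident on the accelerator at any time, which is how this kind of scheme can reduce the memory footprint.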

### Recommendation Models

* Heterogeneous Acceleration Pipeline for Recommendation System Training \[[arXiv](https://arxiv.org/abs/2204.05436)]
  * UBC & GaTech
  * **Hotline**: a runtime framework.
  * Utilize CPU main memory for non-popular embeddings and GPUs’ HBM for popular embeddings.
  * Fragment a mini-batch into popular and non-popular micro-batches (μ-batches).
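
The fragmentation step above can be sketched as follows. This is an illustrative sketch only: it assumes a sample belongs to the popular μ-batch when *every* embedding ID it touches is popular, and that popularity is decided by a simple top-k access count. The helper names `find_popular_ids` and `fragment_mini_batch` are hypothetical and not taken from Hotline.

```python
from collections import Counter


def find_popular_ids(access_log, top_k):
    """Return the set of the top_k most frequently accessed embedding IDs."""
    return {emb_id for emb_id, _ in Counter(access_log).most_common(top_k)}


def fragment_mini_batch(mini_batch, popular_ids):
    """Split a mini-batch into popular and non-popular micro-batches.

    Each sample is a list of embedding IDs. A sample is 'popular' only if all
    of its IDs are popular (assumed to be resident in GPU HBM); otherwise it
    needs at least one embedding from CPU main memory.
    """
    popular, non_popular = [], []
    for sample in mini_batch:
        if all(emb_id in popular_ids for emb_id in sample):
            popular.append(sample)
        else:
            non_popular.append(sample)
    return popular, non_popular


access_log = [1, 1, 1, 2, 2, 3, 4]
popular_ids = find_popular_ids(access_log, top_k=2)
popular, non_popular = fragment_mini_batch([[1, 2], [1, 3], [2], [4]], popular_ids)
print(popular)      # [[1, 2], [2]]
print(non_popular)  # [[1, 3], [4]]
```

Popular μ-batches can then be processed entirely from GPU memory, while non-popular ones wait for their embeddings to be gathered from CPU main memory.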

### Diffusion Models

* Cambricon-D: Full-Network Differential Acceleration for Diffusion Models
  * ICT, CAS
  * Presented as the first processor design to target diffusion model acceleration.
  * Mitigates the additional memory accesses that differential computing introduces, while keeping its reduced computation.
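
As a rough illustration of differential computing, the sketch below relies only on the linearity of a matrix-vector product: `W·x_t = W·x_{t-1} + W·(x_t − x_{t-1})`. It assumes that consecutive denoising timesteps produce similar activations, so the delta is small; the function names are hypothetical, and this is not the Cambricon-D dataflow itself.

```python
def matvec(W, x):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def differential_step(W, prev_x, prev_y, x):
    """Compute W @ x from the previous timestep's result and the input delta."""
    delta = [a - b for a, b in zip(x, prev_x)]   # small if timesteps are similar
    dy = matvec(W, delta)                        # work is done on the delta only
    return [p + d for p, d in zip(prev_y, dy)]


W = [[1, 2], [3, 4]]
x_prev = [10, 20]
x_curr = [11, 19]

y_prev = matvec(W, x_prev)
y_curr = differential_step(W, x_prev, y_prev, x_curr)

print(y_curr)                        # [49, 109]
print(y_curr == matvec(W, x_curr))   # True
```

The identity holds only for linear operators. A nonlinear activation needs the full input value rather than the delta, which is one plausible source of the extra memory accesses mentioned above.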

### Video Analytics

* DaCapo: Accelerating Continuous Learning in Autonomous Systems for Video Analytics

### Accelerators

* Intel Accelerator Ecosystem: An SoC-Oriented Perspective
  * Intel
  * Industry Session


