# ISCA 2024

## Meta Info

Homepage: <https://iscaconf.org/isca2024/>

Paper list: <https://www.iscaconf.org/isca2024/program/>

## Papers

### Large Language Models (LLMs)

* Splitwise: Efficient Generative LLM Inference Using Phase Splitting
  * Microsoft
  * **Best Paper Award**
* MECLA: Memory-Compute-Efficient LLM Accelerator with Scaling Sub-matrix Partition
* Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization
* LLMCompass: Enabling Efficient Hardware Design for Large Language Model Inference
* ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching

### Mixture-of-Experts (MoEs)

* Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference
  * MSRA
  * Introduces a pre-gating function that alleviates the dynamic nature of sparse expert activation, addressing the large memory footprint.
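
A toy sketch of the pre-gating idea, assuming (this detail is not in the notes above) that the gate evaluated in one MoE block picks the expert for the *next* block, so that expert could be fetched ahead of time instead of keeping every expert resident. All names here (`run_pregated_blocks`, `first_gate`, `pre_gates`) are hypothetical, routing is top-1 for brevity, and the "prefetch" is only logged, not performed.

```python
from typing import Callable, List, Optional, Tuple

Expert = Callable[[float], float]
Gate = Callable[[float], int]


def run_pregated_blocks(
    x: float,
    first_gate: Gate,
    pre_gates: List[Gate],
    experts: List[List[Expert]],
) -> Tuple[float, List[str]]:
    """Run MoE blocks where block b decides the expert used by block b + 1."""
    trace: List[str] = []
    selected = first_gate(x)  # the first block has no predecessor to pre-gate it
    for b, block_experts in enumerate(experts):
        next_selected: Optional[int] = None
        if b + 1 < len(experts):
            # The next block's expert is known before this block's expert runs,
            # so a real system could overlap the transfer with this computation.
            next_selected = pre_gates[b](x)
            trace.append(f"prefetch block {b + 1} expert {next_selected}")
        x = block_experts[selected](x)
        trace.append(f"run block {b} expert {selected}")
        if next_selected is not None:
            selected = next_selected
    return x, trace


# Two blocks with two toy experts each (illustrative values only).
experts = [
    [lambda v: v + 1, lambda v: v * 2],
    [lambda v: v - 3, lambda v: v * 10],
]
out, trace = run_pregated_blocks(
    3.0,
    first_gate=lambda v: 0 if v < 0 else 1,
    pre_gates=[lambda v: 1 if v > 2 else 0],
    experts=experts,
)
```

In this sketch only the selected expert of each block ever needs to be resident, which is the link to the memory-footprint point in the note.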

### Recommendation Models

* Heterogeneous Acceleration Pipeline for Recommendation System Training \[[arXiv](https://arxiv.org/abs/2204.05436)]
  * UBC & GaTech
  * **Hotline**: a runtime framework.
  * Utilize CPU main memory for non-popular embeddings and GPUs’ HBM for popular embeddings.
  * Fragment a mini-batch into popular and non-popular micro-batches (μ-batches).
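
The fragmentation step can be sketched as follows. This is a minimal illustration, not Hotline's implementation: it assumes a sample counts as "popular" only when every embedding row it touches is in the popular set, and the names `fragment_minibatch`, `emb_ids`, and `popular_ids` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Sample = Dict[str, List[int]]  # hypothetical layout: {"emb_ids": [row ids accessed]}


def fragment_minibatch(
    minibatch: List[Sample], popular_ids: Set[int]
) -> Tuple[List[Sample], List[Sample]]:
    """Split a mini-batch into a popular and a non-popular micro-batch."""
    popular: List[Sample] = []
    non_popular: List[Sample] = []
    for sample in minibatch:
        # Assumption: one non-popular access makes the whole sample non-popular,
        # because it then needs embeddings held in CPU main memory.
        if all(i in popular_ids for i in sample["emb_ids"]):
            popular.append(sample)
        else:
            non_popular.append(sample)
    return popular, non_popular


minibatch = [
    {"emb_ids": [1, 2]},
    {"emb_ids": [2, 9]},
    {"emb_ids": [3]},
]
popular_ids = {1, 2, 3}  # rows assumed to be resident in GPU HBM
popular_mb, non_popular_mb = fragment_minibatch(minibatch, popular_ids)
```

Here the popular micro-batch can be served entirely from GPU memory, while the non-popular one needs embeddings from the CPU side.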

### Diffusion Models

* Cambricon-D: Full-Network Differential Acceleration for Diffusion Models
  * ICT, CAS
  * Presented as the first processor design to address diffusion model acceleration.
  * Mitigates the additional memory accesses of differential computing while keeping its reduced computation.
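
For intuition, differential computing on a linear layer can be sketched as below. This is a generic illustration of the identity `W·x_t = W·x_{t-1} + W·(x_t − x_{t-1})`, not Cambricon-D's dataflow; the helper names and the tiny integer values are assumptions made for the example.

```python
from typing import List

Matrix = List[List[int]]
Vector = List[int]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def differential_linear(
    w: Matrix, x_prev: Vector, y_prev: Vector, x_curr: Vector
) -> Vector:
    """Compute W @ x_curr from the previous step's output and the input delta."""
    # The delta between adjacent timesteps is typically small, which is what
    # makes the computation cheaper in hardware.
    delta = [c - p for c, p in zip(x_curr, x_prev)]
    dy = matvec(w, delta)
    # y_prev must be read back from memory: this is the extra access cost.
    return [yp + d for yp, d in zip(y_prev, dy)]


w = [[1, 2], [3, 4]]
x_prev = [10, 20]  # activation at the previous timestep
x_curr = [11, 19]  # activation at the current timestep
y_prev = matvec(w, x_prev)
y_curr = differential_linear(w, x_prev, y_prev, x_curr)
```

The result matches a direct `matvec(w, x_curr)`, but only because the previous input and output are kept around, which is the memory-access overhead the note refers to.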

### Video Analytics

* DaCapo: Accelerating Continuous Learning in Autonomous Systems for Video Analytics

### Accelerators

* Intel Accelerator Ecosystem: An SoC-Oriented Perspective
  * Intel
  * Industry Session

