# Miscellaneous

- [arXiv](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv.md): A free distribution service and an open-access archive.
  - [2024](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2024.md)
    - [Efficiently programming large language models using SGLang](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2024/sglang.md): LLM Inference
  - [2023](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2023.md)
    - [HexGen: Generative inference of foundation model over heterogeneous decentralized environment](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2023/hexgen.md)
    - [High-throughput generative inference of large language models with a single GPU](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2023/flexgen.md): An offloading framework for high-throughput LLM inference.
  - [2022](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2022.md)
    - [DisaggRec: Architecting disaggregated systems for large-scale personalized recommendation](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2022/disaggrec.md): #deep\_learning\_recommender\_system #memory\_disaggregation #total\_cost\_of\_ownership #RDMA
    - [A case for disaggregation of ML data processing](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2022/tf-data.md)
    - [Singularity: Planet-scale, preemptive and elastic scheduling of AI workloads](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2022/singularity.md): Live GPU job migration.
    - [Aryl: An elastic cluster scheduler for deep learning](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2022/aryl.md)
  - [2016](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2016.md)
    - [Wide & deep learning for recommender systems](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2016/wide-and-deep-learning-for-recommender-systems.md): A recommender system with a wide & deep model (WDL).
    - [Training deep nets with sublinear memory cost](https://paper.lingyunyang.com/reading-notes/miscellaneous/arxiv/2016/training-deep-nets-with-sublinear-memory-cost.md): Reduce memory cost to store intermediate results and gradients.
- [MSR Technical Report](https://paper.lingyunyang.com/reading-notes/miscellaneous/msr-technical-report.md)
  - [2011](https://paper.lingyunyang.com/reading-notes/miscellaneous/msr-technical-report/2011.md)
    - [Heuristics for vector bin packing](https://paper.lingyunyang.com/reading-notes/miscellaneous/msr-technical-report/2011/heuristics-for-vector-bin-packing.md)

