Homepage:
Paper List:
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU