A batching system for DNN inference that reduces latency and improves throughput.
DVABatch: Diversity-aware Multi-Entry Multi-Exit Batching for Efficient Processing of DNN Services on GPUs