SageServe: Optimizing LLM Serving on Cloud Data Centers with Forecast-Aware Auto-Scaling
Shashwat Jaiswal*, Kunal Jain*, Yogesh Simmhan, Anjaly Parayil, Ankur Mallick, Rujia Wang, Renee St. Amant, et al. 'Serving Models, Fast and Slow: Optimizing Heterogeneous LLM Inferencing Workloads at Scale'. arXiv [cs.DC], 2025. https://doi.org/10.48550/ARXIV.2502.14617.
