Intelligent router for LLM workloads: Improving performance through workload-aware scheduling
Kunal Jain, Anjaly Parayil, Ankur Mallick, Esha Choukse, Xiaoting Qin, Jue Zhang, Íñigo Goiri, Rujia Wang, Chetan Bansal, Victor Rühle, Anoop Kulkarni, Steve Kofsky, Saravan Rajmohan, Microsoft. 2025. Performance Aware LLM Load Balancer for Mixed Workloads. In The 5th Workshop on Machine Learning and Systems (EuroMLSys ’25), March 30-April 3, 2025, Rotterdam, Netherlands. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3721146.3721947