Efficiently serve dozens of fine-tuned models with vLLM on Amazon SageMaker AI and Amazon Bedrock

A blog post on our vLLM optimizations for efficient multi-LoRA serving of Mixture of Experts (MoE) models such as GPT-OSS and Qwen3-MoE.