Mu is a small language model designed to run locally on Copilot+ PCs, optimized specifically for Neural Processing Units (NPUs). It is a 330M-parameter encoder-decoder model that translates natural language queries into actionable Settings function calls. Compared with a similarly sized decoder-only model on a Qualcomm Hexagon NPU, Mu achieved roughly 47% lower first-token latency and 4.7× higher decoding speed.

To stay compact without sacrificing quality, the model incorporates several transformer refinements: Dual LayerNorm, Rotary Positional Embeddings (RoPE), and Grouped-Query Attention (GQA); the sketches below illustrate RoPE, GQA, and the quantization step.

Mu was trained on A100 GPUs on Azure, pre-trained on hundreds of billions of educational tokens and then fine-tuned on downstream tasks such as SQuAD, CodeXGLUE, and the Windows Settings agent. Quantization converted the model's weights from floating-point to integer representations, reducing memory and compute cost on edge devices. The Settings agent itself was fine-tuned on a dataset of 3.6 million samples and achieves response times under 500 milliseconds when handling user queries about Windows settings.
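To make the positional-encoding choice concrete, here is a minimal sketch of Rotary Positional Embeddings in PyTorch. The function name `apply_rope`, the tensor layout, and the split-halves rotation variant are illustrative assumptions; the source does not detail which RoPE formulation Mu uses.

```python
import torch

def apply_rope(x, base=10000.0):
    """RoPE sketch: rotate feature pairs by a position-dependent angle so
    attention scores become a function of relative position.
    x has shape (batch, seq_len, heads, head_dim), head_dim even.
    Uses the split-halves layout (an assumption, not Mu's confirmed variant)."""
    _, seq_len, _, head_dim = x.shape
    # Geometrically spaced rotation frequencies, one per feature pair.
    freqs = base ** (-torch.arange(0, head_dim, 2).float() / head_dim)
    angles = torch.arange(seq_len).float()[:, None] * freqs      # (seq, head_dim/2)
    cos = angles.cos()[None, :, None, :]                         # broadcastable shape
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., : head_dim // 2], x[..., head_dim // 2 :]    # split into halves
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```

Because the rotation angle depends only on token position, queries and keys rotated this way produce dot products that depend on their relative offset, which is what makes RoPE attractive at small model sizes.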
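Grouped-Query Attention shrinks the number of key/value heads that must be computed and cached, which matters on memory-bound NPUs. Below is a minimal sketch under assumed shapes; `grouped_query_attention` and its weight arguments are hypothetical names for illustration, not Mu's actual interface.

```python
import torch

def grouped_query_attention(x, wq, wk, wv, n_heads, n_kv_heads):
    """GQA sketch: many query heads share fewer K/V heads, reducing the
    size of the K/V projections and cache relative to standard attention."""
    batch, seq_len, d = x.shape
    head_dim = d // n_heads
    group = n_heads // n_kv_heads  # query heads per shared K/V head

    q = (x @ wq).view(batch, seq_len, n_heads, head_dim).transpose(1, 2)
    k = (x @ wk).view(batch, seq_len, n_kv_heads, head_dim).transpose(1, 2)
    v = (x @ wv).view(batch, seq_len, n_kv_heads, head_dim).transpose(1, 2)

    # Broadcast each K/V head across its group of query heads.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)

    att = (q @ k.transpose(-2, -1)) / head_dim**0.5
    att = att.softmax(dim=-1)
    return (att @ v).transpose(1, 2).reshape(batch, seq_len, d)

# Example: 8 query heads sharing 2 K/V heads (group size 4), head_dim 32.
d, n_kv, head_dim = 256, 2, 32
x = torch.randn(1, 16, d)
wq = torch.randn(d, d)
wk, wv = torch.randn(d, n_kv * head_dim), torch.randn(d, n_kv * head_dim)
out = grouped_query_attention(x, wq, wk, wv, n_heads=8, n_kv_heads=2)
```

With 2 K/V heads instead of 8, the K/V projections and cache are 4× smaller, trading a small amount of modeling flexibility for lower memory traffic during decoding.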
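The quantization step can be illustrated with simple symmetric int8 weight quantization: each float tensor is stored as 8-bit integers plus a single floating-point scale. This is a generic sketch, assuming per-tensor symmetric quantization; the exact scheme and bit widths Mu uses on the NPU are not reproduced here.

```python
import torch

def quantize_int8(w):
    """Symmetric per-tensor quantization sketch: approximate w ≈ scale * q,
    with q stored as int8 and scale as a single float (an assumed scheme)."""
    scale = w.abs().max() / 127.0               # map the largest magnitude to 127
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale.item()

def dequantize_int8(q, scale):
    """Recover an approximate float tensor for inspection."""
    return q.float() * scale

w = torch.randn(256, 256)                       # stand-in for a weight matrix
q, scale = quantize_int8(w)
err = (w - dequantize_int8(q, scale)).abs().max()
print(f"max reconstruction error: {err:.4f}")   # small relative to the weight range
```

Storing weights as int8 cuts memory to a quarter of float32 and lets the matrix multiplies run on the NPU's integer units, which is central to meeting the sub-500 ms response target on-device.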