The Applied Sciences team has developed Phi Silica, a small language model (SLM) tuned for power efficiency, inference speed, and memory efficiency on Windows 11 Copilot+ PCs with Snapdragon X Series NPUs. Phi Silica is designed for on-device use, supports multiple languages, and offers a 4k-token context length. Microsoft announced that developers will have access to the Phi Silica API starting January 2025. Copilot+ PCs can perform over 40 trillion operations per second, with further performance gains available when connected to the cloud.

Phi Silica is built on a Cyber-EO compliant derivative of Phi-3.5-mini. Its architecture comprises a tokenizer, a detokenizer, an embedding model, a transformer block, and a language model head.

The model is efficient in both power and precision: context processing consumes only 4.8 mWh of energy on the NPU, a 56% improvement in power consumption compared to running on the CPU. Phi Silica uses 4-bit weight quantization for efficiency while maintaining a rapid time to first token and high accuracy across languages. The quantization was achieved with QuaRot, a technique for low-precision inference that reaches 4-bit weights with minimal accuracy loss.

To reduce the memory footprint, the team applied weight sharing and memory-mapped embeddings, cutting memory consumption by roughly 60%. To extend the usable context length, they introduced a sliding window for context processing and a dynamic KV cache.

Finally, the model has undergone safety alignment and is subject to Responsible AI assessments and content moderation measures.
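To make the 4-bit weight quantization concrete, here is a minimal sketch of plain symmetric per-channel 4-bit quantization. This is an illustration of the general idea only, not QuaRot itself (QuaRot additionally applies rotations to suppress outliers before quantizing); all function names are hypothetical.

```python
import numpy as np

def quantize_4bit(weights: np.ndarray):
    """Symmetric per-row 4-bit quantization (illustrative sketch, not QuaRot).

    Each row gets its own scale so that its largest value maps to 7,
    keeping quantized integers within the signed 4-bit range [-8, 7].
    """
    scale = np.abs(weights).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover approximate float weights from 4-bit integers and scales."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 16)).astype(np.float32)
q, s = quantize_4bit(w)
w_hat = dequantize_4bit(q, s)
print("max abs reconstruction error:", np.abs(w - w_hat).max())
```

The rounding error per element is bounded by half the row's scale, which is why per-channel scales lose far less accuracy than a single global scale.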
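The sliding-window idea for bounding KV-cache growth can be sketched as follows. This is a simplified, hypothetical illustration of a fixed-budget cache that retains only the most recent entries; the actual dynamic KV cache in Phi Silica is not publicly specified in this detail.

```python
from collections import deque

class SlidingKVCache:
    """Hypothetical fixed-budget KV cache (illustrative sketch).

    Keeps only the most recent `window` key/value pairs, so attention
    memory stays bounded no matter how long the context grows.
    """
    def __init__(self, window: int):
        self.window = window
        self.keys = deque(maxlen=window)    # oldest entries evicted automatically
        self.values = deque(maxlen=window)

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

    def __len__(self):
        return len(self.keys)

cache = SlidingKVCache(window=4)
for t in range(10):
    cache.append(f"k{t}", f"v{t}")
print(len(cache), list(cache.keys))  # only the 4 most recent tokens remain
```

Eviction here is purely positional; a production cache would also have to reconcile evicted positions with the model's positional encoding.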
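Memory-mapped embeddings, one of the memory-saving techniques mentioned above, can be sketched with NumPy's `memmap`: the embedding table lives on disk and pages are faulted in on demand, so looking up a token does not require loading the whole table into RAM. The file layout and sizes below are illustrative assumptions, not Phi Silica's actual format.

```python
import os
import tempfile
import numpy as np

# Hypothetical table size for illustration only.
vocab, dim = 1000, 64
path = os.path.join(tempfile.mkdtemp(), "embeddings.bin")

# Write the embedding table to disk once...
table = np.random.default_rng(1).normal(size=(vocab, dim)).astype(np.float16)
table.tofile(path)

# ...then map it read-only: the OS loads pages lazily instead of
# copying the entire table into process memory at startup.
emb = np.memmap(path, dtype=np.float16, mode="r", shape=(vocab, dim))

# Touching a single row faults in only the pages backing that row.
row = np.asarray(emb[42])
print(row.shape)
```

The same mapping can also be shared read-only across processes, which pairs naturally with the weight-sharing technique the team describes.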