Speaker: 付俊伟 (Fu Junwei)

胡宁馨 (Hu Ningxin): Principal Engineer at Intel; drafter and lead editor of the W3C Web Neural Network (WebNN) standard; Chromium committer and primary owner of the Chromium WebNN component.
张敏 (Zhang Min): technical manager of the Intel WebNN team; developer of Chromium and the ONNX Runtime WebNN EP; author of the WebNN developer preview.
付俊伟 (Fu Junwei): Senior Software Engineer at Intel; Chromium committer; main developer of the Chromium WebNN infrastructure design and the Chromium Shape Detection API.

Contents
01 Background of WebNN
02 Architecture of WebNN
03 How to use WebNN
04 WebNN performance comparison

Demo: https://microsoft.github.io/webnn-developer-preview/
WebNN Execution Provider of ONNX Runtime Web with GPU acceleration from DirectML. Running on Intel® Core Ultra 7 processor 155H with integrated Arc GPU.
Prompt: "A cat under the snow"
Stable Diffusion image generation: Text Encoder, Unet Step 1, Unet Step 2, Unet Step 3, Unet Step 4, Image Decoder.

WebNN operations and their native counterparts

WebNN Operation   DirectML             TFLite         CoreML
matMul            GEMM                 BATCH_MATMUL   matmul
gather            GATHER               GATHER         gather_along_axis
sigmoid           ACTIVATION_SIGMOID   LOGISTIC       sigmoid
softmax           ACTIVATION_SOFTMAX   SOFTMAX        softmax

The Web machine learning stack

Use cases:       Image Classification, Object Detection, Noise Suppression, Natural Language, Background Segmentation
Frameworks:      Transformers.js, MediaPipe Web, ONNX Runtime Web, TensorFlow.js
Web APIs:        WebAssembly, WebGPU, WebNN, API extensions
Web engines:     JavaScript Runtime (e.g., Electron/Node.js), Web Browser (e.g., Chrome/Edge)
System ML APIs:  CoreML, DirectML, TFLite, Windows Studio Effects, other ML OS APIs
Hardware:        NPU, GPU, CPU

WebNN programming model

[Figure: an MLContext is created with a device type (cpu/gpu/npu) and a power preference (high-perf/low-power). An MLGraphBuilder builds a computational graph on the Web side: conv2d, add, and relu over input, filter, and bias, with tmp intermediates. The graph is compiled into a native MLGraph in which the three operations become a single fused conv2d. Compute reads the input buffers and writes the output buffers (CPU/GPU). Legend: WebNN API, other Web API, call flow, data flow.]

WebNN brings a unified neural network abstraction to the Web.

WebNN implementation in Chromium

Apps/Frameworks:           Web Application, JS ML Frameworks
Chromium Renderer Process: MLContext, MLGraphBuilder, MLGraph, WebNN Mojo Client
IPC
GPU/Utility Process:       WebNN Mojo Server; CoreML Backend, DirectML Backend, TFLite Backend
Native ML APIs, OS, drivers:
  macOS:                     CoreML; BNNS/MPS
  Windows:                   DirectML; MCDM
  Android/ChromeOS/Linux:    TFLite; XNNPACK/Delegate
Hardware:                  GPU, NPU, CPU

[Figure, labelled "1.18 release": a model with input, weights, bias, and intermediate tensors is split between Wasm kernels and a WebNN graph, with intermediate tensors passed between the two.]
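The create, build, compile, and compute flow from the programming-model slide can be sketched in code. This is a minimal sketch, not the speakers' implementation: it assumes the WebNN draft of roughly the Chrome Canary 126 era, in which `MLContext.compute()` exists. The function name `runConvAddRelu`, the helper `getWebNN`, and the tensor shapes and values are hypothetical. Descriptor field names changed between drafts (`dimensions` was renamed to `shape`), so the sketch passes both, and it returns `null` when the runtime does not expose the assumed API.

```typescript
// Hedged sketch of the WebNN flow: create -> build -> compile -> compute.
// Assumption: API names follow the WebNN draft of the talk's time frame.
// Newer drafts replace compute() with MLTensor + dispatch(), not covered here.

type DeviceType = "cpu" | "gpu" | "npu";

// Returns navigator.ml when the runtime exposes WebNN, otherwise undefined.
function getWebNN(): any {
  return (globalThis as any).navigator?.ml;
}

// Operand descriptor carrying both field names used across spec drafts;
// a WebIDL dictionary ignores the key it does not know.
function descriptor(shape: number[]) {
  return { dataType: "float32", shape, dimensions: shape };
}

async function runConvAddRelu(
  deviceType: DeviceType = "gpu"
): Promise<Float32Array | null> {
  const ml = getWebNN();
  const GraphBuilder = (globalThis as any).MLGraphBuilder;
  if (!ml || !GraphBuilder) return null; // no WebNN: caller falls back, e.g. to Wasm

  // 1. create: a context bound to a device type and power preference
  const context = await ml.createContext({
    deviceType,
    powerPreference: "high-performance",
  });
  if (typeof context.compute !== "function") return null; // newer API shape

  // 2. build: describe relu(conv2d(input, filter) + bias)
  const builder = new GraphBuilder(context);
  const input = builder.input("input", descriptor([1, 1, 4, 4])); // NCHW
  const filter = builder.constant(
    descriptor([1, 1, 3, 3]), // OIHW
    new Float32Array(9).fill(1)
  );
  const bias = builder.constant(descriptor([1, 1, 1, 1]), new Float32Array([-5]));
  const output = builder.relu(builder.add(builder.conv2d(input, filter), bias));

  // 3. compile: the backend may fuse conv2d + add + relu into one native op
  const graph = await builder.build({ output });

  // 4. compute: bind input and output buffers and run the compiled graph
  const inputs = { input: new Float32Array(16).fill(1) };
  const outputs = { output: new Float32Array(4) }; // output shape 1x1x2x2
  const result = await context.compute(graph, inputs, outputs);
  return result.outputs.output; // each element should be relu(9 - 5) = 4
}
```

Calling `runConvAddRelu("npu")` would request the NPU instead of the GPU. Whether the request is honored depends on the browser build and the drivers, which is why the sketch treats a missing API as a normal outcome rather than an error.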
Integration status

[Figure: a Web application (pre-processing, Conv2d, MatMul, post-processing) runs on ONNX Runtime Web or TensorFlow Lite Web. The framework executes operators with WebGL, WebGPU, or Wasm kernels, or as a WebNN graph. In browsers with WebNN support, the WebNN graph runs on native CPU, GPU, or NPU kernels. Status labels shown: Prototype, Available.]

Demo: https://microsoft.github.io/webnn-developer-preview/
WebNN Execution Provider of ONNX Runtime Web with GPU acceleration from DirectML. Running on Intel® Core Ultra 7 processor 155H with integrated Arc GPU.

Demo: Vanilla JS (plain JavaScript) use of the WebNN API, with NPU acceleration from DirectML. Running on Intel® Core Ultra 7 processor 155H with integrated Intel® AI Boost NPU.

MediaPipe Models Inference Performance (Normalized / Higher is Better)

[Chart: left axis, Inference Speedup (0.0 to 5.0); right axis, WebNN vs. Native Ratio (0% to 100%). Series: Wasm SIMD, WebNN XNNPack, Native XNNPack, WebNN vs Native. Wasm SIMD is the 1.0 baseline for every model; the WebNN XNNPack and Native XNNPack speedups range from roughly 1.8x to 4.5x.]
• Browser: Chrome Canary 118.0.5943.0
• DUT: Dell / Linux / i7-1260P, single P-core
• Workloads: MediaPipe solution models (FP32, batch=1)

WebNN DirectML vs. Native DirectML

[Chart: left axis, Inference Time (ms), log scale; right axis, Percentage (%). Series: WebNN GPU, Native DirectML, WebNN GPU vs. Native DirectML. The per-model WebNN vs. native percentages range from roughly 71% to 96%.]
• Browser: Chrome Canary 126.0.6459.0
• OS: Windows 11 Pro 23H2
• DUT: Asus Zenbook
• CPU: Intel(R) Core(TM) Ultra 7 155H 3.80GHz
• GPU: Intel(R) Arc(TM) Graphics
• GPU Driver: 31.0.101.5512

WebNN DirectML vs Native on MTL NPU

[Chart: left axis, Inference Time (ms), 0.00 to 8.00; right axis, WebNN vs Native (%). Models: MobileNetV2, SqueezeNet1.0, ResNet50v1, EfficientNetLite4. Series: WebNN DirectML NPU, Native NPU, WebNN NPU vs Native. Ratios shown: 95.8%, 73.4%, 86.1%, 62.7%.]
• Browser: Chrome Canary 126.0.6459.0
• OS: Windows 11 Pro 23H2
• DUT: Asus Zenbook
• CPU: Intel(R) Core(TM) Ultra 7 155H 3.80GHz
• NPU: Intel(R) AI Boost
• NPU Driver: 32.0.100.2381

The average performance of the 4 listed models on WebNN DirectML is about 80% of native DML on the MTL NPU.

Demo: Speech to Text PoC for Khan Academy Khanmigo.
WebNN Execution Provider of ONNX Runtime Web with NPU acceleration from DirectML. Running on Intel® Core Ultra 7 processor 155H with integrated Intel® AI Boost NPU.

Thanks

Large Language Model Is Redefining the Software
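As a quick check on the NPU comparison, the "about 80% of native" figure can be reproduced from the four ratios printed on the chart (95.8%, 73.4%, 86.1%, 62.7%). The sketch below assumes the slide's average is a plain arithmetic mean; the slide does not say how it was computed, and the names `npuRatios` and `mean` are illustrative.

```typescript
// WebNN vs. native ratios (percent) printed on the MTL NPU chart.
// The slide text does not preserve which model each ratio belongs to,
// so they are kept as an unordered list.
const npuRatios: number[] = [95.8, 73.4, 86.1, 62.7];

// Arithmetic mean; assumed to be the kind of average the slide refers to.
function mean(values: number[]): number {
  if (values.length === 0) throw new Error("mean of empty list");
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

const averageRatio = mean(npuRatios);
console.log(averageRatio.toFixed(1)); // prints "79.5"
```

The result, 79.5%, is consistent with the "about 80%" stated on the slide.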