
The Evolution and Practice of WebNN in the AIGC Wave

Information Technology, 2024-12-04, Global Software Development Conference, 乐***

AI-Generated Summary

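WebNN Overview and Performance Comparison

Background

WebNN (Web Neural Network) brings a unified neural network abstraction to the Web, with the aim of simplifying cross-platform deployment of machine learning models.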
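One way to picture the "unified abstraction" is as a lookup from a single WebNN operation name to whatever each native backend calls the same operation. The sketch below encodes the four operations listed on the talk's operator-mapping slide. The sigmoid and softmax entries for DirectML and TFLite are assigned following those libraries' own operator naming, and the type and function names (`Backend`, `WebNNOp`, `OP_TABLE`, `lower`) are hypothetical, introduced only for illustration; this is not how Chromium implements the mapping.

```typescript
// Illustrative lookup table: one WebNN op name, three backend op names.
// Entries come from the talk's operator-mapping slide; names of the types
// and helpers below are hypothetical.
type Backend = "DirectML" | "TFLite" | "CoreML";
type WebNNOp = "matMul" | "gather" | "sigmoid" | "softmax";

const OP_TABLE: Record<WebNNOp, Record<Backend, string>> = {
  matMul: { DirectML: "GEMM", TFLite: "BATCH_MATMUL", CoreML: "matmul" },
  gather: { DirectML: "GATHER", TFLite: "GATHER", CoreML: "gather_along_axis" },
  sigmoid: { DirectML: "ACTIVATION_SIGMOID", TFLite: "LOGISTIC", CoreML: "sigmoid" },
  softmax: { DirectML: "ACTIVATION_SOFTMAX", TFLite: "SOFTMAX", CoreML: "softmax" },
};

// Translate a sequence of WebNN ops into the names a given backend uses.
function lower(ops: WebNNOp[], backend: Backend): string[] {
  return ops.map((op) => OP_TABLE[op][backend]);
}

console.log(lower(["matMul", "softmax"], "DirectML"));
console.log(lower(["matMul", "softmax"], "CoreML"));
```

The application only ever names `matMul` or `softmax`; which native operator runs depends on the platform backend (DirectML on Windows, CoreML on macOS, TFLite on Android/ChromeOS/Linux, per the slides).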

Architecture

WebNN is organized around a computational graph together with input buffers, intermediate-result buffers, and output buffers, and it can target CPU, GPU, and NPU hardware acceleration.
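The graph-building side of this design can be sketched in TypeScript as follows. This is a minimal sketch under stated assumptions, not the speakers' code: it builds the conv2d, add, relu graph that the slides use as their example, and it uses WebNN draft names as best understood (`navigator.ml.createContext`, `MLGraphBuilder`, `input`, `constant`, `conv2d`, `add`, `relu`, `build`). These names have changed between drafts (for example, older drafts used `dimensions` where this sketch uses `shape`), so check them against the current specification. The helper names `hasWebNN` and `buildTinyGraph` are hypothetical. The execution step is deliberately left out because it differs across drafts.

```typescript
// Minimal sketch: build a conv2d -> add -> relu graph with WebNN when available.
// Assumption: API names follow the W3C WebNN draft and may differ in your browser.
type DeviceType = "cpu" | "gpu" | "npu";

/** Returns true when the runtime exposes the WebNN entry point (navigator.ml). */
function hasWebNN(): boolean {
  const nav = (globalThis as any).navigator;
  return !!nav && typeof nav.ml === "object" && nav.ml !== null;
}

/** Builds the example graph, or resolves to null when WebNN is not present. */
async function buildTinyGraph(deviceType: DeviceType): Promise<unknown | null> {
  if (!hasWebNN()) return null;
  const g = globalThis as any;

  // The slides list device type (cpu/gpu/npu) and power preference as context options.
  const context = await g.navigator.ml.createContext({
    deviceType,
    powerPreference: "default",
  });
  const builder = new g.MLGraphBuilder(context);

  // Graph input: a 1x1x3x3 float32 tensor.
  const input = builder.input("input", { dataType: "float32", shape: [1, 1, 3, 3] });
  // Constants: a 2x2 filter and a scalar-like bias that broadcasts over the output.
  const filter = builder.constant(
    { dataType: "float32", shape: [1, 1, 2, 2] },
    new Float32Array([1, 0, 0, 1]),
  );
  const bias = builder.constant(
    { dataType: "float32", shape: [1] },
    new Float32Array([0.5]),
  );

  const conv = builder.conv2d(input, filter);
  const output = builder.relu(builder.add(conv, bias));

  // build() hands the graph to the backend, which may fuse these three ops.
  return builder.build({ output });
}

console.log(hasWebNN() ? "WebNN available" : "WebNN not available in this runtime");
```

Guarding on `navigator.ml` first lets the same code load in runtimes without WebNN, where a framework would fall back to Wasm or WebGPU kernels.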

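Performance Comparison

MediaPipe models (CPU): on a Dell Linux machine with an Intel i7-1260P (single P-core, Chrome Canary 118), WebNN with the XNNPACK backend is reported at up to about 4.5x the inference speed of the Wasm SIMD baseline.
DirectML (GPU): on an Intel Core Ultra 7 processor 155H with integrated Arc GPU, WebNN DirectML reaches roughly 70% to 95% of native DirectML performance.
MobileNet V2, SqueezeNet 1.0, ResNet50 v1, EfficientNet Lite 4 (NPU): on an Intel Core Ultra 7 processor 155H with integrated AI Boost NPU, WebNN reaches 62.7%, 95.8%, 73.4%, and 86.1% of native NPU performance respectively, about 80% on average.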
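The two metrics used in these comparisons can be made concrete with a small TypeScript sketch. It assumes that "speedup" means baseline time divided by candidate time and that "percent of native" means native time divided by WebNN time; the slides do not spell out either formula. The timings passed to `speedup` and `percentOfNative` are hypothetical. The four NPU ratios (62.7%, 95.8%, 73.4%, 86.1%) are the labels on the slide's NPU chart, and averaging them is a cross-check against the slide's own statement that the four models average about 80% of native.

```typescript
// Speedup relative to a baseline (assumed definition): higher is better.
function speedup(baselineMs: number, candidateMs: number): number {
  return baselineMs / candidateMs;
}

// WebNN performance as a percentage of native (assumed definition):
// 100 means WebNN takes exactly as long as the native runtime.
function percentOfNative(nativeMs: number, webnnMs: number): number {
  return (nativeMs / webnnMs) * 100;
}

function mean(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Hypothetical timings, for illustration only.
console.log(speedup(9.0, 3.0));          // Wasm SIMD 9 ms vs WebNN 3 ms
console.log(percentOfNative(4.0, 5.0));  // native 4 ms vs WebNN 5 ms

// Ratio labels from the slide's NPU chart, in percent.
const npuRatiosPercent = [62.7, 95.8, 73.4, 86.1];
const npuAverage = mean(npuRatiosPercent);
console.log(npuAverage.toFixed(1)); // "79.5", i.e. about 80% of native
```

A value of 238% for any single model would push the average well above 100%, which is inconsistent with the "about 80%" figure stated on the slide.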

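Example

Speech-to-text demo: a proof-of-concept for Khan Academy Khanmigo that uses the WebNN Execution Provider of ONNX Runtime Web with NPU acceleration from DirectML, running on an Intel Core Ultra 7 processor 155H with integrated Intel AI Boost NPU.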
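For a demo like this, selecting the WebNN Execution Provider in ONNX Runtime Web comes down to the session options. The sketch below only constructs that options object; it does not import onnxruntime-web or load a model. The option fields (`name: "webnn"`, `deviceType`, `powerPreference`) follow the ONNX Runtime Web documentation as best understood, and the helper name `webnnSessionOptions` is hypothetical. This is not the demo's actual source.

```typescript
// Sketch of session options that request the WebNN execution provider.
// Assumption: field names match ONNX Runtime Web's WebNN EP options.
type WebNNDeviceType = "cpu" | "gpu" | "npu";

interface WebNNExecutionProviderOption {
  name: "webnn";
  deviceType: WebNNDeviceType;
  powerPreference: "default" | "low-power" | "high-performance";
}

interface SessionOptionsSketch {
  executionProviders: WebNNExecutionProviderOption[];
}

// Hypothetical helper: build options for a given device type.
function webnnSessionOptions(deviceType: WebNNDeviceType): SessionOptionsSketch {
  return {
    executionProviders: [{ name: "webnn", deviceType, powerPreference: "default" }],
  };
}

const npuOptions = webnnSessionOptions("npu");
console.log(JSON.stringify(npuOptions));

// In an application, these options would be passed as the second argument of
// ort.InferenceSession.create(modelUrl, npuOptions), where modelUrl is a
// placeholder for the location of an ONNX model.
```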

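Summary

WebNN uses a unified abstraction layer to bring hardware-accelerated neural network inference to the Web across platforms. In the reported benchmarks it runs several times faster than the Wasm SIMD baseline on CPU, reaches roughly 70% to 95% of native DirectML performance on GPU, and reaches about 80% of native performance on NPU.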

Speaker: 付俊伟

Speaker Bios

胡宁馨: Principal Engineer at Intel; drafter and lead editor of the W3C Web Neural Network (WebNN) standard; Chromium committer and primary owner of the Chromium WebNN component.
张敏: Technical manager of the Intel WebNN team; developer of Chromium and of the ONNX Runtime WebNN EP; author of the WebNN developer preview.
付俊伟: Senior Software Engineer at Intel; Chromium committer; designer of the Chromium WebNN infrastructure and main developer of the Chromium Shape Detection API.

Agenda

01 Background of WebNN
02 Architecture of WebNN
03 How to use WebNN
04 WebNN performance comparison

Demos

Stable Diffusion (https://microsoft.github.io/webnn-developer-preview/): WebNN Execution Provider of ONNX Runtime Web with GPU acceleration from DirectML, running on Intel® Core Ultra 7 processor 155H with integrated Arc GPU. Prompt: "A cat under the snow". Pipeline stages shown: Text Encoder, Unet Steps 1 to 4, Image Decoder.
Vanilla JS (plain JavaScript) use of the WebNN API, with NPU acceleration from DirectML, running on Intel® Core Ultra 7 processor 155H with integrated Intel® AI Boost NPU.
Speech to Text PoC Demo for Khan Academy Khanmigo: WebNN Execution Provider of ONNX Runtime Web with NPU acceleration from DirectML, running on Intel® Core Ultra 7 processor 155H with integrated Intel® AI Boost NPU.

Operator Mapping

WebNN operation    DirectML              TFLite          CoreML
matMul             GEMM                  BATCH_MATMUL    matmul
gather             GATHER                GATHER          gather_along_axis
sigmoid            ACTIVATION_SIGMOID    LOGISTIC        sigmoid
softmax            ACTIVATION_SOFTMAX    SOFTMAX         softmax

Software Stack

Use cases: Image Classification, Object Detection, Noise Suppression, Natural Language, Background Segmentation
Frameworks: Transformers.js, MediaPipe Web, ONNX Runtime Web, TensorFlow.js
Web API: WebAssembly, WebGPU, WebNN, API extensions
Web engine: JavaScript runtime (e.g., Electron/Node.js), web browser (e.g., Chrome/Edge)
System: ML APIs, other ML OS APIs, Windows Studio Effects, CoreML, DirectML, TFLite
Hardware: NPU, GPU, CPU

Programming Model

WebNN brings a unified neural network abstraction to the Web.
MLContext: created with a device type (cpu/gpu/npu) and a power preference (high-perf/low-power).
MLGraphBuilder: creates the computational graph on the Web side. The slide's example graph is conv2d, add, relu, with operands input, filter, bias, and output.
build/compile: turns the computational graph into a native compiled graph, in which the example becomes a fused conv2d.
MLGraph compute: runs the compiled graph, reading from input buffers (CPU/GPU), using temporary intermediate buffers, and writing to output buffers (CPU/GPU).
[Figure: the diagram distinguishes call flow from data flow and the WebNN API from other Web APIs.]

Chromium Implementation

Apps/frameworks: web application, JS ML frameworks
Renderer process: MLContext, MLGraphBuilder, MLGraph, WebNN Mojo client
IPC between the renderer process and the GPU/utility process
GPU/utility process: WebNN Mojo server with CoreML, DirectML, and TFLite backends
Native ML APIs and OS drivers: macOS (CoreML, BNNS/MPS), Windows (DirectML, MCDM), Android/ChromeOS/Linux (TFLite, XNNPACK/Delegate)
Hardware: GPU, NPU, CPU

Framework Integration

[Figure: a web application (pre-processing, MatMul, Conv2d, post-processing) runs on ONNX Runtime Web or TensorFlow Lite Web. The frameworks execute through Wasm, WebGL, or WebGPU kernels, or through a WebNN graph that takes input, weights, and bias and produces intermediate results. In browsers with WebNN support, the WebNN graph maps to native CPU, GPU, and NPU kernels. Labels on the slide: "1.18 release"; integration status "Prototype" and "Available".]

Benchmarks

[Figure: MediaPipe Models Inference Performance (Normalized / Higher is Better). Series: Wasm SIMD (baseline, 1.0), WebNN XNNPack, Native XNNPack, WebNN vs Native. Axes: Inference Speedup (0.0 to 5.0) and WebNN vs. Native Ratio (0% to 100%).]
Browser: Chrome Canary 118.0.5943.0
DUT: Dell/Linux/i7-1260P, single p-core
Workloads: MediaPipe solution models (FP32, batch=1)

[Figure: WebNN DirectML vs. Native DirectML. Series: WebNN GPU, Native DirectML, WebNN GPU vs. Native DirectML. Axes: Inference Time (ms, log scale) and Percentage (%). The per-model ratios shown range from about 71% to about 96%.]
Browser: Chrome Canary 126.0.6459.0
OS: Windows 11 Pro 23H2
DUT: Asus Zenbook
CPU: Intel(R) Core(TM) Ultra 7 155H 3.80GHz
GPU: Intel(R) Arc(TM) Graphics
GPU Driver: 31.0.101.5512

[Figure: WebNN DirectML vs Native on MTL NPU. Series: WebNN DirectML NPU, Native NPU, WebNN NPU vs Native. Axes: Inference Time (ms) and WebNN vs Native (%). Models: MobileNet V2, SqueezeNet 1.0, ResNet50 v1, EfficientNet Lite 4. Ratio labels: 62.7%, 95.8%, 73.4%, 86.1%.]
Browser: Chrome Canary 126.0.6459.0
OS: Windows 11 Pro 23H2
DUT: Asus Zenbook
CPU: Intel(R) Core(TM) Ultra 7 155H 3.80GHz
NPU: Intel(R) AI Boost
NPU Driver: 32.0.100.2381
The average performance of the 4 listed models on WebNN DirectML is about 80% of native DML on MTL NPU.
