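AIGC-Driven 3D Scene Understanding and Medical Image Analysis
Dr. Zhen Li (李镇), Assistant Professor, The Chinese University of Hong Kong, Shenzhen

Speaker Introduction
Zhen Li, Assistant Professor; Assistant Dean, FNII
•Ph.D., The University of Hong Kong (advised by Prof. Yizhou Yu); visiting scholar, University of Chicago (hosted by Prof. Jinbo Xu)
•Assistant Professor, School of Science and Engineering, and Assistant Dean, Future Network of Intelligence Institute (FNII), The Chinese University of Hong Kong, Shenzhen; Presidential Young Scholar
•Director, Deep Bit Lab, The Chinese University of Hong Kong, Shenzhen (postdocs: 1, Ph.D. students: 8, master's students: 2)

Honors
•Global champion in contact-map prediction at CASP12; the method served as a baseline for AlphaFold v1
•PLOS CB 2018 Innovation and Breakthrough Award (one awarded per year)
•China Association for Science and Technology Young Elite Scientists Sponsorship Program, 2019
•First place in the CAMEO protein scoring monthly ranking (May 2022), first place in the 2022 SemanticKITTI segmentation challenge, first place in the CVPR 2023 HOI4D segmentation challenge, first place in the 2018 Global Weather Forecasting Challenge, second place in the ICCV 2022 Urban3D challenge, among others

Research
•Principal investigator of one National Natural Science Foundation of China (NSFC) Young Scientists Fund project
•Principal investigator of the Shenzhen-Hong Kong Category A project "Deep-learning-assisted RNA-protein structure prediction and design of RNA with high protein affinity" (RMB 3 million)
•CCF-Tencent Rhino-Bird 2019 Excellence Award; 2022 Rhino-Bird special project
•Participant in a National Key R&D Program project of the Ministry of Science and Technology
•Co-lead of an NSFC Key Program project; co-lead of a Guangdong-Shenzhen Joint Fund key project

Contents
•AIGC-driven dense captioning and visual grounding of 3D indoor scenes
•AIGC-driven high-precision 3D talking face driving and generation
•AIGC-driven colonoscopy image generation and analysis

Case Overview
With the rapid development of generative models such as AIGC and ChatGPT, we explore AIGC-driven 3D scene understanding and the analysis of medical scenarios. Using a series of in-house algorithms and tools, we have studied in depth the downstream applications assisted by AIGC algorithms: from automatic dense captioning of 3D scenes, to visual grounding in indoor scenes, to 3D-vision-driven high-fidelity talking face generation, and further to AIGC-assisted analysis of medical scenarios. In this talk, we cover 3D scene captioning and grounding, 3D talking face generation, and the analysis of gastrointestinal endoscopy images assisted by generated images, detailing the architecture design and engineering practice of our solutions. Based on our experience, we also share our reflections on using AIGC-driven 3D scene understanding and medical image understanding, and our outlook on the future evolution of AIGC.

InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring
Zhihao Yuan1,†, Xu Yan1,†, Yinghong Liao1, Ruimao Zhang1, Sheng Wang2, Zhen Li1,*, and Shuguang Cui1
1 The Chinese University of Hong Kong (Shenzhen), Shenzhen Research Institute of Big Data
2 CryoEM Center, Southern University of Science and Technology

Background
Visual Grounding:
Visual grounding (VG) aims at localizing the desired objects or areas in an image or a 3D scene based on an object-related linguistic query.

ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language

ScanRefer:
1. Exploiting object detection to generate proposal candidates;
2. Localizing the described object by fusing language features into the candidates.

ScanRefer: Cons:
1. The object proposals in a large 3D scene are usually redundant;
2. The appearance and attribute information is not sufficiently captured;
3. The relations among proposals, and those between proposals and the background, are not fully studied.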
•ScanRefer generates 114 possible candidates after filtering proposals by their objectness scores;
•Each proposal's feature is generated by the detection framework;
•There is no relation reasoning among proposals.

Method
InstanceRefer:
1. Instance-level candidate representation (small number);
2. Multi-level contextual inference (attributes, objects' relations, and environment).

InstanceRefer Architecture:
Language feature encoding (the same as ScanRefer).
[Figure: the description "There is a gray and blue leather chair. Placed in a row with other chairs in the side of the wall." passes through GloVe to give word embeddings W, then through a BiGRU to give word features E.]

InstanceRefer Architecture:
Extracting instances through panoptic segmentation (predicting instances and semantics).
[Figure: panoptic segmentation of the input point cloud P yields the instance mask I and semantics S, from which instances such as (Table), (Chair), ..., (Chair) are extracted.]

InstanceRefer Architecture:
Eliminating irrelevant instances by the target category (inferred from the language).
[Figure: target prediction on the word features gives "Chair"; the instances are filtered (chair only) to form the candidates.]

InstanceRefer Architecture:
Generating the visual feature of each candidate by multi-level referring (three novel modules are proposed).
[Figure: the AP, RP, and GLP modules turn the candidates into a multi-level visual context.]

InstanceRefer Architecture:
Scoring each candidate by matching language and visual features (the candidate with the largest score is regarded as the output).
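The filtering and scoring steps of the architecture can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Instance` class, the `ground` function, the toy 3-dimensional features, and the use of cosine similarity as the matching function are all assumptions made for the example. The slides only state that instances are filtered by the predicted target category and that the candidate with the largest score is returned.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Instance:
    label: str            # semantic class, e.g. from panoptic segmentation
    feature: List[float]  # hypothetical visual feature of this instance


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ground(
    instances: List[Instance],
    target_label: str,
    language_feature: List[float],
) -> Tuple[Optional[Instance], float]:
    """Filter instances by the target category, then return the best match."""
    # Step 1: keep only instances whose class matches the predicted target.
    candidates = [inst for inst in instances if inst.label == target_label]
    if not candidates:
        return None, 0.0
    # Step 2: score every remaining candidate against the language feature.
    scored = [(cosine(inst.feature, language_feature), inst) for inst in candidates]
    # Step 3: the candidate with the largest score is the output.
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


# Toy scene: one table and two chairs with made-up features.
scene = [
    Instance("table", [0.9, 0.1, 0.0]),
    Instance("chair", [0.1, 0.9, 0.2]),
    Instance("chair", [0.7, 0.2, 0.6]),
]
query = [0.0, 1.0, 0.2]  # made-up language feature for the description
best, score = ground(scene, "chair", query)
print(best, round(score, 3))
```

Filtering before scoring is what keeps the candidate set small: the table is never compared with the query, so only the two chairs are scored.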
[Figure: the full pipeline, with attention pooling after the word features and a matching step that outputs a similarity score Q for each candidate, e.g. (0.95), (0.31), ..., (0.03).]

Specific Modules:
(a) Attribute Perception (AP) Module.
•It constructs a four-layer Sparse Convolution (SparseConv) network as the feature extractor;
•After average pooling, the global attribute perception feature is obtained.

Specific Modules:
(b) Relation Perception (RP) Module.
•It uses k-nearest neighbors to construct a graph, where the nodes' features are their semantics obtained by panoptic segmentation and the edges consist of their semantics and relative positions;
•A dynamic graph convolution network (DGCNN) is exploited to update each node's feature.

Specific Modules:
(c) Global Localization Perception (GLP) Module.
•It uses SparseConv la