
AIGC-Driven 3D Scene Understanding and Medical Image Analysis

2023-07-15 · Zhen Li (李镇) · The Chinese University of Hong Kong · 温***

AI Summary

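In this report, Dr. Zhen Li presents AIGC-driven 3D scene understanding and medical image analysis. Drawing on his team's research results, he focuses on downstream applications assisted by AIGC algorithms: automatic dense captioning of 3D scenes, visual grounding in indoor scenes, and 3D-vision-driven high-fidelity talking-face generation. He also covers the architecture design and engineering practice of these application solutions and, based on that experience, shares reflections on using AIGC for 3D scene understanding and medical image understanding, together with an outlook on how AIGC may evolve.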

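Speaker Introduction

Dr. Zhen Li (李镇), Assistant Professor, The Chinese University of Hong Kong, Shenzhen
• PhD, The University of Hong Kong (advised by Prof. Yizhou Yu); visiting scholar, University of Chicago (advised by Prof. Jinbo Xu)
• Assistant Dean / Professor, School of Science and Engineering / Future Network of Intelligence Institute (FNII), The Chinese University of Hong Kong, Shenzhen; Presidential Young Scholar (校长青年学者)

Contents
• AIGC-driven dense captioning and visual grounding for 3D indoor scenes
• AIGC-driven high-precision 3D talking-face driving and generation
• AIGC-driven colonoscopy image generation and analysis

Case Overview

With the rapid development of AIGC and generative models such as ChatGPT, we have explored AIGC-driven 3D scene understanding and medical scene analysis. Using a series of self-developed algorithms and tools, we have studied AIGC-assisted downstream applications in depth: from automatic dense captioning of 3D scenes, to visual grounding in indoor scenes, to 3D-vision-driven high-fidelity talking-face generation, and on to AIGC-assisted analysis of medical scenes. In this talk we cover 3D scene captioning and grounding, 3D talking-face generation, and gastrointestinal endoscopy image analysis assisted by generated images. We explain the architecture design and engineering practice of our application solutions in detail and, based on our experience, share our reflections on AIGC-driven 3D scene understanding and medical image understanding, along with an outlook on the future evolution of AIGC.

InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring

Zhihao Yuan1,†, Xu Yan1,†, Yinghong Liao1, Ruimao Zhang1, Sheng Wang2, Zhen Li1,*, and Shuguang Cui1
1 The Chinese University of Hong Kong (Shenzhen), Shenzhen Research Institute of Big Data
2 CryoEM Center, Southern University of Science and Technology

Background

Visual Grounding: Visual grounding (VG) aims at localizing the desired objects or areas in an image or a 3D scene based on an object-related linguistic query.

ScanRefer:
1. Exploiting object detection to generate proposal candidates;
2. Localizing the described object by fusing language features into candidates.

Cons:
1. The object proposals in the large 3D scene are usually redundant;
2. The appearance and attribute information is not sufficiently captured;
3. The relations among proposals and the ones between proposals and background are not fully studied.

• ScanRefer generates 114 possible candidates after filtering proposals by their objectness scores;
• Each proposal's feature is generated by the detection framework;
• There is no relation reasoning among proposals.

Method

InstanceRefer:
1. Instance-level candidate representation (small number);
2. Multi-level contextual inference (attribute, objects' relation and environment).

InstanceRefer Architecture:
• Language feature encoding (the same as ScanRefer).
• Extracting instances through panoptic segmentation (predicting instance and semantics).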
• Eliminating irrelevant instances by the target category (inferred from the language).
• Generating the visual feature of each candidate by multi-level referring (three novel modules are proposed).
• Scoring each candidate by matching language and visual features (the candidate with the largest score is regarded as the output).

Specific Modules:

(a) Attribute Perception (AP) Module
• It constructs a four-layer Sparse Convolution (SparseConv) network as the feature extractor;
• After an average pooling, the global attribute perception feature is obtained.

(b) Relation Perception (RP) Module
• It uses k-nearest neighbors to construct a graph, where the nodes' features are their semantics obtained by panoptic segmentation and the edges consist of their semantics and relative position;
• A dynamic graph convolution network (DGCNN) is exploited to update each node's feature.

(c) Global Localization Perception (GLP) Module
• It uses SparseConv layers with height-pooling to generate a 3×3 bird's-eye-view (BEV) plane;
• By combining the language feature, it predicts which grid cell the target object is located in;
• It interpolates probabilities and generates the global perception features by merging features from the AP module.

(d) Matching Module
• A naive version using cosine similarity;
• An enhanced version using modular co-attention from MCAN [1].

(e) Contrastive Objective
Defined over Q+ and Q−, which denote the scores of positive and negative pairs.

Results
• ScanRefer
• Nr3D/Sr3D
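As a rough illustration of the naive matching variant described for InstanceRefer, the sketch below scores each candidate by the cosine similarity between a language feature and that candidate's visual feature, then returns the highest-scoring candidate. The function names (`cosine_similarity`, `select_candidate`), the plain-list feature representation, and the toy vectors are assumptions made for illustration. They are not taken from the authors' code, and the enhanced co-attention variant is not shown.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_candidate(
    language_feature: Sequence[float],
    candidate_features: Sequence[Sequence[float]],
) -> Tuple[int, List[float]]:
    """Score every candidate against the language feature and pick the best.

    Returns the index of the candidate with the largest score and the
    full list of scores (hypothetical helper, for illustration only).
    """
    if not candidate_features:
        raise ValueError("at least one candidate is required")
    scores = [cosine_similarity(language_feature, f) for f in candidate_features]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores


if __name__ == "__main__":
    # Toy 2-D features: one language query and three instance candidates.
    query = [1.0, 0.0]
    candidates = [[0.0, 1.0], [2.0, 0.1], [-1.0, 0.0]]
    best, all_scores = select_candidate(query, candidates)
    print("best candidate:", best)
    print("scores:", all_scores)
```

Cosine similarity ignores vector magnitude, so in this toy case the second candidate wins because it points in nearly the same direction as the query, regardless of its length.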
X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning

Zhihao Yuan1,†, Xu Yan1,†, Yinghong Liao1, Yao Guo2, Guanbin Li3, Shuguang Cui1, Zhen Li1,*
1 The Chinese University of Hong Kong (Shenzhen), The Future Network of Intelligence Institute, Shenzhen Research Institute of Big Data
2 Shanghai Jiao Tong University
3 Sun Yat-sen University

Background

Task Description (3D Dense Captioning)

Limitations
• The object representations in Scan2Cap are defective since they are solely learned from sparse 3D point clouds, thus failing to provide strong texture and color information compared with the ones generated from 2D images.
• It requires the extra 2D input in both training and inference phases. However, the extra 2D information is usually computation-intensive and unavailable during inference.

X-Trans2Cap

Motivation
• We propose a Cross-Modal Knowledge Transfer framework on the 3D dense captioning task.
• During the training phase, the teacher network exploits the auxiliary 2D modality and guides the student network, which only takes point clouds as input, through the feature consistency constraints.
• A more faithful caption can be generated using only point clouds during inference.

X-Trans2Cap
[Figure: 2D and 3D Inputs; 3D Proposals; 2D Proposals]

X-Trans2Ca
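The feature consistency constraint mentioned in the X-Trans2Cap motivation can be pictured with a small sketch. The source does not state the form of the constraint, so the mean-squared-error choice here is an assumption, as are the function name `feature_consistency_loss` and the toy feature values. In a real training loop the teacher features would typically be treated as fixed targets while only the student is updated.

```python
from typing import Sequence


def feature_consistency_loss(
    student_feats: Sequence[Sequence[float]],
    teacher_feats: Sequence[Sequence[float]],
) -> float:
    """Mean squared difference between student and teacher proposal features.

    Each argument holds one feature vector per object proposal. The student
    sees only point clouds; the teacher also sees the auxiliary 2D modality.
    MSE is an assumed stand-in for the unspecified consistency constraint.
    """
    if len(student_feats) != len(teacher_feats):
        raise ValueError("student and teacher must cover the same proposals")
    total = 0.0
    count = 0
    for s_vec, t_vec in zip(student_feats, teacher_feats):
        if len(s_vec) != len(t_vec):
            raise ValueError("feature dimensions must match")
        for s_val, t_val in zip(s_vec, t_vec):
            total += (s_val - t_val) ** 2
            count += 1
    if count == 0:
        raise ValueError("no features to compare")
    return total / count


if __name__ == "__main__":
    # Two proposals with 2-D features each (toy values).
    teacher = [[1.0, 2.0], [0.5, 0.5]]
    student = [[0.0, 0.0], [0.5, 0.5]]
    print("consistency loss:", feature_consistency_loss(student, teacher))
```

Minimizing a term like this during training pulls the point-cloud-only student features toward the teacher's, which is what lets the 2D input be dropped at inference time.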
