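I want to parse a large XML file and store it in a (MySQL) database. The XML looks like the sample below; the file is about 200 MB. How can I parse this XML file, and how do I get at child elements like these? Each entry has two parts, 'vuln' and 'vulnerable-configuration'. Thanks!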
<entry id="CVE-2015-0002">
<vuln:vulnerable-configuration id="http://www.nist.gov/">
<cpe-lang:logical-test operator="OR" negate="false">
<cpe-lang:fact-ref name="cpe:/o:microsoft:windows_7:-:sp1"/>
<cpe-lang:fact-ref name="cpe:/o:microsoft:windows_server_2008:r2:sp1"/>
<cpe-lang:fact-ref name="cpe:/o:microsoft:windows_8:-"/>
<cpe-lang:fact-ref name="cpe:/o:microsoft:windows_8.1:-"/>
<cpe-lang:fact-ref name="cpe:/o:microsoft:windows_server_2012:-:gold"/>
<cpe-lang:fact-ref name="cpe:/o:microsoft:windows_server_2012:r2::~~~x64~~"/>
<cpe-lang:fact-ref name="cpe:/o:microsoft:windows_rt:-:gold"/>
<cpe-lang:fact-ref name="cpe:/o:microsoft:windows_rt_8.1:-"/>
</cpe-lang:logical-test>
</vuln:vulnerable-configuration>
<vuln:vulnerable-software-list>
<vuln:product>cpe:/o:microsoft:windows_server_2012:-:gold</vuln:product>
<vuln:product>cpe:/o:microsoft:windows_rt:-:gold</vuln:product>
<vuln:product>cpe:/o:microsoft:windows_7:-:sp1</vuln:product>
<vuln:product>cpe:/o:microsoft:windows_rt_8.1:-</vuln:product>
<vuln:product>cpe:/o:microsoft:windows_server_2012:r2::~~~x64~~</vuln:product>
<vuln:product>cpe:/o:microsoft:windows_8:-</vuln:product>
<vuln:product>cpe:/o:microsoft:windows_8.1:-</vuln:product>
<vuln:product>cpe:/o:microsoft:windows_server_2008:r2:sp1</vuln:product>
</vuln:vulnerable-software-list>
<vuln:cve-id>CVE-2015-0002</vuln:cve-id>
<vuln:published-datetime>2015-01-13T17:59:01.253-05:00</vuln:published-datetime>
<vuln:last-modified-datetime>2015-01-14T16:51:14.253-05:00</vuln:last-modified-datetime>
<vuln:cvss>
<cvss:base_metrics>
<cvss:score>7.2</cvss:score>
<cvss:access-vector>LOCAL</cvss:access-vector>
<cvss:access-complexity>LOW</cvss:access-complexity>
<cvss:authentication>NONE</cvss:authentication>
<cvss:confidentiality-impact>COMPLETE</cvss:confidentiality-impact>
<cvss:integrity-impact>COMPLETE</cvss:integrity-impact>
<cvss:availability-impact>COMPLETE</cvss:availability-impact>
<cvss:source>http://nvd.nist.gov</cvss:source>
<cvss:generated-on-datetime>2015-01-14T16:20:33.273-05:00</cvss:generated-on-datetime>
</cvss:base_metrics>
</vuln:cvss>
<vuln:cwe id="CWE-264"/>
<vuln:references xml:lang="en" reference_type="VENDOR_ADVISORY">
<vuln:source>MS</vuln:source>
<vuln:reference href="http://technet.microsoft.com/security/bulletin/MS15-001" xml:lang="en">MS15-001</vuln:reference>
</vuln:references>
<vuln:references xml:lang="en" reference_type="UNKNOWN">
<vuln:source>MISC</vuln:source>
<vuln:reference href="https://code.google.com/p/google-security-research/issues/detail?id=118" xml:lang="en">https://code.google.com/p/google-security-research/issues/detail?id=118</vuln:reference>
</vuln:references>
<vuln:references xml:lang="en" reference_type="UNKNOWN">
<vuln:source>MISC</vuln:source>
<vuln:reference href="http://www.zdnet.com/article/google-discloses-unpatched-windows-vulnerability/" xml:lang="en">http://www.zdnet.com/article/google-discloses-unpatched-windows-vulnerability/</vuln:reference>
</vuln:references>
<vuln:references xml:lang="en" reference_type="UNKNOWN">
<vuln:source>MISC</vuln:source>
<vuln:reference href="http://twitter.com/sambowne/statuses/550384131683520512" xml:lang="en">http://twitter.com/sambowne/statuses/550384131683520512</vuln:reference>
</vuln:references>
<vuln:summary>The AhcVerifyAdminContext function in ahcache.sys in the Application Compatibility component in Microsoft Windows 7 SP1, Windows Server 2008 R2 SP1, Windows 8, Windows 8.1, Windows Server 2012 Gold and R2, and Windows RT Gold and 8.1 does not verify that an impersonation token is associated with an administrative account, which allows local users to gain privileges by running AppCompatCache.exe with a crafted DLL file, aka MSRC ID 20544 or "Microsoft Application Compatibility Infrastructure Elevation of Privilege Vulnerability."</vuln:summary>
</entry>
Posted on 2016-07-25 17:43:43
Partial answer.
First, have a look at this question, which answers most of yours: How to import XML with nested nodes (parent/child relationships) into Access?
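Import the XML into Access, and apply a transform file (XSLT) during the import so that every child table gets the key vuln:cve-id, which links it back to the main entry table.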
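Outside Access, the same idea (every child row carries its entry's vuln:cve-id as a foreign key) can be sketched in Python with the standard-library ElementTree. This is only an illustration, not the answer's Access method: the namespace URIs are assumed to be those of the NVD 2.0 feed and should be copied from the xmlns declarations on your own file's root element, the embedded sample is a trimmed copy of the entry in the question, and `child_rows` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed from the NVD 2.0 feed; copy the real ones from the
# xmlns declarations on your file's root element. Prefixes here are arbitrary.
NS = {
    "feed": "http://scap.nist.gov/schema/feed/vulnerability/2.0",
    "vuln": "http://scap.nist.gov/schema/vulnerability/0.4",
    "cpelang": "http://cpe.mitre.org/language/2.0",
}

# Trimmed copy of the question's entry, with namespace declarations added.
SAMPLE = """<nvd xmlns="http://scap.nist.gov/schema/feed/vulnerability/2.0"
     xmlns:vuln="http://scap.nist.gov/schema/vulnerability/0.4"
     xmlns:cpe-lang="http://cpe.mitre.org/language/2.0">
  <entry id="CVE-2015-0002">
    <vuln:vulnerable-configuration id="http://www.nist.gov/">
      <cpe-lang:logical-test operator="OR" negate="false">
        <cpe-lang:fact-ref name="cpe:/o:microsoft:windows_7:-:sp1"/>
        <cpe-lang:fact-ref name="cpe:/o:microsoft:windows_8:-"/>
      </cpe-lang:logical-test>
    </vuln:vulnerable-configuration>
    <vuln:vulnerable-software-list>
      <vuln:product>cpe:/o:microsoft:windows_7:-:sp1</vuln:product>
      <vuln:product>cpe:/o:microsoft:windows_8:-</vuln:product>
    </vuln:vulnerable-software-list>
    <vuln:cve-id>CVE-2015-0002</vuln:cve-id>
  </entry>
</nvd>"""


def child_rows(entry):
    """Return (config_rows, product_rows); every row starts with the entry's
    cve-id so it can be joined back to the main table."""
    cve_id = entry.findtext("vuln:cve-id", namespaces=NS)
    # fact-ref elements sit inside logical-test, so search at any depth.
    config_rows = [
        (cve_id, ref.get("name"))
        for ref in entry.iterfind(
            "vuln:vulnerable-configuration//cpelang:fact-ref", NS)
    ]
    product_rows = [
        (cve_id, product.text)
        for product in entry.iterfind(
            "vuln:vulnerable-software-list/vuln:product", NS)
    ]
    return config_rows, product_rows


root = ET.fromstring(SAMPLE)
for entry in root.iterfind("feed:entry", NS):
    configs, products = child_rows(entry)
    print(configs)
    print(products)
```

Each list can then be written to its own child table, with cve-id as the join column.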
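The code below works for some of the child tables but not all of them; if anyone can point out why it doesn't work for every child table, please do. It does, however, produce the main table, containing vuln:cve-id, vuln:published-datetime, vuln:last-modified-datetime and vuln:summary, plus cvss:base_metrics, cvss:score, cvss:access-vector, cvss:access-complexity, cvss:source and cvss:generated-on-datetime.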
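Since the question's end goal is MySQL rather than Access, here is a sketch of the main-table side: flatten one entry's fields into a row and insert it. It uses `sqlite3` purely as a runnable stand-in (an assumption of this sketch); with a MySQL driver such as mysql-connector-python or PyMySQL you would use `%s` placeholders and MySQL column types (for example VARCHAR for the cve_id key). `main_row`, the table name and the column names are hypothetical, and the namespace URIs are assumed from the NVD 2.0 feed.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Namespace URIs assumed from the NVD 2.0 feed (check your file's root element).
NS = {
    "vuln": "http://scap.nist.gov/schema/vulnerability/0.4",
    "cvss": "http://scap.nist.gov/schema/cvss-v2/0.2",
}

# Trimmed copy of the question's entry; the summary text is shortened.
ENTRY = """<entry id="CVE-2015-0002"
       xmlns:vuln="http://scap.nist.gov/schema/vulnerability/0.4"
       xmlns:cvss="http://scap.nist.gov/schema/cvss-v2/0.2">
  <vuln:cve-id>CVE-2015-0002</vuln:cve-id>
  <vuln:published-datetime>2015-01-13T17:59:01.253-05:00</vuln:published-datetime>
  <vuln:last-modified-datetime>2015-01-14T16:51:14.253-05:00</vuln:last-modified-datetime>
  <vuln:cvss><cvss:base_metrics>
    <cvss:score>7.2</cvss:score>
    <cvss:access-vector>LOCAL</cvss:access-vector>
    <cvss:access-complexity>LOW</cvss:access-complexity>
  </cvss:base_metrics></vuln:cvss>
  <vuln:summary>Summary text for the example.</vuln:summary>
</entry>"""


def main_row(entry):
    """Flatten the per-entry main-table fields into one tuple."""
    base = "vuln:cvss/cvss:base_metrics/"
    score = entry.findtext(base + "cvss:score", namespaces=NS)
    return (
        entry.findtext("vuln:cve-id", namespaces=NS),
        entry.findtext("vuln:published-datetime", namespaces=NS),
        entry.findtext("vuln:last-modified-datetime", namespaces=NS),
        float(score) if score is not None else None,
        entry.findtext(base + "cvss:access-vector", namespaces=NS),
        entry.findtext(base + "cvss:access-complexity", namespaces=NS),
        entry.findtext("vuln:summary", namespaces=NS),
    )


# In-memory SQLite database standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE entry (
           cve_id TEXT PRIMARY KEY, published TEXT, last_modified TEXT,
           score REAL, access_vector TEXT, access_complexity TEXT, summary TEXT)"""
)
# Parameterized insert: the driver handles quoting of the summary text.
conn.execute("INSERT INTO entry VALUES (?, ?, ?, ?, ?, ?, ?)",
             main_row(ET.fromstring(ENTRY)))
conn.commit()
print(conn.execute("SELECT cve_id, score FROM entry").fetchall())
```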
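Put the following code in a file named transform.xslt and use it when importing into Access. You need to add the appropriate XSL header (the enclosing xsl:stylesheet element, declaring the xsl, vuln, cpe-lang and cvss prefixes with the same namespace URIs used in your XML file); I couldn't include it in this post because "You need at least 10 reputation to post more than 2 links" :-(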
<xsl:template match="/">
<dataroot>
<xsl:apply-templates select="@*|node()"/>
</dataroot>
</xsl:template>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="entry">
<xsl:apply-templates select="@*|node()"/>
</xsl:template>
<xsl:template match="cpe-lang:logical-test">
<cpe-lang:logical-test>
<vuln:cve-id><xsl:value-of select="../../vuln:cve-id"/></vuln:cve-id>
<xsl:apply-templates select="@*|node()"/>
</cpe-lang:logical-test>
</xsl:template>
<xsl:template match="vuln:vulnerable-configuration">
<vuln:vulnerable-configuration>
<!-- vulnerable-configuration is a direct child of entry, so cve-id is one level up -->
<vuln:cve-id><xsl:value-of select="../vuln:cve-id"/></vuln:cve-id>
<xsl:apply-templates select="@*|node()"/>
</vuln:vulnerable-configuration>
</xsl:template>
<xsl:template match="vuln:vulnerable-software-list">
<vuln:vulnerable-software-list>
<vuln:cve-id><xsl:value-of select="../vuln:cve-id"/></vuln:cve-id>
<xsl:apply-templates select="@*|node()"/>
</vuln:vulnerable-software-list>
</xsl:template>
<xsl:template match="cvss:base_metrics">
<cvss:base_metrics>
<vuln:cve-id><xsl:value-of select="../../vuln:cve-id"/></vuln:cve-id>
<xsl:apply-templates select="@*|node()"/>
</cvss:base_metrics>
</xsl:template>
<xsl:template match="vuln:references">
<vuln:references>
<vuln:cve-id><xsl:value-of select="../vuln:cve-id"/></vuln:cve-id>
<xsl:apply-templates select="@*|node()"/>
</vuln:references>
</xsl:template>
<xsl:template match="vuln:scanner">
<vuln:scanner>
<vuln:cve-id><xsl:value-of select="../vuln:cve-id"/></vuln:cve-id>
<xsl:apply-templates select="@*|node()"/>
</vuln:scanner>
</xsl:template>
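For comparison, the job of the vuln:references template (copy the entry's vuln:cve-id into each references block) looks like this in Python with ElementTree. Again a hypothetical sketch rather than part of the Access workflow: `reference_rows` is an illustrative name, the namespace URI is assumed from the NVD 2.0 feed, and the sample holds two of the references from the question's entry.

```python
import xml.etree.ElementTree as ET

VULN = "http://scap.nist.gov/schema/vulnerability/0.4"  # assumed NVD 2.0 URI
NS = {"vuln": VULN}

# Two of the question's references blocks, wrapped in a minimal entry.
ENTRY = f"""<entry xmlns:vuln="{VULN}">
  <vuln:cve-id>CVE-2015-0002</vuln:cve-id>
  <vuln:references xml:lang="en" reference_type="VENDOR_ADVISORY">
    <vuln:source>MS</vuln:source>
    <vuln:reference href="http://technet.microsoft.com/security/bulletin/MS15-001" xml:lang="en">MS15-001</vuln:reference>
  </vuln:references>
  <vuln:references xml:lang="en" reference_type="UNKNOWN">
    <vuln:source>MISC</vuln:source>
    <vuln:reference href="https://code.google.com/p/google-security-research/issues/detail?id=118" xml:lang="en">https://code.google.com/p/google-security-research/issues/detail?id=118</vuln:reference>
  </vuln:references>
</entry>"""


def reference_rows(entry):
    """One row per vuln:references block, keyed by the entry's cve-id."""
    cve_id = entry.findtext("vuln:cve-id", namespaces=NS)
    rows = []
    for refs in entry.iterfind("vuln:references", NS):
        link = refs.find("vuln:reference", NS)
        rows.append((
            cve_id,
            refs.get("reference_type"),
            refs.findtext("vuln:source", namespaces=NS),
            link.get("href") if link is not None else None,
        ))
    return rows


for row in reference_rows(ET.fromstring(ENTRY)):
    print(row)
```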
Posted on 2016-12-21 04:30:00
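I know this is a bit late, but I did some work similar to what you're asking about. It's fairly ugly code (put together in a few hours), but I think it does most of what you asked for, except the export to a database. It takes the huge CVE XML files, parses them to extract specific key/value pairs, and compares those against a network scan.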
https://github.com/bhealy/netScan
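import xml.etree.ElementTree as ET
XMLfile = "nvdcve-2.0.xml"  # hypothetical path; point this at your CVE feed
tree = ET.parse(XMLfile)  # builds the whole document tree in memory
root = tree.getroot()
# Hard-coded positional lookup (root's child 0, its child 1, ...); depends on the file's exact layout.
stuffYouCareAbout = root[0][1][2][3].text
I was able to parse the XML files with etree, which made finding specific items much easier. Obviously the example looks at a very specific index, but it should be a good starting point (if this post isn't too late!).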
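`ET.parse` builds the entire document tree in memory, which gets expensive for a ~200 MB feed like the one in the question. A minimal streaming sketch with ElementTree's `iterparse` handles one entry at a time and discards it afterwards. `iter_entries` is a hypothetical name, the namespace URIs are assumed from the NVD 2.0 feed, and the second sample entry (CVE-TEST-0001) is made-up test data.

```python
import io
import xml.etree.ElementTree as ET

# Assumed NVD 2.0 namespace URIs; check your file's root element.
FEED = "http://scap.nist.gov/schema/feed/vulnerability/2.0"
VULN = "http://scap.nist.gov/schema/vulnerability/0.4"
ENTRY_TAG = f"{{{FEED}}}entry"      # Clark notation: {uri}localname
CVE_ID_TAG = f"{{{VULN}}}cve-id"
SUMMARY_TAG = f"{{{VULN}}}summary"


def iter_entries(source):
    """Yield (cve_id, summary) for each <entry> without keeping the whole tree.

    `source` is a file name or a binary file object."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == ENTRY_TAG:
            yield elem.findtext(CVE_ID_TAG), elem.findtext(SUMMARY_TAG)
            root.clear()  # drop finished entries so memory use stays flat


SAMPLE = f"""<nvd xmlns="{FEED}" xmlns:vuln="{VULN}">
  <entry id="CVE-2015-0002">
    <vuln:cve-id>CVE-2015-0002</vuln:cve-id>
    <vuln:summary>Example summary.</vuln:summary>
  </entry>
  <entry id="CVE-TEST-0001">
    <vuln:cve-id>CVE-TEST-0001</vuln:cve-id>
  </entry>
</nvd>""".encode("utf-8")

for cve_id, summary in iter_entries(io.BytesIO(SAMPLE)):
    print(cve_id, summary)
```

With the real file you would pass its path instead of the `BytesIO` object and write each yielded row to the database in batches.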
https://stackoverflow.com/questions/35792373