我正在尝试在spark中将xml文件读取为DF。
XML文件:
<cool>
<incollection mdate="2002-01-03" key="books/acm/kim95/Blakeley95">
<author>José A. Blakeley</author>
<title>OQL[C++]: Extending C++ with an Object Query Capability.</title>
<pages>69-88</pages>
<booktitle>Modern Database Systems</booktitle>
<url>db/books/collections/kim95.html#Blakeley95</url>
<year>1995</year>
</incollection>
</cool>代码:
val corrupt_records_handled_DF=spark.read.format("xml").option("rootTag","cool").option("rowTag","incollection").load("/usr/local/inputs/temp.xml")我把它当做腐败记录。
Spark版本: 2.4.6包: com.databricks:spark-xml_2.11:0.9.0
输出:
scala> val corrupt_records_handled_DF=spark.read.format("xml").option("rootTag","cool").option("rowTag","incollection").load("/usr/local/inputs/temp.xml")
corrupt_records_handled_DF: org.apache.spark.sql.DataFrame = [_corrupt_record: string]
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|_corrupt_record |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|<incollection mdate="2002-01-03" key="books/acm/kim95/Blakeley95">
<author>José A. Blakeley</author>
<title>OQL[C++]: Extending C++ with an Object Query Capability.</title>
<pages>69-88</pages>
<booktitle>Modern Database Systems</booktitle>
<url>db/books/collections/kim95.html#Blakeley95</url>
<year>1995</year>
</incollection>|
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+你们能帮我吗?
发布于 2021-07-17 02:44:46
您的问题就在这里。在xml文件中,对于空节点,您必须写入或注意,在IntelliJ中,当您打开此xml文件时,可以用红色突出显示此错误。
赫查姆
https://stackoverflow.com/questions/63520789
复制相似问题