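I have thousands of nested JSON objects (see the example below).
Differences can occur at any level. I want to compute the similarity of two objects. The higher the level at which a difference occurs, the more the two records differ. Some fields are related, such as name and id.
Any ideas? Is there a module that can take over this work? Thank you very much for your help.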
JSON example:
{ "items": [
    {
        "id": "abcd",
        "Name": "Name",
        "Infos": {
            "info1": "info1",
            "info2": "info2"
        },
        "data": {
            "data1": "info1",
            "data2": "info2"
        },
        "packs": [
            {
                "Name": "Name",
                "description": "description"
            },
            {
                "Name1": "Name1",
                "description1": "description1"
            }
        ]
    },
    {
        "id": "abcd",
        "Name": "Name",
        "Infos": {
            "info1": "info1",
            "info4": "info4"
        },
        "data": {
            "data1": "info1",
            "data2": "info2"
        },
        "packs": [
            {
                "Name3": "Name3",
                "description": "description"
            },
            {
                "Name3": "Name3",
                "description1": "description1"
            }
        ]
    }
] }

Posted on 2020-02-26 21:56:45
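def difference(x, y):
    # Containers are compared element by element; anything else is a leaf.
    # default=0 avoids a ValueError when a container is empty.
    if isinstance(x, list) and isinstance(y, list):
        return 1 + min((difference(a, b) for a, b in zip(x, y)), default=0)
    elif isinstance(x, dict) and isinstance(y, dict):
        # Values are paired by insertion order, so both dicts need the same key order.
        return 1 + min((difference(a, b) for a, b in zip(x.values(), y.values())), default=0)
    else:
        return 1 if x != y else 0

Is this what you need?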
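If a score between 0 and 1 is more useful than a depth, one possible sketch is to average the per-field matches recursively. This is only an illustration under my own assumptions, not a library API: the names `similarity` and `_MISSING` are made up here, dicts are matched by key, lists are matched by position, and a key or element present on only one side counts as a complete mismatch. Because each level averages over its children, a mismatch near the top costs more than one buried deeper, which seems to match the weighting described in the question.

```python
from itertools import zip_longest

# Hypothetical sentinel for a key or list element that exists on one side only.
_MISSING = object()


def similarity(x, y):
    """Return a score in [0, 1]; 1.0 means the two structures are identical."""
    if isinstance(x, dict) and isinstance(y, dict):
        keys = set(x) | set(y)
        if not keys:
            return 1.0  # two empty dicts are identical
        # Match fields by key; a key missing on one side scores 0 for that field.
        scores = [similarity(x.get(k, _MISSING), y.get(k, _MISSING)) for k in keys]
        return sum(scores) / len(scores)
    if isinstance(x, list) and isinstance(y, list):
        if not x and not y:
            return 1.0  # two empty lists are identical
        # Match elements by position; surplus elements score 0.
        scores = [similarity(a, b) for a, b in zip_longest(x, y, fillvalue=_MISSING)]
        return sum(scores) / len(scores)
    # Leaves (or mismatched container types): exact equality only.
    return 1.0 if x == y else 0.0


a = {"id": "abcd", "Name": "Name", "Infos": {"info1": "info1", "info2": "info2"}}
b = {"id": "abcd", "Name": "Name", "Infos": {"info1": "info1", "info4": "info4"}}
c = dict(a, id="zzzz")  # same as a, but with a top-level difference

print(round(similarity(a, b), 3))  # nested difference: 0.778
print(round(similarity(a, c), 3))  # top-level difference: 0.667
```

Positional list matching is the weakest assumption here: if the order of "packs" is not meaningful, the elements would need to be sorted or paired by best match first. Related fields such as name and id could be handled by giving some keys a larger weight in the average.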
https://stackoverflow.com/questions/60415326