How to convert a list of nested dictionaries into a DataFrame

Stack Overflow user
Asked 2020-06-07 06:34:54
2 answers · 54 views · 0 followers · score 1

I am trying to convert the file from the following link: https://ads.twitter.com/transparency

into a DataFrame.

Here is what the data looks like:

{
  "archives" : [ {
    "ads_account" : {
      "account_name" : "@BradleyByrne - U.S. Political Campaigning",
      "user_name" : "BradleyByrne",
      "bio_url" : "https://twitter.com/ZpdrcK6Met",
      "billing_information" : {
        "insertion_order" : [ ],
        "credit_card" : [ {
          "city" : "Arlington",
          "spend" : 3.5845999999999995E-4,
          "postal_code" : "22209",
          "region" : "va",
          "credit_card_full_name" : "Targeted Victory"
        } ]
      }
    },
    "tweets" : [ {
      "impressions" : 0,
      "spend" : 0.0,
      "ad_campaigns" : [ {
        "targeting" : [ {
          "target" : "Montgomery AL- US",
          "target_type" : "GEO",
          "impressions" : 895
        }, {
          "target" : "13-54",
          "target_type" : "AGE_BUCKET",
          "impressions" : 5721
        }, {
          "target" : "Dothan AL- US",
          "target_type" : "GEO",
          "impressions" : 189
        }, {
          "target" : "13-29",
          "target_type" : "AGE_BUCKET",
          "impressions" : 3009
        }, {
          "target" : "Chattanooga TN- US",
          "target_type" : "GEO",
          "impressions" : 2
        }, {
          "target" : "English",
          "target_type" : "LANGUAGE",
          "impressions" : 8568
        }, {
          "target" : "Orlando-Daytona Beach-Melbourne FL- US",
          "target_type" : "GEO",
          "impressions" : 13
        }, {
          "target" : "21-54",
          "target_type" : "AGE_BUCKET",
          "impressions" : 4297
        }, {
          "target" : "Thai",
          "target_type" : "LANGUAGE",
          "impressions" : 1
        }, {
          "target" : "20 and up",
          "target_type" : "AGE_BUCKET",
          "impressions" : 6598
        },


"ads_account" : {
  "account_name" : "@club4growth - U.S. Political Campaigning - Bask Digital Media",
  "user_name" : "club4growth",
  "bio_url" : "http://twitter.com/wEF8OWW5zn",
  "billing_information" : {
    "insertion_order" : [ ],
    "credit_card" : [ ]
  }
},
"tweets" : [ {
  "impressions" : 466501,
  "spend" : 2993.5,
  "ad_campaigns" : [ {
    "targeting" : [ {
      "target" : "13 and up",
      "target_type" : "AGE_BUCKET",
      "impressions" : 144460
    }, {
      "target" : "20-34",
      "target_type" : "AGE_BUCKET",
      "impressions" : 78242
    }, {
      "target" : "Korean",
      "target_type" : "LANGUAGE",
      "impressions" : 160
    }, {
      "target" : "13-54",
      "target_type" : "AGE_BUCKET",
      "impressions" : 131703
    }, {
      "target" : "30-39",
      "target_type" : "AGE_BUCKET",
      "impressions" : 42685
    }, {
      "target" : "Pennsylvania- US",
      "target_type" : "GEO",
      "impressions" : 2
    }, {
      "target" : "25-54",
      "target_type" : "AGE_BUCKET",
      "impressions" : 86998
    }, {
      "target" : "South Dakota- US",
      "target_type" : "GEO",
      "impressions" : 1
    }, {
      "target" : "20-29",
      "target_type" : "AGE_BUCKET",
      "impressions" : 61090
    }, {
      "target" : "Dutch",
      "target_type" : "LANGUAGE",
      "impressions" : 41
    }, {
      "target" : "Unknown",
      "target_type" : "GENDER",
      "impressions" : 214
    }, {
      "target" : "Washington DC- US",
      "target_type" : "GEO",
      "impressions" : 144356
    }, {
      "target" : "French",
      "target_type" : "LANGUAGE",
      "impressions" : 420
    }, {
      "target" : "German",
      "target_type" : "LANGUAGE",
      "impressions" : 71
    }, {
      "target" : "New Jersey- US",
      "target_type" : "GEO",
      "impressions" : 1
    }, {
      "target" : "Female",
      "target_type" : "GENDER",
      "impressions" : 57736
    },

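It looks like each advertiser has its own nested dictionary, and I have not found a way to convert them into a DataFrame. I tried the code below, but it only splits them into separate columns.

Is there a solution? Thanks.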

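import json
import pandas as pd

file = 'issue.txt'
with open(file) as train_file:
    dict_train = json.load(train_file)

# orient='index' turns each top-level key (here only 'archives') into a row,
# so every advertiser dict lands in its own column instead of being flattened.
train = pd.DataFrame.from_dict(dict_train, orient='index')
train.reset_index(level=0, inplace=True)
train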

2 Answers

Stack Overflow user

Answered 2020-06-07 08:57:42

You can try json_normalize for this. You need to create a separate DataFrame for each JSON path, and then either merge them together or keep them separate:

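import json
import pandas as pd
with open('issue.txt') as f:  # same file as in the question
    data = json.load(f)

# One row per tweet, across all advertisers
df1 = pd.json_normalize(data['archives'], record_path=['tweets'])
# One row per insertion order, tagged with the account name and user name
df2 = pd.json_normalize(data['archives'],
                        record_path=['ads_account', 'billing_information', 'insertion_order'],
                        meta=[['ads_account', 'account_name'], ['ads_account', 'user_name']])

df1
df2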

Output:

df1:

      impressions      spend  ...                                         tweet_text                                          tweet_url
0          132072    2071.81  ...  There’s nothing controversial about something ...  https://twitter.com/transparency/status/106532...
1         8779581  100000.00  ...  Let’s #endgunviolencetogether - go to https://...  https://twitter.com/transparency/status/106473...
2         1021063   15601.68  ...  There’s nothing controversial about something ...  https://twitter.com/transparency/status/106532...
3         5935913  113991.45  ...  Send a postcard to your representative in less...  https://twitter.com/transparency/status/106504...
4           40233     287.31  ...  Care for Pennsylvania seniors is in jeopardy. ...  https://twitter.com/transparency/status/113887...
...           ...        ...  ...                                                ...                                                ...
2855       115744     760.68  ...  Dear New York politicians: Abortion is health ...  https://twitter.com/transparency/status/108388...
2856       514286    2566.19  ...  In 2019, states have passed more laws than eve...  https://twitter.com/transparency/status/114830...
2857         8247     180.71  ...  Spread the word about Trump's real agenda so t...  https://twitter.com/transparency/status/109297...
2858         4629      24.36  ...  Illinois’ new law, the Reproductive Health Act...  https://twitter.com/transparency/status/113485...
2859         1795       6.38  ...  Congratulations to our #WebbyAwards nominated ...  https://twitter.com/transparency/status/111318...

df2:

    advertising_agency_name                                company_name  ...                 ads_account.account_name ads_account.user_name
0          Resolution Media                             Toms Shoes Inc.  ...             @TOMS - U.S. Issue Ads - OMD                  TOMS
1      Precision Strategies                                      Humana  ...   @humana - Issue - Precision Strategies                Humana
2                       NaN  Federation for American Immigration Reform  ...        @FAIRImmigration - U.S. Issue Ads       FAIRImmigration
3                       NaN                                         VH1  ...                    @VH1 - U.S. Issue Ads                   VH1
4                       NaN                                         VH1  ...                    @VH1 - U.S. Issue Ads                   VH1
..                      ...                                         ...  ...                                      ...                   ...
118             Cavalry LLC               American Hospital Association  ...  @AHAAdvocacy - U.S. Issue Ads - Cavalry           AHAAdvocacy
119                     NaN                                      FWD.us  ...                  @FWDus - U.S. Issue Ads                 FWDus
120                     NaN                                      FWD.us  ...                  @FWDus - U.S. Issue Ads                 FWDus
121                     NaN               California Secretary of State  ...              @CASOSVote - U.S. Issue Ads             CASOSvote
122                     NaN               California Secretary of State  ...              @CASOSVote - U.S. Issue Ads             CASOSvote
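To illustrate the "merge them together" step, here is a minimal sketch. It assumes a small hand-made sample that mirrors the structure shown in the question (the values are trimmed and partly hypothetical), and it assumes `ads_account.user_name` is a suitable join key. The frame names `tweets`, `cards` and `merged` are illustrative only.

```python
import pandas as pd

# Hypothetical miniature of the file, following the layout in the question.
data = {
    "archives": [
        {
            "ads_account": {
                "user_name": "BradleyByrne",
                "billing_information": {
                    "insertion_order": [],
                    "credit_card": [{"city": "Arlington", "region": "va"}],
                },
            },
            "tweets": [
                {"impressions": 0, "spend": 0.0},
                {"impressions": 895, "spend": 1.5},
            ],
        },
        {
            "ads_account": {
                "user_name": "club4growth",
                "billing_information": {"insertion_order": [], "credit_card": []},
            },
            "tweets": [{"impressions": 466501, "spend": 2993.5}],
        },
    ]
}

# One row per tweet, tagged with the advertiser's user name.
tweets = pd.json_normalize(
    data["archives"],
    record_path=["tweets"],
    meta=[["ads_account", "user_name"]],
)

# One row per credit card record, tagged the same way.
cards = pd.json_normalize(
    data["archives"],
    record_path=["ads_account", "billing_information", "credit_card"],
    meta=[["ads_account", "user_name"]],
)

# A left merge keeps tweets from advertisers with no credit card on file.
merged = tweets.merge(cards, on="ads_account.user_name", how="left")
print(merged)
```

Whether a merged frame or separate frames is more useful depends on the analysis. Merging repeats the billing columns on every tweet row, so keeping the frames separate may be preferable for large files.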
Score: 1

Stack Overflow user

Answered 2020-06-07 07:06:54

Try using pandas.read_json()
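As a rough sketch of what this suggestion gives on data shaped like the question's, the snippet below builds a hypothetical two-advertiser sample and reads it with `pandas.read_json()`. Under that assumption, only the top level is parsed, so the nested records would still need something like `json_normalize` afterwards.

```python
import io
import json
import pandas as pd

# Hypothetical miniature of the file in the question.
sample = {
    "archives": [
        {"ads_account": {"user_name": "BradleyByrne"},
         "tweets": [{"impressions": 0, "spend": 0.0}]},
        {"ads_account": {"user_name": "club4growth"},
         "tweets": [{"impressions": 466501, "spend": 2993.5}]},
    ]
}
raw = json.dumps(sample)

# read_json handles only the top level: a single 'archives' column of dicts.
df = pd.read_json(io.StringIO(raw))
print(df.shape)

# The nested records still have to be flattened explicitly.
tweets = pd.json_normalize(
    df["archives"].tolist(),
    record_path="tweets",
    meta=[["ads_account", "user_name"]],
)
print(tweets)
```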

Score: 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/62238802
