pdf2json: How can I customize the output JSON file?

Stack Overflow user
Asked on 2014-02-24 05:07:31
1 answer · 3.5K views · 0 followers · Score: 1

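Can I customize the output of the pdf2json command-line utility so that the resulting JSON file has a specific structure?

I am trying to extract data from a PDF (see the image below) and store it as a JSON file.

I tried pdf2json -f [input directory or pdf file]. The command does produce a JSON file that contains the information I need, but it also contains a lot of information I don't need: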

{"formImage":{"Transcoder":"pdf2json@0.6.6","Agency":"","Id":{"AgencyId":"","Name":"","MC":false,"Max":1,"Parent":""},"Pages":[{"Height":49.5,"HLines":[{"x":13.111828125000002,"y":4.678418750000001,"w":0.44775000000000004,"l":78.96384375000001},{"x":13.111828125000002,"y":44.074375,"w":0.44775000000000004,"l":78.96384375000001}],"VLines":[],"Fills":[{"x":0,"y":0,"w":0,"h":0,"clr":1}],"Texts":[{"x":13.632429687500002,"y":4.382312499999998,"w":4.163000000000001,"clr":0,"A":"left","R":[{"T":"abundant","S":-1,"TS":[0,13.9091,0,0]}]},{"x":25.021517303398443,"y":4.382312499999998,"w":4.139000000000001,"clr":0,"A":"left","R":[{"T":"positive%3A1","S":-1,"TS":[0,13.9091,0,0]}]},{"x":32.38324218816407,"y":4.382312499999998,"w":4.412000000000001,"clr":0,"A":"left","R":[{"T":"negative%3A0","S":-1,"TS":[0,13.9091,0,0]}]},{"x":40.12887364285157,"y":4.382312499999998,"w":3.1670000000000003,"clr":0,"A":"left","R":[{"T":"anger%3A0","S":-1,"TS":[0,13.9091,0,0]}]},{"x":46.1237223885547,"y":4.382312499999998,"w":5.993,"clr":0,"A":"left","R":[{"T":"anticipation%3A0","S":-1,"TS":[0,13.9091,0,0]}]},{"x":56.09123069480469,"y":4.382312499999998,"w":3.8400000000000003,"clr":0,"A":"left","R":[{"T":"disgust%3A0","S":-1,"TS":[0,13.9091,0,0]}]},{"x":63.0324864791797,"y":4.382312499999998,"w":2.4170000000000003,"clr":0,"A":"left","R":[{"T":"fear%3A0","S":-1,"TS":[0,13.9091,0,0]}]},{"x":67.97264684597657,"y":4.382312499999998,"w":2.109,"clr":0,"A":"left","R":[{"T":"joy%3A1","S":-1,"TS":[0,13.9091,0,0]}]},{"x":72.47968185183595,"y":4.382312499999998,"w":4.013,"clr":0,"A":"left","R":[{"T":"sadness%3A0","S":-1,"TS":[0,13.9091,0,0]}]},{"x":79.66421908894532,"y":4.382312499999998,"w":4.178000000000001,"clr":0,"A":"left","R":[{"T":"surprise%3A0","S":-1,"TS":[0,13.9091,0,0]}]},{"x":87.08078776941407,"y":4.382312499999998,"w":2.8930000000000002,"clr":0,"A":"left","R":[{"T":"trust%3A0","S":-1,"TS":[0,13.9091,0,0]}]},{"x":13.632429687500002,"y":5.017468750000002,"w":2.4480000000000004,"clr":0,"A":"left","R"
...
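Most of that dump is layout data; the words themselves sit in Texts[].R[].T and are URI-encoded, so "positive%3A1" is really "positive:1". A minimal sketch that keeps only the decoded strings, assuming the object layout shown above (formImage.Pages[].Texts[].R[].T). The function name plainTexts is hypothetical, and the fallback to a top-level Pages array is an assumption in case your pdf2json version omits the formImage wrapper:

```javascript
// Flatten pdf2json output into a list of decoded text strings.
// Assumes the layout shown in the question: formImage.Pages[].Texts[].R[].T.
// Falls back to a top-level Pages array if the formImage wrapper is absent (assumption).
function plainTexts(pdfJson) {
  const pages = (pdfJson.formImage || pdfJson).Pages;
  const texts = [];
  for (const page of pages) {
    for (const item of page.Texts) {
      // Each Texts entry holds one or more runs in R; T is URI-encoded.
      texts.push(item.R.map((run) => decodeURIComponent(run.T)).join(""));
    }
  }
  return texts;
}

// Example with two entries taken from the output above:
const sample = { formImage: { Pages: [ { Texts: [
  { R: [{ T: "abundant" }] },
  { R: [{ T: "positive%3A1" }] },
] } ] } };
console.log(plainTexts(sample)); // [ 'abundant', 'positive:1' ]
```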

I only need the text from the PDF file, not any of the formatting information. So what I need is something like this:

{"data":
    {
    "abundant": {
        "positive":1,
        "negative":0,
        "anger":0,
        ...
        },
    "abuse": {...},
    "abutment": {...},
    ...
    }
}
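One way to get this shape is to post-process the JSON that pdf2json already writes instead of changing pdf2json itself. A minimal sketch, assuming the PDF lists each word followed by its "name:score" tokens in reading order (as in the dump above), that a token without ":" is a dictionary word, and that pdf2json's Texts array preserves that order. The function name toLexicon is hypothetical, not part of pdf2json:

```javascript
// Convert pdf2json output (layout as in the question) into
// {"data": {word: {emotion: score, ...}, ...}}.
// Assumptions: Texts appear in reading order; a token without ":" starts a new
// word; every other token has the form "name:number".
function toLexicon(pdfJson) {
  // Assumption: fall back to a top-level Pages array if formImage is absent.
  const pages = (pdfJson.formImage || pdfJson).Pages;
  const data = {};
  let current = null; // entry for the most recently seen word
  for (const page of pages) {
    for (const item of page.Texts) {
      // pdf2json URI-encodes text, e.g. "positive%3A1" -> "positive:1".
      const token = item.R.map((run) => decodeURIComponent(run.T)).join("").trim();
      if (token === "") continue;
      const sep = token.lastIndexOf(":");
      if (sep === -1) {
        // A bare word starts a new entry.
        current = {};
        data[token] = current;
      } else if (current !== null) {
        // "name:number" becomes a numeric property of the current word.
        current[token.slice(0, sep)] = Number(token.slice(sep + 1));
      }
    }
  }
  return { data };
}
```

Given the object parsed from the file that pdf2json -f wrote (for example with JSON.parse(fs.readFileSync(path, "utf8")), where path is your output file), JSON.stringify(toLexicon(raw), null, 2) should produce the {"data": {...}} structure above. If the rows in your PDF are not emitted in reading order, sort the Texts entries by y and then x first.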

1 answer

Stack Overflow user

Posted on 2015-07-27 11:57:00

I built a Node.js module that uses pdf2json and some simple math to extract table data from a PDF. The output is an array of rows.
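The answer does not show how pdf2table works internally. As an illustration of the general "simple math" idea only (this is not pdf2table's code), positioned text items can be bucketed into rows by their y coordinate and then ordered by x. The {x, y, text} item shape, the function name groupIntoRows, and the default tolerance of 0.1 page units are all assumptions:

```javascript
// Group positioned text items into table rows.
// NOT pdf2table's source; an illustration of position-based row grouping.
// Assumptions: items look like {x, y, text} (e.g. decoded from pdf2json's
// Texts entries) and cells in one row share nearly the same y.
function groupIntoRows(items, tolerance = 0.1) {
  // Order top-to-bottom, then left-to-right.
  const sorted = [...items].sort((a, b) => a.y - b.y || a.x - b.x);
  const rows = [];
  let rowY = null;
  for (const item of sorted) {
    // Start a new row when y moves further than `tolerance` from the row's first item.
    if (rowY === null || Math.abs(item.y - rowY) > tolerance) {
      rows.push([]);
      rowY = item.y;
    }
    rows[rows.length - 1].push(item);
  }
  // Within each row, order cells left to right and keep only the text.
  return rows.map((row) => row.sort((a, b) => a.x - b.x).map((cell) => cell.text));
}
```

Anchoring each row to its first item's y keeps small rendering jitter within one row; the tolerance would need tuning for PDFs with tightly spaced lines.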

https://www.npmjs.com/package/pdf2table

Score: -1
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/21979307
