Q: How can I accurately extract email and unit number strings from OCR content?

Stack Overflow user

Asked on 2017-05-08 17:46:40

1 answer, 161 views, 0 followers, 0 votes

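I have used Google Cloud Vision OCR to extract the text of the business card here, and I am trying to pull the email address out of that text with the regular expression below, but it does not work very well. Are there any better suggestions to improve its performance?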

function extract_emails($str){
    // This regular expression extracts all emails from a string:
    $regexp = '/([a-z0-9_\.\-])+\@(([a-z0-9\-])+\.)+([a-z0-9]{2,4})+/i';
    preg_match_all($regexp, $str, $m);

    return isset($m[0]) ? $m[0] : array();
}

$Email = extract_emails($gcv_response);

if (!empty($Email))
{
    $Email = reset($Email); 
}
else
{
    $Email = 'NULL';
}

@algen.comsg Website: www.algen.comsg Tel: (65) 68982292 Fax: (65) 68982202 (65) 68982813

Result of running the code above = NULL; expected output: philip@algen.comsg

OCR text 2: "Allan Lim Yee Chian Chief Executive Officer Alpha Biofuels (S) Pte Ltd LHCCBNFLN FR2 a mobile 9790 3063 - 6264 6696 fax 6260 2082 C#01-05, 2 Tuas South Ave 2 Singapore 637601 tang. Steve. Eric@alphabiofuels.sg www.alphabiofuels.sg"

Result of running the code above = NULL; expected output: tang.Steve.Eric@alphabiofuels.sg

ocr

google-vision

php

regex

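1 Answer

Stack Overflow user

Answered on 2018-07-17 20:18:46

You are facing two problems: first, you are not converting the text to lowercase, and second, you are not covering the case where spaces appear inside the address. I have tried to cover both below, but you will have to modify it according to your requirements.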

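function extract_emails($str){
    // This regular expression extracts all emails from a string.
    // It allows one optional leading "name." part (the dot may be followed by a space)
    // and an optional space before "@". The pattern must not contain an unescaped "/",
    // because "/" is the delimiter and would end the pattern early.
    $regexp = '/(([a-z0-9_\-])+\.\\s?)?([a-z0-9_\.\-])+\\s?\@(([a-z0-9\-])+\.)+([a-z0-9]{2,4})+/i';
    //$regexp = '/(([a-zA-Z0-9_\-])+\.\\s?)?([a-zA-Z0-9_\.\-])+\\s?\@(([a-z0-9\-])+\.)+([a-z0-9]{2,4})+/i';//for using uppercase letters.

    // Lowercase the input before matching, so the returned addresses are lowercase.
    preg_match_all($regexp, strtolower($str), $m);

    return isset($m[0]) ? $m[0] : array();
}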

$Email = extract_emails($gcv_response);

if (!empty($Email))
{
    $Email = reset($Email); 
}
else
{
    $Email = 'NULL';
}
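
As a further sketch of the "modify it according to your requirements" step, the variant below accepts any number of dot-separated name parts (each dot may be followed by an OCR-inserted space) and then strips the whitespace from each match. The function name `extract_emails_ocr` and the exact pattern are assumptions made for illustration, not part of the original answer. The approach is deliberately permissive: a preceding word that ends in a dot, or a word directly before a space and "@", may be glued onto the address, so the result should be validated before use.

```php
<?php
// Hypothetical helper (not from the original answer): tolerate the spaces
// that OCR inserts after dots in the local part, then remove them.
function extract_emails_ocr($text)
{
    // Local part: zero or more "chunk." groups, each dot optionally followed
    // by one whitespace character, then a final chunk and an optional
    // whitespace character before "@". Domain: one or more "label." groups
    // followed by a final label of at least two characters.
    $pattern = '/(?:[a-z0-9_\-]+\.\s?)*[a-z0-9_\-]+\s?@(?:[a-z0-9\-]+\.)+[a-z0-9]{2,}/i';

    // preg_match_all() returns 0 when nothing matches and false on error.
    if (!preg_match_all($pattern, $text, $matches)) {
        return array();
    }

    // Remove the whitespace that OCR inserted inside each matched address.
    return array_map(function ($match) {
        return preg_replace('/\s+/', '', $match);
    }, $matches[0]);
}

// Tail of "OCR text 2" from the question.
$ocr = 'fax 6260 2082 C#01-05, 2 Tuas South Ave 2 Singapore 637601 '
     . 'tang. Steve. Eric@alphabiofuels.sg www.alphabiofuels.sg';

print_r(extract_emails_ocr($ocr));
```

Because the `i` flag already makes the match case-insensitive, this sketch does not call strtolower(), so the original capitalization of the address is preserved.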

2 votes

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/43844506
