
Regular expression to match object dimensions

Stack Overflow user
Asked 2011-12-08 16:26:15
Answers: 3 · Views: 2.4K · Followers: 0 · Score: 7

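I'll just put it out there: I'm terrible at regular expressions. I've tried to come up with a way to solve my problem, but I really don't understand them very well...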

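Imagine the following sentences:

  • Hello blah blah. It's around 11 1/2" x 32".
  • The dimensions are 8 x 10-3/5!
  • Probably somewhere in the region of 22" x 17".
  • The roll is quite large: 42 1/2" x 60 yd.
  • They are all 5.76 by 8 frames.
  • Yeah, maybe it's around 84cm long.
  • I think about 13/19".
  • No, it's probably 86 cm actually.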

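I'd like to extract the item dimensions from these sentences as cleanly as possible. In a perfect world, the regex would output the following:

  • 11 1/2" x 32"
  • 8 x 10-3/5
  • 22" x 17"
  • 42 1/2" x 60 yd
  • 5.76 by 8 frames
  • 84cm
  • 13/19"
  • 86 cm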

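I imagine a world where the following rules apply:

  • The following are valid units: {cm, mm, yd, yards, ", ', feet}, though I'd prefer a solution that works for an arbitrary set of units rather than one written explicitly for the units above.
  • A dimension is always described numerically, may or may not be followed by a unit, and may or may not have a decimal or fractional part. A dimension consisting of a fractional part alone is allowed, e.g. 4/5".
  • Fractional parts always have a / separating the numerator and denominator, and one can assume there is no space between them (although handling spaces too would be great!).
  • Dimensions may be one- or two-dimensional; in the latter case, one can assume that the following are acceptable separators between the two dimensions: {x, by}. If a dimension is only one-dimensional, it must have a unit from the set above, i.e. 22 cm is OK, .333 is not, and neither is 4.33 oz.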
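The first rule asks for an arbitrary set of units rather than hard-coded ones. One way to get that, sketched here in Perl (the language the answers below use), is to keep the units as data and build the alternation from them. unit_alternation is a hypothetical helper name, not a library function.

```perl
use strict;
use warnings;

# Hypothetical helper: build a regex alternation from any list of unit strings.
# Units are sorted longest first, so that if one unit is a prefix of another
# (say "m" and "mm") the longer one is tried first; quotemeta() escapes
# characters such as " and ' so they are matched literally.
sub unit_alternation {
    my @units = @_;
    my $alt = join '|', map { quotemeta($_) } sort { length($b) <=> length($a) } @units;
    return qr/(?:$alt)/;
}

my $unit = unit_alternation('cm', 'mm', 'yd', 'yards', '"', "'", 'feet');

for my $s ('84cm', '60 yd', '4.33 oz') {
    my ($u) = $s =~ /\d\s*($unit)/;   # a digit, optional spaces, then a unit
    printf "%-8s -> %s\n", $s, defined $u ? $u : '(no unit)';
}
```

Because of the longest-first ordering, an alternation built from ('m', 'mm') picks mm rather than m in 10mm; adding a unit is then a change to the list, not to the pattern.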

To show you how useless I am with regular expressions (and to at least show you that I've tried!), this is as far as I got...

```
[1-9]+[/ ][x1-9]
```
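The attempt above treats a number as a run of the digits 1-9, so it cannot see 0, decimals, or a complete fraction. A minimal Perl sketch of a single-number sub-pattern following the second and third rules (the name $num and the order of the alternatives are assumptions) makes the difference visible:

```perl
use strict;
use warnings;

# Hypothetical building block: ONE number as the question's rules describe it.
# Alternatives are ordered longest form first so "11 1/2" is not cut to "11".
my $num = qr{
    (?: \d+ [\s-]+ \d+ / \d+   # mixed number:  11 1/2, 10-3/5
      | \d+ / \d+              # bare fraction: 13/19, 4/5
      | \d* \. \d+             # decimal:       5.76, .333
      | \d+                    # integer:       8, 32
    )
}x;

my $attempt = qr{[1-9]+[/ ][x1-9]};   # the attempt from the question

for my $s ('It is around 11 1/2" x 32".', 'The dimensions are 8 x 10-3/5!') {
    print 'numbers: ', join(' | ', $s =~ /($num)/g), "\n";
    print 'attempt: ', join(' | ', $s =~ /($attempt)/g), "\n";
}
```

On these two sentences the sketch should find 11 1/2 | 32 and 8 | 10-3/5, while the original attempt yields 11 1 and 8 x | 3/5, fragments that cut across numbers.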

Update (2)

You two were quick and effective! I'll add a few test cases that the regexes below don't cover:

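  • The last but one test case is 12 yd x.
  • The last test case is 99 cm by.
  • This sentence doesn't have dimensions in it: 342 / 5553 / 222.
  • Three dimensions? 22" x 17" x 12 cm
  • This is a product code: c720 with another number 83 x better.
  • A number on its own 21.
  • A volume shouldn't match 0.332 oz.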

These should produce the following results (# indicates that nothing should match):

  • 12 yd
  • 99 cm
  • #
  • 22" x 17" x 12 cm
  • #
  • #
  • #

I've modified M42's answer as follows:

```
\d+(?:\.\d+)?[\s-]*(?:\d+)?(?:\/\d+)?(?:cm|mm|yd|"|'|feet)(?:\s*x\s*|\s*by\s*)?(?:\d+(?:\.\d+)?[\s*-]*(?:\d+(?:\/\d+)?)?(?:cm|mm|yd|"|'|feet)?)?
```

However, while this solves some of the new test cases, it now fails on several of the others. It reports:

  • 11 1/2" x 32" PASS
  • (nothing) FAIL
  • 22" x 17" PASS
  • 42 1/2" x 60 yd PASS
  • (nothing) FAIL
  • 84cm PASS
  • 13/19" PASS
  • 86 cm PASS
  • 12 yd x FAIL
  • 99 cm by FAIL
  • (nothing) PASS
  • 22" x 17", but '12 cm' separately FAIL
  • (nothing) PASS
  • (nothing) PASS
  • (nothing) PASS
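To check where the modified regex goes wrong, this small Perl sketch runs it unchanged against some of the sentences from the update and prints every match in brackets. all_matches is a hypothetical helper name.

```perl
use strict;
use warnings;

# The modified regex from the question, copied verbatim.
my $re = qr/\d+(?:\.\d+)?[\s-]*(?:\d+)?(?:\/\d+)?(?:cm|mm|yd|"|'|feet)(?:\s*x\s*|\s*by\s*)?(?:\d+(?:\.\d+)?[\s*-]*(?:\d+(?:\/\d+)?)?(?:cm|mm|yd|"|'|feet)?)?/;

# Hypothetical helper: every match in a sentence, each wrapped in [...].
sub all_matches {
    my ($s) = @_;
    my @hits = $s =~ /($re)/g;
    return @hits ? join(' ', map { "[$_]" } @hits) : '(nothing)';
}

for my $s (
    'The last but one test case is 12 yd x.',
    'The last test case is 99 cm by.',
    'Three dimensions? 22" x 17" x 12 cm',
    'The dimensions are 8 x 10-3/5!',
) {
    my $m = all_matches($s);
    print "$m\n";
}
```

It should print [12 yd x], [99 cm by], [22" x 17"] [12 cm] and (nothing): the optional separator is consumed even when no second dimension follows it, the pattern stops after two dimensions, and a first dimension without a unit (8 x 10-3/5) can never match.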

3 Answers

Stack Overflow user

Answered 2011-12-08 16:52:12

New version: close to the goal, but 2 tests still fail.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use Test::More;

# A dimension that must carry a unit (integer, decimal, mixed number or fraction).
my $re1 = qr/\d+(?:\.\d+)?[\s-]*(?:\d+)?(?:\/\d+)?(?:cm|mm|yd|"|'|feet)/;
# The separator between two dimensions: "x" or "by".
my $re2 = qr/(?:\s*x\s*|\s*by\s*)/;
# Like $re1, but "frames" is also accepted as a unit (used for the second dimension).
my $re3 = qr/\d+(?:\.\d+)?[\s-]*(?:\d+)?(?:\/\d+)?(?:cm|mm|yd|"|'|feet|frames)/;
# Expected match for each __DATA__ line, in the same order.
my @out = (
'11 1/2" x 32"',
'8 x 10-3/5',
'22" x 17"',
'42 1/2" x 60 yd',
'5.76 by 8 frames',
'84cm',
'13/19"',
'86 cm',
'12 yd',
'99 cm',
'no match',
'22" x 17" x 12 cm',
'no match',
'no match',
'no match',
);
my $i = 0;
while (<DATA>) {
    chomp;
    # One dimension, optionally followed by up to two more.
    if (/($re1(?:$re2$re3)?(?:$re2$re1)?)/) {
        ok($1 eq $out[$i], $1 . ' in ' . $_);
    } else {
        ok($out[$i] eq 'no match', ' got "no match" in ' . $_);
    }
    $i++;
}
done_testing;

__DATA__
Hello blah blah. It's around 11 1/2" x 32".
The dimensions are 8 x 10-3/5!
Probably somewhere in the region of 22" x 17".
The roll is quite large: 42 1/2" x 60 yd.
They are all 5.76 by 8 frames.
Yeah, maybe it's around 84cm long.
I think about 13/19".
No, it's probably 86 cm actually.
The last but one test case is 12 yd x.
The last test case is 99 cm by.
This sentence doesn't have dimensions in it: 342 / 5553 / 222.
Three dimensions? 22" x 17" x 12 cm
This is a product code: c720 with another number 83 x better.  
A number on its own 21.
A volume shouldn't match 0.332 oz.
```

Output:

```text
#   Failed test ' got "no match" in The dimensions are 8 x 10-3/5!'
#   at C:\tests\perl\test6.pl line 42.
#   Failed test ' got "no match" in They are all 5.76 by 8 frames.'
#   at C:\tests\perl\test6.pl line 42.
# Looks like you failed 2 tests of 15.
ok 1 - 11 1/2" x 32" in Hello blah blah. It's around 11 1/2" x 32".
not ok 2 -  got "no match" in The dimensions are 8 x 10-3/5!
ok 3 - 22" x 17" in Probably somewhere in the region of 22" x 17".
ok 4 - 42 1/2" x 60 yd in The roll is quite large: 42 1/2" x 60 yd.
not ok 5 -  got "no match" in They are all 5.76 by 8 frames.
ok 6 - 84cm in Yeah, maybe it's around 84cm long.
ok 7 - 13/19" in I think about 13/19".
ok 8 - 86 cm in No, it's probably 86 cm actually.
ok 9 - 12 yd in The last but one test case is 12 yd x.
ok 10 - 99 cm in The last test case is 99 cm by.
ok 11 -  got "no match" in This sentence doesn't have dimensions in it: 342 / 5553 / 222.
ok 12 - 22" x 17" x 12 cm in Three dimensions? 22" x 17" x 12 cm
ok 13 -  got "no match" in This is a product code: c720 with another number 83 x better.  
ok 14 -  got "no match" in A number on its own 21.
ok 15 -  got "no match" in A volume shouldn't match 0.332 oz.
1..15
```

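It seems hard to match 5.76 by 8 frames without also matching 0.332 oz: sometimes a number has to be matched together with its unit, and sometimes a number has no unit at all.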

Sorry I couldn't do better.
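The difficulty described above can be handled by making the unit requirement depend on how many dimensions there are: two or more dimensions joined by x or by may omit units, while a lone number must carry one. The Perl sketch below assumes the unit list from the question's rules, so frames is not a unit and the fifth sentence yields 5.76 by 8; extract_dims is a hypothetical helper name, and the patterns are tuned to these 15 sentences rather than offered as a general solution.

```perl
use strict;
use warnings;

# Assumed unit list, taken from the question's rules. Sorted longest first so
# that, if one unit is a prefix of another, the longer one is tried first;
# quotemeta() makes " and ' match literally.
my @units    = ('cm', 'mm', 'yd', 'yards', '"', "'", 'feet');
my $unit_alt = join '|', map { quotemeta($_) } sort { length($b) <=> length($a) } @units;
my $unit     = qr/(?:$unit_alt)/;

# One number: mixed number (11 1/2, 10-3/5), fraction (13/19),
# decimal (5.76, .333) or integer (8).
my $num = qr{(?:\d+[\s-]+\d+/\d+|\d+/\d+|\d*\.\d+|\d+)};

my $dim = qr{$num(?:\s*$unit)?};   # one dimension; the unit is optional here
my $sep = qr{\s*(?:x|by)\s*};      # separator: "x" or "by"

# Two or more dimensions (units optional), or else a single number that MUST
# carry a unit. The lookbehind keeps a match from starting inside a word or
# number, e.g. the 720 in c720 or the 332 in 0.332.
my $dims = qr{(?<![\w.])(?:$dim(?:$sep$dim)+|$num\s*$unit)};

# Hypothetical helper: the first dimension expression in a sentence, or undef.
sub extract_dims {
    my ($sentence) = @_;
    return $sentence =~ /($dims)/ ? $1 : undef;
}

my @sentences = (
    q{Hello blah blah. It's around 11 1/2" x 32".},
    'The dimensions are 8 x 10-3/5!',
    'Probably somewhere in the region of 22" x 17".',
    'The roll is quite large: 42 1/2" x 60 yd.',
    'They are all 5.76 by 8 frames.',
    q{Yeah, maybe it's around 84cm long.},
    'I think about 13/19".',
    q{No, it's probably 86 cm actually.},
    'The last but one test case is 12 yd x.',
    'The last test case is 99 cm by.',
    q{This sentence doesn't have dimensions in it: 342 / 5553 / 222.},
    'Three dimensions? 22" x 17" x 12 cm',
    'This is a product code: c720 with another number 83 x better.',
    'A number on its own 21.',
    q{A volume shouldn't match 0.332 oz.},
);

for my $s (@sentences) {
    my $found = extract_dims($s);
    print defined $found ? "$found\n" : "#\n";   # "#" = no match, as in the question
}
```

The multi-dimension branch is tried first and the single-number-with-unit branch is the fallback. Because a separator must be followed by another number, the trailing x in 12 yd x. and by in 99 cm by. are left out, and the lookbehind stops matches such as the 720 in c720 or the 332 in 0.332.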

Score: 5

Stack Overflow user

Answered 2011-12-08 16:52:59

One of many possible solutions (it should be nlp compatible, since it only uses basic regex syntax):

```csharp
foundMatch = Regex.IsMatch(SubjectString, @"\d+(?: |cm|\.|""|/)[\d/""x -]*(?:\b(?:by\s*\d+|cm|yd)\b)?");
```

That will get you your results :)

Explanation:

```text
"
\d             # Match a single digit 0..9
   +              # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(?:            # Match the regular expression below
                  # Match either the regular expression below (attempting the next alternative only if this one fails)
      \           # Match the character “ ” literally
   |              # Or match regular expression number 2 below (attempting the next alternative only if this one fails)
      cm          # Match the characters “cm” literally
   |              # Or match regular expression number 3 below (attempting the next alternative only if this one fails)
      \.          # Match the character “.” literally
   |              # Or match regular expression number 4 below (attempting the next alternative only if this one fails)
      ""          # Match the character “""” literally
   |              # Or match regular expression number 5 below (the entire group fails if this one fails to match)
      /           # Match the character “/” literally
)
[\d/""x -]        # Match a single character present in the list below
                  # A single digit 0..9
                  # One of the characters “/""x”
                  # The character “ ”
                  # The character “-”
   *              # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
(?:               # Match the regular expression below
   \b             # Assert position at a word boundary
   (?:            # Match the regular expression below
                  # Match either the regular expression below (attempting the next alternative only if this one fails)
         by       # Match the characters “by” literally
         \s       # Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
            *     # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
         \d       # Match a single digit 0..9
            +     # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
      |           # Or match regular expression number 2 below (attempting the next alternative only if this one fails)
         cm       # Match the characters “cm” literally
      |           # Or match regular expression number 3 below (the entire group fails if this one fails to match)
         yd       # Match the characters “yd” literally
   )
   \b             # Assert position at a word boundary
)?                # Between zero and one times, as many times as possible, giving back as needed (greedy)
"
```
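Regex.IsMatch only reports whether a match exists. To see the text this pattern actually captures, here is a Perl port, run against a few of the question's sentences including two from the update. The C# verbatim-string "" is written as a plain ", and first_match is a hypothetical helper name.

```perl
use strict;
use warnings;

# The C# pattern from this answer; the verbatim-string "" is written as a plain ".
my $re = qr{\d+(?: |cm|\.|"|/)[\d/"x -]*(?:\b(?:by\s*\d+|cm|yd)\b)?};

# Hypothetical helper: the first match, in brackets so trailing spaces show.
sub first_match {
    my ($s) = @_;
    return $s =~ /($re)/ ? "[$1]" : '(nothing)';
}

for my $s (
    q{Hello blah blah. It's around 11 1/2" x 32".},
    'They are all 5.76 by 8 frames.',
    q{Yeah, maybe it's around 84cm long.},
    'The last but one test case is 12 yd x.',
    'A number on its own 21.',
) {
    my $m = first_match($s);
    print "$m\n";
}
```

It should print [11 1/2" x 32"], [5.76 by 8], [84cm ] (the character class picks up the trailing space), [12 yd] and [21.], so the last case shows a bare number being matched.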
Score: 2

Stack Overflow user

Answered 2011-12-08 17:18:34

This is as far as I could get with a regular expression in Perl. Try to adapt it to your regex flavor:

```
\d.*\d(?:\s+\S+|\S+)
```

Explanation:

```text
\d        # One digit.
.*        # Any number of characters.
\d        # One digit. Together, these match everything from the first digit to the last one.
\s+\S+    # Non-space characters after some whitespace. It tries to match a unit like 'cm' or 'yd'.
|         # Or. Selects one of the two alternatives inside the parentheses.
\S+       # One or more non-space characters. It tries to match double quotes, or units joined to the
          # last number.
```

My test:

Contents of script.pl:

```perl
use warnings;
use strict;

while ( <DATA> ) {
        print qq[$1\n] if m/(\d.*\d(\s+\S+|\S+))/
}

__DATA__
Hello blah blah. It's around 11 1/2" x 32".
The dimensions are 8 x 10-3/5!
Probably somewhere in the region of 22" x 17".
The roll is quite large: 42 1/2" x 60 yd.
They are all 5.76 by 8 frames.
Yeah, maybe it's around 84cm long.
I think about 13/19".
No, it's probably 86 cm actually.
```

Run the script:

```shell
perl script.pl
```

Result:

```text
11 1/2" x 32".
8 x 10-3/5!
22" x 17".
42 1/2" x 60 yd.
5.76 by 8 frames.
84cm
13/19".
86 cm
```
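The script above only covers the original eight sentences. A Perl sketch running the same pattern, unchanged, against some of the sentences from the question's update shows how the greedy .* behaves there (first_match is a hypothetical helper name):

```perl
use strict;
use warnings;

# The pattern from this answer, unchanged: first digit ... last digit, plus
# one trailing run of non-space characters (after whitespace or attached).
my $re = qr/(\d.*\d(\s+\S+|\S+))/;

# Hypothetical helper: the outer capture in brackets, or (nothing).
sub first_match {
    my ($s) = @_;
    return $s =~ $re ? "[$1]" : '(nothing)';
}

for my $s (
    'The last but one test case is 12 yd x.',
    'Three dimensions? 22" x 17" x 12 cm',
    q{This sentence doesn't have dimensions in it: 342 / 5553 / 222.},
    'This is a product code: c720 with another number 83 x better.',
    'A number on its own 21.',
    q{A volume shouldn't match 0.332 oz.},
) {
    my $m = first_match($s);
    print "$m\n";
}
```

It should print [12 yd], [22" x 17" x 12 cm], [342 / 5553 / 222.], [720 with another number 83 x], [21.] and [0.332 oz.]: because .* spans from the first digit to the last one, almost any sentence with two or more digits produces a match.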
Score: 2
Original page content provided by Stack Overflow; translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/8434205
