IPv6 parsing in Rust

Code Review user

Asked on 2018-06-21 18:18:36

Answers: 1 | Views: 796 | Followers: 0 | Votes: 13

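Below is code that parses IPv6 addresses. An IPv6 address is 128 bits long. In its printable form, its hextets (1 hextet == 16 bits) are written as hexadecimal numbers separated by colons. For example: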

fe80:0000:0000:0000:8657:e6fe:08d5:5325

Note that the leading zeros of each hextet can be omitted. The following is the same address:

fe80:0:0:0:8657:e6fe:8d5:5325

Also, if several consecutive hextets are 0, they can be omitted and replaced by ::. The following is again the same address:

fe80::8657:e6fe:8d5:5325

The :: can appear anywhere, not only in the middle. For example, these are valid IPv6 addresses:

::1
ffff::

The all-zero address can be written as ::.
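To make the notation rules above concrete, here is a minimal sketch that checks whether two spellings denote the same address. It uses the standard library's std::net::Ipv6Addr as a reference parser, not the code under review, and the helper name same_address is an assumption made for illustration.

```rust
use std::net::Ipv6Addr;

/// Hypothetical helper: true if both strings parse as IPv6 addresses
/// and denote the same 128-bit value.
fn same_address(a: &str, b: &str) -> bool {
    match (a.parse::<Ipv6Addr>(), b.parse::<Ipv6Addr>()) {
        (Ok(x), Ok(y)) => x == y,
        // If either string is not a valid address, they are not "the same".
        _ => false,
    }
}
```

With this helper, the full form, the form without leading zeros, and the form with :: from the examples above all compare equal.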

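Finally, there is a special kind of IPv6 address that provides compatibility with IPv4. The last 32 bits of these addresses represent an IPv4 address, written as follows: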

1111:2222:3333:4444:5555:6666:1.2.3.4

The IPv4 part must be at the end of the address for the address to be valid.
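As a small worked example of that mapping, the sketch below packs the four IPv4 octets into the two trailing hextets and cross-checks the result against std::net::Ipv6Addr. Both function names are assumptions made for illustration and do not appear in the code under review.

```rust
use std::net::Ipv6Addr;

/// Pack the octets of `a.b.c.d` into the last two hextets of an IPv6 address:
/// `a.b` becomes hextet 6 and `c.d` becomes hextet 7.
fn ipv4_to_hextets(a: u8, b: u8, c: u8, d: u8) -> (u16, u16) {
    (u16::from_be_bytes([a, b]), u16::from_be_bytes([c, d]))
}

/// Reference check: return the last two hextets as parsed by the standard library.
fn trailing_hextets(s: &str) -> Option<(u16, u16)> {
    let segments = s.parse::<Ipv6Addr>().ok()?.segments();
    Some((segments[6], segments[7]))
}
```

For 1.2.3.4 the two trailing hextets are 0x0102 and 0x0304.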

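My code is inspired by the ParseIPv6 function in the Go standard library.

The code is a bit long, so I have also published it as a gist (which includes some tests).

I would like to know whether:

there are ways to make the code more efficient (even by using third-party crates);
it is OK to work with bytes instead of chars. All the characters in an IPv6 address should have an ASCII representation, so I think it is fine, but I am not 100% sure. If I had to use chars it would be much more complicated, because there is no way to index into a string in Rust.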
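On the bytes-versus-chars point: in UTF-8, every byte of a multi-byte (non-ASCII) character has its high bit set, so comparing bytes against ASCII values such as b':' can never match part of a non-ASCII character. The sketch below illustrates this; the function name is an assumption for illustration.

```rust
/// Hypothetical pre-check: true if every byte of `s` is one that can appear
/// in a textual IPv6 address (hex digits, ':' and, for embedded IPv4, '.').
fn has_only_ipv6_bytes(s: &str) -> bool {
    s.bytes()
        .all(|b| b.is_ascii_hexdigit() || b == b':' || b == b'.')
}

/// True if every byte of `s` is >= 0x80, which is the case for any string
/// made only of non-ASCII characters.
fn all_bytes_non_ascii(s: &str) -> bool {
    s.bytes().all(|b| b >= 0x80)
}
```

Any input containing a non-ASCII character fails the byte-level check, so the parser rejects it instead of misreading it.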

After this long introduction, here is the code:

use std::str::FromStr;

#[derive(Debug, Copy, Eq, PartialEq, Hash, Clone)]
pub struct Ipv6Address(u128);

impl FromStr for Ipv6Address {
    type Err = MalformedAddress;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // We'll manipulate bytes instead of UTF-8 characters, because the characters that
        // represent an IPv6 address are supposed to be ASCII characters.
        let bytes = s.as_bytes();

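        // The maximum length of a string representing an IPv6 is the length of an address
        // that ends with an embedded IPv4 (45 bytes):
        //
        //      1111:2222:3333:4444:5555:6666:255.255.255.255
        //
        // (a full address without an IPv4 part, such as
        // 1111:2222:3333:4444:5555:6666:7777:8888, is 39 bytes long).
        //
        // The minimum length of a string representing an IPv6 is the length of:
        //
        //      ::
        //
        if bytes.len() > 45 || bytes.len() < 2 {
            return Err(MalformedAddress(s.into()));
        }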

        let mut offset = 0;
        let mut ellipsis: Option<usize> = None;

        // Handle the special case where the IP starts with "::"
        if bytes[0] == b':' {
            if bytes[1] == b':' {
                if bytes.len() == 2 {
                    return Ok(Ipv6Address(0));
                }
                ellipsis = Some(0);
                offset += 2;
            } else {
                // An IPv6 cannot start with a single colon. It must be a double colon.
                // So this is an invalid address
                return Err(MalformedAddress(s.into()));
            }
        }

        // When dealing with IPv6, it's easier to reason in terms of "hextets" instead of octets.
        // An IPv6 is 8 hextets. At the end, we'll convert that array into a u128.
        let mut address: [u16; 8] = [0; 8];

        // Keep track of the number of hextets we process
        let mut hextet_index = 0;

        loop {
            if offset == bytes.len() {
                break;
            }

            // Try to read a hextet
            let (bytes_read, hextet) = read_hextet(&bytes[offset..]);

            // Handle the case where we could not read a hextet
            if bytes_read == 0 {
                match bytes[offset] {
                    // We could not read a hextet because the first character in the slice was ":"
                    // This may be because we have two consecutive colons.
                    b':' => {
                        // Check if we already saw an ellipsis. If so, fail parsing, because an
                        // IPv6 can only have one ellipsis.
                        if ellipsis.is_some() {
                            return Err(MalformedAddress(s.into()));
                        }
                        // Otherwise, remember the position of the ellipsis. We'll need that later
                        // to count the number of zeros the ellipsis represents.
                        ellipsis = Some(hextet_index);
                        offset += 1;
                        // Continue and try to read the next hextet
                        continue;
                    }
                    // We know the first character does not represent a hexadecimal digit
                    // (otherwise read_hextet() would have read at least one character), and that
                    // it's not ":", so the string does not represent an IPv6 address
                    _ => return Err(MalformedAddress(s.into())),
                }
            }

            // At this point, we know we read an hextet.

            address[hextet_index] = hextet;
            offset += bytes_read;
            hextet_index += 1;

            // If this was the last hextet or if we reached the end of the buffer, we should be
            // done
            if hextet_index == 8 || offset == bytes.len() {
                break;
            }

            // Read the next character. After a hextet, we usually expect a colon, but there's a
            // special case for IPv6 addresses that end with an IPv4.
            match bytes[offset] {
                // We saw the colon, we can continue
                b':' => offset += 1,
                // Handle the special IPv4 case, i.e. addresses like the one below. Note that the
                // hextet we just read is part of that IPv4 address:
                //
                // aaaa:bbbb:cccc:dddd:eeee:ffff:a.b.c.d.
                //                               ^^
                //                               ||
                // hextet we just read, that  ---+|
                // is actually the first byte of  +--- dot we're handling
                // the ipv4.
                b'.' => {
                    // The hextet was actually part of the IPv4, so note that we start reading the
                    // IPv4 at `offset - bytes_read`.
                    let ipv4: u32 = Ipv4Address::parse(&bytes[offset-bytes_read..])?.into();
                    // Replace the hextet we just read by the 16 most significant bits of the
                    // IPv4 address (a.b in the comment above)
                    address[hextet_index - 1] = ((ipv4 & 0xffff_0000) >> 16) as u16;
                    // Set the last hextet to the 16 least significant bits of the IPv4 address
                    // (c.d in the comment above)
                    address[hextet_index] = (ipv4 & 0x0000_ffff) as u16;
                    hextet_index += 1;
                    // After successfully parsing an IPv4, we should be done.
                    // If there are bytes left in the buffer, or if we didn't read enough hextet,
                    // we'll fail later.
                    break;
                }
                _ => return Err(MalformedAddress(s.into())),
            }
        } // end of loop

        // If we exited the loop, we should have reached the end of the buffer.
        // If there are trailing characters, parsing should fail.
        if offset < bytes.len() {
            return Err(MalformedAddress(s.into()));
        }

        if hextet_index == 8 && ellipsis.is_some() {
            // We parsed an address that looks like 1111:2222::3333:4444:5555:6666:7777,
            // ie with an empty ellipsis.
            return Err(MalformedAddress(s.into()));
        }

        // We didn't parse enough hextets, but this may be due to an ellipsis
        if hextet_index < 8 {
            if let Some(ellipsis_index) = ellipsis {
                // Count how many zeros the ellipsis accounts for
                let nb_zeros = 8 - hextet_index;
                // Shift the hextet that we read after the ellipsis by the number of zeros
                for index in (ellipsis_index..hextet_index).rev() {
                    address[index+nb_zeros] = address[index];
                    address[index] = 0;
                }
            } else {
                return Err(MalformedAddress(s.into()));
            }
        }

        // Build the IPv6 address from the array of hextets
        return Ok(Ipv6Address(
                ((address[0] as u128) << 112)
                + ((address[1] as u128) << 96)
                + ((address[2] as u128) << 80)
                + ((address[3] as u128) << 64)
                + ((address[4] as u128) << 48)
                + ((address[5] as u128) << 32)
                + ((address[6] as u128) << 16)
                + address[7] as u128))
    }
}
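The ellipsis handling at the end of from_str can be looked at in isolation. The following is a minimal sketch of that step under stated assumptions: the function name expand_ellipsis and its signature are hypothetical, and it uses the slice method copy_within instead of the manual reverse loop.

```rust
/// Expand a `::` that was seen just before hextet number `ellipsis_index`.
///
/// `address` holds the `count` hextets that were actually read, packed at the
/// front of the array. Precondition: `ellipsis_index <= count <= 8`.
fn expand_ellipsis(mut address: [u16; 8], count: usize, ellipsis_index: usize) -> [u16; 8] {
    // Number of zero hextets that the ellipsis stands for.
    let nb_zeros = 8 - count;
    // Move the hextets read after the ellipsis to the end of the array.
    address.copy_within(ellipsis_index..count, ellipsis_index + nb_zeros);
    // Fill the gap left by the ellipsis with zeros.
    for hextet in &mut address[ellipsis_index..ellipsis_index + nb_zeros] {
        *hextet = 0;
    }
    address
}
```

For fe80::8657:e6fe:8d5:5325, five hextets are read and the ellipsis sits at index 1, so three zero hextets are inserted after fe80.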

Here are the helpers I use:

/// Check whether an ASCII character represents an hexadecimal digit
fn is_hex_digit(byte: u8) -> bool {
    match byte {
        b'0'..=b'9' | b'a'..=b'f' | b'A'..=b'F' => true,
        _ => false,
    }
}

/// Convert an ASCII character that represents an hexadecimal digit into this digit
fn hex_to_digit(byte: u8) -> u8 {
    match byte {
        b'0'..=b'9' => byte - b'0',
        b'a'..=b'f' => byte - b'a' + 10,
        b'A'..=b'F' => byte - b'A' + 10,
        _ => unreachable!(),
    }
}

/// Read up to four ASCII characters that represent hexadecimal digits, and return their value, as
/// well as the number of characters that were read. If no character is read, `(0, 0)` is
/// returned.
fn read_hextet(bytes: &[u8]) -> (usize, u16) {
    let mut count = 0;
    let mut digits: [u8; 4] = [0; 4];

    for b in bytes {
        if is_hex_digit(*b) {
            digits[count] = hex_to_digit(*b);
            count += 1;
            if count == 4 {
                break;
            }
        } else {
            break;
        }
    }

    if count == 0 {
        return (0, 0);
    }

    let mut shift = (count - 1) * 4;
    let mut res = 0;
    for digit in &digits[0..count] {
        res += (*digit as u16) << shift;
        if shift >= 4 {
            shift -= 4;
        } else {
            break;
        }
    }

    (count, res)
}

For now I don't handle IPv4 parsing, so I just use the following:

#[derive(Debug, Copy, Eq, PartialEq, Hash, Clone)]
pub struct Ipv4Address(u32);

impl Ipv4Address {
    fn parse(_: &[u8]) -> Result<u32, MalformedAddress> {
        unimplemented!();
    }
}
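For reference, a dotted-quad parser over ASCII bytes could look roughly like the sketch below. It is a hypothetical stand-in, not part of the original post: the name parse_ipv4_bytes is an assumption, it returns Option<u32> instead of the Result used above so that the block stays self-contained, and it accepts leading zeros in octets.

```rust
/// Hypothetical stand-in: parse `a.b.c.d` (four decimal octets) from ASCII bytes.
/// Returns the address as a big-endian packed `u32`, or `None` if malformed.
fn parse_ipv4_bytes(bytes: &[u8]) -> Option<u32> {
    let mut result: u32 = 0;
    let mut octets = 0;
    for part in bytes.split(|&b| b == b'.') {
        // Each octet is 1 to 3 decimal digits, and there are at most 4 octets.
        if part.is_empty() || part.len() > 3 || octets == 4 {
            return None;
        }
        let mut value: u32 = 0;
        for &b in part {
            if !b.is_ascii_digit() {
                return None;
            }
            value = value * 10 + u32::from(b - b'0');
        }
        if value > 255 {
            return None;
        }
        result = (result << 8) | value;
        octets += 1;
    }
    if octets == 4 {
        Some(result)
    } else {
        None
    }
}
```

The first octet has to be re-read as decimal, which is why from_str restarts at offset - bytes_read instead of reusing the value that read_hextet parsed as hexadecimal.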

Finally, here is the error type I use:

use std::fmt;
use std::error::Error;

#[derive(Debug)]
pub struct MalformedAddress(String);

impl fmt::Display for MalformedAddress {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "malformed address: \"{}\"", self.0)
    }
}

impl Error for MalformedAddress {
    fn description(&self) -> &str {
        "the string cannot be parsed as an IP address"
    }

    fn cause(&self) -> Option<&dyn Error> {
        None
    }
}

rust

ip-address

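1 Answer

Code Review user

Answered on 2023-02-01 01:13:37

This is not a complete review; I only took a quick look and noticed one part that looks more like C than Rust.

The is_hex_digit and hex_to_digit functions are almost identical and can be merged into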

fn hex_digit(byte: u8) -> Option<u8> {
    match byte {
        b'0'..=b'9' => Some(byte - b'0'),
        b'a'..=b'f' => Some(byte - b'a' + 10),
        b'A'..=b'F' => Some(byte - b'A' + 10),
        _ => None,
    }
}

Then this part

   if is_hex_digit(*b) {
        digits[count] = hex_to_digit(*b);
        count += 1;
        if count == 4 {
            break;
        }
    } else {
        break;
    }

can be written as:

   if let Some(digit) = hex_digit(*b) {
        digits[count] = digit;
        count += 1;
        if count == 4 {
            break;
        }
   } else {
        break;
   }
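Going one step further than the rewrite above, the temporary digits array and the second loop in read_hextet could be dropped by accumulating the value while reading. This is a sketch under the assumption that the merged hex_digit helper is used; it is one possible variant, not the code from the original post.

```rust
/// Convert an ASCII hexadecimal digit into its value, or `None` for any other byte.
fn hex_digit(byte: u8) -> Option<u8> {
    match byte {
        b'0'..=b'9' => Some(byte - b'0'),
        b'a'..=b'f' => Some(byte - b'a' + 10),
        b'A'..=b'F' => Some(byte - b'A' + 10),
        _ => None,
    }
}

/// Read up to four hexadecimal digits from the start of `bytes`.
/// Returns the number of bytes consumed and the accumulated value,
/// or `(0, 0)` if the first byte is not a hexadecimal digit.
fn read_hextet(bytes: &[u8]) -> (usize, u16) {
    let mut count = 0;
    let mut value: u16 = 0;
    // `map_while` stops at the first byte that is not a hex digit.
    for digit in bytes.iter().take(4).map_while(|&b| hex_digit(b)) {
        value = (value << 4) | u16::from(digit);
        count += 1;
    }
    (count, value)
}
```

Shifting the accumulator left by four bits per digit gives the same result as computing each digit's shift up front.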

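The second thing that comes to mind is that there are no unit tests. This is a perfect case where it makes sense to add unit tests that cover parsing of the general case and of all the corner cases.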
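A table-driven test module along these lines would cover both kinds of input. The sketch is illustrative only: parse is a hypothetical stand-in that wraps std::net::Ipv6Addr so that the block compiles on its own, and in practice it would call the Ipv6Address::from_str implementation under review instead.

```rust
use std::net::Ipv6Addr;

/// Stand-in for the parser under test (assumption: replace with the real one).
fn parse(s: &str) -> Option<u128> {
    s.parse::<Ipv6Addr>().ok().map(u128::from)
}

#[cfg(test)]
mod tests {
    use super::parse;

    #[test]
    fn accepts_valid_addresses() {
        let cases: [(&str, u128); 4] = [
            ("::", 0),
            ("::1", 1),
            ("ffff::", 0xffff << 112),
            ("fe80::8657:e6fe:8d5:5325", 0xfe80_0000_0000_0000_8657_e6fe_08d5_5325),
        ];
        for (input, expected) in cases.iter() {
            assert_eq!(parse(input), Some(*expected), "input: {}", input);
        }
    }

    #[test]
    fn rejects_malformed_addresses() {
        // Empty, single colon, too few or too many hextets, two ellipses,
        // a non-hex digit, and a group with more than four digits.
        let cases = ["", ":", "1:2:3", "1:2:3:4:5:6:7:8:9", "1::2::3", "g::", "12345::"];
        for input in cases.iter() {
            assert_eq!(parse(input), None, "input: {}", input);
        }
    }
}
```

Keeping the inputs in a table makes it cheap to add a new corner case whenever a bug is found.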

An explicit return is not needed for the last statement of a block.

  // Build the IPv6 address from the array of hextets
    return Ok(Ipv6Address(
            ((address[0] as u128) << 112)
            + ((address[1] as u128) << 96)
            + ((address[2] as u128) << 80)
            + ((address[3] as u128) << 64)
            + ((address[4] as u128) << 48)
            + ((address[5] as u128) << 32)
            + ((address[6] as u128) << 16)
            + address[7] as u128))

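should be replaced with a shorter, single trailing expression such as Ok(Ipv6Address(ipv6)).

The address conversion could be done with let ipv6 = unsafe { std::mem::transmute::<[u16; 8], u128>(address) }; but that may cause problems depending on the target system (for example its byte order), and it is unsafe for a reason. Some variation of into_bytes and from_bytes is also possible here. In any case, I would at least replace the additions with bitwise OR, which looks cleaner.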

      // Build the IPv6 address from the array of hextets
      let ipv6: u128 = ((address[0] as u128) << 112)
          | ((address[1] as u128) << 96)
          | ((address[2] as u128) << 80)
          | ((address[3] as u128) << 64)
          | ((address[4] as u128) << 48)
          | ((address[5] as u128) << 32)
          | ((address[6] as u128) << 16)
          | (address[7] as u128);
      Ok(Ipv6Address(ipv6))
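As a safe alternative to transmute that does not depend on the byte order of the target, the conversion can also be sketched with a fold, or with the standard to_be_bytes and from_be_bytes methods. Both function names below are assumptions made for illustration.

```rust
/// Combine eight hextets into a `u128`, most significant hextet first.
fn hextets_to_u128_fold(address: [u16; 8]) -> u128 {
    address
        .iter()
        .fold(0u128, |acc, &hextet| (acc << 16) | u128::from(hextet))
}

/// Same result, going through an explicit big-endian byte array.
fn hextets_to_u128_bytes(address: [u16; 8]) -> u128 {
    let mut bytes = [0u8; 16];
    for (chunk, hextet) in bytes.chunks_exact_mut(2).zip(address.iter()) {
        chunk.copy_from_slice(&hextet.to_be_bytes());
    }
    u128::from_be_bytes(bytes)
}
```

Either version removes the hand-written shift amounts, which are easy to mistype.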

Votes: 4

Original content provided by Code Review.

Original link:

https://codereview.stackexchange.com/questions/196996
