IPv6 parsing in Rust

Code Review user
Asked 2018-06-21 18:18:36
Answers: 1 · Views: 796 · Followers: 0 · Votes: 13

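Below is code that parses IPv6 addresses. An IPv6 address is 128 bits long. In its printable form it is written as eight hextets (1 hextet == 16 bits), each represented as a hexadecimal number, separated by colons. For example: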

fe80:0000:0000:0000:8657:e6fe:08d5:5325

Note that the leading zeros of each hextet can be omitted. The following is the same address:

fe80:0:0:0:8657:e6fe:8d5:5325

Finally, if several consecutive hextets are 0, they can be omitted and replaced by ::. The following is again the same address:

fe80::8657:e6fe:8d5:5325

The :: can be anywhere, not only in the middle. For example, these are valid IPv6 addresses:

::1
ffff::

The empty (all-zero) address can be represented as ::
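
The equivalence of these notations can be checked against the standard library's parser. This is only a minimal sketch: it uses std::net::Ipv6Addr rather than the Ipv6Address type from this question, and the helper name same_address is made up for illustration.

```rust
use std::net::Ipv6Addr;

/// Hypothetical helper: parse two textual forms and report whether they
/// denote the same 128-bit address. Returns false if either fails to parse.
fn same_address(a: &str, b: &str) -> bool {
    match (a.parse::<Ipv6Addr>(), b.parse::<Ipv6Addr>()) {
        (Ok(x), Ok(y)) => x == y,
        _ => false,
    }
}
```

For instance, same_address("fe80:0:0:0:8657:e6fe:8d5:5325", "fe80::8657:e6fe:8d5:5325") should hold, as should same_address("::", "0:0:0:0:0:0:0:0").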

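Finally, there is a special kind of IPv6 address that provides compatibility with IPv4. The last 32 bits of these addresses represent an IPv4 address, and they are written as follows: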

1111:2222:3333:4444:5555:6666:1.2.3.4

The IPv4 part must be at the end of the address for the address to be valid.
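
To make the 32-bit mapping concrete, here is a small sketch of how a dotted-quad IPv4 address turns into the last two hextets. It relies on std::net::Ipv4Addr only for the example; the helper name ipv4_to_hextets is an assumption and not part of the code under review.

```rust
use std::net::Ipv4Addr;

/// Hypothetical helper: split an IPv4 address into the two hextets that
/// occupy the last 32 bits of the IPv6 address.
fn ipv4_to_hextets(ipv4: Ipv4Addr) -> (u16, u16) {
    // u32::from yields the octets in network order: 1.2.3.4 -> 0x01020304
    let bits = u32::from(ipv4);
    ((bits >> 16) as u16, (bits & 0xffff) as u16)
}
```

With this, 1.2.3.4 becomes the pair (0x0102, 0x0304), so the example address above ends in the hextets 0102 and 0304.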

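My code is inspired by the ParseIPv6 function of the Go standard library.

The code is a bit long, so I also published it as a gist (which contains some tests).

I'd like to know whether:

  • there are ways to make the code more efficient (even by using third-party crates)
  • it is OK to work with bytes instead of characters. In an IPv6 address, all the characters should have an ASCII representation, so I think it is fine, but I'm not 100% sure. If I had to use chars, it would be much more complicated, because there is no way to index into a string in Rust.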
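
On the second question, one property of UTF-8 may help: every byte of a multi-byte character is 0x80 or above, so a non-ASCII character can never be mistaken for a hex digit, ':' or '.' while scanning bytes. Below is a minimal sketch of a pre-check that makes this explicit; the function name only_ipv6_bytes is hypothetical.

```rust
/// Hypothetical pre-check: true if every byte is one of the ASCII characters
/// that may appear in a textual IPv6 address (hex digits, ':' and '.').
fn only_ipv6_bytes(s: &str) -> bool {
    s.bytes()
        .all(|b| b.is_ascii_hexdigit() || b == b':' || b == b'.')
}
```

A string containing "é" is rejected here because both of its UTF-8 bytes fall outside the ASCII range, which is the same reason the byte-based parser would reject it.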

After this long introduction, the code:

use std::str::FromStr;

#[derive(Debug, Copy, Eq, PartialEq, Hash, Clone)]
pub struct Ipv6Address(u128);

impl FromStr for Ipv6Address {
    type Err = MalformedAddress;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // We'll manipulate bytes instead of UTF-8 characters, because the characters that
        // represent an IPv6 address are supposed to be ASCII characters.
        let bytes = s.as_bytes();

        // The maximum length of a string representing an IPv6 is the length of (39 bytes):
        //
        //      1111:2222:3333:4444:5555:6666:7777:8888
        //
        // or, with an embedded IPv4 at the end (45 bytes):
        //
        //      1111:2222:3333:4444:5555:6666:255.255.255.255
        //
        // The minimum length of a string representing an IPv6 is the length of:
        //
        //      ::
        //
        if bytes.len() > 45 || bytes.len() < 2 {
            return Err(MalformedAddress(s.into()));
        }

        let mut offset = 0;
        let mut ellipsis: Option<usize> = None;

        // Handle the special case where the IP starts with "::"
        if bytes[0] == b':' {
            if bytes[1] == b':' {
                if bytes.len() == 2 {
                    return Ok(Ipv6Address(0));
                }
                ellipsis = Some(0);
                offset += 2;
            } else {
                // An IPv6 cannot start with a single colon. It must be a double colon.
                // So this is an invalid address
                return Err(MalformedAddress(s.into()));
            }
        }

        // When dealing with IPv6, it's easier to reason in terms of "hextets" instead of octets.
        // An IPv6 is 8 hextets. At the end, we'll convert that array into an u128.
        let mut address: [u16; 8] = [0; 8];

        // Keep track of the number of hextets we process
        let mut hextet_index = 0;

        loop {
            if offset == bytes.len() {
                break;
            }

            // Try to read a hextet
            let (bytes_read, hextet) = read_hextet(&bytes[offset..]);

            // Handle the case where we could not read a hextet
            if bytes_read == 0 {
                match bytes[offset] {
                    // We could not read a hextet because the first character in the slice was ":"
                    // This may be because we have two consecutive colons.
                    b':' => {
                        // Check if already saw an ellipsis. If so, fail parsing, because an IPv6
                        // can only have one ellipsis.
                        if ellipsis.is_some() {
                            return Err(MalformedAddress(s.into()));
                        }
                        // Otherwise, remember the position of the ellipsis. We'll need that later
                        // to count the number of zeros the ellipsis represents.
                        ellipsis = Some(hextet_index);
                        offset += 1;
                        // Continue and try to read the next hextet
                        continue;
                    }
                    // We know the first character does not represent a hexadecimal digit
                    // (otherwise read_hextet() would have read at least one character), and that
                    // it's not ":", so the string does not represent an IPv6 address
                    _ => return Err(MalformedAddress(s.into())),
                }
            }

            // At this point, we know we read an hextet.

            address[hextet_index] = hextet;
            offset += bytes_read;
            hextet_index += 1;

            // If this was the last hextet or if we reached the end of the buffer, we should be
            // done
            if hextet_index == 8 || offset == bytes.len() {
                break
            }

            // Read the next character. After a hextet, we usually expect a colon, but there's a special
            // case for IPv6 that ends with an IPv4.
            match bytes[offset] {
                // We saw the colon, we can continue
                b':' => offset += 1,
                // Handle the special IPv4 case, i.e. an address like the one below. Note that the
                // hextet we just read is part of that IPv4 address:
                //
                // aaaa:bbbb:cccc:dddd:eeee:ffff:a.b.c.d.
                //                               ^^
                //                               ||
                // hextet we just read, that  ---+|
                // is actually the first byte of  +--- dot we're handling
                // the ipv4.
                b'.' => {
                    // The hextet was actually part of the IPv4, so note that we start reading the
                    // IPv4 at `offset - bytes_read`.
                    let ipv4: u32 = Ipv4Address::parse(&bytes[offset-bytes_read..])?.into();
                    // Replace the hextet we just read by the 16 most significant bits of the
                    // IPv4 address (a.b in the comment above)
                    address[hextet_index - 1] = ((ipv4 & 0xffff_0000) >> 16) as u16;
                    // Set the last hextet to the 16 least significant bits of the IPv4 address
                    // (c.d in the comment above)
                    address[hextet_index] = (ipv4 & 0x0000_ffff) as u16;
                    hextet_index += 1;
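                    // After successfully parsing an IPv4, we should be done.
                    // `Ipv4Address::parse` receives the whole remainder of the buffer, so it is
                    // assumed to fail on trailing bytes; on success the buffer is fully consumed.
                    // If we didn't read enough hextets, we'll fail later.
                    offset = bytes.len();
                    break;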
                }
                _ => return Err(MalformedAddress(s.into())),
            }
        } // end of loop

        // If we exited the loop, we should have reached the end of the buffer.
        // If there are trailing characters, parsing should fail.
        if offset < bytes.len() {
            return Err(MalformedAddress(s.into()));
        }

        if hextet_index == 8 && ellipsis.is_some() {
            // We parsed an address that looks like 1111:2222::3333:4444:5555:6666:7777,
            // ie with an empty ellipsis.
            return Err(MalformedAddress(s.into()));
        }

        // We didn't parse enough hextets, but this may be due to an ellipsis
        if hextet_index < 8 {
            if let Some(ellipsis_index) = ellipsis {
                // Count how many zeros the ellipsis accounts for
                let nb_zeros = 8 - hextet_index;
                // Shift the hextet that we read after the ellipsis by the number of zeros
                for index in (ellipsis_index..hextet_index).rev() {
                    address[index+nb_zeros] = address[index];
                    address[index] = 0;
                }
            } else {
                return Err(MalformedAddress(s.into()));
            }
        }

        // Build the IPv6 address from the array of hextets
        return Ok(Ipv6Address(
                ((address[0] as u128) << 112)
                + ((address[1] as u128) << 96)
                + ((address[2] as u128) << 80)
                + ((address[3] as u128) << 64)
                + ((address[4] as u128) << 48)
                + ((address[5] as u128) << 32)
                + ((address[6] as u128) << 16)
                + address[7] as u128))
    }
}
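
The ellipsis expansion near the end of from_str can be looked at in isolation. The following is a sketch of the same step written with slice::copy_within and slice::fill instead of the manual reverse loop; the function name expand_ellipsis and its parameters are assumptions made for this example, and it expects ellipsis <= count <= 8.

```rust
/// Sketch of the ellipsis expansion. `hextets` holds the `count` groups that
/// were read, packed at the front; `ellipsis` is the index where "::" was seen.
fn expand_ellipsis(hextets: &mut [u16; 8], ellipsis: usize, count: usize) {
    // Number of zero groups the "::" stands for.
    let zeros = 8 - count;
    // Move the groups read after "::" to the end (safe for overlapping ranges).
    hextets.copy_within(ellipsis..count, ellipsis + zeros);
    // Clear the gap that the "::" represents.
    hextets[ellipsis..ellipsis + zeros].fill(0);
}
```

For fe80::8657:e6fe:8d5:5325 the parser has read five groups with the ellipsis at index 1, so three zero groups are inserted after fe80.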

Here are the helpers I use:

/// Check whether an ASCII character represents a hexadecimal digit
fn is_hex_digit(byte: u8) -> bool {
    match byte {
        b'0'..=b'9' | b'a'..=b'f' | b'A'..=b'F' => true,
        _ => false,
    }
}

/// Convert an ASCII character that represents a hexadecimal digit into this digit
fn hex_to_digit(byte: u8) -> u8 {
    match byte {
        b'0'..=b'9' => byte - b'0',
        b'a'..=b'f' => byte - b'a' + 10,
        b'A'..=b'F' => byte - b'A' + 10,
        _ => unreachable!(),
    }
}

/// Read up to four ASCII characters that represent hexadecimal digits, and return their value, as
/// well as the number of characters that were read. If no character is read, `(0, 0)` is
/// returned.
fn read_hextet(bytes: &[u8]) -> (usize, u16) {
    let mut count = 0;
    let mut digits: [u8; 4] = [0; 4];

    for b in bytes {
        if is_hex_digit(*b) {
            digits[count] = hex_to_digit(*b);
            count += 1;
            if count == 4 {
                break;
            }
        } else {
            break;
        }
    }

    if count == 0 {
        return (0, 0);
    }

    let mut shift = (count - 1) * 4;
    let mut res = 0;
    for digit in &digits[0..count] {
        res += (*digit as u16) << shift;
        if shift >= 4 {
            shift -= 4;
        } else {
            break;
        }
    }

    (count, res)
}
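
As a possible simplification of read_hextet, the value can be accumulated while scanning, which removes the intermediate digits array and the second loop. This is only a sketch under the same contract (up to four hex digits, returning the count and the value); the name read_hextet_fold is hypothetical, and it uses char::to_digit from the standard library.

```rust
/// Sketch: read up to four hex digits and accumulate their value on the fly.
/// Returns `(0, 0)` if the slice does not start with a hex digit.
fn read_hextet_fold(bytes: &[u8]) -> (usize, u16) {
    let mut count = 0;
    let mut value: u16 = 0;
    for &b in bytes.iter().take(4) {
        // Bytes >= 0x80 map to non-hex chars, so they stop the scan as well.
        match (b as char).to_digit(16) {
            Some(digit) => {
                value = (value << 4) | digit as u16;
                count += 1;
            }
            None => break,
        }
    }
    (count, value)
}
```

Shifting the accumulator left by four bits before adding each digit gives the same result as computing the shift per digit up front.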

I don't handle IPv4 parsing for now, so I just use the following:

#[derive(Debug, Copy, Eq, PartialEq, Hash, Clone)]
pub struct Ipv4Address(u32);

impl Ipv4Address {
    fn parse(_: &[u8]) -> Result<u32, MalformedAddress> {
        unimplemented!();
    }
}
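
For completeness, a dotted-quad parser over bytes could look roughly like the sketch below. It is not the author's implementation: the free function parse_ipv4 is hypothetical, it returns Option<u32> instead of Result<u32, MalformedAddress> to stay self-contained, and it accepts leading zeros such as 01, which a stricter parser might reject.

```rust
/// Sketch: parse "a.b.c.d" from ASCII bytes into a big-endian u32.
/// Returns None on anything other than exactly four decimal parts in 0..=255.
fn parse_ipv4(bytes: &[u8]) -> Option<u32> {
    let mut result: u32 = 0;
    let mut parts = 0;
    for part in bytes.split(|&b| b == b'.') {
        // Reject empty parts, parts longer than three digits, and a fifth part.
        if part.is_empty() || part.len() > 3 || parts == 4 {
            return None;
        }
        let mut value: u32 = 0;
        for &b in part {
            if !b.is_ascii_digit() {
                return None;
            }
            value = value * 10 + u32::from(b - b'0');
        }
        if value > 255 {
            return None;
        }
        result = (result << 8) | value;
        parts += 1;
    }
    if parts == 4 { Some(result) } else { None }
}
```

Because the function consumes the whole slice and fails on trailing bytes, a caller can treat a successful parse as having reached the end of the input.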

Finally, here is the error type I use:

use std::fmt;
use std::error::Error;

#[derive(Debug)]
pub struct MalformedAddress(String);

impl fmt::Display for MalformedAddress {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "malformed address: \"{}\"", self.0)
    }
}

impl Error for MalformedAddress {
    fn description(&self) -> &str {
        "the string cannot be parsed as an IP address"
    }

    fn cause(&self) -> Option<&dyn Error> {
        None
    }
}
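
On recent Rust versions both description and cause are deprecated and have default implementations, so the error type can probably be reduced to a Display impl plus an empty Error impl. The sketch below shows that shape; the reject function is a hypothetical caller added only to show the conversion into Box<dyn Error>.

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug)]
pub struct MalformedAddress(String);

impl fmt::Display for MalformedAddress {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "malformed address: \"{}\"", self.0)
    }
}

// All methods of `Error` have defaults, so an empty impl is enough.
impl Error for MalformedAddress {}

/// Hypothetical caller: the error converts into `Box<dyn Error>` via `into()`.
fn reject(input: &str) -> Result<(), Box<dyn Error>> {
    Err(MalformedAddress(input.to_string()).into())
}
```

The message seen by users comes from Display, so nothing is lost by dropping description.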

1 Answer

Code Review user

Answered 2023-02-01 01:13:37

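This is not a complete review; I only took a quick look and noticed one part that looks more like C than Rust.

The is_hex_digit and hex_to_digit functions are almost identical and can be merged into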

fn hex_digit(byte: u8) -> Option<u8> {
    match byte {
        b'0'..=b'9' => Some(byte - b'0'),
        b'a'..=b'f' => Some(byte - b'a' + 10),
        b'A'..=b'F' => Some(byte - b'A' + 10),
        _ => None,
    }
}

Then this part

   if is_hex_digit(*b) {
        digits[count] = hex_to_digit(*b);
        count += 1;
        if count == 4 {
            break;
        }
    } else {
        break;
    }

can be written as:

   if let Some(digit) = hex_digit(*b) {
        digits[count] = digit;
        count += 1;
        if count == 4 {
            break;
        }
   } else {
        break;
   }

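The second thing that comes to mind is that there are no unit tests. This is a perfect case where it makes sense to add unit tests that cover parsing of the general case and of all the corner cases.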
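
A table-driven layout is one way to cover those cases. The sketch below is generic over any FromStr implementation so the same table could drive the Ipv6Address type from the question; the constant names, the case list and the function all_cases_pass are assumptions for illustration, and the list is by no means exhaustive.

```rust
use std::str::FromStr;

/// Inputs that a conforming parser should accept.
const VALID: &[&str] = &[
    "::",
    "::1",
    "ffff::",
    "fe80::8657:e6fe:8d5:5325",
    "fe80:0000:0000:0000:8657:e6fe:08d5:5325",
    "1111:2222:3333:4444:5555:6666:7777:8888",
];

/// Inputs that a conforming parser should reject.
const INVALID: &[&str] = &[
    "",
    ":",
    ":1",
    ":::",
    "g::",
    "1:2:3",
    "1::2::3",
    "12345::",
    "1111:2222:3333:4444:5555:6666:7777:8888:9999",
];

/// True if the parser `T` accepts every VALID input and rejects every INVALID one.
fn all_cases_pass<T: FromStr>() -> bool {
    VALID.iter().all(|s| s.parse::<T>().is_ok())
        && INVALID.iter().all(|s| s.parse::<T>().is_err())
}
```

Calling all_cases_pass::<std::net::Ipv6Addr>() exercises the table against the standard library, and all_cases_pass::<Ipv6Address>() would do the same for the parser under review.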

An explicit return is not needed for the last statement of a block.

  // Build the IPv6 address from the array of hextets
    return Ok(Ipv6Address(
            ((address[0] as u128) << 112)
            + ((address[1] as u128) << 96)
            + ((address[2] as u128) << 80)
            + ((address[3] as u128) << 64)
            + ((address[4] as u128) << 48)
            + ((address[5] as u128) << 32)
            + ((address[6] as u128) << 16)
            + address[7] as u128))

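should be replaced with a final expression that has no return keyword, such as the shorter Ok(Ipv6Address(ipv6)).

The address conversion could be done with let ipv6 = unsafe { std::mem::transmute::<[u16; 8], u128>(address) }, but the result depends on the target system (for example, its byte order), and it is unsafe for a reason. Some variation of into_bytes / from_bytes is also possible here. However, I would at least replace the addition with bitwise or, which looks cleaner.

    // Build the IPv6 address from the array of hextets
    let ipv6: u128 = ((address[0] as u128) << 112)
        | ((address[1] as u128) << 96)
        | ((address[2] as u128) << 80)
        | ((address[3] as u128) << 64)
        | ((address[4] as u128) << 48)
        | ((address[5] as u128) << 32)
        | ((address[6] as u128) << 16)
        | (address[7] as u128);
    Ok(Ipv6Address(ipv6))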
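
A safe alternative to both transmute and the hand-written shifts is to fold over the array, which also avoids typing each shift amount by hand. This is a sketch rather than a drop-in replacement: the function names are hypothetical, and the second variant goes through std::net::Ipv6Addr purely for comparison.

```rust
use std::net::Ipv6Addr;

/// Sketch: combine eight hextets, most significant first, into a u128.
fn hextets_to_u128(hextets: [u16; 8]) -> u128 {
    hextets
        .iter()
        .fold(0u128, |acc, &h| (acc << 16) | u128::from(h))
}

/// The same conversion routed through the standard library, for comparison.
fn hextets_to_u128_std(hextets: [u16; 8]) -> u128 {
    u128::from(Ipv6Addr::from(hextets))
}
```

Both variants are independent of the target's byte order, unlike transmute.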
Votes: 4
Original content provided by Code Review.
Original link:

https://codereview.stackexchange.com/questions/196996
