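Below is code that parses IPv6 addresses. An IPv6 address is 128 bits long. In its printable form, its hextets (1 hextet == 16 bits) are written as hexadecimal numbers separated by colons. For example:
fe80:0000:0000:0000:8657:e6fe:08d5:5325
Note that the leading 0s of each hextet can be omitted. The following is the same address:
fe80:0:0:0:8657:e6fe:8d5:5325
Finally, if several consecutive hextets have the value 0, they can be omitted and replaced by :: (only once per address). The following is again the same address:
fe80::8657:e6fe:8d5:5325
The :: can appear anywhere, not just in the middle. For example, these are valid IPv6 addresses:
::1
ffff::
The empty address can be represented as ::.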
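To make the equivalence of these spellings concrete, here is a small sketch that leans on the standard library's std::net::Ipv6Addr parser instead of the code under review. The helper names to_u128 and same_address are made up for this illustration.

```rust
use std::net::Ipv6Addr;

/// Parse a textual IPv6 address with the standard library and return it as a u128.
/// Returns None when the text is not a valid address.
fn to_u128(text: &str) -> Option<u128> {
    text.parse::<Ipv6Addr>().ok().map(u128::from)
}

/// True when every spelling in `forms` is valid and they all denote the same address.
fn same_address(forms: &[&str]) -> bool {
    let mut values = forms.iter().map(|form| to_u128(form));
    match values.next() {
        // Compare every remaining spelling against the first one.
        Some(Some(first)) => values.all(|value| value == Some(first)),
        // Empty input, or an unparsable first spelling.
        _ => false,
    }
}
```

Called with the three spellings of the fe80 address above, same_address should return true, and to_u128("::") should give Some(0).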
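Finally, there is a special kind of IPv6 address that provides compatibility with IPv4. The last 32 bits of these addresses represent an IPv4 address, written as follows:
1111:2222:3333:4444:5555:6666:1.2.3.4
The IPv4 part must be at the end of the address for the address to be valid.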
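As a rough sketch of how the dotted part maps onto the last two hextets: the four octets form one big-endian 32-bit value, whose upper half becomes hextet 7 and lower half hextet 8. Both functions below (parse_ipv4 and ipv4_to_hextets) are hypothetical helpers written for this example, not part of the code under review.

```rust
/// Parse dotted-quad text such as "1.2.3.4" into a big-endian u32.
/// Hypothetical helper: returns None on any malformed input.
fn parse_ipv4(text: &str) -> Option<u32> {
    let mut value: u32 = 0;
    let mut count = 0;
    for part in text.split('.') {
        // More than four parts can never be a valid IPv4 address.
        if count == 4 {
            return None;
        }
        // Reject empty parts, overlong parts, signs and other non-digits.
        if part.is_empty() || part.len() > 3 || !part.bytes().all(|b| b.is_ascii_digit()) {
            return None;
        }
        // Parsing into u8 fails for values above 255.
        let octet: u8 = part.parse().ok()?;
        value = (value << 8) | u32::from(octet);
        count += 1;
    }
    if count == 4 {
        Some(value)
    } else {
        None
    }
}

/// Split a 32-bit IPv4 value into the two hextets that end an IPv6 address.
fn ipv4_to_hextets(ipv4: u32) -> [u16; 2] {
    [(ipv4 >> 16) as u16, (ipv4 & 0xffff) as u16]
}
```

For 1.2.3.4 this gives the hextets 0x0102 and 0x0304, so the example address above ends in ...:6666:0102:0304.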
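My code is inspired by the ParseIPv6 function of the Go standard library.
The code is a bit long, so I also posted it as a gist (which contains some tests).
I would like to know whether:
After this long introduction, here is the code: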
use std::str::FromStr;

#[derive(Debug, Copy, Eq, PartialEq, Hash, Clone)]
pub struct Ipv6Address(u128);

impl FromStr for Ipv6Address {
    type Err = MalformedAddress;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // We'll manipulate bytes instead of UTF-8 characters, because the characters that
        // represent an IPv6 address are supposed to be ASCII characters.
        let bytes = s.as_bytes();

        // The maximum length of a string representing an IPv6 is the length of an address
        // that ends with an IPv4 (45 bytes):
        //
        //      1111:2222:3333:4444:5555:6666:255.255.255.255
        //
        // Without an IPv4 suffix, the longest form is 39 bytes:
        //
        //      1111:2222:3333:4444:5555:6666:7777:8888
        //
        // The minimum length of a string representing an IPv6 is the length of:
        //
        //      ::
        //
        if bytes.len() > 45 || bytes.len() < 2 {
            return Err(MalformedAddress(s.into()));
        }

        let mut offset = 0;
        let mut ellipsis: Option<usize> = None;

        // Handle the special case where the IP starts with "::"
        if bytes[0] == b':' {
            if bytes[1] == b':' {
                if bytes.len() == 2 {
                    return Ok(Ipv6Address(0));
                }
                ellipsis = Some(0);
                offset += 2;
            } else {
                // An IPv6 cannot start with a single colon. It must be a double colon.
                // So this is an invalid address
                return Err(MalformedAddress(s.into()));
            }
        }

        // When dealing with IPv6, it's easier to reason in terms of "hextets" instead of octets.
        // An IPv6 is 8 hextets. At the end, we'll convert that array into a u128.
        let mut address: [u16; 8] = [0; 8];

        // Keep track of the number of hextets we process
        let mut hextet_index = 0;

        loop {
            if offset == bytes.len() {
                break;
            }

            // Try to read a hextet
            let (bytes_read, hextet) = read_hextet(&bytes[offset..]);

            // Handle the case where we could not read a hextet
            if bytes_read == 0 {
                match bytes[offset] {
                    // We could not read a hextet because the first character in the slice was ":"
                    // This may be because we have two consecutive colons.
                    b':' => {
                        // Check if we already saw an ellipsis. If so, fail parsing, because an
                        // IPv6 can only have one ellipsis.
                        if ellipsis.is_some() {
                            return Err(MalformedAddress(s.into()));
                        }
                        // Otherwise, remember the position of the ellipsis. We'll need that later
                        // to count the number of zeros the ellipsis represents.
                        ellipsis = Some(hextet_index);
                        offset += 1;
                        // Continue and try to read the next hextet
                        continue;
                    }
                    // We know the first character does not represent a hexadecimal digit
                    // (otherwise read_hextet() would have read at least one character), and that
                    // it's not ":", so the string does not represent an IPv6 address
                    _ => return Err(MalformedAddress(s.into())),
                }
            }

            // At this point, we know we read a hextet.
            address[hextet_index] = hextet;
            offset += bytes_read;
            hextet_index += 1;

            // If this was the last hextet or if we reached the end of the buffer, we should be
            // done
            if hextet_index == 8 || offset == bytes.len() {
                break;
            }

            // Read the next character. After a hextet, we usually expect a colon, but there's a
            // special case for IPv6 that ends with an IPv4.
            match bytes[offset] {
                // We saw the colon, we can continue
                b':' => offset += 1,
                // Handle the special IPv4 case, i.e. addresses like the one below. Note that the
                // hextet we just read is part of that IPv4 address:
                //
                // aaaa:bbbb:cccc:dddd:eeee:ffff:a.b.c.d
                //                               ^^
                //                               ||
                //  hextet we just read, that ---+|
                //  is actually the first byte of +--- dot we're handling
                //  the ipv4.
                b'.' => {
                    // The hextet was actually part of the IPv4, so note that we start reading
                    // the IPv4 at `offset - bytes_read`.
                    let ipv4: u32 = Ipv4Address::parse(&bytes[offset - bytes_read..])?.into();
                    // Replace the hextet we just read by the 16 most significant bits of the
                    // IPv4 address (a.b in the comment above)
                    address[hextet_index - 1] = ((ipv4 & 0xffff_0000) >> 16) as u16;
                    // Set the last hextet to the 16 least significant bits of the IPv4 address
                    // (c.d in the comment above)
                    address[hextet_index] = (ipv4 & 0x0000_ffff) as u16;
                    hextet_index += 1;
                    // After successfully parsing an IPv4, we should be done.
                    // Ipv4Address::parse was given the whole remainder of the buffer, so mark it
                    // as consumed; otherwise the trailing-bytes check below always fails.
                    // If we didn't read enough hextets, we'll fail later.
                    offset = bytes.len();
                    break;
                }
                _ => return Err(MalformedAddress(s.into())),
            }
        } // end of loop

        // If we exited the loop, we should have reached the end of the buffer.
        // If there are trailing characters, parsing should fail.
        if offset < bytes.len() {
            return Err(MalformedAddress(s.into()));
        }

        if hextet_index == 8 && ellipsis.is_some() {
            // We parsed an address that looks like 1111:2222::3333:4444:5555:6666:7777:8888,
            // i.e. with an empty ellipsis.
            return Err(MalformedAddress(s.into()));
        }

        // We didn't parse enough hextets, but this may be due to an ellipsis
        if hextet_index < 8 {
            if let Some(ellipsis_index) = ellipsis {
                // Count how many zeros the ellipsis accounts for
                let nb_zeros = 8 - hextet_index;
                // Shift the hextets that we read after the ellipsis by the number of zeros
                for index in (ellipsis_index..hextet_index).rev() {
                    address[index + nb_zeros] = address[index];
                    address[index] = 0;
                }
            } else {
                return Err(MalformedAddress(s.into()));
            }
        }

        // Build the IPv6 address from the array of hextets
        return Ok(Ipv6Address(
            ((address[0] as u128) << 112)
                + ((address[1] as u128) << 96)
                + ((address[2] as u128) << 80)
                + ((address[3] as u128) << 64)
                + ((address[4] as u128) << 48)
                + ((address[5] as u128) << 32)
                + ((address[6] as u128) << 16)
                + address[7] as u128));
    }
}
Here are the helpers I use:
/// Check whether an ASCII character represents a hexadecimal digit
fn is_hex_digit(byte: u8) -> bool {
    match byte {
        b'0'..=b'9' | b'a'..=b'f' | b'A'..=b'F' => true,
        _ => false,
    }
}

/// Convert an ASCII character that represents a hexadecimal digit into this digit
fn hex_to_digit(byte: u8) -> u8 {
    match byte {
        b'0'..=b'9' => byte - b'0',
        b'a'..=b'f' => byte - b'a' + 10,
        b'A'..=b'F' => byte - b'A' + 10,
        _ => unreachable!(),
    }
}

/// Read up to four ASCII characters that represent hexadecimal digits, and return their value, as
/// well as the number of characters that were read. If no character is read, `(0, 0)` is
/// returned.
fn read_hextet(bytes: &[u8]) -> (usize, u16) {
    let mut count = 0;
    let mut digits: [u8; 4] = [0; 4];

    for b in bytes {
        if is_hex_digit(*b) {
            digits[count] = hex_to_digit(*b);
            count += 1;
            if count == 4 {
                break;
            }
        } else {
            break;
        }
    }

    if count == 0 {
        return (0, 0);
    }

    // The first digit read is the most significant one.
    let mut shift = (count - 1) * 4;
    let mut res = 0;
    for digit in &digits[0..count] {
        res += (*digit as u16) << shift;
        if shift >= 4 {
            shift -= 4;
        } else {
            break;
        }
    }

    (count, res)
}
I don't handle IPv4 parsing yet, so for now I just use the following:
#[derive(Debug, Copy, Eq, PartialEq, Hash, Clone)]
pub struct Ipv4Address(u32);

impl Ipv4Address {
    fn parse(_: &[u8]) -> Result<u32, MalformedAddress> {
        unimplemented!();
    }
}
Finally, here is the error type I use:
use std::fmt;
use std::error::Error;

#[derive(Debug)]
pub struct MalformedAddress(String);

impl fmt::Display for MalformedAddress {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "malformed address: \"{}\"", self.0)
    }
}

impl Error for MalformedAddress {
    fn description(&self) -> &str {
        "the string cannot be parsed as an IP address"
    }

    // `dyn` is required for trait objects in current Rust editions.
    fn cause(&self) -> Option<&dyn Error> {
        None
    }
}
Posted on 2023-02-01 01:13:37
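This is not a complete review; I only took a quick look and noticed one part that looks more like C than Rust.
The is_hex_digit and hex_to_digit functions are almost identical and can be merged into: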
fn hex_digit(byte: u8) -> Option<u8> {
    match byte {
        b'0'..=b'9' => Some(byte - b'0'),
        b'a'..=b'f' => Some(byte - b'a' + 10),
        b'A'..=b'F' => Some(byte - b'A' + 10),
        _ => None,
    }
}
Then this part:
if is_hex_digit(*b) {
    digits[count] = hex_to_digit(*b);
    count += 1;
    if count == 4 {
        break;
    }
} else {
    break;
}
can be written as:
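if let Some(digit) = hex_digit(*b) {
    digits[count] = digit;
    count += 1;
    if count == 4 {
        break;
    }
} else {
    break;
}
The second thing that comes to mind is that there are no unit tests. This is a perfect case where it makes sense to add unit tests that cover both the general case and all the corner cases.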
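A minimal sketch of what such table-driven tests could look like. To keep the block compilable on its own, parse here is a hypothetical stand-in that delegates to std::net::Ipv6Addr; in the real project it would call the Ipv6Address::from_str under review, and the tables would grow with more corner cases.

```rust
use std::net::Ipv6Addr;

/// Stand-in for the parser under test (assumption: the real tests would call
/// `Ipv6Address::from_str` instead of the standard library).
fn parse(text: &str) -> Option<u128> {
    text.parse::<Ipv6Addr>().ok().map(u128::from)
}

/// Inputs that must parse, with the expected 128-bit value.
const VALID: &[(&str, u128)] = &[
    ("::", 0),
    ("::1", 1),
    ("ffff::", 0xffff << 112),
    ("fe80::8657:e6fe:8d5:5325", 0xfe80_0000_0000_0000_8657_e6fe_08d5_5325),
    ("1111:2222:3333:4444:5555:6666:7777:8888", 0x1111_2222_3333_4444_5555_6666_7777_8888),
];

/// Inputs that must be rejected.
const INVALID: &[&str] = &[
    "",                  // empty string
    ":",                 // single colon
    ":1",                // starts with a single colon
    "1::2::3",           // two ellipses
    "1:2:3:4:5:6:7",     // too few hextets, no ellipsis
    "1:2:3:4:5:6:7:8:9", // too many hextets
    "12345::",           // hextet longer than four digits
    "g::",               // not a hexadecimal digit
];

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn accepts_valid_addresses() {
        for (text, expected) in VALID {
            assert_eq!(parse(text), Some(*expected), "input: {:?}", text);
        }
    }

    #[test]
    fn rejects_malformed_addresses() {
        for text in INVALID {
            assert_eq!(parse(text), None, "input: {:?}", text);
        }
    }
}
```

Keeping the cases in tables makes it cheap to add a new corner case each time a parsing bug is found.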
An explicit return is not needed for the last expression in a block.
// Build the IPv6 address from the array of hextets
return Ok(Ipv6Address(
    ((address[0] as u128) << 112)
        + ((address[1] as u128) << 96)
        + ((address[2] as u128) << 80)
        + ((address[3] as u128) << 64)
        + ((address[4] as u128) << 48)
        + ((address[5] as u128) << 32)
        + ((address[6] as u128) << 16)
        + address[7] as u128))
should be replaced with a shorter single expression such as Ok(Ipv6Address(ipv6)).
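The conversion could be done with let ipv6 = unsafe { std::mem::transmute::<[u16; 8], u128>(address) }; but the result then depends on the byte order of the target system, and it is unsafe for a reason. Some variation of the byte-conversion methods (such as u16::to_be_bytes and u128::from_be_bytes) is also possible here. However, I would at least replace the addition with bitwise OR, which looks cleaner.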
// Build the IPv6 address from the array of hextets
let ipv6: u128 = ((address[0] as u128) << 112)
    | ((address[1] as u128) << 96)
    | ((address[2] as u128) << 80)
    | ((address[3] as u128) << 64)
    | ((address[4] as u128) << 48)
    | ((address[5] as u128) << 32)
    | ((address[6] as u128) << 16)
    | (address[7] as u128);
Ok(Ipv6Address(ipv6))
https://codereview.stackexchange.com/questions/196996
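The byte-conversion idea mentioned in the answer can be sketched without unsafe code, and without writing eight shift amounts by hand. The function names hextets_to_u128 and hextets_to_u128_fold are made up for this example; to_be_bytes and from_be_bytes are the standard library methods.

```rust
/// Pack eight hextets (most significant first) into a u128 by going through
/// big-endian bytes, so the result does not depend on the target's byte order.
fn hextets_to_u128(hextets: [u16; 8]) -> u128 {
    let mut bytes = [0u8; 16];
    for (chunk, hextet) in bytes.chunks_exact_mut(2).zip(hextets.iter()) {
        chunk.copy_from_slice(&hextet.to_be_bytes());
    }
    u128::from_be_bytes(bytes)
}

/// Same result with a fold: shift the accumulator left by one hextet each
/// step and OR the next hextet into the low 16 bits.
fn hextets_to_u128_fold(hextets: [u16; 8]) -> u128 {
    hextets
        .iter()
        .fold(0u128, |acc, &hextet| (acc << 16) | u128::from(hextet))
}
```

Either version could replace the hand-written expression, for example Ok(Ipv6Address(hextets_to_u128(address))).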