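I'm trying to write a simple bash script to query a LibreOffice thesaurus extension as a text file. For each query string I enter, I want the output to be all of its related strings.
To download and extract the thesaurus, I run: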
wget "https://extensions.libreoffice.org/assets/downloads/41/1653961771/dict-en-20220601_lo.oxt" # download LO dictionary & thesaurus
unzip -p dict-en-20220601_lo.oxt th_en_US_v2.dat > lo # extract contents of thesaurus to text file
Here is a section of the text file:
nine|3
(adj)|9|ix|cardinal (similar term)
(noun)|9|IX|niner|Nina from Carolina|ennead|digit (generic term)|figure (generic term)
(noun)|baseball club|ball club|club|baseball team (generic term)
nine-banded armadillo|1
(noun)|peba|Texas armadillo|Dasypus novemcinctus|armadillo (generic term)
nine-fold|1
(adj)|nonuple|ninefold|multiple (similar term)
nine-membered|1
(adj)|9-membered|membered (similar term)
nine-sided|1
(adj)|multilateral (similar term)|many-sided (similar term)
nine-spot|1
(noun)|spot (generic term)
So, for example, I'd like to be able to enter "nine" as a query and have bash return something like:
9
ix
cardinal
9
IX
niner
Nina from Carolina
ennead
digit
figure
baseball club
ball club
club
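baseball team
I think this should be fairly easy with the right awk or sed syntax, especially since none of the lines containing a query term start with "(", while every line containing related terms does.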
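The sample also has a second structural cue. Each headword line ends in a number (3 for nine, 1 for nine-banded armadillo), and that number matches how many "(" lines follow it. Below is a minimal awk sketch that relies on that count instead of the "(" prefix. It assumes every headword line in the file has the same "word|N" shape as the sample. The function name related_terms is hypothetical.

```shell
# Hypothetical helper: related_terms WORD FILE
# Prints the related terms for WORD, one per line.
# Assumes each headword line looks like "word|N" and is followed by exactly
# N meaning lines of the form "(pos)|term|term|...", as in the sample.
related_terms() {
  awk -F'|' -v key="$1" '
    # Headword line for WORD: remember how many meaning lines follow
    $1 == key && $2 ~ /^[0-9]+$/ { n = $2; next }
    # Inside the entry: print every term after the "(pos)" field
    n > 0 {
      for (i = 2; i <= NF; i++) {
        term = $i
        sub(/ \(.*$/, "", term)   # drop annotations such as " (generic term)"
        print term
      }
      n--
    }
  ' "$2"
}

# Example (file name "lo" as in the question):
#   related_terms nine lo
```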
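But I'm still a novice and haven't figured it out. As I see it, the crux of the problem is getting the query term and all of its related terms onto one line. From there, I know how to sed my way to victory, but getting to that point is proving challenging for me.
Thanks in advance for your help!
P.S. This thread does something similar, but my case is a bit different and I don't know the syntax well enough to adapt it to my needs: https://www.unix.com/unix-for-dummies-questions-and-answers/184649-sed-join-lines-do-not-match-pattern.html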
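The line-joining technique from that thread can be adapted to this file. The minimal sed sketch below assumes that every related-terms line starts with "(" and that no headword line does. It appends each "(" line to the line before it with a "|" separator, which puts each thesaurus entry on a single line. The function name join_entries is hypothetical.

```shell
# Hypothetical helper: print each thesaurus entry on a single line.
#   :a          loop label
#   $!N         append the next input line (unless this is the last line)
#   s/\n(/|(/   if the appended line starts with "(", join it with "|"
#   ta          ...and, if that join happened, loop to fetch another line
#   P; D        otherwise print the first line and restart with the rest
join_entries() {
  sed -e ':a' -e '$!N' -e 's/\n(/|(/' -e 'ta' -e 'P' -e 'D' "$1"
}

# Example: one term per line for "nine" (file name "lo" as in the question)
#   join_entries lo | grep '^nine|' |
#     sed 's/^[^|]*|[^|]*|//; s/ ([^)]*)//g; s/([^)]*)|//g; y/|/\n/'
```

The follow-up sed in the example strips the "nine|3|" prefix, then the " (...)" annotations, then the "(pos)|" tags. Last, it turns the remaining separators into newlines. The annotations are removed before the tags so that the "(...)|" pattern cannot swallow the text between an annotation and the next tag.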
Posted on 2022-07-08 00:15:03
This might work for you (GNU sed):
v=nine
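sed -n ':a;/^'"${v}"'|/{:b;n;/^[^(]/ba;s/^[^|]*|\| ([^)]*)//g;y/|/\n/;p;bb}' file
Focus on any lines that follow a match of the input variable.
Fetch the next line, and if it does not begin with (, repeat the above.
Otherwise, remove the first field and any values in parentheses, translate the field separator | into newlines, print the result, and repeat.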
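To see the substitution and translation steps in isolation, here they are applied to one meaning line from the question's sample. The function name strip_meaning is hypothetical. GNU sed is assumed, because using \| as alternation inside a basic regex is a GNU extension.

```shell
# Hypothetical helper: apply only the s/// and y/// steps of the answer.
# In GNU sed's basic regex, "|" is literal and "\|" means alternation, so the
# pattern matches either the first field plus its "|" or a " (...)" annotation.
strip_meaning() {
  sed 's/^[^|]*|\| ([^)]*)//g; y/|/\n/'
}

printf '%s\n' '(adj)|9|ix|cardinal (similar term)' | strip_meaning
# prints 9, ix and cardinal, one per line
```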
v=nine # set variable v to `nine`
sed -n ':a # turn off implicit printing and set goto label a
/^'"${v}"'|/{ # match a line beginning with variable v
:b # set goto label b
n # fetch next line (do not print see option -n)
/^[^(]/ba # goto label a if line does not begin with (
s/^[^|]*|\| ([^)]*)//g # remove first field and any parenthesized values
y/|/\n/ # translate | to newline for entire line
p # print the result
bb # goto label b
}' file
To see what the sed script does, invoke it with the --debug option.
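The question asks for a command that accepts any query string, so the one-liner can be wrapped in a shell function. This is a minimal sketch. The function name thes and the default file name lo (taken from the question) are assumptions. The query is pasted into the regex unescaped, so characters such as . or * keep their regex meaning.

```shell
# Hypothetical wrapper around the GNU sed one-liner above.
# Usage: thes WORD [FILE]   (FILE defaults to "lo", as in the question)
# WORD is inserted into the regex as-is; regex metacharacters are not escaped.
thes() {
  local v=$1 file=${2:-lo}
  sed -n ':a;/^'"$v"'|/{:b;n;/^[^(]/ba;s/^[^|]*|\| ([^)]*)//g;y/|/\n/;p;bb}' "$file"
}

# Example:
#   thes nine
```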
Posted on 2022-07-07 22:49:30
Using sed:
$ cat script.sed
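# Read the next line as well, so lines are handled in pairs
N
{
# Only act on a pair that contains a "(" (a meaning line)
/\(/ {
# If the pair contains no "9", delete its first field
/9/!s/[^|]*\|//
# Join the two lines with a space
s/\n/ /
{
# Match a field followed by a "9|" field (the 9 is hardcoded for this sample)
/[^|]*\|(9\|)/ {
# Replace that whole match with just "9|"
s//\1/
# Replace each | separator with a newline
s/([^|]*)\|/\1\n/g
# Delete parenthesized text such as "(generic term)"
s/\([^)]*\)//
s/\([^)]*\)//g
p
}
}
}
}
$ sed -Enf script.sed input_file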
nine
9
ix
cardinal
9
IX
niner
Nina from Carolina
ennead
digit
figure
baseball club
ball club
club
baseball team
Posted on 2022-07-07 21:13:00
If I understand your question correctly, here is an awk solution.
File search.awk:
#! /usr/bin/awk -f
# This block is executed BEFORE the input file is processed.
BEGIN {
# Field Separator
FS = "|"
}
# The following blocks are executed for each line of the input file, but only if the condition in front of the block is true
# '$1' is the first field/column. Remember, the field separator is the pipe (|)
$1 == KEY {
# Key found, flag it
flag = 1
# Initialize the associated words
words = ""
# Skip the remaining blocks and process the next line of the input file
next
}
# If the flag is 1 and the line begins with an open parenthesis.
flag == 1 && $0 ~ /^\(/ {
# Association found
# Loop over all associated words (one per field)
# Start at the second field (the first is the part of speech, e.g. (adj))
idx = 2
# NF is the Number of Fields in the current line
while (idx <= NF) {
# get the current field (idx is the field number, $idx is its value)
word = $idx
# remove the term in parentheses
# (in fact, delete the ' (' token and everything after it)
gsub(/ \(.*$/, "", word)
# save it (append it to the 'words' string, with a comma as separator)
words = words "," word
# next field
idx += 1
}
}
# If the flag is 1 and the line does NOT begin with an open parenthesis,
# the KEY's entry has ended
flag == 1 && $0 !~ /^\(/ {
# End of association
flag = 0
# Print Key and words
if (words != "") {
print KEY words
}
# Reset words
words = ""
}
# This block is executed AFTER the input file is processed.
END {
# Special case: the KEY is the last entry in the thesaurus
# Print Key and words
if (words != "") {
print KEY words
}
}
Make it executable:
chmod 755 ./search.awk
Use it like this:
./search.awk -v KEY="nine" lo
Output:
nine,9,ix,cardinal,9,IX,niner,Nina from Carolina,ennead,digit,figure,baseball club,ball club,club,baseball team
https://stackoverflow.com/questions/72903668
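The awk answer prints everything on one comma-separated line. To get the one-term-per-line layout from the question, its output can be piped through tr and tail. This is a minimal sketch: the function name one_per_line is hypothetical, and a term that itself contained a comma would be split in two.

```shell
# Hypothetical helper: turn "KEY,term,term,..." into one term per line,
# dropping the leading KEY. Reads standard input.
one_per_line() {
  tr ',' '\n' | tail -n +2
}

# Example:
#   ./search.awk -v KEY="nine" lo | one_per_line
```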