This question is a little confusing to describe, so I'll just give an example.
Suppose I have the following:
$ grep -P "locus_tag\tM715_1000193188" Genome.tbl -B1 -A8
193188 193066 gene
locus_tag M715_1000193188
193188 193066 mRNA
product hypothetical protein
protein_id gnl|CorradiLab|M715_1000193188
transcript_id gnl|CorradiLab|M715_mrna1000193188
193188 193066 CDS
product hypothetical protein
protein_id gnl|CorradiLab|M715_1000193188
transcript_id gnl|CorradiLab|M715_mrna1000193188
I want to add a "#" to each of the 8 lines following "locus_tag M715_1000193188", so that the modified file looks like this:
193188 193066 gene
locus_tag M715_1000193188
#193188 193066 mRNA
# product hypothetical protein
# protein_id gnl|CorradiLab|M715_1000193188
# transcript_id gnl|CorradiLab|M715_mrna1000193188
#193188 193066 CDS
# product hypothetical protein
# protein_id gnl|CorradiLab|M715_1000193188
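# transcript_id gnl|CorradiLab|M715_mrna1000193188
Basically, I have a file with about 3000 different locus tags, and for 300 of them I need to comment out the mRNA and CDS features, that is, the 8 lines after the locus_tag line.
Is it possible to do this with sed? The file also contains other kinds of information that must stay unchanged.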
Thank you, Adrian
Posted on 2015-04-29 02:01:42
If you can use awk, this should do it:
awk 'f&&f-- {$0="#"$0} /locus_tag/ {f=8} 1' file
193188 193066 gene
locus_tag M715_1000193188
#193188 193066 mRNA
# product hypothetical protein
# protein_id gnl|CorradiLab|M715_1000193188
# transcript_id gnl|CorradiLab|M715_mrna1000193188
#193188 193066 CDS
# product hypothetical protein
# protein_id gnl|CorradiLab|M715_1000193188
# transcript_id gnl|CorradiLab|M715_mrna1000193188
Posted on 2015-04-29 02:08:44
sed supports range addresses, which can do what you want here.
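sed -e '/locus_tag\tM715_1000193188/,+8{/locus_tag/!s/^/#/;}' file
The range covers the matching locus_tag line as well as the 8 lines after it, so the inner /locus_tag/! guard keeps that line from being commented. As noted in the comments, this range address format is specific to GNU sed.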
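For a sed without the GNU addr,+N extension, one possible workaround is to step through the following lines with n instead. This is only a sketch under assumptions not stated in the thread: the fields are tab-separated, and sample.tbl and commented.tbl are hypothetical file names used for the demonstration. It leaves the locus_tag line itself unchanged and prefixes only the 8 lines after it:

```shell
#!/bin/sh
# Hypothetical sample (sample.tbl), tab-separated and modeled on the excerpt in the question.
{
  printf '193188\t193066\tgene\n'
  printf '\t\t\tlocus_tag\tM715_1000193188\n'
  printf '193188\t193066\tmRNA\n'
  printf '\t\t\tproduct\thypothetical protein\n'
  printf '\t\t\tprotein_id\tgnl|CorradiLab|M715_1000193188\n'
  printf '\t\t\ttranscript_id\tgnl|CorradiLab|M715_mrna1000193188\n'
  printf '193188\t193066\tCDS\n'
  printf '\t\t\tproduct\thypothetical protein\n'
  printf '\t\t\tprotein_id\tgnl|CorradiLab|M715_1000193188\n'
  printf '\t\t\ttranscript_id\tgnl|CorradiLab|M715_mrna1000193188\n'
  printf '200001\t200300\tgene\n'
  printf '\t\t\tlocus_tag\tM715_1000200001\n'
} > sample.tbl

# On the matching locus_tag line: print it unchanged and load the next
# line (n), then prefix that line with "#". Repeated 8 times in total.
sed '/locus_tag[[:space:]]M715_1000193188$/{
n;s/^/#/
n;s/^/#/
n;s/^/#/
n;s/^/#/
n;s/^/#/
n;s/^/#/
n;s/^/#/
n;s/^/#/
}' sample.tbl > commented.tbl

cat commented.tbl
```

Each n prints the current line and reads the next one, so the block has to repeat n;s/^/#/ once for every line that should be commented.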
Posted on 2015-04-29 02:21:58
$ cat tst.awk
BEGIN { split(tags,tmp); for (i in tmp) tagsA[tmp[i]] }
c&&c-- { $0 = "#" $0 }
($(NF-1) == "locus_tag") && ($NF in tagsA) { c=8 }
{ print }
$ awk -v tags="M715_1000193188 M715_1000193189 M715_1000193190" -f tst.awk file
193188 193066 gene
locus_tag M715_1000193188
#193188 193066 mRNA
# product hypothetical protein
# protein_id gnl|CorradiLab|M715_1000193188
# transcript_id gnl|CorradiLab|M715_mrna1000193188
#193188 193066 CDS
# product hypothetical protein
# protein_id gnl|CorradiLab|M715_1000193188
# transcript_id gnl|CorradiLab|M715_mrna1000193188
Just list all 300 of the locus tag values you care about, as with the 3 examples above.
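With 300 tags the -v string gets long, so it may be more convenient to keep the tags in a file. A minimal sketch of that variation, assuming one tag per line in a hypothetical tags.txt (sample.tbl and commented.tbl are likewise made-up names), and using the common NR == FNR idiom to load the first file into an array before the second file is processed:

```shell
#!/bin/sh
# Hypothetical tag list: one locus tag per line. Only the first gene is listed.
printf '%s\n' M715_1000193188 > tags.txt

# Hypothetical sample (sample.tbl), tab-separated and modeled on the excerpt in the question.
{
  printf '193188\t193066\tgene\n'
  printf '\t\t\tlocus_tag\tM715_1000193188\n'
  printf '193188\t193066\tmRNA\n'
  printf '\t\t\tproduct\thypothetical protein\n'
  printf '\t\t\tprotein_id\tgnl|CorradiLab|M715_1000193188\n'
  printf '\t\t\ttranscript_id\tgnl|CorradiLab|M715_mrna1000193188\n'
  printf '193188\t193066\tCDS\n'
  printf '\t\t\tproduct\thypothetical protein\n'
  printf '\t\t\tprotein_id\tgnl|CorradiLab|M715_1000193188\n'
  printf '\t\t\ttranscript_id\tgnl|CorradiLab|M715_mrna1000193188\n'
  printf '200001\t200300\tgene\n'
  printf '\t\t\tlocus_tag\tM715_1000200001\n'
  printf '200001\t200300\tmRNA\n'
  printf '\t\t\tproduct\thypothetical protein\n'
} > sample.tbl

awk '
  # First file (tags.txt): remember every tag, then skip to the next line.
  NR == FNR { tags[$1]; next }
  # While the countdown is positive: decrement it and comment the line.
  c && c--  { $0 = "#" $0 }
  # A locus_tag line whose value is in the list starts an 8-line countdown.
  NF >= 2 && $(NF-1) == "locus_tag" && ($NF in tags) { c = 8 }
  { print }
' tags.txt sample.tbl > commented.tbl

cat commented.tbl
```

The second gene is not in tags.txt, so its mRNA lines come through unchanged. The NR == FNR test assumes tags.txt is not empty, and the NF >= 2 guard is an addition that keeps $(NF-1) from being evaluated on blank lines.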
https://stackoverflow.com/questions/29926593