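I want to use the TreeTagger module to annotate a raw corpus with part-of-speech information.
Since using a GPU through Google Colab seems to be faster, I installed the TreeTagger module there, but the Colab code cannot find the TreeTagger directory.
The error is: TreeTaggerError: Can't locate TreeTagger directory (and no TAGDIR specified).
Please tell me where I should put the treetagger folder.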
Posted on 2021-06-12 14:37:35
You have to specify the directory:
treetaggerwrapper.TreeTagger(TAGLANG='en', TAGDIR='treetagger/') # treetagger is the installation dir
Installation in Colab.
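Follow the instructions on the TreeTagger website.
In one Colab cell, put the following (for languages other than English, use the links to the corresponding parameter files instead):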
%%bash
mkdir treetagger
cd treetagger
# Download the tagger package for your system (PC-Linux, Mac OS-X, ARM64, ARMHF, ARM-Android, PPC64le-Linux).
wget https://cis.lmu.de/~schmid/tools/TreeTagger/data/tree-tagger-linux-3.2.4.tar.gz
tar -xzvf tree-tagger-linux-3.2.4.tar.gz
# Download the tagging scripts into the same directory.
wget https://cis.lmu.de/~schmid/tools/TreeTagger/data/tagger-scripts.tar.gz
gunzip tagger-scripts.tar.gz
# Download the installation script install-tagger.sh.
wget https://cis.lmu.de/~schmid/tools/TreeTagger/data/install-tagger.sh
# Download the parameter files for the languages you want to process.
# list of all files (parameter files) https://cis.lmu.de/~schmid/tools/TreeTagger/#parfiles
wget https://cis.lmu.de/~schmid/tools/TreeTagger/data/english.par.gz
sh install-tagger.sh
cd ..
sudo pip install treetaggerwrapper
In another cell below, you can check the installation:
>>> import pprint # For proper print of sequences.
>>> import treetaggerwrapper
>>> #1) build a TreeTagger wrapper:
>>> tagger = treetaggerwrapper.TreeTagger(TAGLANG='en', TAGDIR='treetagger/')
>>> #2) tag your text.
>>> tags = tagger.tag_text("This is a very short text to tag.")
>>> #3) use the tags list... (list of string output from TreeTagger).
>>> pprint.pprint(tags)
https://stackoverflow.com/questions/65955293
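If the TreeTaggerError from the question still shows up, it may help to check the installation directory before building the tagger. The following is a minimal sketch using only the standard library. The helper name missing_treetagger_parts is hypothetical, and the expected layout (bin/, lib/ and lib/english.par inside the directory passed as TAGDIR) is an assumption about what the install steps in this answer produce, not something taken from the wrapper's documentation.

```python
from pathlib import Path


def missing_treetagger_parts(tagdir, lang_par="english.par"):
    """Return the expected entries that are absent from tagdir.

    Hypothetical helper. The layout checked here is an assumption:
    bin/ for the executables, lib/ for the parameter files.
    """
    root = Path(tagdir)
    expected = [
        root / "bin",              # assumed location of the tree-tagger binary
        root / "lib",              # assumed location of the parameter files
        root / "lib" / lang_par,   # assumed name of the unpacked English model
    ]
    return [str(path) for path in expected if not path.exists()]


# "treetagger" is the directory created with mkdir in the install cell.
missing = missing_treetagger_parts("treetagger")
if missing:
    print("Missing from TAGDIR:", missing)
else:
    print("Directory layout looks complete.")
```

Note that a relative TAGDIR such as 'treetagger/' is resolved against the notebook's current working directory, so running the check from a different directory than the install cell can report everything as missing.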
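The tags list returned by tag_text holds one string per token, with word, part-of-speech tag and lemma separated by tabs. As a rough sketch of how such lines could be turned into records, here is a standard-library version. TaggedToken and parse_tag_lines are hypothetical names, and the sample lines are illustrative rather than captured tagger output. The treetaggerwrapper package also provides its own make_tags function for the same purpose.

```python
from collections import namedtuple

# Hypothetical record type; the field names are chosen for illustration.
TaggedToken = namedtuple("TaggedToken", ["word", "pos", "lemma"])


def parse_tag_lines(lines):
    """Split 'word<TAB>pos<TAB>lemma' strings into TaggedToken records.

    Lines without exactly three tab-separated fields are skipped.
    """
    tokens = []
    for line in lines:
        fields = line.split("\t")
        if len(fields) == 3:
            tokens.append(TaggedToken(*fields))
    return tokens


# Illustrative lines in the tab-separated format, not real tagger output.
sample = ["This\tDT\tthis", "is\tVBZ\tbe", "short\tJJ\tshort"]
for token in parse_tag_lines(sample):
    print(token.word, token.pos, token.lemma)
```

In the notebook, parse_tag_lines(tags) would be applied to the list produced in the check cell.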