Word tokenization gives different results at home than on Colaboratory

Stack Overflow user

Asked 2020-01-03 10:40:38

Answers: 1 · Views: 31 · Followers: 0 · Votes: 0

Locally:

$ python
Python 3.8.0 (default, Nov  6 2019, 15:27:39) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import collections
>>> import nltk
>>> nltk.download('punkt')
>>> nltk.download('averaged_perceptron_tagger')
>>> nltk.download('stopwords')
>>> stop_words = set(nltk.corpus.stopwords.words('english'))
>>> text = """Former Kansas Territorial Governor James W. Denver visited his namesake city in 1875 and in 1882."""
>>> def preprocess(document):
...     sentence_list = list()
...     for sentence in nltk.sent_tokenize(document):
...         word_tokens = nltk.word_tokenize(sentence)
...         sentence_list.append([w for w in word_tokens if w not in stop_words and len(w) > 1])
...     sentences = [nltk.pos_tag(sent) for sent in sentence_list]
...     return sentences
>>> grammar = r'Chunk: {(<A.*>*|<N.*>*|<VB[DGNP]?>*)+}'
>>> chunk_parser = nltk.RegexpParser(grammar)
>>> tagged = preprocess(text)
>>> result = collections.Counter()
>>> for sentence in tagged:
...     my_tree = chunk_parser.parse(sentence)
...     for subtree in my_tree.subtrees():
...         if subtree.label() == 'Chunk':
...             leaves = [x[0] for x in subtree.leaves()]
...             phrase = " ".join(leaves)
...             result[phrase] += 1

The output at home is:

>>> print(result.most_common(10))
[('Former Kansas Territorial Governor James W. Denver', 1), ('visited', 1), ('city', 1)]

Running the same code on Colaboratory gives:

>>> print(result.most_common(10))
[]

I ran non-NLTK code in both places and got the same output. Could the NLTK library installed locally be different? A different version of NLTK?
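A quick way to test the version hypothesis is to print the interpreter and NLTK versions in both environments and compare them. The helper below is a hypothetical sketch (the name `describe_environment` is illustrative, not from the thread); it relies only on the standard `platform` and `sys` modules plus NLTK's `__version__` attribute, and it reports "not installed" instead of failing when NLTK is missing.

```python
import platform
import sys


def describe_environment():
    """Collect interpreter and NLTK version info for side-by-side comparison.

    Hypothetical helper: run it both locally and in a Colaboratory cell.
    """
    info = {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "executable": sys.executable,
    }
    try:
        import nltk  # imported here so the helper still works without NLTK

        info["nltk"] = nltk.__version__
    except ImportError:
        info["nltk"] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Comparing the `python` line across the two environments would surface the interpreter difference that the accepted answer identifies, even when the NLTK versions match.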

Tags: nltk, google-colaboratory

1 Answer

Stack Overflow user

Accepted answer

Answered 2020-01-03 11:41:53

I was running Python 3.8.0 locally. I switched to 3.6.9, and now I get the same result as on Colaboratory.

Votes: 0
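The accepted answer identifies the Python version as the trigger but not the reason. One plausible explanation, offered here as an assumption rather than something confirmed in the thread, is the Python 3.7 change to how the `re` module handles empty matches: non-empty matches can now start just after a previous empty match, and `re.sub()` now also replaces empty matches adjacent to a previous non-empty match. NLTK's `RegexpParser` applies its rules as regular expressions over a string encoding of the tag sequence, and every alternative in the question's grammar ends in `*`, so the whole pattern can match the empty string. The sketch below uses only the standard library. It shows the documented `re.sub` change, then compares a hand-translated, simplified version of the question's pattern (`EMPTY_OK`, a hypothetical name) with a variant in which each alternative must consume at least one tag (`NON_EMPTY`). In NLTK grammar syntax the corresponding change would be `Chunk: {(<A.*>|<N.*>|<VB[DGNP]?>)+}`. Separately, the Penn Treebank tags that `nltk.pos_tag` produces by default include no tag beginning with `A` (adjectives are `JJ`, `JJR`, `JJS`), so `<A.*>` probably never matches anything.

```python
import re

# The documented Python 3.7 change in isolation: on 3.7 and later this
# returns "-a-b--d-", while Python 3.6 returned "-a-b-d-".
SUB_DEMO = re.sub("x*", "-", "abxd")

# Hypothetical, simplified translation of the question's chunk grammar into a
# plain regex over a tag string. NLTK's real translation differs in detail.
# Every alternative ends in "*", so the whole pattern can match "".
EMPTY_OK = re.compile(r"(?:(?:<A[^>]*>)*|(?:<N[^>]*>)*|(?:<VB[DGNP]?>)*)+")

# Variant in which each alternative consumes exactly one tag, so every match
# is non-empty and the result does not depend on empty-match handling.
NON_EMPTY = re.compile(r"(?:<A[^>]*>|<N[^>]*>|<VB[DGNP]?>)+")

# Hypothetical tag sequence: two proper nouns, a past-tense verb, a noun.
TAGS = "<NNP><NNP><VBD><NN>"


def nonempty_matches(pattern, tags):
    """Return the non-empty matches, i.e. the spans that could become chunks."""
    return [m.group() for m in pattern.finditer(tags) if m.group()]


if __name__ == "__main__":
    print("re.sub demo:", SUB_DEMO)
    print("EMPTY_OK matches the empty string:", EMPTY_OK.fullmatch("") is not None)
    print("NON_EMPTY matches the empty string:", NON_EMPTY.fullmatch("") is not None)
    print("EMPTY_OK chunks:", nonempty_matches(EMPTY_OK, TAGS))
    print("NON_EMPTY chunks:", nonempty_matches(NON_EMPTY, TAGS))
```

Requiring each alternative to consume a tag means the pattern never produces empty matches, so its output should not depend on which side of the 3.7 change the interpreter sits.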

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/59572573
