Locally:
$ python
Python 3.8.0 (default, Nov 6 2019, 15:27:39)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import collections
>>> import nltk
>>> nltk.download('punkt')
>>> nltk.download('averaged_perceptron_tagger')
>>> nltk.download('stopwords')
>>> stop_words = set(nltk.corpus.stopwords.words('english'))
>>> text = """Former Kansas Territorial Governor James W. Denver visited his namesake city in 1875 and in 1882."""
>>> def preprocess(document):
...     sentence_list = list()
...     for sentence in nltk.sent_tokenize(document):
...         word_tokens = nltk.word_tokenize(sentence)
...         sentence_list.append([w for w in word_tokens if not w in stop_words and len(w) > 1])
...     sentences = [nltk.pos_tag(sent) for sent in sentence_list]
...     return sentences
...
>>> grammar = r'Chunk: {(<A.*>*|<N.*>*|<VB[DGNP]?>*)+}'
>>> chunk_parser = nltk.RegexpParser(grammar)
>>> tagged = preprocess(text)
>>> result = collections.Counter()
>>> for sentence in tagged:
...     my_tree = chunk_parser.parse(sentence)
...     for subtree in my_tree.subtrees():
...         if subtree.label() == 'Chunk':
...             leaves = [x[0] for x in subtree.leaves()]
...             phrase = " ".join(leaves)
...             result[phrase] += 1
...

The output at home (local machine) is:
>>> print(result.most_common(10))
[('Former Kansas Territorial Governor James W. Denver', 1), ('visited', 1), ('city', 1)]

With the same code on Colaboratory, the result is:
>>> print(result.most_common(10))
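[]

I ran the non-NLTK parts of the code in both places and got the same output. Could the local NLTK library be different, for example a different version of NLTK?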
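Since the question is whether the two environments differ, a first step is to record both the interpreter version and the installed NLTK version in each one. The helper below, `environment_report`, is a hypothetical name introduced for illustration; it is a minimal sketch that assumes Python 3.8+ for `importlib.metadata` (on an older runtime such as 3.6, `sys.version` and `nltk.__version__` give the same information).

```python
import platform
from importlib import metadata  # standard library since Python 3.8


def environment_report(packages=("nltk",)):
    """Return the interpreter version plus installed versions of the given packages.

    A package that is not installed is reported as None instead of raising.
    """
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


# Run this in both environments (local and Colaboratory) and compare the dicts.
print(environment_report())
```

Comparing the two printed dicts shows at a glance whether the Python version, the NLTK version, or both differ.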
Posted on 2020-01-03 11:41:53
I was running Python 3.8.0 locally. After switching to 3.6.9, I now get the same result as on Colaboratory.
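The answer does not say why the Python version matters. One plausible explanation, which is an assumption and not confirmed in the thread, is that `nltk.RegexpParser` applies chunk rules as regular-expression substitutions over a string of tags, and each alternative in the grammar (`<A.*>*`, `<N.*>*`, `<VB[DGNP]?>*`) can match zero tags. Python 3.7 changed how `re.sub()` handles empty matches that follow a non-empty match, so a pattern that can match the empty string may chunk differently across versions. The sketch below uses only `re`; the names `ORIGINAL`, `REWRITTEN` and the tag string are hypothetical, and the two patterns are hand-written approximations of how NLTK translates tag patterns, not NLTK's exact output.

```python
import re
import sys

# 1) The re.sub() change for patterns that can match the empty string.
#    Documented example: Python 3.7+ gives '-a-b--d-', earlier versions gave '-a-b-d-'.
empty_sub = re.sub("x*", "-", "abxd")
print(sys.version_info[:2], empty_sub)

# 2) Approximations of the two chunk grammars, written against a string of tags.
#    Original: every alternative is starred, so one repetition may consume nothing.
ORIGINAL = r"(?:(?:<A[^{}<>]*>)*|(?:<N[^{}<>]*>)*|(?:<VB[DGNP]?>)*)+"
#    Rewritten: every repetition must consume exactly one matching tag.
REWRITTEN = r"(?:<A[^{}<>]*>|<N[^{}<>]*>|<VB[DGNP]?>)+"

# Illustrative tags only, not the tagger's actual output for the question's text.
tags = "<NNP><NNP><VBD><JJ><NN><CD>"

print(re.fullmatch(ORIGINAL, "") is not None)   # True: can match zero tags
print(re.fullmatch(REWRITTEN, "") is not None)  # False: needs at least one tag
print(re.findall(REWRITTEN, tags))              # ['<NNP><NNP><VBD>', '<NN>']
```

If this is the cause, a grammar whose repeated unit always consumes a tag, such as `grammar = r'Chunk: {(<A.*>|<N.*>|<VB[DGNP]?>)+}'`, should behave the same on both Python versions; that has not been verified in the thread.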
https://stackoverflow.com/questions/59572573