bro's coding

NLP.Word Tokenization

givemebro 2021. 2. 3. 13:41

 

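# NLTK's word_tokenize: splits off punctuation but keeps hyphenated words such as "ill-devised" whole.
# It needs the Punkt models first, e.g. nltk.download('punkt') ('punkt_tab' on newer NLTK releases).
from nltk.tokenize import word_tokenize
print(word_tokenize("Local farmers are protesting strengthened regulations on living shelters for foreign laborers introduced in the wake of a Cambodian worker’s death, claiming the measures are ill-devised and demanding more support from the government."))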

'''
['Local', 'farmers', 'are', 'protesting', 'strengthened', 'regulations', 'on', 'living', 'shelters', 'for', 'foreign', 'laborers', 'introduced', 'in', 'the', 'wake', 'of', 'a', 'Cambodian', 'worker', '’', 's', 'death', ',', 'claiming', 'the', 'measures', 'are', 'ill-devised', 'and', 'demanding', 'more', 'support', 'from', 'the', 'government', '.']
'''

 

# WordPunctTokenizer: splits on every run of punctuation, so "ill-devised" becomes 'ill', '-', 'devised'.
from nltk.tokenize import WordPunctTokenizer
print(WordPunctTokenizer().tokenize("Local farmers are protesting strengthened regulations on living shelters for foreign laborers introduced in the wake of a Cambodian worker’s death, claiming the measures are ill-devised and demanding more support from the government."))

'''
['Local', 'farmers', 'are', 'protesting', 'strengthened', 'regulations', 'on', 'living', 'shelters', 'for', 'foreign', 'laborers', 'introduced', 'in', 'the', 'wake', 'of', 'a', 'Cambodian', 'worker', '’', 's', 'death', ',', 'claiming', 'the', 'measures', 'are', 'ill', '-', 'devised', 'and', 'demanding', 'more', 'support', 'from', 'the', 'government', '.']
'''
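
The split seen above follows from the pattern WordPunctTokenizer is built on, the regular expression `\w+|[^\w\s]+`. As a rough illustration using only the standard library (the function name `wordpunct_like` is made up for this sketch and is not part of NLTK), the same behavior can be approximated like this:

```python
import re

# Same pattern WordPunctTokenizer uses: a run of word characters,
# or a run of characters that are neither word characters nor whitespace.
WORDPUNCT_PATTERN = re.compile(r"\w+|[^\w\s]+")

def wordpunct_like(text):
    """Hypothetical helper: approximate WordPunctTokenizer().tokenize(text)."""
    return WORDPUNCT_PATTERN.findall(text)

print(wordpunct_like("a Cambodian worker’s death, ill-devised."))
```

Because any run of punctuation becomes its own token, both the curly apostrophe and the hyphen are separated from the surrounding letters.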

 

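# Keras text_to_word_sequence: lowercases the text and drops the punctuation in its default filter list.
# The curly apostrophe in "worker’s" is not in that list, so the token stays whole.
from tensorflow.keras.preprocessing.text import text_to_word_sequence
print(text_to_word_sequence("Local farmers are protesting strengthened regulations on living shelters for foreign laborers introduced in the wake of a Cambodian worker’s death, claiming the measures are ill-devised and demanding more support from the government."))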

'''
['local', 'farmers', 'are', 'protesting', 'strengthened', 'regulations', 'on', 'living', 'shelters', 'for', 'foreign', 'laborers', 'introduced', 'in', 'the', 'wake', 'of', 'a', 'cambodian', 'worker’s', 'death', 'claiming', 'the', 'measures', 'are', 'ill', 'devised', 'and', 'demanding', 'more', 'support', 'from', 'the', 'government']
'''
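
To see why the hyphen and commas disappear while "worker’s" survives, here is a minimal standard-library sketch of what text_to_word_sequence does with its default arguments: lowercase, replace each filtered character with a space, then split on spaces. The function name `keras_like_word_sequence` is an assumption for illustration only; the filter string is the documented Keras default.

```python
# Default `filters` value of Keras' text_to_word_sequence.
# Note that neither the straight nor the curly apostrophe is included.
KERAS_DEFAULT_FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'

def keras_like_word_sequence(text, filters=KERAS_DEFAULT_FILTERS, lower=True, split=" "):
    """Hypothetical helper: approximate text_to_word_sequence(text)."""
    if lower:
        text = text.lower()
    # Map every filtered character to the split character.
    table = str.maketrans({ch: split for ch in filters})
    # Drop the empty strings produced by consecutive separators.
    return [tok for tok in text.translate(table).split(split) if tok]

print(keras_like_word_sequence("A Cambodian worker’s death, ill-devised."))
```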

 

# TreebankWordTokenizer follows Penn Treebank conventions and expects one sentence at a time.
# Only a period at the very end of the input is split off, so 'container.' and 'sector.By' stay fused.
from nltk.tokenize import TreebankWordTokenizer
tokenizer = TreebankWordTokenizer()
text = "We also have passed all inspections when obtaining our employment permits, and fire safety devices are installed inside each cargo container. While the death of the foreign worker is to be mourned, many farmers will agree with me that the situation is only a very extreme and minor view of the farming sector.By Ko Jun-tae (ko.juntae@heraldcorp.com)"
print(tokenizer.tokenize(text))

'''
['We', 'also', 'have', 'passed', 'all', 'inspections', 'when', 'obtaining', 'our', 'employment', 'permits', ',', 'and', 'fire', 'safety', 'devices', 'are', 'installed', 'inside', 'each', 'cargo', 'container.', 'While', 'the', 'death', 'of', 'the', 'foreign', 'worker', 'is', 'to', 'be', 'mourned', ',', 'many', 'farmers', 'will', 'agree', 'with', 'me', 'that', 'the', 'situation', 'is', 'only', 'a', 'very', 'extreme', 'and', 'minor', 'view', 'of', 'the', 'farming', 'sector.By', 'Ko', 'Jun-tae', '(', 'ko.juntae', '@', 'heraldcorp.com', ')']
'''
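
The 'container.' token above appears because the whole paragraph was passed in at once, while TreebankWordTokenizer is designed for input that has already been split into sentences. One possible workaround, sketched here with a deliberately naive regex splitter (both helper names are hypothetical, and NLTK's sent_tokenize would normally be a better sentence splitter), is to tokenize sentence by sentence:

```python
import re

def split_sentences_naive(text):
    """Rough heuristic: split after '.', '!' or '?' when whitespace follows."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize_by_sentence(text, tokenize=str.split):
    """Apply `tokenize` to each sentence separately and flatten the result.

    `tokenize` defaults to str.split so the sketch runs without NLTK;
    pass TreebankWordTokenizer().tokenize to use the Treebank rules.
    """
    tokens = []
    for sentence in split_sentences_naive(text):
        tokens.extend(tokenize(sentence))
    return tokens

sample = "Devices are installed inside each cargo container. While the death is to be mourned."
print(split_sentences_naive(sample))
```

With TreebankWordTokenizer().tokenize passed as `tokenize`, each sentence-final period should come out as its own token. A string like 'sector.By' would still stay fused, because there is no whitespace after the period for the splitter to act on.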