bro's coding
NLP.Word Tokenization
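# word_tokenize keeps hyphenated words such as 'ill-devised' as a single token.
# It needs the NLTK Punkt data: nltk.download('punkt') ('punkt_tab' on newer NLTK releases).
from nltk.tokenize import word_tokenize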
print(word_tokenize("Local farmers are protesting strengthened regulations on living shelters for foreign laborers introduced in the wake of a Cambodian worker’s death, claiming the measures are ill-devised and demanding more support from the government."))
'''
['Local', 'farmers', 'are', 'protesting', 'strengthened', 'regulations', 'on', 'living', 'shelters', 'for', 'foreign', 'laborers', 'introduced', 'in', 'the', 'wake', 'of', 'a', 'Cambodian', 'worker', '’', 's', 'death', ',', 'claiming', 'the', 'measures', 'are', 'ill-devised', 'and', 'demanding', 'more', 'support', 'from', 'the', 'government', '.']
'''
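# WordPunctTokenizer splits on every punctuation character,
# so 'ill-devised' becomes 'ill', '-', 'devised'.
from nltk.tokenize import WordPunctTokenizer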
print(WordPunctTokenizer().tokenize("Local farmers are protesting strengthened regulations on living shelters for foreign laborers introduced in the wake of a Cambodian worker’s death, claiming the measures are ill-devised and demanding more support from the government."))
'''
['Local', 'farmers', 'are', 'protesting', 'strengthened', 'regulations', 'on', 'living', 'shelters', 'for', 'foreign', 'laborers', 'introduced', 'in', 'the', 'wake', 'of', 'a', 'Cambodian', 'worker', '’', 's', 'death', ',', 'claiming', 'the', 'measures', 'are', 'ill', '-', 'devised', 'and', 'demanding', 'more', 'support', 'from', 'the', 'government', '.']
'''
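The punctuation splitting above can be approximated without NLTK. The following is a minimal sketch, assuming the pattern `\w+|[^\w\s]+` (runs of word characters, or runs of non-space punctuation), which is how `WordPunctTokenizer` is defined in NLTK as far as I know. The function name `word_punct_tokens` is hypothetical and only used for illustration.

```python
import re

# Assumed pattern: either a run of word characters, or a run of
# characters that are neither word characters nor whitespace.
WORD_PUNCT_PATTERN = re.compile(r"\w+|[^\w\s]+")


def word_punct_tokens(text):
    """Split text into alphanumeric runs and punctuation runs (hypothetical helper)."""
    return WORD_PUNCT_PATTERN.findall(text)


sample = "a Cambodian worker’s death, claiming the measures are ill-devised."
print(word_punct_tokens(sample))
```

Because the hyphen and the curly apostrophe are not word characters, each becomes its own token, which matches the 'ill', '-', 'devised' and 'worker', '’', 's' splits in the output above.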
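# text_to_word_sequence lowercases the text and drops punctuation such as ',', '-' and '.'.
# The curly apostrophe in 'worker’s' is not in its default filter list, so it is kept.
from tensorflow.keras.preprocessing.text import text_to_word_sequence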
print(text_to_word_sequence("Local farmers are protesting strengthened regulations on living shelters for foreign laborers introduced in the wake of a Cambodian worker’s death, claiming the measures are ill-devised and demanding more support from the government."))
'''
['local', 'farmers', 'are', 'protesting', 'strengthened', 'regulations', 'on', 'living', 'shelters', 'for', 'foreign', 'laborers', 'introduced', 'in', 'the', 'wake', 'of', 'a', 'cambodian', 'worker’s', 'death', 'claiming', 'the', 'measures', 'are', 'ill', 'devised', 'and', 'demanding', 'more', 'support', 'from', 'the', 'government']
'''
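To make the Keras behavior easier to see, here is a rough pure-Python sketch of the same idea: lowercase the text, turn every filtered character into a space, then split. The filter string is assumed to match the Keras default, and `simple_word_sequence` is a hypothetical name; this is not the library's actual implementation.

```python
# Assumed to match the default `filters` argument of text_to_word_sequence.
DEFAULT_FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'


def simple_word_sequence(text, filters=DEFAULT_FILTERS, lower=True, split=" "):
    """Lowercase, replace filtered characters with the separator, then split (hypothetical helper)."""
    if lower:
        text = text.lower()
    # Map every filtered character to the separator.
    table = str.maketrans({ch: split for ch in filters})
    # Drop the empty strings produced by consecutive separators.
    return [token for token in text.translate(table).split(split) if token]


print(simple_word_sequence("A Cambodian worker’s death, ill-devised."))
```

The hyphen is in the filter list, so 'ill-devised' is split into 'ill' and 'devised', while the curly apostrophe is not, so 'worker’s' stays in one piece.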
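# TreebankWordTokenizer follows Penn Treebank conventions and expects one sentence per call.
# Periods inside the text stay attached to the word, as in 'container.' and 'sector.By'.
from nltk.tokenize import TreebankWordTokenizer
tokenizer = TreebankWordTokenizer()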
text="We also have passed all inspections when obtaining our employment permits, and fire safety devices are installed inside each cargo container. While the death of the foreign worker is to be mourned, many farmers will agree with me that the situation is only a very extreme and minor view of the farming sector.By Ko Jun-tae (ko.juntae@heraldcorp.com)"
print(tokenizer.tokenize(text))
'''
['We', 'also', 'have', 'passed', 'all', 'inspections', 'when', 'obtaining', 'our', 'employment', 'permits', ',', 'and', 'fire', 'safety', 'devices', 'are', 'installed', 'inside', 'each', 'cargo', 'container.', 'While', 'the', 'death', 'of', 'the', 'foreign', 'worker', 'is', 'to', 'be', 'mourned', ',', 'many', 'farmers', 'will', 'agree', 'with', 'me', 'that', 'the', 'situation', 'is', 'only', 'a', 'very', 'extreme', 'and', 'minor', 'view', 'of', 'the', 'farming', 'sector.By', 'Ko', 'Jun-tae', '(', 'ko.juntae', '@', 'heraldcorp.com', ')']
'''