bro's coding
NLP.Word Tokenization
```python
import nltk
from nltk.tokenize import word_tokenize

# word_tokenize relies on the Punkt sentence model; newer NLTK releases may ask for 'punkt_tab' instead
nltk.download('punkt')

# Splits the curly apostrophe off "worker’s" but keeps "ill-devised" as one token
print(word_tokenize("Local farmers are protesting strengthened regulations on living shelters for foreign laborers introduced in the wake of a Cambodian worker’s death, claiming the measures are ill-devised and demanding more support from the government."))
'''
['Local', 'farmers', 'are', 'protesting', 'strengthened', 'regulations', 'on', 'living', 'shelters', 'for', 'foreign', 'laborers', 'introduced', 'in', 'the', 'wake', 'of', 'a', 'Cambodian', 'worker', '’', 's', 'death', ',', 'claiming', 'the', 'measures', 'are', 'ill-devised', 'and', 'demanding', 'more', 'support', 'from', 'the', 'government', '.']
'''
```
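`word_tokenize` does not tokenize the whole string in one pass. NLTK first splits the input into sentences with `sent_tokenize` (the Punkt model). It then runs a Treebank-style word tokenizer on each sentence, so a period that ends an inner sentence can be separated from its word.

The sketch below imitates that two-stage pipeline. It makes two simplifying assumptions:

- A naive regex sentence splitter stands in for Punkt.
- Only two simplified word rules are applied.

The function names `naive_sent_split`, `tokenize_sentence` and `sentence_then_words` are hypothetical. This is not NLTK's code.

```python
import re

def naive_sent_split(text):
    # Assumption: a crude stand-in for Punkt. Break after '.', '!' or '?'
    # when whitespace follows. Punkt also handles abbreviations etc.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize_sentence(sentence):
    # Hypothetical subset of Treebank-style rules:
    # pad a comma/colon with spaces unless a digit follows (keeps "1,000")
    sentence = re.sub(r"([:,])([^\d])", r" \1 \2", sentence)
    # split off a period only when it ends the sentence
    sentence = re.sub(r"([^\.])(\.)\s*$", r"\1 \2 ", sentence)
    return sentence.split()

def sentence_then_words(text):
    """Sentence-split first, then word-tokenize each sentence."""
    tokens = []
    for sent in naive_sent_split(text):
        tokens.extend(tokenize_sentence(sent))
    return tokens

print(sentence_then_words("Each cargo container. While mourned, farmers agree."))
# ['Each', 'cargo', 'container', '.', 'While', 'mourned', ',', 'farmers', 'agree', '.']
```

A period with no following space, as in `sector.By`, does not trigger a sentence break in this sketch, so it stays attached.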
```python
from nltk.tokenize import WordPunctTokenizer

# Splits into runs of word characters or runs of punctuation,
# so "ill-devised" becomes 'ill', '-', 'devised'
print(WordPunctTokenizer().tokenize("Local farmers are protesting strengthened regulations on living shelters for foreign laborers introduced in the wake of a Cambodian worker’s death, claiming the measures are ill-devised and demanding more support from the government."))
'''
['Local', 'farmers', 'are', 'protesting', 'strengthened', 'regulations', 'on', 'living', 'shelters', 'for', 'foreign', 'laborers', 'introduced', 'in', 'the', 'wake', 'of', 'a', 'Cambodian', 'worker', '’', 's', 'death', ',', 'claiming', 'the', 'measures', 'are', 'ill', '-', 'devised', 'and', 'demanding', 'more', 'support', 'from', 'the', 'government', '.']
'''
```
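`WordPunctTokenizer` is a regular-expression tokenizer. In NLTK it matches the pattern `\w+|[^\w\s]+`: either a run of word characters or a run of characters that are neither word characters nor whitespace. That is why both `’` and `-` come out as separate tokens above.

A minimal re-creation with Python's `re` module follows. The name `word_punct_tokenize` is hypothetical.

```python
import re

# Same pattern as NLTK's WordPunctTokenizer: a run of word characters (\w+)
# or a run of characters that are neither word characters nor whitespace.
WORD_PUNCT = re.compile(r"\w+|[^\w\s]+")

def word_punct_tokenize(text):
    """Return every maximal word run or punctuation run, left to right."""
    return WORD_PUNCT.findall(text)

print(word_punct_tokenize("a Cambodian worker’s death, ill-devised."))
# ['a', 'Cambodian', 'worker', '’', 's', 'death', ',', 'ill', '-', 'devised', '.']
```

Consecutive punctuation stays together as one token, so `"Wait!?"` yields `['Wait', '!?']`.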
```python
# Note: tensorflow.keras.preprocessing.text is deprecated in recent TensorFlow
# releases and may be unavailable under Keras 3
from tensorflow.keras.preprocessing.text import text_to_word_sequence

# Lowercases the text, turns ASCII punctuation (such as , - .) into spaces, then splits on spaces
print(text_to_word_sequence("Local farmers are protesting strengthened regulations on living shelters for foreign laborers introduced in the wake of a Cambodian worker’s death, claiming the measures are ill-devised and demanding more support from the government."))
'''
['local', 'farmers', 'are', 'protesting', 'strengthened', 'regulations', 'on', 'living', 'shelters', 'for', 'foreign', 'laborers', 'introduced', 'in', 'the', 'wake', 'of', 'a', 'cambodian', 'worker’s', 'death', 'claiming', 'the', 'measures', 'are', 'ill', 'devised', 'and', 'demanding', 'more', 'support', 'from', 'the', 'government']
'''
```
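`text_to_word_sequence` is much simpler than the NLTK tokenizers. It works in three steps:

1. Lowercase the text.
2. Replace every character in its default `filters` string (ASCII punctuation plus tab and newline) with a space.
3. Split on spaces and drop empty strings.

The curly apostrophe `’` is not in that filter list, so `worker’s` survives as one token, while `,`, `-` and `.` disappear.

A standard-library sketch of the same steps follows. The name `simple_text_to_word_sequence` is hypothetical. It copies the default filter string rather than importing Keras.

```python
# Default filter characters used by Keras' text_to_word_sequence
DEFAULT_FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'

def simple_text_to_word_sequence(text, filters=DEFAULT_FILTERS, lower=True, split=" "):
    if lower:
        text = text.lower()
    # Map every filtered character to the split character
    table = str.maketrans({c: split for c in filters})
    text = text.translate(table)
    # Repeated separators leave empty strings after split(); drop them
    return [tok for tok in text.split(split) if tok]

print(simple_text_to_word_sequence("A Cambodian worker’s death, ill-devised."))
# ['a', 'cambodian', 'worker’s', 'death', 'ill', 'devised']
```

The straight apostrophe `'` is also absent from the default filters, so `"Don't"` becomes the single token `"don't"`.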
```python
from nltk.tokenize import TreebankWordTokenizer

tokenizer = TreebankWordTokenizer()
text = "We also have passed all inspections when obtaining our employment permits, and fire safety devices are installed inside each cargo container. While the death of the foreign worker is to be mourned, many farmers will agree with me that the situation is only a very extreme and minor view of the farming sector.By Ko Jun-tae (ko.juntae@heraldcorp.com)"
# A period is split off only at the very end of the input, so 'container.' and
# 'sector.By' stay intact; commas, '@' and parentheses are separated
print(tokenizer.tokenize(text))
'''
['We', 'also', 'have', 'passed', 'all', 'inspections', 'when', 'obtaining', 'our', 'employment', 'permits', ',', 'and', 'fire', 'safety', 'devices', 'are', 'installed', 'inside', 'each', 'cargo', 'container.', 'While', 'the', 'death', 'of', 'the', 'foreign', 'worker', 'is', 'to', 'be', 'mourned', ',', 'many', 'farmers', 'will', 'agree', 'with', 'me', 'that', 'the', 'situation', 'is', 'only', 'a', 'very', 'extreme', 'and', 'minor', 'view', 'of', 'the', 'farming', 'sector.By', 'Ko', 'Jun-tae', '(', 'ko.juntae', '@', 'heraldcorp.com', ')']
'''
```
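`TreebankWordTokenizer` applies a sequence of regular-expression substitutions that pad punctuation with spaces, then splits on whitespace. Two of its rules explain the output above:

- A period is split off only when it ends the whole input string, which is why `container.` and `sector.By` stay intact.
- `@` is always padded, which breaks the e-mail address into three tokens.

The hypothetical `mini_treebank` below reproduces just four of those rules, in NLTK's order:

- commas and colons
- the symbols `;@#$%&`
- the final period
- brackets

It is a simplified sketch, not NLTK's full rule set, which also handles quotes, contractions, `?`/`!` and more.

```python
import re

# Hypothetical subset of Treebank-style rules, applied in order: (pattern, replacement)
RULES = [
    (re.compile(r"([:,])([^\d])"), r" \1 \2"),                     # comma/colon unless a digit follows
    (re.compile(r"([;@#$%&])"), r" \1 "),                          # always pad these symbols
    (re.compile(r'([^\.])(\.)([\]\)}>"\']*)\s*$'), r"\1 \2\3 "),  # period only at end of input
    (re.compile(r"[\]\[\(\)\{\}<>]"), r" \g<0> "),                 # pad brackets and parentheses
]

def mini_treebank(text):
    for pattern, repl in RULES:
        text = pattern.sub(repl, text)
    return text.split()

print(mini_treebank("Each cargo container. While we agree."))
# ['Each', 'cargo', 'container.', 'While', 'we', 'agree', '.']
print(mini_treebank("Ko Jun-tae (ko.juntae@heraldcorp.com)"))
# ['Ko', 'Jun-tae', '(', 'ko.juntae', '@', 'heraldcorp.com', ')']
```

The final-period rule runs before bracket padding. A closing parenthesis right after the last period, as in `agree.)`, therefore still lets the period be split.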