bro's coding

sklearn.feature_extraction.text.CountVectorizer: applying ngram_range

givemebro 2020. 4. 28. 15:23

n-gram: a vocabulary entry made of n consecutive words

ex) s='I am Tam'
2-gram: [I am], [am Tam]
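
To make the definition concrete, here is a minimal sketch in plain Python that builds word n-grams by sliding a window of n words over the sentence. The helper name `word_ngrams` is made up for this illustration and is not part of scikit-learn; splitting on whitespace is a simplifying assumption.

```python
def word_ngrams(text, n):
    """Return every run of n consecutive whitespace-separated words in text."""
    words = text.split()
    # Slide a window of length n across the word list.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


s = "I am Tam"
bigrams = word_ngrams(s, 2)
print(bigrams)  # ['I am', 'am Tam']
```

A sentence with fewer than n words simply yields an empty list.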

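from sklearn.feature_extraction.text import CountVectorizer

# text_train is assumed to be a list of training documents loaded earlier
# ngram_range=(1,2): build the vocabulary from unigrams and bigrams
vect=CountVectorizer(ngram_range=(1,2))
X_train=vect.fit_transform(text_train)
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
len(vect.get_feature_names_out())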
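
One detail worth checking on the 'I am Tam' example: with its default settings CountVectorizer lowercases the text and ignores one-character tokens, so "I" never reaches the vocabulary. The sketch below compares the defaults with a custom `token_pattern` that keeps single characters; the variable names are only for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["I am Tam"]

# Default settings: text is lowercased and one-character tokens such as "I" are dropped.
default_vect = CountVectorizer(ngram_range=(1, 2)).fit(docs)
default_features = sorted(default_vect.vocabulary_)

# A token_pattern that also accepts one-character tokens.
custom_vect = CountVectorizer(ngram_range=(1, 2), token_pattern=r"(?u)\b\w+\b").fit(docs)
custom_features = sorted(custom_vect.vocabulary_)

print(default_features)  # ['am', 'am tam', 'tam']
print(custom_features)   # ['am', 'am tam', 'i', 'i am', 'tam']
```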

# Quick check of the vocabulary entries made of two words (three equivalent loops)

#1
# count=0
# for key,value in vect.vocabulary_.items():
#     # value=vect.vocabulary_[key]
#     if len(key.split())>1:
#         print(key,value)
#         count+=1
#     if count>=100:break

#2
# count=0
# for key in vect.vocabulary_:
#     value=vect.vocabulary_[key]
#     if len(key.split())>1:
#         print(key,value)
#         count+=1
#     if count>=100:break

#3
# count=0
# for i,(key,value) in enumerate(vect.vocabulary_.items()):
# #     value=vect.vocabulary_[key]
#     if len(key.split())>1:
#         print(i,key,value) # i: position in the dict's iteration order
#         count+=1
#     if count>=100:break
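
The three loops above can also be written without a manual counter. The following is a sketch, not the original code: `sample_texts` is a made-up stand-in for `text_train`, and `itertools.islice` caps the output at 100 entries in place of `count` and `break`.

```python
from itertools import islice

from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for text_train so the snippet runs on its own.
sample_texts = ["the movie was great", "the movie was not great"]

vect = CountVectorizer(ngram_range=(1, 2)).fit(sample_texts)

# Lazily select vocabulary entries whose key contains more than one word.
bigram_items = (
    (term, idx) for term, idx in vect.vocabulary_.items() if len(term.split()) > 1
)

# Take at most the first 100 matches.
first_bigrams = list(islice(bigram_items, 100))
for term, idx in first_bigrams:
    print(term, idx)
```

Here `idx` is the column index of the term in the matrix returned by `transform`, the same value the loops above print as `value`.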

# Apply TF-IDF and LogisticRegression to the ngram_range=(1,2) vocabulary
from sklearn.feature_extraction.text import TfidfVectorizer
# min_df=5: keep only terms that appear in at least 5 documents
vect=TfidfVectorizer(min_df=5,ngram_range=(1,2))
X_train=vect.fit_transform(text_train)
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
# cv=3 matches the three scores shown below (the default is 5 folds since scikit-learn 0.22)
scores=cross_val_score(LogisticRegression(),X_train,y_train,cv=3)
scores

array([0.88540917, 0.89020878, 0.88754201])
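
A possible refinement: in the code above the vectorizer is fit on all of text_train before cross-validation, so each validation fold has already influenced the vocabulary and IDF weights. Wrapping both steps in a pipeline refits the vectorizer inside every fold. The sketch below assumes a tiny made-up dataset (`toy_texts`, `toy_labels`) and leaves out min_df=5 because the toy corpus is too small for it; `max_iter=1000` is only there to avoid convergence warnings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for text_train / y_train.
toy_texts = [
    "great movie", "really great acting", "great fun", "loved this great film",
    "bad movie", "really bad acting", "bad and boring", "hated this bad film",
]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]

# The vectorizer is refit on the training part of each fold only.
pipe = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
scores = cross_val_score(pipe, toy_texts, toy_labels, cv=2)
print(scores)
```

On real data the same pipeline would take text_train and y_train directly, with min_df=5 restored on the vectorizer.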
