bro's coding
Applying ngram_range with sklearn.feature_extraction.text.CountVectorizer
givemebro 2020. 4. 28. 15:23
n-gram: a vocabulary entry made of n consecutive words
ex) s='I am Tam'
2-gram: [I am], [am Tam]
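The definition above can be sketched in plain Python. The helper name `word_ngrams` is made up for this illustration and is not part of scikit-learn; it just splits on whitespace and joins each run of n neighboring words.

```python
def word_ngrams(text, n):
    """Return every run of n consecutive whitespace-separated words in text."""
    tokens = text.split()
    # Slide a window of width n across the token list
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


bigrams = word_ngrams("I am Tam", 2)
print(bigrams)  # ['I am', 'am Tam']
```

Note that CountVectorizer tokenizes differently by default: it lowercases the text and ignores single-character tokens such as "I".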
from sklearn.feature_extraction.text import CountVectorizer
vect=CountVectorizer(ngram_range=(1,2))
X_train=vect.fit_transform(text_train)
len(vect.get_feature_names_out())  # get_feature_names() was removed in scikit-learn 1.2
1522634
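To see why the vocabulary gets so large with ngram_range=(1,2), here is a small sketch on a toy corpus. The two sentences in `docs` are an assumption made up for illustration; they are not the `text_train` data used in this post.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus, not the text_train used in this post
docs = ["I am Tam", "Tam is here"]

# Unigrams only vs. unigrams plus bigrams
unigram_vect = CountVectorizer(ngram_range=(1, 1)).fit(docs)
uni_bi_vect = CountVectorizer(ngram_range=(1, 2)).fit(docs)

# The default tokenizer lowercases and drops one-character tokens like "I"
print(sorted(unigram_vect.vocabulary_))
# ['am', 'here', 'is', 'tam']
print(sorted(uni_bi_vect.vocabulary_))
# ['am', 'am tam', 'here', 'is', 'is here', 'tam', 'tam is']
```

Even here the vocabulary grows from 4 to 7 entries once bigrams are included, and on a real corpus most bigrams are rare, which is why the count above climbs so high.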
# Quick check of the vocabulary entries made of two words (three equivalent ways)
#1
# count=0
# for key,value in vect.vocabulary_.items():
# #     value=vect.vocabulary_[key]
#     if len(key.split())>1:
#         print(key,value)
#         count+=1
#     if count>=100:break
#2
# count=0
# for key in vect.vocabulary_:
#     value=vect.vocabulary_[key]
#     if len(key.split())>1:
#         print(key,value)
#         count+=1
#     if count>=100:break
#3
# count=0
# for i,(key,value) in enumerate(vect.vocabulary_.items()):
# #     value=vect.vocabulary_[key]
#     if len(key.split())>1:
#         print(i,key,value) # i : position in the dict's iteration order
#         count+=1
#     if count>=100:break
# Apply TF-IDF and LogisticRegression with ngram_range=(1,2)
from sklearn.feature_extraction.text import TfidfVectorizer
vect=TfidfVectorizer(min_df=5,ngram_range=(1,2))
X_train=vect.fit_transform(text_train)
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
scores=cross_val_score(LogisticRegression(),X_train,y_train)
scores
array([0.88540917, 0.89020878, 0.88754201])
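As a variation on the code above, the vectorizer and the classifier can be wrapped in one pipeline so that TF-IDF is refit inside each cross-validation fold. This is only a sketch under assumptions: `texts` and `labels` are a made-up toy dataset, `min_df=1` replaces the `min_df=5` used in the post because the toy corpus is tiny, and `cv=3` is set explicitly.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical toy data (1 = positive, 0 = negative), not the post's dataset
texts = [
    "good movie", "great film", "really good acting",
    "great story", "good fun", "great cast",
    "bad movie", "awful film", "really bad acting",
    "awful story", "bad plot", "awful cast",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# min_df=1 only because the toy corpus is tiny; the post uses min_df=5
pipe = make_pipeline(
    TfidfVectorizer(min_df=1, ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)

# The vectorizer is fit on the training part of each fold only
scores = cross_val_score(pipe, texts, labels, cv=3)
print(scores)
```

With this layout the vocabulary and IDF weights never see the held-out fold, so the scores are a slightly more honest estimate than vectorizing all of the training text first.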