bro's coding
sklearn.textdata. Comparing the vocabulary with a sentence
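# Compare the vocabulary with a sentence (document 0)
# Slow version: get_feature_names() is called again on every iteration.
# for i in range(X_train[0].shape[1]):
#     if X_train[0,i]>0:
#         print(i,vect.get_feature_names()[i],X_train[0,i])
# Faster: call get_feature_names() once and keep the result in a variable.
# (scikit-learn 1.2 and later removed this method; use get_feature_names_out() there.)
feature_name=vect.get_feature_names()
for i in range(X_train[0].shape[1]):
    if X_train[0,i]>0:
        print(i,feature_name[i],X_train[0,i])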
'''
1723 actions 1
1741 actors 1
2880 almost 1
3375 and 2
3859 anything 1
4269 are 1
6512 be 1
6852 being 2
7288 better 2
7341 beyond 1
7716 bizarre 1
8922 boys 1
10096 but 2
10809 captures 1
12845 civility 1
12958 classmates 1
13888 commit 1
13907 common 1
15414 coupled 1
16986 day 1
17219 decided 1
17460 define 1
18214 destruction 1
18588 did 1
19634 do 1
21607 elephant 1
23059 even 1
23541 explaining 1
24147 far 2
24904 film 2
24942 filmmaker 1
25360 flawed 1
25839 for 1
26582 from 1
27726 gets 1
28034 given 1
29807 had 1
30570 have 1
31970 honest 1
31972 honesty 1
32540 humans 1
33505 in 3
35099 is 4
35211 it 5
38653 leads 1
39336 likely 1
42764 men 1
43993 mode 1
44193 money 1
44676 motives 1
44779 movie 1
45110 murderers 1
45209 must 1
45268 mutual 2
46714 not 1
47352 of 4
47900 order 1
48156 our 1
48610 own 1
49947 perfect 1
52605 product 1
54367 rationalistic 1
54503 re 1
55513 remarkable 1
59385 see 1
61440 skin 1
61588 slaughtering 1
65104 suicide 2
67035 terms 2
67049 terrible 1
67198 than 2
67222 that 1
67244 the 3
67280 their 2
67409 they 1
67468 think 2
67883 time 1
68091 to 4
69757 two 3
70279 under 1
72211 via 2
73731 what 3
73935 who 1
73998 why 1
74378 with 1
74379 withdraw 1
74699 world 1
74762 would 1
75381 you 2
75392 young 2
75669 zero 1
'''
# Count the words in each sentence (document)
(X_train>0).sum(axis=1) # distinct words (duplicates counted once)
matrix([[ 91],
[120],
[ 62],
...,
[117],
[121],
[211]])
X_train.sum(axis=1) # total words (duplicates counted every time)
matrix([[127],
[192],
[ 79],
...,
[184],
[178],
[361]], dtype=int64)
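Both sums come back as a two-dimensional matrix with one column. A small sketch of the same two counts on a hand-made matrix, flattened to one-dimensional arrays, may make the difference easier to see. The matrix values and the names distinct and total are hypothetical, chosen only for this example.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical count matrix: 2 documents, 4 vocabulary words.
X = csr_matrix(np.array([[2, 0, 1, 0],
                         [0, 3, 1, 1]]))

# (X > 0) marks each word that occurs at least once, so the row sum
# is the number of distinct words in the document.
distinct = np.asarray((X > 0).sum(axis=1)).ravel()

# Summing the raw counts gives the total number of words, repeats included.
total = np.asarray(X.sum(axis=1)).ravel()

print(distinct)  # [2 3]
print(total)     # [3 5]
```

np.asarray(...).ravel() only changes the shape of the result, which can be convenient when the counts are passed on to plotting or further NumPy code.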