Python, pandas: DataFrame operations (methods): count, describe

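Create a DataFrame. (np and pd are numpy and pandas, imported beforehand with import numpy as np and import pandas as pd.)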

In [41]: df = pd.DataFrame(np.random.randn(10,5))

In [42]: df
Out[42]:
          0         1         2         3         4
0  0.077817 -0.759317 -0.229288  1.330958  0.083665
1  0.356439 -0.977023  1.361504 -0.573934  0.391663
2  0.441791 -1.374674  2.725946  0.301030 -1.610476
3  0.344326  1.048565  0.425799 -0.458068  1.657750
4 -0.949965  0.460259  0.183424 -0.325335  1.090816
5  0.732572 -0.670865 -1.052756 -0.121706  1.520297
6  1.376255 -0.870515 -1.682764  1.858966 -1.593190
7  1.742534 -0.149480 -0.031331  0.292201 -2.005993
8 -1.571542  0.557061 -1.280689  0.350430  0.482291
9  0.661776 -0.581982 -0.720512  0.334272 -0.043694

In [43]: df.columns = list('abcde')

In [44]: df
Out[44]:
          a         b         c         d         e
0  0.077817 -0.759317 -0.229288  1.330958  0.083665
1  0.356439 -0.977023  1.361504 -0.573934  0.391663
2  0.441791 -1.374674  2.725946  0.301030 -1.610476
3  0.344326  1.048565  0.425799 -0.458068  1.657750
4 -0.949965  0.460259  0.183424 -0.325335  1.090816
5  0.732572 -0.670865 -1.052756 -0.121706  1.520297
6  1.376255 -0.870515 -1.682764  1.858966 -1.593190
7  1.742534 -0.149480 -0.031331  0.292201 -2.005993
8 -1.571542  0.557061 -1.280689  0.350430  0.482291
9  0.661776 -0.581982 -0.720512  0.334272 -0.043694
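For a repeatable version of this setup, a minimal sketch might look like the following. The seed value 0 and the use of np.random.default_rng are assumptions added for illustration; the session above used np.random.randn without a seed, so the numbers will not match.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed seed (0) so that the example is repeatable.
rng = np.random.default_rng(0)

# Pass the column labels at construction time instead of
# assigning df.columns afterwards.
df = pd.DataFrame(rng.standard_normal((10, 5)), columns=list("abcde"))

print(df.shape)
print(list(df.columns))
```

Assigning df.columns after the fact, as in the session, gives the same labels; passing columns= up front just saves a step.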

Next, describe(), which returns summary statistics for each column.

In [46]: df.describe()
Out[46]:
               a          b          c          d          e
count  10.000000  10.000000  10.000000  10.000000  10.000000
mean    0.321200  -0.331797  -0.030067   0.298881  -0.002687
std     0.982764   0.782483   1.318384   0.774907   1.324976
min    -1.571542  -1.374674  -1.682764  -0.573934  -2.005993
25%     0.144444  -0.842716  -0.969695  -0.274428  -1.205816
50%     0.399115  -0.626424  -0.130309   0.296615   0.237664
75%     0.714873   0.307824   0.365205   0.346391   0.938685
max     1.742534   1.048565   2.725946   1.858966   1.657750
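Each row of the describe() output corresponds to a DataFrame method of its own. The sketch below checks that correspondence on random data; the seed and the variable name summary are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Assumption: seeded random data standing in for the df above.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 5)), columns=list("abcde"))

summary = df.describe()

# Rows of the summary are indexed by statistic name, so .loc picks one out.
assert np.allclose(summary.loc["mean"], df.mean())
assert np.allclose(summary.loc["std"], df.std())      # sample std (ddof=1)
assert np.allclose(summary.loc["50%"], df.median())   # 50% is the median
assert (summary.loc["count"] == df.count()).all()

print(summary.loc[["mean", "std"]])
```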

The individual statistics are also available as separate methods.

In [47]: df.count()
Out[47]:
a    10
b    10
c    10
d    10
e    10
dtype: int64

In [48]: df.mean()
Out[48]:
a    0.321200
b   -0.331797
c   -0.030067
d    0.298881
e   -0.002687
dtype: float64

df.count() returns a Series, so the usual Series operations (boolean filtering, keys(), values, and so on) can be applied to the result.

In [49]: se=df.count()

In [50]: se[se == 0]
Out[50]: Series([], dtype: int64)

In [51]: se[se == 10]
Out[51]:
a    10
b    10
c    10
d    10
e    10
dtype: int64

In [52]: se.keys()
Out[52]: Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

In [53]: se.values
Out[53]: array([10, 10, 10, 10, 10])
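Filtering the count() result becomes more useful when the data has missing values, because count() only counts non-NA cells. The following is a small sketch with made-up data (the all-ones frame and the NaN positions are assumptions, not part of the session above) that lists the columns containing at least one missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a 10x5 frame of ones with a few cells set to NaN.
df = pd.DataFrame(np.ones((10, 5)), columns=list("abcde"))
df.loc[0, "b"] = np.nan
df.loc[[1, 2], "d"] = np.nan

# count() ignores NaN, so incomplete columns have a count below len(df).
se = df.count()
incomplete = se[se < len(df)]

print(incomplete)
```

With this data, incomplete holds the entries for columns b and d only.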

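Also, a trick for selecting columns according to their mean values. The key is boolean-mask indexing of the form df.loc[:, expression]. (Older pandas also accepted df.ix[:, expression], but .ix was deprecated in 0.20 and removed in pandas 1.0.)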

In [60]: se = df.mean()

In [61]: se
Out[61]:
a    0.321200
b   -0.331797
c   -0.030067
d    0.298881
e   -0.002687
dtype: float64

In [62]: df.loc[:, se > 0]
Out[62]:
          a         d
0  0.077817  1.330958
1  0.356439 -0.573934
2  0.441791  0.301030
3  0.344326 -0.458068
4 -0.949965 -0.325335
5  0.732572 -0.121706
6  1.376255  1.858966
7  1.742534  0.292201
8 -1.571542  0.350430
9  0.661776  0.334272

In [63]: df.loc[:, se < 0]
Out[63]:
          b         c         e
0 -0.759317 -0.229288  0.083665
1 -0.977023  1.361504  0.391663
2 -1.374674  2.725946 -1.610476
3  1.048565  0.425799  1.657750
4  0.460259  0.183424  1.090816
5 -0.670865 -1.052756  1.520297
6 -0.870515 -1.682764 -1.593190
7 -0.149480 -0.031331 -2.005993
8  0.557061 -1.280689  0.482291
9 -0.581982 -0.720512 -0.043694
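As a self-contained sketch of this column selection on current pandas, the example below uses df.loc with a boolean Series as the column mask. The small hand-written frame and the names means, positive and negative are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data chosen so the column means are easy to verify by hand.
df = pd.DataFrame(
    {
        "a": [1.0, 2.0, 3.0],    # mean  2.0
        "b": [-1.0, -2.0, 0.5],  # mean about -0.83
        "c": [0.5, -0.5, 3.0],   # mean  1.0
    }
)

means = df.mean()

# A boolean Series indexed by column label acts as a column mask in .loc.
positive = df.loc[:, means > 0]
negative = df.loc[:, means < 0]

print(list(positive.columns))
print(list(negative.columns))
```

The mask is aligned by column label, so it must come from the same frame (or at least carry the same labels) as the one being indexed.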

Reference: 10 minutes to pandas (tutorial in the official pandas documentation)