<Python, pandas> duplicated - ねこゆきのメモ

To find duplicated rows, use duplicated().

In [85]: import pandas as pd

In [86]: df = pd.DataFrame([[1,2],[1,3],[1,4]])

In [87]: df
Out[87]: 
   0  1
0  1  2
1  1  3
2  1  4

Now for duplicated().
The first argument (subset) can be a column label, which restricts the comparison to that column.

In [88]: df.duplicated()
Out[88]: 
0    False
1    False
2    False
dtype: bool

In [89]: df.duplicated(0)
Out[89]: 
0    False
1     True
2     True
dtype: bool

In [90]: df.duplicated(1)
Out[90]: 
0    False
1    False
2    False
dtype: bool
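
The same calls can be written as a standalone script. This is only a minimal sketch: the column names "a" and "b" are made up for illustration (the session above uses the default integer labels 0 and 1), and passing a list to subset is an extra case not shown in the session.

```python
import pandas as pd

# Same data as in the session above, but with hypothetical column names.
df = pd.DataFrame([[1, 2], [1, 3], [1, 4]], columns=["a", "b"])

# No subset: a row is a duplicate only if all of its columns match an earlier row.
all_cols = df.duplicated()

# Single column label: only column "a" is compared.
by_a = df.duplicated(subset="a")

# List of labels: rows are compared on the combination of "a" and "b".
by_a_b = df.duplicated(subset=["a", "b"])

print(all_cols.tolist())  # [False, False, False]
print(by_a.tolist())      # [False, True, True]
print(by_a_b.tolist())    # [False, False, False]
```

Column "a" is 1 in every row, so everything after the first row is flagged when only "a" is compared. Once "b" is included, no two rows match.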

To make the last occurrence the one marked False (with the earlier ones marked True), pass keep='last'.

In [91]: df.duplicated(0, keep='last')
Out[91]: 
0     True
1     True
2    False
dtype: bool
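
For comparison, here is a rough sketch of the keep options side by side. keep=False is not covered in the memo above; it is a third value the keep parameter accepts, and it flags every member of a duplicate group. The last step, filtering with the inverted mask, is just one possible way to use the result.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 3], [1, 4]])

# Default keep='first': the first occurrence is False, later ones are True.
first = df.duplicated(subset=0)

# keep='last': the last occurrence is False, earlier ones are True.
last = df.duplicated(subset=0, keep='last')

# keep=False: every row that has a duplicate is True.
every = df.duplicated(subset=0, keep=False)

print(first.tolist())  # [False, True, True]
print(last.tolist())   # [True, True, False]
print(every.tolist())  # [True, True, True]

# Inverting the mask keeps only the rows that are not flagged.
kept = df[~last]
print(kept.values.tolist())  # [[1, 4]]
```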

The manual:

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.duplicated.html