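Friends drive from state A to state H, passing through states B, C, D, E, F and G on the way. They swap drivers in state C and state F; these states are called "interchanges". I have the elapsed time between each pair of consecutive states. From that data I have found the time between each state and the final destination, and the time between each state and the next interchange. What I still need is the time between interchanges, i.e. the time between state C and state F. I need to do all three of these things per trip in a large dataset.
My initial plan was to get the time between interchanges by subtracting the second interchange's ETA from the first interchange's ETA. So, within a larger groupby object, how do I find the difference between the last row of one group and the last row of the next group? That is, within one Trip_Key, how do I find the difference between the last A1 ETA and the last A3 ETA? Thanks!
Here is the code that generates my dataframe: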
import pandas as pd

user_dict2 = {'A': {('A'):{'eta':0,'type':'T'},('B'):{'eta':1,'type':'T'},('C'):{'eta':2,'type':'I'},('D'):{'eta':3,'type':'T'},
                    ('E'):{'eta':4,'type':'T'},('F'):{'eta':5,'type':'I'},('G'):{'eta':6,'type':'T'},('H'):{'eta':7,'type':'T'}},
              'B': {('A'):{'eta':0,'type':'T'},('B'):{'eta':1,'type':'T'},('C'):{'eta':2,'type':'I'},('D'):{'eta':3,'type':'T'},
                    ('E'):{'eta':4,'type':'T'},('F'):{'eta':5,'type':'I'},('G'):{'eta':6,'type':'T'},('H'):{'eta':7,'type':'T'}}}
d = pd.DataFrame.from_dict({(i,j): user_dict2[i][j]
                            for i in user_dict2.keys()
                            for j in user_dict2[i].keys()},
                           orient='index')
d = d.reset_index()
d['Trip_Key'] = d['level_0']
d['State'] = d['level_1']
del d['level_0']
del d['level_1']
# Groupby Trip_Key and label where 'type' changes
d["e3"] = d.groupby('Trip_Key')["type"].shift(1)
d["e4"] = d["type"] != d["e3"]
d["e5"] = d.groupby('Trip_Key')["e4"].cumsum()
d.loc[d['type'] == 'I', 'e5'] = d['e5'].shift(1)
d['Inter_Key'] = d['Trip_Key'] + d['e5'].map(int).map(str)
del d['e3']
del d['e4']
del d['e5']
df = d
df['ETA_Shift'] = df.groupby('Trip_Key')['eta'].transform(lambda x: x.shift(-1))
df.fillna(0)  # note: the result is not assigned, so df still contains NaN
df['ETA_Sum'] = df.iloc[::-1].groupby('Trip_Key')['ETA_Shift'].cumsum()[::-1]
g = df.groupby('Trip_Key').last().reset_index()
df = df.merge(g[['Trip_Key','State']],on=['Trip_Key'],how='outer')
#df['Pair'] = '('+df['SPLC_x']+', '+df['SPLC_y']+')'
df = df.rename(columns={'State_x':'State',
'State_y':'Destination'})
df['ETI_Shift'] = df.groupby('Inter_Key')['eta'].transform(lambda x: x.shift(-1))
df.fillna(0)  # note: the result is not assigned, so df still contains NaN
df['ETI_Sum'] = df.iloc[::-1].groupby('Inter_Key')['ETI_Shift'].cumsum()[::-1]
g2 = df.groupby('Inter_Key').last().reset_index()
df = df.merge(g2[['Inter_Key','State']],on='Inter_Key',how='outer')
#df['Pair'] = '('+df['State_x']+', '+df['State_y']+')'
df = df.rename(columns={'State_x':'Origin',
'State_y':'Inter_Dest'})
del df['ETA_Shift']
del df['ETI_Shift']

Here is what it looks like:
+----+-------+--------+------------+----------+-------------+-----------+---------------+-----------+--------------+
| | eta | type | Trip_Key | Origin | Inter_Key | ETA_Sum | Destination | ETI_Sum | Inter_Dest |
|----+-------+--------+------------+----------+-------------+-----------+---------------+-----------+--------------|
| 0 | 0 | T | A | A | A1 | 28 | H | 3 | C |
| 1 | 1 | T | A | B | A1 | 27 | H | 2 | C |
| 2 | 2 | I | A | C | A1 | 25 | H | nan | C |
| 3 | 3 | T | A | D | A3 | 22 | H | 9 | F |
| 4 | 4 | T | A | E | A3 | 18 | H | 5 | F |
| 5 | 5 | I | A | F | A3 | 13 | H | nan | F |
| 6 | 6 | T | A | G | A5 | 7 | H | 7 | H |
| 7 | 7 | T | A | H | A5 | nan | H | nan | H |
| 8 | 0 | T | B | A | B1 | 28 | H | 3 | C |
| 9 | 1 | T | B | B | B1 | 27 | H | 2 | C |
| 10 | 2 | I | B | C | B1 | 25 | H | nan | C |
| 11 | 3 | T | B | D | B3 | 22 | H | 9 | F |
| 12 | 4 | T | B | E | B3 | 18 | H | 5 | F |
| 13 | 5 | I | B | F | B3 | 13 | H | nan | F |
| 14 | 6 | T | B | G | B5 | 7 | H | 7 | H |
| 15 | 7 | T | B | H | B5 | nan | H | nan | H |
+----+-------+--------+------------+----------+-------------+-----------+---------------+-----------+--------------+

Edit: the expected output is:
+----+--------+-------+------------+----------+-------------+-----------+---------------+-----------+--------------+--------------+-----------------------------+
| | type | eta | Trip_Key | Origin | Inter_Key | ETA_Sum | Destination | ETI_Sum | Inter_Dest | Inter_Time | Index ETA_Sum Subtraction |
|----+--------+-------+------------+----------+-------------+-----------+---------------+-----------+--------------+--------------+-----------------------------|
| 0 | T | 0 | A | A | A1 | 28 | H | 3 | C | 0 | 0 |
| 1 | T | 1 | A | B | A1 | 27 | H | 2 | C | 0 | 0 |
| 2 | I | 2 | A | C | A1 | 25 | H | 0 | C | 12 | 2-5 |
| 3 | T | 3 | A | D | A3 | 22 | H | 9 | F | 0 | 0 |
| 4 | T | 4 | A | E | A3 | 18 | H | 5 | F | 0 | 0 |
| 5 | I | 5 | A | F | A3 | 13 | H | 0 | F | 13 | 5-7 |
| 6 | T | 6 | A | G | A5 | 7 | H | 7 | H | 0 | 0 |
| 7 | T | 7 | A | H | A5 | 0 | H | 0 | H | 0 | 0 |
| 8 | T | 0 | B | A | B1 | 28 | H | 3 | C | 0 | 0 |
| 9 | T | 1 | B | B | B1 | 27 | H | 2 | C | 0 | 0 |
| 10 | I | 2 | B | C | B1 | 25 | H | 0 | C | 12 | 10-13 |
| 11 | T | 3 | B | D | B3 | 22 | H | 9 | F | 0 | 0 |
| 12 | T | 4 | B | E | B3 | 18 | H | 5 | F | 0 | 0 |
| 13 | I | 5 | B | F | B3 | 13 | H | 0 | F | 13 | 13-15 |
| 14 | T | 6 | B | G | B5 | 7 | H | 7 | H | 0 | 0 |
| 15 | T | 7 | B | H | B5 | 0 | H | 0 | H | 0 | 0 |
+----+--------+-------+------------+----------+-------------+-----------+---------------+-----------+--------------+--------------+-----------------------------+

Note: the column labeled "Index ETA_Sum Subtraction" is for illustration only.
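
To make the arithmetic behind Inter_Time concrete: each interchange row takes its own ETA_Sum minus the ETA_Sum of the next interchange in the same trip, and the last interchange subtracts 0, the value at the destination (so C gives 25 - 13 = 12 and F gives 13 - 0 = 13). The following is only a minimal sketch on a hand-typed copy of trip A, assuming the column names from the table; the frame `toy` and the choice of `shift(-1)` over the interchange rows are illustrative, not part of the original code.

```python
import pandas as pd

# Hand-typed copy of trip A from the table (ETA_Sum at H filled with 0).
toy = pd.DataFrame({
    "Trip_Key": ["A"] * 8,
    "Origin": list("ABCDEFGH"),
    "type": ["T", "T", "I", "T", "T", "I", "T", "T"],
    "ETA_Sum": [28, 27, 25, 22, 18, 13, 7, 0],
})

# Keep only interchange rows, then look one interchange ahead within each trip.
inter = toy[toy["type"] == "I"]
next_eta = inter.groupby("Trip_Key")["ETA_Sum"].shift(-1).fillna(0)

# Non-interchange rows get 0, as in the expected output.
toy["Inter_Time"] = 0
toy.loc[inter.index, "Inter_Time"] = (inter["ETA_Sum"] - next_eta).astype(int)

print(toy[["Origin", "type", "ETA_Sum", "Inter_Time"]])
```

Because the shift is done per Trip_Key, the same lines should carry over to a frame holding many trips, provided rows are in travel order within each trip.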
Posted on 2018-01-17 03:22:17
If I understand your question correctly, you can do this:
In [37]: df.groupby('Trip_Key').apply(lambda x: x[x.Inter_Key.str[-1] == '1'].iloc[-1].ETA_Sum - x[x.Inter_Key.str[-1] == '3'].iloc[-1].ETA_Sum)
Out[37]:
Trip_Key
A 12.0
B 12.0
dtype: float64

https://stackoverflow.com/questions/48233213
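
The one-liner in the answer hard-codes the Inter_Key suffixes '1' and '3', so it returns only the first interchange gap per trip. A hedged sketch of a more general variant: take the last ETA_Sum of every Inter_Key and difference consecutive groups within each Trip_Key. The frame `demo` is a hand-typed stand-in for `df` (with NaN already replaced by 0), and `sort=False` assumes the rows are already in travel order.

```python
import pandas as pd

# Stand-in for df: two trips, columns named as in the question's table.
eta_sum = [28, 27, 25, 22, 18, 13, 7, 0]
demo = pd.DataFrame({
    "Trip_Key": ["A"] * 8 + ["B"] * 8,
    "Inter_Key": ["A1"] * 3 + ["A3"] * 3 + ["A5"] * 2
               + ["B1"] * 3 + ["B3"] * 3 + ["B5"] * 2,
    "ETA_Sum": eta_sum * 2,
})

# Last ETA_Sum of each Inter_Key, kept in order of appearance (sort=False).
last = demo.groupby(["Trip_Key", "Inter_Key"], sort=False)["ETA_Sum"].last()

# Subtract the next group's last value within the same trip;
# the final group of each trip has no successor and is dropped.
gaps = (last - last.groupby(level="Trip_Key").shift(-1)).dropna()
print(gaps)
```

The result is a Series indexed by (Trip_Key, Inter_Key), which could be merged back onto the original frame on those two columns if a per-row column is wanted.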