软件: Stata
我有两个数据集:一个是公司CEO(数据集1),另一个是已签署的商业协议(数据集2)。
数据集1的格式如下,按公司排序:
company 1, CEO name, start date, end date, etc.
company 1, CEO name, start date, end date, etc.
...
company 2, CEO name, start date, end date, etc. 数据集2的格式如下,按协议排序(每个协议有2-150个参与方):
agreement 1, party 1, party 1 accession date, party 2, party 2 accession date.
agreement 2, party 1, party 1 accession date, party 2, party 2 accession date. 我想写一段代码,针对每个CEO,统计CEO所在公司在他/她担任CEO期间签署的协议数量。
到目前为止,我已经用expand创建了一个首席执行官日数据集。
gen duration = enddate - startdate
expand duration -1
sort id startdate
by id: gen n = _n -1
gen day = startdate + n理想情况下,我会使用下面这样的代码:
collapse (count) agreement, by(id) 然而,数据集2将不同的各方作为不同的变量列出。公司1并不总是“第一方”,有时可能是“150方”。此外,每个缔约方可能有不同的加入日期。我需要一个循环来“扫描”数据集2中的协议,其中公司1作为协议的一方加入协议,加入日期位于公司1的首席执行官1是公司1的首席执行官的期间内。
我该怎么办?我需要创建一个循环吗?
发布于 2015-05-14 04:38:23
循环并不是绝对必要的。您可以尝试使用reshape和joinby
clear
set more off
*----- example data -----
// ceo data set
input ///
firm str15(ceo startd endd)
1 "pete" "01/04/1999" "05/12/2010"
1 "bill" "06/12/2010" "12/01/2011"
1 "lisa" "13/01/2011" "15/06/2014"
2 "mary" "01/04/1999" "05/12/2010"
2 "hank" "06/12/2010" "12/01/2011"
2 "mary" "13/01/2011" "15/06/2014"
3 "bob" "01/04/1999" "05/12/2010"
3 "john" "06/12/2010" "12/01/2011"
end
gen double startd2 = date(startd, "DMY")
gen double endd2 = date(endd, "DMY")
format %td startd2 endd2
drop startd endd
tempfile ceo
save "`ceo'"
clear
// agreement data set
input ///
agree party1 str15 p1acc party2 str15 p2acc
1 2 "09/12/2010" 3 "10/01/2011"
2 1 "05/06/1999" 2 "17/01/2011"
3 1 "06/06/1999" 3 "05/04/1999"
4 2 "07/01/2011" . ""
5 2 "08/01/2011" . ""
end
gen double p1accn = date(p1acc, "DMY")
gen double p2accn = date(p2acc, "DMY")
format %td p?accn
drop p?acc
*----- what you want -----
// reshape
gen i = _n
reshape long party p@accn, i(i)
rename (party paccn) (firm date)
order firm agree date
sort firm agree
drop i _j
// joinby
joinby firm using "`ceo'"
// find under which ceo, agreement was signed
gen tag = inrange(date, startd2, endd2)
list, sepby(firm)
// count
keep if tag
collapse (count) agreenum=tag, by(ceo firm)
list一个潜在的陷阱是joinby创建了太多的观察结果,导致内存耗尽。
如果您没有在Stata中使用日期的经验,请参阅help datetime。
(请注意我是如何为您的问题设置示例数据的。只要它能帮助别人,就能帮助你自己。)
https://stackoverflow.com/questions/30223987
复制相似问题