Situation 1: When the DATE value is separated by a hyphen, slash, dot, comma, or space
Such as:
1001 7-11-1995
1002 1/21/1962
1003 11.2.1952
1004 Sept 18, 1995
1005 Jun20,1997
1006 January 1 2000
1007 4/12.1990
1008 3/22/68
This file contains IDs and birth dates in various formats. The way to get dates into Stata is to read them as strings and then
convert the strings to Stata elapsed dates.
First, read them as strings:
infix id 1-4 str bday 6-20 using http://cdph.fsu.edu/people/minxing/date1.raw
Second, convert the strings to Stata elapsed dates by generating a new variable "edate":
. gen edate=date(bday,"mdy")
(1 missing value generated)
. list
+-------------------------------+
| id bday edate |
|-------------------------------|
1. | 1001 7-11-1995 12975 |
2. | 1002 1/21/1962 751 |
3. | 1003 11.2.1952 -2616 |
4. | 1004 Sept 18, 1995 13044 |
5. | 1005 Jun20,1997 13685 |
|-------------------------------|
6. | 1006 January 1 2000 14610 |
7. | 1007 4/12.1990 11059 |
8. | 1008 3/22/68 . |
+-------------------------------+
As you can see, Stata was able to handle almost all of these crazy date formats, as long as there are delimiters separating the month,
day, and year. It even handled Jun20,1997, although there is no delimiter between the month and day (Stata could
figure it out because the month is text and the day is a number). The only date that did not work was 3/22/68. It is set to
missing because of the two-digit year; it will convert if you tell Stata whether the year is 1968 or 2068 by putting the century in the mask, for example "md19y" (or "dm20y" for day-month dates in the 2000s).
Note that Stata 10 and later expect the mask in uppercase, e.g. "MDY" and "MD19Y".
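The two-digit-year fix can be sketched as follows. This is a minimal example, assuming Stata 10 or later, where the mask is written in uppercase ("MD19Y" rather than "md19y"); the variable names e19 and e20 are made up for illustration, and the single date is typed in with input so the example does not depend on the remote file.

```stata
* Minimal sketch (assumes Stata 10+ uppercase masks; e19 and e20 are illustrative names)
clear
input str10 bday
"3/22/68"
end
* "MD19Y" tells Stata that two-digit years belong to the 1900s
gen e19 = date(bday, "MD19Y")
* "MD20Y" would place the same string in the 2000s instead
gen e20 = date(bday, "MD20Y")
format e19 e20 %td
list
```

With the century supplied, neither new variable should be missing.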
Situation 2: What if month, day, and year run together in one numeric variable?
Such as:
. infile id long bday using http://cdph.fsu.edu/people/minxing/date2.raw
+-----------------+
| id bday |
|-----------------|
1. | 1001 7111995 |
2. | 1002 1211962 |
3. | 1003 11021952 |
+-----------------+
Your program could be
generate month = int(bday/1000000)
generate day = int((bday - month*1000000)/10000)
generate year = bday - month*1000000 - day*10000
generate elapdate = mdy(month, day, year)
list
The output from this program is
+-------------------------------------------------+
| id bday month day year elapdate |
|-------------------------------------------------|
1. | 1001 7111995 7 11 1995 12975 |
2. | 1002 1211962 1 21 1962 751 |
3. | 1003 11021952 11 2 1952 -2616 |
+-------------------------------------------------+
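As an alternative sketch of the same arithmetic, the mod() function can peel off the year and day without referring back to the other pieces. The data are typed in with input so that the example does not depend on the remote file; this is only one possible way to write it, not the program used above.

```stata
* Alternative sketch: split an mmddyyyy number with int() and mod()
clear
input id long bday
1001 7111995
1002 1211962
1003 11021952
end
* everything above the last six digits is the month
gen month = int(bday/1000000)
* drop the four year digits, then keep the last two remaining digits as the day
gen day = mod(int(bday/10000), 100)
* the last four digits are the year
gen year = mod(bday, 10000)
gen elapdate = mdy(month, day, year)
format elapdate %td
list
```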
Situation 3: What if month, day, and year run together in one STRING variable?
infile id str10 bday using http://cdph.fsu.edu/people/minxing/date3.raw
(3 observations read)
. list
+------------------+
| id bday |
|------------------|
1. | 1001 Jul111995 |
2. | 1002 Jan211962 |
3. | 1003 Nov021952 |
+------------------+
Now that we have a string variable “bday”, we can use a similar approach to create month, day, and year from it, this time with the substr() function:
. gen month = substr(bday,1,3)
. gen day = real(substr(bday,4,2))
. gen year = real(substr(bday,6,4))
. list
+----------------------------------------+
| id bday month day year |
|----------------------------------------|
1. | 1001 Jul111995 Jul 11 1995 |
2. | 1002 Jan211962 Jan 21 1962 |
3. | 1003 Nov021952 Nov 2 1952 |
+----------------------------------------+
Now we have three variables for month, day, and year. However, “month” is still a string variable, so we convert it to a numeric variable with the encode command:
. encode month,gen(month2)
. list,nolabel
+-------------------------------------------------+
| id bday month day year month2 |
|-------------------------------------------------|
1. | 1001 Jul111995 Jul 11 1995 2 |
2. | 1002 Jan211962 Jan 21 1962 1 |
3. | 1003 Nov021952 Nov 2 1952 3 |
+-------------------------------------------------+
We need the nolabel option to see that month2 really is numeric. Note that encode numbers the months in alphabetical order (Jan=1, Jul=2, Nov=3), not calendar order, so we recode month2 with the replace command:
. replace month2=7 if month2==2
. replace month2=1 if month2==1
. replace month2=11 if month2==3
. list,nolabel
+-------------------------------------------------+
| id bday month day year month2 |
|-------------------------------------------------|
1. | 1001 Jul111995 Jul 11 1995 7 |
2. | 1002 Jan211962 Jan 21 1962 1 |
3. | 1003 Nov021952 Nov 2 1952 11 |
+-------------------------------------------------+
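Recoding by hand becomes tedious once all twelve months appear, because encode numbers the strings alphabetically. A possible shortcut, sketched here as an alternative and not part of the original program, is to look up each three-letter abbreviation in a string of month names with strpos(). The variable names pos and month3 are hypothetical, and the sketch assumes a Stata version that provides strpos() and abbreviations spelled exactly as in the lookup string (first letter capitalized).

```stata
* Alternative sketch: map "Jan".."Dec" to 1..12 without encode/replace
clear
input id str10 bday
1001 "Jul111995"
1002 "Jan211962"
1003 "Nov021952"
end
gen month = substr(bday,1,3)
* a valid abbreviation starts at position 1, 4, 7, ... in the lookup string
gen pos = strpos("JanFebMarAprMayJunJulAugSepOctNovDec", month)
* convert that position to 1..12; unmatched or misaligned strings stay missing
gen month3 = (pos + 2)/3 if mod(pos, 3) == 1
list id month month3
```

Because the result is already in calendar order, month3 could be passed to mdy() directly in place of the recoded month2.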
Finally we can create elapdate:
. gen elapdate=mdy(month2,day,year)
. list
+-----------------------------------------------------------+
| id bday month day year month2 elapdate |
|-----------------------------------------------------------|
1. | 1001 Jul111995 Jul 11 1995 7 12975 |
2. | 1002 Jan211962 Jan 21 1962 1 751 |
3. | 1003 Nov021952 Nov 2 1952 11 -2616 |
+-----------------------------------------------------------+
If you are not comfortable with dates in Stata's elapsed-date format, you can format them as ordinary dates:
. format elapdate %d
. list elapdate
+-----------+
| elapdate |
|-----------|
1. | 11jul1995 |
2. | 21jan1962 |
3. | 02nov1952 |
+-----------+
Some applications of Stata elapsed dates
Elapsed dates are the format that Stata uses to manipulate date information. An elapsed date is calculated
as the number of days from January 1, 1960. This format is useful for adding or subtracting dates or for changing
the display format of date variables.
For example, if elapdate=0, the real date would be January 1, 1960; if elapdate=10, the real date would be January 11, 1960.
To display the elapsed-date value for a date (e.g. Dec 03, 1999), you may use:
. display d(03dec1999)
14581
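To check this reading of elapsed dates, one can display a few values directly. The sketch below assumes a Stata version in which the %td format and the td() pseudofunction are available (Stata 10 or later); older releases use %d and d() as in the text.

```stata
* day 0 is 1 January 1960
display %td 0
* ten days later
display %td 10
* two equivalent ways to obtain the elapsed value for 3 December 1999
display mdy(12,3,1999)
display td(03dec1999)
```

The first two lines should show 01jan1960 and 11jan1960, and the last two should both show 14581.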
Thinking about dates in this way has a big advantage: if you subtract two dates, you obtain the number of days
between them. For example:
1) Suppose we have a birthday variable and want to know the age on Jan 1, 2000:
. gen age = (mdy(1,1,2000)-birthday)/365.25
2) Or Mary was admitted to the hospital on 27mar1995 (12,869 by Stata's way of thinking) and released on 3apr1995 (12,876).
Mary was in the hospital 12,876-12,869 = 7 days.
3) Sam was born on 14jun1952 (-2,757 by Stata's count), and we want to know his age as of 18sep1995 (13,044).
Sam is 13,044-(-2,757) =15,801 days old or, if you prefer, 15,801/365.25 = 43.26 years old.
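The hospital-stay and age calculations above can be reproduced with mdy(), which avoids looking up the elapsed values by hand. This is a small sketch; dividing by 365.25 is the same approximation used in the text, not an exact age.

```stata
* Mary: days between admission (27mar1995) and release (3apr1995)
display mdy(4,3,1995) - mdy(3,27,1995)
* Sam: age in days as of 18sep1995
display mdy(9,18,1995) - mdy(6,14,1952)
* Sam: approximate age in years, rounded to two decimals
display %5.2f (mdy(9,18,1995) - mdy(6,14,1952))/365.25
```

The three lines should print 7, 15801, and 43.26, matching the figures in the text.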