Wednesday, August 17, 2011

Weird Office Excel Date System

Before coding NPOI, I always thought Excel use standard file date system and no doubt on its design. However, several days ago, I found that I had to convert date number read from raw stream to REAL date system because Excel use a very different date system, which are called 1900 and 1904 date system.

Here is a description from KB180162:

The 1900 Date System

In the 1900 date system, the first day that is supported is January 1, 1900. When you enter a date, the date is converted into a serial number that represents the number of elapsed days since January 1, 1900. For example, if you enter July 5, 1998, Microsoft Excel converts the date to the serial number 35981.
By default, Microsoft Excel for Windows and Microsoft Excel for Windows NT use the 1900 date system. The 1900 date system allows greater compatibility between Microsoft Excel and other spreadsheet programs, such as Lotus 1-2-3, that are designed to run under MS-DOS or Microsoft Windows.

The 1904 Date System

In the 1904 date system, the first day that is supported is January 1, 1904. When you enter a date, the date is converted into a serial number that represents the number of elapsed days since January 1, 1904. For example, if you enter July 5, 1998, Microsoft Excel converts the date to the serial number 34519.
By default, Microsoft Excel for the Macintosh uses the 1904 date system. Because of the design of early Macintosh computers, dates before January 1, 1904 were not supported; this design was intended to prevent problems related to the fact that 1900 was not a leap year. Note that if you switch to the 1900 date system, Microsoft Excel for the Macintosh does support dates as early as January 1, 1900.

 

As above says,  there are two kinds of date system in Excel! What a hell!

1904 date system is always 1,462 days bigger than 1900 date system, which sounds simple but actually it’s not as simple as we think!

As we know, four years and one day=365*4+1=1461. So where is another day from? In fact, it is a Date bug in Excel according to KB214326. Excel treats 1900 as a leap year which means Feb. 29, 1900 is a valid day. That’s why 1904 date system is 1462 days bigger than 1900 date system. As KB214326 says, Microsoft is not willing to fix this bug as this bug will make Excel date system compatible with Lotus 1-2-3 date system.

So if you read a date serial number from raw Excel file, don’t forget to decrease 1 day.

 

By the way, to determine whether your Excel file uses 1904 date system, you can read 1904 Date System record(22h) from raw xls stream.

Offset Field Name Size Contents
4 f1904 2 =1 if the 1904 date system is used

Reference: Office Excel 97-2007 Binary File Format Specification, page 34/349

No comments:

Post a Comment