After I reviewed the code in HyperLink Record in POI, I notice that a small bug exists while reading the url. To fix the bug, I tried to find some document about it but faild. The Hyperlink stream is undocumented in the Microsoft Excel binary format document (though I don’t know the reason).
To help others understand it well, I’d like to summarize it in this post.
//link to http://www.lakings.com/ byte[] data1 = {
0x02, 0x00, //First row of the hyperlink 0x02, 0x00, //Last row of the hyperlink 0x00, 0x00, //First column of the hyperlink 0x00, 0x00, //Last column of the hyperlink //16-byte GUID. Seems to be always the same. Does not depend on the hyperlink type (byte)0xD0, (byte)0xC9, (byte)0xEA, 0x79, (byte)0xF9, (byte)0xBA, (byte)0xCE, 0x11, (byte)0x8C, (byte)0x82, 0x00, (byte)0xAA, 0x00, 0x4B, (byte)0xA9, 0x0B,
0x02, 0x00, 0x00, 0x00, //integer, always 2 // flags. Define the type of the hyperlink: // HyperlinkRecord.HLINK_URL | HyperlinkRecord.HLINK_ABS | HyperlinkRecord.HLINK_LABEL 0x17, 0x00, 0x00, 0x00,
0x08, 0x00, 0x00, 0x00, //length of the label including the trailing '' //label: 0x4D, 0x00, 0x79, 0x00, 0x20, 0x00, 0x4C, 0x00, 0x69, 0x00, 0x6E, 0x00, 0x6B, 0x00, 0x00, 0x00, //16-byte link moniker: HyperlinkRecord.URL_MONIKER (byte)0xE0, (byte)0xC9, (byte)0xEA, 0x79, (byte)0xF9, (byte)0xBA, (byte)0xCE, 0x11, (byte)0x8C, (byte)0x82, 0x00, (byte)0xAA, 0x00, 0x4B, (byte)0xA9, 0x0B, //count of bytes in the address including the tail 0x48, 0x00, 0x00, 0x00, //integer //the actual link, terminated by '\u0000' 0x68, 0x00, 0x74, 0x00, 0x74, 0x00, 0x70, 0x00, 0x3A, 0x00, 0x2F, 0x00, 0x2F, 0x00, 0x77, 0x00, 0x77, 0x00, 0x77, 0x00, 0x2E, 0x00, 0x6C, 0x00, 0x61, 0x00, 0x6B, 0x00, 0x69, 0x00, 0x6E, 0x00, 0x67, 0x00, 0x73, 0x00, 0x2E, 0x00, 0x63, 0x00, 0x6F, 0x00, 0x6D, 0x00, 0x2F, 0x00, 0x00, 0x00, //standard 24-byte tail of a URL link. Seems to always be the same for all URL HLINKs 0x79, 0x58, (byte)0x81, (byte)0xF4, 0x3B, 0x1D, 0x7F, 0x48, (byte)0xAF, 0x2C, (byte)0x82, 0x5D, (byte)0xC4, (byte)0x85, 0x27, 0x63, 0x00, 0x00, 0x00, 0x00, (byte)0xA5, (byte)0xAB, 0x00, 0x00};
I copied the above data of a URL-type hyperlink from TestHyperlinkRecord.cs in NPOI.
begin offset | length (bytes) | Description |
0 | 2 | First row number |
2 | 2 | Last row number |
4 | 2 | First column number |
6 | 2 | Last column number |
8 | 16 | GUID |
24 | 4 | unknown field, always equals 2 |
28 | 4 | type of the hyperlink,
HyperlinkRecord.HLINK_URL | HyperlinkRecord.HLINK_ABS | HyperlinkRecord.HLINK_LABEL |
32 | 4 | length of label including the trailing ‘’ |
36 | variable1 | label string |
36+variable1 | 16 | link moniker |
52+variable1 | 4 | count of bytes in the address including the tail |
56+variable1 | variable2 | the actual link terminated by ‘\u0000’ |
56+variable1+variable2 | 24 | standard 24-byte tail of a URL link. it seems to be always same |
The tail of a url link is always as follows:
public static byte[] URL_TAIL = {0x79, 0x58, (byte)0x81, (byte)0xF4, 0x3B, 0x1D, 0x7F, 0x48, (byte)0xAF, 0x2C,
(byte)0x82, 0x5D, (byte)0xC4, (byte)0x85, 0x27, 0x63, 0x00, 0x00, 0x00,
0x00, (byte)0xA5, (byte)0xAB, 0x00, 0x00};
The tail of a file link is always as follows:
public static byte[] FILE_TAIL = {(byte)0xFF, (byte)0xFF, (byte)0xAD, (byte)0xDE, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
The link moniker are always same, too.
public static byte[] STD_MONIKER = {(byte)0xD0, (byte)0xC9, (byte)0xEA, 0x79, (byte)0xF9, (byte)0xBA, (byte)0xCE, 0x11,
(byte)0x8C, (byte)0x82, 0x00, (byte)0xAA, 0x00, 0x4B, (byte)0xA9, 0x0B };
public static byte[] URL_MONIKER = {(byte)0xE0, (byte)0xC9, (byte)0xEA, 0x79, (byte)0xF9, (byte)0xBA, (byte)0xCE, 0x11,
(byte)0x8C, (byte)0x82, 0x00, (byte)0xAA, 0x00, 0x4B, (byte)0xA9, 0x0B };
public static byte[] FILE_MONIKER = { 0x03, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, (byte)0xC0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x46 };
The bug exists in the HyperlinkRecord constructor with only one parameter RecordInputStream. Here is the code:
if (Arrays.Equals(URL_MONIKER, moniker))
{
int len = in1.ReadInt();
address = in1.ReadUnicodeLEString( len /2);
tail = in1.ReadRemainder();
}
The code reads “tail data” after reading the address. However, the length readed with in1.ReadInt() is the length including the tail data. So the correct length should be (len-tail_length)/2. The correct code is as follows:
if (Arrays.Equals(URL_MONIKER, moniker))
{
int len = in1.ReadInt();
address = in1.ReadUnicodeLEString( (len - URL_TAIL.Length)/2); //minus the length of tail
tail = in1.ReadRemainder();
}
No comments:
Post a Comment