Hi Tom,
ElfData can help you parse complicated stuff, yes.
However, it's still going to be a hard problem (I imagine) with or
without ElfData. You'd need to write the parser yourself. ElfData just
makes writing that parser a bit easier (and definitely faster,
assuming you use ElfData correctly).
I'm not familiar with Excel's binary data format, unfortunately. I
actually did some paid work parsing Excel's XML export in the past,
and that was easy enough (considering).
As for the UTF-16 thing... converting that is fine, but I doubt the
entire file is UTF-16. More like it's binary with bits of UTF-16 mixed
in. Extracting the correct strings from the correct places will be
your main problem. Once you have those strings, converting them to
other encodings is simple.
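
To give a rough idea of the shape of the problem, here is a minimal
sketch in Python (not ElfData). It assumes the 128-byte little-endian
directory entry layout implied by the field list you quoted, with the
name stored as UTF-16LE. The names ENTRY, parse_entry and
filetime_to_datetime are made up for illustration, and the sample entry
is built by hand from the values in your dump rather than read from a
real file.

```python
import struct
from datetime import datetime, timedelta, timezone

# Assumed layout of one 128-byte directory entry (little-endian):
# name (64 bytes), name length in bytes, type, colour flag,
# left/right/child IDs, CLSID, user flags, create time, modify time,
# start sector, size.
ENTRY = struct.Struct("<64sHBBIII16sIQQIQ")

def filetime_to_datetime(ticks):
    """Convert 100 ns ticks since 1601-01-01 UTC; 0 means 'not set'."""
    if ticks == 0:
        return None
    epoch = datetime(1601, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(microseconds=ticks // 10)

def parse_entry(raw):
    """Pull the text and dates out of one binary directory entry."""
    (name, name_len, kind, _colour, _left, _right, child,
     _clsid, _flags, created, modified, start, size) = ENTRY.unpack(raw)
    # name_len counts bytes and includes the 2-byte null terminator.
    text = name[:max(name_len - 2, 0)].decode("utf-16-le")
    return {
        "name": text,
        "type": kind,
        "child": child,
        "created": filetime_to_datetime(created),
        "modified": filetime_to_datetime(modified),
        "start_sector": start,
        "size": size,
    }

# Hand-built sample entry using the values from the quoted dump.
sample_name = "Root Entry".encode("utf-16-le") + b"\x00\x00"
sample = ENTRY.pack(sample_name, len(sample_name), 5, 0,
                    0xFFFFFFFF, 0xFFFFFFFF, 1, bytes(16), 0,
                    0, 0x01BAB44B13921E80, 3, 0x240)

entry = parse_entry(sample)
print(entry["name"], entry["modified"])

# Once the string is extracted, re-encoding it is the easy part.
utf8_name = entry["name"].encode("utf-8")
```

The fixed-size binary fields do most of the work here. The UTF-16 text
is only the first 64 bytes of each entry, and decoding it is a single
call once you know where it starts and how long it is.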
On 16 Sep 2009, at 18:28, tsrdatatech wrote:
> Theo,
>
> Considering the capabilities of your plugin, do you think it would
> be a good match for parsing metadata from Excel files?
>
> This is something I am looking to do without using the Office
> classes. I would like to read the file in as a binary stream and try
> to glean the info that way. Stuff like the author of the file,
> creation date, last used by, last saved date, etc.
>
> I've been reading up on the file specs for this and you can read the
> info. I believe the encoding of these files is UTF-16.
>
> Is it possible to utilize your plugin to accomplish some of these
> tasks?
>
> I know the way the file spec reads it will show binary unicode lines
> of data like this:
>
> SID 0: Root Directory Entry
> _ab = ("R") (this should be "Root Entry")
> _cb = 0004 (4 bytes, includes double-null terminator)
> _mse = 05 (STGTY_ROOT)
> _bflags = 00 (DE_RED)
> _sidLeftSib = FFFFFFFF (none)
> _sidRightSib = FFFFFFFF (none)
> _sidChild = 00000001 (SID 1: "Storage 1")
> _clsid = 0067 6156 54C1 CE11 8553 00AA 00A1 F95B
> _dwUserFlags = 00000000 (n/a for STGTY_ROOT)
> _time[0] = CreateTime = 0000 0000 0000 0000 (none set)
> _time[1] = ModifyTime = 801E 9213 4BB4 BA01 (??)
> _sectStart = 00000003 (starting sector of MiniStream)
> _ulSize = 00000240 (length of MiniStream in bytes)
> _dptPropType = 0000 (n/a)
>
> 000400: 0052 0000 0000 0000 0000 0000 0000 0000 .R..............
> 000410: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000420: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000430: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000440: 0400 0500 FFFF FFFF FFFF FFFF 0100 0000 ................
> 000450: 0067 6156 54C1 CE11 8553 00AA 00A1 F95B .gaVT....S.....[
> 000460: 0000 0000 0000 0000 0000 0000 801E 9213 ................
> 000470: 4BB4 BA01 0300 0000 4002 0000 0000 0000 K.......@.......
>
> Do you think your plugin would make a good candidate for processing
> what I need?
>
> Thanks,
>
> Tom
--
http://elfdata.com/plugin/
"String processing, done right"