Saturday, September 12, 2009

Open Document Format + C# = <3

So you want to manipulate ODF documents with C#? That was exactly what I wanted to do this morning, and after some quick research I found a solution.

Solution 1: The Hard Way (tm)
Turns out that the .odt and .ods file types (.ods being the one I was interested in) are basically ZIP archives, which any archive software that supports the format can decompress. Inside the archives you'll find a bunch of XML files. The file content.xml is where the goodies are: for a spreadsheet it holds the sheets, rows and cells. The hard part is parsing the XML and getting something useful out of it. This I leave to the reader, since I wasn't able to get a good result in a hurry.
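For the curious, here is a minimal sketch of what The Hard Way could look like. It is not what this post ends up using, and it assumes .NET 4.5 or later (for System.IO.Compression.ZipArchive) plus LINQ to XML. It relies on the ODF layout of content.xml: each sheet is a table:table element, rows are table:table-row, cells are table:table-cell; numeric cells carry their value in an office:value attribute, while string cells keep their text in text:p children. OdsReader and ReadFirstSheet are hypothetical names made up for this example, and the sketch ignores repeat attributes such as table:number-columns-repeated, so a run of identical cells comes back only once.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper (not part of AODL): reads the first sheet straight out of an .ods archive.
public static class OdsReader
{
    // Namespaces defined by the OpenDocument specification.
    static readonly XNamespace Office = "urn:oasis:names:tc:opendocument:xmlns:office:1.0";
    static readonly XNamespace Table = "urn:oasis:names:tc:opendocument:xmlns:table:1.0";
    static readonly XNamespace Text = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    // Returns one list of cell strings per table:table-row in the first sheet.
    // Repeat attributes (table:number-columns-repeated etc.) are ignored.
    public static List<List<string>> ReadFirstSheet(Stream odsStream)
    {
        XDocument content;
        // leaveOpen: true so the caller keeps ownership of the stream.
        using (var zip = new ZipArchive(odsStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = zip.GetEntry("content.xml");
            if (entry == null)
                throw new InvalidDataException("content.xml not found; not an ODF file?");
            using (Stream s = entry.Open())
                content = XDocument.Load(s);
        }

        XElement sheet = content.Descendants(Table + "table").First();
        var rows = new List<List<string>>();
        foreach (XElement row in sheet.Descendants(Table + "table-row"))
        {
            var cells = new List<string>();
            foreach (XElement cell in row.Elements(Table + "table-cell"))
            {
                // Numeric cells (float, percentage, currency) carry office:value.
                // String cells have no office:value, so fall back to the text:p children.
                string value = (string)cell.Attribute(Office + "value")
                    ?? string.Join("\n", cell.Elements(Text + "p").Select(p => p.Value));
                cells.Add(value);
            }
            rows.Add(cells);
        }
        return rows;
    }
}
```

To use it, open the file with File.OpenRead("/home/god/blahblah.ods"), pass the stream to ReadFirstSheet and print each row with string.Join("\t", row). Preferring office:value over the displayed text means a float cell shown as "42.00" comes back as its raw value "42".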

Solution 2: The Easy Way (tm)
Turns out that someone has already done all the work for us and put it in a small library called AODL. I recommend downloading the binary version (fewer files); however, the documentation lives with the source code in the src archive, so you might as well grab that too while you're at it. Though this library seemed great at first, I quickly ran into a problem. But before we go into that, let's open a document.


```csharp
using System;
using AODL.Document.SpreadsheetDocuments;
using AODL.Document.Content;
using AODL.Document.Content.Tables;

// [...]

SpreadsheetDocument doc = new SpreadsheetDocument();
doc.Load("/home/god/blahblah.ods");
RowCollection rows = doc.TableCollection[0].RowCollection; // rows of the first sheet in the document

foreach (Row row in rows)
{
    foreach (Cell cell in row.Cells)
    {
        Console.WriteLine(cell.OfficeValue); // expected to print the value of this cell
    }
}

// [...]
```

You might expect this piece of code to print the contents of every cell in the document. You would be sadly mistaken. As I found out, it printed everything except string values: floats, ints and so on printed fine, but whenever a cell contained a string, WriteLine printed an empty line. After some fruitless googling and an hour or two of desperate screaming, I finally came to the rather embarrassing conclusion that what looked like an empty string was actually null. Then, after some quick IntelliSensing (well, MonoDevelop's version of it anyway), I produced the following.


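```csharp
// [...]
foreach (Row row in rows)
{
    foreach (Cell cell in row.Cells)
    {
        string type = cell.OfficeValueType; // get the type of the value, e.g. "string"
        // == instead of type.Equals(...), so cells without a value type (null) don't throw
        if (type == "string")
        {
            // String cells: read the text from the cell's first content node
            Console.WriteLine(cell.Content[0].Node.InnerText);
        }
        else
        {
            Console.WriteLine(cell.OfficeValue);
        }
    }
}
// [...]
```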
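Why cell.OfficeValue (which is of type string) doesn't work for strings escapes me, but at least there is a quick fix. A likely reason is the file format itself: in ODF, a numeric cell stores its value in an office:value attribute, while a string cell has no office:value attribute and keeps its text in text:p elements inside the cell, so a property that presumably reads office:value has nothing to return. There are probably other solutions; if you know one, please leave a comment.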

Friday, September 11, 2009

New author

I'd like to take this opportunity to introduce myself. I am manneorama: developer, death metal connoisseur and all in all a pleasant man. Welcome, myself, to volatileint.