On Accidental Serialization Formats

Let’s talk about the “just separate with a comma and stick it into one field” type of serialization.

You had two strings (abc and def) and you joined them with a separator. What do you have now? One string with two elements, right? Right, abc,def. Well… two or more actually, depending on how many times the chosen separator occurred in the original strings: if they were a,bc and def, you’ve got a,bc,def, which is 3 elements according to our format. Oops. And that is leaving aside the question of whether leading and trailing spaces are significant.
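
To make the failure concrete, here is a minimal Python sketch of the naive approach. The function names naive_join and naive_split are made up for illustration; they are not from any library.

```python
def naive_join(values, sep=","):
    # Join the values with the separator, no escaping at all.
    return sep.join(values)


def naive_split(serialized, sep=","):
    # Split on every occurrence of the separator.
    return serialized.split(sep)


original = ["a,bc", "def"]
serialized = naive_join(original)   # the two values collapse into "a,bc,def"
restored = naive_split(serialized)  # three elements come back, not two
```

The round trip silently changes the data: two values went in and three came out, with no error raised anywhere.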

Wanna add escaping for the separator then? a,bc and def are now serialized as a\,bc,def. Now the parsing has become more complex. You can’t just split the string by the separator (you would get 3 elements: a\ and bc and def). You need to scan the serialized data, taking escaping into account when splitting. You also need to remove the escape character from the result. How about escaping the escape character? If the original data is a\bc, it is serialized as a\\bc. Yet another thing not to forget.
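
A rough Python sketch of what the escaping variant ends up looking like, assuming a backslash as the escape character and a comma as the separator. The names escape_join and escape_split are hypothetical, and this is one possible way to write the scanner, not the only one.

```python
ESC = "\\"
SEP = ","


def escape_join(values, sep=SEP, esc=ESC):
    escaped = []
    for value in values:
        # Escape the escape character first, then the separator;
        # doing it the other way round would double-escape.
        value = value.replace(esc, esc + esc).replace(sep, esc + sep)
        escaped.append(value)
    return sep.join(escaped)


def escape_split(serialized, sep=SEP, esc=ESC):
    parts, current = [], []
    chars = iter(serialized)
    for ch in chars:
        if ch == esc:
            # The character after an escape is taken literally.
            nxt = next(chars, None)
            if nxt is None:
                raise ValueError("dangling escape character at end of input")
            current.append(nxt)
        elif ch == sep:
            # An unescaped separator ends the current element.
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts
```

Even this sketch has a hole: an empty list and a list holding one empty string both serialize to the empty string, so the parser cannot tell them apart. Yet another decision the format forces on you.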

Don’t like escaping then? How about encoding like in a URL? a,bc becomes a%2Cbc. You can now once again split the string by the separator character… assuming it was encoded. Which characters do you encode anyway? If you encode all ASCII characters, the result is 3 times the size of the original and completely unreadable. At least you are “safe” with regard to the separator now: it is encoded for sure, so no split problems. You have to add a decoding routine now, though.
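
If you go the URL-encoding route in Python, the standard library already has the encoder and decoder, so at least that part doesn’t have to be hand-written. A small sketch, assuming a comma separator; encode_join and decode_split are made-up names, while quote and unquote are the real functions from urllib.parse.

```python
from urllib.parse import quote, unquote

SEP = ","  # must be a character that quote() percent-encodes


def encode_join(values):
    # safe="" makes quote() encode "/" too; it leaves only ASCII letters,
    # digits and "_.-~" untouched, so the comma always becomes %2C.
    return SEP.join(quote(value, safe="") for value in values)


def decode_split(serialized):
    # After encoding, every remaining comma is a real separator.
    return [unquote(part) for part in serialized.split(SEP)]


serialized = encode_join(["a,bc", "def"])
restored = decode_split(serialized)
```

The percent sign itself is encoded as %25, so values that already contain something like %2C survive the round trip. The empty-list versus one-empty-string ambiguity is still there.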

If your serialized thing goes into a database, consider how indexing would work. It probably won’t. Maybe you should model your domain properly in the database and not serialize at all. Hint: if the values ever need to be treated differently or separately by the database, they go into different cells/rows/columns/fields, not one. Exceptions are very rare. A notable one is the ability of some databases to handle JSON fields (examples: MySQL, PostgreSQL). Note that this capability may or may not fit your use case.
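
As an illustration of the “different rows, not one field” idea, here is a small sketch using Python’s built-in sqlite3 module. The schema (item, item_tag) and the index name are invented for this example; your domain will look different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- One row per value instead of one comma-separated field.
    CREATE TABLE item_tag (
        item_id INTEGER NOT NULL REFERENCES item(id),
        tag TEXT NOT NULL
    );
    CREATE INDEX item_tag_by_tag ON item_tag (tag);
""")
conn.execute("INSERT INTO item (id, name) VALUES (1, 'first')")
conn.executemany(
    "INSERT INTO item_tag (item_id, tag) VALUES (?, ?)",
    [(1, "a,bc"), (1, "def")],
)

# A plain equality lookup that the index can serve; the comma inside
# the value needs no escaping because nothing is ever split.
rows = conn.execute(
    "SELECT item_id FROM item_tag WHERE tag = ?", ("a,bc",)
).fetchall()
```

There is no separator here at all, so there is nothing to escape and nothing to parse.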

Want to satisfy your artistic needs and do something clever about the serialization? Do it at home then please. Don’t waste time that your colleagues could use on something more productive than dealing with your custom format.

Strong advice: don’t invent a custom serialization format; use existing serialization formats and libraries.
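
For comparison, the same two strings with an existing format. This sketch uses JSON through Python’s standard json module; that is just one reasonable choice among the existing formats, picked here for brevity.

```python
import json

original = ["a,bc", "def"]

# The library handles quoting and escaping of commas, quotes and backslashes.
serialized = json.dumps(original)
restored = json.loads(serialized)

# An empty list and a list with one empty string stay distinguishable.
empty_list = json.dumps([])
one_empty_string = json.dumps([""])
```

No hand-written scanner, no escaping rules to remember, and your colleagues already know how to read the result.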

Seen something to add to the above? Leave a comment!