File diff: [root]/code @ 51bd82a37d2

diff --git a/code/delimiters_must_die b/code/delimiters_must_die
new file mode 100644
index 0000000..93df518
--- /dev/null
+++ b/code/delimiters_must_die
@@ -0,0 +1,82 @@
h1. On delimiters

h2. Escaping from backslash hell

OK, so you want to use a simple comma-separated format to store your data.

bc. one,two,three,four

Good! Simple and clean, human-readable too. You separate entries with , and series with newlines (\n). But then you need to store a comma inside one of the values. So you decide to escape commas with \, like all decent people do.

bc. one\,two,thr\,ee,four

Blech! Your parser can no longer simply split lines by ",". It must check whether a \ precedes each comma, and if one does, strip the \ and keep the comma as data. But that still works.

Now you see that \ can occur in the data too, maybe even directly before a ,. So let's escape it with itself.

bc. one\\\,two,thr\,ee,four

Three backslashes! Isn't that fancy? Not much changes for your parser: you just tell it to turn \\ into a single backslash.
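
The parser described so far can be sketched in a few lines of Python. This is only an illustration: the function name @split_escaped@ and its parameters are made up here, and it treats a backslash as "take the next character literally", which covers both \, and \\.

```python
def split_escaped(line, sep=",", esc="\\"):
    """Split line on unescaped sep; esc makes the next character literal."""
    fields, current = [], []
    chars = iter(line)
    for ch in chars:
        if ch == esc:
            # Take the escaped character as-is; a trailing lone backslash is kept.
            current.append(next(chars, esc))
        elif ch == sep:
            # An unescaped separator ends the current field.
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


# The example line from above: three backslashes and a comma inside field one.
print(split_escaped(r"one\\\,two,thr\,ee,four"))
```

Note that a plain @line.split(",")@ is no longer an option: the parser has to walk the line character by character.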

Now that you think of it, newlines can occur inside entries too. So let's write them as \n (and \r for those obscene OSes).

bc. one\\\,two,thr\,ee,four
five,\\nsix,se\nven

Senven? What the fuck.

Well, if you have done it similarly so far, congrats, you are a decent man. If not, you might have used quotes.

bc. one\\\,"two thr\,ee",four

Such a fine backslash soup! Now imagine you want to pack all this inside another CSV entry. You get something like this:

bc. one\\\\\\\,\"two thr\\\,ee\"\,four\nfive\,\\\\nsix\,se\\nven,e\\"ig\\"ht

Well, I made up this example, but try coding in shell (which involves exactly this kind of nesting), and you'll understand all this.
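
To see how fast the soup thickens, here is a small Python sketch that applies the escaping rules from this section to an already escaped value. The @escape@ helper is hypothetical and only mirrors the scheme described here (\ becomes \\, a comma becomes \, and a newline becomes \n).

```python
def escape(field):
    # Backslashes go first, otherwise the ones added for commas get doubled too.
    return (field.replace("\\", "\\\\")
                 .replace(",", "\\,")
                 .replace("\n", "\\n"))


value = "a,b"
for level in range(1, 4):
    value = escape(value)
    # Print the nesting level, the escaped value and its backslash count.
    print(level, value, value.count("\\"))
```

A single comma costs 1 backslash at the first level, 3 at the second and 7 at the third: every level of nesting turns n backslashes into 2n+1.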

h2. I Will Never Encounter This Set Of Bytes

Let's make up a bizarre, totally random string. It will never ever appear in our data, I assure you. We'll start our entry with it and end with it.

bc. %%%%%%%%%%DATA BOUNDARY srfg345632rfefh56t34freg56y43rffgmy/dev/urandomsays hello#$^#$%TR%%%%%%%%%%%
SRgwerg24yg!#RG@2365u246jh4fgb345ik54y245g56u234rgfw43r8ty2348we9fuhg309ekxc09w3fu8tu32598jf03928qrg2938rhy093rjg293riyjg92384fj8934rjhg28975y 10wejmwodkvnn32w9048hjfq 3984hf9q38hf 398rh 93q8r hg98q2hr 9g813h9rthg9 3rhf98h219hgf1923gh9 qhf91jhgh1
%%%%%%%%%%DATA END srfg345632rfefh56t34freg56y43rffgmy/dev/urandomsays hello#$^#$%TR%%%%%%%%%%%

Know what? IT FUCKING WILL APPEAR. And if you want your system not to fail miserably, you have to scan through all this data and make sure it's not there. Not worth it. And anyway, scanning for this string is rather complicated.
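
A minimal Python sketch of the failure, assuming a naive reader that takes whatever sits between the first two markers. The @BOUNDARY@ value and the @frame@ / @unframe@ helpers are invented for this illustration.

```python
BOUNDARY = b"%%%%DATA BOUNDARY 8f3a%%%%"  # hypothetical "random" marker


def frame(payload):
    # Wrap the payload in the marker on both sides.
    return BOUNDARY + payload + BOUNDARY


def unframe(stream):
    # Naive reader: the payload is whatever sits between the first two markers.
    start = stream.index(BOUNDARY) + len(BOUNDARY)
    end = stream.index(BOUNDARY, start)
    return stream[start:end]


ok = b"plain data"
evil = b"before" + BOUNDARY + b"after"

print(unframe(frame(ok)) == ok)   # round-trips fine
print(unframe(frame(evil)))       # silently cut off at the embedded marker
```

The second payload comes back as just @b"before"@, with no error anywhere. The only defence is to scan every payload for the marker before sending it.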

Another good example (besides the HTTP multipart boundary mocked above) is CDATA: a CDATA section ends at the first ]]>, so the data inside can never contain that sequence.

h2. Taboo delimiter

People will never ever need ASCII 0 in their strings! I assure you! Let's use it as a delimiter. No other options.

h2. Tolerable delimiting

Is implemented in JSON. It uses backslashes plus a very limited set of characters that can follow them. The format is quite readable and writable by humans, and parser-friendly. Its page also has nice graphics; I'd like to be able to make such graphics myself.
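
As a quick illustration, here is how Python's standard @json@ module behaves. The sample strings are made up; the point is that the escapes round-trip cleanly and that a backslash followed by anything outside the allowed set (a comma here) is rejected instead of guessed at.

```python
import json

raw = 'one,"two"\\three\nfour'   # contains a comma, quotes, a backslash and a newline

encoded = json.dumps(raw)
print(encoded)                    # one line of text, everything awkward escaped
assert json.loads(encoded) == raw

try:
    # The JSON text "one\,two": \, is not in the allowed set of escapes.
    json.loads('"one\\,two"')
except json.JSONDecodeError as err:
    print("rejected:", err.msg)
```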

h2. (in search of) Perfect delimiting

If you need simple strings that will never contain one particular character, you can delimit with that character. But for god's sake, do not try to allow strings to contain this character escaped.

bc. very long value

There is a \n at the end.
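
A sketch of this taboo delimiting in Python, assuming \n is the forbidden character. The helper names are hypothetical. The important part is that the writer refuses a value containing the delimiter instead of trying to escape it.

```python
def encode_lines(values):
    out = []
    for value in values:
        if "\n" in value:
            # No escaping on purpose: a value with the taboo character is an error.
            raise ValueError("value contains the delimiter: %r" % value)
        out.append(value + "\n")
    return "".join(out)


def decode_lines(text):
    if text and not text.endswith("\n"):
        raise ValueError("last value is not terminated")
    # Every value is followed by \n, so the final split piece is always empty.
    return text.split("\n")[:-1]


print(decode_lines(encode_lines(["very long value", "a,b\\c"])))
```

The reader is a single @split@ again, and commas and backslashes are just data.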

If you need byte strings that can contain any byte, specify length before data.

bc. 64 �d��W��uu&f(�69��須��?K4{u�
52 �d��W��uu&�d��

Lines _start_ with \n, there is a single space after each number, and numbers consist of the digits 0-9.
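
Here is a minimal Python sketch of such skip-n-bytes delimiting. It assumes each record is laid out as \n, a decimal length, one space, then exactly that many bytes; the function names are made up for the example.

```python
def encode_records(chunks):
    # Each record: "\n", decimal length, one space, then the raw bytes.
    return b"".join(b"\n" + str(len(c)).encode("ascii") + b" " + c for c in chunks)


def decode_records(stream):
    chunks, pos = [], 0
    while pos < len(stream):
        if stream[pos:pos + 1] != b"\n":
            raise ValueError("record must start with \\n")
        space = stream.index(b" ", pos)      # end of the length field
        digits = stream[pos + 1:space]
        if not digits.isdigit():
            raise ValueError("bad length field: %r" % digits)
        size = int(digits)
        start = space + 1
        if start + size > len(stream):
            raise ValueError("truncated record")
        # Skip exactly `size` bytes without looking at what they contain.
        chunks.append(stream[start:start + size])
        pos = start + size
    return chunks


data = [b"", b"a b\n1 x", bytes(range(256))]
print(decode_records(encode_records(data)) == data)
```

The payload is never inspected, so it can contain spaces, newlines, digits or any other byte without confusing the reader.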

So, to summarize it: very strict format, *NO* escaping, taboo *OR* skip-n-bytes delimiting.

This kind of delimiting is implemented in my serialization format called [[transfer]].
\ No newline at end of file

diff --git a/code/transfer b/code/transfer
new file mode 100644
index 0000000..b125e82
--- /dev/null
+++ b/code/transfer
@@ -0,0 +1,49 @@
h1. "Transfer" data transfer protocol

This protocol can be used to transfer associative arrays with bytestrings over any byte transferring connection.

h2. Text mode commands

All fields are separated by a single space, though 'data' in FLD may contain spaces.

I will use pseudo-ABNF here, because plain ABNF sucks. (something) in parentheses names the preceding element as the "something" field; $something later refers to its value.


space = ASCII 32
lf = ASCII 10
non-lf = anything but lf
non-space = anything but space

h3. MOD module_name

modCommand = "MOD" space *non-lf

Marks the start of a data structure named 'module_name'. If another module has already been started, throw an error.

h3. FLD name data

fldCommand = "FLD" space *non-space space *non-lf

Textual field 'name' with 'data' as its content. 'data' may contain any character except \n. Used for transferring small amounts of data without line breaks.

h3. DAT name size

datCommand = "DAT" space *non-space(name) space *digit(size) lf *<$size>any-byte(data)

Initiates data mode for field 'name'. Right after the delimiting \n, the recipient should start reading data, and switch back to line mode after exactly 'size' bytes.

h3. END

Ends a module.
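
The four commands can be put together into a small writer. The Python sketch below is not the reference implementation, and it makes several assumptions that this page does not state: every command line ends with \n, the next command follows DAT data directly with no separator, and values of up to 64 bytes without \n are sent as FLD (the threshold is arbitrary). The name @encode_module@ is hypothetical.

```python
def encode_module(name, fields):
    """Serialize a dict of str -> bytes as one module (sketch, see assumptions)."""
    if "\n" in name:
        raise ValueError("module name cannot contain \\n")
    out = [b"MOD " + name.encode() + b"\n"]
    for key, value in fields.items():
        k = key.encode()
        if b" " in k or b"\n" in k:
            raise ValueError("field names cannot contain space or \\n")
        if len(value) <= 64 and b"\n" not in value:
            # Small, newline-free value: send it as a text field.
            out.append(b"FLD " + k + b" " + value + b"\n")
        else:
            # Anything else: announce the size, then send the raw bytes.
            out.append(b"DAT " + k + b" " + str(len(value)).encode() + b"\n" + value)
    out.append(b"END\n")
    return b"".join(out)


print(encode_module("query", {"hops": b"1", "args": b"a\nb"}))
```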

Example of a correct structure transfer:

bc. MOD query
FLD hops 1
FLD query hash
DAT args 64
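
A reader for such a transfer can be sketched in Python as follows. This is an illustration under assumptions, not the reference implementation: it expects every command line to end with \n and the command after DAT data to follow directly, and the name @read_module@ and the sample bytes in @wire@ are made up (the data is 5 bytes here to keep the example short).

```python
import io


def read_module(stream):
    """Read one module from a binary file-like object; returns (name, fields)."""
    name, fields = None, {}
    while True:
        line = stream.readline()                 # lines mode
        if not line.endswith(b"\n"):
            raise ValueError("unexpected end of stream")
        command, _, rest = line[:-1].partition(b" ")
        if command == b"MOD":
            if name is not None:
                raise ValueError("module already started")
            name = rest
        elif name is None:
            raise ValueError("command outside of a module")
        elif command == b"FLD":
            key, _, data = rest.partition(b" ")  # data may contain spaces
            fields[key] = data
        elif command == b"DAT":
            key, _, size = rest.partition(b" ")
            if not size.isdigit():
                raise ValueError("bad size: %r" % size)
            data = stream.read(int(size))        # data mode: exactly size bytes
            if len(data) != int(size):
                raise ValueError("truncated data")
            fields[key] = data
        elif command == b"END":
            return name, fields
        else:
            raise ValueError("unknown command: %r" % command)


wire = b"MOD query\nFLD hops 1\nFLD query hash\nDAT args 5\na\nb\x00cEND\n"
print(read_module(io.BytesIO(wire)))
```

There is no escaping anywhere in the reader: names are cut at the first space, lines at \n, and binary data is skipped by count.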

An example implementation of the protocol in Haskell can be seen here: []
\ No newline at end of file

By Voker57 on 2010-06-15 16:16:52 +0400