Proposing a Centimal Number System for Geotagging and Geocoding

Geotagging of photos and other data items today is a tedious and error-prone affair, not least because Latitude and Longitude have to be entered 100% correctly. One wrong digit and you may be many meters or many miles off. Certainly there are numerous tools to make these tasks easier, and new ones are released almost daily in this age of geotagging craze. Even with these products, having to deal with high-precision numbers, positive and negative, matching times etc. can be quite daunting. So, as with domain names (which obviate remembering IP addresses), wouldn't it be easier to describe the Latitude and Longitude numbers with a simple mnemonic "babble language" consisting of "friendly names" for each location? Some alternative ways have been tried to do away with the numerical notation of location coordinates:

There are numerous grid systems using letters as well as numbers, the best-known example being the Universal Transverse Mercator (UTM). Grid systems are not directly supported by modern GPS devices, which use the WGS84 datum. So for easy conversion, Latitude and Longitude should be kept in the established decimal numeric form.

It would be desirable to have a system that expresses numbers as pronounceable words, which could be understood in any language and in spoken conversation, such as over the telephone.

The bubble babble encoding uses alternating consonants and vowels, facilitating public key validation in verbal form, like over the telephone, as the resulting string can be more easily pronounced and understood than hexadecimal sequences of letters and numbers. Here's a port of Bubble Babble to C#.

Another take on a related problem is Koremutake, which devises a system to express large integer numbers as a sequence of syllables (phonemes). It is a binary encoding with a base of 128, using 128 distinct syllables to encode integers. While this system works fine for integers, when trying to extend it to fractions one quickly runs into the well-known problems of converting decimal numbers into binary format. Displaying a simple decimal number like 0.1 "would need an infinitely recurring binary fraction". So while the Koremutake-encoded integer 65535 is an easy-to-remember BOTRETRE, the floating-point number 65535.1 would be encoded as BOTRETRE.FAFRAMOHESUDRA if Koremutake covered floating-point numbers, and precision would still be only adequate. As a sidenote, here's a port of Koremutake to C#.

To ward off the rounding and approximation issues of floating-point numbers, a modified system based on decimal numbers should be used. First of all, decimal numbers are what users are familiar with, and more importantly, they are supported by most operating systems and programming languages today (C#, C++, Java, Visual Basic), and even ECMAScript/JavaScript will support them in the near future.
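The rounding issue, and the way a decimal type sidesteps it, can be seen in a few lines. This is only an illustrative sketch in Python (a language the post does not name, chosen here for brevity); the helper names `stored_value` and `is_power_of_two` are made up for the example.

```python
from decimal import Decimal
from fractions import Fraction


def stored_value(x: float) -> Decimal:
    """Return the exact decimal expansion of the binary double that holds x."""
    return Decimal(x)


def is_power_of_two(n: int) -> bool:
    """True if n is a positive power of two."""
    return n > 0 and n & (n - 1) == 0


# The literal 0.1 is stored as the nearest binary fraction, not as 1/10.
print(stored_value(0.1))

# Its exact value is a fraction whose denominator is a power of two.
print(is_power_of_two(Fraction(0.1).denominator))

# Binary floats accumulate the approximation error ...
print(0.1 + 0.2 == 0.3)

# ... while a decimal type built from decimal strings stays exact.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))
```

Building the `Decimal` from a string rather than from a float is what keeps the value exact; converting a float first would carry the binary approximation along.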

A base-100 (centimal, centesimal?) system is a tall order if only ASCII vowels and consonants are to be used. Citing the ancient major system, one suggested centimal system uses both numbers and syllables made up of consonants and vowels to encode integers. Then again, **combinations of numbers and letters are notoriously hard to remember**, probably because numbers and letters are stored and processed in different parts of the brain. Numbers mixed with letters also hamper the reading flow and introduce local translations (for the numbers) into the flow of (language-agnostic) syllables.

Living in Thailand, I'm used to using 18 consonants and 6 diphthongs, so I expanded the usual a, e, o, u with "ai", "an", "ao", "in", "oi", "on" to make 10 vowels. Sorted alphabetically, this gives the sequence "a", "ai", "an", "ao", "e", "in", "o", "oi", "on", "u". The artificial Ithkuil language uses the "expand the vowel stock" approach, but that's a different ball game entirely. 'I' and 'y' are not used as vowels (unlike Koremutake) since 'i' is pronounced as the English 'ee' sound in many languages including German, Dutch and Italian.

We now need only 10 consonants to form all our base-100 "digits". For international compatibility, 'c', 'j' (pronounced like 'i' in German and Dutch), 'w' (pronounced like 'v' in German, and not used in some Romance languages) and 'v' are excluded from the consonants. Furthermore, 'r' is left out because it is not distinguished from 'l' in many spoken Asian languages. The result is reminiscent of Chinese, Vietnamese or Thai, due mainly to the expansion of vowels. But luckily, unlike the aforementioned languages, our system is not tonal ;). So now we have a coherent and predictable 10 x 10 base-100 number system consisting of syllables that are easy to read and pronounce. Two issues remain:

What about negative numbers? One way to express them could be to use lower case for negative numbers and capitalize all letters for positive numbers. However, this goes against the established customs for identifiers like web URLs or e-mail addresses, where case does not matter. Consequently, many users will not distinguish much between lower and upper case. Instead, the prefix letter A could be prepended to negative numbers: it is one of only two letters in the alphabet (the other being H) that has a "full-width" vertical line. This also fits well into the flow of the language, with a leading A generally easy to pronounce before all the syllables we will use.

The dot of fractional numbers should be replaced by a distinctive syllable. I was thinking of using "Poi" as short for "Point", but I want to use both P as a consonant and oi as a vowel in the number system. So I settled for "Ha", which is also shorter. So 1.2 becomes BaiHaBan. The word "point" is too idiomatic for English anyway; many European languages use a comma instead of a point as the decimal separator. Here are the 100 distinct syllables of the centimal system:

Positive Numbers Negative Numbers Fractions (Floats)
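As a rough sketch of how the scheme described above could be put together in code: the vowel order, the "A" prefix, the "Ha" separator and the 1.2 = BaiHaBan example come from the text, but the ten consonants below are an assumption, since the text only says which letters are excluded. "b" is placed first so that 1 comes out as "Bai", and "h" is left out so that "Ha" cannot be confused with a digit. The text's single example also does not settle how a fraction with leading zeros (such as 1.05) would be written, so this sketch refuses such input rather than guess. All function names are hypothetical.

```python
import re
from decimal import Decimal

# Vowel order as given in the text.
VOWELS = ["a", "ai", "an", "ao", "e", "in", "o", "oi", "on", "u"]
# ASSUMPTION: the text does not enumerate its ten consonants; this set is a placeholder.
CONSONANTS = ["b", "d", "f", "g", "k", "l", "m", "n", "p", "s"]

# 10 x 10 table: base-100 digit 0..99 -> capitalized syllable.
SYLLABLES = [(c + v).capitalize() for c in CONSONANTS for v in VOWELS]
VALUES = {s: i for i, s in enumerate(SYLLABLES)}

NEGATIVE_PREFIX = "A"  # from the text
SEPARATOR = "Ha"       # from the text


def encode_int(n: int) -> str:
    """Write a non-negative integer as base-100 syllables, most significant first."""
    if n < 0:
        raise ValueError("encode_int expects a non-negative integer")
    parts = []
    while True:
        n, digit = divmod(n, 100)
        parts.append(SYLLABLES[digit])
        if n == 0:
            break
    return "".join(reversed(parts))


def decode_int(word: str) -> int:
    """Inverse of encode_int; relies on every syllable starting with a capital."""
    n = 0
    for syllable in re.findall(r"[A-Z][a-z]+", word):
        n = n * 100 + VALUES[syllable]
    return n


def encode(value: str) -> str:
    """Encode a decimal string such as "-1.2" with prefix and separator."""
    number = Decimal(value)
    whole, _, frac = format(abs(number), "f").partition(".")
    frac = frac.rstrip("0")  # 1.20 and 1.2 are the same number
    if frac.startswith("0"):
        # ASSUMPTION: leading zeros after the separator are not covered here.
        raise ValueError("fractions with leading zeros are not handled by this sketch")
    out = NEGATIVE_PREFIX if number < 0 else ""
    out += encode_int(int(whole))
    if frac:
        # ASSUMPTION: fractional digits are read as one integer, as in 1.2 -> BaiHaBan.
        out += SEPARATOR + encode_int(int(frac))
    return out


print(encode("1.2"))   # BaiHaBan, as in the text
print(encode("-1.2"))  # ABaiHaBan
```

Keeping the input as a decimal string until the digits are split avoids the binary rounding problem discussed earlier; a different consonant set would change every syllable except those beginning with "B".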

mrusoff: Having wrestled with this in a number of contexts, I think your solution is interesting, but for some instances, I have found that (at least in the USA) using something that looks like a phone number is actually easier for people to remember, because this is something that they are well trained to do. For example, for lat/long, 175-219-3840 080-133-4510 gives us 3 digits to the left of the decimal and 7 to the right. Another interesting scheme that was proposed for geocoding is to consider (again in the United States) a bounding box based on a phone area code or a postal zip code, which would form the basis of a coordinate system, dramatically reducing the number of digits required. Various telco applications have used a number of schemes for mapping binary data to a stream of words, syllables or phonemes with varying degrees of success. Good luck with your approach!

Posted: 07 July 2007
