Convert numbers with imperial or European separators
The first rule of war and business is to know your enemy
Good day, fellow reader.
Have you ever been in a position where you don't need to validate a strict format, but rather to extract the number provided even if it contains spaces or a different separator?
Chances are that if you work in a country where the dot separates the integer part from the decimal part, you never needed to, as most programming languages and computational number representations use this style (e.g., 123.45, 0.99).
In this article, I'm going to share a bit about this subject and how I overcame these differences with some Python code.
Number formats (Imperial vs European)
This article by Raluca Cristina Neagu, Number formatting in Europe vs. the US, is a gold mine on the subject, not only because of the references it includes, but also because of the way the author describes the subject as a short, clear story. I know I probably have your attention for maybe one extra minute or so. The quote below is for you, then:
Why are there different number formats? It appears that the comma separator was introduced to avoid confusion with the dot being used as operator, bringing chaos to these days.
So who is to blame? I haven't read all the material about it, but it seems that the decimal point's popularity comes from Napier's usage, and the decimal comma's popularity comes from Leibniz. You probably hate them both if you had the chance to learn from their work.
OK, so what's the standard, and who wins? Well, the International System of Units (SI) calls it a draw. You can use a comma or a dot, and both are valid. The real losers here are the thousands separators: they must be a space, or nothing at all. No dots, no commas. 9,999,999.99 is not valid, and neither is 9.999.999,99; it should be 9 999 999.99 or 9 999 999,99.
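To make the SI-style grouping concrete, here is a minimal Python sketch. The helper name format_si and its parameters are made up for this example, and it uses a plain space as the group separator:

```python
def format_si(value, decimals=2, decimal_mark="."):
    """Format a number with spaces between groups of three digits."""
    # Python's format spec can only group with "," or "_", so group with
    # commas first and swap them for spaces afterwards.
    grouped = f"{value:,.{decimals}f}"
    integer_part, _, fraction = grouped.partition(".")
    integer_part = integer_part.replace(",", " ")
    if not fraction:
        return integer_part
    return f"{integer_part}{decimal_mark}{fraction}"


print(format_si(9999999.99))                    # 9 999 999.99
print(format_si(9999999.99, decimal_mark=","))  # 9 999 999,99
```

Both outputs use the same grouping; only the decimal mark changes, which is the "draw" described above.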
Dealing with conversion
This is a tiny list of some of the formats I've found while automatically processing probably a hundred documents. The numbers are fictional; the formats are not.
Provided number | Converted |
$ 159 195.533 | 159195.533 |
$ 155,132,158.0 | 155132158.0 |
99,999.99 | 99999.99 |
13.194,32 | 13194.32 |
$ 20.212,99 | 20212.99 |
You get the idea, so we first need to remove the $ and white spaces.
Numbers with no $ and spaces |
159195.533 |
155,132,158.0 |
99,999.99 |
13.194,32 |
20.212,99 |
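This cleanup step can be sketched in a couple of lines. The function name strip_currency_and_spaces is hypothetical, and this variant drops every kind of whitespace (tabs and newlines included), which is slightly broader than removing plain spaces only:

```python
def strip_currency_and_spaces(text):
    """Remove the dollar sign and every whitespace character."""
    # str.split() with no arguments splits on any run of whitespace,
    # so joining the pieces drops spaces, tabs and newlines alike.
    return "".join(str(text).replace("$", "").split())


samples = ["$ 159 195.533", "$ 155,132,158.0", "99,999.99", "13.194,32", "$ 20.212,99"]
for sample in samples:
    print(sample, "->", strip_currency_and_spaces(sample))
```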
Now, only the first one won't raise an error if we try to parse it with Python's useful float function. We need to remove the three-digit separators.
- Imperial format: We remove the , and we are good, as the decimal separator . would match the computational representation.
Numbers | Converted |
159195.533 | 159195.533 |
155,132,158.0 | 155132158.0 |
99,999.99 | 99999.99 |
- European format: If we remove the , first, we would lose track of which part is the integer and which is the decimal. So we first need to remove any . digit separator (step 1), and then replace the , with a . (step 2).
Numbers | Step 1 | Step 2 |
13.194,32 | 13194,32 | 13194.32 |
20.212,99 | 20212,99 | 20212.99 |
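The two recipes above can be sketched as two small functions. The names from_imperial and from_european are made up for this example, and both assume the $ and spaces are already gone:

```python
def from_imperial(text):
    """Commas are only group separators, so dropping them is enough."""
    return float(text.replace(",", ""))


def from_european(text):
    """Drop the dots first, then turn the decimal comma into a dot."""
    without_groups = text.replace(".", "")           # step 1
    return float(without_groups.replace(",", "."))   # step 2


print(from_imperial("155,132,158.0"))  # 155132158.0
print(from_imperial("99,999.99"))      # 99999.99
print(from_european("13.194,32"))      # 13194.32
print(from_european("20.212,99"))      # 20212.99
```

Note that applying the wrong recipe does not necessarily raise an error: from_imperial("13.194,32") quietly returns 13.19432, so the format has to be identified before converting.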
If you think like a machine, you probably noticed that we can't apply both rules indiscriminately: we first need to find out which number format we are dealing with.
Don’t do the natural thing, the impulsive thing. - Dale Carnegie
Well, luckily, Dale Carnegie knew nothing about programming, so we're going full impulsive here.
for i in range(-1, -len(text) - 1, -1):  # walk backwards, first character included
    if text[i] == "." or text[i] == ",":
        foundSeparator = text[i]
        break
We traverse the number string in reverse order until we find the last separator, which we take as the decimal separator, and that tells us the number format. No more questions, your honor.
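The same detection can also be sketched with str.rfind, which returns the index of the last occurrence of a substring, or -1 when there is none. The function name detect_decimal_separator is hypothetical; this is an alternative way to express the reverse traversal, not a different rule:

```python
def detect_decimal_separator(text):
    """Return '.', ',' or '' depending on which separator appears last."""
    last_dot = text.rfind(".")
    last_comma = text.rfind(",")
    if last_dot == last_comma:  # only possible when both are -1: no separator
        return ""
    return "." if last_dot > last_comma else ","


print(detect_decimal_separator("155,132,158.0"))  # .
print(detect_decimal_separator("13.194,32"))      # ,
```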
Where's the full code?
I'll leave the code below with one giant warning. DO NOT RELY ON THIS FOR CRITICAL APPLICATIONS. Although I wrote this in a work environment, the outcomes of this algorithm can be somewhat verified, and no lives or jobs depend on its accuracy (except, maybe, mine).
I added some basic testing as well, so you can check when it fails.
import math


def getNumberWithAnySeparator(text):
    # Drop spaces and the currency symbol before looking at the separators.
    text = str(text).replace(" ", "")
    text = text.replace("$", "")
    foundSeparator = ""
    # Walk the string backwards: the last separator found is taken as the decimal one.
    for i in range(-1, -len(text) - 1, -1):
        if text[i] == "." or text[i] == ",":
            foundSeparator = text[i]
            break
    if foundSeparator == ".":  # Imperial format xxx,xxx.xx
        text = text.replace(",", "")
    if foundSeparator == ",":  # European format xxx.xxx,xx
        text = text.replace(".", "")
        text = text.replace(",", ".")
    return tryFloat(text)


def tryFloat(text):
    # Return the parsed float, or None when the text is not a number (or is NaN).
    try:
        auxValue = float(text)
    except ValueError:
        return None
    if not math.isnan(auxValue):
        return auxValue
    else:
        return None
import unittest


class TestConversion(unittest.TestCase):
    def test_zero(self):
        """
        Test that 0 converts to 0
        """
        text = "0"
        result = getNumberWithAnySeparator(text)
        self.assertEqual(result, 0)
        text = "0,0"
        result = getNumberWithAnySeparator(text)
        self.assertEqual(result, 0)
        text = "$0"
        result = getNumberWithAnySeparator(text)
        self.assertEqual(result, 0)
        text = "$ 0,0"
        result = getNumberWithAnySeparator(text)
        self.assertEqual(result, 0)
        text = "-0.0"
        result = getNumberWithAnySeparator(text)
        self.assertEqual(result, 0)
        text = "0000000000000000.0"
        result = getNumberWithAnySeparator(text)
        self.assertEqual(result, 0)

    def test_alpha(self):
        """
        Test that alphanumeric converts to None
        """
        text = "a0"
        result = getNumberWithAnySeparator(text)
        self.assertEqual(result, None)
        text = "zero"
        result = getNumberWithAnySeparator(text)
        self.assertEqual(result, None)
        text = "l2"
        result = getNumberWithAnySeparator(text)
        self.assertEqual(result, None)
        text = "OOO"
        result = getNumberWithAnySeparator(text)
        self.assertEqual(result, None)
        text = "0z"
        result = getNumberWithAnySeparator(text)
        self.assertEqual(result, None)

    def test_positive(self):
        """
        Test that some positive numbers convert to their value
        """
        text = "123"
        result = getNumberWithAnySeparator(text)
        self.assertEqual(result, 123)
        text = "+99"
        result = getNumberWithAnySeparator(text)
        self.assertEqual(result, 99)
        text = "$159195.533"
        result = getNumberWithAnySeparator(text)
        self.assertEqual(result, 159195.533)
        text = "$ 155, 132, 158.0"
        result = getNumberWithAnySeparator(text)
        self.assertEqual(result, 155132158)
        text = "32 913 646"
        result = getNumberWithAnySeparator(text)
        self.assertEqual(result, 32913646)
        text = "12.529,52"
        result = getNumberWithAnySeparator(text)
        self.assertEqual(result, 12529.52)

    def test_negative(self):
        """
        Test that some negative numbers convert to their value
        """
        text = "-123"
        result = getNumberWithAnySeparator(text)
        self.assertEqual(result, -123)
        text = "-99"
        result = getNumberWithAnySeparator(text)
        self.assertEqual(result, -99)
        text = "-$159195.533"
        result = getNumberWithAnySeparator(text)
        self.assertEqual(result, -159195.533)
        text = "-$ 155, 132, 158.0"
        result = getNumberWithAnySeparator(text)
        self.assertEqual(result, -155132158)
        text = "-32 913 646"
        result = getNumberWithAnySeparator(text)
        self.assertEqual(result, -32913646)
        text = "-12.529,52"
        result = getNumberWithAnySeparator(text)
        self.assertEqual(result, -12529.52)

    def test_mix_cases(self):
        """
        Test cases where there is mix of patterns
        """
        text = "-123.4124.2,521.2"
        result = getNumberWithAnySeparator(text)
        self.assertEqual(result, None)
        text = "-$99,125 242 5"
        result = getNumberWithAnySeparator(text)
        self.assertEqual(result, -99.1252425)
        text = ",15.9195533"
        result = getNumberWithAnySeparator(text)
        self.assertEqual(result, 15.9195533)


if __name__ == '__main__':
    unittest.main()
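One case the tests above do not cover is a single separator followed by exactly three digits, such as 1,234: it could be 1234 in imperial format or 1.234 in European format, and the "last separator is the decimal one" rule always reads it as a decimal. As a hedged sketch, a hypothetical helper (looks_ambiguous is not part of the code above) could flag such inputs for manual review:

```python
import re

# Optional sign, one to three leading digits (no leading zero),
# a single "." or ",", then exactly three digits.
_AMBIGUOUS = re.compile(r"[+-]?[1-9]\d{0,2}[.,]\d{3}")


def looks_ambiguous(text):
    """True when the text could be read as either a grouped integer or a decimal."""
    cleaned = str(text).replace(" ", "").replace("$", "")
    return _AMBIGUOUS.fullmatch(cleaned) is not None


for sample in ["1,234", "-$ 1.234", "12.529,52", "99,999.99", "0,123"]:
    print(sample, "->", looks_ambiguous(sample))
```

Only the first two samples are flagged. The others either mix both separators or cannot be a valid three-digit grouping, so the reverse traversal reads them unambiguously.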
See you around!
Portrait picture has been designed using the "VS" image created by jcomp - www.freepik.es