The distinction between ‘semantics’ and ‘syntax’ escapes a lot of folks.
The word ‘semantics’ typically connotes ‘meaning’. It could be the meaning of an English word or sentence, or the meaning of an XML file. It can also denote the study of meaning in linguistics or logic, but that’s not the most common use today.
Syntax, on the other hand, deals with the structure of well-formed sentences. A sentence that is syntactically correct obeys the grammatical rules of a language. When you performed sentence diagramming in school, you were analyzing the syntax of a sentence, to prove or disprove its correctness, and perhaps improve it. Natural (human) languages have very complicated rules. Some of those rules are ambiguous, and some are well broken for artistic effect – the literary equivalent of Tom Waits’ singing that sounds like “it was soaked in a vat of bourbon, left hanging in the smokehouse for a few months, and then taken outside and run over with a car”1.
To understand the semantic/syntactic difference better, here is a famous example of a sentence that is syntactically correct, but semantically problematic2:
Colorless green ideas sleep furiously.3
To see that this is syntactically correct we can diagram it:
Diagram of semantically bankrupt but syntactically correct sentence.
Here, on the other hand, is an example of the converse, poor grammar but fairly clear meaning:
“You don’t have no more troubles, Roscoe,” I tell him, “you and me is just become partners.” 4
If we are talking about messages, it’s the semantics that really counts. If the medical instructions “Ambulate between the bars” are misinterpreted as “Amputate between the ears”, the consequences could be grave5.
Sometimes, of course, syntactic errors result in different semantics. The following examples of the value of punctuation show how different syntax can result in different semantics, even though all the words are still in the same sequence.
A woman without her man is nothing.
A woman: without her, man is nothing.
Twelve people knew the secret, all told.
Twelve people knew the secret; all told.
Let’s eat Grandma.
Let’s eat, Grandma.
I’ve often heard people say “that’s only semantics”, as if meaning were unimportant. What they really intend to convey is that quibbling over different shades of meaning of individual words can be pointless hair-splitting that contributes little to the overall meaning they wished to convey.
On the other hand, mistaken semantics of a single word can cost lives. This example (taken from an unclassified document provided by the NSA6, 7) implies that at least 129,000 people died as a result of the meaning assigned to a single word8.
Reporters in Tokyo questioned Japanese Premier Kantaro Suzuki about his government’s reaction to the Potsdam Declaration. Since no formal decision had been reached at the time, Suzuki, falling back on the politician’s old standby answer to reporters, replied that he was withholding comment. He used the Japanese word mokusatsu, derived from the word for “silence.” As can be seen from the dictionary entry quoted at the beginning of this essay, however, the word has other meanings quite different from that intended by Suzuki. Alas, international news agencies saw fit to tell the world that in the eyes of the Japanese government the ultimatum was “not worthy of comment.” U. S. officials, angered by the tone of Suzuki’s statement and obviously seeing it as another typical example of the fanatical Banzai and Kamikaze spirit, decided on stern measures. Within ten days the decision was made to drop the atomic bomb, the bomb was dropped, and Hiroshima was leveled.
What does all this have to do with software?
Messages play an important role in an increasingly interconnected world. My scale reports my body mass index to my phone, and my wristband tells my phone how far I’ve walked, how long I slept, and what my pulse rate was throughout the day. It’s not much of an extrapolation to imagine that information (and more, such as blood sugar levels, cortisol, or blood pressure) automatically going to my doctor, to a pharmacy to adjust the dose of a prescription, or to the insurance company to adjust my rate. None of these messages existed a few years ago: someone needs to define how to build and interpret these messages, and who should be allowed to read which information.
It’s scary that, without exception, every message definition effort I’ve seen has been obsessed with syntax, on the expectation that the semantics will fall into place automatically. One example was a lovingly hand-crafted (yet obscenely bloated) XML schema that actually failed to describe the information to be conveyed.
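To make that failure mode concrete, here is a minimal sketch in Python. The message, its field names, and the two receivers are all invented for illustration; none of them comes from a real device or schema. The message is perfectly well-formed JSON, yet two receivers that make different assumptions about units read different meanings from it.

```python
import json

# A hypothetical message from a scale. It is syntactically valid JSON,
# but nothing in it says what unit "weight" is measured in.
raw_message = '{"patient_id": "p-001", "weight": 154}'


def parse(raw: str) -> dict:
    """Syntax check only: succeeds for any well-formed JSON document."""
    return json.loads(raw)


def weight_in_kg_assuming_kg(msg: dict) -> float:
    """Hypothetical receiver A assumes the sender used kilograms."""
    return float(msg["weight"])


def weight_in_kg_assuming_lb(msg: dict) -> float:
    """Hypothetical receiver B assumes the sender used pounds."""
    return float(msg["weight"]) * 0.45359237


message = parse(raw_message)
reading_a = weight_in_kg_assuming_kg(message)
reading_b = weight_in_kg_assuming_lb(message)
print(f"Receiver A: {reading_a:.1f} kg, receiver B: {reading_b:.1f} kg")
```

Both receivers accept the message without complaint, and no amount of syntax validation would reveal that they disagree by more than a factor of two.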
Consider the alternative: describe the semantics in an information model (UML, OWL, etc.) of the problem domain, and then programmatically turn it into XML, JSON, Protocol Buffer, and Avro schemas. You end up with different syntaxes that can all convey exactly the same semantics. Presuming the information model is an accurate representation of the information in the problem domain, all of these schemas would be correct by construction, they would all convey the same information, and (with one-time preparation of marshalling/unmarshalling code) you could use any of those syntaxes interchangeably.
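As a rough illustration of the idea, the sketch below uses a Python dataclass as a stand-in for a real information model (it is far simpler than UML or OWL, and it is not how any particular tool does this). The `WeightReading` class and the four helper functions are hypothetical names chosen for this example. The marshalling code is written once, driven entirely by the model's field definitions, and the same reading survives a round trip through either syntax.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class WeightReading:
    """Hypothetical information model: one weight measurement."""
    patient_id: str
    weight_kg: float  # the unit is part of the model, not left to the reader


def to_json(obj) -> str:
    """Marshal any model instance to JSON, driven by its fields."""
    return json.dumps(asdict(obj), sort_keys=True)


def from_json(cls, text: str):
    """Unmarshal JSON, coercing each value to the type the model declares."""
    data = json.loads(text)
    return cls(**{f.name: f.type(data[f.name]) for f in fields(cls)})


def to_xml(obj) -> str:
    """Marshal any model instance to XML, one child element per field."""
    root = ET.Element(type(obj).__name__)
    for f in fields(obj):
        ET.SubElement(root, f.name).text = str(getattr(obj, f.name))
    return ET.tostring(root, encoding="unicode")


def from_xml(cls, text: str):
    """Unmarshal XML, coercing each value to the type the model declares."""
    root = ET.fromstring(text)
    return cls(**{f.name: f.type(root.findtext(f.name)) for f in fields(cls)})


reading = WeightReading(patient_id="p-001", weight_kg=69.9)
as_json = to_json(reading)
as_xml = to_xml(reading)

# Two different syntaxes, one meaning: both round-trip to the same reading.
assert from_json(WeightReading, as_json) == reading
assert from_xml(WeightReading, as_xml) == reading
```

Adding a field to the model changes both syntaxes at once, so there is nothing left to argue about at the syntax level. A production generator would also need to handle nesting, optional fields, and richer types, which this sketch deliberately ignores.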
My point is that squabbles over syntax are counterproductive when you can define how to generate that syntax from semantic models. At Prometheus we extensively generate syntax and executable code from semantic models – even exceedingly complicated models. Most people just can’t see that value, and for many developers, today’s reality involves huge amounts of time wasted bickering over different syntaxes to express poorly articulated, redundant, and noisy semantics.
1. Graff, Gary; Durchholz, Daniel. Musichound Rock: The Essential Album Guide. Omnibus Press. ISBN 0-8256-7256-2.
2. It’s not too hard to think of similar grammatically correct yet incomprehensible examples, such as “The loquacious boulder floated heavily on the breeze.”
5. Or maybe not. We all know a few people such a procedure might improve.
8. The real situation may have been more complicated. The Potsdam Declaration was July 26. The newspaper articles were July 28. Little Boy was dropped on Hiroshima August 6, and Fat Man was dropped on Nagasaki August 9. If Suzuki really meant “no comment”, it seems he could have delivered an actual answer before one city or the other was bombed. On the other hand, even after the further demonstration on August 9, Japan didn’t surrender until August 14. Seven more bombs were in the pipeline, and the third could have been delivered as early as the day of surrender. It also seems possible that these bombings actually saved lives in the long run, compared to continued conventional war: the atomic bomb casualties were dwarfed by total Japanese casualties exceeding 2,600,000.