How does it suck harder? Sure, the closing tags can be a bit annoying (but decent editors can insert those automatically), but attributes don't have that problem, and contrary to JSON it has comments.
In my experience, the biggest reason people hate it is because it has too many ways of structuring data, and it allows people who design the schemas to go overboard. Attributes or child nodes? Multiple namespaces or just extend the one you have? CDATA or embedding?
That said, simple XML is as good as simple JSON. XML is fine if you keep it simple when designing the schema. And of course one can screw up with JSON too. But after almost 20 years of people over-engineering their XML schemas, I can't fault anyone for choosing a simpler data format.
The big difference is that mapping that XML into native data structures in your language is a mess in the general case because XML has so damn many ways of structuring the data. JSON maps more or less directly into JavaScript/Python/Perl or really any untyped language with arrays, hashes, and scalars.
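To make that concrete, here is a rough Python sketch using only the standard library. The config and its field names are made up for illustration, and the XML layout is just one of many a schema author could have picked.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical config, invented for this example.
json_text = '{"server": {"host": "example.org", "port": 8080, "tags": ["a", "b"]}}'

# JSON: one call yields dicts, lists, strings, and numbers.
config = json.loads(json_text)
port_from_json = config["server"]["port"]   # already an int
tags_from_json = config["server"]["tags"]   # already a list

# The same data as XML. Here the (hypothetical) schema author put host in an
# attribute, port in a child element, and tags in repeated elements.
xml_text = """
<server host="example.org">
  <port>8080</port>
  <tag>a</tag>
  <tag>b</tag>
</server>
"""
root = ET.fromstring(xml_text)
host_from_xml = root.get("host")                        # attribute lookup
port_from_xml = int(root.findtext("port"))              # text node, converted by hand
tags_from_xml = [t.text for t in root.findall("tag")]   # repeated children become a list
```

Each XML lookup needs the reader to know which of the three encodings was chosen, which is the "general case" mess described above.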
I think part of it is the context that JSON and XML are used in, particularly the era of software that uses these formats for configuration and the design implications that has.
Recently I've had to work with XML config files for a tool called Oozie, which is a data pipeline scheduling tool within the Hadoop/Spark ecosystem, and it has been soul-crushingly tedious for me. Everything feels verbose and opaque, and the documentation seems to prioritize enumerating every possible configuration option over providing minimum viable configs for common use cases.
JSON configs often just feel more simple and developer friendly. I'd say this has less to do with technical differences between JSON and XML and more to do with how "modern" software systems have been designed to be more ergonomic for developers/administrators and these modern systems happen to be more likely to use JSON.
It's also fun to hear non-technical people at work talk about "Jason files".
There's a flip side to the common use case of documentation... How do you solve the uncommon use case when all the docs it seems are showing "how easy it is to do xyz." A month ago, I started on a new web front end. It was set up with React, Redux and webpack. Since it was primarily going to be me solo on the project, I wanted to integrate TypeScript. There are tutorials everywhere for starting with TypeScript but good luck finding something to say how to integrate it in.
I really wish everyone would think of JSON as a French word. "Jay-sonne" (or jay-SAWN). Not "Jase-in"... it would help keep the confusion to a minimum.
I find it interesting you're being downvoted. I think your question is reasonable: why does XML suck "harder" than JSON? This just seems a bit faddish to me. Things like namespaces and attribute/child distinction in XML are super annoying ... until you really need them in your ML (JSON, in this case). Then, suddenly, you're reinventing a wheel, hoping all of your downstream tools 'agree' on the behavior.
I can easily import/export the JSON and share it within my codebase. The XML is much worse in that regard (unless there's some clean way to do the same thing that I'm not aware of).
Millions of developers use XML all the time, it's called HTML.
People can easily read and edit it with deeply nested structures, attributes, and even malformed data. Frontend frameworks using components and variations like JSX also maintain the same style because it's very natural to use.
Now try taking a typical HTML document and expressing it in JSON. Even an empty page would be completely unmanageable without tooling. People have a strange reaction when they hear "XML" but it's much more structured, usable, and widespread than you think.
I think using the popularity of HTML (a rich text format) to justify using XML as a configuration format is a bit disingenuous. The two use cases (structured data and rich text) are genuinely very different from each other. In rich text, the primitives are text and markup. In structured data, the primitives are structures (lists, maps) and data (strings, bools, numbers). The impedance mismatch goes both ways - try taking a typical JSON document and expressing it in XML. Which parts end up as attributes? Elements? Text nodes? How do you express the semantic difference between a list and a map? A string and a boolean value? It’s arbitrary, ugly and verbose no matter how you slice it. And, just like its inverse, basically unmanageable without tooling.
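A small Python sketch of that ambiguity, under the assumption of one naive mapping (dicts become child elements, lists become repeated `<item>` elements, scalars become text). The `to_xml` helper and the sample document are invented for illustration; other mappings exist, and that is rather the point.

```python
import json
import xml.etree.ElementTree as ET

def to_xml(tag, value):
    """Naive JSON-to-XML mapping, one of many possible conventions."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(to_xml(key, child))
    elif isinstance(value, list):
        for child in value:
            elem.append(to_xml("item", child))
    else:
        # Strings are written as-is; other scalars use their JSON spelling.
        elem.text = value if isinstance(value, str) else json.dumps(value)
    return elem

# Hypothetical document: a string "true" next to a boolean true.
doc = {"name": "true", "enabled": True, "ports": [80]}
root = to_xml("config", doc)

# The string and the boolean are now indistinguishable text nodes.
name_text = root.findtext("name")
enabled_text = root.findtext("enabled")

# A one-element list and a one-key map serialize to the same markup.
list_xml = ET.tostring(to_xml("ports", [80]), encoding="unicode")
map_xml = ET.tostring(to_xml("ports", {"item": 80}), encoding="unicode")
```

Recovering the original types needs either a schema or extra markup on every element, which is where the verbosity comes from.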
Also HTML is not XML. HTML is non-strict (the spec is its parser, not its format). HTML doesn’t have schema files. HTML doesn’t understand self-closing tags. HTML does have void elements (img, br, input, etc). HTML is designed for humans to write. And it’s a tool fit for its purpose, unlike the abomination that is XML.
Scroll down and I think it shows it perfectly how (X|HT)ML tags are much simpler than JSON syntax once things get complex and nested. It's not that hard to define basic primitives like we do with HTML (which has dozens of tags) but XML also lets you define your own schema if necessary to make things more compact.
I understand the technicalities between HTML vs XML but I don't see how it makes any practical difference when you're editing a bunch of tags in a text file. It's the same thing. The structure looks identical. What is the actual issue that makes HTML easy but XML hard?
This is such a contrived example, they take html and show the equivalent conversion in json with the same schema. The point is in a configuration file you can remove nearly all the cruft of xml, instead of having <input “key”=key “value”=value> you don’t need any brackets or extra info. You just type { key:value } and you’re done. As many times as you want. I’ve had to hand edit complex msbuild configurations (xml based) in past projects and I can tell you lists and maps are hell. We’ve since converted them to yaml (another discussion), but the point is xml is terrible for human editing.
And yet you have literally done the same exact thing in reverse - in XML, the exact equivalent would be <key>value</key>; you don't need any extra attributes, either.
Well, until you do need some metadata for that key-value pair. Which is why even in many JSON schemas it's pretty common to get something like { "key": "key", "value": "value", ... } (where ... is usually empty in practice).
The problem with MSBuild, for the most part, isn't XML - it's its own data model (which is not XDM, by the way).
The nuance that differentiates my example is I’m not picking some random web page (where html might actually make sense to use) and trying to apply json to a domain it most certainly is not optimized for. I’m taking an extremely common case and showing why xml is way too verbose for very simple scenarios.
My main point is in xml the closing tags are unnecessary, and you need an opening tag for every value where it’s obvious from nested context what that value is supposed to be. Xml is very redundant, there’s just all this zero entropy text everywhere which conveys no information. I disagree with your metadata in json example, why not just add more data to the value? The data model of msbuild is also annoying, but I’ve worked with some dotnet core projects now and using json as the project format definitely saves typing and actually allows you to intuit what a project is doing rather than just bombarding you with useless text.
Ok, but like what if your metadata is even more complicated than just a single field, it’s ridiculous to have to include even more opening and closing tags.
<map>
<kvp>
<key>k</key>
<value>v</value>
<metadata>
<field1>m</field1>
<field2>m2</field2>
</metadata>
</kvp>
</map>
It’s just crazy to look at tbh, and it was annoying to type on my phone. I’m amazed people are still arguing in favor of it. In reality you just want the markup to transmit the most information in the least amount of bits. This is a measurable quantity and xml objectively sucks at it.
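Since the claim is that this is measurable, here is a quick Python sketch that measures it for the snippet above. The JSON shape is an assumption (one plausible encoding of the same key, value, and metadata), and both forms are written without whitespace so that indentation does not skew the comparison.

```python
import json

# The XML from the comment above, with whitespace removed.
xml_text = (
    "<map><kvp><key>k</key><value>v</value>"
    "<metadata><field1>m</field1><field2>m2</field2></metadata>"
    "</kvp></map>"
)

# One possible JSON encoding of the same data (an assumed shape, not the only one).
json_text = json.dumps(
    {"k": {"value": "v", "metadata": {"field1": "m", "field2": "m2"}}},
    separators=(",", ":"),
)

xml_bytes = len(xml_text.encode("utf-8"))
json_bytes = len(json_text.encode("utf-8"))
print(f"XML: {xml_bytes} bytes, JSON: {json_bytes} bytes")
```

This only counts bytes for one tiny example; it says nothing about readability or about how well either format compresses.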
> In reality you just want the markup to transmit the most information in the least amount of bits.
If that were the case, we'd be using binary serialization everywhere. But you yourself are making an argument that ease of writing matters. So does ease of reading. Overly verbose markup is a tax on both, but so is extreme brevity.
Anyway, this particular thread was a discussion to address a very specific point made in the article that overstated XML verbosity over JSON. I'm not actually arguing that XML is perfect, or even "good enough". Its syntax is overly verbose, and its data model has pointless distinctions and arbitrary restrictions. But it also had many good ideas, and it's unfortunate that those get ignored in the quest of simplifying everything - and then later, when the issues that were the original motivation for those ideas are rediscovered, that wheel gets reinvented in dozens of flawed and mutually incompatible ways.
.NET Core uses .csproj project files which are XML.
Do you have difficulty in reading a large HTML document? That's as verbose and repetitive as XML but also usually filled with junk comments and malformed tags. If you don't find it hard, then why is XML different?
> Millions of developers use XML all the time, it's called HTML.
HTML is not XML and even then few developers today use HTML without some sort of processing involved. HTML is also not a language used for configuration files, so it is really off-topic.
The question is, do people prefer XML configuration over JSON configuration. The answer is that people generally prefer JSON. JSON just maps far better to data types we use in everyday programming. Have you seen some of the common ways of expressing key-value pairs in XML? It's horrible.
> Now try taking a typical HTML document and expressing it in JSON.
That's not the problem anybody is trying to solve here. We're talking about configuration files, not complex documents. But let's say I wanted to solve that problem, the representation would look like this:
It's a technical distinction without a practical difference. They are both declarative languages to describe data.
Why is it so horrible? Have you ever typed a <ul><li> list? Or <input name="key" value="value">? What exactly is so difficult about this syntax in XML but completely fine in HTML?
The point isn't whether you can represent a document in JSON, it's about how easy it is for a human to manage it. The XML/HTML structure is far easier as things get more complex. The page you linked to shows this quite clearly even with the small "Bulleted List Example" at the bottom.
> It's a technical distinction without a practical difference.
There's a lot of practical differences when actually implementing configuration with either XML or JSON, due to technical distinctions.
> The point isn't whether you can represent a document in JSON, it's about how easy it is for a human to manage it.
If XML is easier to manage, why are people shifting from XML to JSON for configuration? You may have some experience with HTML, but have you actually used some of these XML abominations that are used for configuration?
> The XML/HTML structure is far easier as things get more complex.
Configuration files shouldn't be complex. They're mostly key-value pairs, perhaps a couple of lists here and there and a modest amount of nesting. XML is complex to begin with.
Perhaps at some stage, for some use-cases, XML starts becoming simpler to edit. Configuration files generally aren't one of them.
> The page you linked to shows this quite clearly even with the small "Bulleted List Example" at the bottom.
It's an example of representing XML-like structure with JSON, which is fairly easy. Try it the other way around, things get hairy. If you were to represent the data as JSON, you wouldn't write it like that.
If config is small, both formats are easy and it doesn't matter. When config is large and complex then both formats can be hard, however XML is much easier than JSON as complexity climbs. My evidence is how many people easily edit large complex document structures in HTML already without issue.
I'm just not sure what the argument against this is other than verbosity. Config files shouldn't be complex? Sure, but what if they are? Are people really moving to JSON everywhere or is it just tied to the rise of Javascript and web frameworks?
> If config is small, both formats are easy and it doesn't matter.
Even if I would agree with this (which I don't), if it doesn't matter then you should pick JSON because it's easier to work with as a developer and it doesn't require XML parsing as a dependency.
> My evidence is how many people easily edit large complex document structures in HTML already without issue.
This isn't evidence at all. An HTML document is not a configuration file. Even then, people rarely author large HTML documents by hand these days. It's more likely that they're using a simpler language like Markdown to author documents.
> Are people really moving to JSON everywhere or is it just tied to the rise of Javascript and web frameworks?
If Javascript and web frameworks are responsible for the rise of JSON, but XML is better, why didn't they stick with XML? Remember, SOAP was XML. AJAX stands for "asynchronous Javascript and XML". The way to do an HTTP request (before fetch) from Javascript was called (inappropriately) XMLHttpRequest. JSON on the other hand was just an informal spec with several flaws, yet it won out over XML.
1) What I mean by "it doesn't matter" is that at small scale, neither one is easier than the other.
2) JSON does need to be parsed and does add a dependency. Javascript is not the only language out there.
3) HTML not being config files is not the point. The document/tag structure is identical. If you can edit large complex HTML files (regardless of how they are generated) then you can edit large complex XML files. And it's rather easy to do so, against the claims that XML is so hard to work with.
4) There are plenty of XML configs out there, backing just about every piece of software you use. If you only focus on web/js projects then JSON "won out" because it is a simple dump of the in-memory representation rather than a more formal serialization like XML. The missing features like JsonPath and Json Schema are now being added back to turn it into a proper serialization format. When storing configs, these schema features are rather important.
1) Yes, if they're both easy to use, pick JSON - for simplicity's sake.
2) A JSON parser is a far smaller and simpler dependency than an XML parser, especially when we're adding XPath to the mix.
3) XML isn't necessarily hard, it's tedious.
4) There's plenty of legacy software out there using all kinds of stuff for config. You don't see a lot of people choosing XML these days. You see YAML or TOML or JSON, even though all of these have issues of their own. JSON happens to be the simplest and most commonly supported.
HTML is distinctly not XML. There is XHTML if you want to use a form of HTML that is processed like XML, but XML processing is a lot stricter than HTML processing in which you can take a lot more liberties in the markup and still end up with a readable result, maybe even what you intended, whereas an XML processor will usually refuse to render incorrect XML markup at all.
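A rough Python illustration of that difference in strictness. The markup is made up, and note that the standard library's `HTMLParser` is only a tolerant tokenizer rather than a full browser-grade HTML parser, so this is a sketch of the idea and not of how browsers recover from errors.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sloppy markup: <br> and <b> are never closed.
malformed = "<p>unclosed paragraph<br><b>bold</p>"

# An XML parser refuses the whole document.
try:
    ET.fromstring(malformed)
    xml_accepted = True
except ET.ParseError:
    xml_accepted = False

class TagCollector(HTMLParser):
    """Records every start tag the tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

# The HTML tokenizer carries on and reports what it saw.
collector = TagCollector()
collector.feed(malformed)
collector.close()
html_tags = collector.tags
```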
I'm comparing the data structure and editing ergonomics (which are identical) between XML and HTML, not how they are processed. It's good for config files to have precise parsing instead of tolerating errors.
Javascript isn't the only language though. There are lots of XML and JSON parsers and serializers available to convert to an in-memory object. Most relational databases have great XML support too.
I don't think people realize that JSON doesn't actually have a querying system at all, you have to deserialize it to an object to use. There is the coming JsonPath standard but that's not well supported yet and is pretty much the same as XPath.
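For what it's worth, here is how the two approaches look side by side in Python. The job list is invented for the example, and `xml.etree.ElementTree` only supports a limited subset of XPath, so treat this as a sketch of the idea rather than a demonstration of full XPath.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical data: a list of jobs, some enabled.
xml_text = """
<jobs>
  <job name="ingest" enabled="true"/>
  <job name="report" enabled="false"/>
  <job name="cleanup" enabled="true"/>
</jobs>
"""
json_text = """
{"jobs": [
  {"name": "ingest", "enabled": true},
  {"name": "report", "enabled": false},
  {"name": "cleanup", "enabled": true}
]}
"""

# XML: a path expression with an attribute predicate selects the matching nodes.
root = ET.fromstring(xml_text)
enabled_xml = [job.get("name") for job in root.findall("./job[@enabled='true']")]

# JSON: deserialize first, then filter with ordinary language constructs.
data = json.loads(json_text)
enabled_json = [job["name"] for job in data["jobs"] if job["enabled"]]
```

Whether the second form counts as "no querying system" or as "the host language is the querying system" is more or less what this subthread is arguing about.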
I think JSON is just universally easier to use. Every language I’ve used with it is mainly the same in the way it works. JavaScript, Ruby, Python, Perl, PHP...
But you’re not wrong. Yes you have to deserialise it. Hence it being an Object Notation system.
While you say JSON is trying to catch up with XML in regards to XPath, JSON is trying to catch up to XML with things like JSON Schema.
> While you say JSON is trying to catch up with XML in regards to XPath, JSON is trying to catch up to XML with things like JSON Schema.
Absolutely. I would say XSD is the strongest advantage of XML over JSON right now. XSD still has many things to improve but JSON schema is even further behind.
> I don't think people realize that JSON doesn't actually have a querying system at all, you have to deserialize it to an object to use.
You don't have to deserialize it to an object, you just do that because it's convenient in Javascript (and other dynamic languages). It's a feature.
Now try using XML like an object, what do you get? A DOM. Which is fine if you wanted a DOM, but I don't want a DOM. I don't need XPath or any of that stuff. These are tools to deal with the complexity of XML, which I don't have, because I am using JSON.
A DOM is just a tree (and deserialized XML is not always W3C DOM; indeed, in most modern languages, it usually isn't). Deserialized JSON is also just a tree. And even if you deserialize JSON by eval'ing it from JS, it is still deserialization - you're just happening to be reusing the JS parser for that purpose. But any parser is fundamentally a deserializer from the language syntax to an AST.
So you're really saying that JSON deserializes to something that is a more natural fit for the language that you're using. And it's true in many cases; but also not so much in others, like when you're dealing with 64-bit integers, or dates, or all those other things that JSON doesn't spec because "complexity". In practice, it just means a proliferation of incompatible ways to represent these things, and utterly insane deserialization behavior in corner cases when implementations try to be "smart" to transparently compensate for JSON lacking something (e.g. https://github.com/JamesNK/Newtonsoft.Json/issues/862).
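The 64-bit integer case is easy to reproduce. Python's own `json` module keeps integers exact, so this sketch uses its `parse_int` hook to imitate a parser that stores every number as an IEEE 754 double, which is what JavaScript's `JSON.parse` does. The `id` field is made up for the example.

```python
import json

big = 2**63 - 1                      # largest signed 64-bit integer
text = json.dumps({"id": big})       # valid JSON; the grammar puts no limit on digits

# Python's default parser uses arbitrary-precision ints, so nothing is lost.
exact = json.loads(text)["id"]

# Imitate a double-only parser by routing integers through float().
as_double = json.loads(text, parse_int=float)["id"]
round_tripped = int(as_double)

print(exact == big)          # the exact parse round-trips
print(round_tripped == big)  # the double-based parse does not
```

The same bytes therefore mean different values depending on which parser reads them, which is why many APIs fall back to sending large IDs as strings.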
Conversely, if you are writing in a language that has integral support for XML - say, XQuery, or even VB.NET (https://docs.microsoft.com/en-us/dotnet/visual-basic/program...), the complexity is mostly not there. At the very least, if you control the format - which you have to, if you're in a position to decide what to use - then you can certainly stick to the subset of XML that is not any more complex than JSON.
Remember, we're comparing with XML. There's no dates or numeric types in XML at all. This kind of proliferation is far worse in XML.
Sure, there are corner cases and limitations with JSON. I've never experienced that as a significant issue.
> Conversely, if you are writing in a language that has integral support for XML - say, XQuery, or even VB.NET...
I am not using any of that stuff, nor is there any reason for me to start using it.
> At the very least, if you control the format - which you have to, if you're in a position to decide what to use - then you can certainly stick to the subset of XML that is not any more complex than JSON.
Nor does it require an out-of-band schema - you can slap xsi:type on any element. And you can do that without breaking the data model, because namespaces keep data and metadata unambiguously separate, and code can easily deal with the former while being completely oblivious to the latter, unless it needs it.
JSON also has similar higher-level abstraction layers with more metadata. The problem is that nobody can agree on which one to use, or even whether to use one at all, and most code that's deserializing JSON in the wild is not going to be able to distinguish metadata from data.
Sure, you can add information until you arrive at a point where a string attribute or text content will be interpreted as a certain data type, but XML itself doesn't have it.
> JSON also has similar higher-level abstraction layers with more metadata.
JSON has all the basic data types built right in, there's no need for more metadata to do simple things. There's a reasonable mapping to basic data types and structures for almost any language.
> The problem is that nobody can agree on which one to use, or even whether to use one at all, and most code that's deserializing JSON in the wild is not going to be able to distinguish metadata from data.
...which is generally fine because of the aforementioned mapping. Your JSON library doesn't have to (and shouldn't) do any magic.
That's ridiculous - there are lots of pragmatic xml deserializers available. Not to mention, sometimes you want something like xpath; lack of xpath and lack of validation (schema, whatever) aren't features, they're bugs.
The only real advantage json has when it comes to deserialization is that it suggests to a human reader that keys are like object properties, leading to a very obvious deserialization strategy. But even that's a bit misleading, since json allows duplicate keys, just like xml, so a pragmatic deserialization library is going to make that feature impossible.
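Python's `json` module shows both sides of this. By default the last duplicate silently wins; a strict loader has to be built by hand with `object_pairs_hook`. The `reject_duplicates` helper and the `retries` key below are made up for illustration.

```python
import json

# Hypothetical config with a duplicated key.
text = '{"retries": 1, "retries": 5}'

# Default behaviour: the document parses and the last value wins.
silent = json.loads(text)

def reject_duplicates(pairs):
    """Build a dict from key/value pairs, failing on a repeated key."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen

# Opt-in strictness via the object_pairs_hook parameter.
try:
    json.loads(text, object_pairs_hook=reject_duplicates)
    duplicate_detected = False
except ValueError:
    duplicate_detected = True
```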
Make no mistake - the culture of straightforward deserialization is hugely valuable! But that's because of its history and other human factors more than the language itself.
Fair, but many developers are intimately aware of the shape of that kind of XML. Ever tried making sense of an XML document that tries to express something as complex as a web page, but without the requisite domain knowledge? It's painful.
But XQuery (and XPath, being its subset) is pretty much just pure sequence comprehensions for the XML Data Model. It might have been unusual from a mainstream PL perspective back when it was introduced, but today, when C# has LINQ, and JS developers preach the miracles of map/filter/fold over immutable data structures, I don't think XQuery is all that exotic.
JSON has no querying system, it can only be accessed by deserializing to an in-memory object or using whatever custom APIs are available (like postgres json functions).
JsonPath is the proposed standard, and it looks pretty much like XPath. And that's before getting into Json Schema.
And the problem would not exist if we just used the good ol' s-expressions. Gets all the structural benefits of XML and JSON with few of the drawbacks.
The problem people have with XML is not the syntax. Also, how would you represent (optional) attributes in s-expressions? I could think of a couple of ideas but none that is nice.
Basic S-expressions also don't distinguish between lists and maps, which is something that turns out to be very convenient in practice. Sure, a map is just a list of pairs - but the deserializer needs to be aware of its meaning to parse it into the appropriate data structure. So you either need a schema even for the most trivial cases, or you need a distinct syntax.
Basic S-exp syntax can easily be extended to denote dictionaries. Just like #(...) gives us vectors and #S structures, some #H can provide hash tables.
If you really needed, you could just define that the second list element is an attribute list/map, turning <foo bar="baz"><quux /></foo> into (foo ((bar "baz")) (quux)).
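That convention is simple enough to sketch in a few lines of Python. The `to_sexp` helper is invented here; it ignores text content and does not escape quotes in attribute values, and it follows the example above in omitting the attribute list when an element has no attributes (which a real format would probably have to make unambiguous).

```python
import xml.etree.ElementTree as ET

def to_sexp(elem):
    """Render an element as (tag ((attr "value") ...) child ...)."""
    parts = [elem.tag]
    if elem.attrib:
        # Attributes become a list of (name "value") pairs in second position.
        attrs = " ".join(f'({name} "{value}")' for name, value in elem.attrib.items())
        parts.append(f"({attrs})")
    # Child elements follow, rendered recursively.
    parts.extend(to_sexp(child) for child in elem)
    return "(" + " ".join(parts) + ")"

sexp = to_sexp(ET.fromstring('<foo bar="baz"><quux /></foo>'))
print(sexp)
```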
The problem with XML itself is being unnecessarily verbose (and thus difficult for both human and machine to read) for what's just a way to encode trees. Attributes are arguably XML's self-inflicted gunshot wound in the foot; you mainly need them because of visual noise caused by regular nodes.
I believe that more than being verbose (closing tags for example) it is that the whole spec is enormous, with entities and namespaces making it even more complex. Still we see that some of that is actually needed as various json path/schema projects show.
Mapping a 1-1 s-expression translation onto HTML/XML/JSON/YAML etc. solves nothing. YAML has (had) code execution security problems and parser incompatibility issues. HTML/XML actually need namespaces in a few cases. JSON is abused as a "compile target".
There can be no one format that works for everything. This is why I like TOML, it is really good at what it tries to do and stops there.
You may be right, some JSON parsers even support comments. But you never know which parser is strict and which not, because it's not part of the specification.
Your link shows nested data structures working? But regardless TOML is a configuration file format (like INI), not a general purpose data structure format (like JSON is).
General purpose until you need to store dates/times, self-referential structures, enums, etc.
json and toml are both serializations of data. I don't think the line you're trying to draw between "configuration file format" and "data structure format" is a well-defined line.
Sure it's a blurry line but some formats are better than others for some specific tasks. JSON isn't the right tool for every job but it's a good enough lowest common denominator for data sharing.
I wouldn't want to use it for config files or passing complex data but then it's not designed for that.
TOML also has more "normal" notations for dicts and lists that are almost the same as JSON.
There is a distinction between markup languages and object notation languages. XML, HTML and Markdown are markup; JSON, YAML, TOML and most others are object notations.
(here my distinction, in general, is if they allow unquoted text)
I mean, we're not going back to xml anyhow; and xml is considerably older anyhow, so this is a pretty hypothetical case.
But I'd posit that most - but not all - of the issues with xml config files have little to do with xml per se, and everything to do with the crazily detailed stuff people squeezed in it. You don't have to stick everything in a namespace, and nest even trivial things 3 deep with annotations for obvious types. And if you do those things in json it turns into a mess too.
SOAP is a perfect example of that. But: just because soap is messy doesn't imply xml everywhere has to be that.
But sure, xml has some downsides. Then again, so does json. Oh well!
But do you know what sucks even harder? XML. If I can replace an XML configuration file with JSON then screw it.