In this article I will introduce you to the XML, CSV, and JSON data formats and show you, with simple examples, what each one looks like in practice.

What is a Data Format?

A data format is a way of organizing information according to predefined specifications, normally so that it can be processed by computers. Data formats are used for:

  • System integrations
  • Data transfers
  • Data storage
    • Stored in files
    • Serialized to be stored in a database, in the browser's local storage, or in a cache

Data formats are used for communication between parts of a system or application, as well as for communication between independent systems. The text-based formats covered here are easy for humans to read and easy to manipulate programmatically.

What is the XML Data Format?

XML (Extensible Markup Language) is a markup language that defines a set of rules for encoding hierarchically organized documents. XML is called extensible because it lets you define your own markup elements.

A markup language is a set of codes that can be applied to data or text so that it can be read by computers or people. For example, HTML is a markup language designed to structure and format a web page. XML follows the same concept, but it is used to standardize how data is organized, to separate the content, and to integrate it with other languages.

The main characteristic of XML is the portability of information between computers and applications. For example, news feeds delivered via RSS use an XML file to structure the information.

Although XML is a very readable format, it has the drawback of being verbose: a document may carry tags, attributes, namespaces, and schemas. This means that it can use a lot of bandwidth to transfer a small amount of data.

Example of an XML document:

<?xml version="1.0" encoding="UTF-8"?>
<friends>
    <friend>
        <name>John Ferreira</name>
        <age>26</age>
        <city>Porto</city>
        <profession>Full Stack Web Developer</profession>
        <hobby>Fitness</hobby>
    </friend>
    <friend>
        <name>Leonardo Marinho</name>
        <age>18</age>
        <city>London</city>
        <profession>Electric Engineer</profession>
        <hobby>Build lego</hobby>
    </friend>
    <friend>
        <name>Caroline Azevedo</name>
        <age>34</age>
        <city>Salvador</city>
        <profession>Entrepreneur</profession>
        <hobby>Sing</hobby>
    </friend>
</friends>
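
To make this concrete, here is a minimal sketch of how a program could read a document like the one above. It uses Python's standard xml.etree.ElementTree module. The XML_TEXT constant and the parse_friends helper are names I made up for illustration, and the sample is shortened to two friends and three fields.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the friends document (illustrative sample data).
XML_TEXT = """<?xml version="1.0" encoding="UTF-8"?>
<friends>
    <friend>
        <name>John Ferreira</name>
        <age>26</age>
        <city>Porto</city>
    </friend>
    <friend>
        <name>Leonardo Marinho</name>
        <age>18</age>
        <city>London</city>
    </friend>
</friends>
"""


def parse_friends(xml_text: str) -> list[dict]:
    """Turn each <friend> element into a plain dictionary (hypothetical helper)."""
    # Passing bytes lets the parser honor the encoding named in the XML declaration.
    root = ET.fromstring(xml_text.encode("utf-8"))
    friends = []
    for node in root.findall("friend"):
        friends.append({
            "name": node.findtext("name"),
            # Element text is always a string, so numbers are converted by hand.
            "age": int(node.findtext("age")),
            "city": node.findtext("city"),
        })
    return friends


friends = parse_friends(XML_TEXT)
for friend in friends:
    print(friend["name"], friend["age"], friend["city"])
```

Note that every value comes out of the XML as text. It is up to the program (or a schema) to decide that age is a number.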

What is the CSV Data Format?

CSV (Comma-Separated Values) is a format used to represent tabular data, widely used in database import/export and in spreadsheet applications. Database systems and tools such as MySQL Workbench, SQL Server, and phpMyAdmin commonly support importing and exporting data as CSV.

In other words, CSV is a plain text format in which each line is a data record. Each record consists of one or more fields, separated by commas. Usually, the first line of the CSV is the header that names the fields of the remaining lines.

It’s easy for humans to read and write, and easy for computers to interpret. Programmatically, a system can read a CSV line by line and split each line at the commas. Fields that themselves contain a comma must be wrapped in double quotes, so a real CSV parser is safer than a plain split.

Example of a CSV data format:

name, age, city, profession, hobby
John Ferreira, 26, Porto, Full Stack Web Developer, Fitness
Leonardo Marinho, 18, London, Electric Engineer, Build lego
Caroline Azevedo, 34, Salvador, Entrepreneur, Sing

It is easy to see that CSV files take up much less storage space, as their structure is simple and small, so it is also great for streaming large volumes of data.
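
As a rough sketch of the "read each line and split on commas" idea, the snippet below reads the sample above with Python's standard csv module. The CSV_TEXT constant and the read_friends helper are illustrative names, and the "Ana Silva" line is a made-up record that I added only to show why a plain split is not enough once a field contains a comma.

```python
import csv
import io

# The CSV sample from this section, kept exactly as written (a space after each comma).
CSV_TEXT = """name, age, city, profession, hobby
John Ferreira, 26, Porto, Full Stack Web Developer, Fitness
Leonardo Marinho, 18, London, Electric Engineer, Build lego
Caroline Azevedo, 34, Salvador, Entrepreneur, Sing
"""


def read_friends(csv_text: str) -> list[dict]:
    """Read CSV text into one dictionary per record (hypothetical helper)."""
    # skipinitialspace=True drops the blank that follows each comma in this sample.
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    rows = []
    for row in reader:
        # CSV carries no type information: every field arrives as a string.
        row["age"] = int(row["age"])
        rows.append(row)
    return rows


friends = read_friends(CSV_TEXT)
for friend in friends:
    print(friend["name"], friend["age"], friend["city"])

# Made-up record whose last field contains a comma inside double quotes.
quoted_line = 'Ana Silva, 41, "Porto, Portugal"'
naive = quoted_line.split(",")  # cuts the quoted field in two
parsed = next(csv.reader([quoted_line], skipinitialspace=True))  # keeps it whole
print(len(naive), len(parsed))
```

The header line is what lets DictReader name each field. Without it, the program would have to know the column order in advance.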

What is the JSON Data Format?

JSON (JavaScript Object Notation) is a compact data format used for simple and fast data exchange between systems, and its basic structural unit is the key-value pair.

A value can be one of the following data types or structures, and these structures can be nested:

  • string in double quotes
  • number
  • object
  • array
  • boolean (true or false)
  • null

See json.org for the full details of the JSON structure.

JSON is based on the syntax of JavaScript, one of the most popular programming languages in the world. It has become very popular for APIs and configuration files, such as package.json, which is created when we use npm to manage the dependencies of a JavaScript project.
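
To see the value types listed above in a running program, here is a small sketch using Python's standard json module. The SAMPLE text is invented for illustration: the address, married, and nickname keys are not part of this article's data and are there only so that every type appears once.

```python
import json

# Invented sample that contains one value of each JSON type.
SAMPLE = """
{
    "name": "John Ferreira",
    "age": 26,
    "address": {"city": "Porto"},
    "hobbies": ["Fitness", "Games"],
    "married": false,
    "nickname": null
}
"""

data = json.loads(SAMPLE)

# Map each key to the name of the Python type the parser produced for its value.
kinds = {key: type(value).__name__ for key, value in data.items()}
for key, kind in kinds.items():
    print(f"{key}: {kind}")
```

Unlike XML and CSV, the parser already knows that 26 is a number and that false is a boolean, so no manual conversion is needed.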

Example of a JSON data format:

{
    "friends":
    [
        {
            "name": "John Ferreira",
            "age": 26,
            "city": "Porto",
            "profession": "Full Stack Web Developer",
            "hobbies": ["Fitness", "Games"]
        },
        {
            "name": "Leonardo Marinho",
            "age": 18,
            "city": "London",
            "profession": "Electric Engineer",
            "hobbies": ["Build legos", "Robots", "Swim"]
        },
        {
            "name": "Caroline Azevedo",
            "age": 34,
            "city": "Salvador",
            "profession": "Entrepreneur",
            "hobbies": ["Sing", "play guitar"]
        }
    ]
}
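
Because objects and arrays can be nested, a program reaches the data by walking through the keys. The sketch below assumes the document above is available as text. DOCUMENT is an illustrative name, and the sample keeps only the name and hobbies keys to stay short.

```python
import json

# Shortened copy of the friends document (only two keys per friend).
DOCUMENT = """
{
    "friends": [
        {"name": "John Ferreira", "hobbies": ["Fitness", "Games"]},
        {"name": "Leonardo Marinho", "hobbies": ["Build legos", "Robots", "Swim"]},
        {"name": "Caroline Azevedo", "hobbies": ["Sing", "play guitar"]}
    ]
}
"""

# The top-level object has a single key whose value is an array of objects.
friends = json.loads(DOCUMENT)["friends"]

names = [friend["name"] for friend in friends]
# Each "hobbies" value is itself an array, so membership tests work directly.
swimmers = [friend["name"] for friend in friends if "Swim" in friend["hobbies"]]

print(names)
print(swimmers)
```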

Each item in the friends list is an object, just as we write objects in JavaScript. We say it is a self-contained document because for each object (a friend) we describe the keys and their values. Here is an example of a single item in the list, which is nothing more than a JSON object with all the necessary information for the item:

{
    "name": "John Ferreira",
    "age": 26,
    "city": "Porto",
    "profession": "Full Stack Web Developer",
    "hobbies": ["Fitness", "Games"]
}

This becomes both a positive and a negative point:

  • Positive point: we have all the information we need in a single piece of code.
  • Negative point: the keys are repeated in every object, which means more bandwidth consumed in data transfer and more space needed for storage. Obviously, this point is only of concern if we are working with large volumes of data, as in Big Data.
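
The cost of repeating the keys can be measured with a quick experiment. The sketch below writes the same three records as CSV and as compact JSON and compares the number of bytes. FRIENDS, to_csv, and to_json are illustrative names, and I use a single hobby field so that both formats carry exactly the same data. The exact numbers depend on the data, so treat this as an illustration, not a benchmark.

```python
import csv
import io
import json

# The three friends from this article, with one hobby each.
FRIENDS = [
    {"name": "John Ferreira", "age": 26, "city": "Porto",
     "profession": "Full Stack Web Developer", "hobby": "Fitness"},
    {"name": "Leonardo Marinho", "age": 18, "city": "London",
     "profession": "Electric Engineer", "hobby": "Build lego"},
    {"name": "Caroline Azevedo", "age": 34, "city": "Salvador",
     "profession": "Entrepreneur", "hobby": "Sing"},
]


def to_csv(records: list[dict]) -> str:
    """Write the records as CSV: the keys appear once, in the header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records: list[dict]) -> str:
    """Write the records as compact JSON: the keys appear in every object."""
    return json.dumps(records, separators=(",", ":"))


csv_size = len(to_csv(FRIENDS).encode("utf-8"))
json_size = len(to_json(FRIENDS).encode("utf-8"))
print(f"CSV: {csv_size} bytes, JSON: {json_size} bytes")
```

With only three records the difference is small, but it grows with every record added, because each new JSON object repeats all five keys.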

Currently, JSON is the most used data format among developers. We can find evidence of this by comparing the search interest in the three formats (XML, CSV, and JSON) over several years in Google Trends.

JSON, CSV, and XML comparison

As we can see, XML was widely used in the past, but JSON now holds a position far ahead of the others. CSV, in turn, shows steady and growing demand, although at a lower volume than JSON.

Now, I’m curious to understand why all three formats have had a drop in demand since September 2020. Do you have an opinion?

Conversion of Data Formats

The need to convert between data formats has become very common, a fact that we can also confirm through Google Trends.

CSV to JSON conversion comparison

Since CSV and JSON are the two most popular data formats, it is very useful to have tools to convert from CSV to JSON, or from JSON to CSV.
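
A conversion like this can be sketched in a few lines with Python's standard csv and json modules. The csv_to_json and json_to_csv helpers and the CSV_TEXT sample are illustrative names of mine, not a reference to any specific tool. The sketch assumes flat records that all share the same keys. Nested JSON values, such as a list of hobbies, would need extra rules that are not covered here.

```python
import csv
import io
import json

CSV_TEXT = "name,age,city\nJohn Ferreira,26,Porto\nLeonardo Marinho,18,London\n"


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text with a header line into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    # Every value stays a string ("26", not 26): CSV has no type information.
    return json.dumps(list(reader), indent=4)


def json_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat objects into CSV text with a header line."""
    records = json.loads(json_text)
    buffer = io.StringIO()
    # The keys of the first object become the header (assumes all objects match).
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


json_text = csv_to_json(CSV_TEXT)
round_trip = json_to_csv(json_text)
print(json_text)
print(round_trip)
```

In this sample the round trip gives back the original CSV text, which is a simple way to check that no data was lost on the way.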

Conclusion

You may still be wondering which data format is best, but that depends on your system’s goals. No data format or programming language is the best for everything, but there may be a better one for a specific need, according to its requirements.

I would use XML as a technology for integrating systems or applications. I would use CSV to store large volumes of data that are appended to continuously, such as log files, and also for data streaming. Finally, I would use JSON to exchange data between parts of an application, to receive data from a request to an API, as well as to send new data to it.

I hope that this article has been useful for you and that it has introduced you to the world of data formats with XML, CSV and JSON, which are part of every programmer’s daily life.