Multiline — a Python package for multi-line JSON values
JSON is widely used for storing, exchanging and transporting data. To be considered valid, the data must follow a specific syntax.
One rule of that syntax is that a string, whether it is a key or a value, must sit on a single line: it cannot contain a raw line break. If we want to store a multi-line string, we have to convert it manually, either by removing all the newlines or by replacing each one with the \n escape sequence.
Python’s default json package can parse a JSON file or string only if it is valid, which means it may contain only single-line values. The multiline package aims to solve this problem.
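To make the restriction concrete, here is a minimal sketch that uses only the standard json module. The variable names are illustrative; it shows a raw line break inside a value being rejected, and the same value being accepted once the line break is escaped.

```python
import json

# A JSON document whose string value contains a raw line break.
invalid_json = '{"note": "first line\nsecond line"}'

try:
    json.loads(invalid_json)
    parsed_ok = True
except json.JSONDecodeError:
    # The default (strict) parser rejects control characters inside strings.
    parsed_ok = False

# Escaping the line break as backslash + n makes the document valid.
# A blanket replace is safe here only because the single newline in this
# sample sits inside a string value.
valid_json = invalid_json.replace("\n", "\\n")
note = json.loads(valid_json)["note"]

print(parsed_ok)
print(note.splitlines())
```

The manual replacement in the last step is the chore that the article describes, and it quickly becomes tedious for larger documents.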
What does the package “multiline” offer?
It can parse values that span multiple lines, whether the JSON comes from a string or from a file. In addition, it offers the same methods as the default json package.
It offers four methods:
- load (takes in a file object, returns a dict object)
- loads (takes in a string, returns a dict object)
- dump (takes in a dict object and file object, writes data to file)
- dumps (takes in a dict object, returns a JSON string)
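The call shapes listed above match the standard json module, so they can be illustrated with json itself. The sketch below uses only the standard library, with an in-memory io.StringIO standing in for a real file. Going by the description in this article, swapping json for multiline should give the same results for valid JSON, but that is an assumption and is not verified against the package here.

```python
import io
import json

data = {"name": "multiline", "tags": ["json", "parser"]}

# dumps: dict -> JSON string
text = json.dumps(data)

# loads: JSON string -> dict
from_string = json.loads(text)

# dump: dict + file object -> JSON written to the file
buffer = io.StringIO()
json.dump(data, buffer)

# load: file object -> dict
buffer.seek(0)
from_file = json.load(buffer)

print(from_string == data, from_file == data)
```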
Judging by these definitions, the methods look just like the ones in the default json package. That is because the multiline package wraps those methods.
The load and loads methods add one step: they convert multi-line values into single-line values and then pass the result to the default json.loads method. The dump and dumps methods simply call the corresponding method of the default json package.
The next question that comes to mind is:
Why do we need a wrapper on top of all four methods?
This keeps all four methods compatible with the json package, so replacing json with multiline does not break existing code. It also means that a single import provides all of the functionality.
How to use it?
It is available as an open-source Python package and can be installed using the command:
pip install multiline
After installation, import it with the following statement:
import multiline
After that, it is used in the same way as the default json package.
If multi-line parsing is required while loading the JSON, an additional argument ‘multiline=True’ must be provided to trigger the custom parser. Let's look at a few examples.
raw_json_string = """{
"key1": {
"subkey": ["'quoted
vaue'", "lorem dipsum
lorem \\"dipsum
lorem dipsum", "lorem dipsum line 1
lorem dipsum line 2"]
},
"key2": {
"subkey": {
"subsubkey1": {
"subsubsubkey1": "lorem dipsum line1
lorem dipsum line2
lorem dipsum line 3"
},
"subsubkey2": {
"subsubsubkey1": "lorem dipsum\\n line1
lorem dipsum line2
lorem dipsum line 3"
}
}
}
}"""
Let's consider this as our input string. It contains multi-line values, both as elements of an array and as standalone values. It also contains special characters such as \n and \" in the values.
Since it contains multi-line values, we have to use the multiline package.
json_dict = multiline.loads(raw_json_string, multiline=True)
This converts the input into valid JSON, and the end output is a dictionary object.
Let's assume we have a JSON file named metadata.json that contains similar multi-line values. We can load it using the syntax below:
with open('metadata.json', 'r') as fp:
    json_dict = multiline.load(fp, multiline=True)
Deep dive into code
The additional functionality of the load and loads functions is provided through a decorator. The decorator calls a parser function that uses a regular expression to replace each newline character inside a value with the escaped sequence \n (written as '\\n' in Python source).
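As a rough illustration of the idea described above, here is a minimal sketch of how such a decorator could work. It is not the package's actual source: the names multiline_support, _escape_newlines and _STRING_LITERAL are hypothetical, and the regular expression is one possible way to locate string values, chosen for this example.

```python
import functools
import io
import json
import re

# Matches one JSON string literal, including escaped quotes such as \".
_STRING_LITERAL = re.compile(r'"(?:[^"\\]|\\.)*"', re.DOTALL)


def _escape_newlines(text):
    """Replace raw line breaks inside string literals with backslash + n."""
    def fix(match):
        return match.group(0).replace("\r\n", "\\n").replace("\n", "\\n")
    return _STRING_LITERAL.sub(fix, text)


def multiline_support(func):
    """Hypothetical decorator: pre-process the input when multiline=True."""
    @functools.wraps(func)
    def wrapper(text, *args, multiline=False, **kwargs):
        if multiline:
            text = _escape_newlines(text)
        return func(text, *args, **kwargs)
    return wrapper


@multiline_support
def loads(text, **kwargs):
    # After pre-processing, the standard parser does the real work.
    return json.loads(text, **kwargs)


def load(fp, **kwargs):
    # Read the whole file, then reuse the string-based path.
    return loads(fp.read(), **kwargs)


sample = '{"key": ["line 1\nline 2", "say \\"hi\\"\nbye"]}'
from_string = loads(sample, multiline=True)
from_file = load(io.StringIO(sample), multiline=True)
print(from_string)
```

Restricting the replacement to string literals leaves the line breaks between keys and values untouched, so ordinary pretty-printed JSON still parses. Without multiline=True, the wrapper passes the input straight through to json.loads.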
Code snippet:
Complete code can be found at:
If anybody is interested in learning more about the package or has ideas on how to improve it, please drop me a message.