Python - Iterate Over Massive JSON Files
Preparation
To demonstrate that we can work with massive JSON files, we first need to create one. Run the following PHP script to create a JSON file containing one million objects, roughly 842 MiB in size.
<?php
$numItems = 1000000;
$items = [];

for ($i = 0; $i < $numItems; $i++)
{
    $items[] = [
        "uuid" => md5(rand()),
        "isActive" => md5(rand()),
        "balance" => md5(rand()),
        "picture" => md5(rand()),
        "age" => md5(rand()),
        "eyeColor" => md5(rand()),
        "name" => md5(rand()),
        "gender" => md5(rand()),
        "company" => md5(rand()),
        "email" => md5(rand()),
        "phone" => md5(rand()),
        "address" => md5(rand()),
        "about" => md5(rand()),
        "registered" => md5(rand()),
        "latitude" => md5(rand()),
        "longitude" => md5(rand()),
    ];
}

file_put_contents(__DIR__ . '/data.json', json_encode($items, JSON_PRETTY_PRINT));
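If PHP is not available, a similar file could be produced from Python. The following is a minimal sketch rather than the script used for this article: the names `FIELDS`, `random_md5` and `write_sample` are hypothetical, and the output layout (and therefore the file size) may differ slightly from what the PHP script produces. It writes one object at a time, so the generator itself never holds the full list in memory.

```python
import hashlib
import json
import random

# Same keys as the PHP script above; every value is a random MD5 hex string.
FIELDS = [
    "uuid", "isActive", "balance", "picture", "age", "eyeColor", "name",
    "gender", "company", "email", "phone", "address", "about",
    "registered", "latitude", "longitude",
]


def random_md5():
    """Return the MD5 hex digest of a random number (hypothetical helper)."""
    return hashlib.md5(str(random.random()).encode("ascii")).hexdigest()


def write_sample(path, num_items):
    """Write a JSON array of num_items objects to path, one object at a time."""
    with open(path, "w", encoding="utf-8") as out:
        out.write("[\n")
        for i in range(num_items):
            if i:
                # Separate objects with a comma, but not before the first one.
                out.write(",\n")
            item = {field: random_md5() for field in FIELDS}
            out.write(json.dumps(item, indent=4))
        out.write("\n]\n")
```

Calling `write_sample('data.json', 1000000)` would then create a file with one million objects for the steps below.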
Steps
Ensure you have the ijson package installed by running:
pip install ijson
The script below demonstrates how we can loop over a massive JSON file without having to load it all into memory.
import ijson


def doSomethingWithObj(obj):
    # Print the identifier of the current object.
    print(obj['uuid'])


def main():
    counter = 0

    # https://github.com/isagalaev/ijson/issues/62
    # The 'item' prefix yields each element of the top-level array in turn.
    with open('data.json', 'rb') as data:
        for obj in ijson.items(data, 'item'):
            counter += 1
            doSomethingWithObj(obj)

    print("There are " + str(counter) + " items in the JSON data file.")


main()
You should see it stream out the identifiers as it goes along, and when it finishes, it will tell you that there were 1,000,000 items.
References
- Stack Overflow - Reading rather large json files in Python [duplicate]
- Pypi.org - ijson
- GitHub - isagalaev/ijson - Iterating over a collection!?!?
Last updated: 22nd April 2021
First published: 12th April 2021