Programster's Blog

Tutorials focusing on Linux, programming, and open-source

Python - Iterate Over Massive JSON Files

Preparation

To demonstrate that we can work with massive JSON files, we first need to create one. Run the following PHP script to create a JSON file that contains one million objects and is roughly 842 MiB in size. The script builds the whole array in memory before encoding it, so you may need to raise PHP's memory_limit for it to finish.

<?php

$numItems = 1000000;
$items = [];

for ($i=0; $i<$numItems; $i++)
{
    $items[] = [
        "uuid" => md5(rand()),
        "isActive" => md5(rand()),
        "balance" => md5(rand()),
        "picture" => md5(rand()),
        "age" => md5(rand()),
        "eyeColor" => md5(rand()),
        "name" => md5(rand()),
        "gender" => md5(rand()),
        "company" => md5(rand()),
        "email" => md5(rand()),
        "phone" => md5(rand()),
        "address" => md5(rand()),
        "about" => md5(rand()),
        "registered" => md5(rand()),
        "latitude" => md5(rand()),
        "longitude" => md5(rand()),
    ];
}

file_put_contents(__DIR__ . '/data.json', json_encode($items, JSON_PRETTY_PRINT));
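
If PHP is not available, a similar file could be produced from Python. The following is a minimal sketch rather than a port of the script above: the names `random_hash`, `FIELDS`, and `write_sample` are made up for this example, and because the indentation differs from PHP's JSON_PRETTY_PRINT output, the resulting file size will not match exactly. It writes one object at a time, so the generator itself never holds the full data set in memory.

```python
import hashlib
import json
import random

# Same keys as the PHP script above; every value is a random 32-character hash.
FIELDS = [
    "uuid", "isActive", "balance", "picture", "age", "eyeColor", "name",
    "gender", "company", "email", "phone", "address", "about", "registered",
    "latitude", "longitude",
]


def random_hash():
    """Return a 32-character hex string, similar in shape to PHP's md5(rand())."""
    return hashlib.md5(str(random.random()).encode()).hexdigest()


def write_sample(path, num_items):
    """Write a JSON array of num_items objects to path, one object at a time."""
    with open(path, "w") as out:
        out.write("[\n")
        for i in range(num_items):
            if i > 0:
                # Separate array elements with a comma, but not before the first.
                out.write(",\n")
            item = {field: random_hash() for field in FIELDS}
            out.write(json.dumps(item, indent=4))
        out.write("\n]\n")
```

Calling `write_sample("data.json", 1000000)` would create a file comparable to the one the PHP script produces. A smaller count is handy for a quick trial run.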

Steps

Ensure you have the ijson package installed by running:

pip install ijson

The script below demonstrates how we can loop over a massive JSON file without having to load it all into memory.

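import ijson


def doSomethingWithObj(obj):
    # Each obj is a plain dict built from one element of the top-level array.
    print(obj['uuid'])


def main():
    counter = 0

    # https://github.com/isagalaev/ijson/issues/62
    # Open the file in binary mode so that ijson reads raw bytes.
    with open('data.json', 'rb') as data:
        # The 'item' prefix selects each element of the top-level array, so
        # objects are built one at a time rather than all at once.
        for obj in ijson.items(data, 'item'):
            counter += 1
            doSomethingWithObj(obj)

    print("There are " + str(counter) + " items in the JSON data file.")


if __name__ == "__main__":
    main()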

You should see it stream out the identifiers as it goes along, and when it finishes, it will tell you that there were 1,000,000 items.

Last updated: 22nd April 2021
First published: 12th April 2021