Programster's Blog

Tutorials focusing on Linux, programming, and open-source

Python - Iterate Over Massive JSON Files

Preparation

To demonstrate working with massive JSON files, we first need an example. Execute the following PHP script to create a huge JSON file containing one million objects, roughly 842 MiB in size.

<?php

$numItems = 1000000;
$items = [];

for ($i=0; $i<$numItems; $i++)
{
    $items[] = [
        "uuid" => md5(rand()),
        "isActive" => md5(rand()),
        "balance" => md5(rand()),
        "picture" => md5(rand()),
        "age" => md5(rand()),
        "eyeColor" => md5(rand()),
        "name" => md5(rand()),
        "gender" => md5(rand()),
        "company" => md5(rand()),
        "email" => md5(rand()),
        "phone" => md5(rand()),
        "address" => md5(rand()),
        "about" => md5(rand()),
        "registered" => md5(rand()),
        "latitude" => md5(rand()),
        "longitude" => md5(rand()),
    ];
}

file_put_contents(__DIR__ . '/data.json', json_encode($items, JSON_PRETTY_PRINT));
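
If PHP is not to hand, a similar file can be produced from Python. The following is a minimal sketch using only the standard library, not a port of the script above: the function name write_sample_json and its parameters are made up for illustration, and uuid4().hex stands in for md5(rand()) as a source of random 32-character hex strings. It writes one object at a time, so the generator itself never holds the whole array in memory. The exact file size may differ from the PHP version.

```python
import json
import uuid

# Field names copied from the PHP script above.
FIELDS = [
    "uuid", "isActive", "balance", "picture", "age", "eyeColor", "name",
    "gender", "company", "email", "phone", "address", "about",
    "registered", "latitude", "longitude",
]


def write_sample_json(path, num_items):
    """Write a JSON array of num_items objects to path, one object at a time."""
    with open(path, "w") as out:
        out.write("[\n")
        for i in range(num_items):
            # uuid4().hex is a random 32-character hex string, standing in
            # for md5(rand()) in the PHP version.
            item = {field: uuid.uuid4().hex for field in FIELDS}
            if i > 0:
                out.write(",\n")
            out.write(json.dumps(item, indent=4))
        out.write("\n]\n")
```

Calling write_sample_json("data.json", 1000000) should give a file comparable to the one the PHP script creates; a smaller count is handy for a quick trial run.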

Steps

Ensure you have the ijson package installed by running:

pip install ijson

The script below demonstrates how we can loop over a massive JSON file without having to load it all into memory.

import ijson

def doSomethingWithObj(obj):
    print("%s" % (obj['uuid']))


def main():
    counter = 0

    # https://github.com/isagalaev/ijson/issues/62
    with open('data.json', 'rb') as data:
        for obj in ijson.items(data, 'item'):
            counter = counter + 1
            doSomethingWithObj(obj)

    print("There are " + str(counter) + " items in the JSON data file.")


main()

You should see it stream out the identifiers as it goes along, and when it finishes, it will tell you that there were 1,000,000 items.

Last updated: 22nd April 2021
First published: 12th April 2021

This blog is created by Stuart Page

I'm a freelance web developer and technology consultant based in Surrey, UK, with over 10 years experience in web development, DevOps, Linux Administration, and IT solutions.
