PHP - Async cURL Requests

APIs are the hot new thing and we use them more and more frequently in our code. Sometimes we need to perform lots of API requests and they aren't dependent on each other. For example, maybe you need to GET a bunch of information from multiple sources, before stitching it together? You could massively improve the performance of such code by performing the requests asynchronously rather than sequentially. Then your code would take roughly as long as the slowest request takes, rather than the sum of them together.

There are two main ways to perform asynchronous requests. You can either go "native" with the built-in [curl_multi_*](http://php.net/manual/en/function.curl-multi-exec.php) functions, or you can use the Guzzle library to simplify your life. The Guzzle library uses these curl_multi functions, but provides you with an easier interface and documentation.

Native

The code below will send off a bunch of requests at once and print out the response data. You will not be able to process any of the responses until they have all returned.

<?php

class ParallelGet
{
    function __construct($urls)
    {
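        // Create get requests for each URL
        $mh = curl_multi_init();
        $ch = array();  // curl handles, keyed like $urls
        $res = array(); // response bodies, keyed like $urls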
        
        foreach($urls as $i => $url)
        {
            $ch[$i] = curl_init($url);
            curl_setopt($ch[$i], CURLOPT_RETURNTRANSFER, 1);
            curl_multi_add_handle($mh, $ch[$i]);
        }
        
        // Start performing the request
        do {
            $execReturnValue = curl_multi_exec($mh, $runningHandles);
        } while ($execReturnValue == CURLM_CALL_MULTI_PERFORM);
        
        // Loop and continue processing the request
        while ($runningHandles && $execReturnValue == CURLM_OK) 
        {
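            // Block until there is activity on any of the connections.
            // If curl_multi_select() fails it returns -1 straight away,
            // so sleep briefly in that case to avoid a busy loop.
            if (curl_multi_select($mh) == -1) 
            {
                usleep(100);
            }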
            
            do {
                $execReturnValue = curl_multi_exec($mh, $runningHandles);
            } while ($execReturnValue == CURLM_CALL_MULTI_PERFORM);
        }
        
        // Check for any errors
        if ($execReturnValue != CURLM_OK) 
        {
            trigger_error("Curl multi read error $execReturnValue\n", E_USER_WARNING);
        }
        
        // Extract the content
        foreach ($urls as $i => $url)
        {
            // Check for errors
            $curlError = curl_error($ch[$i]);
            
            if ($curlError == "") 
            {
                $responseContent = curl_multi_getcontent($ch[$i]);
                $res[$i] = $responseContent;
            } 
            else 
            {
                print "Curl error on handle $i: $curlError\n";
            }
            // Remove and close the handle
            curl_multi_remove_handle($mh, $ch[$i]);
            curl_close($ch[$i]);
        }
        
        // Clean up the curl_multi handle
        curl_multi_close($mh);
        
        // Print the response data
        print "response data: " . print_r($res, true);
    }
}

$urls = array(
    'localhost',
    'localhost',
    'localhost',
    'localhost'
);

$getter = new ParallelGet($urls);
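
The class above only hands back the responses once every request has finished. If you want to handle each response as soon as it completes while staying native, curl_multi_info_read() reports the transfers that have finished so far. Below is a minimal sketch of that idea; the fetchEach() function and its callback signature are hypothetical names chosen for illustration, and the one second select timeout is an arbitrary assumption.

```php
<?php

/**
 * Fetch several URLs in parallel and call $onComplete for each one as soon
 * as it finishes. fetchEach() and the callback signature are hypothetical.
 *
 * @param string[] $urls
 * @param callable $onComplete function (string $url, ?string $body, string $error)
 * @return int number of transfers that completed (successfully or not)
 */
function fetchEach(array $urls, callable $onComplete): int
{
    $mh = curl_multi_init();
    $handles = [];

    foreach ($urls as $i => $url) {
        $handles[$i] = curl_init($url);
        curl_setopt($handles[$i], CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $handles[$i]);
    }

    $completed = 0;
    $running = 0;

    do {
        $status = curl_multi_exec($mh, $running);

        // Drain the queue of transfers that finished since the last call.
        while ($info = curl_multi_info_read($mh)) {
            $ch = $info['handle'];
            $i = array_search($ch, $handles, true); // map the handle back to its URL
            $ok = ($info['result'] === CURLE_OK);

            $onComplete(
                $urls[$i],
                $ok ? curl_multi_getcontent($ch) : null,
                $ok ? '' : curl_error($ch)
            );

            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
            $completed++;
        }

        // Wait for activity instead of spinning; back off briefly if select fails.
        if ($running && $status === CURLM_OK && curl_multi_select($mh, 1.0) === -1) {
            usleep(1000);
        }
    } while ($running && $status === CURLM_OK);

    curl_multi_close($mh);

    return $completed;
}
```

Calling fetchEach($urls, function ($url, $body, $error) { ... }) with the $urls array from above would then run the callback once per URL, in the order the responses arrive rather than the order the URLs were listed.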

Guzzle

The first thing you need to do is install Guzzle with Composer:

composer require guzzlehttp/guzzle

There are two examples below taken from the Guzzle documentation. With these, it is easy enough to process the responses as they come in and not wait until they have all returned.

Example 1

<?php

require_once(__DIR__ . '/vendor/autoload.php');
$client = new GuzzleHttp\Client();

$promises = [
    $client->getAsync('http://localhost')->then(function ($response) { echo $response->getBody(); }),
    $client->getAsync('http://localhost')->then(function ($response) { echo $response->getBody(); }),
    $client->getAsync('http://localhost')->then(function ($response) { echo $response->getBody(); }),
    $client->getAsync('http://localhost')->then(function ($response) { echo $response->getBody(); }),
    $client->getAsync('http://localhost')->then(function ($response) { echo $response->getBody(); }),
    $client->getAsync('http://localhost')->then(function ($response) { echo $response->getBody(); }),
    $client->getAsync('http://localhost')->then(function ($response) { echo $response->getBody(); }),
    $client->getAsync('http://localhost')->then(function ($response) { echo $response->getBody(); }),
    $client->getAsync('http://localhost')->then(function ($response) { echo $response->getBody(); }),
];

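// Wait for the requests to complete, even if some of them fail.
// (Utils::unwrap($promises) would instead throw an exception if any request fails.)
// The Utils class replaces the older GuzzleHttp\Promise\settle() function,
// which was removed in guzzlehttp/promises 2.0.
$results = GuzzleHttp\Promise\Utils::settle($promises)->wait();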

print "finished." . PHP_EOL;
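
settle() resolves to an array with one entry per promise, each holding a state of "fulfilled" or "rejected" together with either a value or a reason. Here is a minimal sketch of separating the two groups. partitionSettled() is a hypothetical helper, not part of Guzzle, and the sample array is hand-built so that the block runs without Guzzle installed.

```php
<?php

/**
 * Split the array produced by settle()->wait() into successes and failures.
 * partitionSettled() is a hypothetical helper, not part of Guzzle.
 *
 * @param array $settled entries shaped like ['state' => ..., 'value' or 'reason' => ...]
 * @return array [fulfilled values, rejection reasons], with keys preserved
 */
function partitionSettled(array $settled): array
{
    $fulfilled = [];
    $rejected = [];

    foreach ($settled as $key => $result) {
        if ($result['state'] === 'fulfilled') {
            $fulfilled[$key] = $result['value'];
        } else {
            $rejected[$key] = $result['reason'];
        }
    }

    return [$fulfilled, $rejected];
}

// Hand-built stand-in for the result of settle()->wait(). With Guzzle the
// values would be response objects and the reasons would be exceptions.
$settled = [
    ['state' => 'fulfilled', 'value' => 'body of request 0'],
    ['state' => 'rejected', 'reason' => 'connection refused'],
    ['state' => 'fulfilled', 'value' => 'body of request 2'],
];

[$ok, $failed] = partitionSettled($settled);
printf("%d succeeded, %d failed\n", count($ok), count($failed));
// prints "2 succeeded, 1 failed"
```

Note that in Example 1 the then() callbacks only echo the body and return nothing, so the fulfilled values there would be null; return $response from the callback if you want to use it after settling.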

Example 2

This is another example that does the same thing, but is written in a slightly different way and sends 20 requests:

<?php

require_once(__DIR__ . '/vendor/autoload.php');

$client = new GuzzleHttp\Client([
    'base_uri' => 'http://localhost', // Base URI is used with relative requests
    'timeout'  => 20.0, // You can set any number of default request options.
]);


$promises = array();

for($i=0; $i<20; $i++)
{
    $promise = $client->getAsync('/'); // Relative URI, resolved against base_uri
    $promise->then(function ($response) { echo $response->getBody() . PHP_EOL;});
    $promises[] = $promise;
}

foreach ($promises as $promise)
{
    $response = $promise->wait();
}

print "finished." . PHP_EOL;

Testing

For testing that the code works, run an Apache or Nginx webserver with the following index.php file:

<?php
print "Time start: " . time() . PHP_EOL;
$sleepAmount = rand(1, 10);
print "Sleeping for $sleepAmount seconds." . PHP_EOL;
sleep($sleepAmount);
print "Time end: " . time() . PHP_EOL;

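Using PHP's built-in webserver (e.g. php -S localhost:8000) will not work by default, as it can only handle one request at a time, so the requests would be served one after another. (PHP 7.4 and later can run extra workers if the PHP_CLI_SERVER_WORKERS environment variable is set, but that feature is experimental and not supported on Windows.)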

When you run the code, you should get output similar to below:

Time start: 1500815846
Sleeping for 2 seconds.
Time end: 1500815848
Time start: 1500815846
Sleeping for 2 seconds.
Time end: 1500815848
Time start: 1500815846
Sleeping for 3 seconds.
Time end: 1500815849
Time start: 1500815846
Sleeping for 4 seconds.
Time end: 1500815850
Time start: 1500815846
Sleeping for 4 seconds.
Time end: 1500815850
Time start: 1500815846
Sleeping for 7 seconds.
Time end: 1500815853
Time start: 1500815846
Sleeping for 8 seconds.
Time end: 1500815854
Time start: 1500815846
Sleeping for 8 seconds.
Time end: 1500815854
Time start: 1500815846
Sleeping for 9 seconds.
Time end: 1500815855
finished.

The key point here is that the "Sleeping for" messages are in order of duration, and the Time start values are all the same. This shows that the webserver was hit with all of the requests at the same time. Then the requests that had the lower sleep durations finished first and we output their responses. Thus the output is in order of the sleep durations.
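
The timestamps also show that the whole run took roughly as long as the slowest request rather than the sum of all of them. A quick check of that arithmetic, using the sleep durations from the sample output above (the variable names are only for illustration):

```php
<?php

// Sleep durations (in seconds) reported in the sample output above.
$durations = [2, 2, 3, 4, 4, 7, 8, 8, 9];

// One after another: every request waits for the previous one to finish.
$sequential = array_sum($durations);

// In parallel: the run ends when the slowest request does.
$parallel = max($durations);

printf("Sequential: ~%d s, parallel: ~%d s\n", $sequential, $parallel);
// prints "Sequential: ~47 s, parallel: ~9 s"
```

The 9 second figure matches the gap between the shared start time (1500815846) and the last end time (1500815855) in the output.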

Stuart Page

Stuart is a software developer with a passion for Linux and open source projects.
