PostgreSQL Parallel Queries

Parallel query execution is an exciting new feature introduced in the latest version of PostgreSQL (9.6). Unfortunately this feature is not enabled by default, but this tutorial will show you how to enable it.

What are Parallel Queries

"Parallel query is a method used to increase the execution speed of SQL queries by creating multiple query processes that divide the workload of a SQL statement and executing it in parallel or at the same time." - Source

This means that you could see a dramatic improvement in speed, with a best-case scenario of getting a speed improvement factor equal to the number of cores you have (ignoring hyperthreading and SMT).

Steps

We just need to set the new configuration parameter max_parallel_workers_per_gather to a value larger than zero.

sudo vim /etc/postgresql/9.*/main/postgresql.conf

Then restart the service for the change to take effect:

sudo /etc/init.d/postgresql restart

Testing

Now that the feature is enabled you may want to test it. One easy to do this is to use the explain tool.

Below is the output I was getting before I made the changes:

test=# explain select max(value) from randomintegers;                                                                                 
                                   QUERY PLAN                                   
--------------------------------------------------------------------------------
 Aggregate  (cost=498330.80..498330.81 rows=1 width=4)
   ->  Seq Scan on randomintegers  (cost=0.00..425621.04 rows=29083904 width=4)
(2 rows)

Now with the feature enabled, I get the following:

test=# explain (costs off) select max(value) from randomintegers;
                      QUERY PLAN                       
-------------------------------------------------------
 Finalize Aggregate
   ->  Gather
         Workers Planned: 4
         ->  Partial Aggregate
               ->  Parallel Seq Scan on randomintegers
(5 rows)

As you can see, my simple query to get the maximum value from a table will have 4 different processes scan different chunks of the table and give me back the aggregated answer. This can have a massive performance benefit (so long as you have the disk performance).

As a quick test, I switching on timings (\timing) and ran a select count(*) query on a table with 30000000 rows with the feature turned on and off. When the feature is off I got:

test=# select count(*) from randomintegers;  
  count   
----------
 30000000
(1 row)

Time: 1944.191 ms  

And when the feature was on, I got:

test=# select count(*) from randomintegers;  
  count   
----------
 30000000
(1 row)

Time: 608.578 ms  

That's a 3.2x speedup. Your results may vary.

Limitations

For now, only read-only queries where the driving table is accessed via a sequential scan can be parallelized. Hash joins and nested loops can be performed in parallel, as can aggregation (for supported aggregates).

Parallel Query Variables

There are a number of new variables for this new feature. They are listed below with their descriptions. I highly recommend reading them all (except perhaps the last).

max_parallel_workers_per_gather

Sets the maximum number of workers that can be started by a single Gather node. Parallel workers are taken from the pool of processes established by max_worker_processes. Note that the requested number of workers may not actually be available at run time. If this occurs, the plan will run with fewer workers than expected, which may be inefficient. Setting this value to 0, which is the default, disables parallel query execution.

Note that parallel queries may consume very substantially more resources than non-parallel queries, because each worker process is a completely separate process which has roughly the same impact on the system as an additional user session. This should be taken into account when choosing a value for this setting, as well as when configuring other settings that control resource utilization, such as work_mem. Resource limits such as work_mem are applied individually to each worker, which means the total utilization may be much higher across all processes than it would normally be for any single process. For example, a parallel query using 4 workers may use up to 5 times as much CPU time, memory, I/O bandwidth, and so forth as a query which uses no workers at all.

parallel_setup_cost

Sets the planner's estimate of the cost of launching parallel worker processes. The default is 1000.

parallel_tuple_cost

Sets the planner's estimate of the cost of transferring one tuple from a parallel worker process to another process. The default is 0.1.

min_parallel_relation_size

Sets the minimum size of relations to be considered for parallel scan. The default is 8 megabytes (8MB).

force_parallel_mode

I've put this last on purpose because you probably don't want to mess with it.

Allows the use of parallel queries for testing purposes even in cases where no performance benefit is expected. The allowed values of force_parallel_mode are off (use parallel mode only when it is expected to improve performance), on (force parallel query for all queries for which it is thought to be safe), and regress (like on, but with additional behavior changes as explained below).

More specifically, setting this value to on will add a Gather node to the top of any query plan for which this appears to be safe, so that the query runs inside of a parallel worker. Even when a parallel worker is not available or cannot be used, operations such as starting a subtransaction that would be prohibited in a parallel query context will be prohibited unless the planner believes that this will cause the query to fail. If failures or unexpected results occur when this option is set, some functions used by the query may need to be marked PARALLEL UNSAFE (or, possibly, PARALLEL RESTRICTED).

Setting this value to regress has all of the same effects as setting it to on plus some additional effects that are intended to facilitate automated regression testing. Normally, messages from a parallel worker include a context line indicating that, but a setting of regress suppresses this line so that the output is the same as in non-parallel execution. Also, the Gather nodes added to plans by this setting are hidden in EXPLAIN output so that the output matches what would be obtained if this setting were turned off.

References

Author

Programster

Stuart is a software developer with a passion for Linux and open source projects.

comments powered by Disqus
We are a participant in the Amazon Services LLC Associates Program, an affiliate advertising program designed to provide a means for us to earn fees by linking to Amazon.com and affiliated sites. More info.