Understanding Temp Files in Postgres

Databases can feel like black boxes that hum along perfectly until suddenly they don’t. And when bad things happen with databases…they’re usually bad! One example of an error you might run into unexpectedly with your database is something about running out of temporary file space.

This is not intended to be a comprehensive look at temp files and the internals of how they work. It's a primer to give you a solid understanding of what they are and what kinds of statistics you might run into as you figure out how to optimize your database to handle complex queries and temp files. I link to several other resources throughout that can help you make a deeper dive. Let's get to it!

At their core, databases are file systems. Data is persisted to disk for long-term storage, but frequently or currently accessed data is also cached in memory for quicker access.

When you write complex queries against a Postgres database that include joins, sorts, DISTINCT clauses, etc., those operations can require more memory than is available. When that happens, Postgres falls back to what are called temp files. These are actually stored on disk, but they only live for the duration of the query. Once the query returns, the temp files are erased, hence the name!
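You can see this happen by asking Postgres to report where a sort ran. Here's a hedged sketch, assuming a hypothetical table big_table with an unindexed column val (those names are illustrative, not from this article); the plan output will vary with your data and settings:

```sql
-- Force a large sort and inspect whether it ran in memory or on disk.
-- "big_table" and "val" are hypothetical names for illustration.
EXPLAIN (ANALYZE, BUFFERS)
SELECT val FROM big_table ORDER BY val;

-- If the sort fit in work_mem, the plan includes a line like:
--   Sort Method: quicksort  Memory: 3172kB
-- If it spilled to temp files, you'll instead see something like:
--   Sort Method: external merge  Disk: 81904kB
```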

Disk is slower than memory, at least. That's why databases default to memory when it's available, but using temp files is better than the query failing to execute. Temp file usage also typically indicates long-running queries, so the queries involved are likely to be slow to return anyway and may cause slowdowns in your application.

If you run out of temp file space, you could see a PGError that reads something like "could not write to temporary file: No space left on device". This will cause the queries in question to fail to execute entirely. And the problem can compound: too many temp files generated by one query can leave another query without the space it needs to execute.

You are probably using temp files in some scenarios already. This is okay in moderation; it's when they start causing errors or other slowdowns that you'll run into trouble. For that reason, it's a good idea to keep an eye on temp file usage every now and then so it doesn't sneak up on you once it's already urgent. Continually optimizing queries never hurts!

You have a couple of options for finding out whether you're using temp files. (Note that some of the strategies below require having pg_stat_statements installed on your database, and that it only includes statistics for queries that were successfully executed, not those that were cancelled or terminated.)

  • By enabling log_temp_files on your database, you can search your Postgres logs for the phrase "temporary file" to see whether your queries are using them.
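As a sketch, enabling it might look like the following. The log_temp_files setting takes a size threshold in kilobytes, and 0 logs every temp file regardless of size:

```sql
-- Log every temporary file, whatever its size.
-- (A positive value, e.g. 10240, would log only files larger than 10MB.)
ALTER SYSTEM SET log_temp_files = 0;
SELECT pg_reload_conf();

-- A matching log entry looks something like:
--   LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp12345.0", size 139264
```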
  • Among the columns returned from running SELECT * FROM pg_stat_database; are temp_files and temp_bytes, referring respectively to the number of temp files created and the number of bytes written to temp files. These numbers, however, are an aggregation of all the files created over the lifetime of the database, and do not reflect current temp file usage. They can be reset with SELECT pg_stat_reset(); and you can learn more about the information returned from that view here.
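For example, you might check the totals for the database you're connected to like this (the column names come from the pg_stat_database view; the values are of course workload-dependent):

```sql
-- Lifetime temp file totals for the current database.
SELECT datname,
       temp_files,
       pg_size_pretty(temp_bytes) AS temp_bytes
FROM pg_stat_database
WHERE datname = current_database();

-- To start counting fresh (this resets all statistics counters
-- for the current database, not just the temp file ones):
SELECT pg_stat_reset();
```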

This query will show you which queries that have been run on your database make heavy use of temp files:

-- Note: on Postgres 13+, the total_time column was split;
-- use total_exec_time there instead of total_time.
SELECT interval '1 millisecond' * total_time AS total_exec_time,
       to_char(calls, 'FM999G999G999G990') AS ncalls,
       total_time / calls AS avg_exec_time_ms,
       interval '1 millisecond' * (blk_read_time + blk_write_time) AS sync_io_time,
       query
FROM pg_stat_statements
WHERE userid = (SELECT usesysid FROM pg_user WHERE usename = current_user LIMIT 1)
  AND temp_blks_written > 0
ORDER BY temp_blks_written DESC;

The best way to reduce temp files and their impacts is to increase your query efficiency/decrease your query complexity. Some strategies for doing this might include:

  • Making sure your schema is well-defined to reduce unnecessary joins
  • Using indexes appropriately
  • Only including the information you need in your SELECT statements
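As an illustrative sketch of the last two points (the table and column names here are hypothetical, not from this article):

```sql
-- Hypothetical "events" table that is frequently sorted by created_at.
-- An index lets Postgres read rows already in order instead of sorting
-- them, which can avoid a large sort that would spill to temp files.
CREATE INDEX idx_events_created_at ON events (created_at);

-- Select only the columns you need rather than SELECT *,
-- shrinking the rows that any sort or join has to hold in memory.
SELECT id, created_at
FROM events
ORDER BY created_at
LIMIT 100;
```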

The work_mem setting on your Postgres database dictates how much memory a query operation can use before writing to temporary files. You can read more about it here. It defaults to 4MB, which is quite low for many use cases, but the correct value can depend on how your database is used and what kind of hardware it is running on.

While it might seem like a no-brainer to increase this configuration value to prevent temp files from being used, configuring the correct value for work_mem on your database is a complicated process. At one extreme, if the setting is too low, you may see temporary files being created when you run queries against your database. At the other extreme, if the setting is configured too high, you may see out-of-memory errors. Neither of those things is ideal! Keep in mind that work_mem applies per sort or hash operation, not per query or per connection, so a busy database can use many multiples of it at once. This is a good article that goes more in depth on how it works and what to take into account when configuring it.
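One common middle ground is to raise work_mem only for the session or transaction running a known-heavy query, rather than globally. A sketch, assuming 64MB is an appropriate limit for your particular query:

```sql
-- Raise the limit only for this session; other connections keep the default.
SET work_mem = '64MB';

-- ... run the heavy reporting query here ...

-- Or scope it even more tightly to a single transaction:
BEGIN;
SET LOCAL work_mem = '64MB';
-- ... heavy query ...
COMMIT;
```

Scoping the change this way limits the memory risk to the one workload that actually needs it.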

In general, when tweaking your database's configuration values, try the new settings on a non-production database first, and make small, incremental changes so you can see how each one impacts performance.