Why does Vertica need so many resources (memory, disk, etc.)?

Document ID : KB000048478
Last Modified Date : 14/02/2018

Description:

When configured optimally, the Vertica database is effectively bound only by CPU. This means that the amount of work the database can process concurrently is limited by the number of physical CPUs on the server, not by the amount of memory or the disk speed (again, when properly configured). Additionally, because CPUs are the bottleneck, hyper-threading (i.e. the sharing of a single physical CPU by two threads) is not recommended. By design, both threads will constantly be busy performing their respective calculations, so the tasks would have to share the physical CPU. This sharing has the net effect of slowing down queries.

Solution:

To ensure that the database is properly CPU bound, the resources that support the Vertica database must be sized appropriately. One such resource is memory. Having enough memory to support the database goes a long way towards ensuring a fast and responsive database.

Vertica recommends a minimum of 4 GB of memory per CPU, with a recommended amount of 8 GB per CPU (more than 8 GB will not cause issues, but also does not provide much benefit).
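The sizing guidance above can be turned into a quick check. The following is a minimal sketch, assuming that "CPU" here means a physical core; the function names and the three classification labels are hypothetical and chosen only for illustration.

```python
# Per-CPU memory figures taken from the recommendation above.
MIN_GB_PER_CPU = 4
RECOMMENDED_GB_PER_CPU = 8


def memory_targets_gb(physical_cpus):
    """Return (minimum, recommended) total memory in GB for a server."""
    if physical_cpus <= 0:
        raise ValueError("physical_cpus must be positive")
    return (physical_cpus * MIN_GB_PER_CPU,
            physical_cpus * RECOMMENDED_GB_PER_CPU)


def classify_memory(physical_cpus, installed_gb):
    """Compare installed memory with the per-CPU guidance.

    The returned labels are illustrative, not official terminology.
    """
    minimum, recommended = memory_targets_gb(physical_cpus)
    if installed_gb < minimum:
        return "below minimum"
    if installed_gb < recommended:
        return "meets minimum"
    return "meets recommendation"


if __name__ == "__main__":
    # Example: a 16-CPU server with 96 GB of memory installed.
    print(memory_targets_gb(16))
    print(classify_memory(16, 96))
```

For a 16-CPU server, this gives a minimum of 64 GB and a recommended amount of 128 GB.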

The reason for this large amount of memory is that in order to best utilize the CPU resources, the Vertica database attempts to perform all queries in memory.

Because planned concurrency is equal to the number of cores, one query could be executing on each core at the same time. Dividing the amount of memory in the system by the number of cores therefore gives the amount of memory each concurrent query can use before the system runs out of memory.
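As a worked illustration of this division, here is a minimal sketch. It assumes one query per core and that all system memory is available to queries; it ignores memory used by the operating system or reserved by other settings. The function name is hypothetical.

```python
def memory_per_query_gb(total_memory_gb, cores):
    """Approximate memory available to each concurrent query.

    Simplifying assumption: one query per core, with system memory
    split evenly between them.
    """
    if cores <= 0:
        raise ValueError("cores must be positive")
    return total_memory_gb / cores


if __name__ == "__main__":
    # 128 GB shared by 16 cores leaves 8 GB per concurrent query.
    print(memory_per_query_gb(128, 16))
    # The same 16 cores with only 32 GB leave 2 GB per query, which
    # is below the 4 GB per CPU minimum.
    print(memory_per_query_gb(32, 16))
```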

If there is not enough memory, the query is stopped and started again, this time writing everything to disk instead of to memory. This spill-to-disk operation is very expensive and significantly impacts the performance of the Vertica database, which is why configuring the proper amount of memory matters so much for a fast and responsive database.

The last major resource that Vertica uses heavily is disk. Again, because Vertica is designed to be a CPU-bound database, every resource that provides data to the CPUs must be able to do so as fast as possible.

Vertica has found that network attached storage (NAS) delivers speeds that are too sporadic to provide a consistent experience, because the data has to traverse a network. For this reason, Vertica strongly recommends direct attached storage (DAS), which provides reliable transfer speeds.

In terms of RAID, Vertica recommends RAID level 1+0. This provides a complete copy of the data while ensuring no impact to read or write performance (other RAID levels have varying impacts on performance or fault tolerance).

Finally, Vertica recommends that you always keep at least 40% of disk space free. Vertica uses this free space for temporary data when performing loads, deletes, and similar operations. It also provides the disk space needed if a spill-to-disk operation occurs during the execution of a query.
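The 40% free-space guideline is easy to monitor. The following is a minimal sketch using Python's standard `shutil.disk_usage`; the function names are hypothetical, and the path passed to `check_path` is an assumption that should be replaced with the file system that actually holds the database data.

```python
import shutil

# Minimum free fraction taken from the guideline above.
MIN_FREE_FRACTION = 0.40


def free_fraction(total_bytes, free_bytes):
    """Return the fraction of the file system that is free."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes


def has_enough_free_space(total_bytes, free_bytes,
                          threshold=MIN_FREE_FRACTION):
    """True when the free fraction is at or above the threshold."""
    return free_fraction(total_bytes, free_bytes) >= threshold


def check_path(path):
    """Check the file system containing `path` against the guideline."""
    usage = shutil.disk_usage(path)
    return has_enough_free_space(usage.total, usage.free)


if __name__ == "__main__":
    # "." is a placeholder; point this at the data location in practice.
    print(check_path("."))
```

Keeping the threshold comparison separate from the `shutil` call makes it possible to test the rule with fixed numbers, independent of the machine it runs on.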