Database Persistent Collections - Advantages/Disadvantages/Sizing

Document ID : KB000020343
Last Modified Date : 14/02/2018

Description:

I would like to know more about Database Persistent Collections to help augment scalability for Introscope. I want to separate resources for real-time queries from longer-running historical queries without using a separate interface. I would also like to know whether multiple simultaneous historical queries are expected to perform better or worse on a SQL-oriented database.

Solution:

This information comes from the performance engineers who have tested persistent collections.

What are persistent collections? What are the advantages/disadvantages?

Persistent collections are the original mechanism for storing performance data in a relational database. They were replaced by SmartStor, which is based on HFS and is about 100 times faster than a relational database for updates. In general, time-series data, which is what APM is about, does not have any relationships to be exploited by an RDBMS. Also, the APM use case is about 99% updates and very few queries.

That said, creating persistent collections can be useful as part of a data warehouse strategy. The data being warehoused is usually one sample per hour, for a very small subset of the available metrics. You simply do not need high-resolution data in a data warehouse, as the major use case is long-term trending and chargeback. Putting this subset of data into an RDBMS table makes it possible to automate the import of this data into the warehouse, for example with a scheduled job like the sketch below.
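As a rough illustration only, here is a minimal JDBC sketch of such an automated import. The connection URLs, credentials, and the table and column names (apm_metric_data, warehouse_metric_hourly, and so on) are hypothetical placeholders, not the actual persistent-collection schema; adjust them to match your environment.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

/**
 * Sketch: copy the last hour of metric samples from a persistent-collection
 * table into a warehouse table. All URLs, credentials, and table/column
 * names are hypothetical placeholders.
 */
public class WarehouseImport {
    public static void main(String[] args) throws Exception {
        try (// Source: the RDBMS the EM writes persistent collections to (hypothetical URL).
             Connection src = DriverManager.getConnection(
                     "jdbc:postgresql://apm-db:5432/apm", "apm_user", "apm_pass");
             // Target: the data warehouse (hypothetical URL).
             Connection dwh = DriverManager.getConnection(
                     "jdbc:postgresql://warehouse-db:5432/dwh", "dwh_user", "dwh_pass");
             // Pull the last hour of samples from the (hypothetical) source table.
             PreparedStatement read = src.prepareStatement(
                     "SELECT agent_name, metric_name, sample_ts, metric_value "
                   + "FROM apm_metric_data "
                   + "WHERE sample_ts >= NOW() - INTERVAL '1 hour'");
             PreparedStatement write = dwh.prepareStatement(
                     "INSERT INTO warehouse_metric_hourly "
                   + "(agent_name, metric_name, sample_ts, metric_value) VALUES (?, ?, ?, ?)")) {

            try (ResultSet rs = read.executeQuery()) {
                while (rs.next()) {
                    write.setString(1, rs.getString("agent_name"));
                    write.setString(2, rs.getString("metric_name"));
                    write.setTimestamp(3, rs.getTimestamp("sample_ts"));
                    write.setDouble(4, rs.getDouble("metric_value"));
                    write.addBatch();   // batch the inserts for efficiency
                }
                write.executeBatch();
            }
        }
    }
}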

Data warehousing of performance data is generally a poor idea because, in my experience, no one ever uses the data. This is easy to expose: simply ask for a sample report or analysis of the performance data being collected, and you will never get one. Thus I refer to this strategy as a "data landfill" of performance data.

Details on sizing/performance:

We have done some internal testing of the rate at which metrics can be persisted to a DB. In our test environment, the DB server was NOT local to the EM; they were in geographically distant subnets (roughly 50 miles apart), and we were able to persist about 15,000 metrics per minute. With a local DB, I am assuming it will be faster than that; a field engineer did mention a figure of about 20,000 metrics per minute. We do not have any information on the disk space required, nor have we done any work on DB tuning for persistent collections.
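As a back-of-the-envelope sizing aid, the sketch below turns the observed persistence rate into a daily row count and a rough storage figure. The bytes-per-row value is purely an assumption for illustration; as noted above, we have no measured disk-space numbers.

/** Back-of-the-envelope sizing from the observed persistence rate. */
public class PersistenceSizing {
    public static void main(String[] args) {
        long metricsPerMinute = 15_000;     // observed rate with a remote DB
        long assumedBytesPerRow = 200;      // assumption: row plus index overhead, not measured

        long rowsPerDay = metricsPerMinute * 60 * 24;               // about 21.6 million rows/day
        double gbPerDay = rowsPerDay * assumedBytesPerRow / 1e9;    // rough GB/day

        System.out.printf("Rows per day: %,d%n", rowsPerDay);
        System.out.printf("Approx. storage per day: %.1f GB (at %d bytes/row)%n",
                gbPerDay, assumedBytesPerRow);
    }
}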

As far as we know, the EM only writes to persistent collections; it cannot query them. So, when you say "without using a separate interface", you are either thinking that we can point historical queries in the EM to the external DB, or thinking of using the JDBC interface to SmartStor. The former idea is a misunderstanding about the function of persistent collections. The latter has not been performance tested, to my knowledge.
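If you do want to experiment with the JDBC interface to SmartStor, a historical query might look roughly like the sketch below. The driver class name, URL format, and SQL-like query syntax shown here are assumptions based on typical Introscope JDBC usage; verify them against the documentation for your release.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

/**
 * Sketch of a historical query against SmartStor through the EM's JDBC
 * interface. Driver class, URL format, and query syntax are assumptions.
 */
public class SmartStorQuery {
    public static void main(String[] args) throws Exception {
        // Assumed driver class shipped with the Introscope client libraries.
        Class.forName("com.wily.introscope.jdbc.IntroscopeJDBCDriver");

        // Assumed URL format: user:password followed by the EM host and port.
        String url = "jdbc:introscope:net//Admin:@em-host:5001";

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             // SmartStor metric data is queried with a SQL-like, regex-based syntax (assumed form).
             ResultSet rs = stmt.executeQuery(
                     "select * from metric_data "
                   + "where agent='(.*)TradeService(.*)' "
                   + "and metric='Frontends(.*):Average Response Time \\(ms\\)' "
                   + "and timestamp between '2018-02-01 00:00:00' and '2018-02-02 00:00:00'")) {
            while (rs.next()) {
                // Column names vary by release; print the first few columns as strings.
                System.out.println(rs.getString(1) + " | " + rs.getString(2)
                        + " | " + rs.getString(3));
            }
        }
    }
}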

It does make sense to isolate large historical queries to a separate system so as not to adversely impact an EM under heavy load. Comparing the performance of historical queries on an EM to historical queries against metrics stored in a relational DB is something of an apples-to-oranges comparison. The environments are so different, and there are so many variable factors, that the comparison would not have much meaning.

We don't have any practical experience with this, but we believe it should be possible to mirror SmartStor onto a separate EM with no live Agents, exclusively for historical reporting. If the purpose is just offloading historical metric queries, and not interfacing with other products that know how to talk to relational DBs, this might serve your needs better than persistent collections. In fact (and, once again, we haven't tried this), it may be possible to hook both of those EMs to a MOM and accomplish the "without using a separate interface" requirement.