This document summarizes common performance tuning steps and techniques to help debug VSE performance issues.
Preparing your machine for production/benchmarking
The number one performance constraint is the VSE and client-side logging setup. Under high load, logging can significantly slow performance. Logging
is controlled in LISA_HOME/logging.properties; ensure the log level is set to INFO, which will alert you to heap-sizing
problems on the servers.
Recommended logging.properties configuration
The annotated excerpt below shows the key settings; the two most important are the VSE transaction match/no-match log level and the thread-dump log level:
# see http://logging.apache.org/log4j/docs/index.html and http://logging.apache.org/log4j/docs/manual.html
# This file is polled regularly by LISA applications at runtime so you can add new DEBUG loggers
# or change system-wide logging without restarting LISA apps.
# The following lines adjust the log levels of third-party libraries used by LISA so that
# they don't clutter the logs with messages unrelated to LISA.
log4j.appender.A1.layout.ConversionPattern=%d [%t] %-5p %-30c - %m%n
# Keep a separate log for VSE transaction match/no-match events; this makes debugging much easier.
# Change INFO below to WARN for production systems: the logging is expensive and will slow down
# systems with high transaction rates. Do not simply comment out the following line; explicitly
# set the log level to WARN or OFF instead of INFO.
# If you want to add VSE logging to other log destinations, comment out the next line
log4j.appender.VSEAPP.layout.ConversionPattern=%d [%t] %-5p - %m%n
# If periodic thread dumps are enabled, this is where they will be sent. Note that the dumps are
# logged at INFO level, so to actually get them change WARN in the next line to INFO; with LISA
# servers and/or workstation running, you should see a thread dump in the named file within 30
# seconds. Search for 'threadDump' in lisa.properties for more info. This makes it very simple to
# capture thread dumps when debugging performance issues: change WARN to INFO, wait a minute or
# two, then change it back to WARN to stop the dumps.
# You can also generate a point-in-time thread dump with the LISA_HOME/bin/ServiceManager application
# e.g. ServiceManager -threadDump tcp://hostname:2014/Simulator or use standard java tools such as jstack
log4j.appender.THREAD_DUMPS.layout.ConversionPattern=%d [%t] %-5p - %m%n
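The excerpt above shows only the appender patterns; the logger lines that actually carry the INFO/WARN levels the comments refer to are not reproduced here. As an illustration only (the logger and appender names below are assumptions; verify the exact lines in your own logging.properties), a production setting might look like:

```properties
# Illustrative only -- the logger name may differ in your logging.properties.
# WARN keeps no-match warnings but suppresses per-transaction INFO logging.
log4j.logger.VSE=WARN, VSEAPP
```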
VSE production settings
VSE is configured to clear its database cache every 10 seconds. This is a reasonable trade-off in a development environment where service images are
constantly being updated, but when you are benchmarking or going into production you should tune this value to reflect the static nature of the service
images.
Preparing your VSE instance
LISA_HOME/lisa.properties describes the setting, and it is best to override it in LISA_HOME/local.properties as follows:
(5.x only, 6.x does not use a database)
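A minimal override sketch follows; note the property key below is a placeholder, not the real name, since the actual key and its documentation live in LISA_HOME/lisa.properties:

```properties
# Placeholder key -- look up the real cache-timeout property in lisa.properties.
# 60000 ms = one minute; use 3600000 for an hour.
your.cache.timeout.property.here=60000
```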
By default we cache EclipseLink objects for no longer than 10 seconds. This means that changes made to the service image database via the service image
editor will propagate to a running virtual service model in no longer than 10 seconds (5 seconds on average), assuming the object is already cached by
the VSE. If the object is not already cached, the propagation is immediate. 10 seconds is a reasonable trade-off in a development environment where you
are making changes to service images. In a production environment with a heavily loaded VSE server you should consider raising this number to at
least a minute (60000 ms) or even an hour (3600000 ms) to reduce the number of SQL statements issued and thus increase performance.
It is sometimes a good idea to increase the maximum size of the internal event queue in local.properties. This queue defaults to 8192 entries, and under
extreme load it can become a bottleneck; normally you should leave it alone. The only way right now to see whether it is the bottleneck is to run
jstack against the VSE process and count how many threads are blocked inside the EventHandler class waiting for the queue to be drained.
Disable Pathfinder. It adds measurable overhead, especially when you are going for maximum throughput.
(6.x does this for you so this is a 5.x thing only)
Reduce the amount of history that each VSE worker thread keeps. This reduces heap requirements and saves CPU.
Virtual Service Models should, in general, be deployed to production with 0% think time and a Capacity of more than 1 (the default) but probably no more
than 10 to 20, depending on how many other service models the VSE has deployed.
The Execution Mode should be set to "Most Efficient"; see also the discussion of recording the original service model (more flexible versus more
efficient). A "more efficient" model is almost twice as fast as a "more flexible" model.
Capacity here really means how many threads there are to service requests for this service model. The screen capture above shows 20, but that service is
running on the 32-way trex machine. A good rule of thumb for x86 machines is capacity = 2 x the number of cores. It is highly unlikely, unless the
virtual service is I/O bound, that adding capacity beyond this will result in increased throughput: the service will normally (and should normally) be
CPU-bound.
Under the hood
VSE works much the same way as most modern, high-performance application servers do. Each service has an endpoint, typically a
network socket. VSE has one Selector Thread per service socket. The Selector Thread uses a Java "SelectableChannel" to very
efficiently process inbound and outbound network I/O. Inbound I/O is handed to a thread pool bound to the service. The size of the
thread pool is fixed by the service Capacity.
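The effect of Capacity can be sketched with a plain Java fixed-size thread pool. This is an illustration of the general pattern only, not VSE's actual implementation: the pool size puts a hard ceiling on how many requests are serviced concurrently, just as Capacity does.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class CapacityDemo {

    /**
     * Runs 'requests' dummy tasks through a pool of 'capacity' threads and
     * returns the peak number of tasks observed running at the same time.
     */
    static int peakConcurrency(int capacity, int requests) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(capacity);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        // Simulate inbound requests handed off by the selector thread.
        for (int i = 0; i < requests; i++) {
            pool.submit(() -> {
                int now = inFlight.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(10); // pretend to build a response
                } catch (InterruptedException ignored) { }
                inFlight.decrementAndGet();
            });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // A Capacity of 5 means at most 5 requests are serviced concurrently.
        System.out.println("peak concurrency = " + peakConcurrency(5, 50));
    }
}
```

Running main prints the observed peak concurrency, which can never exceed the pool size of 5 however many requests are queued.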
In practice, it is very difficult to saturate the thread pool with standard models and a capacity of 10, so you might want to start with a
Capacity of only 5. Much work has gone into making the VSE execution path extremely "tight" and it is generally only when logging, XML
parsing or call-outs to other systems get involved that the VSE request/response cycle is not CPU bound.
Of course, if you really do want to model the original service's think time, then the Think Scale setting will be 100% and the worker threads will put
the response on the DelayedResponseQueue and then return to the pool. But this document is about maximizing throughput, so you want a Think Scale of 0.
LISA load test clients
If you are using LISA to actually drive the load on the VSE (and there are many good reasons to do this) then you must ensure that your client side does
not become the bottleneck too quickly. Unless you have hand-crafted Java load clients, any framework you use will eventually get in the way unless you
spread the load generators over several physical machines.
There are a few simple things you can do in your LISA staging doc and test case to ensure throughput does not collapse after only a few vusers.
If you must have a report in the staging doc, make it a LoadTest report. The default LISA report will kill you under load: it simply collects too much
data and hits the reporting database too hard. It is a great report for functional testing but a poor choice for performance tests.
Do not use global data sets in a performance test. They are effectively "remote singletons" and every instance will line up to access them remotely.
When your vuser calls the singleton it also (potentially) serializes the entire TestExec state across the wire in case the data set refers to TestExec
properties. At the very least, a small subset of the TestExec is serialized as part of the call, so do not use global data sets.
Use TestRunner to launch the test; there is less test event overhead. If you use Workstation and stage either locally or remotely, make sure the Events
tab in the GUI is set to the terse filter; otherwise your test will spend more time sending events to the coordinator and workstation than it does
hitting the server.
Make sure you give your simulator VMs plenty of headroom. Out of the box they get 512m of heap; it is usually a good idea to set -Xms1024m -Xmx1024m.
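For example, the relevant .vmoptions file (the exact file name varies by component and version, so check your LISA installation) would contain:

```
-Xms1024m
-Xmx1024m
```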
Make sure you are using the HotSpot Server VM. This is the out-of-the-box setting, so you should already be using it. Verify by looking near the top of
the various log files:
vse_log.log:2009-08-31 19:50:11,641 [main] INFO
com.itko.lisa.test.Environment - JVM info:
java.vm.name=Java HotSpot(TM) Server VM,
java.vm.specification.name=Java Virtual Machine Specification,
java.vm.specification.vendor=Sun Microsystems Inc.,
java.vm.vendor=Sun Microsystems Inc.,
Design your LISA test such that it takes "a while" (but not too long) to execute. A test with a single step and several hundred vusers pounding
it will spend most of its CPU cycles on setup and teardown, not on generating client load. Usually the easiest way to achieve this is to use a Counter
data set to loop the test and finish at the end of the data set (say 1 to 10,000 in increments of 1).
Tuning Garbage Collection
JVM garbage collection has been observed to cause VSE performance issues, with responses sometimes taking several seconds. To troubleshoot and resolve
this, enable GC logging in VSE's vmoptions file. The options discussed here apply only to Sun's/Oracle's HotSpot JVM.
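The standard HotSpot GC-logging flags that produce output in the format shown below are as follows (pre-Java-9 HotSpot; in Java 9 and later these were replaced by unified logging via -Xlog:gc*):

```
-verbose:gc
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
```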
Bounce VSE, and watch the output (stdout). You will see lines like this:
3603.403: [GC [PSYoungGen: 121111K->26065K(132608K)] 533555K->441076K(553280K), 0.0761500 secs] [Times: user=0.19 sys=0.00, real=0.11 secs]
3603.481: [Full GC [PSYoungGen: 26065K->0K(132608K)] [PSOldGen: 415010K->388919K(425792K)] 441076K->388919K(558400K) [PSPermGen:
55456K->55456K(59456K)], 3.6979719 secs] [Times: user=4.51 sys=0.09, real=4.70 secs]
In this case, the OldGen garbage collection was causing VSE to slow down significantly: compare the pause times above. The YoungGen-only collection took
0.11 seconds of real time, while the Full (OldGen) collection took 4.70 seconds. Switching garbage collection strategies may help. Add these lines to
your vmoptions file:
-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing -XX:MaxGCPauseMillis=500
Other options can be found at http://www.oracle.com/technetwork/java/gc-tuning-5-138395.html
Surprisingly, it can also help to decrease the memory you have allocated to VSE, because a smaller heap generally means each collection has less to scan and individual GC pauses are shorter.