Friday, January 24, 2014

In-Memory Caching

The in-memory caching system is designed to increase application performance by holding frequently-requested data in memory, reducing the need for database queries to get that data.

The caching system is optimized for use in a clustered installation, where you set up and configure a separate external cache server. In a single-machine installation, the application will use a local cache in the application server's process, rather than a cache server.

Parts of the In-Memory Caching System

In a clustered installation, caching system components interoperate with the clustering system to provide fast response to client requests while also ensuring that cached data is available to all nodes in the cluster.

Application server. The application manages the relationship between user requests, the near cache, the cache server, and the database.

Near cache. Each application server has its own near cache for the data most recently requested from that cluster node. The near cache is the first place the application looks, followed by the cache server, then the database.

Cache server. The cache server is installed on a machine separate from application server nodes in the cluster. It's available to all nodes in the cluster (in fact, you can't create a cluster without declaring the address of a cache server).

Local cache. The local cache exists mainly for single-machine installations, where a cache server might not be present. Like the near cache, it lives with the application server. The local cache should only be used for single-machine installations or for data that should not be available to other nodes in a cluster. An application server's local cache does not participate in synchronization across the cluster.

Clustering system. The clustering system reports near cache changes across the application server nodes. As a result, although data is not fully replicated across nodes, all nodes are aware when the content of their near caches must be updated from the cache server or the database.

How In-Memory Caching Works
For typical content retrievals, data is returned from the near cache (if the data has been requested recently from the current application server node), from the cache server (if the data has been recently requested from another node in the cluster), or from the database (if the data is not in a cache).

Data retrieved from the database is placed into a cache so that subsequent retrievals will be faster.
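The lookup order described above can be sketched as a simple read-through chain. This is a minimal sketch in Python; the three dicts are hypothetical stand-ins for the near cache, the cache server, and the database, not the product's actual API:

```python
def get(key, near_cache, cache_server, database):
    """Look up a key in the near cache, then the cache server, then the database."""
    if key in near_cache:                 # fastest: local process memory
        return near_cache[key]
    if key in cache_server:               # shared cache, one network hop
        value = cache_server[key]
        near_cache[key] = value           # warm the near cache for next time
        return value
    value = database[key]                 # slowest: an actual database query
    cache_server[key] = value             # populate both cache tiers
    near_cache[key] = value
    return value

# Dicts stand in for the three tiers.
near, shared, db = {}, {}, {"user:42": "Alice"}
assert get("user:42", near, shared, db) == "Alice"   # served from the database
assert "user:42" in near and "user:42" in shared     # both caches are now warm
```

A second call for the same key would return straight from the near cache without touching the cache server or the database.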

Here's an example of how changes are handled:

    1.    A client makes a change, such as an update to a user profile. The change arrives at node A of the cluster, probably via a load balancer.
    2.    The node A application server writes the change to the application database.
    3.    The node A app server puts the newly changed data into its near cache for fast retrieval later.
    4.    The node A app server writes the newly changed data to the cache server, where it will be found by other nodes in the cluster.
    5.    Node A tells the clustering system that the contents of its near cache have changed, passing along a list of the changed cache items. The clustering system collects change reports and regularly sends them in a batch to other nodes in the cluster. Near caches on the other nodes drop any entries corresponding to those in the change list.
    6.    When the node B app server receives a request for the data that was changed, and which it has removed from its near cache, it looks to the cache server.
    7.    Node B caches the fresh data in its own near cache.
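The seven steps above can be sketched as follows. This is a minimal sketch: the node objects and the invalidation broadcast are simplified stand-ins for the real clustering system, which batches change reports rather than delivering them one at a time:

```python
class Node:
    """An application server node with its own near cache."""
    def __init__(self, name):
        self.name = name
        self.near_cache = {}

def write_through(node, peers, cache_server, database, key, value):
    database[key] = value                 # step 2: persist the change
    node.near_cache[key] = value          # step 3: warm the local near cache
    cache_server[key] = value             # step 4: publish to the cache server
    for peer in peers:                    # step 5: broadcast the change list
        peer.near_cache.pop(key, None)    # peers drop their stale entries

a, b = Node("A"), Node("B")
server, db = {}, {}
b.near_cache["user:42"] = "old"           # B has a stale copy

write_through(a, [b], server, db, "user:42", "new")
assert db["user:42"] == "new"
assert "user:42" not in b.near_cache      # step 6: B must re-fetch...
assert server["user:42"] == "new"         # ...and finds it on the cache server
```

When node B next serves a request for that key, it fetches the fresh value from the cache server and stores it in its own near cache (step 7).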

Cache Server Deployment Design
In a clustered configuration, the cache server should be installed on a machine separate from the clustered application server nodes, so that the application server process is not contending for CPU cycles with the cache server process. In this design, the application server can also run with less memory than in a single-machine deployment. Finally, it is best to place the cache servers and the application servers on the same network switch, which reduces latency between them.

Choosing the Number of Cache Server Machines
A single dedicated cache server with four cores can easily handle the cache requests from up to six application server nodes running under full load. All cache server processes are monitored by a daemon process which will automatically restart the cache server if the JVM fails completely. Currently, multiple cache servers are not supported for a single installation.
In a cluster, the application will continue to run even if all cache servers fail. However, performance will degrade significantly because requests previously handled via the cache will be transferred to the database, increasing its load significantly.

Adjusting Cache-Related Memory

Adjusting Near Cache Memory
The near cache, which runs on each application server node, starts evicting cached items to free up memory once the heap reaches 75 percent of the maximum allowed size. When you factor in application overhead and free space requirements to allow for efficient garbage collection, a 2GB heap means that the typical amount of memory used for caching will be no greater than about 1GB.
Because items in the near cache are significantly faster to retrieve than items stored remotely on the cache server, larger sites should consider increasing the amount of memory allocated to the application server process. To determine whether this is needed, watch the GC logs (or use a tool such as JConsole or VisualVM after enabling JMX) and note whether heap usage stays above about 70 percent even after garbage collection occurs.
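For reference, GC logging and remote JMX monitoring can typically be enabled with JVM options along these lines. The exact flags vary by JVM version, and the heap size, log path, and port here are illustrative examples, not recommended values:

```
# Example JVM options (JDK 7/8-era syntax; adjust for your JVM version)
JAVA_OPTS="-Xms2048m -Xmx2048m \
  -verbose:gc -XX:+PrintGCDetails -Xloggc:/var/log/app/gc.log \
  -Dcom.sun.management.jmxremote.port=9010 \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false"
```

Disabling authentication and SSL, as above, is only appropriate on a trusted network.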

Adjusting Cache Server Memory
The cache server process acts similarly to the near cache. However, it starts eviction once the heap reaches 80 percent of the maximum amount. On installations with large amounts of content, the default 1GB allocated to the cache server process may not be enough and should be increased.
To adjust the amount of memory the cache server process will use, edit the /etc/jive/conf/cache.conf file and uncomment the following two lines and set them to new values:
#JVM_HEAP_MIN='1024'
#JVM_HEAP_MAX='1024'
Make sure to set the min and the max to the same value -- otherwise, evictions may occur prematurely. If you need additional cache server memory, recommended values are 2048 (2GB) or 4096 (4GB). You'll need to restart the cache server for this change to take effect.

Why you should almost always choose Redis as your database

Whenever the topic of databases/persistence arises, I almost always recommend using Redis instead of MySQL or even any other NoSQL solution.

There are two reasons for choosing Redis almost always:
1) Redis data structures are a far more intuitive and versatile means of storing data than relational databases.
To me, relational databases are a very limiting and unnatural way of structuring data. I’ve always felt that mapping the concepts of your program (whatever your style of programming, but especially if it is object-oriented or functional) to relational databases is both painful and frustrating. This is for two reasons:

- Relational databases have no concept of hierarchy – that is, no nesting. What you have is a set of flat tables, instead of a tree. There’s nothing bigger than a table, and nothing smaller than a field.

- The links between nodes are of a weak and limited type: foreign keys. So you have to bend over backwards to implement some sort of network model for your data.

(BTW, this is why ORM is a rabbit hole from which nothing of real beauty can come. No matter how good the solution is, it’s always a variant of fitting a square peg into a round hole.)

Redis, although it isn’t a true tree or a graph, is far closer to either of them, because it has a rich set of data structures which are very similar to those of today’s high-level programming languages. From what I’ve seen, no other NoSQL tool offers a comparable set of data structures.

This means you’ll do far less violence to the concepts of your program when you persist its data with Redis. This makes for faster development and will considerably improve the quality of your code. More importantly, your code will be more beautiful.
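As an illustration of that mapping, a nested program object translates almost directly into Redis hashes and lists. This is a minimal sketch in Python: a plain dict stands in for a Redis instance, and the colon-separated key naming is just a common convention, not something Redis enforces:

```python
# A nested object in your program...
user = {"id": 42, "name": "Alice", "tags": ["admin", "beta"]}

# ...maps naturally onto Redis structures: one hash per object,
# one list per collection, joined by a key-naming convention.
fake_redis = {}                                        # stand-in for a Redis server
fake_redis["user:42"] = {"id": "42", "name": "Alice"}  # like: HSET user:42 id 42 name Alice
fake_redis["user:42:tags"] = ["admin", "beta"]         # like: RPUSH user:42:tags admin beta

# Reading it back is direct field access, not a JOIN across tables.
assert fake_redis["user:42"]["name"] == "Alice"
assert fake_redis["user:42:tags"][0] == "admin"
```

In a relational schema the same data would typically need a users table plus a user_tags join table, and a query touching both.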

2) Redis runs in RAM
Although it persists to disk, Redis data is read from and written to RAM. Since RAM is about an order of magnitude faster than a disk, this translates to queries and write operations that are roughly an order of magnitude faster. Sure, many caveats and exceptions apply, but that’s the essence of Redis’ blazing performance.
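The gap is easy to observe with a rough micro-benchmark (rough, not rigorous: a dict stands in for RAM-resident data, and fsync forces each write to actually reach the disk, as a database must do for committed data):

```python
import os
import tempfile
import time

N = 200

# RAM: write N small records into a dict.
start = time.perf_counter()
ram = {}
for i in range(N):
    ram[i] = b"x" * 64
ram_time = time.perf_counter() - start

# Disk: append the same records to a file, fsyncing after each write
# so every record is durably on disk before the next one.
start = time.perf_counter()
with tempfile.NamedTemporaryFile(delete=True) as f:
    for i in range(N):
        f.write(b"x" * 64)
        f.flush()
        os.fsync(f.fileno())
disk_time = time.perf_counter() - start

assert disk_time > ram_time   # durable disk writes are far slower than RAM
```

Real systems batch and buffer writes, so the measured ratio varies widely by hardware; the point is only the direction of the gap.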

So, to sum up:
Redis will make your application 1) easier and more enjoyable to program, because it maps better to the concepts of your program; and 2) faster.
Yet…

You should not use Redis if your dataset is large (more than 2GB).
This is because it is non-trivial (though possible) to create a cluster of Redis instances, each of them holding up to 2GB (or 4, or 8). Also, if your application stores large volumes of data, Redis will probably never be an option for reasons of economics (can you afford terabytes of RAM?). In that case, you should take a deep, meaningful look at Amazon S3.

(Did you notice I’m implying you should never use MySQL?)

These counterarguments to using Redis are invalid:
- MySQL is the default and Redis is not production-ready: there’s much to argue against using the default technological choice for anything – and unless your clients insist on vanilla-grade software, you should seek something better than the median tool. And Redis is very, very production-ready. Just look around and see who’s using it.

- Redis is not truly persistent because it runs in RAM: both of Redis’ persistence mechanisms (the append-only journal and snapshotting) are good enough. For me, the ideal would be a reverse journal, which stores the negative changes (what you would apply to go back, instead of starting from zero and going forward) – combined with snapshotting, that would be virtually lossless and fast. But going back to the main point, Redis persistence to disk is secure and reasonably fast.
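For reference, both persistence modes are controlled from redis.conf; a typical combined setup looks something like this (the values are illustrative defaults, not recommendations):

```
# RDB snapshotting: save if >= 1 key changed in 900s, >= 10 in 300s, etc.
save 900 1
save 300 10
save 60 10000

# AOF journal: log every write command, fsync at most once per second
appendonly yes
appendfsync everysec
```

With appendfsync everysec you risk losing at most about one second of writes on a crash, which is the trade-off most deployments accept.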
