Friday, January 24, 2014

11 Common Web Use Cases Solved In Redis

Common Redis primitives like LPUSH, LTRIM, and LREM are used to accomplish tasks programmers need to get done, tasks that can be hard or slow in more traditional stores. A very useful and practical article. How would you accomplish these tasks in your framework?

    1.    Show the latest item listings on your home page. This is a live in-memory cache and is very fast. LPUSH is used to insert a content ID at the head of the list stored at a key. LTRIM is used to limit the number of items in the list to 5000. Only if the user needs to page beyond this cache are they sent to the database. (See the first sketch after this list.)

    2.    Deletion and filtering. If a cached article is deleted, it can be removed from the cache using LREM.

    3.    Leaderboards and related problems. A leaderboard is a set sorted by score. The ZADD command implements this directly, the ZREVRANGE command can be used to get the top 100 users by score, and ZRANK can be used to get a user's rank. Very direct and easy. (See the sorted-set sketch after this list.)

    4.    Order by user votes and time. This is a leaderboard like Reddit's, where the score is a formula that changes over time. LPUSH + LTRIM are used to add an article to a list. A background task polls the list and recomputes the order, and ZADD is used to populate a sorted set in the new order. This list can be retrieved very fast even by a heavily loaded site. (This should be easier; the need for polling code isn't elegant.)

    5.    Implement expires on items. To keep a list sorted by time, use Unix time as the sorted-set score. The difficult task of expiring items is implemented by indexing each item at current_time + time_to_live. Another background worker is used to make queries using ZRANGE ... WITHSCORES and delete timed-out entries. (See the expiry sketch after this list.)

    6.    Counting stuff. Keeping stats of all kinds is common; say you want to know when to block an IP address. The INCRBY command makes it easy to atomically keep counters, GETSET can atomically clear a counter, and EXPIRE can be used to tell Redis when a key should be deleted. (See the counter sketch after this list.)

    7.    Unique N items in a given amount of time. This is the unique-visitors problem and can be solved using SADD for each page view; SADD won't add a member to a set that already contains it.

    8.    Real-time analysis of what is happening, for stats, anti-spam, or whatever. Using Redis primitives, it's much simpler to implement a spam-filtering system or other real-time tracking system.

    9.    Pub/Sub. Keeping a map of who is interested in updates to what data is a common task in systems. Redis has a pub/sub feature to make this easy, using commands like SUBSCRIBE, UNSUBSCRIBE, and PUBLISH. (See the pub/sub sketch after this list.)

    10.    Queues. Queues are everywhere in programming. In addition to the push and pop commands, Redis has blocking queue commands, so a program can wait on work being added to the queue by another program. You can also do interesting things like implement a rotating queue of RSS feeds to update. (See the worker sketch after this list.)

    11.    Caching. Redis can be used in the same manner as memcache.
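
To make the list concrete, here are some minimal redis-py sketches of these patterns. First, the latest-items cache from items 1 and 2; the key name latest.items and the database fallback get_latest_from_db are hypothetical:

    import redis

    r = redis.Redis()

    def add_item(item_id):
        # Insert the content ID at the head of the list...
        r.lpush('latest.items', item_id)
        # ...and cap the cache at 5000 entries.
        r.ltrim('latest.items', 0, 4999)

    def delete_item(item_id):
        # Item 2: remove every occurrence of a deleted article's ID.
        r.lrem('latest.items', 0, item_id)

    def latest_items(start=0, count=10):
        ids = r.lrange('latest.items', start, start + count - 1)
        # Only page past the cache to the database when necessary.
        return ids or get_latest_from_db(start, count)  # hypothetical fallback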
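
The leaderboard from item 3 maps directly onto the sorted-set commands (the leaderboard key name is again hypothetical; ZREVRANK is used rather than ZRANK so that rank 0 is the highest score):

    import redis

    r = redis.Redis()

    def record_score(user_id, score):
        r.zadd('leaderboard', {user_id: score})

    def top_100():
        # Highest scores first, returned with their scores.
        return r.zrevrange('leaderboard', 0, 99, withscores=True)

    def rank_of(user_id):
        # 0-based rank counted from the top scorer.
        return r.zrevrank('leaderboard', user_id)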
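
For item 5, scoring each member with its expiry time makes expiration a range operation. The article polls with ZRANGE ... WITHSCORES; the sketch below uses ZREMRANGEBYSCORE instead, which collapses the scan and delete into one command (key name hypothetical):

    import time
    import redis

    r = redis.Redis()

    def add_with_ttl(item_id, ttl_seconds):
        # Index the item at current_time + time_to_live.
        r.zadd('expiring.items', {item_id: time.time() + ttl_seconds})

    def purge_expired():
        # Background worker: delete everything whose expiry has passed.
        r.zremrangebyscore('expiring.items', 0, time.time())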
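
Items 6 and 7 are both one-liners around atomic commands; the hits: and visitors: key schemes are hypothetical:

    import redis

    r = redis.Redis()

    def count_hit(ip):
        key = 'hits:' + ip
        n = r.incrby(key, 1)   # atomic increment
        r.expire(key, 60)      # Redis deletes the counter after 60 seconds
        return n

    def read_and_reset(ip):
        # Atomically fetch the current count and clear it.
        return r.getset('hits:' + ip, 0)

    def track_visitor(day, user_id):
        # Item 7: SADD silently ignores members the set already contains.
        r.sadd('visitors:' + day, user_id)

    def unique_visitors(day):
        return r.scard('visitors:' + day)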
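
Item 9, using redis-py's pub/sub wrapper (the channel name updates is hypothetical):

    import redis

    r = redis.Redis()

    def announce(message):
        r.publish('updates', message)

    def listen_for_updates():
        p = r.pubsub()
        p.subscribe('updates')
        for msg in p.listen():
            if msg['type'] == 'message':
                print(msg['data'])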
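
And item 10, a blocking work queue: BRPOP parks the worker until another process pushes a job (the queue name and process_job handler are hypothetical):

    import redis

    r = redis.Redis()

    def enqueue(job):
        r.lpush('work.queue', job)

    def worker():
        while True:
            # Blocks until a job arrives; returns a (key, value) pair.
            _, job = r.brpop('work.queue')
            process_job(job)  # hypothetical handler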

The take-home is not to endlessly engage in model wars, but to see what can be accomplished by composing simple, powerful primitives. Certainly you can write specialized code to do all these operations, but Redis makes them much easier to implement and reason about.

In-Memory Caching

The in-memory caching system is designed to increase application performance by holding frequently-requested data in memory, reducing the need for database queries to get that data.

The caching system is optimized for use in a clustered installation, where you set up and configure a separate external cache server. In a single-machine installation, the application uses a local cache in the application server's process rather than a cache server.

Parts of the In-Memory Caching System

In a clustered installation, caching system components interoperate with the clustering system to provide fast response to client requests while also ensuring that cached data is available to all nodes in the cluster.

Application server. The application manages the relationship between user requests, the near cache, the cache server, and the database.

Near cache. Each application server has its own near cache for the data most recently requested from that cluster node. The near cache is the first place the application looks, followed by the cache server, then the database.

Cache server. The cache server is installed on a machine separate from application server nodes in the cluster. It's available to all nodes in the cluster (in fact, you can't create a cluster without declaring the address of a cache server).

Local cache. The local cache exists mainly for single-machine installations, where a cache server might not be present. Like the near cache, it lives with the application server. The local cache should only be used for single-machine installations or for data that should not be available to other nodes in a cluster. An application server's local cache does not participate in synchronization across the cluster.

Clustering system. The clustering system reports near cache changes across the application server nodes. As a result, although data is not fully replicated across nodes, all nodes are aware when the content of their near caches must be updated from the cache server or the database.

How In-Memory Caching Works
For typical content retrievals, data is returned from the near cache (if the data has been requested recently from the current application server node), from the cache server (if the data has been recently requested from another node in the cluster), or from the database (if the data is not in a cache).

Data retrieved from the database is placed into a cache so that subsequent retrievals will be faster.
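
Here is a minimal sketch of that read path; all class and method names are hypothetical (the document does not show the real implementation):

    class TieredCache:
        def __init__(self, near, remote, db):
            self.near = near      # per-node in-process near cache
            self.remote = remote  # shared cache server
            self.db = db          # application database

        def get(self, key):
            # 1. Near cache: data requested recently on this node.
            value = self.near.get(key)
            if value is not None:
                return value
            # 2. Cache server: data requested recently anywhere in the cluster.
            value = self.remote.get(key)
            if value is None:
                # 3. Database: not cached at all; cache it for next time.
                value = self.db.read(key)
                if value is not None:
                    self.remote.put(key, value)
            if value is not None:
                self.near.put(key, value)
            return value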

Here's an example of how changes are handled:

    1.    A client makes a change, such as an update to a user profile. The change is made through node A of the cluster, probably via a load balancer.
    2.    The node A application server writes the change to the application database.
    3.    The node A app server puts the newly changed data into its near cache for fast retrieval later.
    4.    The node A app server puts the newly changed data into the cache server, where it will be found by other nodes in the cluster.
    5.    Node A tells the clustering system that the contents of its near cache have changed, passing along a list of the changed cache items. The clustering system collects change reports and regularly sends them in a batch to other nodes in the cluster. Near caches on the other nodes drop any entries corresponding to those in the change list.
    6.    When the node B app server receives a request for the changed data, which it has dropped from its near cache, it looks to the cache server.
    7.    Node B caches the fresh data in its own near cache.
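
Continuing the hypothetical TieredCache sketch above, these steps might reduce to something like the following, with cluster.broadcast_invalidation standing in for the clustering system's batched change reports:

    class ClusterAwareCache(TieredCache):
        def __init__(self, near, remote, db, cluster):
            super().__init__(near, remote, db)
            self.cluster = cluster

        def put(self, key, value):
            # Steps 2-4: write to the database, then to both caches.
            self.db.write(key, value)
            self.near.put(key, value)
            self.remote.put(key, value)
            # Step 5: report the change; other nodes drop this key from their
            # near caches and later re-fetch from the cache server (steps 6-7).
            self.cluster.broadcast_invalidation([key])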

Cache Server Deployment Design
In a clustered configuration, the cache server should be installed on a machine separate from the clustered application server nodes. That way, the application server process is not contending for CPU cycles with the cache server process. It also becomes possible to run the application server with less memory than in a single-machine deployment. Also note that it is best if the cache servers and the application servers are located on the same network switch; this helps reduce latency between them.

Choosing the Number of Cache Server Machines
A single dedicated cache server with four cores can easily handle the cache requests from up to six application server nodes running under full load. All cache server processes are monitored by a daemon process which will automatically restart the cache server if the JVM fails completely. Currently, multiple cache servers are not supported for a single installation.
In a cluster, the application will continue to run even if all cache servers fail. However, performance will degrade significantly, because requests previously handled via the cache will be transferred to the database, greatly increasing its load.

Adjusting Cache-Related Memory

Adjusting Near Cache Memory
The near cache, which runs on each application server node, starts evicting cached items to free up memory once the heap reaches 75 percent of the maximum allowed size. When you factor in application overhead and free space requirements to allow for efficient garbage collection, a 2GB heap means that the typical amount of memory used for caching will be no greater than about 1GB.
For increased performance, larger sites should increase the amount of memory allocated to the application server process, since items cached in the near cache are significantly faster to retrieve than items stored remotely on the cache server. To see whether more memory is needed, watch the GC logs (or use a tool such as JConsole or VisualVM after enabling JMX) and note whether the amount of memory in use never drops below about 70 percent, even after garbage collection occurs.

Adjusting Cache Server Memory
The cache server process acts similarly to the near cache. However, it starts eviction once the heap reaches 80 percent of the maximum amount. On installations with large amounts of content, the default 1GB allocated to the cache server process may not be enough and should be increased.
To adjust the amount of memory the cache server process will use, edit the /etc/jive/conf/cache.conf file, then uncomment the following two lines and set them to new values:

    #JVM_HEAP_MIN='1024'
    #JVM_HEAP_MAX='1024'
Make sure to set the min and the max to the same value -- otherwise, evictions may occur prematurely. If you need additional cache server memory, recommended values are 2048 (2GB) or 4096 (4GB). You'll need to restart the cache server for this change to take effect.
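
For example, to give the cache server a 2GB heap, the edited lines would read:

    JVM_HEAP_MIN='2048'
    JVM_HEAP_MAX='2048'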
