Thick Clients and CMS

I originally started this blog with a once very famous thick client, Adobe Flex. Half a decade later I am seeing a surge of thick clients once again, this time where the standards and adoption lie: HTML5 and JavaScript. (I am not going to define thick clients here, as there are numerous definitions available out there.) My second innings with thick clients comes at a point when I am not building typical web applications, the classic CRUD operations. For the last 2 years I have been engaged in building content management platforms using products like Adobe AEM 5.x (6.x), SDL Tridion, etc.

If you have worked with any of these systems, you will know that the typical design and implementation lifecycle of these projects has been, and remains:

  1. Site Developers write the HTML
  2. They hand over the HTML to the CMS developers
  3. CMS developers then take the HTML and integrate it with backend code (JSPs in AEM)
  4. CMS developers take assistance from Site Developers to fix integration issues, mostly JS/CSS

This is nothing new; we have been doing this for years. But as the technologies have progressed, I have started to see that the time we need to take the HTML and put it into AEM has increased significantly compared to when I used to build SpringMVC/Struts based applications. Back then the integrations weren’t issues, only tasks that the developers would do over the course of story implementation. Some of these did need SMEs to go and fix the finer details, but that’s about it.

Current State

A typical CMS based site is one where we push content out for consumption by our readers; the content can be simple (text, articles, etc.) or rich (images, videos, etc.). However, the nature of consumer sites is changing rapidly, and they are getting to a point where we need a dynamic element to the content as well. This dynamic element can take several shapes and forms: omni-channel delivery (responsive is one way of doing it) or feeds from external systems (social networks, or internal feeds on fan engagement sites, such as a live leaderboard). Barring the few areas where we have to introduce dynamic content, not a lot of content management websites need to be built on the principles of a thick client. The good old-school model of the server taking care of the presentation tier works fine, and it works very well. Those few dynamic areas can be handled very well using jQuery (AJAX) style frameworks that bring the two worlds together and provide a final view to our consumers. So where does the problem lie, and why do we have to change anything if all this works so well for us?

The Problem

This is not a technical problem that we have to solve; as I said, technical integrations are well addressed in these products, which provide their own templating languages and ways to integrate with them. If we look at Adobe, they have a good way of making this happen (I am not going to talk about the patterns implemented), and their component architecture is actually nice: it allows you to write templates and what not very effectively. The problem we have on our hands is the process I described earlier. As we have moved to this product philosophy and as the technology landscape has expanded, we have reached a point where we have felt a need for specialization in our domains. Where once there were no boundaries, now there are BOLD lines drawn as to who is responsible for which job in a story implementation. A Site Developer will only work on HTML/JS/CSS (and there can be further divisions based on the skills we need), while a CQ developer will only be responsible for writing Java/JSP and Dialogs (and again there are further divisions). While we have strived to achieve specialization in technology domains, we have not progressed on the process side of things: we still want to do it the exact same way we were doing it several years ago.

The Impact

As we move to achieve specialization in technology (which is a different matter), we are creating a big gap in how the two worlds come together. The world of the agency, where the HTML/JS/CSS is coded, doesn’t understand the server side world, and vice versa. The marketing world starts conceptualizing the idea and the code even before the backend technology comes into play, and by the time it reaches the backend it is already too late, as the agency will have done whatever it saw fit. This results in the problem where we now need a mammoth effort to make the client work in a product like AEM. The CMS developers, to begin with, have little clue about what the HTML is and how to fragment it into the components that we will have to build as part of the CMS; but the bigger problem is that the HTML and JS are not conducive to such development in the first place. In projects where we do have knowledge of the backend systems available to us, we do little justice to that knowledge, as the specialization has pushed our site developers not to care about the server side technology. Some take care, but hardly anything comes together as a symphony. As a result, we see increasing effort to bring the two worlds together, making the projects more expensive than they ever need to be.

The Solution / The Dilemma

Recently there has been some flirtation with the idea of using thick clients within the mix of a CMS world: using technologies like Angular, Handlebars, Backbone, etc. to build a client which knows nothing about the backend solution. The content should then be exposed as “services” (CaaS), as XML or JSON feeds that get consumed by these clients to show the final presentation to the user. When I started to search about this, I was actually surprised that people have already tried it; HippoCMS is one such example (at least 6 months old).
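
To make the idea concrete, here is a minimal sketch in Python (chosen only for brevity; the same shape applies to an Angular or Backbone client). The feed structure, field names and the `render` helper are all assumptions made for illustration, not something any particular CMS exposes.

```python
import json
from string import Template

# Hypothetical JSON feed, standing in for what a CMS might expose as a content service.
FEED = json.dumps({
    "title": "Live Leaderboard",
    "items": [
        {"name": "Player A", "score": 12},
        {"name": "Player B", "score": 9},
    ],
})

# The "client" side: it knows the shape of the feed, not which backend produced it.
ROW = Template("<li>$name: $score</li>")


def render(feed_json: str) -> str:
    """Turn the content feed into markup without touching any backend API."""
    content = json.loads(feed_json)
    rows = "".join(ROW.substitute(item) for item in content["items"])
    return f"<h1>{content['title']}</h1><ul>{rows}</ul>"


html = render(FEED)
print(html)
```

The point of the sketch is the boundary: the only contract between the two worlds is the feed, so the site developer and the CMS developer can work against it independently.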

Other things that came up while reading on this topic explain the challenges described above and offer a specification to help solve them. Without diving into the merits and demerits of that solution, the idea we have been flirting with is something very similar; of course the technologies being looked at are different (I am looking at Angular to fit in this space). What I don’t know is whether the CMS world is really the world in which to apply this pattern. This pattern will absolutely fix the problem if people are willing to unlearn and learn this new methodology, but the real question I have is about CMS and thick clients together, and whether that technology is really something that should be used.


At this point I don’t know if that is the answer, but it seems the wrong variable is being used to solve the problem we have on hand.

Caching Architecture (Adobe AEM)

Cache (as defined by Wikipedia) is a component that transparently stores data so that future requests for that data can be served faster. I presume that you understand cache as a component, and the architectural patterns around caching, so I will not go into depth on caching itself in this article. This article will cover some of the fundamentals of caching (wherever relevant) and then take a deep dive into a point-of-view on caching architecture for a Content Management Platform, in the context of Adobe’s AEM implementation.


Need to re-learn JavaScript

I was that developer. Over the course of the last few years I have noticed that developers have moved away from doing anything in JS, and they look up to Site Developers to write client side code: JS, AJAX and what not. I just told my team that we need to go back to the place where we once used to program for browsers as well; it was so much fun back then. And when I see node.js, I find myself believing more and more that JavaScript is a programming language all developers need to have, no matter what. We don’t need everyone to be an expert, but the usual dependency on a Site Developer to write all the JS just can’t work. This good article should serve as some motivation.

Learning MongoDB // The CRUD

If you have not been following my blog for the training I am undertaking on MongoDB, it is time you do so, because this is getting way too interesting. Last night I took the Week 2 course and it was a slam dunk. I opened the course at around midnight and wanted to finish it the same night, as I wanted to submit my homework and get on the scorecard. It took me around 3 hours, and honestly I skipped the last 5 lectures. I realized that a) I should not be doing this past midnight, as it gets really hard to focus when you want to sleep, and b) I should not have a movie running on the side. Anyway, I managed to finish the course in just under 3 hours and it felt good.


What I learnt?

Week 2 is about CRUD, which in mongo terms is Insert, Find, Update, Remove (they didn’t have a cool acronym like CRUD). The first half of the course was heavy on the mongo shell and working with various commands. The session was thorough enough to share all possible commands, especially queries with various operators like $gt, $lt, $and, $or and several others. The second half of the course moved over to Java, and we were asked to do similar operations using the Java driver for Mongo // there were no frameworks like Spring and it was all native Mongo. To list all that I learnt:

a) insert()

b) findOne() and find()

c) update() (including multi updates)

d) remove()

e) Queries using $gt, $lt, regex, $exists, $type, $or, $and, arrays, $in, $all; updates using $set, $unset, $push, $pop, $pushAll, $pullAll

f) Cursors and find, sort, limit (paginate)
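
To give a feel for how these query operators read, here is a small, self-contained sketch. It is not the MongoDB shell or driver: `matches` and `find` are hypothetical helpers written in plain Python that mimic a tiny subset of the query syntax ($gt, $lt, $in, $exists, $or, $and) over a list of dictionaries, and the sample data is made up.

```python
def matches(doc, query):
    """Return True if doc satisfies a small subset of MongoDB-style query syntax."""
    for key, cond in query.items():
        if key == "$or":
            if not any(matches(doc, q) for q in cond):
                return False
        elif key == "$and":
            if not all(matches(doc, q) for q in cond):
                return False
        elif isinstance(cond, dict):
            present = key in doc
            value = doc.get(key)
            for op, arg in cond.items():
                if op == "$exists":
                    ok = present == arg
                elif op == "$gt":
                    ok = present and value > arg
                elif op == "$lt":
                    ok = present and value < arg
                elif op == "$in":
                    ok = present and value in arg
                else:
                    raise ValueError(f"unsupported operator: {op}")
                if not ok:
                    return False
        elif doc.get(key) != cond:
            return False
    return True


def find(collection, query):
    """Toy stand-in for a find() call: filter an in-memory list of documents."""
    return [d for d in collection if matches(d, query)]


people = [
    {"name": "Alice", "age": 34},
    {"name": "Bob", "age": 25, "city": "Pune"},
    {"name": "Carol", "age": 41, "city": "Delhi"},
]

older = find(people, {"age": {"$gt": 30}})
in_range = find(people, {"age": {"$gt": 30, "$lt": 40}})
either = find(people, {"$or": [{"name": "Bob"}, {"age": {"$gt": 40}}]})
no_city = find(people, {"city": {"$exists": False}})
```

The queries are just nested dictionaries, which is why moving between the shell's JavaScript and another language's maps feels so natural.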

I tell you, this week will come as a big surprise: it is heavy compared to what Week 1 is // so be prepared to focus.


My Takeaways

This week we did not get into modelling, which is where I really want to go, but it was insightful, as it allowed me to understand the power of mongo’s schemaless design principle. There are ways in mongo to fetch only the data we need, and even to set only what we need to. So what would otherwise need various joins and schema management as the domain grows is really pretty easy in mongo. We have been looking to use mongo in a setting where we manage data for various tenants; the profile itself will vary not just from tenant to tenant but also within a tenant, as tenants might work with multiple profiles and at times may just need to update profiles. Last night provided me a good understanding of how we can meet all those use cases without dealing with a lot of Java boilerplate code, i.e. by delegating the tasks to mongo and letting it do them for us.

The other big takeaway was that working with mongo’s document structure is much simpler than working with an RDBMS. Whether I was working in the mongo shell, where I was using JavaScript to write code, or later when I moved over to Java classes, it was fairly simple. It was all about knowing 3 classes (if not 2) and you are done. Working with Maps (or Dictionaries) is pretty simple, and using that knowledge in any language can’t really be difficult. I was able to go back and learn JavaScript all over again and code some logic pretty quickly.


Key Takeaways

a) The flexibility that MongoDB brings is pure awesomeness // it has the capability to get anyone started in a matter of minutes

b) The operations and the way they work on the underlying documents are pretty slick and present several opportunities: must try

Learning MongoDB // It sinks in

Today I finished the 1st week of the course and, as they promised, it didn’t take very long: under 5 hours for what they wanted me to do (listen to lectures, answer quizzes and submit my homework) plus some other stuff I wanted to try out.


About the course so far

A true 101 courseware, and this one was even more basic: it was all about setting up the environment and running some basic commands, and the quizzes were almost novice level, assuming that people won’t know any tools. They asked us to execute commands in a terminal window and put the output in a textbox. Barring the aspect that I was being treated like a school kid who has got his hands on a computer for the first time, I liked it overall.


What I learnt about MongoDB?

a) how to install it

b) how to interface with the database // add records, query records (very simple like “select *”)

c) its operating model // schemaless nature of it

d) comparison with a relational model and how I could possibly fit one into a no-schema document store like MongoDB


My Takeaways

My primary objective has been to build a POV that allows me to decide when to use Mongo instead of a relational database. I definitely took a step in that direction. My current viewpoint is that Mongo will force us to think of data structures that are closer to what our class object models are. We have strived for some time to put in ORM tools like iBatis and Hibernate to abstract developers from the underlying data models. I am not sure why we never wanted our developers to work with data models, but Mongo will definitely take folks away from relational models and allow them to think of objects and store them as is.


The other key aspect: if I were to architect an application (a simple layer cake design), it would be interesting to see where I draw the line in the conventional Services -> DataAccessor layer cake design. I looked at Spring’s implementation for Mongo, and it is very different from what Spring has (or had, if they have changed it) for, say, Hibernate and MySQL. In the past Spring had a clear distinction between what a DataAccessor is and what a service is, along with the demarcation of transactions etc. Spring’s implementation for Mongo is pretty simple and almost flat when it comes to the layers it defines, so much so that the lines between Service and Access are blurred and I struggle to decide if I even need the separation.


As for modelling (which will remain key): unlike in an RDBMS, where we have normal forms and rules in place, where 1st normal form was almost bad and 4th normal form was where we wanted to go, Mongo leaves the modelling (of whatever schema it has) to the logical thinking of the application. It is my understanding that the responsibility for modelling the “collections” will lie with application architects (not with DBAs). It might be too early to say, but it seems that we need to design for “fast reads”, and we aim to put whatever data we can in a dictionary that can hold all of it. The interesting aspect is when you need to see the data from another dimension. Here is an example (building upon what was presented to me in the course). The course asks me to create a web application that is basically a blog. Keeping other stuff aside, we have “Posts”, and every post has “Tags”. As a requirement, we need to show all tags for a post next to it and also, for a tag, show all posts.


We all know a many-to-many relation in a relational database will do this trick for us, but in a document database we put (or so I was asked to) Tags as an array in the Post object itself. The power of this is that I don’t need to join two tables and the retrieval is super fast. But when I need to get all Posts for a given tag, or change the name of a Tag globally, what was supposed to be a simple label change in an RDBMS now becomes a marathon. As for the latter case, I would argue about how often that really happens (i.e. changing tag labels), but the first one is pretty important for my blog requirements. The course didn’t answer that one. They explained the benefit (no join, hence fast fetch) and left it at that.
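
The trade-off can be sketched in a few lines of plain Python, with dictionaries standing in for documents. The post titles, the tag names and the three helper functions are made up for illustration; this is not driver code.

```python
posts = [
    {"_id": 1, "title": "Getting Started", "tags": ["mongodb", "nosql"]},
    {"_id": 2, "title": "The CRUD", "tags": ["mongodb", "java"]},
    {"_id": 3, "title": "Modeling Content", "tags": ["cq", "java"]},
]


def tags_for_post(post):
    # Fast path: the tags travel with the post, so no join is needed.
    return post["tags"]


def posts_for_tag(all_posts, tag):
    # Reverse lookup: without an index, every post has to be inspected.
    return [p["title"] for p in all_posts if tag in p["tags"]]


def rename_tag(all_posts, old, new):
    # A global rename has to touch every post that embeds the old label.
    touched = 0
    for p in all_posts:
        if old in p["tags"]:
            p["tags"] = [new if t == old else t for t in p["tags"]]
            touched += 1
    return touched


java_posts = posts_for_tag(posts, "java")
renamed = rename_tag(posts, "java", "jvm")
```

For what it is worth, MongoDB itself matches a query value against the elements of an array field and can index such fields, so the reverse lookup need not be a full scan there; the rename still has to rewrite every affected document.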

I start Week 2 tomorrow, and I will wait and see if they actually explain some of these over the course of the classroom; else I will go digging. Would any of you know?


Key Takeaways

a) Pretty quick to get off the ground. I’d actually use this instead of MySQL if I am trying a few things and need a quick-start backend (whether in production, I don’t know yet)

b) If you are taking this course, be mindful of not making judgements on the technology or how you’d want to use it, especially how to model your schemas (or shall I say documents). Wait for now, or go and read more.

Learning MongoDB // Getting Started

My status on FB today was “Back to School”. After several years of running with the philosophy of “I don’t like sitting in classrooms” to learn technology, today I finally went back to an online classroom, and it was to learn development with MongoDB.


The objective I started off with was pretty simple: “I have worked with RDBMS several times in my life and now I want to know what a ‘Document Store’ like MongoDB brings to the table”. I hear from everyone that we should move over to NoSQL, but I am not convinced in my head what problems it will really solve for me: scale, no-schema or something else. More importantly, I want to form a POV as to when I should be using an RDBMS vs. a “Document Store” like MongoDB.


I had registered for this “free” online course here (via a cross post on VOX that I can’t seem to find). This course started on July 29th, but today is my first day of class (and this is what I loved: I can work on my own terms and schedule). The Week 1 course has about a dozen or so online lectures, and they range from introducing Mongo and its differences from relational databases to installation and some commands.


My first experience today after spending 1 hour on the course:

  1. Pretty easy to install and get going // I use a Mac and hence there were some manual steps, but I had the mongo db and shell up and running in about 10 minutes (download time included)
  2. Working with the shell is pretty simple // similar to working in a SQL shell for MySQL. No need to create any schema: just fire away, and if you know JSON (which I did) you are entering records in under a minute
  3. Fetching records was even simpler: another couple of commands and I was rolling


Tomorrow is another day // so stay tuned.

A POV on Slice (an Adobe CQ Framework)

I have worked with Adobe CQ for the last 18 months now. A large part of my not being active on this blog is because I was busy, but the most important reason was that I wanted to understand the underlying technology before I went about writing on it: best practices, what works, what does not and what not. The last time I did this was back in 2008, when I wrote about another Adobe product, Flex 3.x.

Adobe CQ (Sling) background

If you don’t know what Adobe CQ is, you should go and read the documentation here, and you will learn something new about a CMS. The framework I am looking at today is Slice. I started by analysing the problems it solves, and I read the section where they explain Why use it? I agree with all of these pain points in the Adobe CQ development system (actually, let me clarify: I don’t think any of this is an Adobe CQ issue; it is a programming construct I see being used in the underlying framework, Sling). Refer to this document and you will notice that the problem arises where a lot of code sits naturally in a JSP (a scripting language). This does not work well when you have to build enterprise applications and platforms on top of a product; the cost of managing the code is too high in the long run. The only way out is to fall back on enterprise standards (or best practices) for writing Java and JEE applications.

First of all, I have to appreciate the fact that people are thinking about solving the problems that we face during Sling development. If you see the documentation, the model presented there is not ideal for a platform/enterprise version. I am not saying that is what Adobe recommends, and I am sure it has its place. But it is good to see others trying to solve these cases (someday soon I will write more about the issues in the current programming model).


What problems do we really need to solve for?

So, assuming you understand the construct and the architectural principles of Sling (CQ) development, let me dive into my POV on whether I should be using this framework or not. In CQ (Sling) I have been dealing with the following use cases. All of them are about one thing, i.e. reading data from the underlying data store. Persisting the data in CQ and how to render it (using JSP or another scripting language) is out of scope for this discussion, primarily because this framework does not solve for those. Back to the use cases, to list them (without a lot of detail):

  • Read data stored on the same node where the component lives (stored via Dialog box)
  • Read data stored in another node
  • Aggregate data from several other nodes
  • Search data using default (for now) APIs built into CQ (sling)


Does Slice address these issues?

Use case 1: Read data stored on the same node where the component lives (stored via Dialog box)

If I read the basics of this framework here, using it for the very first use case does not make any sense; that is something we get from the framework OOTB. The Sling framework puts all of the properties from the current node into a HashMap and exposes it to us in the JSP. I like to call this the Content-in-Content-out approach to managing content. So why use a model in a JSP for something that already exists? My bet is that this is not the case for which the framework was written, and that it was just the starting point, a building block to offer something more.


Use case 2: Read data stored in another node

Where things get interesting is the 2nd use case (read about this advanced capability here). What it basically does is this: instead of making a call to an underlying JCR API to read the node, I can now refer to my Model to load the same stuff via a POJO. This makes things interesting, but it does not interest me too much, for one primary reason: it will make me model all of my content into a model. Still, that is not strong enough to reject it fully.

In all of my enterprise world we have been taught to work with DAOs and VOs: basically, define a model of our objects and then refer to it everywhere. This is a powerful construct that we have all been using for ages, and I won’t even debate whether we should use it. But CQ is a case where the underlying framework converts the entire node into a HashMap. This programming model does not give you structure at times, but if the models are not complex and nested, it is actually a pretty powerful way of dealing with data. CQ by default is a page based CMS, and converting that to a content based CMS makes things tricky anyway.

Most of the components will invariably represent either a content type or metadata to fetch content, which can’t be modelled into a POJO; there are so many of those (and let’s not forget that we get all of that free of cost in properties). This will be POJO hell in no time. This leaves us with the use case of industry domains: in sports, for example, we might have an Athlete or a Venue that you can model into a POJO. You have to ask yourself whether you want to manage data via POJOs or just use the HashMap, which keeps the programming flexible: work with properties (simple key value pairs) or keep managing POJOs. I think at some point I might use this framework for modelling the underlying content structures into POJOs. However, I’d like to keep certain principles in mind, and I don’t yet understand how they play out with Slice. For example, we are injecting PageManagers, which are core Day CQ APIs. What worries me is that if I have so many underlying APIs strapped into my code and anything were to change, I would end up rewriting a lot of it. Maybe for a project of a few months (maybe a quarter) I will give this a shot, as it saves me time in managing content. But if I were writing a platform that would live for several years and might actually see a major CQ upgrade, I would think twice (maybe thrice) before using this framework.
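
The choice between the property map and a typed model can be sketched as follows. This is Python rather than Java, purely for brevity, and everything in it is hypothetical: the property names, the `Athlete` class and its helpers illustrate the two styles and are not Slice or Sling APIs.

```python
from dataclasses import dataclass

# Hypothetical properties of a content node, as a plain key/value map.
node_properties = {"name": "Jane Doe", "sport": "Athletics", "country": "Jamaica"}


# Style 1: work with the property map directly (flexible, no declared structure).
def headline_from_map(props):
    return f"{props.get('name', 'Unknown')} ({props.get('sport', 'n/a')})"


# Style 2: map the properties onto a typed model (structure, but one class per content type).
@dataclass
class Athlete:
    name: str
    sport: str
    country: str = ""

    @classmethod
    def from_properties(cls, props):
        # Fails loudly (KeyError) if a required property is missing from the node.
        return cls(name=props["name"], sport=props["sport"],
                   country=props.get("country", ""))

    def headline(self):
        return f"{self.name} ({self.sport})"


from_map = headline_from_map(node_properties)
from_model = Athlete.from_properties(node_properties).headline()
```

Both styles produce the same output here; the difference shows up in what happens when a property is missing, and in how many classes you end up maintaining as content types multiply.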


Use case 3 and 4

Of the last two use cases, aggregation is built on top of the same Model, and search is not catered for by this framework. So if you ask me, this framework covers only a small set of use cases, and with the increased debt of managing another framework like Guice etc., if something were to change in CQ I might be taking on a lot of risk in refactoring later. That doesn’t make me feel very comfortable.



Put simply, in plain English

  1. For anything that is enterprise or platform(ish), I won’t go for this. What this framework solves is not worth the technical overhead of managing another framework
  2. If I came across strongly typed content types, I would consider using this. The ultimate tie breaker would be how many of the content types need to be displayed “as-is”. If all I had to show were several compositions of data (search or what have you), maybe not. But it is definitely something to consider
  3. I checked the roadmap, and I see very little activity; not something that will make me commit to using this long term

Not a lot of rope to hang on to: very few use cases, and nothing that excites me much.

Modeling Content in CQ54

CQ54 is not a typical RDBMS where I can model a set of relationships in tables and soon a pretty picture starts to present itself. CQ54 stores everything in its content repository (CRX) as nodes, which follow an entirely different data model, i.e. a hierarchical structure. My experience with hierarchical databases has been with day to day applications like MS Windows File Explorer and the Outlook folder structure, and, in application development, with directory services like LDAP. So I am going to start off by listing what I understand of hierarchical databases before I get down to my set of questions.

A hierarchical database model means that my data is arranged into a structure that is similar to a tree (an organization chart). It rests on the premise of a 1:N relationship, where a child can have only one parent whereas a parent can have multiple child records. It has characteristics that differ a lot from a relational database. To list a few:

1. Every node is a record
2. Data is stored as properties on the node
3. Every node can be of a different data type – a hierarchical model does not mandate to have same record types under a same parent
4. A child node can be a child to one and only one parent
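
These four characteristics can be captured in a small sketch. The `Node` class below is a toy written for this post, not the JCR API; the node names, types and properties are invented. It also shows the difference between resolving a known path and walking the tree to find something.

```python
class Node:
    """A record in a tree: arbitrary properties, exactly one parent, many children."""

    def __init__(self, name, node_type, **properties):
        self.name = name
        self.node_type = node_type
        self.properties = properties  # data lives as properties on the node
        self.parent = None
        self.children = {}

    def add(self, child):
        # A child can belong to one and only one parent.
        if child.parent is not None:
            raise ValueError(f"{child.name} already has a parent")
        child.parent = self
        self.children[child.name] = child
        return child

    def path(self):
        if self.parent is None:
            return "/" + self.name
        return self.parent.path() + "/" + self.name

    def resolve(self, relative_path):
        """Follow a known path one segment at a time (no searching involved)."""
        node = self
        for segment in relative_path.strip("/").split("/"):
            node = node.children[segment]
        return node

    def walk(self):
        """Visit this node and everything below it (needed when the path is unknown)."""
        yield self
        for child in self.children.values():
            yield from child.walk()


root = Node("content", "folder")
site = root.add(Node("site", "page", title="Home"))
article = site.add(Node("article", "article", title="Hello", author="kapil"))
logo = site.add(Node("logo", "image", width=100))  # siblings of different types

by_path = root.resolve("site/article")
by_search = [n.path() for n in root.walk() if n.properties.get("title") == "Hello"]
```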

Hierarchical databases have their advantages:

Performance: Navigating records in a hierarchical model is faster because the references are basically pointers directly to the nodes/records. I don’t have to search in an index or a set of indexes. This, however, is true only when my data model does not have a lot of references. If I am working with a content model that includes multi-level references, performance will head south.

Easy to understand: It is a simple hierarchy, and it represents something that is “non-technical”. It naturally represents what exists.

And hierarchical databases have their limitations:

Unable to draw complex relationships between various child nodes: Given the premise that a child node will have only one parent, nodes are identified only by their parents. We have capabilities like XPath to navigate directly to a node, which may be faster. If we do not know the exact path, we will have to navigate the tree (up to a parent, maybe the root) and then down through all nodes before we find what we are looking for. Some questions that I am asking myself:

  1. What qualifies as a reference for an object?
  2. Should the speed at which the data can be fetched be a driver for defining a reference?
  3. What are the best practices that I should be aware of when I am modeling my domain?
  4. When do I decide I need a network model instead of a hierarchical model?

Difficult to maintain: Hierarchical models also mean that I do not have a command like ALTER TABLE. This essentially means that if I later decide to add another property to a specific node type, I will have to write code to update all the nodes.

  1. Is there a way I can update a node type, thus updating all the objects which are of that node type?
  2. Is there a way to avoid such situations (apart from saying let’s get it right in Release 1.0 and pray to God the client will not ask for a change request :))?
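
As a rough illustration of what “write code to update all the nodes” means, here is a sketch over a tree of plain dictionaries. The tree layout, the `type`/`properties`/`children` keys and the `add_property` helper are assumptions made for this example and are not how CRX stores or migrates content.

```python
def walk(node):
    """Yield a node and all of its descendants."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)


def add_property(root, node_type, key, default):
    """Add key=default to every node of node_type that lacks it; return how many changed."""
    updated = 0
    for node in walk(root):
        if node["type"] == node_type and key not in node["properties"]:
            node["properties"][key] = default
            updated += 1
    return updated


tree = {
    "type": "folder", "properties": {},
    "children": [
        {"type": "article", "properties": {"title": "First"}},
        {"type": "article", "properties": {"title": "Second", "language": "fr"}},
        {"type": "image", "properties": {"width": 100}},
    ],
}

count = add_property(tree, "article", "language", "en")
```

Unlike ALTER TABLE, nothing here is declarative: the migration is ordinary code that has to visit every node, and existing values have to be respected explicitly.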

Lack of flexibility: In this article, Scott Ambler writes: “Hierarchical databases fell out of favor with the advent of relational databases due to their lack of flexibility because it wouldn’t easily support data access outside the original design of the data structure. For example, in the customer-order schema you could only access an order through a customer, you couldn’t easily find all the orders that included the sale of a widget because the schema isn’t designed to all that.” This is a typical case where reporting is a must, and it might be in many systems.

  1. Are there other scenarios?

With all the context set on hierarchical models, it is now important that we look at CQ54’s content repository, CRX. While CRX is a hierarchical repository, it should not be confused with a hierarchical database. CRX provides us with JCR node types, which allow us to enforce structure. We also have the capability of creating custom nodes, but should do so with care. The principle is not to go overboard with structure.

The question remains: “how do I manage content in CQ54?” I do not have a “go-to” answer, but what I have described below is how I am going to think when I start the process.

Content modeling: Look at the requirements, i.e. wireframes and creative design assets, and identify the various content types, structures and relationships between content types. We can take the object-oriented approach and define everything as an object, or keep similar content types together. There are several things that should be considered when taking one approach over the other:

What is the business process for creating an object type? Do the content types follow the same workflow?

  1. Consider the steps that are required to activate content. If an article, a blog and a discussion forum entry have the same process flow of an author and a reviewer, then there is a case for having a single abstract content type
  2. However, if an article needs a legal review and can be used in several business processes other than that of a simple article, we may want to bring article out as its own content type

What kind of properties do they share?

  1. When modeling content for an education system, where we have content types like a college or a school with a lot of similarities, there is a case for creating an abstract content type

How does the content author want to look at the content?

  1. If we have a set of users who want to manage their content as structured content like books, movies etc., we should look to provide those content types very specifically
  2. In another scenario, if we have authors who do not worry a lot about specific objects, i.e. page-centric content creation, then we can decide to club content types together


Managing Relationships: In CQ, given it has a hierarchy based data storage model which complies with JCR specifications, we do not have a way to create strict rule-based relationships. We can create relationships using one of the following ways:

Path based references: We can do this by creating properties on objects that hold a “path” or a “list of paths” to the content they are related to
1. They are semantic
2. Not bound to “obscure IDs”
3. Do not enforce integrity constraints, which could otherwise get in the way of extensibility later
4. Being REST-ful, they allow us to navigate directly to the node, thus making navigation very quick
5. Being REST-ful, they allow authors to visualize their content relationships well, thus providing them a business view of the content
Taxonomy based references: CQ uses tags to represent a taxonomy. However, we cannot extend tags to hold profile information. So you will need a mapping system that maps a tag to a piece of content in CRX
1. Taxonomy is the foundation on which the IA stands. It gives us a classification system and determines how users will view the content on the site.
2. Allows us to clearly identify where in the system the content type resides
3. Is a conceptual framework allowing customers and their customers to locate what they need easily
4. It is hierarchical
Relational Database
1. Can be used in case we reach a point where relationships are too complex
2. Transactional Data should be kept out of CMS and placed in a relational database (or similar)
3. If we do not have to manage the lifecycle of the content
4. Please note that this will make the architecture more complex, but if it is needed, then it is needed
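
To make the first two options a little more concrete, here is a minimal sketch in plain Python. It does not use the JCR or CQ APIs; a dictionary keyed by path stands in for the repository, and every path, property name and tag ID below is an assumption invented for the example.

```python
# A toy content tree keyed by path, standing in for a JCR repository.
# All paths and property names here are made up for illustration.
repository = {
    "/content/site/colleges/st-marys": {
        "title": "St Mary's College",
        # Path based reference: a property holding a list of paths.
        "relatedCourses": [
            "/content/site/courses/physics",
            "/content/site/courses/history",
        ],
    },
    "/content/site/courses/physics": {"title": "Physics"},
}

# Tags cannot carry profile information themselves, so a separate mapping
# points each tag at the content node that holds it.
tag_to_content = {
    "education:college/st-marys": "/content/site/colleges/st-marys",
}


def resolve_references(path, prop):
    """Return (path, node) pairs for a reference property.

    A missing target yields None rather than an error, because nothing
    enforces integrity on path based references.
    """
    node = repository.get(path, {})
    return [(ref, repository.get(ref)) for ref in node.get(prop, [])]


def content_for_tag(tag):
    """Look up the content node mapped to a tag, or None if unmapped."""
    path = tag_to_content.get(tag)
    return repository.get(path) if path else None
```

Note that the reference to the history course resolves to nothing: the flexibility of path based references comes with the job of handling dangling paths in our own code.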

The Language of Risk « The IT Risk Manager

The language is important because it helps you think about the problem in the right way.

This statement struck a chord, reminding me of an instance not so long ago. She asked me, “Why did my project have just a couple of risks?” She went on to probe us (my PM and me) to understand whether we were not thinking about risk.

At that time, I did not answer her in the way the author summarizes it. We presumed that we had a functional scope documented and hence the risk of not being able to deliver what was required did not exist.

via The Language of Risk « The IT Risk Manager.

Write Through Cache

I was a young, budding developer when I was first introduced to the concept of a cache. My Senior Architect then told me

Cache is a component that will magically store data so that future requests for that same data will not go to the Remote Server, and hence it will make our application significantly faster

We were working on a website which was to integrate with an existing application through its APIs, and for the purposes of closer integration we had decided to store the data as XML artifacts in this tool, as tracker items, with no RDBMS. It was like running two applications joined at the hip.

I had gathered enormous experience working with the SourceForge platform APIs, as I had integrated them with MS Excel, and now we were going to build an entire application in the ALM space using the same APIs. Our biggest challenge was going to be performance, because the APIs were served by a remote server sitting in a different geography. And the cache was going to be instrumental in helping us solve that problem.

We decided to implement a write-through cache, which helped us build a system that was significantly faster than anyone could have imagined. Since then, this is one pattern that I have come to use (where possible) whenever I am working with diverse systems. Surprisingly, many people who have used a cache have never heard of this pattern, even when the underlying issues with their systems could be solved with it (I am still mystified as to why).

Before we dig deep, I want to run through some definitions that we are going to use during the course of this article:

  • Cache hit refers to a request for data that finds it in the cache
  • Cache miss refers to a request for data that does not find it in the cache
  • Dirty refers to cached data that is no longer the same as the original data
  • Lazy refers to an action that is not performed in real time, but only when it is required

In its simplest form, a cache implementation is going to look something like the image below.

Cache - Workflow


As you can already see, this is just one part of the cache implementation. Implementing this workflow alone means that once a piece of data is in the cache, it is always served from the local cache, and we go back to the original data source only if the cached data is dirty. There are several ways to mark data as dirty. It can be an action we configure in our system, such as “if we update a record in the data source, we find its key in the cache and mark it dirty”. Another way is to set a time after which the cache entry expires automatically.

While this approach is simple enough, it does present a unique “problem” (and it may not be a problem for everyone). Let’s revisit the reason we decided to implement a cache.
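
A minimal sketch of that basic workflow, in Python, might look like the following. The class name, the `loader` callback and the `ttl_seconds` parameter are assumptions for illustration, not the code we actually wrote; the injectable `clock` is only there so the expiry can be exercised without waiting.

```python
import time


class SimpleCache:
    """Read-side cache: serve hits locally, reload on a miss, dirty or expired entry."""

    def __init__(self, loader, ttl_seconds=300.0, clock=time.monotonic):
        self._loader = loader          # function that fetches from the data source
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}             # key -> (value, loaded_at, dirty)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None:
            value, loaded_at, dirty = entry
            expired = self._clock() - loaded_at >= self._ttl
            if not dirty and not expired:
                return value           # cache hit
        # Cache miss, dirty entry or expired entry: go back to the source.
        value = self._loader(key)
        self._entries[key] = (value, self._clock(), False)
        return value

    def mark_dirty(self, key):
        """Called when the record is updated in the data source."""
        if key in self._entries:
            value, loaded_at, _ = self._entries[key]
            self._entries[key] = (value, loaded_at, True)
```

Both ways of invalidating described above are here: `mark_dirty` is the explicit action, and the time-to-live is the automatic expiry. In either case the reload is lazy, so the cost of going back to the data source is paid by whoever asks for the data next.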

Cache is a component that will magically store data so that future requests for that same data will not go to the Remote Server, and hence it will make our application significantly faster.

This implies that a cache is a mechanism that loads data faster than our data source does. I have observed cache implementations in front of an RDBMS sitting right next to the application server; there, loading data from the RDBMS is still fast, and hence there is really no need to improve on the cache flow defined earlier.

In our case, we were dealing with an external system from which we fetched XML over HTTPS and then converted the XML to an object. This entire process was time consuming (3 seconds for one object), and there was nothing we could do to reduce the transport time. It also meant that the classic workflow would not work for us: if a user updated a specific record and the cache entry was marked dirty, the next request would incur a significant delay.

We improvised, and this is when we used the write-through cache logic, which allowed us to keep the data in the cache in sync with the data source in real time. The workflow was changed to the one below:

As simple as it may look, it was not so. Let’s first see what we did. We added a hook to the code that saves data to the DataSource, to do two things:

  1. Find the cache entry and mark it dirty if it was a hit, and
  2. Update the cache after a successful update to the data store

This allowed us to keep the cache in sync with the DataSource, so we did not have to spend additional time loading the data back again.
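
The two steps of that hook can be sketched as follows. This is a simplified Python illustration under my own assumptions: the class name and the `load` and `store` callbacks are hypothetical stand-ins for the real remote API calls, and `store` is assumed to raise an exception when the write fails.

```python
class WriteThroughCache:
    """Cache that is refreshed as part of every successful save."""

    def __init__(self, load, store):
        self._load = load      # key -> value, reads from the data source
        self._store = store    # (key, value) -> None, raises on failure
        self._entries = {}
        self._dirty = set()

    def get(self, key):
        if key in self._entries and key not in self._dirty:
            return self._entries[key]          # cache hit
        value = self._load(key)                # miss or dirty: reload
        self._entries[key] = value
        self._dirty.discard(key)
        return value

    def save(self, key, value):
        # Step 1: if the key is cached, mark it dirty so that a failed write
        # can never leave a stale value looking valid.
        if key in self._entries:
            self._dirty.add(key)
        # Write to the data source; an exception leaves the entry dirty.
        self._store(key, value)
        # Step 2: the write succeeded, so refresh the cache with the new value.
        self._entries[key] = value
        self._dirty.discard(key)
```

After a successful `save`, the next `get` for the same key is a cache hit and never touches the remote server, which is exactly where the slow round trip was saved. If the write fails, the entry stays dirty and the next read goes back to the source.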

But every solution brings its own challenges, and this one did as well, especially when we decided to scale and move to a cluster of application servers. A local cache would simply not work there, because an update to the DataSource made through one server left the cache on the other application servers out of date, so users would not get the latest data, making it impossible to work. We used record versions for concurrency checks, and not keeping the caches in sync meant that other users would see their updates fail because of that very fail-safe. Eventually, we had to find a cache that could scale across a cluster, which only made things more complicated.

This pattern, which I learned in my development adolescence, has proved to be a powerful technique for building solutions that work well in the right scenario.
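
The failure mode is easy to reproduce in a toy model. The Python sketch below is only an illustration under assumed names (`DataSource`, `AppServer`, `StaleVersionError`, the `ticket-1` key); it models two application servers, each with its own local write-through cache, in front of a shared store that checks a version number on every update.

```python
class StaleVersionError(Exception):
    """Raised when an update is based on an out-of-date version."""


class DataSource:
    """Shared store holding a (version, value) pair per key."""

    def __init__(self):
        self._rows = {}

    def read(self, key):
        return self._rows.get(key, (0, None))

    def update(self, key, expected_version, value):
        current_version, _ = self.read(key)
        if current_version != expected_version:
            raise StaleVersionError(key)
        self._rows[key] = (current_version + 1, value)
        return current_version + 1


class AppServer:
    """One cluster node with its own local write-through cache."""

    def __init__(self, source):
        self._source = source
        self._cache = {}

    def get(self, key):
        if key not in self._cache:
            self._cache[key] = self._source.read(key)
        return self._cache[key]

    def save(self, key, value):
        version, _ = self.get(key)
        new_version = self._source.update(key, version, value)
        self._cache[key] = (new_version, value)   # write-through, local only


source = DataSource()
node_a, node_b = AppServer(source), AppServer(source)
node_a.get("ticket-1")
node_b.get("ticket-1")                        # both nodes cache version 0
node_a.save("ticket-1", "edited on A")        # the source is now at version 1
try:
    node_b.save("ticket-1", "edited on B")    # B still believes version 0
except StaleVersionError:
    print("node B rejected: its local cache is stale")
```

The write-through on node A keeps only node A's cache fresh. Node B keeps serving version 0, and the version check then rejects its update, which is the fail-safe working against us.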