Caching Architecture (Adobe AEM) – Part 1


A cache (as defined by Wikipedia) is a component that transparently stores data so that future requests for that data can be served faster. I presume here that you understand caching as a component, along with the common architectural patterns around it, so I will not go into the depths of caching in this article. This article will cover some of the very basic fundamentals of caching (wherever relevant) and then take a deep dive into a point of view on caching architecture for a Content Management Platform, in the context of an Adobe AEM implementation.

 

Problem Statement

Principles for high performance and high availability don’t change, but for conversation’s sake let’s assume we have a website that has to meet the following needs.

  • 1 Billion hits on a weekend (hit is defined by a call to the resource and includes static resources like CSS, JS, Images, etc.)
  • 700 million hits in a day
  • 7.2 million page views in a day
  • 2.2 million page views in an hour
  • 80K hits in a second
  • 40K page views in a minute
  • 612 page views in a second
  • 24×7 site availability
  • 99.99% uptime
  • Content availability to consumers in under 5 minutes from the time editors publish content
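The per-second and per-day figures above hang together; as a quick sanity check (the conversions below are mine, the figures are from the list):

```javascript
// Rough sanity check of the target numbers in the problem statement.
const pageViewsPerHour = 2_200_000;                         // stated peak hour
const peakPageViewsPerSecond = Math.round(pageViewsPerHour / 3600);
console.log(peakPageViewsPerSecond);                        // ≈ 611, matching the ~612/sec target

const hitsPerDay = 700_000_000;                             // stated daily hits
const avgHitsPerSecond = Math.round(hitsPerDay / 86_400);
console.log(avgHitsPerSecond);                              // ≈ 8102 average/sec
```

Note the gap between the ~8.1K hits/sec average and the 80K hits/sec peak: the architecture has to absorb roughly a 10x burst over the daily average, which is exactly the campaign-spike shape described below.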

While the numbers look steep, the use case is not an uncommon one. In a world where everyone is moving to devices and digital, there will be times when brands run campaigns, and while those campaigns run we will need to support loads this steep. These loads don’t stay for long, but when they come, they come fast and they come thick, and we have to support them.

For the record, this is not some random theory I am writing: I have had the opportunity of being on a project (which I can’t name) where we supported similar numbers.

The use case I picked here is a Digital Media Platform where a large portion of the content is static, but the principles I am going to talk about will apply to any other platform or application.


Need to re-learn JavaScript


I was that developer. Over the course of the last few years I have noticed that developers have moved away from doing anything in JS; they look up to Site Developers to write the client-side code – JS, AJAX and what not. I just told my team that we need to go back to the place where we once programmed for browsers as well – it was so much fun back then. And when I see node.js, I find myself believing more and more that JavaScript is a language every developer needs, no matter what. We don’t need everyone to be an expert, but the usual dependency on a Site Developer to write all the JS just can’t work. This good article should serve as some motivation.

Learning MongoDB // The CRUD


If you have not been following my blog for the training I am undertaking on MongoDB, it is time you do so, because this is getting way too interesting. Last night I took the Week 2 course and it was a slam dunk. I opened the course at around midnight because I wanted to finish it yesterday, submit my homework and get on the scorecard. It took me around 3 hours, and honestly I skipped the last 5 lectures. I realized that a) I should not be doing this past midnight, as it gets really hard to focus when you want to sleep, and b) I should not have a movie running on the side. Anyway, I managed to finish the course in just under 3 hours and it felt good.

 

What I learnt?

Week 2 is about CRUD, which in Mongo terms is Insert, Find, Update, Remove (they didn’t have a cool acronym like CRUD). The first half of the course was heavy on the mongo shell and working with various commands; the sessions were thorough enough to cover all the key commands, especially queries with the various operators like $gt, $lt, $and, $or and several others. The second half of the course moved over into Java, and we were asked to do similar operations using the Java driver for Mongo // there were no frameworks like Spring, it was all native Mongo. To list down all that I learnt:

a) insert()

b) findOne() and find()

c) update() (including multi updates)

d) remove()

e) Query using $gt, $lt, regex, $exists, $type, $or, $and, arrays, $in, $all, $set, $unset, $push, $pop, $pushAll, $pullAll

f) Cursors and find, sort, limit (paginate)
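To make the operator list above concrete, here is an in-memory sketch of how Mongo-style comparison matching behaves. To be clear, this is NOT the mongo shell or a driver – it is plain JavaScript imitating the matching semantics, with invented sample data; the real shell equivalent is noted in the comment.

```javascript
// Hypothetical sample documents (not from the course).
const scores = [
  { student: "alice", score: 91 },
  { student: "bob",   score: 55 },
  { student: "carol", score: 78 },
];

// Imitates the matching semantics of $gt / $lt / $in on a single document.
// Real shell equivalent: db.scores.find({ score: { $gt: 60, $lt: 90 } })
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    if (typeof cond === "object" && cond !== null) {
      return Object.entries(cond).every(([op, val]) => {
        if (op === "$gt") return doc[field] > val;
        if (op === "$lt") return doc[field] < val;
        if (op === "$in") return val.includes(doc[field]);
        throw new Error("unsupported operator: " + op);
      });
    }
    return doc[field] === cond; // plain equality match
  });
}

const passing = scores.filter(d => matches(d, { score: { $gt: 60, $lt: 90 } }));
console.log(passing.map(d => d.student)); // ["carol"]
```

The same query shape carries over to the Java driver – the query document is built as a map instead of a literal, but the operator semantics are identical.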

Take my word for it: this week comes as a big surprise, and it is heavy compared to Week 1 // so be prepared to focus.

 

My Takeaways

This week we did not get into modelling, which is where I really want to go, but it was insightful because it allowed me to understand the power of Mongo’s schemaless design principle. There are ways in Mongo to fetch only the data we need, and even to set only what we need to. So the work of managing joins and schemas as the domain grows is really pretty easy in Mongo. We have been looking to use Mongo in a setting where we manage data for various tenants, and the profile will vary not just from tenant to tenant but also within a tenant, as tenants might work with multiple profiles and at times just need to update them. Last night gave me a good understanding of how we can meet all those uses without a lot of Java boilerplate code, i.e. delegating those tasks over to Mongo and letting it do the work for us.

The other big takeaway was that working with Mongo’s document structure is much simpler than an RDBMS. While I was working in the mongo shell I was using JavaScript to write code, and later when I moved over to Java classes it was still fairly simple. It was all about knowing 3 classes (if not 2) and you are done. Working with Maps (or Dictionaries) is pretty simple, and carrying that knowledge into any language can’t really be difficult. I was able to go back, pick up JavaScript all over again, and code a piece of logic pretty quickly.

 

Key Takeaways

a) The flexibility that MongoDB brings with itself is pure awesomeness // it has the capability to get anyone started in a matter of minutes

b) The operations, and the way they work on the underlying documents, are pretty slick and present several opportunities – a must-try

Learning MongoDB // It sinks in


Today I finished the 1st week of the course and, as they promised, it didn’t take very long – under 5 hours for what they wanted me to do (listen to lectures, answer quizzes and submit my homework) + some other stuff I wanted to try out.

 

About the course so far

A true 101 courseware, and this one was even more basic – it was all about setting up the environment, running some basic commands, and quizzes that were almost novice-level, assuming that people won’t know any tools. They asked us to execute commands in a terminal window and put the output in a textbox. Barring the fact that I was being treated like a school kid who got his hands on a computer for the first time, I liked it overall.

 

What I learnt about MongoDB?

a) how to install it

b) how to interface with the database // add records, query records (very simple like “select *”)

c) its operating model // schemaless nature of it

d) comparison with a relational model, and how I could possibly fit one into a no-schema document store like MongoDB
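On point (d), the relational-to-document mapping boils down to folding joined rows into one nested document. A minimal sketch, using a hypothetical customer/address pair of tables (my example, not the course’s):

```javascript
// Rows as they would sit in two relational tables (hypothetical data).
const customerRow = { id: 7, name: "Acme Corp" };
const addressRows = [
  { customerId: 7, city: "Berlin" },
  { customerId: 7, city: "Pune" },
];

// The document-store shape: one document, the 1:N join pre-folded in,
// so no join is needed at read time.
const customerDoc = {
  _id: 7,
  name: customerRow.name,
  addresses: addressRows
    .filter(a => a.customerId === customerRow.id)
    .map(a => ({ city: a.city })),
};

console.log(customerDoc.addresses.length); // 2
```

The trade-off, which comes up again in Week 2, is that reads of the whole aggregate get cheap while cross-cutting queries and global updates get harder.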

 

My Takeaways

My primary objective has been to build a POV that allows me to decide when to use Mongo instead of a relational database, and I definitely took a step in that direction. My current viewpoint is that Mongo will force us to think of data structures much closer to what our Class Object Models are. We have strived for some time to put in ORM tools like iBatis and Hibernate to abstract developers from the underlying data models. I am not sure why we never wanted our developers to work with data models, but Mongo will definitely take folks away from relational models and allow them to think of objects and store them as is.

 

The other key aspect: if I were to architect an application (a simple layer-cake design), it would be interesting to see where I draw the line in the conventional Services -> DataAccessor layer cake. I looked at Spring’s implementation for Mongo and it is very different from what Spring has (or had, if they have changed it) for, say, Hibernate and MySQL. In the past Spring had a clear distinction between what a DataAccessor is and what a Service is, and clear demarcations for Transactions etc. Spring’s implementation for Mongo is pretty simple and almost unilateral in the layers it defines – so much so that the lines between the Service and the Accessor are blurred, and I struggle to decide if I even need the latter.

 

As for modelling (which will remain key): unlike an RDBMS, where we have normal forms and rules in place – where 1st normal form was almost bad and 4th normal form is where we wanted to go – Mongo leaves the modelling (of whatever schema it has) to the logical thinking of the application. It is my understanding that the balance of modelling the “collections” will rest with Application Architects (not with DBAs). It might be too early to say, but it seems we need to design for “fast reads” and aim to put whatever data we can into a dictionary that can hold all of it. The interesting aspect is when you need to see the data from another dimension. Here is an example (building on what was presented in the course). The course is asking me to create a web application that is basically a blog. Keeping other stuff aside, we have a “Post” and every post has “Tags”. As a requirement we need to show all tags for a post next to it, and for a tag we need to show all its posts.

 

We all know a many-to-many relation in a relational database would do this trick for us, but in a document database we put (or so I was asked to) Tags as an array in the Post object itself. The power of this is that I no longer need to join two tables and the retrieval is super fast; but now, when I need to get all Posts for a given tag, or when I need to change the name of a Tag globally, what was supposed to be a simple label change in an RDBMS becomes a marathon. As for the latter case, I was arguing how often that really happens (changing tag labels), but the first one is pretty important for my blog requirements. The course didn’t answer that one. They explained the benefit (no join, hence fast fetch) and left it at that.
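For what it’s worth, the “all posts for a tag” read is directly supported: in the real shell, `db.posts.find({ tags: "mongodb" })` matches any document whose `tags` array contains that value. The in-memory equivalent (invented sample posts, not the course’s data), including the painful global-rename case:

```javascript
// Hypothetical posts with tags embedded as arrays, per the course's model.
const posts = [
  { title: "Week 1 notes", tags: ["mongodb", "school"] },
  { title: "Cache POV",    tags: ["aem", "caching"] },
  { title: "CRUD week",    tags: ["mongodb", "java"] },
];

// "All posts for a tag" is array membership -- in the shell a scalar
// query against an array field matches any element of the array.
const taggedMongo = posts.filter(p => p.tags.includes("mongodb"));
console.log(taggedMongo.map(p => p.title)); // ["Week 1 notes", "CRUD week"]

// The global tag rename, by contrast, really is a touch-every-document
// update rather than a one-row label change:
posts.forEach(p => {
  p.tags = p.tags.map(t => (t === "mongodb" ? "mongo" : t));
});
```

The read stays fast (and in Mongo an index on an array field covers it), while the rename confirms the marathon: every post holding the tag has to be rewritten.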

I start off Week 2 tomorrow, and I will wait and see if they actually explain some of these over the course of the classroom; otherwise I will go about digging. Would any of you know?

 

Key Takeaways

a) Pretty quick to get off the ground. I’d actually use this instead of MySQL if I were trying a few things and needed a quick-start backend. (Where in production – I don’t know yet.)

b) If you are taking this course, be mindful of not making judgements on the technology, or on how you’d want to use it – especially how you’d model your schemas (or shall I say documents). Wait for now, or go and read more.

Learning MongoDB // Getting Started


My status on FB today was “Back to School”. After several years of running with the philosophy of “I don’t like sitting in classrooms” to learn technology, today I finally went back to an online classroom, and it was to learn development with MongoDB.

 

The objective I started with was pretty simple – “I have worked with RDBMSs several times in my life and now I want to know what a “Document Store” like MongoDB brings to the table”. I hear everyone saying that we should move over to NoSQL, but I am not convinced in my head what problems it will really solve for me – scale, no-schema or something else. More importantly, I want to form a POV on when I should be using an RDBMS vs. a “Document Store” like MongoDB.

 

I had registered for this “free” online course here – https://education.10gen.com/courses/10gen/ (via a cross post on VOX that I can’t seem to find). The course started on July 29th, but today is my first day in class (and this is what I loved – I can work on my own terms and schedule). The Week 1 course has about a dozen or so online lectures, ranging from an introduction to Mongo and its differences from relational databases to installation and some commands.

 

My first experience today after spending 1 hour on the course:

  1. Pretty easy to install and get going // I use a Mac, hence some manual steps, but I had the mongo db and shell up and running in about 10 minutes (download time included)
  2. Working with the shell is pretty simple // similar to working in a SQL shell for MySQL. No need to create any schema – just fire away, and since I know JSON I was entering records in under a minute
  3. Fetching records was even simpler – another couple of commands and I was rolling

 

Tomorrow is another day // so stay tuned.

A POV on Slice (an Adobe CQ Framework)


I have worked with Adobe CQ for the last 18 months now; a large part of my not being active on this blog is because I was busy, but the more important reason was that I wanted to understand the underlying technology before I went about writing about it – best practices, what works, what does not, and what not. The last time I did this was back in 2008, when I wrote about another Adobe product, Flex 3.x.

Adobe CQ (Sling) background

If you don’t know what Adobe CQ is, you should go and read the documentation here, and you will learn something new about a CMS system. The framework I am looking at today is Slice. I started by analysing the problems it solves, reading the section where they explain “Why use it?”. I agree with all of the pain points they list in the Adobe CQ development system (actually, let me clarify – I don’t think any of this is an Adobe CQ issue; it is a programming construct I see being used in the underlying framework, Sling). Refer to this document and you will notice that the problem arises because a lot of code sits naturally in a JSP (a scripting language). This does not work well when you have to build enterprise applications and platforms on top of a product – the cost of managing that code is too high in the long run. The only way out is to fall back on enterprise standards (or best practices) of writing Java and JEE applications.

First of all, I have to appreciate the fact that people are thinking of solving the problems we face during Sling development. If you see the documentation, the model presented there is not ideal for a platform/enterprise version. I am not saying it is what Adobe recommends, and I am sure it has its place. But it is good to see others trying to solve for these cases (someday soon I will write more about the issues in the current programming model).

 

What problems do we really need to solve for?

So, assuming you understand the construct and the architectural principles of Sling (CQ) development, let me dive into my POV on whether I should be using this framework or not. In CQ (Sling) I have been dealing with the following use cases. All of them are about one thing, i.e. reading data from the underlying data store. Persisting data in CQ, and how to render it (using JSP or another scripting language), is out of scope of this discussion, primarily because this framework does not solve for those. Back to the use cases; to list them (without a lot of detail):

  • Read data stored on the same node where the component lives (stored via Dialog box)
  • Read data stored in another node
  • Aggregate data from several other nodes
  • Search data using default (for now) APIs built into CQ (sling)

 

Does Slice address these issues?

Use case 1: Read data stored on the same node where the component lives (stored via Dialog box)

If I read the basics of this framework here, using it for the very first use case does not make any sense; that is something we get from the framework OOTB. Sling puts all the properties of the current node into a HashMap and simply exposes it to us in the JSP. I like to call this the Content-in-Content-out approach of managing content. So why use a model in a JSP for something that already exists? My bet is that this is not the case the framework was written for; it is just the starting point, a building block to offer something more.

 

Use case 2: Read data stored in another node

Where things get interesting is the 2nd use case (read about this advanced capability here). What this does, basically, is let me refer to my Model and load the data via a POJO instead of calling the underlying JCR API to read the node. This makes things interesting, but it does not interest me too much, for one primary reason: it will make me model all of my content into a model. Still, that is not a strong enough reason to reject it outright.

In the enterprise world we have all been taught to work with DAOs and VOs – basically, define a model of our objects and then refer to them everywhere. This is a powerful construct that we have been using for ages, and I won’t even debate why we should not use it. But in the case of CQ, the underlying framework converts the entire node into a HashMap. This programming model does not always give you structure, but if the models are not complex and nested it is actually a pretty powerful way of dealing with data. CQ by default is a page-based CMS, and converting it into a content-based CMS makes things tricky anyway.

Most components will invariably represent either a content type or metadata used to fetch content that can’t be modelled into a POJO, and there are so many of those (let’s not forget that we get all of that free of cost as properties). This will become POJO hell in no time. That leaves us with the use case of industry domains – in sports, say, we might have an Athlete or a Venue that you can model into a POJO. You have to ask yourself whether you want to manage content via POJOs or just use the HashMap, which keeps the programming flexible: work with properties – simple key-value pairs – or keep managing POJOs. I think at some point I might use this framework for modelling the underlying content structures into POJOs. However, I’d like to keep certain principles in mind, and I don’t yet understand how they play out with Slice. For example, we are injecting PageManagers, which are core Day CQ APIs. What worries me is that with so many underlying APIs strapped into my code, if anything were to change, I would end up rewriting much of it. For a project of a few months (maybe a quarter) I would give this a shot – it saves me time managing content. But if I were writing a platform that would live several years and might actually see a major CQ upgrade, I would think twice (maybe thrice) before using this framework.

 

Use case 3 and 4

The last two use cases are either built on top of the Model, or – in the case of search – simply not catered for by this framework. So, basically, if you ask me, this framework covers a small set of use cases, and with the added debt of managing another framework (Guice etc.), if something were to change in CQ I might be taking on a lot of refactoring risk later. That does not make me feel very comfortable.

 

Verdict

Put simply, in plain English:

  1. For anything that is enterprise or platform(ish), I won’t go for this. The technical overhead of this framework, versus what it solves for, is not worth the ROI of managing another framework
  2. If I came across strongly typed content types, I would consider using it. The ultimate tie-breaker would be how many of the content types need to be displayed “as-is”. If all I had to show were various compositions of data (search or what have you), maybe not. But definitely something to consider
  3. I checked the roadmap, and I see very little activity – not something that will make me commit to using this long term

Not a lot of rope to hang off of; very few use cases to apply it to, and it does not excite me much.

Modeling Content in CQ54


CQ54 is not a typical RDBMS where I can model a set of relationships in tables and soon a pretty picture starts to present itself. CQ54 stores everything in its content repository (CRX) as nodes, which follow an entirely different data model, i.e. a hierarchical structure. My experience with hierarchical databases has been with day-to-day applications like the MS Windows File Explorer, the Outlook folder structure, and, in application development, directory services like LDAP. So I am going to start off by listing what I understand of hierarchical databases before I get down to my set of questions.

A hierarchical database model means that my data is arranged into a structure similar to a tree (an organization chart). It rests on the premise of a 1:N relationship, where a child can have only one parent while a parent can have multiple child records. It has characteristics that differ a lot from a relational database. To list a few:

1. Every node is a record
2. Data is stored as properties on the node
3. Every node can be of a different data type – a hierarchical model does not mandate to have same record types under a same parent
4. A child node can be a child to one and only one parent
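The four characteristics above can be sketched as plain nested objects – every node is a record, properties live on the node, and each child hangs off exactly one parent. The node names and properties below are invented for illustration, not from any real CRX repository:

```javascript
// A tiny hierarchical "repository": one root, one parent per child.
const root = {
  name: "content",
  props: {},
  children: [
    {
      name: "site",
      props: { "jcr:title": "My Site" },   // data stored as properties on the node
      children: [
        { name: "home", props: { "jcr:title": "Home" }, children: [] },
      ],
    },
  ],
};

// Path navigation is pointer-following from parent to child,
// not an index lookup -- which is why known-path reads are fast.
function resolve(node, path) {
  return path
    .split("/")
    .filter(Boolean)
    .reduce((cur, seg) => cur && cur.children.find(c => c.name === seg), node);
}

console.log(resolve(root, "site/home").props["jcr:title"]); // "Home"
```

The flip side is also visible here: finding `home` without knowing its path means walking the whole tree, which is exactly the limitation discussed below.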

Hierarchical databases have their advantages:

Performance: Navigating records in a hierarchical model is faster because the references are basically pointers directly to the nodes/records. I don’t have to search in an index or a set of indexes. This, however, holds only when my data model does not have a lot of references; if I am working with a content model that includes multi-level references, performance will head south.
Easy to understand: It is a simple hierarchy; and it represents something that is “non-technical”. It naturally represents what exists.
And Hierarchical databases have their limitations:

Unable to draw complex relationships between various child nodes: Given the premise that a child node has only one parent, nodes are identified only by their parents. We have capabilities like XPath to navigate directly to a node, which may be faster; but if we do not know the exact path, we have to navigate the tree (up to a parent, maybe the root) and then down through the nodes before we find what we are looking for. Some questions I am asking myself:
1. What qualifies as a reference for an object?
2. Should the speed at which data can be fetched be a driver for defining a reference?
3. What are the best practices I should be aware of when modeling my domain?
4. When do I decide I need a network model instead of a hierarchical model?
Difficult to maintain: Hierarchical models also mean that I do not have a command like ALTER TABLE. This essentially means that if I later decide to add another property to a specific node type, I have to write code to update all the nodes.
1. Is there a way to update a node type, thus updating all the objects of that node type?
2. Is there a way to avoid such situations (apart from saying let’s get it right in Release 1.0 and pray to God the client will not ask for a change request :))?
Lack of flexibility: In this article, Scott Ambler quotes – “Hierarchical databases fell out of favor with the advent of relational databases due to their lack of flexibility because it wouldn’t easily support data access outside the original design of the data structure. For example, in the customer-order schema you could only access an order through a customer, you couldn’t easily find all the orders that included the sale of a widget because the schema isn’t designed to all that.” This is a typical case where reporting is a must, and it might be, in many systems.
1. Are there other scenarios?
With all the context set on hierarchical databases, it is now important we look at CQ54’s content repository – CRX. While CRX is a hierarchical repository, it should not be confused with a hierarchical database. CRX provides us with JCR node types, which allow us to enforce structure. We also have the capability of creating custom node types, but we should do so with care. The principle is not to go overboard with structure.

The question remains – “how do I manage content in CQ54?”. I do not have a “go-to” answer, but what I have described below is how I am going to think when I start the process.

Content modeling: Look at the requirements, i.e. wireframes and creative design assets, and identify the various content types, structures, and relationships between content types. We can take the object-oriented approach and define everything as an object, or keep similar content types together. There are several things to consider when taking one approach over the other:

What is the business process for creating an object type? Do the content types follow the same workflow?
1. Consider the steps required to activate a piece of content. If an article, a blog, and a discussion forum entry have the same process flow of an author and a reviewer, there is a case for a single abstract content type
2. However, if an article needs a legal review and can be used in several other business processes beyond a simple article, we may want to bring article out as its own content type
Reuse
1. What kind of properties do they share?
2. Modeling content for an education system, where content types like a college or a school share a lot of similarities, makes a case for creating an abstract content type
How does the content author want to look at the content?
1. If we have a set of users who want to manage their content as structured content, like books, movies etc., we should look to provide those content types very specifically
2. In another scenario, if we have authors who do not worry a lot about specific objects, i.e. page-centric content creation, then we can decide to club content types together

 

Managing Relationships: In CQ, given its hierarchy-based data storage model complying with the JCR specifications, we do not have a way to create strict rule-based relationships. We can create relationships in one of the following ways:

Path-based references: We can do this by creating properties on objects that hold a “path” or a “list of paths” to the content the relationship points to
1. They are semantic
2. They are not bound to obscure IDs
3. They do not enforce integrity constraints, which could create trouble for extensibility later
4. Being REST-ful, they allow us to navigate directly to the node, making navigation very quick
5. Being REST-ful, they allow authors to visualize their content relationships well, providing them a business view of the content
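A path-based reference is nothing more than a property holding another node’s path, and resolving it is a direct lookup rather than an ID join. A minimal sketch, with the repository flattened to a path-to-node map (the paths, property names and titles below are hypothetical, not from any real project):

```javascript
// Flattened view of a content tree: each key is a node path.
const repo = {
  "/content/articles/intro": {
    "jcr:title": "Intro",
    related: ["/content/articles/deep-dive"],  // the path-based reference property
  },
  "/content/articles/deep-dive": { "jcr:title": "Deep Dive" },
};

// Resolving the reference is a direct lookup by path -- semantic,
// readable by authors, and no obscure ID indirection.
const article = repo["/content/articles/intro"];
const relatedTitles = article.related.map(p => repo[p]["jcr:title"]);

console.log(relatedTitles); // ["Deep Dive"]
```

The cost mirrors point 3 above: nothing stops `/content/articles/deep-dive` from being moved or deleted while the reference still points at it, since no integrity constraint is enforced.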
Taxonomy-based references: CQ uses tags to represent a taxonomy. However, we cannot extend tags to hold various profile information, so you will need a mapping system that maps a tag to content in CRX
1. Taxonomy is the foundation on which the IA stands. It gives us a classification system and shapes how users will view the content on the site
2. It allows us to clearly identify where in the system a content type resides
3. It is a conceptual framework allowing customers, and their customers, to locate what they need easily
4. It is hierarchical
Relational database
1. Can be used if we reach a point where the relationships are too complex
2. Transactional data should be kept out of the CMS and placed in a relational database (or similar)
3. Appropriate when we do not have to manage the lifecycle of the content
4. Please note that this will make the architecture complex, but if it is needed, it is needed