Websites like Harvard Business Review and The Wall Street Journal take the data on their platforms very seriously. These sites are secure, compliant and highly scalable, and they employ experienced software architects who help set industry best practices. Making it very hard to break into their systems or steal their articles and user data is a large part of why they are trusted names, and why enterprises and corporations have relied on them for years. In this article, we look at what goes into developing a content management and delivery platform like Harvard Business Review, and everything that happens in the back end.
Complex and critical projects call for software architectures that combine the benefits of more than one technology under a single project. A few years ago WordPress exposed its functionality through an API, so you could use aspects of its CMS in your own project simply by calling that API. Other CMSs such as Drupal and Joomla exposed their functionality through APIs as well, and this fed a new practice (still very nascent, even in 2019): using a headless CMS in a decoupled architecture. Let’s see why people started using it and why it works out for particular projects.
Let’s take the example of an enterprise-scale news website like The Wall Street Journal, and assume you were to build a web platform on the same scale. You could break it down into the following parts:
- The author / editor / website owner component, which deals with the content creation and publishing side of the platform
- The user / subscriber view, which involves reading the content that has been pushed to the front end
- Smaller components in addition to the above two, such as data received from external sources, external API integrations etc.
Since WordPress exposes its functionality through an API, you could handle the first part with WordPress’s content management features, which also give you the text editor, user roles, permissions and so on. The second part involves maintaining user data, such as each user’s profile and subscription information, and handling signups and logins; this can be handled by a faster, more scalable technology such as NodeJS. The front end where users read articles can be built with a client-side framework such as Angular or React, which loads articles asynchronously without a full page refresh (in our case we will go with React). To manage application state in React, including the data fetched from NodeJS, we can use a state management library such as Redux or MobX (in our case, let’s go with Redux, as our application may need to grow to a bigger scale). NodeJS needs to store user data in a highly scalable database like MongoDB, and to expose that data to the front end over HTTP we need a web framework like Express. In addition, your platform may have functionality like chat or charting and analysis. This may need data to be stored in a real-time database so that you can run real-time operations on it to show charts and analysis, and for this we may use Firebase. Below is a representation of how it would technically work.
Now, we have theoretically constructed a platform that combines not just three separate technologies but three entirely different stacks:
- WordPress + MySQL
- React + Redux + NodeJS + Express + MongoDB
- Redux + Firebase
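To make the hand-off between the first two stacks concrete, here is a minimal sketch of how the React / NodeJS side might read articles from WordPress over its REST API. The `/wp-json/wp/v2/posts` route and the `page`, `per_page` and `_fields` parameters are part of the WordPress REST API; the `ArticleSummary` shape, the function names and the `cms.example.com` host are assumptions made up for illustration.

```typescript
// Subset of the fields WordPress returns for a post (we only request these).
interface WpPost {
  id: number;
  slug: string;
  date: string;
  title: { rendered: string };
  excerpt: { rendered: string };
}

// Hypothetical shape consumed by the React front end.
interface ArticleSummary {
  id: number;
  slug: string;
  publishedAt: string;
  title: string;
  excerptHtml: string;
}

// Minimal fetch-like signature, so the sketch does not depend on a specific runtime.
type Fetcher = (url: string) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}>;

// Build the URL for one page of posts, asking only for the fields we need.
function buildPostsUrl(baseUrl: string, page: number, perPage: number): string {
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const fields = ["id", "slug", "date", "title", "excerpt"].join(",");
  return `${root}/wp-json/wp/v2/posts?page=${page}&per_page=${perPage}&_fields=${fields}`;
}

// Flatten the WordPress post object into the shape the front end expects.
function toArticleSummary(post: WpPost): ArticleSummary {
  return {
    id: post.id,
    slug: post.slug,
    publishedAt: post.date,
    title: post.title.rendered,
    excerptHtml: post.excerpt.rendered,
  };
}

// Fetch one page of articles; throws if WordPress answers with an error status.
async function fetchArticles(
  fetcher: Fetcher,
  baseUrl: string,
  page = 1,
  perPage = 10
): Promise<ArticleSummary[]> {
  const response = await fetcher(buildPostsUrl(baseUrl, page, perPage));
  if (!response.ok) {
    throw new Error(`WordPress responded with HTTP ${response.status}`);
  }
  const posts = (await response.json()) as WpPost[];
  return posts.map(toArticleSummary);
}
```

In a runtime that provides a global `fetch`, a call could look like `fetchArticles(fetch, "https://cms.example.com")`. Passing the fetcher in as a parameter also makes it easy to substitute a stub in tests.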
So far everything looks good, sounds interesting and we’re ready to get a pat on the back from the client. However, the problems start arising a few months down the line, when the time comes to scale up the application. So, let’s look at a few limitations of this approach.
A SaaS application has a multi-tenant structure: each tenant represents a different company that’s using our application, and each tenant has a variable number of users, which could be the employees of that company. The tenants could be structured in three ways:
- Multiple tenants in a collection
- Separate collection per tenant
- Separate database per tenant
The first approach is the most cost-effective, as it puts the least load on your resources; there is data separation but no data isolation. The last approach is the most secure, as it gives complete data isolation per tenant, which makes it the easiest structure to keep GDPR compliant. Now, if you choose to go the GDPR-compliant way, each tenant’s data lives in a different database. The problem comes in when there are dependencies between the WordPress layer and the NodeJS layer. The NodeJS part can easily be structured per tenant, but for each user we may also be storing some information in the WordPress and MySQL component. That component knows nothing about tenants and stores generic information, so its data cannot be compartmentalized by tenant.
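The three tenancy structures differ mainly in where a tenant’s data lives and in what each query has to do to stay inside that tenant. The sketch below shows one way the NodeJS layer might resolve this for a `users` collection. The naming scheme (`app`, `users_<id>`, `tenant_<id>`), the tenant id format and the function names are all assumptions chosen for illustration, not a MongoDB convention.

```typescript
type TenancyStrategy =
  | "shared-collection"      // multiple tenants in a collection
  | "collection-per-tenant"  // separate collection per tenant
  | "database-per-tenant";   // separate database per tenant

interface StorageLocation {
  database: string;
  collection: string;
  // Filter that every query must include; empty when isolation is structural.
  filter: Record<string, string>;
}

// Tenant ids end up inside database and collection names, so restrict them
// to a conservative character set (hypothetical rule for this sketch).
function sanitizeTenantId(tenantId: string): string {
  if (!/^[a-z0-9_]{1,32}$/.test(tenantId)) {
    throw new Error(`Invalid tenant id: ${tenantId}`);
  }
  return tenantId;
}

// Resolve where the user records of a given tenant are stored.
function locateUsers(strategy: TenancyStrategy, tenantId: string): StorageLocation {
  const id = sanitizeTenantId(tenantId);
  switch (strategy) {
    case "shared-collection":
      // Cheapest option: separation relies on never forgetting this filter.
      return { database: "app", collection: "users", filter: { tenantId: id } };
    case "collection-per-tenant":
      return { database: "app", collection: `users_${id}`, filter: {} };
    case "database-per-tenant":
      // Full isolation: a query cannot reach another tenant's database by accident.
      return { database: `tenant_${id}`, collection: "users", filter: {} };
  }
}
```

For example, `locateUsers("database-per-tenant", "acme")` resolves to the `users` collection in a database named `tenant_acme`, while the shared-collection strategy returns the common collection plus a `tenantId` filter that every query has to apply.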
Each user may have information specific to them, such as reports they have purchased or articles they have access to. Since this information is in MySQL (linked with WordPress), we need to request it each time the user logs in. This opens us up to errors where a wrong request could expose another client’s data, and every extra hop between services is another surface for attacks (such as man-in-the-middle). Also, PHP processes each request synchronously, so a worker is blocked while it waits on I/O, which makes it harder to handle many requests at the same time. All of these problems arise because WordPress is a part of our project. We can eliminate it by moving everything to NodeJS, but that means building the complete CMS part ourselves, and that’s a lot of development time added to our project!
This is why we need to decide case by case which architecture works best for us. There will always be a trade-off between extreme scalability on one hand and low development time, without reinventing the wheel, on the other.
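One way to reduce the risk of a wrong request exposing another client’s data is to have the NodeJS layer check whatever comes back from the WordPress / MySQL side against the authenticated session before passing it on. The sketch below assumes a hypothetical `Entitlement` record that carries both a tenant id and a user id. The WordPress component described above does not store tenant information by default, so that mapping would have to be added.

```typescript
// Identity of the logged-in user, as established by the NodeJS layer.
interface Session {
  tenantId: string;
  userId: string;
}

// Hypothetical record describing an article or report a user has access to.
interface Entitlement {
  tenantId: string;
  userId: string;
  articleId: number;
}

// Fail closed: if any record belongs to another tenant or user, return nothing
// and raise an error instead of silently filtering, so the bad query gets noticed.
function assertOwnedBySession(session: Session, records: Entitlement[]): Entitlement[] {
  const foreign = records.filter(
    (record) => record.tenantId !== session.tenantId || record.userId !== session.userId
  );
  if (foreign.length > 0) {
    throw new Error(
      `Refusing to return ${foreign.length} record(s) that do not belong to the session`
    );
  }
  return records;
}
```

A guard like this does not fix the underlying coupling between the two stacks; it only turns a silent data leak into a visible error.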