How often do enterprise companies face data consistency issues? As it turns out, quite often. Many companies have adopted APIs as the main way to exchange data between processes, services, and departments. Whether internal or external, these API mashups sometimes need careful analysis and auditing to avoid data consistency and integrity issues. This article describes how Konstankino built a product and integrated it with existing systems to fix a data-related issue. We will share what worked for us and why, so you can see which tools enabled us to identify problematic data sources and build a tool to fix them. We call these tools API Flatteners or API Aggregators.
When you have many systems producing data and quite a few departments working with those systems, modifying data can become a fragile operation that frequently creates consistency and integrity issues. The more features we add to our platforms, the harder it becomes to manage data, and that often leads to problems. Many people think that once their systems are built, data consistency issues will not happen or simply cannot happen. We know the opposite is true. It is better to plan for slow and unreliable network connections, bad data, and interruptions, and then design systems for fault tolerance.
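To make "design for fault tolerance" a little more concrete, here is a minimal sketch of one common building block: calling an unreliable API with a per-attempt timeout and a bounded number of retries with exponential backoff. The names `withRetry`, `withTimeout`, and `RetryOptions` are hypothetical, chosen for illustration; they are not from our codebase or from any library.

```typescript
// Options for the hypothetical withRetry helper below.
interface RetryOptions {
  retries: number;     // extra attempts after the first one
  baseDelayMs: number; // wait before the first retry; doubled after each failure
  timeoutMs: number;   // deadline for each individual attempt
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Rejects if `task` does not settle within `ms` milliseconds.
// Note: this stops waiting but does not cancel the underlying request
// (with fetch() you would pass an AbortController signal for that).
function withTimeout<T>(task: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([task, deadline]).finally(() => clearTimeout(timer));
}

// Calls `fn` until it succeeds or the retry budget is spent, waiting
// baseDelayMs, 2 * baseDelayMs, 4 * baseDelayMs, ... between attempts.
async function withRetry<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= opts.retries; attempt++) {
    try {
      return await withTimeout(fn(), opts.timeoutMs);
    } catch (err) {
      lastError = err;
      if (attempt < opts.retries) {
        await sleep(opts.baseDelayMs * 2 ** attempt);
      }
    }
  }
  throw lastError;
}
```

Retrying like this is only safe for reads and idempotent writes; a non-idempotent call that timed out may already have been applied upstream, which is exactly how duplicate or inconsistent records appear.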
Fixing data consistency issues involves quite a few challenges. First, you need to get permissions and allocate resources. Second, you need to define boundaries and clearly understand the ramifications of every change you make. This is the kind of contract work you could probably outsource. Either way, someone first has to proactively review the data in order to understand it. You need someone who knows all the systems involved and who will invest a lot of diligence to make sure the changes stay in sync with the rest of the systems. That is not an easy task, especially when you have many services, APIs, and data sources. It becomes even more dangerous if you have poorly designed public APIs exposed to your customers without good versioning and security in place, but that is a separate topic.
We love technologies that help us achieve our goals more easily and efficiently. Nothing beats tools that boost our productivity while solving specific thorny issues. This is why we like to say: the right tool for the right job. The catch is that figuring out which tool is right for a specific job is a challenge in itself.
Our customer had a lot of data and a lot of data integrity issues associated with it. The data came in various forms and shapes: quite a few Salesforce instances and a few data centers, not to mention data from various other platforms. We had the privilege of working with many backend systems and APIs, with data from external sources, and with a Salesforce installation.
Here is a high-level diagram that shows what has been done and the general system architecture along with our additions.
We started by analyzing the API endpoints exposed by the various systems (the root data sources) to get a high-level review of the APIs. We built a tree of data and mappings so that we knew how to combine large chunks of data; that by itself does not solve anything, but it is still necessary. Before you can tell what causes data integrity issues, you need to fetch a large portion of the data, map it, and analyze it, so you understand how the whole infrastructure works.
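The mapping step can be sketched as follows, assuming each source's records have already been fetched and parsed from JSON. Every source declares which of its fields corresponds to each canonical field, records are merged by a shared entity key, and any canonical field on which two sources disagree is reported. The source names, field names, and the `flattenAndCompare` function are hypothetical; in practice each mapping is written after reviewing that particular API.

```typescript
// A record as returned by one upstream API, after JSON parsing.
type RawRecord = Record<string, unknown>;

// Hypothetical spec: canonical field name -> field name used by this source.
interface SourceSpec {
  name: string;
  keyField: string; // field identifying the same entity across sources
  mapping: Record<string, string>;
}

interface SourceInput {
  spec: SourceSpec;
  records: RawRecord[];
}

interface Conflict {
  key: string;
  field: string;
  values: Record<string, unknown>; // source name -> value that source reports
}

// Flattens records from several sources into one view per entity and
// returns every canonical field whose values differ between sources.
function flattenAndCompare(inputs: SourceInput[]): Conflict[] {
  // entity key -> canonical field -> source name -> value
  const merged = new Map<string, Map<string, Record<string, unknown>>>();

  for (const { spec, records } of inputs) {
    for (const rec of records) {
      const rawKey = rec[spec.keyField];
      if (rawKey === undefined || rawKey === null) continue; // cannot be matched
      const key = String(rawKey);
      const fields = merged.get(key) ?? new Map<string, Record<string, unknown>>();
      merged.set(key, fields);

      for (const [canonical, sourceField] of Object.entries(spec.mapping)) {
        if (!(sourceField in rec)) continue; // this source does not report the field
        const bySource = fields.get(canonical) ?? {};
        bySource[spec.name] = rec[sourceField];
        fields.set(canonical, bySource);
      }
    }
  }

  const conflicts: Conflict[] = [];
  for (const [key, fields] of merged) {
    for (const [field, values] of fields) {
      // Compare by JSON encoding so nested objects are compared by content.
      const distinct = new Set(Object.values(values).map((v) => JSON.stringify(v)));
      if (distinct.size > 1) conflicts.push({ key, field, values });
    }
  }
  return conflicts;
}
```

For example, if a CRM reports an account as `Active` while a billing API reports the same account as `Suspended`, the result contains one conflict for that account's `status` field, which is exactly the kind of row a reviewer needs to see side by side.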
We needed a solid aggregation tool, one that would fetch data quickly and build a detailed view of it. We put together a ReactJS-based application with a Single Sign-On (SSO) mechanism and began querying dozens of APIs. ReactJS turned out to be the right tool for the job. The tool could also modify data quickly, "on the fly", and re-initiate some of the business processes associated with that data. In other words, we enabled our customer to fix the data and resubmit it to the various underlying systems with real-time feedback on the result.
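Querying dozens of APIs for a single view means some of them will be slow or down at any given moment. A minimal sketch of one way to handle that, assuming each endpoint is wrapped in a `load` function (in the browser this would typically be a `fetch()` to the proxy): query all sources concurrently with `Promise.allSettled` and keep a per-source success or error entry, so one failing API marks only its own part of the view as failed. `Source`, `SourceResult`, and `aggregate` are hypothetical names, not our actual code.

```typescript
// One upstream endpoint; `load` stands in for the real HTTP call (hypothetical).
interface Source<T> {
  name: string;
  load: () => Promise<T>;
}

// Either the data a source returned, or the error it failed with.
type SourceResult<T> =
  | { name: string; ok: true; data: T }
  | { name: string; ok: false; error: string };

// Queries every source at once and returns one entry per source, in input order.
async function aggregate<T>(sources: Source<T>[]): Promise<SourceResult<T>[]> {
  // Promise.resolve().then(...) turns a synchronous throw inside load() into a rejection.
  const settled = await Promise.allSettled(
    sources.map((s) => Promise.resolve().then(() => s.load())),
  );
  return settled.map((outcome, i): SourceResult<T> =>
    outcome.status === "fulfilled"
      ? { name: sources[i].name, ok: true, data: outcome.value }
      : { name: sources[i].name, ok: false, error: String(outcome.reason) },
  );
}
```

Using `Promise.allSettled` rather than `Promise.all` is the key design choice here: `Promise.all` rejects as soon as any one source fails, which would blank the whole screen because of a single flaky legacy endpoint.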
We also added a few internal API endpoints to fetch intermediate data from some existing legacy Java systems, and created proxy routines to talk to the APIs efficiently. In the legacy Spring Boot Java systems and a few others, we added new code and modified existing code: appropriate API response status codes, better structured JSON payloads, and so on. We also had to talk to a Salesforce (SFDC) instance via its RESTful API to pull data for analysis and push data back to SFDC. We created a NodeJS proxy between our ReactJS app and the SFDC installation to avoid disclosing information in the user's browser; the proxy also improved performance by letting us cache responses.
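A minimal sketch of the core of such a proxy, assuming the actual upstream HTTP call is injected as a function. The browser only calls our own endpoint; the Salesforce access token is attached on the server and never reaches the client, and read responses are cached for a configurable time. `CachingProxy`, `Upstream`, and `ttlMs` are hypothetical names; this is not our actual proxy and not a Salesforce SDK API. Routing, authentication of the browser session, and error handling are omitted.

```typescript
// Performs the real upstream call (e.g. fetch() to the SFDC REST API); injected
// so the sketch stays self-contained. Hypothetical signature.
type Upstream = (path: string, headers: Record<string, string>) => Promise<unknown>;

interface CacheEntry {
  value: unknown;
  expiresAt: number; // epoch milliseconds
}

// Hypothetical server-side proxy with a simple time-based (TTL) read cache.
class CachingProxy {
  private readonly cache = new Map<string, CacheEntry>();
  private readonly upstream: Upstream;
  private readonly accessToken: string; // stays on the server
  private readonly ttlMs: number;
  private readonly now: () => number;   // injectable clock, handy for tests

  constructor(upstream: Upstream, accessToken: string, ttlMs: number, now: () => number = Date.now) {
    this.upstream = upstream;
    this.accessToken = accessToken;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Serves a cached response while it is fresh, otherwise calls upstream.
  async get(path: string): Promise<unknown> {
    const hit = this.cache.get(path);
    if (hit && hit.expiresAt > this.now()) return hit.value;
    const value = await this.upstream(path, { Authorization: `Bearer ${this.accessToken}` });
    this.cache.set(path, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }

  // Call after pushing corrected data back, so the next read is fresh.
  invalidate(path: string): void {
    this.cache.delete(path);
  }
}
```

A production version would also need to scope cache keys per user when responses depend on the caller's permissions, and to de-duplicate concurrent cache misses for the same path.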
Here is how this tool looks and works.
This may feel like putting tape over the problem, and it is true that it only helps reduce the number of open, known issues. So yes, the data consistency issues are still there. However, we dramatically improved the situation: our client can now "fix" data, examine the problems closely, and start attacking those issues.
Building a data-analyzing tool allowed us to quickly come up with a solution and a roadmap that shed light on our customer's data issues. By its final release, our tool was talking to dozens of APIs and Salesforce platform endpoints, extracting data for analysis in one secure place: a fast and snappy web application!