Wednesday, October 29, 2014

Microservices... Where to Start?

Microservices are becoming a "thing" now and are probably the de facto choice when someone begins a new project and is thinking about hosting in the cloud, but where do you start when you have a brownfield project? I don't have any hot answers or amazing insights here; all I can do is describe what my first microservice was and how it came into being.

Over time the application was getting more use and the number of servers involved started to increase. We were using auto-scaling, and the number of servers rose in line with usage, but it wavered between 8 and 24 instances. This quite rightly caused some consternation, so we tinkered with the number of cores for each instance and with the thresholds for the scale-up and scale-down triggers, but nothing seemed to alter the total number of cores being used. We have a hefty amount of logging, and we can control its output through logging levels, so we decided to change the logging level to try to get more diagnostic information. This is when things got interesting. Because this is a production system, getting hold of the log information had initially been problematic and slow, so we had already started forwarding all the messages to SplunkStorm using its API. All was well (for over a year), and we were very impressed with how we could use that information for ad-hoc queries. However, when we changed the logging levels the servers started scaling and we started to get database errors: unusual ones involving SQL connection issues rather than SQL query errors. We quickly reverted the changes and decided to try to replicate the problem in our CI/SIT environments.

What we realized was that our own logging was causing our performance issues and, even more awkwardly, was also responsible for the SQL connection issues: logging to SplunkStorm via its API was using up the available TCP/IP connections, and this was even more pronounced when we changed the logging level. What we needed to do was refactor our logging so that we could get all our data into SplunkStorm (and Splunk, as we were also in the process of migrating to SplunkStorm's big brother) with minimal impact on the production systems. Thankfully our logging used NLog, which we had wrapped in our own abstraction for mocking purposes, so we decided to write a new NLog target that would log to a queue (service bus) instead, and to have another service read messages from that queue and forward them to Splunk and SplunkStorm. Thus our first microservice was born.
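The core idea is that the application's log call only enqueues a message, and a separate process does the slow network work. The original was a custom NLog target in .NET; the following is only a minimal Python sketch of the same decoupling using the standard library's `QueueHandler`/`QueueListener`. It assumes an in-process `queue.Queue` as a stand-in for the service bus, and `SplunkForwardingHandler` is a hypothetical class that records messages instead of calling Splunk.

```python
import logging
import logging.handlers
import queue

# Assumption: an in-process queue stands in for the external service bus.
log_queue = queue.Queue()


class SplunkForwardingHandler(logging.Handler):
    """Hypothetical sender; it only records what it would have sent."""

    def __init__(self) -> None:
        super().__init__()
        self.sent: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        # A real forwarder would make the (slow) network call to Splunk here.
        self.sent.append(self.format(record))


# Application side: the handler just enqueues the record, so no network
# connection is opened on the request path.
app_logger = logging.getLogger("app")
app_logger.setLevel(logging.DEBUG)
app_logger.addHandler(logging.handlers.QueueHandler(log_queue))
app_logger.propagate = False

# Forwarder side: a listener thread drains the queue and hands each record
# to the hypothetical Splunk handler.
forwarder = SplunkForwardingHandler()
listener = logging.handlers.QueueListener(log_queue, forwarder)
listener.start()

app_logger.info("order %s placed", 42)
app_logger.debug("cache miss for key %s", "user:7")

listener.stop()  # processes anything still queued, then stops the thread
print(forwarder.sent)  # ['order 42 placed', 'cache miss for key user:7']
```

Raising the logging level in this arrangement only makes the queue busier; it no longer multiplies outbound connections from the application servers.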

The new NLog target took the log messages and pushed them to the queue in batches; the microservice monitors the queue, pulls messages off in batches, and then pushes them to Splunk and SplunkStorm, also in batches. The initial feasibility spike took half a day, and the final implementation was ready and pushed into production the following week. Because we were using .NET we could also take advantage of multiple threads, so we used thread pools to limit the number of Splunk/SplunkStorm sends running in parallel. After deployment we found that we could scale back our main application servers to 4 instances, with only a pair of single-core services dealing with the logging. We also noticed that auto-scaling never reaches its old thresholds, and the instance count has been stable ever since. Another advantage is that other services can now use the queue to push messages to Splunk, and can even use the same NLog target in their own projects to deal with all the complexities.
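The forwarding side described above (pull in batches, push in batches, cap parallel sends with a thread pool) can be sketched as follows. This is a hedged Python illustration, not the original .NET service: `send_batch` is a hypothetical stand-in for the Splunk/SplunkStorm calls, an in-process `queue.Queue` again stands in for the service bus, and the batch size of 100 and worker count are arbitrary illustrative values.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def drain_batch(q: queue.Queue, max_batch: int) -> list[str]:
    """Pull up to max_batch messages off the queue without blocking."""
    batch: list[str] = []
    while len(batch) < max_batch:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
    return batch


def send_batch(destination: str, batch: list[str]) -> int:
    """Hypothetical sender: a real one would make one request per batch
    to the given destination. Here it just reports the batch size."""
    return len(batch)


def forward_all(q: queue.Queue, destinations: list[str],
                max_batch: int = 100,
                max_parallel_sends: int = 4) -> dict[str, list[int]]:
    """Drain q in batches and send every batch to every destination.

    max_workers caps how many sends are in flight at once, which bounds
    the number of concurrent outbound requests.
    """
    submitted = []
    with ThreadPoolExecutor(max_workers=max_parallel_sends) as pool:
        while (batch := drain_batch(q, max_batch)):
            for dest in destinations:
                submitted.append((dest, pool.submit(send_batch, dest, batch)))
    # Leaving the "with" block waits for all submitted sends to finish.
    sizes: dict[str, list[int]] = {dest: [] for dest in destinations}
    for dest, future in submitted:
        sizes[dest].append(future.result())
    return sizes


messages = queue.Queue()
for i in range(250):
    messages.put(f"log line {i}")

result = forward_all(messages, ["splunk", "splunkstorm"],
                     max_batch=100, max_parallel_sends=2)
print(result)  # {'splunk': [100, 100, 50], 'splunkstorm': [100, 100, 50]}
```

With these numbers the 250 messages reach both destinations in 6 requests rather than 500 individual sends, and never more than 2 requests are in flight at once.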

I hope the above shows that your first microservice does not have to be something elaborate; it can instead deal with a mundane but essential task, and the benefits can be quite astounding.
