Thursday, December 15, 2016

A plea for modern testing and the keyword approach



Observing how mainstream corporations try to adapt and keep up with modern trends such as agile development, REST-based architectures, continuous integration and browser-based emulation has been fascinating to me lately. While these new ideas always seem very sexy on paper, they're not always easy to implement and take advantage of. And it's not just because of a lack of talent or proper management.

At the same time, their IT systems are often ticking time bombs, unsupported stacks crawl around, and when major technical decisions are delayed too long, they tend to become radical and a bit brutal. So figuring out how to transition, toward what (there's a whole ocean of technology out there, and it's getting bigger and more confusing by the minute) and with which goals in mind, is definitely a key element in the success of these companies these days.

For instance, the SaaS transformation which has taken many markets by storm over the last few years still poses many technical as well as legal problems for insurers, banks and administrations, and so most of these companies still need to produce, test and operate their own software at scale. Software which happens to be very complex due to the nature of their businesses.

But today I'd like to focus on just one of these practices. I want to talk about how testing has been done in many of these environments, and how testing should be done, in my opinion. What I'm about to describe here is my personal experience working with and for such companies and what I think the future holds for them as far as testing goes.

And I want to start by mentioning that their biggest challenges are not only about scaling services to millions of users; they're also about reaching solid test coverage in applications which may produce hundreds of thousands of very different and peculiar use cases and which contain a lot of oddities due to historical reasons or regulations. These aspects impact a lot of things down the line when you're trying to modernize your system. They also turn test automation and test artefact reusability into absolutely critical factors which need to be taken into account when designing or redesigning a test factory.

The way testing has been done in most of these environments over the past decades involves an approach where each person or team concerned with a particular aspect of testing implements not just their own strategy, but also their own scripts, buys their own tools, sets up their own environment, and so on. I call this "siloed testing".

There are many reasons why I believe this approach is ill-advised, but I'm just going to cover a few of them here. These are big ones though, so buckle up, because if you're still testing in an unstructured way with old funky legacy tools, you're in for a ride.

Oh, one last thing: I'm going to assume the reader understands the benefits of test automation here. I believe certain situations call for a bit of manual testing (let's not be dogmatic about anything), but I also don't think there's a single project I've worked on where anyone could argue in a solid way against a target of at least 80% automated test coverage.

So let's dive into it!

The obvious reason: redundancy


I can't even count anymore how often I've come across corporations who have at least 3, but sometimes 4 or 5, different teams in charge of different types of tests for the same application or group of applications.

It seems like a healthy approach, since we're just trying to spread responsibility across different people and distribute workload here. But this configuration is only sustainable and scalable if you can do it efficiently enough. In my experience, it's not unusual to see a developer team, a functional testing team, a system/integration team, a load and performance testing team and then sometimes even a synthetic monitoring team re-script the same use case 5 times, maybe with 5 different tools, and often without even being aware of it.

Worse, they'll probably throw these scripts away entirely when the next release comes around.




Scripting a use case correctly can be pretty difficult and will sometimes involve hours or days of work, depending on how complex the scenario is and how well suited the tooling and simulation strategy are.

I don't think there is a situation where that kind of waste makes sense. And I'm barely even going to mention the fact that in many cases, the resulting data - if any - is often meaningless because it's corrupt, incomplete or misinterpreted in some way. The complexity of this practice, sometimes the lack of skilled testers, but also the poor quality of the tools that people use, are all part of the problem here.

Nevertheless, it's important we make sure to get rid of this island of script waste in the middle of our ocean of testing.


The serious reason: competency


Not only are we wasting assets such as scripts and test artefacts here, but we're also wasting some very precious time.

While a developer can probably write a solid Selenium or Grinder script easily, it's much less the case, or at least less natural, for a functional tester who works with tables such as Excel spreadsheets, or for a performance engineer, who's more comfortable analysing system behaviour, looking at charts and trying to identify incoherent, faulty or slow behaviour.

Asking a monitoring team to write a million scripts to monitor your services synthetically is also a bad investment. They already have a lot on their plate, trying to put everything together and making some sense out of the data that they're already gathering. Sometimes, if you're the CTO or an application owner for example, you'll come in and bug them for another dashboard so you can see what's going on in your system, but maybe you're making them waste a ton of time on things that seem mindless to them and that they're not truly equipped for. And maybe that's the main reason why the dashboard you've asked for isn't already available.



The bottom line is (and it seems people forget or ignore this in testing departments sometimes): to work efficiently, people should be working on what they're good at, with the correct tools and proper training. But wait, this doesn't mean that people shouldn't learn new things or talk to each other. Our philosophy actually sits at quite the opposite end of the spectrum, as I'll demonstrate in the last section of this blog post.


The unacceptable reason: maintainability


So at this point in this old-school scenario, we've got the wrong people working on the wrong tasks, possibly with the wrong tools, which is already pretty bad, and we're also having them do that very same task up to 5 times in a completely redundant way.

But wait. It gets even worse, much worse.

I distinctly remember at least 5 or 6 times in my career where I had to take over a testing project from someone for various reasons. Either they were gone, or they were busy working on something else by the time the new release came out and was ready to be tested. Either way, I had to try and reuse their code, and in almost every case, I was unable to reuse what they had done. And that's not because I'm just horrible at scripting or didn't put in the time and effort, I beg you to trust me here.

It's because of two main factors:
  • big, monolithic code is awfully hard, and in some cases impossible, to maintain
  • the target applications we were supposed to test changed very rapidly
That's a very bad yet very common combination.

Sometimes people would come and try to help me make their own code work, but it still wouldn't work. Sometimes that person would even be myself: trying to test the same application with my own code 6 months after I wrote it was a true challenge. Of course, you're going to say:


"The answer is easy : just write more modular code!"

And... this is where the tooling problem comes into play. I don't want to call out the proprietary and sometimes legacy tools which I was literally forced to use at the time, but let's just say they don't particularly make the job easy for you in that department. It's very hard to create and maintain library code with these tools (you know who you are), when it is even possible at all. And it's even harder to share it with colleagues. Sure, we tried managing our script repositories professionally, versioning everything and so on, but the process was always so awkward, slow and buggy, because these tools simply were not designed to work that way.
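To be concrete about what "library code" means here, this is the kind of small, factored-out building block I'm talking about. A minimal sketch using plain Selenium WebDriver, with made-up element IDs and URL layout rather than any of the tools mentioned above:

    // LoginSteps.java - the login sequence lives in one small, versionable class
    // that every scenario reuses, instead of being copy-pasted into each
    // monolithic script. Element IDs and the URL layout are invented for the example.
    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;

    public class LoginSteps {

        // Logs a user in and leaves the browser on the landing page.
        public static void login(WebDriver driver, String baseUrl, String user, String password) {
            driver.get(baseUrl + "/login");
            driver.findElement(By.id("username")).sendKeys(user);
            driver.findElement(By.id("password")).sendKeys(password);
            driver.findElement(By.id("loginButton")).click();
        }
    }

Trivial as it looks, keeping even this kind of helper versioned and shared across teams was exactly what those tools made so painful.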

These tools are what I call "throw-away" oriented because they both expect you to throw your code away at the end of your testing campaign and make you want to throw them away too (the tools themselves).

I must say I did encounter the occasional big, long-term project which had managed to make it work, because they had direct insight into the code changes and a huge budget to pay people to keep maintaining the scripts and test artefacts. But that's just a silly, brute-force way of solving the problem which absolutely cannot scale and should never be replicated, as far as my opinion goes.

Thankfully, tools such as step and Selenium have emerged in the meantime, and the options provided by the open-source scene are much more satisfying these days. I must also say that some of the older open-source tools like Grinder or JMeter (which still seems to have a strong community to this day) didn't work too badly and relieved some of the pain in the performance testing department at times. It just depended on what exactly you had to do with them. The main problem, again, being that they were not initially designed with sharing and reuse across other tasks (or an entire DevOps pipeline) in mind.

But tools themselves are not enough and I can't blame this entirely on them. A lot also had to do with the way we were approaching these things. Back when I was mostly a performance tester, I'll confess I wasn't what I now consider a good developer, in the sense that no one had really taught me what a good way of organizing my code was, nor why. Not even when I was in college, studying software engineering. Or did I just not pay enough attention to that part of the course back then? ;)



True modularity: the keyword approach


So there I was, stuck in that situation where no one really had a thoughtful approach to solving this problem at the industrial level at the time. And then things changed. I found out about the keyword approach and got a chance to work on a very innovative project for a client, in which we were able to design our testing platform from scratch. And we decided to embrace some of the concepts of that keyword approach and shoot for the best possible tradeoff between modularity, performance, maintainability and ease-of-use.

If I had to summarize this approach in one sentence I would say it's a way for people to create their own framework, their own domain-specific language almost, which will then allow them to describe the aspects of automation and test logic in a way that's relevant to their own business or application model.

And we'll do this in a very clean way, meaning that we'll use solid interfaces for passing inputs and outputs to and from these keywords. We'll treat each little script as a little black box, so that higher-level users don't have to concern themselves with code. Instead, they'll just use scripts like Lego bricks to implement their testing strategy.

As a result of that initial descriptive effort, many services and tasks can be streamlined and served in a unified way, where before, each would potentially have its own implementation based on a number of factors.

This basically means that we're going to analyze our needs and split the business or application logic into atomic chunks of behaviour. The level of atomicity, or "amount of behaviour", we wrap into a keyword object is key here: the smaller the chunks, the more scripts and keywords there are to maintain and execute, but the larger the chunks, the less modular and reusable they become.
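To illustrate the principle (and only the principle; the interface and names below are mine, not the actual API of step or of any other tool), a keyword can boil down to something as simple as this in plain Java:

    // KeywordSketch.java - everything in one file to keep the sketch short.
    import java.util.HashMap;
    import java.util.Map;

    // A keyword is a named black box: a map of inputs goes in, a map of outputs
    // comes out, and higher-level users never look at the code inside.
    interface Keyword {
        String getName();
        Map<String, String> execute(Map<String, String> input) throws Exception;
    }

    // One atomic chunk of business behaviour.
    class OpenContract implements Keyword {

        public String getName() { return "OpenContract"; }

        public Map<String, String> execute(Map<String, String> input) throws Exception {
            String customerId = input.get("customerId");
            // ... drive the application here (Selenium, HTTP calls, a thick client...) ...
            Map<String, String> output = new HashMap<>();
            output.put("contractId", "C-" + customerId); // stubbed result, just for the sketch
            output.put("status", "OPEN");
            return output;
        }
    }

How much application behaviour you decide to put inside execute() is precisely the granularity trade-off described above.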

Anyhow: once you've exposed your scripts as keywords, with a definition of their expected inputs and outputs, you can use them as building blocks to design your test cases.

Whether it's the logic that varies (you're testing different scenarios by using different sequences or combinations of keywords) or the data (you're performing data-driven scenarios with large volumes of different inputs but mostly the same logic, and thus the same keyword combination), it becomes very easy to combine them into a test plan. The image below illustrates this concept, and the short sketch that follows shows roughly what it can look like in code.
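Continuing the same hypothetical sketch, a data-driven run is then just a matter of replaying an ordered sequence of keywords over a table of inputs (the hard-coded list below stands in for an Excel sheet or a CSV file):

    // DataDrivenRun.java - the logic is the keyword sequence, the data is the table;
    // vary either one independently to produce new test cases.
    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class DataDrivenRun {

        public static void main(String[] args) throws Exception {
            // The logic: a fixed sequence of keywords (just OpenContract here,
            // plus whatever other keywords the scenario calls for).
            List<Keyword> testCase = Arrays.<Keyword>asList(new OpenContract());

            // The data: one row per variant of the scenario.
            List<Map<String, String>> dataSet = Arrays.asList(
                    row("customerId", "42"),
                    row("customerId", "43"));

            for (Map<String, String> data : dataSet) {
                Map<String, String> context = new HashMap<>(data);
                for (Keyword keyword : testCase) {
                    // Outputs of one keyword become available as inputs to the next.
                    context.putAll(keyword.execute(context));
                }
                System.out.println("Run finished with context: " + context);
            }
        }

        private static Map<String, String> row(String key, String value) {
            Map<String, String> map = new HashMap<>();
            map.put(key, value);
            return map;
        }
    }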




By the way, in practice, we find that the Pareto rule applies and that on average, 20% of the keywords (the most important ones) will be involved in 80% of the test scenarios. Therefore, and in order to limit costs, it is advisable to identify these key functions early and implement them first. You can always decide to maximize your automated test coverage down the line.

Now, as I stated earlier, one of the beauties of this is that once you've found the granularity level that you need, you'll be able to streamline all of the different tasks related to testing, based on the framework that you've just created for yourself.

Here are some of the services you'll be able to deliver in a truly unified way, meaning independently of the underlying scripting tool, coding style or simulation approach, as long as you're building your tests around keywords:
  • instantly exposing all test artefacts to other testers and users for reuse
  • presenting test results, errors and performance data (in a way that makes sense to both technical and business teams)
  • reusing and deploying test cases for multiple purposes and types of tests (synthetic monitoring, performance testing, batched functional testing, etc), as sketched just after this list
and also (although these are not pure advantages of the keyword approach, they become much easier to implement and build into your platform if you're using it):
  • monitoring your tests in real time
  • managing test results and test meta data
  • rationalizing and managing your test infrastructure resources across all projects
  • planning, orchestrating and triggering test executions
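As one illustration of that kind of reuse (still using the hypothetical Keyword sketch from earlier, not any particular product's scheduler), the exact same keyword sequence can be fired once from a CI pipeline as a functional check, or replayed every few minutes as a synthetic monitor:

    // ReuseModes.java - one keyword sequence, two execution modes.
    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class ReuseModes {

        private static final List<Keyword> TEST_CASE = Arrays.<Keyword>asList(new OpenContract());

        private static void runOnce() {
            Map<String, String> context = new HashMap<>();
            context.put("customerId", "42");
            try {
                for (Keyword keyword : TEST_CASE) {
                    context.putAll(keyword.execute(context));
                }
                System.out.println("OK: " + context);
            } catch (Exception e) {
                System.out.println("FAILED: " + e.getMessage());
            }
        }

        public static void main(String[] args) {
            // Mode 1: a one-shot functional run, e.g. triggered from a CI pipeline.
            runOnce();

            // Mode 2: the very same sequence, replayed every 5 minutes as a synthetic monitor.
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            scheduler.scheduleAtFixedRate(ReuseModes::runOnce, 5, 5, TimeUnit.MINUTES);
        }
    }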

Now, this is what the word "efficiency" means to us. We're not just talking about isolating a piece of code as a library anymore here!

On a more technical note, in the context of the step project, all of this meant that we had to come up with a dedicated architecture and a fully optimized stack to make this possible at scale and to provide our users with the best possible experience. And this was quite a challenge. I'll write more on this in future blog entries because I think there's a lot of interesting technical content to share there.


Signing off from Le Bonhomme.