jeudi 15 décembre 2016

A plea for modern testing and the keyword approach



Observing how mainstream corporations try to adapt and keep up with modern trends such as agile development, REST-based architectures, continuous integration and browser-based emulation has been fascinating to me lately. While these new ideas always seem very sexy on paper, they're not always easy to implement and take advantage of. And it's not just because of a lack of talent or proper management.

At the same time, their IT systems are often ticking time bombs, with unsupported stacks crawling around, and when major technical decisions are delayed for too long, they tend to become radical and a bit brutal. So figuring out how to transition, toward what (there's a whole ocean of technology out there, and it's getting bigger and more confusing by the minute) and with which goals in mind, is definitely a key element in the success of these companies these days.

For instance, the SaaS transformation which has taken many markets by storm over the last few years still poses many technical as well as legal problems for insurance companies, banks and public administrations, and so most of these companies still need to produce, test and operate their own software at scale. Software which happens to be very complex due to the nature of their businesses.

But today I'd like to focus on just one of these practices. I want to talk about how testing has been done in many of these environments, and how testing should be done, in my opinion. What I'm about to describe here is my personal experience working with and for such companies and what I think the future holds for them as far as testing goes.

And I want to start by mentioning that their biggest challenges are not, or not just, about scaling services to millions of users, but also about reaching solid test coverage in applications which may produce hundreds of thousands of very different and peculiar use cases and which contain a lot of oddities due to historical reasons or regulations. These aspects impact a lot of things down the line when you're trying to modernize your system. They also turn test automation and test artefact reusability into absolutely critical factors which need to be taken into account when designing or redesigning a test factory.

The way testing has been done in most of these environments over the past decades involves an approach where each person or team concerned with a particular aspect of testing implements not just their own strategy, but also their own scripts, buys their own tools, sets up their own environment, etc. I call this "siloed testing".

There are many reasons why I believe this approach is ill-advised, but I'm just going to cover a few of them here. These are big ones though, so buckle up, because if you're still testing in an unstructured way with old, funky legacy tools, you're in for a ride.

Oh, one last thing: I'm going to assume the reader understands the benefits of test automation here. I believe certain situations call for a bit of manual testing (let's not be dogmatic about anything), but I also don't think there's a single project I've worked on where anyone could argue in a solid way against a target of at least 80% automated test coverage.

So let's dive into it!

The obvious reason: redundancy


I can't even count how often I've come across corporations that have at least 3, but sometimes 4 or 5, different teams in charge of different types of tests for the same application or group of applications.

It seems like a healthy approach, since we're just trying to spread responsibility across different people and distribute workload here. But this configuration is only sustainable and scalable if you can do it efficiently enough. In my experience, it's not unusual to see a developer team, a functional testing team, a system/integration team, a load and performance testing team and then sometimes even a synthetic monitoring team re-script the same use case 5 times, maybe with 5 different tools, and often without even being aware of it.

Worse, they'll probably throw these scripts away entirely when the next release comes around.




Scripting a use case correctly can be pretty difficult and will sometimes involve hours or days of work, depending on how complex the scenario is and how well suited the tooling and simulation strategy are.

I don't think there is a situation where that kind of waste makes sense. And I'm barely even going to mention the fact that in many cases, the resulting data - if any - is often meaningless because it's corrupt, incomplete or misinterpreted in some way. The complexity of this practice, sometimes the lack of skilled testers, but also the poor quality of the tools that people use, are all part of the problem here.

Nevertheless, it's important that we get rid of this island of script waste in the middle of our ocean of testing.


The serious reason: competency


Not only are we wasting assets such as scripts and test artefacts here, but we're also wasting some very precious time.

While a developer can probably write a solid Selenium or Grinder script easily, the same isn't true, or at least doesn't come as naturally, for a functional tester who works with tables in an Excel spreadsheet, or for a performance engineer, who's more comfortable analyzing system behaviour, looking at charts and trying to identify incoherent, faulty or slow behaviour.

Asking a monitoring team to write a million scripts to monitor your services synthetically is also a bad investment. They already have a lot on their plate, trying to put everything together and making some sense out of the data that they're already gathering. Sometimes, if you're the CTO or an application owner for example, you'll come in and bug them for another dashboard so you can see what's going on in your system, but maybe you're making them waste a ton of time on things that seem mindless to them and that they're not truly equipped for. And maybe, that's the main reason why the dashboard you've asked for isn't already available.



The bottom line is (and it seems people sometimes forget or ignore this in testing departments): to work efficiently, people should be working on what they're good at, with the right tools and proper training. But wait, this doesn't mean that people shouldn't learn new things or talk to each other. Our philosophy actually sits at quite the opposite end of the spectrum, as I'll demonstrate in the last section of this blog post.


The unacceptable reason: maintainability


So at this point in this old school scenario, we've got the wrong people working on the wrong tasks, possibly with the wrong tools, which is already pretty bad, but we're also having them do that very same task up to 5 times in a completely redundant way.

But wait. It gets even worse, much worse.

I distinctly remember at least 5 or 6 times in my career when I had to take over a testing project from someone, for various reasons: either they were gone, or by the time the new release was ready to be tested they were busy working on something else. Either way, I had to try to reuse their code, and in almost every case I was unable to reuse what they had done. And that's not because I'm horrible at scripting or didn't put in the time and effort, I beg you to trust me here.

It's because of two main factors:
  • big, long, monolithic code is awfully hard, and in some cases impossible, to maintain
  • the target applications we were supposed to test changed very rapidly
That's a very bad yet very common combination.

Sometimes people would come and try to help me make their own code work, but it still wouldn't work. Sometimes that person would even be myself. Trying to test the same application with my own code 6 months after I wrote it was a true challenge. Of course, you're going to say:


"The answer is easy : just write more modular code!"

And... this is where the tooling problem comes into play. I don't want to call out the proprietary and sometimes legacy tools which I was literally forced to use at the time, but let's just say they don't particularly make the job easy for you in that department. It's very hard to create and maintain library code with these tools (you know who you are), when it is even possible at all. And it's even harder to share it with colleagues. Sure, we tried managing our script repository professionally, versioning the scripts and everything, but the process was always so awkward, slow and buggy, because these tools simply were not designed to work that way.

These tools are what I call "throw-away" oriented because they both expect you to throw your code away at the end of your testing campaign and make you want to throw them away too (the tools themselves).

I must say I did encounter the occasional big, long-term project which had managed to make it work, because they had direct insight into the code changes and a huge budget to pay people to keep maintaining the scripts and test artefacts. But that's just a silly, brute-force way of solving the problem which absolutely cannot scale and, in my opinion, should never be replicated.

Thankfully, tools such as step and Selenium have emerged in the meantime, and the options provided by the open-source scene are much more satisfying these days. I must also say that some of the older open-source tools like Grinder or JMeter (which still seems to have a strong community to this day) didn't work too badly and relieved some of the pain in the performance testing department at times. It just depended on what exactly you had to do with them. The main problem, again, is that they were not initially designed with sharing and reuse for other tasks (across an entire DevOps pipeline) in mind.

But tools themselves are not enough, and I can't blame this entirely on them. A lot also had to do with the way we were approaching these things. Back when I was mostly a performance tester, I'll confess I wasn't what I now consider a good developer, in the sense that no one had really taught me what a good way of organizing my code was, nor why. Not even when I was in college, studying software engineering. Or did I just not pay enough attention to that part of the course back then? ;)



True modularity: the keyword approach


So there I was, stuck in that situation where no one really had a thoughtful approach to solving this problem at the industrial level at the time. And then things changed. I found out about the keyword approach and got a chance to work on a very innovative project for a client, in which we were able to design our testing platform from scratch. And we decided to embrace some of the concepts of that keyword approach and shoot for the best possible tradeoff between modularity, performance, maintainability and ease-of-use.

If I had to summarize this approach in one sentence I would say it's a way for people to create their own framework, their own domain-specific language almost, which will then allow them to describe the aspects of automation and test logic in a way that's relevant to their own business or application model.

And we'll do this in a very clean way, meaning that we'll use some solid interfaces for passing inputs and outputs to and from these keywords. We'll treat each little script as a little black box, so that higher-level users don't have to concern themselves with code. Instead, they'll just use scripts like Lego bricks to implement their testing strategy.
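
To make this a bit more concrete, here is a minimal sketch, in Java, of what such a keyword contract could look like. It is purely illustrative: the interface and class names are mine, not step's actual API, and the part that drives the application is left as a comment.

    import java.util.HashMap;
    import java.util.Map;

    // Purely illustrative, not step's actual API: a keyword is a small black box
    // with named inputs and named outputs.
    interface Keyword {
        Map<String, String> execute(Map<String, String> input) throws Exception;
    }

    // Example keyword wrapping one atomic chunk of business behaviour,
    // e.g. logging a user into the application under test.
    class LoginKeyword implements Keyword {
        @Override
        public Map<String, String> execute(Map<String, String> input) {
            String user = input.get("user");
            // ... drive the application here with the tool of your choice
            // (Selenium, an HTTP client, etc.), using input.get("password") as well ...
            Map<String, String> output = new HashMap<>();
            output.put("sessionId", "dummy-session-for-" + user);
            return output;
        }
    }

The important part is not the code itself but the contract: named inputs in, named outputs out, and no assumption about which tool does the driving underneath.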

As a result of that initial descriptive effort, many services and tasks can be streamlined and served in a unified way, where before, each would potentially have its own implementation based on a number of factors.

This basically means that we're going to analyze our needs and split the business or application logic into atomic chunks of behaviour. The level of atomicity, or "amount of behaviour", we wrap into a keyword object is key here: the smaller the chunks, the more scripts and keywords there are to maintain and execute; but the larger the chunks, the less modular and reusable they become.

Anyhow: once you've exposed your scripts as keywords with a definition of their expected inputs and outputs, you can use them as building blocks to design your test cases.
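
Reusing the hypothetical Keyword interface and LoginKeyword class from the sketch above, a test case then becomes little more than an ordered combination of keyword calls, and a data-driven scenario is just a loop over that combination. Again, this is an illustration of the idea, not step's actual API; the second keyword and the "status" output are made up:

    import java.util.List;
    import java.util.Map;

    // Illustrative only: a test case as an ordered combination of keywords.
    class SearchContractTestCase {

        private final Keyword login = new LoginKeyword();
        // Stand-in for a second, hypothetical keyword implemented elsewhere.
        private final Keyword searchContract = input -> Map.of("status", "OK");

        // Data-driven run: same keyword sequence, different input rows
        // (the rows are assumed to be mutable maps, e.g. HashMaps).
        void runDataDriven(List<Map<String, String>> dataRows) throws Exception {
            for (Map<String, String> row : dataRows) {
                Map<String, String> session = login.execute(row);
                // The output of one keyword becomes an input of the next.
                row.put("sessionId", session.get("sessionId"));
                Map<String, String> result = searchContract.execute(row);
                if (!"OK".equals(result.get("status"))) {
                    throw new AssertionError("Unexpected status for row " + row);
                }
            }
        }
    }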

Whether it's the logic that varies (you're testing different scenarios by using different sequences or combinations of keywords) or the data (you're running data-driven scenarios with large volumes of different inputs but mostly the same logic and thus the same keyword combination), it becomes very easy to combine keywords into a test plan. The image below illustrates this concept.




By the way, in practice, we find that the Pareto rule applies and that on average, 20% of the keywords (the most important ones) will be involved in 80% of the test scenarios. Therefore, and in order to limit costs, it is advised to identify these key functions early and implement them first. You can always decide to maximize your automated test coverage down the line.

Now, as I stated earlier, one of the beauties of this is that once you've found the granularity level that you need, you'll be able to streamline all of the different tasks related to testing, based on the framework that you've just created for yourself.

Here are some of the services you'll be able to deliver in a truly unified way, meaning independently of the underlying scripting tool, coding style or simulation approach, as long as you're building your tests around keywords:
  • instantly exposing all test artefacts to other testers and users for reuse
  • presenting test results, errors and performance data (in a way that makes sense to both technical and business teams)
  • reusing and deploying test cases for multiple purposes and types of tests (synthetic monitoring, performance testing, batched functional testing, etc)
and also (these are not pure advantages of the keyword approach, but they become much easier to implement and build into your platform if you're using it):
  • monitoring your tests in real time
  • managing test results and test meta data
  • rationalizing and managing your test infrastructure resources across all projects
  • planning, orchestrating and triggering test executions

Now, this is what the word "efficiency" means to us. We're not just talking about isolating a piece of code in a library anymore!

On a more technical note, in the context of the step project, all of this meant that we had to come up with a dedicated architecture and a fully optimized stack to make this possible at scale and to provide our users with the best possible experience. And this was quite a challenge. I'll write more on this in future blog entries because I think there's a lot of interesting technical content to share there.


Signing off from Le Bonhomme.


jeudi 27 octobre 2016

Taken by storm


Hi everyone,

I'm typing these lines from a Tim Hortons in Toronto, as our month-long trip through the US and Canada nears its end. I don't normally post personal stuff on here, as it's not my primary intent, but I thought I would drop a few lines for those wondering what's up and also to briefly mention what I did before going on vacation and didn't get a chance to write about.

I really wanted to put up some more material and tutorials before we left the country but I was so busy working on our first release of STEP that I just had to give up on that. And I never really got a chance to sit down and write a blog post since then either.

We're pretty excited about this release though, and I'm planning on working on a few tutorials when I get back home, similar to those I did for djigger. You can expect a quick-start video guide to get you going with a basic example, and then in a second video, I'll probably demo our integration with Selenium with a script that's closer to a real-life scenario.

I'll show off a little and put up a pic of the NFL game I went to (Falcons @ Seahawks, at CenturyLink Field in Seattle) ;)


Signing off from Toronto.

Dorian


mardi 14 juin 2016

The impact of JDBC Connection testing in a WLS Datasource



A few weeks ago, I performed a bit of sampling in production on several nodes of the business layer of one of my client's applications. The application went into production a little over a year ago and has been stable since then, but from time to time I still take a look at the distribution of response time across the business and technical packages of the application.

This time I used djigger's reverse tree view to take a look at the "top of the stacks" to see if I could find any more potential low-level optimizations.

Since the business layer's implementation is based on EJBs running in a WebLogic container, I used WebLogic's automatic skeleton naming ("WLSkel") to filter out the noise from the other threads. djigger's "Stacktrace filter" is really good at that. By excluding the irrelevant threads in the JVM, I could ensure my global statistics, and hence my analysis, would be correct.

As I wanted to perform a global analysis and get a global overview of what the runtime was doing, I didn't need to further filter on specific threads or transactions.


Tip: As you start unfolding the different nodes in the tree (which is sorted by top consumer / number of occurrences), the top part of the stack tells you what a thread is currently doing, while the bottom of the stack tells you why, or for whom, it is doing it.

After sampling for a few minutes, I expanded a few nodes starting from the first method, socketRead (to be interpreted as time spent waiting for a reply on the network), and noticed several occurrences of stacks originating from the testing of JDBC Connection objects, which usually takes place prior to executing SQL code against the Oracle database. What WebLogic's datasource does for you here is always test the underlying connection to the database to make sure you're getting a valid connection.




So I decided to measure just how much time was spent in those testing stacks on average, regardless of which business method was being called. That can be done by using the "Node Filter", which cleans up the tree and implicitly aggregates matching orphan nodes. That's perfect for adding up the time spent in branches of the tree that come from different initial paths.

Depending on the environment and the scenario executed on it (test, integration & prod), I found the overhead to stand between 10 and 25% (I believe these screenshots are based on data collected while reproducing the problem via automated testing in a test environment, hence the 15% mark). The 25% mark was reached in production.



Traditionally, a developer will manage some sort of "scope" or "session" while working with the data layer (whether explicitly or implicitly, depending on whether the transaction is attached to the container's and whether they're using an OR mapper such as Hibernate or OpenJPA) in which the DB Connection object remains relevant, reducing the frequency of connection acquisition and hence of tests.

People try to make sure - and rightfully so - that they don't keep a pooled object in use for too long to avoid contention on the underlying resource.

However, if you're acquiring and releasing these objects too frequently, connection testing can induce significant overhead. Knowing that under "normal" circumstances (no network or DB failure), DB connections remain valid for pretty much the entire lifetime of the runtime, this becomes a bad investment.
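
To illustrate the point with plain JDBC (a sketch with made-up table and column names, not the client's actual code): when test-on-reserve is enabled on the pool, every getConnection() call may trigger a test query, so the first variant below pays that price for every item, while the second pays it at most once per unit of work.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import javax.sql.DataSource;

    // Illustrative sketch: how connection acquisition frequency drives the
    // number of pool-level connection tests you end up paying for.
    class ConnectionScopeSketch {

        // Anti-pattern: one acquisition (and potentially one connection test) per item.
        static void perItem(DataSource ds, Iterable<String> ids) throws SQLException {
            for (String id : ids) {
                try (Connection con = ds.getConnection(); // the pool may test the connection here
                     PreparedStatement ps = con.prepareStatement("SELECT name FROM customer WHERE id = ?")) {
                    ps.setString(1, id);
                    try (ResultSet rs = ps.executeQuery()) { /* consume rs */ }
                }
            }
        }

        // One acquisition for the whole unit of work: the test happens at most once.
        static void perUnitOfWork(DataSource ds, Iterable<String> ids) throws SQLException {
            try (Connection con = ds.getConnection();
                 PreparedStatement ps = con.prepareStatement("SELECT name FROM customer WHERE id = ?")) {
                for (String id : ids) {
                    ps.setString(1, id);
                    try (ResultSet rs = ps.executeQuery()) { /* consume rs */ }
                }
            }
        }
    }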

In my case, the unorthodox application architecture causes those tests to happen extremely frequently. Not only do we have a very fine granularity in the design of the service layer, which leads to hundreds of RMI/EJB calls per user action, but it seems the developers may have added an additional abstraction for managing DB sessions, which can cause multiple connection objects to be acquired within a single EJB transaction.

One last factor adding overhead here is the fact that we're dealing with an XA datasource. With a regular datasource, usually a simple "SELECT 1 FROM DUAL" is fired, which has virtually no impact on the DB and essentially just costs you a round-trip over the network. With XA, each of these test queries results in the creation and management of a distributed transaction, which will in turn cause the execution of more complex code along with multiple round trips to the database (see begin, end, commit semantics).

So just to make sure that this juicy piece of response time could actually be spared entirely, we tried turning off connection testing and ran the same automated test scenario twice and here are the results.

PS: I'm back on the regular tree view in this screenshot, and I voluntarily excluded all of the non-WebLogic packages, so that's why you're seeing a direct call from the RMI package to the JDBC package. It's a simplification, of course.


The modification did spare the entire 13% of stacks spent in WebLogic's test method.

The two stack samples found in the "testing disabled" case actually came from an LDAP datasource for which testing was still enabled, so they can be excluded.

However, an important robustness feature we'd lose with this is that when the database crashes or happens to be restarted, our business layer can't recover once the DB is back up. And that's somewhat problematic for our ops guys.

So now we're fiddling with WebLogic's "SecondsToTrustInterval" and "Test Frequency" parameters, which should eventually help solve this problem, since they were specifically designed for that kind of scenario. See the screenshot below.


(the WebLogic console wants to pop up in French on my computer, sorry about that...)




I haven't figured out the proper interval to use just yet and I need to do some more testing so that's still work in progress, but I thought this had interesting potential and would be worth sharing.

Signing off from Le Bonhomme.

Dorian


lundi 16 mai 2016

Why 24/7 monitoring is crucial



Being able to diagnose a performance problem in real time is great. But what happens when a colleague or a customer asks you to diagnose a problem that occurred a week ago, or if you're only notified after a JVM crashes?

A lot of people I know started using djigger because it's such a simple and intuitive tool. Recently I've tried to convince them to go ahead and set up a collector so that they wouldn't even have to worry about sampling anymore and could analyze their problems "on demand", i.e. regardless of the time perspective. It turned out I didn't have any convincing to do. The need was definitely something they wanted us to address, and I've gotten a lot of positive feedback on this feature, which takes us one step closer to releasing a full APM mode.

Whether you want to monitor production JVMs efficiently or you're a developer wanting to take advantage of passive monitoring, you'll probably enjoy the simplicity of djigger's collector mode. It's worth mentioning that the retry mechanism built into the collector means a developer doesn't have to worry about uptimes. As long as a JMX server is running in your program, and regardless of when that program is running, the collector will attempt to connect and resume sampling after a down period.

All you have to do is set up a config file describing the connection and flag information of the JVMs you wish to monitor, start a MongoDB instance, and the collector will do the rest for you!

I've actually just released a short video tutorial on YouTube to guide you through these steps. Enjoy!

mardi 19 avril 2016

Why, when and how you should avoid using agent mode



So I thought I would write a short ironic post today.

Java agents are great for bytecode instrumentation, but as intrusive as they are, they still fall short of their goal sometimes. They also come with a certain overhead in resources and configuration maintenance, due to the fact that they require modifications and updates on the monitored JVM side. I could summarize this thought by saying that they cause a higher "TCO" than the simple, risk-free, collector-mode sampling approach.

Or I could illustrate it in a more explicit way. Depending on your context, using instrumentation might be like owning a Ferrari. If you liked sports cars and even if you had enough money to buy one, maybe renting a Ferrari for a day or two when you want to race or take a drive down the coast would make more sense than owning it, with all the ramifications it implies? I feel like I'm still not nailing this entirely here though, seeing as most Ferrari customers are probably not the most pragmatic and rational buyers, and they probably don't care about efficiency when it comes to using their sportscar.

But the reason this post is slightly ironic is that I just uploaded a YouTube tutorial today showing how you can use djigger in agent mode for instrumentation purposes.

Obviously, in many cases you do need instrumentation. And in certain cases you'll want it on at all times, for instance if you end up building your business insights on top of it. But is that always a requirement, or is there some sort of way you could use sampling data to come up with the same or 'good-enough' information to understand and solve your problem?

I've partially covered this topic in my simulated "Q&A" session, but I felt I needed to explain myself a little more and illustrate my point with a recent example.

Here's an upcoming feature in djigger (which should be published some time this week in R 1.5.1) that will allow you to answer the classic question: "is method X being called very frequently, or do just a few calls take place, each lasting a long time?"

This is the question you'll ask (or have asked) yourself almost every time after sampling a certain chunk of runtime behavior. In theory, the nature of the sampling approach and the logic behind stack trace aggregation (explained here) leave us blind to that information, meaning we "lose" it.

However, there is a way to extract a very similar piece of information out of the stacktrace samples. Here's how.

When sampling at a given frequency, let's say every 50 ms, method calls lasting less than 50 ms might sometimes be "invisible" to the sampler. However, every single time a stacktrace changes in comparison to that of the previous snapshot (i.e. a sub-method is called, the method call itself finishes, or a different code line number is on the stack), you know for sure that if you find that method or code line number again in one of the next snapshots, the call count must have increased by at least one.
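
To make that reasoning a bit more tangible, here is a deliberately simplified sketch of the idea (my own illustration, not djigger's actual implementation, and it ignores the line-number refinement): for the samples of a single thread, every maximal run of consecutive snapshots containing a given method must be covered by at least one distinct invocation, so counting those runs gives a lower bound on the call count.

    import java.util.List;

    // Simplified illustration of the "min bound" idea, for one thread's samples.
    class MinCallCountBound {

        static int minCallCount(List<List<String>> samples, String methodSignature) {
            int bound = 0;
            boolean presentInPrevious = false;
            for (List<String> stack : samples) {
                boolean present = stack.contains(methodSignature);
                if (present && !presentInPrevious) {
                    bound++; // the method (re)appeared: at least one new call must have started
                }
                presentInPrevious = present;
            }
            return bound;
        }

        public static void main(String[] args) {
            List<List<String>> samples = List.of(
                    List.of("Main.run", "Service.search", "Dao.fetch"),
                    List.of("Main.run", "Service.search", "Dao.fetch"),
                    List.of("Main.run", "Service.report"),
                    List.of("Main.run", "Service.search", "Dao.fetch"));
            // Dao.fetch disappeared between the 2nd and 4th samples,
            // so at least 2 distinct calls took place.
            System.out.println(minCallCount(samples, "Dao.fetch")); // prints 2
        }
    }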

This is what we call computing a min bound for method call counts. And we're very excited about releasing this feature, as this question is one of the primary reasons why people need instrumentation.

Again you have to understand, we have nothing against instrumentation and we offer instrumentation capability ourselves through our own java agent. However, there are numerous reasons (simplicity, overhead, risks, speed of analysis, etc) for which we love being able to refine our "data mining" logic at the sampling results level.

My next YouTube tutorial will either provide in-depth coverage of that functionality or maybe cover collector mode. Either way, I can't wait to show you more of the benefits of djigger. I will also try to wrap up part 2 of my silly benchmark tomorrow, so I can tell for sure what the impact of the network link looked like (see my previous entry on this blog).

Until then, I'm signing off from Le Bonhomme.

mercredi 13 avril 2016

A benchmark of JDBC fetch sizes with Oracle's driver



Recently I investigated a small performance issue in the search function of a Java application in which sampling results had shown that a ton of time (91% of the total duration) was being spent in Oracle fetches.

After figuring out that the individual fetch times were normal/acceptable, I ended up putting together a small benchmark to isolate the round-trip overhead and sort of fine-tune the fetch size value. That way I was able to find out how much time I could save by getting rid of unnecessary round-trips between the application server and the Oracle database. I have to admit this was close to being a "mind candy" kind of exercise, as I had already convinced the developer to come up with a more selective query (he was basically querying and iterating through the entire table, hardly filtering anything in his WHERE clause, and I knew that this was probably the wrong approach here). Nevertheless, I thought the benchmark would be interesting for future reference, and also because that way I would optimize that function regardless of how selective the new query would be.

This is what the sampling session looked like in djigger:



You can see the next() and fetch() call sequence taking 91% of the time in there.

The use case was a simple search screen in a Java application. A search returning no results would take 12 seconds to complete (not very nice for the end user). The result set iteration and corresponding table causing that 12-second delay didn't even have anything to do with the table against which the actual search was performed. It was just some kind of security table that needed to be queried before the actual search results would even be retrieved from the business table.

I also found out that, by default, the fetch size was set to 10. So I did a quick benchmark reusing the same SELECT as the one from the application, locally in my Eclipse workspace but connecting to the same database. I measured the query duration and, more importantly, the time it took to iterate over the result set, then changed the fetch size a few times and ran the benchmark again. I made sure the query results were cached by Oracle, and I also ran the benchmark three times for each value to make sure I wasn't picking up any strange statistical outliers (due to, say, network round-trip time variations or jobs running on the database or database host).

I tested the following values: 1, 10, 100, 1000 and 10000. Going higher would have been pointless, since the table only contained 88k records.

Using the default size (10), I got pretty much the same duration for iterating over the entire result set as in the real-life scenario (just a little bit less, which makes total sense): 11 seconds. As you can see in the results below, the optimal value seemed to float around 1000, sparing us almost 6 seconds (half of the overall response time). I'd say that was worth the couple of hours I spent investigating the issue.



Interestingly enough, the duration of the query execution itself increased slightly as I increased the fetch size. I'm not 100% positive just yet (I could investigate that with djigger but probably won't), but I assume it has something to do with the initial/first fetch. Since I initially set the fetch size to 1000, I assume Oracle already sends a first batch of X rows back to the client when replying to the query execution call (at cursor return time). In any case, the overall duration (executeQuery + result set iteration) was greatly improved.

Here's the code I executed (I had to anonymize a few things, obviously, but you can recreate this very easily if you're interested):
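
For reference, here is also a minimal sketch of that kind of benchmark in plain JDBC. It is illustrative only, not the original (anonymized) code: the connection URL, credentials and query are placeholders, and you need the Oracle JDBC driver on the classpath.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    // Illustrative sketch: time the query execution and the full result-set
    // iteration for different JDBC fetch sizes.
    public class FetchSizeBenchmark {

        public static void main(String[] args) throws Exception {
            String url = "jdbc:oracle:thin:@//dbhost:1521/SERVICE"; // placeholder
            String sql = "SELECT * FROM some_table";                // placeholder

            try (Connection con = DriverManager.getConnection(url, "user", "password")) {
                for (int fetchSize : new int[] {1, 10, 100, 1000, 10000}) {
                    try (Statement stmt = con.createStatement()) {
                        stmt.setFetchSize(fetchSize);

                        long t0 = System.currentTimeMillis();
                        ResultSet rs = stmt.executeQuery(sql);
                        long t1 = System.currentTimeMillis();

                        int rows = 0;
                        while (rs.next()) {
                            rows++; // iterating triggers the fetch round-trips we want to measure
                        }
                        long t2 = System.currentTimeMillis();

                        System.out.printf("fetchSize=%-6d rows=%d executeQuery=%dms iteration=%dms%n",
                                fetchSize, rows, t1 - t0, t2 - t1);
                    }
                }
            }
        }
    }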



EDIT: At some point I had a doubt that my benchmark wasn't realistic enough, since I was running my test from a "local" VM (though it was only one hop farther away on the internal network) and not from the server itself.

So I uploaded and ran the benchmark from the server itself, and I basically got the exact same results. I then double-checked the approximate duration I was supposed to be getting with a fetch size of 10, and it turned out to match exactly what I had observed initially in the real scenario and measured in the application itself via instrumentation (11 seconds). So everything makes sense now.

Signing off.

dimanche 10 avril 2016

The art of relaying information



Today, I finished writing comprehensive documentation for djigger. Although the tool has no secrets for me, I believe writing proper documentation is always a challenge. I also try to approach it in a way that is as exciting as possible. I try to put myself in the shoes of the reader/user, and sort of reverse engineer who that person might be, what content they would expect to find, in what order and whether they're going to succeed at using and understanding the tool.

One thing I've learnt is that it sounds easy but it's a difficult process that's often neglected. Neglected by developers, who have much more fun writing code than documentation, but also by the users themselves, on the receiving end, because they have very limited patience (and rightfully so): if your docs are low quality, instead of providing that feedback you need so badly, they'll just move on. We all know that. Everything works that way these days. And with each day and each engineer working towards better software and better documentation, presentation, etc., the patience budget of our average user decreases.

So I was happy to take on that challenge and I have to say I believe I did a pretty good job in that first batch of documents. There aren't too many pages, the pages don't seem excessively long to me, and I believe I've covered every feature available in R1.4.2. I took the time to take step-by-step screenshots and really illustrate my points precisely.

On that page, I also took the time to summarize the way Q&As usually go when I present the tool to someone who's never used it, and I basically laid out the entire philosophy behind djigger. As I've stated, there are literally thousands of hours of performance analysis behind that project, and it's important to me to show that, because I believe that's the true value we're bringing to you as a user. It goes well beyond the simple Java program that we developed. Hopefully people will see that, and hopefully they'll want to interact with us as a result.

So with that, I really hope people will be able to get started with the tool and get answers to as many questions as possible. Of course, everyone is welcome to provide us with their feedback via our contact page. We'll try to answer all of your questions there too, whether they're about denkbar, djigger, technical points, our experience with APMs, everything.

Another quick update I wanted to make today is that we're currently working on publishing a road map for the next months in which the upcoming improvements and additional features of djigger will be documented. I'm not sure just yet when that'll be published (there are still conversations to be had internally) but we'll probably update djigger's page directly when it's ready.

If I have time today, I'm going to start working on the development of a reproducer (a small program destined to showcase a problem and a solution to that problem) that will allow me to illustrate a case study involving djigger and based on past experience at one of my clients in France.

The other project I have is the creation of a youtube channel in which I'll be posting video tutorials on a regular basis.

EDIT: I've added a first video which will show you how to use djigger in sampler mode here.

Those videos will essentially duplicate the content written in our docs and case studies, but will be a more dynamic, (hopefully) fun-to-watch experience, in which people will be able to follow and visualize every step of the logic behind the tutorial and more generally, behind the denkbar initiative and djigger. I might start with a short one covering the installation steps of djigger on the different platforms and how to use the different connectors we made available (see our installation page here for more details).

Remember, you can download djigger v1.4.2 for free at anytime by clicking here, read the documentation I wrote here, check out the release page of our github repository here, visit our website at http://denkbar.io or contact us via this page.

Signing off from Le Bonhomme.

jeudi 7 avril 2016

djigger v1.4.2 is available for download!




We are excited to announce the first public release of djigger (v1.4.1), our production-ready open source monitoring solution for Java!

This is the first step as part of the denkbar initiative, and our goal here is simply to make available to the community what started out as a small thread dump analyzer and grew into a more mature production-ready APM solution.

I want to start off by thanking all of the people who made this possible, clients who trusted us and let us implement and test out our features against a variety of complex applications, but also colleagues and friends, who tried out the tool and helped us with their valuable feedback.

At this point there's still a lot of potential to unleash and great functionality to be built, but we're very confident that what we're releasing today can already make a big difference in the way many people and companies analyze performance problems.

We've really juiced the sampling approach like no one has ever done before (at least as far as we know), and we're still working on milking that information cow some more. There's just so much you can do with sampling alone; most people would be very surprised.

On top of our sampler, we've built a series of filtering, aggregation and visualization functions that allow you to inspect and understand your code in a very unique way. Once you think you're in a situation where you can't avoid instrumentation (which is highly intrusive) anymore, you can use our process-attach and/or agent mode functionality to start instrumenting methods and get the answers you need. We're now actively extending the agent-style and instrumentation functionality, and we're planning on introducing a lot of new features very soon, such as distributed transaction tracing, getter injection and primitive type capturing.

Also, we intend to provide active support to the best of our ability (and availability) and we'll try to build as much momentum as we can in order to give djigger a chance to fly on its own. We hope the community responds, and we'd like to already thank in advance the people who'll try out djigger.

On that note, we've put a lot of effort in the last few days to package the tool in as pleasant a way as we could (which isn't necessarily easy considering the numerous supported connectors, JVM vendors, JDK versions, OS-specific start scripts, etc), and we hope we've made life easy for you as a user.

That said, we'll definitely appreciate any feedback you can give us, especially if you're stuck early in your attempt at using the tool. In a few days, we'll put up a form on the Contact page at denkbar.io, so you can get in touch with us directly, but for now, feel free to report any issue you like on our github repository.

We'll do our best to help you get going and fix any bugs as quickly as we can!

Signing off.

Dorian

mardi 5 avril 2016

A fresh start.




Hello, World!

It's good to be back. I believe I haven't written a blog post since 2011. I must say, however, that I have been pretty busy, and I believe it was worth the wait.

Over the past few years I've been maturing my idea of a collaborative open source platform, which would serve as a basis for promoting open source software, discussing performance, testing and analytics-related topics, and simply sharing experience with other people in the industry.

This idea has come to fruition, thanks to the great participation of my good friend and killer java developer Jérôme Comte. We have officially opened a new domain and website at denkbar.io .

The funny thing is, one of my last posts on my old blog was centered around the release of an open source tool called jdigger on SourceForge. It didn't get much public attention, as we did nothing to promote it really, but it planted a seed, and now that I'm re-establishing contact with some of the engineers I worked with over the last few years, I'm getting some unbelievable feedback. Some say they've been using the tool every time they had performance issues; most use it on and off depending on their needs.

To me this is just a tribute to the fact that we really cared about implementing real functionality, meaning things that we really needed and that helped us tremendously in our daily lives as performance analysts. And if you do that, it almost doesn't matter how badly packaged your tool is: the people who understand it will use it.

Which leads us to a big new step for us: an official release (yeah, yeah, on social networks and all...) of djigger, the more modern and refined version of jdigger (you can bury that name forever by the way; oh, and I take full responsibility for the bad idea, but also for the fairly clever quick-fix :D).

The first public release should be announced officially some time this Thursday. We hope you enjoy the tool and give us as much feedback as we can handle.

Soon you will find what you would never have found back then in the old school version of the tool : a comprehensive installation package, documentation, illustration of use cases via official case studies on denkbar.io, an active community of developers on github, youtube videos to help you get started with the tool and much more.

You can follow the official communications (new releases, events, etc.) of the denkbar community over at @denkbar_io, on LinkedIn and on Facebook.

Now, I don't have a lot of time today, and it's already pretty late here, but there will be more frequent and more interesting posts in the future, you can count on me.

Before I go, a couple of words on the title of this blog. I'm a pretty big NBA fan, and as some of you may have assumed, this is a reference to a pretty old yet cool basketball movie called White Men Can't Jump. And on these words, I'll wander off letting you think about what it could actually mean to me ;)

PS: my old blog archives (and a few broken screenshots) are still available at dcransac.blogspot.com.

Signing off from Le Bonhomme.