Hadoop data-mining swiss army knife by @plopezFr and @BertrandDechoux at #devoxx #DV13-HadoopCode

Hadoop data-mining swiss army knife

The website voyages-sncf.com sells half of the Thalys tickets in France and is one of the most visited websites in Europe. That high load generates a huge amount of logs that, at first, went unused. After some time, the team wanted to seek value in those logs and started investigating distributed computing solutions.

Devoxx 2012 article: How to make good teams great


This presentation rocked. That is only my personal opinion and is completely subjective. Anyway, I’ll repeat it: this presentation rocked.

Everything was there: good slideware, interesting and well-structured content and, last but not least, a smiling and entertaining presenter.

Sven Peter (@svenpet) presents himself as an Atlassian Ambassador.

Let’s go and see what he had to say.

Devoxx 2012 Oracle keynote: Make the Future Java

Oracle’s stated success factors

  • technology innovation
  • community participation
  • Oracle leadership

The current focus is on JavaFX and embedded Java development.

What’s new in JavaSE 8?


The first big change is the inclusion of closures. These adopt the form of

(x, y) -> x + y

Those closures will come with a whole new set of methods, especially on collections, in order to provide some kind of fluent API. There will also be a new default keyword which will allow developers to provide a default implementation on an interface.
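To make this concrete, here is a small Java 8 sketch combining both features (the class and method names are mine, not from the keynote): a lambda of the (x, y) -> x + y form, and a default method defined directly on an interface.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

public class Java8Preview {

    // A functional interface with a `default` method, as described in the keynote.
    public interface Greeter {
        String greet(String name);

        // Default implementation living directly on the interface.
        default String greetAll(List<String> names) {
            StringBuilder sb = new StringBuilder();
            for (String n : names) sb.append(greet(n)).append(' ');
            return sb.toString().trim();
        }
    }

    // The (x, y) -> x + y closure from the slides, typed as a BinaryOperator.
    public static int sum(int x, int y) {
        BinaryOperator<Integer> add = (a, b) -> a + b;
        return add.apply(x, y);
    }

    public static void main(String[] args) {
        System.out.println(sum(2, 3)); // 5
        // A single closure is enough to implement the interface:
        Greeter greeter = name -> "Hello " + name;
        System.out.println(greeter.greetAll(Arrays.asList("world", "Devoxx")));
    }
}
```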

Type annotations

Type annotations give further information to the compiler and thus allow it to check some invariants at compile time, such as nullability or immutability.
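Java 8 makes this possible by letting annotations target type uses (ElementType.TYPE_USE); tools such as the Checker Framework then verify the invariants at compile time. The @NonNull below is a local stand-in I define myself for illustration, not a real checker annotation.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class TypeAnnotations {

    // A stand-in annotation: TYPE_USE means it can annotate any use of a type.
    @Target(ElementType.TYPE_USE)
    @Retention(RetentionPolicy.RUNTIME)
    public @interface NonNull {}

    // A checker tool could now verify at compile time that callers never pass null.
    public static String shout(@NonNull String s) {
        return s.toUpperCase();
    }

    public static void main(String[] args) {
        System.out.println(shout("hello")); // HELLO
    }
}
```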

Compact profile

This new profile defines a subset of essential libraries that will be part of a reduced Java platform aimed at embedded JVMs, where memory consumption and disk footprint are concerns.

JavaEE news

JavaEE 7 will focus on simplicity and HTML5 support. JavaEE 8 will be more oriented towards cloud support and modularity.

And after?

The next goals will be embedding Java on more and more devices and platforms (e.g. iOS or ARM processors) on the one hand, and providing “embedded suites” bundling a JVM, an application server and a database on the other.

How to boost your object-oriented programming with functional programming

This is my summary of a BruJUG conference given by Uberto Barbini about “How to boost your object-oriented programming with functional programming”.

This is maybe the first time I disliked a conference this year. Everything that Uberto said is true and relevant to the subject. But given the title and the introduction, I expected something more practical about how to include functional programming concepts in my daily object-oriented work. Instead, I got strong advice to study functional programming and some good best practices about immutable objects and avoiding side effects.

Anyway, I’ll write down what I remember of the presentation, if only because talking in front of specialists is a difficult exercise that deserves some attention and respect.


Let’s start with two interesting questions:

  1. What if object-oriented programming was wrong?
  2. Why does code quality matter?

Even if those questions may seem awkward at first sight, it is worth thinking about them and going beyond the standard answers that were hammered into our heads while we were learning programming.

Bugs are inevitable at some point of development. They cause frustration and delays. Many are easy to fix and some can cause big trouble. Good code is thus not code without bugs but rather code where bugs are easy to find and fix.

From the presenter’s point of view, the worst bugs are often related to state. I totally agree with that. So one solution is to defend your state like a castle, so that state effects are limited to the object.

But even these days, whatever we say or sometimes even think, we are still writing a lot of procedural code, that is, code that tells a story like “do this, then do that, if the result is like this, do this other action” and so on. Take a look at a presentation layer based on Struts and chances are you’ll see what I mean.

Some frameworks seem to induce a procedural paradigm, which:

  • is easy to write
  • is hard to understand
  • often involves a global state

Paradigms history

Lisp is one of the first languages to implement the functional paradigm and was published in the early 1960s. In the early 1970s, Smalltalk introduced the object-oriented paradigm.

But what is object-oriented programming?

  • “It looks like a record with fields and methods plus keywords to define the scope.”
  • “Object-oriented programming is an exceptionally bad idea which could only have originated in California.” Edsger Dijkstra
  • “OOP to me means only messaging, local retention and protection and hiding of state-process, and extreme late-binding of all things. It can be done in Smalltalk and in LISP. There are possibly other systems in which this is possible, but I’m not aware of them.” Alan Kay
  • “An object has state, behavior, and identity.” Grady Booch

The idea of messaging is very present in Smalltalk itself.

A big problem is that a lot of programmers think they do good object-oriented programming because they use a lot of interfaces and design patterns.

To get good advice about designing an object-oriented application, one can read “Domain-Driven Design“, Eric Evans’s famous book. One of the book’s concepts is to put emphasis on immutable value objects and on avoiding side effects. Those practices bring the object-oriented and functional paradigms close to each other.

Functional paradigm recommendations

  • use only immutable structures (messages)
  • don’t use null objects or objects that are not “ready”
  • use pure functions (= without side effects)
  • use closure to decouple behavior from data
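A small Java sketch of those recommendations (the Price class and its names are mine, not the speaker’s): an immutable “message”, a pure function without side effects, and a closure carrying the behavior separately from the data.

```java
import java.util.function.Function;

// An immutable value object used as a message: final class, final field, no setters.
public final class Price {
    private final long cents;

    public Price(long cents) { this.cents = cents; }

    public long cents() { return cents; }

    // Pure function: no side effects, returns a new object instead of mutating.
    public Price withDiscount(int percent) {
        return new Price(cents - cents * percent / 100);
    }

    public static void main(String[] args) {
        Price price = new Price(1000);
        // A closure decoupling behavior from data: the pricing rule travels on its own.
        Function<Price, Price> tenPercentOff = p -> p.withDiscount(10);
        System.out.println(tenPercentOff.apply(price).cents()); // 900
    }
}
```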

Object-oriented paradigm recommendations

  • inject dependencies
  • don’t provide getters for mutable state
  • use interfaces for collaboration
  • create simple aggregates
  • use meaningful names

Popular object-oriented designs

Rebecca Wirfs-Brock classifies software architectures into 3 different designs according to the way they locate data and behavior:


Centralized control

This is like the procedural paradigm: one class controls the behavior and other classes provide the data.

This is also what Martin Fowler calls an anemic domain.

That said, this kind of design has the advantage that all the logic is concentrated in one place. You don’t have to look everywhere to understand how a process is carried out.


Dispersed control

Here the logic is spread across all objects. Each one does something very short and with as few dependencies as possible.

The drawback is that to get the complete picture of something, you must navigate through all the participating classes.


Delegated control

This kind of design is a compromise between the first two. The logic is spread across a few classes that control the behavior in collaboration with other, smaller classes. There are thus pools of responsibility that can be used in relative isolation.


Learning new paradigms is interesting because it often leads you to a better understanding of the ones you use daily.

The functional paradigm has some very good practices to include in your daily object-oriented programming, like avoiding side effects and using immutable objects as often as you can.

But besides those good pieces of advice, I must admit that this conference didn’t meet my expectations.

Anyway, the section below lists very good books to read. In particular, don’t hesitate to order your own copy of Domain-Driven Design, a book I really appreciated.


Modular Java BeJUG conference

The two presenters were Bert Ertman and Paul Bakker, both working for Luminis, Bert being an official Oracle Java Champion.

They began their talk by stating the following two trends:

  • Applications these days are bigger and bigger and thus are more complex
  • More and more teams adopt, at least partly, agile methodologies

Those trends bring new challenges along with them:

  • dependency management to manage dependencies of the application on its libraries.
  • versioning of the application that must fit with other enterprise projects which have their own, parallel, lifecycle.
  • long-term maintenance becomes difficult (“it’s difficult to refactor level 0 of a 20-story skyscraper”).
  • deployment

As applications are moving to the cloud, other non functional requirements enter the game too:

  • As your users never sleep, you must deploy with 0 downtime.
  • Deploying a big monolithic application on the cloud takes time.
  • Customer-specific extensions must be deployable on a Software as a Service application.

Modularity is the answer to those issues.

What is a module?

Every decent programmer probably remembers this from his classes: a good design has loose coupling and high cohesion.

Coupling is limited through the use of interfaces and by hiding the actual implementations. Each module has its public and private parts, and other modules cannot “touch its private parts”.

But then comes the issue of instantiating the actual classes. A popular way to solve that problem is dependency injection. But you can also use some kind of service registry, which you can ask to provide an implementation of type X. Each module notifies the registry about the interfaces it provides implementations for and which interfaces it consumes. That service registry can manage multiple versions of interfaces and choose the best match.
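The registry idea can be sketched in a few lines of Java. This toy version (all names are mine) only maps an interface to one implementation; real frameworks like OSGi also handle versions, lifecycle and dynamic replacement.

```java
import java.util.HashMap;
import java.util.Map;

// A toy service registry: modules publish implementations for an interface,
// consumers look them up by type. No versioning or lifecycle, unlike OSGi.
public class ServiceRegistry {
    private final Map<Class<?>, Object> services = new HashMap<>();

    // A module announces: "I provide an implementation of this interface."
    public <T> void publish(Class<T> type, T implementation) {
        services.put(type, implementation);
    }

    // A module asks: "give me an implementation of type X."
    public <T> T lookup(Class<T> type) {
        return type.cast(services.get(type));
    }

    public interface Logger { String log(String msg); }

    public static void main(String[] args) {
        ServiceRegistry registry = new ServiceRegistry();
        registry.publish(Logger.class, msg -> "[log] " + msg);
        Logger logger = registry.lookup(Logger.class);
        System.out.println(logger.log("hello"));
    }
}
```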

With a little help from design-time modularity, we can thus tame the spaghetti syndrome of the most complex applications.

Runtime implementation

When we analyse the same issues from the runtime view, the JAR file becomes THE unit. So how do we deal with module versioning and inter-module dependency management? How can we replace just one module at runtime?

The first part of the answer is: put the architectural focus on modularity! If you don’t group coherent functionality together, there’s no point in the second part of the answer. That second part is: use a modular framework like OSGi (Jigsaw can be seen as an alternative to OSGi but is far less mature). But keep in mind that a modular framework is no guarantee; the key is a modular architecture.

How well does modular java play in the JavaEE game?

JavaEE is high level. OSGi is low level and provides no enterprise services (transactions, security, remote access, persistence, …) on its own. So you’re left with 3 options:

  • Deploying enterprise services as published OSGi services.
  • A hybrid approach with a classic JavaEE part + a modular part + a bridge between the 2 containers.
  • An OSGi “à la carte” approach.

The first option involves application servers that publish their services as OSGi modules. Glassfish does that. There is now a concept of WAB (Web Application Bundle) files, which are WAR files whose servlets are deployed as OSGi resources. A similar system exists for EJB bundles, but there is no standard yet.

The second option can be implemented with Weld as the CDI container, to provide CDI inside the bundle and to consume and produce OSGi services.

I’ll complete the description of the third option when I get access to the Parleys’ video (sorry, I ran out of ink at that moment and writing this article 2 weeks after the conference doesn’t help either).

Deploying to the cloud

The main concern here is that a modular application may involve hundreds of bundles. But fortunately, something like the Apache ACE platform can help you manage this and gives you the option to redeploy one module at a time.


I won’t go into the details of the demo. The easiest is to wait for the Parleys video, as that will be far more interesting than me trying to describe what I’ve seen.


Fork/Join and Akka: parallelizing the Java platform BeJUG conference

Here is another article on a BeJUG conference. This time, Sander Mak, software developer and architect at Info Support, The Netherlands, gave us an overview of two concurrency frameworks: Fork/Join and Akka.

Fork/Join is a framework that was added to the standard libraries starting with JDK 7. Akka is an open-source alternative with emphasis on the resilience of concurrent processes.


Setting the scene

CPU speed has been growing over the last years until reaching some kind of technical limit at around 3.5 GHz. Right now, a CPU is mostly idling while waiting for its I/O. That’s why the new trend is to have multiple CPUs.

But as Sander quoted: “the number of idle cores in my machine doubles every two years”. There is an architectural mismatch because developers tend to believe that the compiler and/or the JVM can handle parallelism on their own. Unfortunately, this isn’t true.


The first demos revolve around the computation of the Fibonacci sequence, whose definition is

fib(0) = 0, fib(1) = 1, fib(n) = fib(n-1) + fib(n-2) for n ≥ 2
Of course, the objective here is not to find an optimal solution to that problem (transforming the recursive definition into an iterative form) but just apply a concurrent computation of the recursive form.

  1. We can solve this problem by creating one thread to compute fib(n-1) and another thread for fib(n-2), then waiting until they have finished their computations and adding the results.
    Immediately, the number of threads explodes.
  2. If we implement the same algorithm with 2 Callable objects and a thread pool, the number grows more slowly but is still high.
    The problem is that the current thread blocks while its two children finish.
  3. With Fork/Join, the task dependency is explicit and thus the join method call doesn’t block the current worker thread.

Fork/Join works with an auto-sized thread pool. Each worker thread is assigned a task queue which gets fed by the fork method calls. The interesting behavior is that a worker is allowed to steal work from the task queue of another worker.
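Here is a Fork/Join Fibonacci close in spirit to the demo (the code itself is mine, not the one shown on stage; the cutoff value is arbitrary). It also illustrates two patterns discussed later in the talk: a sequential cutoff for small problems, and reusing the current thread for one of the two subtasks instead of forking both.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Fibonacci as a RecursiveTask, submitted to a ForkJoinPool.
public class Fib extends RecursiveTask<Long> {
    private static final int CUTOFF = 10; // sequential cutoff (arbitrary threshold)
    private final int n;

    public Fib(int n) { this.n = n; }

    @Override
    protected Long compute() {
        if (n < CUTOFF) return seqFib(n); // small problem: skip the forking overhead
        Fib left = new Fib(n - 1);
        left.fork();                      // pushed on this worker's queue, stealable
        Fib right = new Fib(n - 2);
        long r = right.compute();         // reuse the current thread for one branch
        return left.join() + r;
    }

    // Plain sequential version, used under the cutoff.
    public static long seqFib(int n) {
        return n <= 1 ? n : seqFib(n - 1) + seqFib(n - 2);
    }

    public static void main(String[] args) {
        System.out.println(new ForkJoinPool().invoke(new Fib(30))); // 832040
    }
}
```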

Another, more advanced demo was also performed, demonstrating a dichotomic search on a large set of cities to find which ones are within a certain distance from a point. Of course, the algorithm is implemented with Fork/Join.

All the code of those examples is available on http://bit.ly/bejug-fj.

API & patterns

Problem structure

The algorithm must be acyclic (no task may depend on another task that is already present in its call stack) and CPU-bound. I/O-bound problems wait a lot on blocking system calls and thus prevent those threads from performing other tasks.

Sequential cutoff

To prevent the overhead from consuming all the computation time, you must set a threshold deciding whether the problem should be solved sequentially or in parallel. This leads to defining work chunks that are processed in parallel, while each step inside the same chunk is processed sequentially.

Fork once, fool me twice

Some algorithm implementations allow reusing the current thread to do some computation instead of forking a new task, thus limiting the overhead.

Convenience methods

There are also convenience methods:

  • Method invoke() is semantically equivalent to fork(); join() but always attempts to begin execution in the current thread.
  • Method invokeAll performs the most common form of parallel invocation: forking a set of tasks and joining them all.
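A minimal sketch of invokeAll (my own example, not from the talk): summing an array range by splitting it in two and letting invokeAll fork and join both halves in one call, again with a sequential cutoff.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sums data[from..to) by recursive splitting; invokeAll forks and joins both halves.
public class SumTask extends RecursiveTask<Long> {
    private static final int CUTOFF = 1000; // sequential cutoff (arbitrary)
    private final long[] data;
    private final int from, to;

    public SumTask(long[] data, int from, int to) {
        this.data = data; this.from = from; this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= CUTOFF) {          // small chunk: plain loop
            long sum = 0;
            for (int i = from; i < to; i++) sum += data[i];
            return sum;
        }
        int mid = (from + to) / 2;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        invokeAll(left, right);             // the common fork-them-all, join-them-all form
        return left.join() + right.join();
    }

    public static void main(String[] args) {
        long[] data = new long[10_000];
        for (int i = 0; i < data.length; i++) data[i] = i;
        System.out.println(new ForkJoinPool().invoke(new SumTask(data, 0, data.length)));
    }
}
```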

Future and comparisons

Fork/Join creates threads and is thus currently forbidden by the EJB spec. When it comes to the CDI or servlet specs, we are navigating in some kind of grey zone. Maybe this could work with a JCA work manager. @Asynchronous could be used as an alternative.

Anyway, it is foreseen that the JavaEE 7 spec may include the java.util.concurrent package.

Compared with Fork/Join, the more classic ExecutorService doesn’t allow work stealing. It is better suited to coarse-grained independent tasks. Bounded thread pools support blocking I/O better.

MapReduce implementations are targeted at clusters, not a single JVM. While Fork/Join is targeted at recursive work, MapReduce often works on a single map. Furthermore, MapReduce has no inter-node communication and thus doesn’t allow work stealing.

The popular criticisms of Fork/Join are:

  • The implementation is really complex.
  • Scalability is unknown above 100 cores, which may seem many for a CPU but is far below current standards for a GPU.
  • The one-to-one mapping between the thread pool size and the number of cores is probably too low-level.

With the availability of JDK 8, the Fork/Join API could be extended with methods on collections working with lambdas. There is also a CountedCompleter ForkJoinTask implementation that is better at working with I/O-based tasks and that is currently contained in JSR-166-extra.



Akka is a concurrency framework written in Scala that offers a Java API. I couldn’t introduce Akka better than its authors, so here is what they say about it:

We believe that writing correct concurrent, fault-tolerant and scalable applications is too hard. Most of the time it’s because we are using the wrong tools and the wrong level of abstraction. Akka is here to change that. Using the Actor Model we raise the abstraction level and provide a better platform to build correct concurrent and scalable applications. For fault-tolerance we adopt the “Let it crash” model which have been used with great success in the telecom industry to build applications that self-heals, systems that never stop. Actors also provides the abstraction for transparent distribution and the basis for truly scalable and fault-tolerant applications.


An actor is an object with a local state, a local behavior and a mailbox to receive messages sent to it. An actor processes only one message at a time. And as they are lightweight (around 400 bytes of memory per actor), you can instantiate many of them in a standard JVM heap.

The receive method of an actor is called when message processing begins and is where all the processing is done. The framework itself is responsible for actor management (thread pooling, dispatching, …).
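To illustrate the idea, here is a toy actor in plain Java (this is deliberately not the Akka API): local state, a mailbox, and messages processed strictly one at a time. Real Akka actors are far lighter than a dedicated thread and are scheduled by the framework.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// A toy actor: a mailbox, local state touched only by the actor's own thread,
// and a loop that processes one message at a time.
public class CounterActor implements Runnable {
    private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<>();
    private int count; // local state, never shared while the actor runs

    // Asynchronous send: just drop a message in the mailbox.
    public void tell(String message) { mailbox.add(message); }

    public int count() { return count; }

    @Override
    public void run() {
        try {
            String msg;
            while (!(msg = mailbox.take()).equals("stop")) { // one message at a time
                if (msg.equals("increment")) count++;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CounterActor actor = new CounterActor();
        Thread t = new Thread(actor);
        t.start();
        actor.tell("increment");
        actor.tell("increment");
        actor.tell("stop");
        t.join(); // happens-before: safe to read the actor's state now
        System.out.println(actor.count()); // 2
    }
}
```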

Using a ForkJoinPool with Akka scales very well.

A demo of Akka usage is available on http://bit.ly/bejug-akka.


Restructuring: Improving the modularity of an existing code-base BruJUG conference

Here is an article about a BruJUG conference given on 26/04/2012 by the founder of Structure101 company, Chris Chedgey. The complete conference video is available on vimeo.

What is restructuring?

Refactoring and restructuring are both terms that imply non-functional changes.

Refactoring means changing the code to make it more readable. It implies invasive code editing and usually involves only a few classes.

Restructuring means reorganizing the code-base to improve its modularity and make it easier to understand, with as little change to the code itself as possible. The editing is minimally invasive, but the scope is the whole code-base.

The code-base structure has 3 aspects:

  • the package composition
  • the dependencies between them
  • and the hierarchy of the “nested levels” of packages

The two code-base quality factors to consider here are complexity and modularity.

And why is it important?

Because a better structure makes your code more understandable. And understandable code is cheaper to maintain and evolve. Changes have a more predictable impact on the code. And, of course, your code-base has better testability and reusability. In the end, your code has more value.


Complexity can be measured by different means. Two of them are fatness and tangles.

Fatness is when you have too much code in one place (number of methods in a class, number of classes in a package, number of packages nested under the same package or in the same component, …).

Tangles occur when some code in a package references code in another package which itself references code in the first package (cyclic dependencies).

Both fatness and tangles can be approximated automatically by metrics, which makes them good candidates for automatic checking against thresholds (e.g. in your build system).
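As a sketch of what such a metric check could look like (my own toy example, not how Structure101 works internally): packages as nodes, dependencies as edges, and a depth-first search that reports whether any cycle, i.e. a tangle, exists.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Detects cyclic dependencies (tangles) in a package dependency graph via DFS.
public class TangleCheck {

    public static boolean hasCycle(Map<String, List<String>> deps) {
        Set<String> visiting = new HashSet<>(); // nodes on the current DFS path
        Set<String> done = new HashSet<>();     // fully explored nodes
        for (String node : deps.keySet()) {
            if (dfs(node, deps, visiting, done)) return true;
        }
        return false;
    }

    private static boolean dfs(String node, Map<String, List<String>> deps,
                               Set<String> visiting, Set<String> done) {
        if (done.contains(node)) return false;
        if (!visiting.add(node)) return true;   // back-edge: we found a cycle
        for (String dep : deps.getOrDefault(node, Collections.emptyList())) {
            if (dfs(dep, deps, visiting, done)) return true;
        }
        visiting.remove(node);
        done.add(node);
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new HashMap<>();
        deps.put("ui", Arrays.asList("service"));
        deps.put("service", Arrays.asList("ui")); // cyclic: a tangle
        System.out.println(hasCycle(deps)); // true
    }
}
```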

This diagram shows the link between tangles and fatness. You can eliminate all tangles pretty easily by moving everything to the same place, but then you get 100% fatness. Conversely, you can eliminate fatness by partitioning your code-base, but then you create tangles. What you seek is a compromise between the two. What you really don’t want is a code-base that is both fat and full of tangles.


Modularity is best defined by the mantra “high cohesion, low coupling”.

Modularity can show itself in multiple ways. One of them is well-defined public interfaces with the remaining internals kept private. Another is packages with a clear responsibility.

Unfortunately, the best way to assess the modularity of code is to have it checked by a human software architect.

So how can I work on my code-base structure?

Usually, the methods and classes are OK. But there is almost no logical organisation of classes into higher-level modules (= packages in Java). Packages are too often used like a filesystem and not as an embodiment of a module hierarchy.

To get a good code-base structure, you need a clear understanding of the following aspects:

  • package composition and dependencies between packages
  • the flow of dependencies
  • the application business

Once you understand all of this correctly, you can define and achieve your architectural target.

Restructuring strategies

There are a lot of strategies you can use. Here are a chosen few.

Merge parallel structures

If you have parallel structures (one for presentation, one for services, one for persistence, one for extranet, one for intranet, etc.), you’d better merge them to minimize the dependencies between packages.

Bust very large class tangles early

You often find yourself with one or a few large class tangles spanning many packages. Fixing these will improve your code-base rapidly.

Do as much as you can by only moving classes and packages first

This is the least invasive refactoring you can do to improve complexity and modularity. Moreover, it requires little effort.

Bottom-up or top-down approach?

Both are valid but have different impacts.

The top-down approach keeps as much of the existing package hierarchy as possible, so the “psychological” impact on the application team is minimized.

The bottom-up approach tends to end up far away from the current structure but is often easier to achieve.

Tackle complexity before modularity

A structure without tangles is way easier to manipulate.

Other strategies

  • Split packages that lack cohesion
  • Split fat packages and fat classes
  • Move tangles together
  • Make the restructuring a milestone


It is common for a code-base to be a mess in terms of modularity. That lack of structure costs money, but it can be remedied. Restructuring your code-base is not easy, but it can bring huge returns.


Here ends the “theoretical part” of the presentation and here begin the examples, illustrated with the Restructure101 software, which helps the architect visualize the current structure of a code-base and simulate structure changes and their impacts.

The tool’s philosophy is to create a task list reflecting the changes to be done rather than changing the code-base directly. After a restructuring session, the architect ends up with a task list that he can perform himself or plan to have executed by other developers.

Plugins allow using that task list easily inside IDEs like Eclipse or IntelliJ.

I’d say I love this philosophy because it gives you the feeling that you’re always in control and that you are not just executing a drag’n’drop session in a GUI but really modifying your code-base deeply.

Thanks BruJUG for this enlightening conference.

See you next time.

JavaFX 2.0 BeJUG conference

History and status

The presentation started with a quick history of Java: how it started as a desktop application programming language, how that rich-client facet of Java almost disappeared behind Java EE web applications years ago for economic reasons, and how it may come back in the near future with frameworks like JavaFX 2.

But what is JavaFX 2? JavaFX 2 is “a modern Java environment designed to provide a lightweight, hardware-accelerated UI platform that meets tomorrow’s needs”. In that assertion, every word is important. And if JavaFX 2 may address tomorrow’s needs, today it is still a work in progress. Only the Windows platform has reached the General Availability stage. The Mac OS X implementation should come out of the dark in the coming months, and you can get a quite usable Linux implementation from the OpenJDK sources. But iOS and Android implementations of JavaFX are, unfortunately, not yet available.

It is foreseen that JavaFX will replace Swing and AWT as the standard graphics library starting with Java 8. But that won’t happen before JavaFX becomes JavaFX 3.

So, pragmatically speaking, JavaFX 2 must be considered experimental at the moment (and that’s also what the demo in the second part of the presentation confirmed). It is the advice of the first presenter too: “wait until Java 8 and JavaFX 3 before reaching production with a JavaFX application”.


There are 2 APIs for JavaFX 2: the FXML API, a declarative XML interface, and a Java API similar to the Swing interface.

A big difference from the former Swing API is the ability to render HTML content through the WebView component. This makes it possible to enrich a classical web application with behaviours (such as Near Field Communication or eID reader integration) that are only possible in a rich client.

The first part of the presentation ended with a Hello World demo which looked to me a lot like the tutorials I’ve done with Swing. At least JavaFX looks more polished than its predecessor, and simple things seem simple to program.

Real-World demo

The real-world demo features healthcare software, made by the firm HealthConnect, to manage patients’ dossiers.

A lot of CSS was used to style the UI controls. The CSS properties are proprietary to JavaFX (they all begin with -fx-) but look similar to HTML properties.

The accent is put on the calendar and drop-up (with auto-complete) controls developed and heavily customized by HealthConnect.

It is also put on the observer pattern, which allows binding a UI control to a model property. Once done, every change to either the UI or the model is reflected instantly in the other. Unfortunately, to achieve this, JavaFX 2 created its own JavaBean-like API with, for example, SimpleObjectProperty and ObservableList.

There also exists a Task API to ease concurrency management while running service callbacks only in the main UI thread.

Here are the various lessons learned from the development of the application:

What has been easier with JavaFX 2?

  • great look and feel
  • customization thanks to CSS
  • the binding between the model and the view thanks to the observer pattern

What has been more difficult with JavaFX 2?

  • hard to change the default behaviour of controls
  • fighting against the JavaFX rules of engagement leads to weird results (but which framework doesn’t behave like that?)

Here are now some resources mentioned during the presentation:

I hope you enjoyed this third article. See you soon for the next one.

PlayFramework 2.0 BeJUG conference

So here is my second post. This time, I cheated a little. As I was sick at the conference time, I had to catch up with the recording of the presentation that BeJUG team put on parleys.com. Big thanks to them!

So, what is PlayFramework?

PlayFramework is a web framework. It features Java and Scala as programming languages for both controllers and page templates. As you’ve probably already figured out from the terms I used, this framework is more action-based than component-based.

With the framework comes a philosophy.

First, everything is put in place so that the developer can stay focused on what he’s doing. You can use just a text editor (or an IDE if you prefer) and a browser once the PlayFramework server has been launched. Every piece of feedback the framework gives you (and it gives you plenty) appears in the browser. And every change you make in the code is reflected immediately in the browser.

Second, the framework uses a compiler to validate all your files (even page templates and configuration files). If there is a syntax error in your Scala or Java file, a compilation error is displayed in the browser with an indication of the error and the line where it occurred. That is standard behaviour, nothing special about it. Where it becomes really cool is when you get the same type of error for your template pages, with indications relative to the original file (and not the generated class). So far this is similar to what we can get with a JSP interpreter. That’s cool. What is really cool is that we even get the same level of feedback detail for configuration files. That rocks!

Third, the framework tries not to fight the HTTP protocol, meaning that it doesn’t hide the inner bits of the protocol. Quite the contrary: you will find methods to ease the usage of HTTP response codes, cookies, …

From the given demo, it really seems to be a framework that is easy to use and easy to learn. And as it doesn’t try to hide the technical details of the HTTP protocol and browser rendering, I have good hope that everything you know about web development is directly usable with this framework. A good point, then, for a framework that lessens the amount of plumbing needed to get started and still manages to stay out of the way.

After that first impressive demo came a description of the Enumerator/Iteratee pattern. This pattern actually comes from Haskell (http://hackage.haskell.org/package/enumerator). It is composed of:

  • iteratee: a data sink that consumes input values and generates a single output value.
  • enumerator: a data source that generates input values (from reading a file or a socket, for example) and is passed an iteratee.
  • enumeratee: a data transformer that reads from an enumerator and provides transformed data to an iteratee.

Instead of the source controlling the information flow, in this pattern it is the iteratee that is in charge of telling the enumerator whether it wants more data or not.
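The control inversion can be sketched in a toy, synchronous Java version (all names are mine; Play’s real API is asynchronous and written in Scala): the sink decides after each value whether it wants more, and the source obeys.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// A toy Enumerator/Iteratee: the consumer (iteratee) controls the flow.
public class EnumeratorDemo {

    public interface Iteratee<I, O> {
        boolean wantsMore();  // the sink, not the source, decides when to stop
        void feed(I input);
        O result();
    }

    // "Enumerator": pushes values from a source into the iteratee while it wants more.
    public static <I, O> O run(List<I> source, Iteratee<I, O> sink) {
        Iterator<I> it = source.iterator();
        while (it.hasNext() && sink.wantsMore()) sink.feed(it.next());
        return sink.result();
    }

    // An iteratee that sums values but stops consuming once the sum reaches a cap.
    public static Iteratee<Integer, Integer> sumUpTo(final int cap) {
        return new Iteratee<Integer, Integer>() {
            private int sum;
            public boolean wantsMore() { return sum < cap; }
            public void feed(Integer i) { sum += i; }
            public Integer result() { return sum; }
        };
    }

    public static void main(String[] args) {
        // Stops after 1 + 2 + 3: the iteratee refuses further input.
        System.out.println(run(Arrays.asList(1, 2, 3, 4, 5), sumUpTo(6))); // 6
    }
}
```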

That pattern is featured heavily in Play. You can use it, for example, to manage Comet connections (HTTP connections that “never” close and that stream data to which the client browser can react in real time). You just define your enumerator on the server and tell it what callback method to call on the client when there is data to process. This really eases concurrency management and allows the server to keep up with thousands of connections on reasonable hardware.

The presentation ended with a tour of the sample applications embedded in the PlayFramework 2.0 distribution.

This is only a small summary of the presentation. If you want to have a look at it, you can watch the complete presentation on Parleys.com (http://www.parleys.com/d/3143 for part 1 and http://www.parleys.com/d/3144 for part 2).

Hibernate Spatial BeJUG conference


Tonight I attended my first BeJUG conference, about Hibernate Spatial. So here is a summary of that conference.

Hibernate Spatial is a Hibernate extension for storing and querying GIS data. So for those like me who have heard of GIS but don’t really know what hides behind the acronym, the first part was a nice introduction.