Java 8 MOOC - Session 2 Summary

As I mentioned last week, the Sevilla Java User Group is working towards completing the Java 8 MOOC on lambdas and streams. We're running three sessions to share knowledge between people who are doing the course.

The second week's lesson was about Streams - how you can use the new stream API to transform data. There was also a whole section on Optional, which initially seemed like rather a lot, but it turns out that Optional can do rather more than I originally thought.

In the meetup session, we talked about:

Optional

We were pretty comfortable, I think, with using Optional to prevent a NullPointerException. What we weren't so clear on were the examples of filter() and map() - if you were getting your Optional values from a stream, why wouldn't you do the map and the filter on the stream first? For example, why do this:

list.stream()
    .findFirst()
    .map(String::trim)
    .filter(s -> s.length() > 0)
    .ifPresent(System.out::println);

when you could map and filter in the stream to get the first non-empty value? That certainly seems like an interesting question in relation to streams.
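One thing worth noting is that the two versions are not interchangeable. Here's a minimal sketch (the class and method names are mine, not from the course) comparing them on a list whose first element is blank. The first version only ever looks at the first element, whereas the second looks for the first element that survives the filter:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class FirstNonEmptyExample {

    // Takes the first element, then trims and filters it inside the Optional.
    static Optional<String> firstThenFilter(List<String> list) {
        return list.stream()
                   .findFirst()
                   .map(String::trim)
                   .filter(s -> s.length() > 0);
    }

    // Trims and filters inside the stream, then takes the first survivor.
    static Optional<String> filterThenFirst(List<String> list) {
        return list.stream()
                   .map(String::trim)
                   .filter(s -> s.length() > 0)
                   .findFirst();
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("   ", " hello ", "world");
        System.out.println(firstThenFilter(list)); // Optional.empty
        System.out.println(filterThenFirst(list)); // Optional[hello]
    }
}
```

So which one you want depends on whether you care about "the first element, if it's non-empty" or "the first non-empty element".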

I can see Optional being more useful once other APIs fully support Java 8 and return Optional values; then you can chain additional operations onto those return values.

That terminal operation's not actually terminal??

We ran into this a couple of times in our examples in the session. One example is the code above (let's copy it down here so we can look at it more closely):

list.stream()
    .findFirst()
    .map(String::trim)
    .filter(s1 -> s1.length() > 0)
    .ifPresent(System.out::println);

Isn't findFirst() a terminal operation? How can you carry on doing more operations on that?

The answer is, of course, that the return type of the terminal operation can also lead to further operations. The above is actually:

Optional<String> result = list.stream()
                              .findFirst();
result.map(String::trim)
      .filter(s1 -> s1.length() > 0)
      .ifPresent(System.out::println);

Our terminal operation returns an Optional, which allows you to do further operations. Another example of this confusion:

list.stream()
    .map(String::toLowerCase)
    .collect(toList())
    .forEach(System.out::println);

Here, collect() is a terminal operation, but it returns a list, which also allows forEach():

List<String> results = list.stream()
                           .map(String::toLowerCase)
                           .collect(toList());
results.forEach(System.out::println);

So be aware that calling something a terminal operation doesn't mean you can't perform other operations on the value it returns.

Parallel/sequential/parallel

There had been a question in the previous week about why you could write code like this:

list.stream()
    .parallel()
    .map(String::trim)
    .sequential()
    .filter(s1 -> s1.length() > 0)
    .parallel()
    .forEach(System.out::println);

and whether that would let you dictate which sections of the stream were parallel and which were to be processed in serial. Lesson two set the record straight, declaring "the last operator wins" - meaning all of the above code will be run as a parallel stream. I can't find any documentation for this; I'll edit this post if I locate it.
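You can at least observe this behaviour with isParallel(), which reports the mode of the pipeline as a whole. This is a small sketch under my own naming (LastModeWins and its methods are made up for illustration), not something from the course material:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class LastModeWins {

    // Same pipeline as above, but the last mode switch is parallel().
    static boolean endsParallel(List<String> list) {
        Stream<String> pipeline = list.stream()
                                      .parallel()
                                      .map(String::trim)
                                      .sequential()
                                      .filter(s -> s.length() > 0)
                                      .parallel();
        return pipeline.isParallel();
    }

    // Here the last mode switch is sequential().
    static boolean endsSequential(List<String> list) {
        Stream<String> pipeline = list.stream()
                                      .parallel()
                                      .map(String::trim)
                                      .sequential();
        return pipeline.isParallel();
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList(" a ", "", "b");
        System.out.println(endsParallel(list));   // true
        System.out.println(endsSequential(list)); // false
    }
}
```

The mode appears to be a single flag for the whole pipeline rather than a property of each stage, which would explain why you can't mix parallel and serial sections this way.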

Unordered

"Why would you ever want your stream to be unordered?" - the answer is that unordered() doesn't turn your sorted collection into one with no order, it just says that when this code is executed, the order of elements doesn't matter. This might make processing faster on a parallel stream, but as a group we figured it would probably be pointless on a sequential stream.
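As a rough illustration, here's a sketch where we only care about which distinct values exist, not the order they come out in. On an ordered parallel stream, distinct() has to keep the first occurrence of each duplicate, which is relatively expensive; unordered() lifts that constraint. The class and method names are hypothetical, and I haven't measured whether it's actually faster for any particular data set:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class UnorderedExample {

    // We collect into a Set, so encounter order is irrelevant to the result.
    static Set<String> distinctValues(List<String> list) {
        return list.parallelStream()
                   .unordered()   // tell the pipeline that order doesn't matter
                   .distinct()
                   .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("b", "a", "b", "c", "a");
        Set<String> values = distinctValues(list);
        System.out.println(values.size()); // 3
        System.out.println(list);          // [b, a, b, c, a] - the source list is untouched
    }
}
```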

Efficiency optimisations and order of stream operations

We had a long conversation about the order in which you perform operations in a stream. The MOOC (in fact, most documentation around Streams) tells us that a) streams are lazy, and not evaluated until a terminal operation is encountered and b) this enables optimisation of the operations in the stream. That led to a discussion about the following code:

list.stream()
    .map(String::toLowerCase)
    .filter(s -> s.length() % 2 == 1)
    .collect(toList());

The filter operation should result in fewer items to process in the stream. Given that the map() operation doesn't change anything that filter() relies on, will this code be optimised somehow under the covers so that the filter is actually executed first? Or are optimisations still going to respect the order of operations on a stream?

Our case is actually a very specific one, because a) the map() returns the same type as the parameter passed in (i.e. it doesn't map a String to an int) and b) the map() doesn't change the characteristic the filter() is looking at (i.e. length). But generally speaking, you can't expect these conditions to be true - in fact I bet in a large number of cases they are not true. So pipeline operations are performed in the order in which they are written, meaning that our map and filter will not be re-ordered into a more efficient order.

A good rule of thumb seems to be to do filtering as early in the stream as possible - that way you can potentially cut down the number of items you process in each step of the stream. Therefore our code would probably be better as:

list.stream()
    .filter(s -> s.length() % 2 == 1)
    .map(String::toLowerCase)
    .collect(toList());
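One way to convince yourself of this is to count how often the mapping function actually runs in each version. This is just a sketch with made-up names (FilterEarlyExample and its methods), using an AtomicInteger as a call counter:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

class FilterEarlyExample {

    // map() first: the mapping function runs for every element.
    static int mapCallsWhenMappingFirst(List<String> list) {
        AtomicInteger calls = new AtomicInteger();
        list.stream()
            .map(s -> { calls.incrementAndGet(); return s.toLowerCase(); })
            .filter(s -> s.length() % 2 == 1)
            .collect(Collectors.toList());
        return calls.get();
    }

    // filter() first: the mapping function only runs for elements that pass the filter.
    static int mapCallsWhenFilteringFirst(List<String> list) {
        AtomicInteger calls = new AtomicInteger();
        list.stream()
            .filter(s -> s.length() % 2 == 1)
            .map(s -> { calls.incrementAndGet(); return s.toLowerCase(); })
            .collect(Collectors.toList());
        return calls.get();
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("A", "BB", "CCC", "DDDD");
        System.out.println(mapCallsWhenMappingFirst(list));   // 4
        System.out.println(mapCallsWhenFilteringFirst(list)); // 2
    }
}
```

Both versions produce the same result list here; the only difference is how much work the map step does.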

Flat Map

what...?

flatMap() is one of those methods that makes total sense once you get the hang of it, and you don't understand why it was so confusing. But the first time you encounter it, it's confusing - how is flatMap() different to map()?

Well, flatMap is used to squish (for example) a stream of streams into just a simple stream. It's like turning a 2-dimensional array into a single dimension so that you can iterate over all the items without needing nested for-loops. There's an example on StackOverflow, and some more examples in answer to this question.
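Here's a small sketch of that squishing, using a list of lists as the "2-dimensional" input (the class name and sample data are my own). Using map(List::stream) here would give you a Stream<Stream<String>>; flatMap(List::stream) gives you a plain Stream<String>:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FlatMapExample {

    // Turns a list of lists into one flat list, preserving encounter order.
    static List<String> flatten(List<List<String>> nested) {
        return nested.stream()
                     .flatMap(List::stream) // each inner list becomes part of one stream
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<String>> nested = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("c"),
                Arrays.asList("d", "e"));
        System.out.println(flatten(nested)); // [a, b, c, d, e]
    }
}
```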

Comparators

We've probably all written comparators at some point. It's probably one of those examples where we really did use anonymous inner classes "in the olden days" and were looking forward to replacing them with lambdas.

reader.lines()
      .sorted(new Comparator<String>() {
          @Override
          public int compare(String o1, String o2) {
              return ???;
          }
      })
      .collect(toList());

Sadly, using a lambda still doesn't answer the question "do I minus o1 from o2, or o2 from o1?":

reader.lines()
      .sorted((o1, o2) -> ??? )
      .collect(toList());

But there's yet another new method in Java 8 here that can save us, one that is not nearly as well publicised as it should be. There's a Comparator.comparing() that you can use to really easily define what to compare on. The JavaDoc and signature look kinda confusing, but this is one of those places where method references suddenly make loads of sense:

reader.lines()
      .sorted(comparingInt(String::length))
      .collect(toList());

(Here we're actually using the comparingInt method, as we're going to compare on a primitive value.) Personally this is one of my favourite new features in Java 8.
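To make that concrete without needing a reader, here's a self-contained sketch on a plain list of words (the class name and data are made up for illustration). It also shows thenComparing(), which lets you add a tie-breaker without writing any subtraction yourself:

```java
import static java.util.Comparator.comparingInt;
import static java.util.stream.Collectors.toList;

import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class ComparingExample {

    // Shortest words first; words of equal length keep their original order.
    static List<String> byLength(List<String> words) {
        return words.stream()
                    .sorted(comparingInt(String::length))
                    .collect(toList());
    }

    // Shortest words first; words of equal length are sorted alphabetically.
    static List<String> byLengthThenAlphabetical(List<String> words) {
        return words.stream()
                    .sorted(comparingInt(String::length)
                            .thenComparing(Comparator.<String>naturalOrder()))
                    .collect(toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "fig", "apple", "kiwi", "date");
        System.out.println(byLength(words));                 // [fig, pear, kiwi, date, apple]
        System.out.println(byLengthThenAlphabetical(words)); // [fig, date, kiwi, pear, apple]
    }
}
```

If you want the longest words first, calling reversed() on the comparator flips the order, again with no need to remember which way round the subtraction goes.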

Join us next week for the last session on Java 8 - Lambdas and Streams.

Author

Trisha Gee

Trisha is a software engineer, Java Champion and author. Trisha has developed Java applications for finance, manufacturing and non-profit organisations, and she's a lead developer advocate at Gradle.