Last night was the final get-together to discuss the Java 8 MOOC. Any event hosted in August in a city that is regularly over 40°C is going to face challenges, so it was great that we had attendees from earlier sessions plus new people too.
The aim of this session was to talk about Lesson 3, but also to wrap up the course as a whole: to talk about what we liked and what we would have improved (about both the course itself and our user group events).
findAny() vs findFirst()
Why do we need both of these methods, and when would you use them?
findFirst() is the deterministic version, which will return you the first element in the Stream (according to encounter order - see the section on Ordering in the documentation). So, regardless of whether you run the operation in parallel or serial, if you're looking for "A" and use findFirst() with this list:
["B", "Z", "C", "A", "L", "K", "A", "H"]
you'll get the element at index 3 - the first "A" in the list.
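To make the "index 3" claim concrete, here's a minimal sketch. It streams over the list's indices with IntStream.range rather than over the letters themselves, purely so the output shows which "A" was found. The class and method names are made up for this example. Both the sequential and the parallel run should report index 3, because findFirst() respects encounter order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;
import java.util.stream.IntStream;

public class FindFirstDemo {

    // Returns the index of the first element equal to target, using findFirst().
    // Streaming over indices (not the letters) lets us see which "A" was found.
    static OptionalInt firstIndexOf(List<String> items, String target, boolean parallel) {
        IntStream indices = IntStream.range(0, items.size());
        if (parallel) {
            indices = indices.parallel();
        }
        return indices
                .filter(i -> items.get(i).equals(target))
                .findFirst(); // respects encounter order, even in parallel
    }

    public static void main(String[] args) {
        List<String> letters = Arrays.asList("B", "Z", "C", "A", "L", "K", "A", "H");
        System.out.println(firstIndexOf(letters, "A", false)); // OptionalInt[3]
        System.out.println(firstIndexOf(letters, "A", true));  // OptionalInt[3]
    }
}
```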
findAny() is non-deterministic, so it will return any element that matches your criteria: it could return the element at index 3, or the one at index 6. Realistically, if the stream is on an ordered collection like a list, I expect that running findAny() on a sequential stream will return the same result as findFirst(). The real use case for findAny() is when you're running it on a parallel stream. Let's take the above list, and assume that when you run this on a parallel stream it's processed by two separate threads:
["B", "Z", "C", "A",    // processed by thread 1
 "L", "K", "A", "H"]    // processed by thread 2
It's possible that thread 2 finds its "A" (the one at index 6) before thread 1 finds the one at index 3, in which case that's the value that gets returned. By letting the Stream return any one of the values that match the criteria, you can potentially execute the operation faster when running in parallel.
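Here's the same index-based sketch (hypothetical names again) using findAny() on a parallel stream. The only thing the API promises is that you get some matching element, so the result may be 3 or 6. With only eight elements the runtime may not actually split the work across threads, so in practice you might see 3 every time; the point is that you aren't allowed to rely on that.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;
import java.util.stream.IntStream;

public class FindAnyDemo {

    // Returns the index of *some* element equal to target, using findAny().
    // On a parallel stream, whichever worker finds a match first may win,
    // so the result is allowed to differ from run to run.
    static OptionalInt anyIndexOf(List<String> items, String target) {
        return IntStream.range(0, items.size())
                .parallel()
                .filter(i -> items.get(i).equals(target))
                .findAny();
    }

    public static void main(String[] args) {
        List<String> letters = Arrays.asList("B", "Z", "C", "A", "L", "K", "A", "H");
        // Either OptionalInt[3] or OptionalInt[6] is a valid answer here.
        System.out.println(anyIndexOf(letters, "A"));
    }
}
```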
If findAny() is (potentially) faster in parallel and (probably) returns the same value as findFirst() when running serially, why not use it all the time? Well, there are times when you really do want the first item. If you have a list of DVDs ordered by the year the film was released, and you want to find the original "King Kong" (for example), you'll want findFirst() to find the one released in 1933, not the one released in 1976 or the one from 2005.
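A rough sketch of the DVD scenario, assuming a made-up Dvd class with just a title and a release year, and a list that's already sorted by year as described above. Even though it uses a parallel stream, findFirst() still returns the 1933 film; swapping in findAny() would make any of the three a legal answer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class KingKongDemo {

    // Hypothetical DVD type, for illustration only.
    static final class Dvd {
        final String title;
        final int year;

        Dvd(String title, int year) {
            this.title = title;
            this.year = year;
        }

        @Override
        public String toString() {
            return title + " (" + year + ")";
        }
    }

    // Assumes dvds is already sorted by release year, so the first match is the original.
    static Optional<Dvd> original(List<Dvd> dvds, String title) {
        return dvds.parallelStream()
                .filter(d -> d.title.equals(title))
                .findFirst(); // deterministic, even on a parallel stream
    }

    public static void main(String[] args) {
        List<Dvd> dvdsByYear = Arrays.asList(
                new Dvd("King Kong", 1933),
                new Dvd("King Kong", 1976),
                new Dvd("King Kong", 2005));
        System.out.println(original(dvdsByYear, "King Kong")); // Optional[King Kong (1933)]
    }
}
```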
findFirst() is not always going to be slower than findAny(), even in parallel. Going back to our list:
["B", "Z", "C", "A", "L", "K", "A", "H"]
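Searching for "H" could perform the same with both methods: there's only one "H", right at the end of the list, so both findFirst() and findAny() have to look at essentially every element before they can return it.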
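A quick way to check that both methods agree when there's only one match: this sketch (hypothetical names) runs both on a parallel stream and gets index 7 either way. It doesn't measure timing; it only shows that with a single candidate, findAny() has no alternative answer to take a shortcut to.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

public class SingleMatchDemo {

    // Index of the first "target" via findFirst(), or -1 if there is none.
    static int firstIndex(List<String> items, String target) {
        return IntStream.range(0, items.size()).parallel()
                .filter(i -> items.get(i).equals(target))
                .findFirst()
                .orElse(-1);
    }

    // Index of any "target" via findAny(), or -1 if there is none.
    static int anyIndex(List<String> items, String target) {
        return IntStream.range(0, items.size()).parallel()
                .filter(i -> items.get(i).equals(target))
                .findAny()
                .orElse(-1);
    }

    public static void main(String[] args) {
        List<String> letters = Arrays.asList("B", "Z", "C", "A", "L", "K", "A", "H");
        // Only one "H" exists, so both methods must return the same index.
        System.out.println(firstIndex(letters, "H")); // 7
        System.out.println(anyIndex(letters, "H"));   // 7
    }
}
```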
Maybe it's just me who doesn't really see the big picture for collectors. I'm perfectly content with the built-in collectors like:
It's easy to see what they do, and work out when you need to use them.
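The original list of collectors isn't reproduced here, but as an illustration, these are three of the straightforward built-in collectors from java.util.stream.Collectors (toList(), toSet() and counting()), applied to the same letters used earlier. The wrapper class and method names are just for this sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class BuiltInCollectorsDemo {

    // toList(): gather the (filtered) elements into a List, keeping encounter order
    static List<String> withoutZ(List<String> letters) {
        return letters.stream()
                .filter(s -> !s.equals("Z"))
                .collect(Collectors.toList());
    }

    // toSet(): gather into a Set, so the duplicate "A" appears only once
    static Set<String> distinctLetters(List<String> letters) {
        return letters.stream().collect(Collectors.toSet());
    }

    // counting(): count the elements (same answer as stream().count())
    static long countLetters(List<String> letters) {
        return letters.stream().collect(Collectors.counting());
    }

    public static void main(String[] args) {
        List<String> letters = Arrays.asList("B", "Z", "C", "A", "L", "K", "A", "H");
        System.out.println(withoutZ(letters));               // [B, C, A, L, K, A, H]
        System.out.println(distinctLetters(letters).size()); // 7
        System.out.println(countLetters(letters));           // 8
    }
}
```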
I'm also very happy to have discovered a super-useful way to create Comma Separated Values (CSVs) that I use in my Java 8 demo.
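The post doesn't name the collector here, but one built-in collector that produces comma-separated output is Collectors.joining. This is a minimal sketch of using it, not necessarily the code from the demo. Note that it doesn't quote or escape values that themselves contain commas, so it's only suitable for simple values.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CsvDemo {

    // Turns a list of values into one comma-separated line.
    // No quoting/escaping: values containing commas would break the format.
    static String toCsvLine(List<String> values) {
        return values.stream().collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(toCsvLine(Arrays.asList("B", "Z", "C", "A"))); // B,Z,C,A
    }
}
```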
Where things get a bit murky for me is where we start chaining up collectors:
(it should be obvious from my lack of clear example that I'm not 100% certain under which circumstances these are useful).
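For anyone else wondering what "chaining up collectors" looks like, here's a sketch using downstream collectors: groupingBy takes a second collector (counting(), or mapping() wrapping joining()) that it applies to each group. This is an illustrative example, not one from the course. The TreeMap::new argument is only there so the keys come out sorted, and the word list is made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ChainedCollectorsDemo {

    // groupingBy with a downstream collector: count how often each letter appears.
    static Map<String, Long> countByLetter(List<String> letters) {
        return letters.stream()
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    // A deeper chain: group words by length, upper-case each word,
    // then join each group into a comma-separated string.
    static Map<Integer, String> wordsByLength(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(
                        String::length,
                        TreeMap::new,
                        Collectors.mapping((String w) -> w.toUpperCase(), Collectors.joining(","))));
    }

    public static void main(String[] args) {
        List<String> letters = Arrays.asList("B", "Z", "C", "A", "L", "K", "A", "H");
        System.out.println(countByLetter(letters)); // {A=2, B=1, C=1, H=1, K=1, L=1, Z=1}
        System.out.println(wordsByLength(Arrays.asList("java", "lambda", "stream", "mooc")));
        // {4=JAVA,MOOC, 6=LAMBDA,STREAM}
    }
}
```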
As a group, we think chained collectors are kinda ugly. That's not because we're against chaining (we like Streams), but maybe because it's another chain inside a parameter to a chain.
We think this is an area where some good, solid examples and a bit of daily use will make it much clearer to developers. We hope.
Related to this, the course didn't go into creating your own collectors at all. My personal (under-informed) opinion is that most developers should be able to use either the out-of-the-box collectors (toList etc.) or collector chaining to build what they need. If you need a custom collector, perhaps you haven't considered everything that's already available to you. But as a group, we decided we would have liked to see this topic covered anyway, so that we could get a deeper understanding of what collectors are and how they work.
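For a feel of what writing your own collector involves, this sketch uses Collector.of to build a home-made equivalent of Collectors.joining(",") from the four functions a collector is made of: a supplier (create the mutable container), an accumulator (add one element), a combiner (merge two partial results, which matters for parallel streams) and a finisher (turn the container into the result). The method names are invented for the example, and in real code you'd simply use the built-in joining collector.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;
import java.util.stream.Collector;

public class CustomCollectorDemo {

    // A hand-rolled comma-joining collector, built from its four parts.
    static Collector<CharSequence, StringJoiner, String> commaJoining() {
        return Collector.of(
                () -> new StringJoiner(","), // supplier: a fresh mutable container
                StringJoiner::add,           // accumulator: add one element
                StringJoiner::merge,         // combiner: merge partial results (parallel)
                StringJoiner::toString);     // finisher: produce the final String
    }

    static String join(List<String> values) {
        return values.stream().collect(commaJoining());
    }

    public static void main(String[] args) {
        System.out.println(join(Arrays.asList("B", "Z", "C"))); // B,Z,C
    }
}
```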
Exercises for lesson 3:
Well. What can we say? I really hope there are people reading this who haven't finished the course yet, because the Sevilla Java User Group would like to say to you: don't despair, the lesson 3 exercises are substantially harder than those for lessons 1 and 2. Honestly, the whole group considered it less of a learning curve and more of a massive cliff to climb.
I mean, it was great to have something so challenging to end on, but it probably would have been less ego-destroying if we could have got up to that level gradually instead of having it sprung on us.
The good thing about Part 2 of the lesson 3 exercises was that we had three very different answers to discuss in the group. None of us were super happy with any of them, but we could see definite pros and cons of each approach, and that's something you really want to learn in a course like this.
It was also really great to have a rough performance test to run on your own computer, so that you could really see the impact of your choices on the performance of the stream.
For more info
I'm going to add a shameless plug for a friend's book here. I've been reading a lot about Java 8 for this course, for my Java 8 demo, and to generally get up to speed. My favourite book for getting to grips with lambdas and streams is Java 8 Lambdas: Pragmatic Functional Programming by Richard Warburton. The book also covers collectors in more depth, so maybe some of our questions about how to use them in more complex situations are answered there.
We really enjoyed the MOOC, and the sessions to get together to discuss it. We particularly liked that the meetups were a safe place to ask questions and discuss alternative solutions, and that we weren't expected to be genius-level experts in order to participate fully.
If you didn't get a chance to take part this time, I highly recommend signing up if/when Oracle re-runs the MOOC. And if you can find (or run) a local meetup to discuss it, it makes the experience much more fun.