Intro to Ray

Published 11-03-2019 18:27:14

For as long as I can remember, I learned things by making analogies with either real or imaginary setups. I applied this mainly to what I decided to pursue as a career, software engineering. Abstract concepts would in my head usually get translated to “Oh it’s just like that thing in football!”, “Oh it’s just like airports” and so, now I’ve decided I should give it a go at writing things down.

As I’ll aim to learn tools and frameworks that I find interesting, I’ll put them in the context of nothing else, but my favourite fantastical world ever, Harry Potter. I’ll start with a distributed computing framework developed by the people behind RISE labs in UC Berkeley, called Ray. Then I’ll move onto 2 libraries built on top of Ray, Tune and Rllib. I’ll maintain this blog mainly for my own reference, but if others find it useful, that’s even better! Hope the series will be interesting, useful and if all fails, at least funny!

Ray - The Burrow

I’m bringing Ray into the Burrow – the cosiest famous residence of the Weasley’s. All objects in Burrow are animated – they were all given some tasks and they’re executing them at the same time.

What Ray does is it helps when you want to run a lot of repetitive tasks in parallel. You can either run them in parallel on a single machine across its cores or on a cluster of machines. It relies on a series of concepts to schedule all the work and keep track of its progress. I like to start with the bigger picture and then drill down into the details. This will spread out across multiple posts. So let’s map the concepts to the Burrow.

Let’s presume cooking dinner for the 7 Weasleys, plus guests, requires really complex computations and scheduling, where multiples utensils in the kitchen need to work in parallel and perform different tasks.

In our case, the dinner is the application we want to execute, and the way Ray represents this application is as a graph of dependent tasks (e.g. you need to chop the carrots before adding them to the boil). Moreover, this graph is dynamic, so it evolves during execution (e.g. you’re performing more steps from the recipe as you go and they get added to the graph as you perform them).

Fig 1. Diagram of tasks and actors - the blue parts represent independent spells (taks) while the green sides are copies classes (actors) that preserve state between their spell invocations.

To introduce the next part, we’ll briefly define the difference between stateless and stateful computations. A computation is stateful, if we need to store previous interactions, like keeping a record of the spells we casted before. On the opposite side, as you’d expect, stateless computations are independent, they only care about the data they currently hold, nothing from the past. Hence, in Ray they can be executed on any node in the system. We’ll find out later what nodes are.

So let’s try and define now Ray’s two main programming abstractions: tasks and actors (Fig. 1). What do they mean and when would you want to use one or the other?

Tasks

A task is like casting a standalone spell. It is the execution of a remote function on a stateless worker. You might wonder what workers are, we’ll come to those in a bit as well. All the data the spell needs comes to it as you cast it. When you cast it, imagine you immediately receive a promise, called future, that this spell will eventually execute and return the result we wanted. As we all know, that’s not always the case and the spell might fail, returning an error result. This helps account for possible faults along the way - between casting the spell and obtained the desired result back.

Alt text for my gif

Actors

An actor allows stateful computations. It has a series of spells (methods) that can be invoked remotely and executed serially. This means all spells share a mutable actor state. They all belong to one Python class. You can pass a handle to an actor, so that other actors or tasks can cast spells on it. In this case, methods called on different actors execute in parallel.

In dinner prep terms, if I cast the boiling spell on water in 10 pans, they’ll start at the same time in parallel. If I cast the boiling spell 10 times on the same pan, all the water in the pan shares the same state, so it’ll gradually increase the temperature more and more with each new boiling spell I cast.

@ray.remote(num_gpus=1)
class Pan(object):
	def __init__(self):
		self.temperature = 0
	
	def fervere(self):
		self.temperature += 10
		return self.temperature

# create 10 Pan actors
pans = [Pan.remote() for _ in range(10)]
cast_fervere_on_all_pans = ray.get([pan.fervere.remote() for pan in pans])
# prints [10,10,10,10,10,10,10,10,10,10]
cast_ferevere_on_one_pan = ray.get([pans[0].fervere.remote() for _ in range(5)]
# prints [10,20,30,40,50]

References