Wednesday, September 5, 2007

Batch scheduling - grid computing

One of the recent problems we have been looking at is how to correctly schedule batches of tasks in grid computing where the batches consist of multiple synchronization barriers. For instance, REM (Replica Exchange Management) from the field of bio-complexity sends out a set of N tasks that are synchronized X times (the replica exchanges) over the course of the simulation. In short, at each synchronization point, all N replicas must finish their respective computation, and then a small subset of the data is exchanged to help drive the next set of simulations. Loss tolerance in an (m,k) sense (m out of k tasks must finish) varies by application, but our current bio-complexity group does not allow for it in their batches.
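To make the barrier structure concrete, here is a minimal simulation sketch in Python; the names and numbers (N_REPLICAS, SYNC_POINTS, host speeds) are hypothetical illustrations rather than anything from REM itself. The point is that the makespan of the batch is driven by the slowest replica at each of the X barriers:

import random

N_REPLICAS = 16           # N tasks sent out per batch (hypothetical)
SYNC_POINTS = 10          # X synchronization barriers (hypothetical)
WORK_PER_SEGMENT = 100.0  # nominal work units between barriers

random.seed(0)
# Heterogeneous grid: each replica lands on a host with its own speed.
host_speeds = [random.uniform(0.5, 1.0) for _ in range(N_REPLICAS)]

makespan = 0.0
for _ in range(SYNC_POINTS):
    # All N replicas must reach the barrier before data is exchanged,
    # so each segment costs as much as its slowest replica's run.
    makespan += max(WORK_PER_SEGMENT / speed for speed in host_speeds)

print(f"barrier-synchronized makespan: {makespan:.0f} work units")

Each additional barrier compounds the straggler penalty, which is exactly what the catchup-cluster approach described next tries to attack.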

Currently, the state of the art seems to be employing a catchup cluster of extremely fast machines that notices lagging execution hosts and migrates their sub-tasks to the faster catchup cluster. At first glance, this appears to be a rather brute-force mechanism for improving performance. While it is hard to argue that it has no benefit (who wouldn't benefit from having an idle cluster of fast machines?), there is interesting theory to examine in what the optimal schedule would be, and a real tradeoff in the opportunity cost of dedicating the catchup cluster. Moreover, there are practical concerns regarding job migration (network transfer time) that must be captured to model the tradeoffs correctly. Toss in heterogeneous job execution lengths (due to parameters), heterogeneous processing capacity across the grid, and the potential for job failure (eviction, node crashes, etc.), and it all gets complicated fairly quickly.
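The core migration decision can be captured with a back-of-the-envelope break-even test: moving a laggard to the catchup cluster only pays off if the checkpoint transfer time is recouped by the faster execution. A sketch with entirely hypothetical parameters:

def should_migrate(remaining_work, slow_speed, fast_speed,
                   checkpoint_bytes, bandwidth_bytes_per_sec):
    # Break-even test: migrate a lagging sub-task only if transfer
    # time plus the remaining work on the fast host beats simply
    # finishing in place on the slow host.
    finish_in_place = remaining_work / slow_speed
    finish_after_move = (checkpoint_bytes / bandwidth_bytes_per_sec
                         + remaining_work / fast_speed)
    return finish_after_move < finish_in_place

# Example: 50 work units left, a 2x faster catchup host, and a 1 GB
# checkpoint over a 100 MB/s link: 10s transfer + 25s compute < 50s.
print(should_migrate(50, 1.0, 2.0, 1e9, 100e6))  # True

A real scheduler would also need to fold in the failure risk and the opportunity cost of occupying a catchup slot, but even this crude test shows where the brute-force approach can lose.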

It is a fascinating problem, as it has roots in both grid computing and real-time computing / scheduling (my initial graduate school work). Most interesting is that I think it can draw from some of the properties of the multiprocessor EDF (Earliest Deadline First) bound-estimation work (one such bound is sketched below), and it may or may not need to employ heuristic-based schemes à la the original Spring kernel scheduling approach. Comments are of course welcome if anyone has any work of note in this particular area.
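For reference, the flavor of multiprocessor EDF bound I have in mind is the Goossens-Funk-Baruah sufficient test: a periodic task set is schedulable under global EDF on m identical processors if U_sum <= m * (1 - u_max) + u_max, where U_sum is the total utilization and u_max the largest per-task utilization. A minimal sketch in Python (the task set is a made-up example):

def gfb_edf_schedulable(utilizations, m):
    # Goossens-Funk-Baruah sufficient (not necessary) test for
    # global EDF on m identical processors.
    u_sum = sum(utilizations)
    u_max = max(utilizations)
    return u_sum <= m * (1 - u_max) + u_max

# Made-up task set: per-task utilization = execution time / period.
print(gfb_edf_schedulable([0.4, 0.4, 0.3, 0.3], m=2))  # True: 1.4 <= 1.6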
