
Mor Harchol-Balter is interested in the performance analysis and design of computer systems, particularly distributed systems. She uses analytical models to capture the important characteristics of a computer system, then proves theorems about these models and uses the results to redesign the system for better performance. Here "performance" might mean response time, energy use, throughput, capacity, etc. Most of her research involves inventing new analytical techniques for performance analysis, as well as new algorithms for resource allocation.

Unlike much work in theoretical computer science, her analysis is based on stochastic (probabilistic) models of computer systems. There is no "adversary" sending worst-case inputs. Instead, there is a stream of requests whose arrival times and service demands come from empirically fitted distributions. These distributions may be correlated, and they often exhibit heavy-tailed service demands and high variability.
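
To get a feel for what "heavy-tailed" and "high variability" mean here, a minimal sketch (assuming Python with NumPy, and an illustrative Pareto shape parameter of 1.5 chosen only for demonstration) compares exponential and Pareto job-size samples with the same mean:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Exponential job sizes with mean 1 (the textbook default assumption).
expo = rng.exponential(scale=1.0, size=n)

# Heavy-tailed (Pareto) job sizes with shape alpha = 1.5 and the scale xm
# chosen so that the mean is also 1:  mean = alpha*xm/(alpha-1)  =>  xm = 1/3.
alpha = 1.5
xm = (alpha - 1) / alpha
pareto = (rng.pareto(alpha, size=n) + 1) * xm   # classic Pareto(alpha, xm)

for name, x in [("exponential", expo), ("Pareto(1.5)", pareto)]:
    c2 = x.var() / x.mean() ** 2                # squared coefficient of variation
    print(f"{name:12s}  mean = {x.mean():5.2f}   C^2 = {c2:9.1f}   "
          f"max = {x.max():12.1f}")
```

The exponential sample has a squared coefficient of variation near 1, while the Pareto sample's empirical C^2 is huge and keeps growing with the sample size, since a Pareto with shape 1.5 has infinite variance: a tiny fraction of jobs accounts for most of the total work.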

She believes that many pieces of conventional wisdom on which we base computer system designs are not well understood and are sometimes simply false, leading to inferior designs. Her research revisits classic questions in system design. Here are a few examples of commonly held beliefs that her research challenges:

  • Thousands of “load balancing” heuristics do exactly that—they aim to balance the load among the existing hosts. But is load balancing necessarily a good thing?
  • Ever notice that the optimal scheduling policy, Shortest-Remaining-Processing-Time-First (SRPT), is rarely used in practice? There is a fear that big jobs will "starve", or be treated unfairly compared with Processor-Sharing (PS). But is this fear justified?
  • To minimize mean waiting time in a server farm, research suggests that each job should be sent to the server at which it will experience the least wait. That seems good from the job's perspective, but is the greedy strategy best for the system as a whole?
  • Given a choice between a single machine with speed s, or n identical machines each with speed s/n, which would you choose? Think again ... (see the sketch after this list).
  • Migrating active jobs is generally considered too expensive. Killing jobs midway through execution and restarting them from scratch later is even worse! Says who?
  • Cycle stealing (using another machine's idle cycles to do your work) is a central theme in distributed systems and has been the topic of thousands of papers. But can we quantify when cycle stealing pays off, as a function of switching costs and threshold parameters?
  • Redundancy is the idea of making multiple copies of the same job, where each copy is dispatched to a different queue, and where one only needs to wait for the first copy to complete. When does redundancy help?
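
The "one fast machine versus n slow machines" question has a crisp quantitative answer under classical assumptions. The sketch below is a hypothetical illustration, assuming Poisson arrivals and exponentially distributed job sizes with mean 1, and n = 4 chosen only for demonstration; it compares mean response time under the standard M/M/1 and M/M/n (Erlang-C) formulas:

```python
import math

def mm1_response_time(lam, speed):
    """Mean response time of an M/M/1 queue: one server of the given speed,
    Poisson arrivals at rate lam, exponential job sizes with mean 1."""
    mu = speed                        # service rate = speed / mean job size (= 1)
    assert lam < mu, "queue must be stable"
    return 1.0 / (mu - lam)

def mmn_response_time(lam, speed, n):
    """Mean response time of an M/M/n queue: n servers, each of speed speed/n,
    so the total capacity matches the single fast server."""
    mu = speed / n                    # per-server service rate
    assert lam < n * mu, "queue must be stable"
    a = lam / mu                      # offered load in Erlangs
    rho = a / n
    # Erlang-C probability that an arriving job must wait in the queue.
    top = a**n / (math.factorial(n) * (1.0 - rho))
    bottom = sum(a**k / math.factorial(k) for k in range(n)) + top
    p_wait = top / bottom
    return p_wait / (n * mu - lam) + 1.0 / mu

if __name__ == "__main__":
    speed = 1.0
    for lam in (0.3, 0.7, 0.9):       # light, moderate, heavy load
        t1 = mm1_response_time(lam, speed)
        t4 = mmn_response_time(lam, speed, n=4)
        print(f"load {lam:.1f}:  one fast server E[T] = {t1:5.2f}   "
              f"four slow servers E[T] = {t4:5.2f}")
```

Under these exponential assumptions the single fast machine wins at every load level. With highly variable, heavy-tailed job sizes and first-come-first-served service, the comparison can look very different, since one enormous job can block the lone fast server; that is part of why the question is worth revisiting.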
Office: 7207 Gates and Hillman Center
Phone: 412.268.7893
Fax: 412.268.5576
Email: harchol@cs.cmu.edu
Assistant: Patricia Loring
Google Scholar: Mor Harchol-Balter
Website: Mor Harchol-Balter's website