Saturday, October 26, 2019

Scheduler Choice in Cluster Environment

Scheduler Choice in Cluster Environment Clusters have become more popular and ubiquitous and the number of processors in cluster have also increased considerably. They consist of collection of a homogeneous machines or a host of diverse computational devices which collaborate via a high speed network to execute high-performance applications. Computer industry has widely accepted that future performance increases must largely come from increasing the number of processing cores on a die. This has led to NoC processors. Efficient scheduling of high performance applications on these parallel computing systems is critical to enhance their performance and to improve system throughput. It has been proved that the problem of scheduling tasks with precedence constraints is NP-Complete [Papad, 1994]. The data flow model is gaining popularity as a programming paradigm for parallel computers. Many high-performance applications are a collection of modules which have control/data dependences among them. When the characteristics of an application is fully deterministic, including tasks execution time, size of data communicated between tasks, and task dependencies, the application can be represented by a Directed Acyclic Graph (DAG). With an increase in the number of processing units, expressing parallelism of an application has become a major challenge. Many studies have proved that designing parallel applications using both task and data parallelism is an effective approach than pure data or pure task parallel models. This mixed parallelism achieves both higher scalability and performance. Mixed parallel applications are represented as Parallel Task Graph (PTG), a graph of data parallel tasks. Understanding the importance of task scheduling on a parallel system, an attempt is made to address issues in scheduling multiple applications with the objectives of enhancing the performance of individual applications and also increasing the throughput the parallel computing system. In this thesis, we introduce two new algorithms Level Based Scheduler (LBS) and Improved Level Based Scheduler (ILBS) to schedule parallel applications represented as parallel task graph onto a cluster of multi-core processors with the objective of reducing their completion time. Both algorithms can be used both as static or hybrid schedulers. We argue that hybrid scheduler is a good scheduler choice in a cluster environment to optimize the utilization of its resources. We state that a better way to deal with multiple applications on a cluster is through adoptive space-sharing approach with a promise to benefit both the user and the cluster administrator. In a space-sharing approach, each application is given a set of processors and it is executed on these processors only. A parallel application can be run on a varied number of processors i.e. a moldable job. Hence we argue that it is good to change processor allotment for executing applications depending on the workload on cluster. To perform initial processor allotment and subsequent adaptations if required, methods to find the optimal and maximal number of processors that an application can utilize are developed. Also a novel method to share available processors among multiple competing task graphs is proposed. A framework is developed to bring together the proposed hybrid schedulers, methods to find processor requirement of each application, the scheme to share processors among multiple applicat ions and a new policy to decide processor allotment for each submitted application. Approaches to improve scheduling on a NoC processor is attempted. An approach to make any list scheduling method more time efficient to schedule a task graph on NoC is proposed and experimented. To schedule multiple applications on NoC, the number of cores and which cores to be assigned for each application must be decided. Our belief is that this job of deciding number of cores can be better performed by the joint collaboration of the user and system instead of any one doing it alone. Hence we have developed methods to find the optimal and maximal block of cores that an application can utilize which is later used to decide the actual core allotment for each application. Policies to decide how many and which cores to be assigned for each application are suggested. All the experiments in this thesis are carried out using a discrete event simulator. Benchmark task graphs are taken from different sources, from where other researchers have taken to compare their scheduler performance. The metrics makespan and efficiency of the schedule are used. The developed LBS is compared with MCPA the most widely accepted good scheduler and EMTS the recent PTG scheduler are chosen for performance comparison. The benchmark suite includes regular task graph, random task graph and few real applications task graph. For regular task graphs LBS shows in improvement in makespan by 2-9% in comparison to MCPA. But for irregular PTGs, LBS shows 4-12% performance improvement over MCPA, which is significantly higher than for regular PTGs. Since EMTS uses evolutionary methods, it generates better schedule but at the expense of more computing time. The proposed LBS performance is inferior to EMTS by around 2-7% and 2-4% for regular and random PTGs respectively. Another metric used is the efficiency which is a measure of effective utilization of resources. The efficiency of LBS is more than MCPA, but the improvement is less than that for makespan. This is attributed to the fact task allocation in MCPA leads to better utilization of processors than in L BS. Efficiency of LBS is more than MCPA by 1-3% and less than EMTS by 1-2%. Another scheduler ILBS is compared with LBS and TwoL[rauber 1998], a good method to schedule set of independent tasks. ILBS exhibits performance improvement of 2-7% over LBS and 2-10% over TwoL for regular PTGs. For random PTGs improvement is 6-12% over LBS and 4-8% over TwoL. The increased performance of ILBS for regular PTGs is attributed to the method of finding of the best possible schedule at each level. The performance of the proposed novel method of sharing processors among multiple task graphs is compared with the most recent methods suggested by Tapke et al. The new method exhibited a performance improvement of 6-9% for all categories of task graph and is maximum when the demand for the processors is relatively more than available processors. A complete framework is developed to tailor together the pieces of work carried out. The new policies suggested to decide processor allotment for each task graph show 4-7% performance improvement in average completion time of a task graph. The proposed policy also exhibits better performance for the time required to complete a set of task graphs by 4-7%. Thus the new policy is good from both user and system perspectives. The approach to make list scheduling method more time efficient to generate a schedule for a NoC processor is implemented in DLS[] method and it recorded around 20-45% improvement in execution time. The time is recorded by executing the application on the cycle accurate multi2sim simulator. The new policy proposed to decide the cores allotment for each application performs better than the best methods found in the literature by 4-20%. The issues in scheduling multiple applications on a cluster of multi-core processors and a NoC processor is addressed in this thesis. The observed performance improvement indicate the usefulness of proposed methods.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.