lanxicy.com

µÚÒ»·¶ÎÄÍø ÎÄµµ×¨¼Ò

µÚÒ»·¶ÎÄÍø ÎÄµµ×¨¼Ò

Future Generation Computer Systems 28 (2012) 104¨C111

Contents lists available at SciVerse ScienceDirect

Future Generation Computer Systems

journal homepage: www.elsevier.com/locate/fgcs

Feedback-based optimization of a private cloud

Hamoun Ghanbari a,b,? , Bradley Simmons a , Marin Litoiu a , Gabriel Iszlai b

a b

Department of Computer Science and Engineering, York University, 4700 Keele St, Toronto, Ontario, M3J 1P3, Canada Centre for Advanced Studies (CAS), IBM Toronto Lab, 8200 Warden Avenue, Markham, Ontario, L6G 1C7, Canada

article

info

abstract

The optimization problem addressed by this paper involves the allocation of resources in a private cloud such that cost to the provider is minimized (through the maximization of resource sharing) while attempting to meet all client application requirements (as specified in the SLAs). At the heart of any optimization based resource allocation algorithm, there are two models: one that relates the application level quality of service to the given set of resources and one that maps a given service level and resource consumption to profit metrics. In this paper we investigate the optimization loop in which each application¡¯s performance model is dynamically updated at runtime to adapt to the changes in the system. These changes could be perturbations in the environment that had not been included in the model. Through experimentation we show that using these tracking models in the optimization loop will result in a more accurate optimization and thus result in the generation of greater profit. ? 2011 Elsevier B.V. All rights reserved.

Article history: Received 21 January 2011 Received in revised form 11 May 2011 Accepted 28 May 2011 Available online 6 June 2011 Keywords: Optimization Modeling State estimation Private cloud PaaS IaaS

1. Introduction Advances in virtualization [1] techniques and the construction of numerous large commodity data centers around the world [2] have resulted in a new approach to computing referred to as cloud computing [3,2,4,5] becoming an important topic of research and development. The cloud, though still in its infancy, typically refers to some set of computing resources (infrastructure (IaaS), platform (PaaS) or software (SaaS)) being provided on demand over the Internet to users as a service. A private cloud represents a set of virtualized data centers under the ownership of a single administrative domain (i.e., the cloud service provider). Unlike in a public cloud where the various layers may be offered by multiple providers the entire stack (IaaS,PaaS and SaaS) is controlled by a single provider and so it has access and control over the various applications, middlewares and infrastructure simultaneously. The main objective of a private cloud provider is to maximize profit (i.e., revenue¨Ccost). Optimization techniques allow the provider to determine resource

allocations to various clients in order to best maximize its revenue while minimizing its costs.1 Due to these economic benefits, optimization has been the subject of much investigation [11¨C18]. The decisions made by a provider with regards to deployment of application tiers in the cloud and resource allocation to application environments can be enforced through scale-up/down (i.e., adding/removing resources to individual virtual machines (VM)), scale-out/in (i.e., adding/removing VMs to an application environment), and migration (i.e., moving VMs over the physical infrastructure) and will directly impact both the performance of an application and the provider¡¯s cost of operations. Here we focus only on scale-up/down. The optimization problem addressed by this paper involves the allocation of resources in a private cloud such that cost to the provider is minimized (through a maximization of resource sharing) while attempting to meet all client application requirements as specified in their respective Service Level Agreements (SLA)s2 [19¨C22] (see Fig. 1). Many of the current optimization

? Corresponding author at: Department of Computer Science and Engineering, York University, 4700 Keele St, Toronto, Ontario, M3J 1P3, Canada. Tel.: +1 416 265 7585. E-mail addresses: hamoun.gh@gmail.com, hamoun@cse.yorku.ca (H. Ghanbari), bsimmons@yorku.ca (B. Simmons), mlitoiu@yorku.ca (M. Litoiu), giszlai@ca.ibm.com (G. Iszlai). URLs: http://www.ceraslabs.com/people/hamoun-ghanbari (H. Ghanbari), http://www.ceraslabs.com/people/dr-bradley-simmons (B. Simmons), http://www.ceraslabs.com/people/dr-marin-litoiu (M. Litoiu).

0167-739X/$ ¨C see front matter ? 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.future.2011.05.019

1 Notice that in a layered cloud, optimization is decomposed into a dynamic infrastructure pricing mechanism offered by provider [6,7] and elastic resource allocation policies employed by individual consumers [8¨C10] to satisfy their QoS requirements. 2 An SLA is a contract which defines the relationship between a service provider and its clients that fully specifies all obligations for both parties, the price to be paid for the service(s) offered and associated penalties should obligations be unmet. It can be quite complex and comprehensive (e.g., considering aspects of both functional and non-functional requirements); however, in this work, only performance objectives that can be extracted from an SLA are considered. No attempt is made to fully model or develop an SLA or an SLA management framework.

H. Ghanbari et al. / Future Generation Computer Systems 28 (2012) 104¨C111

105

Fig. 1. The interaction of layers in our optimization mechanism.

Fig. 3. Smooth service level utility function; vertical line indicates the servicel level objective of an application (as defined in SLA).

say pi , has one resource, and thus has a fixed capacity ci in one-dimensional space.3 The allocation of VMs on PMs can be represented by an n ¡Á m matrix A. Each element aij of A denotes a resource allocation defining the percentage of the total resource (i.e. CPU) capacity of the PM i allocated to a running VM of application j. 2.1. Problem formulation A global utility function U0 is expressed as the difference of the sum of application-provided resource-level utility functions and an operating cost function as follows:

?

U0 =

? ?

j¡ÊApp

uj (sj (Aj ))

? ¦Ø.cost (A)

(1)

Fig. 2. Feedback based optimization of resource shares.

approaches, while efficient, assume static models [11,23,24,12¨C18]. In this paper we attempt to solve the optimization problem through a feedback-based loop with dynamic models. The resulting optimizer will be compared to one using static models to demonstrate the benefits of this proposed approach. The remainder of this paper is structured as follows: Section 2 describes our feedback based approach for optimization of resource shares in a private cloud. This involves the introduction of general formal definitions used during problem formulation and the description of the estimator component of the feedback loop. Section 3 explains the optimizer component of the loop. The applicability of the proposed approach is demonstrated by the case studies in Section 4. Related work, conclusions and future works are discussed in Sections 5 and 6. 2. Proposed approach We propose a feedback based Cloud Optimization Manager (COM) as shown in Fig. 2. In this approach, each application maintains both a dynamically updated performance model (see Section 2.3) and a utility model (which defines a specific service level objective e.g., response time). The COM has access to the utility model of each application. Periodically, each application submits its performance model to the COM which performs a system-wide global optimization (see Section 3) using this information and determines new resource allocations for each application for the following period. Consider applications app1 , . . . , appm . Each application runs within one or more VMs and experiences a particular workload. Assume that there are n physical machines (PM)s in the data center, represented by the set P = p0 , . . . , pn . VMs are hosted on PMs on behalf of applications. Let us assume that the physical server environment is homogeneous and each physical machine,

where ¦Ø denotes an adjustable weight (working as a tunable parameter for the administrator), sj denotes the service level function which maps the application j¡¯s resource allocations (i.e. Aj ) to the service level measure (e.g. response time) of the application, uj is the local utility function for application j, A represents the allocation matrix of VMs on PMs (defined earlier), and Aj represents the j¡¯th column of A. U0 can be associated with the profit of the cloud and the two terms of Eq. (1) represent the revenue and the cost respectively. Our objective is to maximize U0 subject to a set of capacity constraints which come from the physical layer of the private cloud. The optimization problem addressed here can be expressed as follows: maximize: U0? subject to: ?i

? ?

j¡ÊApp

aij < ci

(2)

?i ?j (aij ¡Ê [0, ci ]).

It is assumed that each allocation signal aij is constrained to lie in the interval [0, ci ] meaning that an application can get the whole capacity of a PM. Notice that the problem has a best effort nature and we treat a service level objective (i.e. target on a specific QoS metric) as a soft constraint by incorporating it into the objective function. Fig. 3 represents a sample service level utility function where the vertical line indicates the SLA target of an application and utility decreases as the value of sj approaches the SLA limit. It is worth noting that any decreasing differentiable function can be used instead. However, the shape of a function, especially after passing the SLA threshold, will impact the behavior of the optimization algorithm.

3 In the case of multi-resource modeling c can be substituted with c r , where c r i i i is the capacity of resource r of pi .

106

H. Ghanbari et al. / Future Generation Computer Systems 28 (2012) 104¨C111

2.2. Modeling QoS attributes based on resource allocation A performance model can be used to quantitatively relate the physical layer specification of a customer¡¯s application (e.g. its given resource shares) and its associated workload to a service level measure (e.g. response time). We assume that the service level function of each application j, referred to as sj , can be modeled by a function g with the column j of A (a1j , a2j , . . . , akj ) and ¦Ój , dj as parameters: sj = g (a1j , a2j , . . . , anj , ¦Ój , dj ) (3)

The filter maintains the estimate of Bj,n and updates it using the linear feedback equation based on new measurements ¦Õj,n and zj,n : Bj,n = Bj,n?1 + Kj,n (zj,n ? sj,n ) (5) where n denotes a discrete time index, Kj,n , the dynamically computed ¡®¡®Kalman Gain¡¯¡¯ matrix and zj,n ? sj,n denotes the prediction error.6 With certain assumptions, the calculated Kalman gain, Kj,n , is guaranteed to minimize the estimation errors for Bj,n . The updated state will result in a dynamic model f (¦Õj,n , Bj,n ) that will be accessed multiple times during optimization cycles. Function sj and its non-measurable and measurable parameters can have many representations (as shown in Section 4). 3. Optimization In this section, we solve the introduced optimization problem, as defined in Eq. (2), through primal decomposition. Algorithm 1 represents a centralized solution for this problem using the primal method. The equations referenced by the algorithm are listed in Table 1.

input : initial capacities A0 , kmax , performance model sj and utility model uj for each app j output : optimal allocation Aopt , Maximum utility fpbest initialize: fp = ?¡Þ, fpbest = ?¡Þ, A = A0 while k < kmax do 3 compute constraint value Ui for each PM using equation 1.

1 2 4 5 6 7 8

where aij is the number of allocated CPU cycles (i.e., measured in MIPS), ¦Ój denotes the request inter-arrival time in seconds associated with the workload of the application4 and dj is application j¡¯s hardware demand per request (i.e. the number of CPU cycles for each request to be processed). The model is further simplified ¡Æ by using the aggregate resource capacity measure ¦Ãj = i=1...n aij instead of the vector of individual values (a1j , a2j , . . . , anj ). We will assume that all applications deployed on the infrastructure adhere to the same performance model differing only in the setting of various configurable parameters. While there are several types of models (the model we use is described fully in Section 4) the onlineestimation techniques and optimization method are general and will be considered next in Sections 2.3 and 3. 2.3. Online estimation In general the goal of any optimal control approach is to choose a sequence of feasible control actions that maximizes a defined performance criterion (or objective function).5 Since calculating the cost and performance criterion involves projection of a system¡¯s behavior and state in the future, there is a need to have a model of the system over time. This model can be either deterministic or stochastic, and continuous-time, discrete time, or steady-state. Adaptive models are used to dynamically take into account the system differences and re-tune the model. In adaptive models, parameter estimates of a model are bootstrapped and improved iteratively over time as more data is revealed. This is different from offline construction of the model using training data in supervised learning. In adaptive models the process of estimation (i.e. finding the most likely hidden (or non-measurable) state sequence given a model and an observation sequence) and model identification (i.e. finding the probabilistic relation state variables over time) are mixed together to form an unsupervised learner. The Kalman filter [25] is a well-known parameter estimator of this type which can compensate for system differences from the base model at runtime for linear dynamical systems. In the algorithm presented in the next section we performed dynamic tuning of the models sj using an extended Kalman filter, a variant of the Kalman filter for a non-linear measurement equation. The filter is able to take into account the measurements during runtime and update the model. Focusing only on one application called j, let Bj,n be a vector representing the model coefficients and ¦Õj,n = [¦Ãj,n , ¦Ój,n , dj,n ] be the measured inputs of the system (i.e. capacities, inter-arrival time and demand) at step n. Let zj,n be the measured system output (e.g. response time) at step n. The model maps ¦Õj,n and Bj,n to the modeled output sj,n : sj,n = f (¦Õj,n , Bj,n ) (4)

if no constraint is violated, move towards optimum: (max(Ui (A)) > 0) then compute global utility function using equation 2. record the global maxima using equation 3. calculate objective function¡¯s subgradient using equation 4. move towards subgradient and the optimum using equation 5. else if there is some violated constraint then find most violated constraint using equation 7. compute the subgradient value of most violated constraint using equation 8. select step size based on equation 9. move away from subgradient and the optimum using equation 6. project each individual variable based on its local constraint using equation 10. k = k + 1;

9 10 11

12 13

14

15 16

Aopt = A; Algorithm 1: A centralized solution for the problem using primal method. The equations referred to are presented in Table 1.

where f is a runtime approximation of the service level function (or performance model) g defined earlier.

The input to the algorithm includes the initial capacities (i.e., A0 ), the number of iterations during optimization (kmax ), the performance model (sj ) and utility model uj for each application. The output of the algorithm is the optimal allocation matrix Aopt , and maximum utility gained from that allocation, fpbest which is obtained iteratively using kmax iterations. To demonstrate how the optimization algorithm performs, we optimize a pre-built model of one physical machine hosting two applications, each running on a single VM. Both applications have CPU demands of 5100 and 4100 MIPS per request respectively and response time thresholds (i.e. rthrj ) of 16 and 7 respectively. The request interarrival time of both applications is 22 s, and their umax

4 s is usually the average response time of the application. j 5 One can instead say optimal control minimizes an expected total cost function.

6 Notice that s was calculated using Eq. (4). j,n

H. Ghanbari et al. / Future Generation Computer Systems 28 (2012) 104¨C111

107

Fig. 4. One sample run of subgradient optimization algorithm, finding the maximum utility using variations in capacity.

is 10. A single PM¡¯s CPU, which is used to host these applications, has a total capacity of 1200 MIPS. Assuming capacities a11 and a12 are the CPU allocation for VM1 and VM2 Fig. 4 shows the whole configuration space over these allocations, constrained by inequalities: a11 + a12 < 1200, a11 > 280, and a12 > 280. Notice how the optimizer climbs up the global utility function, hits the constraint (i.e., a11 + a12 < 1200) then continues to climb and reach the maximum. 4. Case studies For the case studies in this paper we simulated a private cloud with the CloudSim [26] simulator. It was assumed that the administrator responsible for configuring the feedback optimizer had access to enough information regarding the deployed applications so as to construct simple models for them.7 A data-set of a sample application¡¯s performance, was generated synthetically by invoking the CloudSim simulator using random values for capacity, demand and inter-arrival time (Fig. 5 presents a partial visualization of the data-set). Three steady-state models were considered: 1. A simple linear model only containing predictor variables8 (sj = ¦Â1 + ¦Â2 dj ? ¦Â3 ¦Ój ? ¦Â4 ¦Ãj ). 2. A more complex linear model with interaction terms9 ¦Ãj ¦Ój and ¦Ãj dj to capture the effect of ¦Ãj on response time at different arrival rates and demands (sj = ¦Â1 + ¦Â2 dj ? ¦Â3 ¦Ój ? ¦Â4 ¦Ãj ? ¦Â5 ¦Ãj dj + ¦Â6 ¦Ãj ¦Ój ). 3. A non-linear model derived from the mean value based formula for open queuing networks (R = D/(1 ? ¦ËD)) [27] extended to account for capacity (sj = ¦Ã ?¦Â1 dj /¦Ó ). 10 j 2 j j A regression analysis was then performed on the resultant data for each model allowing confidence intervals to be computed for all model coefficients. In all cases, the approximate 95% confidence

¦Â d

Fig. 5. Visualization of pairwise relation between input parameters (capacity,demand) and the response attribute (i.e. response time) for a simulated application running on a single VM.

intervals for coefficients were quite narrow and the margin of error was orders of magnitude smaller than the coefficient. This implies that the amount of data for modeling was adequate. However, in linear models the intervals were very near to zero and coefficients were not significant. In the end as a result of this analysis, the nonlinear model (i.e., 3) was used. We further considered a very simple local utility function for applications: uj = umaxj .

(arctan(rthrj ? rj ) + ¦Ð /2) ¦Ð

(6)

where umaxj denotes the maximum possible utility as defined by the curve, rthrj denotes the SLA limit (in terms of response time) and rj denotes the actual response time. 4.1. Case study 1 Consider a small cloud configuration with three applications (app1, app2, app3) deployed on two PMs with the following placement matrix:

7 Only a single application was used for which a model was designed. Multiple instances of this application were deployed. 8 The model is linear in the parameters ¦Â not predictor variables.

i

?

placement =

1 0

1 1

0 . 1

?

from the base formula, and adding model coefficients (i.e. ¦Â1 and ¦Â2 ) to compensate for the differences between the system and pure queuing model. The substitution requires the arrival rate (¦Ë) to be a homogeneous Poisson process, and inter-arrival times (¦Ó ) to be exponentially distributed with parameter ¦Ë (mean 1/¦Ë).

9 The interaction term is the product of the subset of our original predictors. 10 This equation can be derived by substituting D with d /¦Ã and ¦Ë with 1/¦Ó

j j j j

j

That is, app1 is deployed on PM1, app3 is deployed on PM2, and app2 uses both PMs by having a VM on each. Both of the PMs have single core CPUs with each core having a processing speed of 1200 MIPS and 2 GB of RAM. The VMs for applications are identical with image size of 1 GB, RAM of 512

108 Table 1 Equations referenced by Algorithm 1. Num 1 2 3 4 5 6 7 8 9 10 Equation Ui (A) = (

H. Ghanbari et al. / Future Generation Computer Systems 28 (2012) 104¨C111

Description

¡Æ

j¡Êapp aij ) ? ci

U0 = ( j¡ÊApp uj (sj (¦Ãj , ¦Ój , dj ))) ? k.cost (A) fpbest = max(U0 (A), fpbest ) g = ? U0 (A)/? A A = A + ¦Á g0 A = A ? ¦Á gi i = argmaxi (¡¬Ui (A)¡¬) ?k ?j{[(k = i ¡Ä placement (k, j) = 1) ¡ú (gi )kj = 1] ¡Ä [k ?= i ¡ú (gi )kj = 0]} ¦Á = (Ui (A) + ?)/(¡¬g ¡¬2 2) ?i ?j{aij = min(max(aij , 0), ci )}

¡Æ

A constraint for each PMi (i > 0) is the sum of all resources given to VMs deployed on that PM minus the PM¡¯s total capacity. The global utility function U0 . Incremental recording of objective value. Calculating the objective subgradient as if the problem were unconstrained. Moving towards the objective function subgradient with a fixed step size (i.e. optimality update). Projecting the current point onto the set of points that satisfy the inequality constraint i. Choosing the most violated constraint. Calculating the subgradiet gi of a constraint i (i.e. ? Ui (A)/? A) while i > 0. Calculating the step size in feasibility update, while ? > 0 is a tolerance and set to 10?3 . Check that each allocated capacity is feasible and is between the local bounds.

MB, and bandwidth of 1 GB. The CPU resource share for each VM is initialized to 280 MIPS but will be adjusted by the optimizer. At the PaaS layer applications are assumed to have workloads with mean CPU demand per request of 4100, 5700 and 500 MIPS respectively. For app2, the load gets distributed evenly between the two VMs. We chose the arrival rates of applications, from FIFA ¡¯98 workload [28]. The interarrival time of requests for all dynamic pages were extracted, on a per-minute basis, from the first six hours of day 21 of FIFA 98 workload. To have the time-of-day effect we divided this into three 2-h periods and applied each as an application workload. Moreover we multiplied interarrival-times of applications by 14, 11 and 8 respectively. The third application is more transactional in nature (i.e. lower demand and higher arrival rate) while the other ones are characterized as batch-like. The utility functions are defined for each application based on Eq. (6) with the following parameters for each application: rthr = [15, 16, 2] umax = [20, 10, 5]. For each 120 samples of interarrival times, we submit 20 transactions on each application in the simulator. We ran the experiment twice. One run used the static model with the coefficients computed offline as mentioned above. The second run used the same model but tuned at runtime. In the case of the tuned model, the new measurements obtained from the simulator (e.g. response times) are passed to the filter. The Kalman filter introduced in Section 2.3 re-tunes the dynamic model¡¯s coefficients and passes the model to the optimizer. In the case of a static model, the optimizer uses the model obtained through regression analysis as-is and no tuning is happening. The optimizer then recalculates the new resource allocations, and passes the new allocation vector to actuators to apply it to the system. Fig. 6 presents the results. The small mismatch of model and measurements due to the use of static models by the applications (see Fig. 6(a)) results in a bad allocation decision for app2, which leads to failure in meeting its SLA (see Fig. 6(e)). In contrast, using dynamic models results in more efficient resource allocations being made and better commitment to SLAs (i.e., response time for app2) (see Fig. 6(f)). 4.2. Case study 2 The second case study was performed to demonstrate the ability of our approach to successfully update resource shares in a way that compensates for additional workload. Ten applications were deployed on seven PMs. VMs were assigned randomly to PMs. All application workloads were set to have an interarrival time of 40 s except app1 whose workload was monotonically increasing (i.e., interarrival time was decreasing from 40 to 7 seconds). Fig. 7 presents both before (a) and after (b) snapshots of resource allocations. Notice that both app1 (dark blue) and app7

(yellow) are resident on the same PM. Initially, app1 is allocated approximately 300 MIPS while app7 is allocated approximately 310 MIPS. After the increase in workload app1 is allocated approximately 510 MIPS while app7 is allocated approximately 100 MIPS. 5. Related work Maintaining application level QoS guarantees has been the subject of much investigation. Recently satisfying the dual objectives of QoS guarantees and server consolidation in cloud computing environments has received considerable attention. Current approaches formulate the optimization problem in different forms. One approach attempts to minimize cost subject to performance constraints [12,13]. In this approach the SLA represents constraints rather than flexible goals. Constraints could be based on response time or throughput. For example [12] finds the minimum cost deployment subject to processing capacity and user throughput constraints. It seeks deployments which minimizes the overall cost of the hosts used, subject to meeting average delay and throughput constraints for each application as posed by its SLA. A second approach attempts to simultaneously minimize cost while maximizing QoS attributes, through multi-objective optimization or MOO [14]. For example, Pareto-optimal solutions can find a good trade-off between conflicting performance and cost-saving goals rather than finding a single global optimum [29]. Geometrically, these well-balanced solutions concentrate around the ¡®¡®knee¡¯¡¯ of a multi-objective curve. A third approach is the one that was used in this paper, that is, to optimize a utility function combining application-level SLAs and resource costs with tunable parameters for the administrator to specify trade-offs between the two [14]. In this approach a systemlevel global utility is defined in terms of local utilities which are in turn based on the achieved service level of the applications. These local utilities are combined with a set of coefficients that allows for the high level control of performance goal fulfillment and the resource cost savings. Finally, some techniques try to satisfy SLAs using a pure control-based feedback loop [11,23,24]. These approaches are categorized as control theoretic constraint satisfaction, rather than optimization; however, they are used in our estimation technique. Also the idea of dynamically adjusting the resource shares of multiple applications in order to meet a specified level of service differentiation, was used in [11,30], although using an adaptive multivariate controller and for a more limited scenario than ours (i.e. maximum one PM per application layer). 6. Conclusions and future work The purpose of this work was to demonstrate the advantage of ¡®¡®adaptive¡¯¡¯ models relative to ¡®¡®static¡¯¡¯ models in optimization. We investigated model based optimization of a private cläÂÑãÒøÔŒµß“EäÂÑãÒøÔŒµßhanbari et al. / Future Generation Computer Systems 28 (2012) 104¨C111

109

a

b

c

d

e

f

g

Workloads 22 0.2 18 16 14 12 0.1 08 06 04 02 0

app 1 app 2 app 3

20

40

60 Time

80

100

Fig. 6. The measured and modeled gained global utility using (a) static and (b) dynamic models over time. The response time of applications together with SLA response times for (c) static and (d) dynamic models. The allocated capacity to applications over time on (e) server one and (f) server two respectively using dynamic models. The (g) workload of applications.

applications are clustered across a known, homogeneous set of PMs. In this optimization, we modified the resource shares of applications, in order to minimize the SLA violations. While we

focused only on response time, considering multiple service level objectives will not change the approach (just the complexity of solving the optimization problem).

110

H. Ghanbari et al. / Future Generation Computer Systems 28 (2012) 104¨C111

a

part of the program of the Centre for Research in Adaptive Systems (CERAS). References

[1] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, A. Warfield, Xen and the art of virtualization, in: SOSP ¡¯03: Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles, ACM, New York, NY, USA, 2003, pp. 164¨C177. [2] M. Armbrust, A. Fox, R. Griffith, A.D. Joseph, R.H. Katz, A. Konwinski, G. Lee, D.A. Patterson, A. Rabkin, I. Stoica, M. Zaharia, Above the clouds: a berkeley view of cloud computing, Tech. Rep. UCB/EECS-200928, EECS Department, University of California, Berkeley, February 2009. URL http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.html. [3] B. Hayes, Cloud computing, Communications of the ACM 51 (7) (2008) 9¨C11. [4] B. Rochwerger, D. Breitgand, E. Levy, A. Galis, K. Nagin, I.M. Llorente, R. Montero, Y. Wolfsthal, E. Elmroth, J. Caceres, M. Ben-Yehuda, W. Emmerich, F. Galan, The reservoir model and architecture for open federated cloud computing, IBM Journal of Research and Development 53 (4) (2009) 4¨C11. [5] R. Buyya, C.S. Yeo, S. Venugopal, J. Broberg, I. Brandic, Cloud computing and emerging it platforms: vision, hype, and reality for delivering computing as the 5th utility, Future Generation Computer Systems. 25 (6) (2009) 599¨C616. [6] H.H. Huang, A control-theoretic approach to automated local policy enforcement in computational grids, Future Generation Computer Systems 26 (6) (2010) 787¨C796. [7] C.S. Yeo, S. Venugopal, X. Chu, R. Buyya, Autonomic metered pricing for a utility computing service, Future Generation Computer Systems 26 (8) (2010) 1368¨C1380. [8] W. Iqbal, M.N. Dailey, D. Carrera, P. Janecek, Adaptive resource provisioning for read intensive multi-tier applications in the cloud, Future Generation Computer Systems 27 (6) (2011) 871¨C879. [9] L. Rodero-Merino, L.M. Vaquero, V. Gil, F. Galn, J. Fontn, R.S. Montero, I.M. Llorente, From infrastructure delivery to service management in clouds, Future Generation Computer Systems 26 (8) (2010) 1226¨C1240. [10] M.A. Murphy, S. Goasguen, Virtual organization clusters: self-provisioned clouds on the grid, Future Generation Computer Systems 26 (8) (2010) 1271¨C1281. [11] X. Liu, X. Zhu, P. Padala, Z. Wang, S. Singhal, Optimal multivariate control for differentiated services on a shared hosting platform, in: Decision and Control, 2007 46th IEEE Conference on, 2007, pp. 3792¨C3799. [12] J.Z. Li, J. Chinneck, M. Woodside, M. Litoiu, Fast scalable optimization to configure service systems having cost and quality of service constraints, in: Proceedings of the 6th International Conference on Autonomic Computing, ACM, 2009, pp. 159¨C168. [13] J. Li, J. Chinneck, M. Woodside, M. Litoiu, G. Iszlai, Performance model driven QoS guarantees and optimization in clouds, in: Proceedings of the 2009 ICSE Workshop on Software Engineering Challenges of Cloud Computing, IEEE Computer Society, 2009, pp. 15¨C22. [14] H. Li, G. Casale, T. Ellahi, SLA-driven planning and optimization of enterprise applications, in: Proceedings of the First Joint WOSP/SIPEW International Conference on Performance Engineering, ACM, 2010, pp. 117¨C128. [15] H.N. Van, F.D. Tran, J.M. Menaud, SLA-aware virtual resource management for cloud infrastructures, in: IEEE Ninth International Conference on Computer and Information Technology, IEEE, 2009, pp. 357¨C362. [16] G. Tesauro, W.E. Walsh, J.O. Kephart, Utility-function-driven resource allocation in autonomic systems, in: Proceedings of the Second International Conference on Automatic Computing, IEEE Computer Society, 2005, pp. 342¨C343. [17] X. Wang, Z. Du, Y. Chen, S. Li, D. Lan, G. Wang, Y. Chen, An autonomic provisioning framework for outsourcing data center based on virtual appliances, Springer Netherlands 11 (3) (2008) 229¨C245. [18] X. Wang, D. Lan, G. Wang, X. Fang, M. Ye, Y. Chen, Q. Wang, Appliancebased autonomic provisioning framework for virtualized outsourcing data center, in: Proceedings of the Fourth International Conference on Autonomic Computing, IEEE Computer Society, 2007, p. 29. [19] J. Sauv¨¦, F. Marques, A. Moura, M.C. Samaio, J. Jornada, E. Radziuk, Sla design from a business perspective, in: DSOM 2005: Proceedings of the 16th IFIP/IEEE International Workshop on Distributed Systems: Operations and Management, 2005, pp. 72¨C83. [20] I.n. Goiri, F. Julia, J. Ejarque, M.d. Palol, R.M. Badia, J. Guitart, J. Torres, Introducing virtual execution environments for application lifecycle management and sla-driven resource distribution within service providers, in: Proceedings of the 2009 Eighth IEEE International Symposium on Network Computing and Applications, NCA ¡¯09, IEEE Computer Society, Washington, DC, USA, 2009, pp. 211¨C218. [21] M. Comuzzi, C. Kotsokalis, G. Spanoudakis, R. Yahyapour, Establishing and monitoring slas in complex service based systems, in: Proceedings of the 2009 IEEE International Conference on Web Services, ICWS ¡¯09, IEEE Computer Society, Washington, DC, USA, 2009, pp. 783¨C790. [22] A. Ferrer, F. Hern¨¢ndez, J. Tordsson, E. Elmroth, C. Zsigri, R. Sirvent, J. Guitart, R. Badia, K. Djemame, W. Ziegler, et al. OPTIMIS: a holistic approach to cloud service provisioning, Future Generation Computer Systems (2011) (in press). [23] E. Kalyvianaki, T. Charalambous, S. Hand, Self-adaptive and self-configured CPU resource provisioning for virtualized servers using kalman filters, in: Proceedings of the 6th International Conference on Autonomic Computing, ACM, Barcelona, Spain, 2009, pp. 117¨C126.

b

Fig. 7. A snapshot of resource allocation both (a) before and (b) after an increase in workload is detected. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

The main contribution of this work was dynamically using tracking models (for each application) within the global optimization loop. These models update themselves at runtime in order to adapt to perturbations in the environment not captured in initial model specification. This results in more accurate models being passed to the COM allowing for better resource utilization on a global scale. The relationship between the number of simulated VMs and optimization time per step was also considered and a non-linear relation was observed. While we acknowledge that this could pose a problem for our approach (i.e., when working with large numbers of VMs) a possible solution would be to split the problem into subsets and solve them individually. Future work will involve implementing the optimization algorithm in a distributed manner in which applications interact in a peer-to-peer fashion to determine how much resource should be allocated to each. We are also investigating optimizing the dual problem, which is more suitable for mapping to an agent-oriented optimization approach (see [31,32]). Another direction will involve extending the performance model to include more details about an application¡¯s structure. For example the model can be aware of different layers of an application such as web server, application server, and database server. Further, consideration of heterogeneous resources is also interesting and should be considered as well. Acknowledgments This research was supported by OCE, the Ontario Centres of Excellence, and by the IBM Toronto Centre for Advanced Studies, as

H. Ghanbari et al. / Future Generation Computer Systems 28 (2012) 104¨C111 [24] T.F. Abdelzaher, K.G. Shin, N. Bhatti, Performance guarantees for web server end-systems: a control-theoretical approach, IEEE Transactions on Parallel and Distributed Systems (2002) 80¨C96. [25] G. Welch, G. Bishop, An introduction to the kalman filter, University of North Carolina at Chapel Hill, Chapel Hill, NC. URL http://www.cs.unc.edu/~welch/kalman/kalmanIntro.html. [26] R.N. Calheiros, R. Ranjan, A. Beloglazov, C.A.F. De Rose, R. Buyya, Cloudsim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms, Software: Practice and Experience 41 (1) (2011) 23¨C50. [27] E.D. Lazowska, J. Zahorjan, G.S. Graham, K.C. Sevcik, Quantitative System Performance: Computer System Analysis Using Queueing Network Models, Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1984. [28] M. Arlitt, T. Jin, A workload characterization study of the 1998 world cup web site, Network, IEEE 14 (3) (2000) 30¨C37. [29] A. Soror, U. Minhas, A. Aboulnaga, K. Salem, P. Kokosielis, S. Kamath, Automatic virtual machine configuration for database workloads, ACM Transactions on Database Systems 35 (1) (2010) Article 7. [30] H.C. Lim, S. Babu, J.S. Chase, S.S. Parekh, Automated control in cloud computing: challenges and opportunities, in: Proceedings of the 1st Workshop on Automated Control for Datacenters and Clouds, ACM, New York, NY, USA, 2009, pp. 13¨C18. [31] P. Huang, H. Peng, P. Lin, X. Li, Macroeconomics based grid resource allocation, Future Generation Computer Systems 24 (7) (2008) 694¨C700. [32] H. Izakian, A. Abraham, B.T. Ladani, An auction method for resource allocation in computational grids, Future Generation Computer Systems 26 (2) (2010) 228¨C235. Hamoun Ghanbari defended his M.Sc. on August 2008 at Concordia University, Montreal. He is currently a Ph.D. student at York University, Toronto working under the supervision of Prof. Marin Litoiu in the Centre of Excellence for Research in Adaptive Systems (CERAS) and IBM Toronto Centre for Advanced Studies. His research interests in general include adaptive systems, high performance and cloud computing and specifically managing computing clouds using adaptive systems techniques.

111

Bradley Simmons is currently a Postdoctoral Fellow in the Adaptive Systems Research Group at York University in Toronto, Canada. He earned his B.Sc. in Biology with an Honors in Genetics, an M.Sc. in Computer Science and Ph.D. in Computer Science at the University of Western Ontario in London, Canada. His research interests include, but are not limited to, policy-based management of distributed systems, business driven IT management and the further development of the strategy-tree concept and its implementation.

Marin Litoiu is a professor at York University. Prior to that he was a Senior Research Staff Member with the Centre for Advanced Studies, IBM Toronto Lab, where he led the research programs in Autonomic Computing, System Management and Software Engineering. He was the Chair of the Board of CSER, a Canadian Consortium for Software Engineering Research and Director of Research for Centre of Excellence for Research in Adaptive Systems. Dr. Litoiu holds doctoral degrees from the University Polytechnic of Bucharest and from Carleton University. His research interests include autonomic computing; high performance software design; performance modeling, performance evaluation and capacity planning for distributed and real time systems.

Gabriel Iszlai is a Senior Developer with the IBM Tivoli On Demand group in Toronto, Canada. He received his B.S. degree in 1992. Prior to joining IBM he worked as a Network Architect for Think-Dynamics, a company acquired by IBM in May 2003. He was one of the initial designers of the former ThinkControl application, known today as IBM Tivoli Intelligent Orchestrator. Prior to that, he worked for over 8 years in the IT industry for different European telecom companies.

Ïà¹ØÎÄÕÂ:

¸ü¶àÏà¹Ø±êÇ©:

- 1-s2.0-S0021979713011053-main
- 1-s2.0-S1226086X1400700X-main
- 1-s2.0-S0167273814002884-main
- 1-s2.0-S0360544217305935-main
- 1-s2.0-S0966979506000823-main
- 1-s2.0-S0006899315000797-main
- 1-s2.0-S0963996913002408-main
- 1-s2.0-S0955221915301692-main
- 1-s2.0-S0955221916300656-main
- 1-s2.0-S1043466611008118-main
- 1-s2.0-S0014482703003872-main
- 1-s2.0-S0939641107002639-main[1]
- 1-s2.0-S0925400507002481-main
- 1-s2.0-S0956566308003308-main
- 1-s2.0-S1110982311000330-main[1]