Efficient Software Performance Estimation Methods for Hardware/Software Codesign
Kei Suzuki, Alberto Sangiovanni-Vincentelli
Presenter: Yanmei Li

Introduction
- One of the most important purposes of hw/sw codesign is to find the optimum hw/sw partition of a system-level specification under particular criteria
- Criteria
  - Performance (speed, or the number of clock cycles)
  - Cost (number of components, die size, or code size)
- Estimation
  - At a lower abstraction level
    - Easy and accurate, but the design iteration time is long
  - At a higher abstraction level
    - Reduces the exploration time
    - Plays an important role in synthesis and optimization

10/29/2002 EE 249 Discussion Session

Software Performance Estimation
- The cost of a mixed hw/sw system based on a standard microprocessor depends on the hw size
- Solution: implement a given functionality with a program on the microprocessor
- Problem: the software implementation often fails to meet the performance requirement
- Tradeoff:
  - Implement the critical portion of the program in hardware
  - Software performance estimation is the key

POLIS System
- CFSMs (Codesign Finite State Machines)
  - Do not discriminate between hw and sw
  - Estimation provides preliminary timing information and also a measure for hw/sw partitioning
- A partitioning process takes place to identify the candidate components for sw implementation
- S-Graph (Software graph)
  - Used to optimize the trade-off between the performance and the code size of the final implementation
  - Estimation is helpful for s-graph optimization and sw module scheduling

Related Work
- Software performance depends on the structure of the software program as well as on the components of the target system
- The structure of the software program is more difficult to estimate as the abstraction level rises
- Most of the results are from the object code level, which is the lowest level of abstraction, and are concerned with software that has a limited structure
- A number of approaches have been proposed
  - A simple prediction method
  - Statistical methods
  - ……

Abstraction Models in POLIS
- CFSM
  - HW: mapped into an abstract hardware description format, and synthesized into a combinational circuit and a set of latches
  - SW: translated into a data structure called an s-graph

Abstraction Models in POLIS
- S-Graph:
  - A DAG (directed acyclic graph) with one source node and one sink node
  - Represents the control flow of a given behavior
  - Four types of node: BEGIN, END, TEST, ASSIGN

S-Graph
- Semantics:
  - Start with the BEGIN node
  - Traverse each node along its edge, until reaching the END node
  - At a TEST node, select the child that corresponds to the value of the associated predicate P(V)
  - At an ASSIGN node, assign the value of the associated function A(V) to the output variable z
- Translating an s-graph into a C program
  - Traverse the graph in a depth-first manner
    - TEST: if (or switch) statement
    - ASSIGN: assignment statement
  - The resulting C program has the same structure as the s-graph
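The semantics above can be illustrated with a small interpreter. This is only a sketch under stated assumptions: the dictionary layout, the function name `run_sgraph`, and the example graph are hypothetical and are not the POLIS data structures. It walks from BEGIN to END, picks a child at each TEST node from the predicate value, and stores the function value at each ASSIGN node.

```python
def run_sgraph(graph, inputs):
    """Execute a (hypothetical) dict-based s-graph once and return all variable values."""
    values = dict(inputs)
    name = "BEGIN"                      # execution always starts at the BEGIN node
    while True:
        node = graph[name]
        kind = node["type"]
        if kind == "END":               # reaching END terminates the traversal
            return values
        if kind == "TEST":
            # select the child that matches the value of the predicate P(V)
            name = node["children"][node["predicate"](values)]
        elif kind == "ASSIGN":
            # assign the value of A(V) to the output variable
            values[node["output"]] = node["function"](values)
            name = node["next"]
        else:                           # BEGIN: just follow its single edge
            name = node["next"]


# Made-up example: z = x + 1 if x > 0, otherwise z = 0
example = {
    "BEGIN": {"type": "BEGIN", "next": "t1"},
    "t1": {"type": "TEST", "predicate": lambda v: v["x"] > 0,
           "children": {True: "a1", False: "a2"}},
    "a1": {"type": "ASSIGN", "output": "z",
           "function": lambda v: v["x"] + 1, "next": "END"},
    "a2": {"type": "ASSIGN", "output": "z",
           "function": lambda v: 0, "next": "END"},
    "END": {"type": "END"},
}

print(run_sgraph(example, {"x": 3}))    # {'x': 3, 'z': 4}
print(run_sgraph(example, {"x": -2}))   # {'x': -2, 'z': 0}
```

In generated C, the TEST node `t1` would correspond to an `if` statement and `a1`/`a2` to the assignments in its two branches.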

Performance Estimation Methods
- Modeling the target system
- The structure of C code generated by POLIS:

    Function()                                                         ……(1)
    {
        Initialization of local variables (assignment statements);     ……(2)
        Structure of mixed if or switch statements
        and assignment statements;                                      ……(3)
        Return;                                                         ……(4)
    }

Modeling the Target System
- Execution time: $T = T_{pp} + k\,T_{init} + T_{struct}$
- Code size: $S = S_{pp} + k\,S_{init} + S_{struct}$
  - $T_{pp}$ ($S_{pp}$): for entering and exiting the function, (1)+(4)
  - $T_{init}$ ($S_{init}$): for initializing local variables, (2). $k$ is the number of local variables.
  - $T_{struct}$ ($S_{struct}$): for the structure of mixed conditional statements generated from TEST nodes and assignment statements generated from ASSIGN nodes, (3).

Modeling the Target System (cont.)
- $T_{pp}$, $S_{pp}$, $T_{init}$, $S_{init}$ are constants which can be determined beforehand
- $T_{struct} = \sum_i P_i \, C_t(\mathrm{node\_type\_of}(i), \mathrm{variable\_type\_of}(i))$
- $S_{struct} = \sum_i C_s(\mathrm{node\_type\_of}(i), \mathrm{variable\_type\_of}(i))$
  - $P_i = 1$ if node $i$ is on a path, otherwise $P_i = 0$
  - $C_t$ and $C_s$ can be obtained by using simple benchmark programs that contain a mix of the C statements appearing in the generated C programs, and analyzing the execution time and code size of those programs on the target compiler and the target CPU
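As a worked illustration of the model, the sketch below evaluates T and S for one execution path. All names (`estimate_path`, the node and variable type labels) and all cost numbers are made up for the example; real values of Ct and Cs would come from the benchmark programs.

```python
def estimate_path(nodes, path, ct, cs, k, t_pp, t_init, s_pp, s_init):
    """Return (T, S) for one path through the generated function.

    nodes: dict node name -> (node_type, variable_type)
    path:  set of node names executed on the path (P_i = 1 for these)
    ct/cs: dict (node_type, variable_type) -> time cost / size cost
    """
    # Tstruct sums the time cost of the nodes on the path only
    t_struct = sum(ct[kind] for name, kind in nodes.items() if name in path)
    # Sstruct sums the size cost of every node, executed or not
    s_struct = sum(cs[kind] for kind in nodes.values())
    return t_pp + k * t_init + t_struct, s_pp + k * s_init + s_struct


# Hypothetical node set and cost tables (units are arbitrary)
nodes = {
    "t1": ("TEST", "event"),
    "a1": ("ASSIGN", "const"),
    "a2": ("ASSIGN", "var"),
}
ct = {("TEST", "event"): 4, ("ASSIGN", "const"): 2, ("ASSIGN", "var"): 3}
cs = {("TEST", "event"): 6, ("ASSIGN", "const"): 4, ("ASSIGN", "var"): 5}

t, s = estimate_path(nodes, {"t1", "a1"}, ct, cs,
                     k=2, t_pp=10, t_init=1, s_pp=12, s_init=2)
print(t, s)  # 18 31
```

Here T = 10 + 2·1 + (4 + 2) = 18, and S = 12 + 2·2 + (6 + 4 + 5) = 31, since code size counts all nodes while execution time counts only those on the path.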

Benchmark Model
- Four attributes to characterize a system
  - Name of the parameter set, a name for a unit of execution time, a name for a unit of code size, and the size of an integer variable
- Seventeen cost parameters to model the execution time, and fifteen cost parameters to model the code size
  - A TEST node with an event-type variable / a multi-valued variable with a bit mask / a multi-valued variable
  - An ASSIGN node with an event-type variable / which assigns a constant to a variable / which assigns one variable to another one
  - Pre-processing and post-processing
  - A branch operation
  - Initialization of a local variable
  - Average execution time and size for pre-defined software library functions
  - The size of pointers
  - The size of integer variables

S-graph Level Estimation
- Property 1. Each node in an s-graph has a one-to-one correspondence with only a few statements in the synthesized C code
- Property 2. The form of each statement is determined by the type of the corresponding node
- Property 3. The s-graph is a DAG, hence it does not include loops in its structure
- Each node/edge is weighted according to pre-calculated cost parameters in the pre-process

S-graph Level Estimation
- Algorithm:

    SGtrace(sgi)
        if (sgi == NULL) return (C(0, ∞, 0));
        if (sgi has been visited)
            return (pre-calculated Ci(*, *, 0) associated with sgi);
        Ci = initialize (max_time = 0; min_time = ∞; code_size = 0);
        for each child sgj of sgi {
            Cij = SGtrace(sgj) + edge cost for edge eij;
            if (Cij.max_time > Ci.max_time) Ci.max_time = Cij.max_time;
            if (Cij.min_time < Ci.min_time) Ci.min_time = Cij.min_time;
            Ci.code_size += Cij.code_size;
        }
        Ci += node cost for node sgi;
        return (Ci);
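The pseudocode above can be sketched in runnable form as follows. This is an illustration, not the POLIS implementation: the `Cost` class, the dictionary-based graph, and all cost numbers are assumptions. One further assumption is labeled in the code: a node without children (the END node) starts from zero time, so that its minimum time does not stay at infinity.

```python
import math
from dataclasses import dataclass


@dataclass
class Cost:
    max_time: float
    min_time: float
    code_size: int


def sg_trace(node, children, node_cost, edge_cost, memo):
    """Return max/min execution time and code size of the subgraph rooted at node."""
    if node in memo:
        # Already visited: reuse the times, but report size 0 so that
        # code shared between paths is counted only once.
        done = memo[node]
        return Cost(done.max_time, done.min_time, 0)
    kids = children.get(node, [])
    if not kids:
        acc = Cost(0, 0, 0)             # assumption: END leaf starts at zero time
    else:
        acc = Cost(0, math.inf, 0)
        for kid in kids:
            sub = sg_trace(kid, children, node_cost, edge_cost, memo)
            e_time, e_size = edge_cost.get((node, kid), (0, 0))
            acc.max_time = max(acc.max_time, sub.max_time + e_time)
            acc.min_time = min(acc.min_time, sub.min_time + e_time)
            acc.code_size += sub.code_size + e_size
    n_time, n_size = node_cost[node]    # finally add the node's own cost
    acc.max_time += n_time
    acc.min_time += n_time
    acc.code_size += n_size
    memo[node] = acc
    return acc


# Made-up graph: BEGIN -> t1 -> {a1, a2} -> END, with (time, size) costs
children = {"BEGIN": ["t1"], "t1": ["a1", "a2"], "a1": ["END"], "a2": ["END"]}
node_cost = {"BEGIN": (2, 4), "t1": (3, 6), "a1": (5, 8), "a2": (1, 2), "END": (1, 2)}
edge_cost = {("t1", "a2"): (1, 2)}      # e.g. a branch on the else edge

result = sg_trace("BEGIN", children, node_cost, edge_cost, {})
print(result)  # Cost(max_time=11, min_time=8, code_size=24)
```

The longest path goes through `a1` (2 + 3 + 5 + 1 = 11), the shortest through `a2` (2 + 3 + 1 + 1 + 1 = 8), and the code size is the sum of all node sizes plus the one edge size (22 + 2 = 24). Because each node's subgraph is computed once and then served from `memo`, every edge is processed once.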

S-graph Level Estimation
- The computational complexity: O(E), where E is the number of edges
- Average execution time: $C_{ave} = \sum_{(i,j)} P_{ij} \left( C_t(\mathrm{node\_type\_of}(i), \mathrm{variable\_type\_of}(i)) + C_e(i, j) \right)$
  - $P_{ij}$ is the probability of executing node $i$ and going to node $j$
  - $C_e(i, j)$ is the edge cost for edge $e_{ij}$
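A possible way to evaluate the average execution time is sketched below. It assumes that Pij can be obtained as the probability of reaching node i multiplied by a per-edge branch probability, and that the graph has a single source from which every node is reachable; the slides do not say how Pij is obtained, so this derivation, the function name `average_time`, and all numbers are assumptions.

```python
from collections import defaultdict, deque


def average_time(edges, node_time, edge_time, branch_prob, source):
    """Sum P_ij * (Ct(i) + Ce(i, j)) over all edges of a DAG."""
    succ = defaultdict(list)
    indeg = defaultdict(int)
    for i, j in edges:
        succ[i].append(j)
        indeg[j] += 1
    reach = defaultdict(float)          # probability that a node is executed
    reach[source] = 1.0
    queue = deque([source])             # process nodes in topological order
    total = 0.0
    while queue:
        i = queue.popleft()
        for j in succ[i]:
            p_ij = reach[i] * branch_prob[(i, j)]
            total += p_ij * (node_time[i] + edge_time.get((i, j), 0.0))
            reach[j] += p_ij
            indeg[j] -= 1
            if indeg[j] == 0:           # all predecessors of j are done
                queue.append(j)
    return total


# Made-up graph and costs; END has zero cost because the sum runs over
# outgoing edges and END has none.
edges = [("BEGIN", "t1"), ("t1", "a1"), ("t1", "a2"), ("a1", "END"), ("a2", "END")]
node_time = {"BEGIN": 2, "t1": 3, "a1": 5, "a2": 1, "END": 0}
edge_time = {("t1", "a2"): 1}
branch_prob = {("BEGIN", "t1"): 1.0, ("t1", "a1"): 0.25, ("t1", "a2"): 0.75,
               ("a1", "END"): 1.0, ("a2", "END"): 1.0}

c_ave = average_time(edges, node_time, edge_time, branch_prob, "BEGIN")
print(c_ave)  # 7.75
```

The result matches the path-weighted average: the path through `a1` costs 10 with probability 0.25 and the path through `a2` costs 7 with probability 0.75, giving 2.5 + 5.25 = 7.75.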

CFSM Level Estimation
- Is much more difficult, since a CFSM model does not closely reflect the code structure
- MDDs (multi-valued decision diagrams) are used to represent the transition relation function of a CFSM (a node represents a multi-valued variable; ordering is important)
- The estimation algorithm for the MDD is based on the assumption that the maximum (minimum) cost path in an MDD is usually the maximum (minimum) cost path in the s-graph that is generated from the MDD
- Also based on a recursive DFS traversal algorithm
- There is no relation between the code size and the number of MDD nodes

Experimental Results (1)

Experimental Results (2)

Experimental Results (3)
- Compared to an assembly-level analysis:
  - S-graph (Table 1):
    - The differences in the maximum execution time are within (-10%, +10%)
    - The differences in the minimum execution time are within (-20%, +20%)
    - The differences in code size are within (-20%, +20%)
  - CFSM (Table 2):
    - The differences in the maximum execution time are within (-10%, +25%)
    - The differences in the minimum execution time are within (-20%, +20%)

Conclusions
- S-graph level method
  - Provides an accurate estimation for all analyses: the maximum and minimum execution time, and the code size.
  - Is a useful technique for optimization in software synthesis because of its accuracy.
- CFSM level method
  - Is less accurate than the s-graph estimation, but is still accurate enough when estimating the maximum and minimum execution time.
  - Is important for automatic partitioning of CFSMs into hardware and software parts, and also for scheduler generation.

Conclusions
- Two software performance estimation methods for use with the POLIS hardware/software codesign system are proposed in this paper.
  - S-graph level method
  - CFSM level method
- The experimental results showed that the accuracy of both proposed methods is high enough for use in the POLIS system.