
Designing an XML-based Exchange Format for Harmonia

Marat Boshernitsan, Susan L. Graham
University of California, Berkeley, USA
Exchange Formats Workshop, WCRE 2000, Brisbane, Australia

Application Domain
• Harmonia: a framework for constructing interactive language-based programming tools
  – Primary focus: front-end, user-visible tools (language-sensitive editors, source browsers, design assistants)
  – Simultaneously supports many languages: C++, Java, as well as smaller domain-specific languages (make, bison, …)
  – Fine-grain version history: enables incremental analyses and other user services
  – Rapid prototyping: bindings for C++, C, Tcl, Java, Lisp
  – External tool integration: existing analyzers, source transformers, etc.

Exchange Format Requirements
• General requirements
  – Support for industry standards
  – Convenient APIs
  – Easy to generate, store, and parse
  ⇒ use XML
• Domain requirements
  – Small tools (individual source files) and large tools (entire software projects)
  – Harmonia-enabled tools as well as external language-sensitive tools
  – Legacy text-based tools (editors, compilers, etc.)
  – Support many languages: cannot standardize on one language-specific encoding (e.g. JavaML)
  ⇒ model internal program representation

Example
• Text: "if (cond) f();"
• AST… actually, ASG!

  IfThenStmt
    if
    (
    VarRef
      Name
        cond        (edge to variable declaration)
    )
    ExprStmt
      MethodCall
        Name
          f         (edge to method declaration)
        (
        Args
        )
      ;

Encoding Graphs

  <node id="1" name="MethodCall">
    <edge target="2" type="child"/>
    <edge target="4" type="child"/>
    <edge target="5" type="child"/>
    <edge target="6" type="child"/>
  </node>
  <node id="2" name="Name">
    <edge target="3" type="child"/>
    <edge target="42" type="decl"/>
  </node>
  <node id="3" name="IDENT" text="f"/>
  <node id="4" name="LPAREN" text="("/>
  <node id="5" name="Args"></node>
  <node id="6" name="RPAREN" text=")"/>

[Figure: the example ASG with the MethodCall subtree numbered: MethodCall 1, Name 2, f 3, ( 4, Args 5, ) 6; the decl edge from Name leads to the method declaration, node 42]
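A minimal sketch of how a consumer might read the flat node/edge encoding above, using Python's standard xml.etree.ElementTree. The wrapping <graph> root element and the function name load_graph are assumptions made for this illustration (the slide shows only a fragment), and attribute values are quoted so that the input is well-formed XML.

```python
import xml.etree.ElementTree as ET

# Flat graph encoding from the slide. The <graph> root is hypothetical:
# it is added here only so the fragment parses as a single document.
GRAPH_XML = """
<graph>
  <node id="1" name="MethodCall">
    <edge target="2" type="child"/>
    <edge target="4" type="child"/>
    <edge target="5" type="child"/>
    <edge target="6" type="child"/>
  </node>
  <node id="2" name="Name">
    <edge target="3" type="child"/>
    <edge target="42" type="decl"/>
  </node>
  <node id="3" name="IDENT" text="f"/>
  <node id="4" name="LPAREN" text="("/>
  <node id="5" name="Args"></node>
  <node id="6" name="RPAREN" text=")"/>
</graph>
"""

def load_graph(xml_text):
    """Return (nodes, edges): nodes maps id -> attribute dict,
    edges is a list of (source id, edge type, target id) triples."""
    root = ET.fromstring(xml_text)
    nodes, edges = {}, []
    for node in root.findall("node"):
        nodes[node.get("id")] = dict(node.attrib)
        for edge in node.findall("edge"):
            edges.append((node.get("id"), edge.get("type"), edge.get("target")))
    return nodes, edges

nodes, edges = load_graph(GRAPH_XML)

# Ordered children of the MethodCall node (id 1).
children_of_call = [t for s, kind, t in edges if s == "1" and kind == "child"]

# Edge targets that are not defined in this fragment (here: the declaration).
dangling = sorted({t for _, _, t in edges if t not in nodes})
```

In this encoding every relationship, including plain tree structure, has to be recovered by following edge targets, which is the overhead the nested encoding on the next slide avoids.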

Encoding Trees

Child edges become element nesting; the non-tree decl edge becomes an attribute:

  <node id="1" name="MethodCall">
    <node id="2" name="Name" decl="42">
      <node id="3" name="IDENT" text="f"/>
    </node>
    <node id="4" name="LPAREN" text="("/>
    <node id="5" name="Args"></node>
    <node id="6" name="RPAREN" text=")"/>
  </node>

[Figure: the same numbered ASG as on the previous slide, shown next to the nested encoding]
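The step from the flat encoding to the nested one can be sketched as a small conversion: child edges turn into element nesting, and any other edge type (here decl) turns into an attribute named after the edge type. This is only an illustration under assumptions: the <graph> root and the function name nest are hypothetical, and the sketch assumes child edges form a tree.

```python
import xml.etree.ElementTree as ET

# Flat encoding, wrapped in a hypothetical <graph> root for parsing.
FLAT = """
<graph>
  <node id="1" name="MethodCall">
    <edge target="2" type="child"/>
    <edge target="4" type="child"/>
    <edge target="5" type="child"/>
    <edge target="6" type="child"/>
  </node>
  <node id="2" name="Name">
    <edge target="3" type="child"/>
    <edge target="42" type="decl"/>
  </node>
  <node id="3" name="IDENT" text="f"/>
  <node id="4" name="LPAREN" text="("/>
  <node id="5" name="Args"></node>
  <node id="6" name="RPAREN" text=")"/>
</graph>
"""

def nest(flat_root, root_id):
    """Build the nested <node> tree rooted at root_id."""
    by_id = {n.get("id"): n for n in flat_root.findall("node")}

    def build(node_id):
        src = by_id[node_id]
        out = ET.Element("node", dict(src.attrib))
        for edge in src.findall("edge"):
            if edge.get("type") == "child":
                # Tree structure: nest the child element.
                out.append(build(edge.get("target")))
            else:
                # Non-tree edge: keep it as an attribute, e.g. decl="42".
                out.set(edge.get("type"), edge.get("target"))
        return out

    return build(root_id)

tree = nest(ET.fromstring(FLAT), "1")
nested_xml = ET.tostring(tree, encoding="unicode")
```

Keeping non-tree edges as attributes means a node can carry at most one edge of each type in this sketch; an encoding with several same-typed cross edges per node would need a different convention.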

Schemas
• XML documents should have a DTD

  <node id="1" name="MethodCall">
    <node id="2" name="Name" decl="42">
      <node id="3" name="IDENT" text="f"/>
    </node>
    <node id="4" name="LPAREN" text="("/>
    <node id="5" name="Args"></node>
    <node id="6" name="RPAREN" text=")"/>
  </node>

  <!ELEMENT node (node)*>
  <!ATTLIST node ... all node attributes ...>

✗ No type restrictions – cannot validate with respect to programming language
✗ Need additional data schema

Schemas (cont.)
• Idea: use the same typing mechanism as the AST
✓ AST nodes typed according to the abstract grammar
✓ Generate DTD from abstract grammar (used by the parser)
✓ Interpret the DTD as a data schema

  <MethodCall id="1">
    <Name id="2" decl="42">
      <IDENT id="3" text="f"/>
    </Name>
    <LPAREN id="4" text="("/>
    <Args id="5"></Args>
    <RPAREN id="6" text=")"/>
  </MethodCall>

  <!ELEMENT MethodCall (Name, LPAREN, Expr*, RPAREN)>
  <!ELEMENT Name (IDENT)>
  <!ATTLIST Name id ID #REQUIRED decl IDREF #REQUIRED>
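Generating the DTD from the abstract grammar could look roughly like the sketch below. The grammar and attribute tables are hypothetical stand-ins: the slides do not show how Harmonia represents its abstract grammar, so the dictionaries GRAMMAR and ATTRIBUTES and the function grammar_to_dtd are assumptions chosen only to reproduce the declarations on the slide.

```python
# Hypothetical abstract-grammar fragment: production name -> sequence of
# (symbol, repetition) pairs, where repetition is "", "*", "+", or "?".
GRAMMAR = {
    "MethodCall": [("Name", ""), ("LPAREN", ""), ("Expr", "*"), ("RPAREN", "")],
    "Name": [("IDENT", "")],
}

# Hypothetical attribute table: element -> list of (attribute, type, default).
ATTRIBUTES = {
    "Name": [("id", "ID", "#REQUIRED"), ("decl", "IDREF", "#REQUIRED")],
}

def grammar_to_dtd(grammar, attributes):
    """Emit one <!ELEMENT> per production and one <!ATTLIST> per attribute."""
    lines = []
    for name, rhs in grammar.items():
        # An empty right-hand side maps to the DTD keyword EMPTY.
        model = "(" + ", ".join(sym + rep for sym, rep in rhs) + ")" if rhs else "EMPTY"
        lines.append(f"<!ELEMENT {name} {model}>")
        for attr, attr_type, default in attributes.get(name, []):
            lines.append(f"<!ATTLIST {name} {attr} {attr_type} {default}>")
    return "\n".join(lines)

dtd = grammar_to_dtd(GRAMMAR, ATTRIBUTES)
print(dtd)
```

Because the element names come straight from the grammar's production names, a validating XML parser checks program structure with no knowledge of the programming language itself.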

Encoding Program Text
• Legacy tools may not want to understand XML, but with the encoding so far it is not easy to get at the program text
• One last refinement: simply mark up the source…

  <MethodCall id="1">
    <Name id="2" decl="42">
      <IDENT id="3">f</IDENT>
    </Name>
    <LPAREN id="4">(</LPAREN>
    <Args id="5"></Args>
    <RPAREN id="6">)</RPAREN>
  </MethodCall>

• Don't want structure? Strip off XML tags (with XSL!):

  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>
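The same tag stripping can be sketched without XSL, for a legacy tool that only wants the text. This uses the standard-library call Element.itertext(), which yields the text content of an element and its descendants in document order; the function name program_text is an assumption for this illustration. The markup is written without whitespace between tags so that no indentation leaks into the recovered text.

```python
import xml.etree.ElementTree as ET

# Marked-up source for the call "f()" from the slide, with quoted attributes.
MARKED_UP = (
    '<MethodCall id="1">'
    '<Name id="2" decl="42"><IDENT id="3">f</IDENT></Name>'
    '<LPAREN id="4">(</LPAREN>'
    '<Args id="5"></Args>'
    '<RPAREN id="6">)</RPAREN>'
    '</MethodCall>'
)

def program_text(xml_text):
    """Concatenate all character data in document order, dropping the tags."""
    return "".join(ET.fromstring(xml_text).itertext())

text = program_text(MARKED_UP)
print(text)
```

Since the tokens appear as element content in source order, the original program text falls out of a plain left-to-right traversal.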

Future Directions
• Schema and grammar evolution
  – Schema will change with the grammar
    • Not really a problem: the grammar is abstract and rarely changes
  – Still, can use XSL to transform existing data to a new schema
    • Harmonia-enabled editor can generate transformations automatically
• Incorporating version information
  – Need easy access to current version
  – Should be able to get to older versions
• Data granularity for high-level modeling tools
  – Need incremental refinement of detail
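As a rough illustration of transforming existing data to a new schema, the sketch below handles the simplest kind of grammar change, a renamed production. The slides propose XSL for this; Python is used here only to keep the examples in one language. The rename from MethodCall to Call, the table RENAMES, and the function migrate are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical grammar change: the production MethodCall is renamed to Call.
RENAMES = {"MethodCall": "Call"}

def migrate(root, renames):
    """Rename element types in place; elements not listed are left untouched."""
    for elem in root.iter():
        elem.tag = renames.get(elem.tag, elem.tag)
    return root

old = ET.fromstring(
    '<ExprStmt id="7">'
    '<MethodCall id="1"><Name id="2" decl="42"/></MethodCall>'
    '</ExprStmt>'
)
new = migrate(old, RENAMES)
new_tags = [elem.tag for elem in new.iter()]
```

Attributes such as id and decl are preserved, so cross references between nodes survive the migration. Changes that restructure a production, rather than rename it, would need more than a lookup table.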

Conclusions
• An encoding satisfying all design requirements
  – XML-based (standard, convenient APIs, easy import/export)
  – Works for a variety of tools
  – Can utilize many off-the-shelf XML tools/libraries
  – Programming language independent (DTD generated from language grammar)
• More information
  – Harmonia web page: http://www.cs.berkeley.edu/Research/Projects/harmonia
  – Email: Marat Boshernitsan <maratb@cs.berkeley.edu>