Scripting EPrints
About This Talk
- Light on syntax
  - object->function(arg1, arg2)
- Incomplete
- Designed to
  - give you a feel for the EPrints data model
  - introduce you to the most significant objects
    - how they relate to one another
    - their most common methods
  - act as a jumping-off point for exploring
Finding Documentation
- EPrints modules have embedded documentation
- Extract it using perldoc
  - perldoc perl_lib/EPrints/EPrint.pm
EPrints 3.0
- This talk is based on the EPrints 2.3 series
- The 3.0 API is still being finalised
  - tidies up the object hierarchy
  - resolves some of 2.3's naming clashes
  - lots of extra functionality
  - but the core data model remains the same
    - EPrints 3.0 is fully back-compatible
    - 2.3 scripts will work with EPrints 3.0
Roadmap
1. Data
   - EPrints, Users, Documents, Subjects, Subscriptions
2. Data collections
   - DataSets, MetaFields
3. Searching your data
   - SearchExpressions
4. Scripting your archive
   - Archives, Sessions
1. Data
EPrints, Users, Documents, Subjects, Subscriptions
Data Model Sketch
[Diagram: an EPrint links to all its documents (e.g. a PDF Document and an HTML Document) and to its owner, a User; a User links to its owned eprints and to its subscriptions (Subscription objects); a Subject links to its parent and child Subjects and to its posted eprints.]
EPrint
- An EPrint object represents a single deposit in your EPrints archive
  - has some metadata fields
  - has one or more documents
  - is owned by a user
Creating EPrints
- new(session, id)
  - create an EPrint object for an existing deposit
- create(session, dataset, data)
  - create a new EPrint object
- More on sessions and datasets later!
Introducing DataObj
- EPrint is a subclass of DataObj
- DataObj provides common methods for
  - accessing metadata
  - rendering XHTML output
Inherited from DataObj
- get_id()
- get_url(staff)
  - get the URL of an EPrint
  - e.g. the URL of the eprint's abstract page in the archive
  - if staff is true, returns the URL of the staff view, which shows more detail
- get_type()
  - get the EPrint type
  - e.g. article, book, thesis, conference paper...
Inherited from DataObj
- get_value(fieldname)
  - get the value of the named field
- set_value(fieldname, value)
  - set the value of the named field
  - Remember to call commit() to save the change to the database!
- is_set(fieldname)
  - true if the named field has a value
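To make the get/set/commit cycle concrete, here is a minimal Perl sketch. `fill_default` is a hypothetical helper, not part of EPrints: it only calls `is_set`, `set_value` and `commit` as described on these slides, and assumes the object passed in provides `commit()`, as EPrint and User objects do.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of EPrints): give a field a default value
# only if it is currently empty, then save the change.
sub fill_default {
    my ( $obj, $fieldname, $default ) = @_;

    # Leave existing values alone.
    return 0 if $obj->is_set( $fieldname );

    $obj->set_value( $fieldname, $default );

    # set_value() only changes the object in memory;
    # commit() writes the change to the database.
    $obj->commit();
    return 1;
}
```

For example, `fill_default( $eprint, "note", "Imported from the old archive" )` would fill a field called `note` (a made-up field name) on an eprint that has no value for it yet.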
EPrint Methods
- remove()
  - erase the eprint and any associated records/files from the database and filesystem
  - this should only be called on EPrints in the "inbox" or "buffer" datasets
- commit()
  - write any changes made to the eprint to the database
- datestamp()
  - set the last modified date to today
Moving EPrints Around
- move_to_deletion()
  - transfer the eprint to the deletion dataset
  - should only be called on eprints in the archive dataset
- See also:
  - move_to_inbox()
  - move_to_buffer()
  - move_to_archive()
Rendering EPrints
- generate_static()
  - generate the static abstract page for the eprint
  - in a multi-language archive this will generate a page in each language
Rendering - Inherited from DataObj
- render_citation(style)
  - create an XHTML citation for the EPrint
  - if style is set, use the named citation style
    - defined in citations-en.xml
- render_citation_link(style)
  - as above, but the citation is linked to the EPrint's abstract page
Rendering - Inherited from DataObj
- render_value(fieldname, showall)
  - get an XHTML fragment containing the rendered value of the named field
    - in the current language
  - if showall is true, all languages are rendered
    - usually used when staff are viewing (checking) data
Rendering Tips
- Most rendering methods return XHTML
  - but not as a string!
  - XML Node objects
    - DocumentFragment, Element, TextNode...
- In your scripts, build a document tree from these nodes
  - e.g. $node1->appendChild($node2)
  - then flatten it to a string
  - Why? It's easier to manipulate a tree than a large string
More Rendering Tips
- XML Node objects are not part of EPrints
  - XML::DOM or XML::GDOME libraries
  - explore these libraries using perldoc
- XHTML is good for building Web pages
  - but not so good for command-line output!
  - use tree_to_utf8()
    - extracts a string from the result of any rendering method
    - tree_to_utf8( $eprint->render_citation )
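A small Perl sketch of the command-line pattern: render each citation, then flatten the returned node tree to text. `citation_lines` is a hypothetical helper. The sketch assumes `tree_to_utf8` is reachable as `EPrints::Utils::tree_to_utf8` and that the calling script has already loaded the EPrints modules.

```perl
use strict;
use warnings;

# Hypothetical helper: turn each EPrint's XHTML citation into plain text,
# suitable for printing from a command-line script.
sub citation_lines {
    my ( @eprints ) = @_;
    my @lines;
    foreach my $eprint ( @eprints ) {
        # render_citation() returns an XML node tree, not a string...
        my $tree = $eprint->render_citation();
        # ...so flatten it first (assumed location: EPrints::Utils).
        push @lines, EPrints::Utils::tree_to_utf8( $tree );
    }
    return @lines;
}
```

Usage: `print "$_\n" foreach citation_lines( @eprints );`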
Navigating to Related Objects
- get_user()
  - get a User object representing the user to whom the EPrint belongs
- get_all_documents()
  - get a list of all the Document objects associated with the EPrint
- We will look at these objects next...
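The two navigation methods can be combined to summarise an eprint. The hypothetical `eprint_overview` helper below also uses `get_main()` from the Document section, and assumes User records carry a `username` field, which depends on the archive's field configuration. It also guards against `get_user()` returning undef (for example if the owning account has been removed); that this can happen is our assumption, not something the talk states.

```perl
use strict;
use warnings;

# Hypothetical helper: follow the links from an EPrint to its owner and
# to the main file of each of its documents.
sub eprint_overview {
    my ( $eprint ) = @_;

    # Assumed to possibly be undef, so checked below.
    my $user = $eprint->get_user();

    my @main_files;
    foreach my $doc ( $eprint->get_all_documents() ) {
        # Each Document links to its "main" file.
        push @main_files, $doc->get_main();
    }

    return {
        # "username" is assumed to be a User field in this archive.
        owner      => defined $user ? $user->get_value( "username" ) : undef,
        main_files => \@main_files,
    };
}
```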
User
- A User object represents a single registered user
- Also a subclass of DataObj
  - inherits metadata access methods
    - get_url, get_type, get_value, set_value, is_set
  - inherits rendering methods
    - render_citation_link, render_value
- Also has commit and remove
  - inherited from DataObj in 3.0
Creating Users
- new(session, id)
  - create a User object from an existing user record
  - user_with_email(session, email)
  - user_with_username(session, username)
- create_user(session, access_level)
  - create a new User
User Accessors
- get_editable_eprints()
  - get a list of EPrints that the user can edit
- get_owned_eprints(dataset)
  - get a list of EPrints owned by the user in the dataset
- is_owner(eprint)
  - true if the user is the owner of the EPrint
- get_subscriptions()
  - get a list of Subscriptions associated with the user
Document
- A single document associated with an eprint
  - may actually contain one or more physical files
    - PDF = 1 file
    - HTML + images = many files
- Another subclass of DataObj
Creating a Document Object
- new(session, docid)
  - create a Document object from an existing record
- create(session, eprint)
  - create a new Document object for the given EPrint
Document Accessors
- get_eprint()
  - get the EPrint object the document is associated with
- local_path()
  - get the full path of the directory where the document is stored in the filesystem
- files()
  - get a list of (filename, file size) pairs
Main File and Format
- get_main()
- set_main(main_file)
  - get/set the "main" file for the document
  - e.g. if the document is multi-page HTML with images, the main file needs to be set to the top-level index.html file
  - when rendering document links, EPrints always links to the main file in the document
- set_format(format)
  - sets the document format
Adding Files to Documents
- upload(filehandle, filename)
  - uploads the contents of the given file handle
  - adds the file to the document (using the given filename)
- add_file(file, filename)
  - adds a file to the document (using the given filename)
  - file is the full path to the file
Adding Files to Documents
- upload_url(url)
  - grab file(s) from the given URL
  - in the case of HTML, only relative links will be followed
- add_archive(file, format)
  - add files from a .zip or .tar.gz archive
- remove_file(filename)
  - remove the named file from the Document
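A sketch of filling a freshly created Document with one PDF, using the methods from the last few slides. `attach_pdf` is hypothetical; the format id `"pdf"` is an assumption that must match a document format your archive defines, and calling `commit()` on the Document assumes it saves changes the same way an EPrint does.

```perl
use strict;
use warnings;
use File::Basename qw( basename );

# Hypothetical helper: put a single PDF into a Document and make it the
# file that EPrints links to.
sub attach_pdf {
    my ( $doc, $path ) = @_;
    my $filename = basename( $path );

    # Add the file to the document under its own name.
    $doc->add_file( $path, $filename );

    # Assumed format id; must exist in the archive's configuration.
    $doc->set_format( "pdf" );

    # Document links always point at the main file.
    $doc->set_main( $filename );

    # Assumption: Document changes are saved with commit(), as for EPrint.
    $doc->commit();
    return $filename;
}
```

In a real script the Document would first come from `create(session, eprint)`, e.g. `my $doc = EPrints::Document::create( $session, $eprint );` (that calling form is our assumption), followed by `attach_pdf( $doc, "/path/to/report.pdf" )`.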
Subject
- A single subject from the subject hierarchy
- Another subclass of DataObj
Creating Subjects
- new(session, subjectid)
  - create a Subject object from an existing subject
- create(session, id, name, parent, depositable)
  - create a new Subject
  - depositable specifies whether or not users can deposit eprints in the subject
Subject Accessors
- children()
  - get a list of Subjects which are the children of the subject
- get_parents()
  - get a list of Subjects which are the parents of the subject
- subject_label(session, subject_tag)
  - get the full label of a subject, including parents
Subject Accessors
- count_eprints(dataset)
  - get the number of eprints in the dataset associated with the subject
- posted_eprints(dataset)
  - get a list of EPrints in the dataset associated with the subject
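Walking the subject tree combines `children()` with `count_eprints(dataset)`. The hypothetical `subject_report` below returns one indented line per subject. Because a subject can have more than one parent (`get_parents()` returns a list), it may appear more than once in the output.

```perl
use strict;
use warnings;

# Hypothetical helper: walk the subject tree below $subject, returning one
# indented line per subject with its eprint count in $dataset.
sub subject_report {
    my ( $subject, $dataset, $depth ) = @_;
    $depth = 0 unless defined $depth;

    my @lines = ( ( "  " x $depth )
        . $subject->get_id() . " ("
        . $subject->count_eprints( $dataset ) . ")" );

    # Recurse into each child, one level deeper.
    foreach my $child ( $subject->children() ) {
        push @lines, subject_report( $child, $dataset, $depth + 1 );
    }
    return @lines;
}
```

To start from the top of the tree, assumed here to be the subject with id `ROOT`: `print "$_\n" foreach subject_report( EPrints::Subject->new( $session, "ROOT" ), $session->get_archive->get_dataset( "archive" ) );` (the `get_archive` accessor on the session is also an assumption).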
Rendering Subjects
- render_with_path(session, topsubjid)
  - get a DocumentFragment containing the subject path
  - example of a subject path: H Social Sciences > HD Industries. Land use. Labor > HD28 Management. Industrial Management
Subscription
- A stored search which is performed every day/week/month on behalf of a user
- get_user()
  - get the User who owns the subscription
- Another subclass of DataObj
Creating Subscriptions
- new(session, id)
  - create a Subscription object from an existing subscription
- create(session, userid)
  - create a new Subscription object for the given user
Processing Subscriptions
- send_out_subscription()
  - search for new items matching the subscription settings
  - email them to the user who owns the subscription
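A per-user loop over `get_subscriptions()` and `send_out_subscription()`, as a minimal Perl sketch. `send_all_subscriptions` is a hypothetical name, and the sketch assumes `get_subscriptions()` returns a plain list.

```perl
use strict;
use warnings;

# Hypothetical helper: process every stored search belonging to one user.
sub send_all_subscriptions {
    my ( $user ) = @_;
    my $count = 0;
    foreach my $subscription ( $user->get_subscriptions() ) {
        # Searches for new matching items and emails them to the owner.
        $subscription->send_out_subscription();
        $count++;
    }
    return $count;
}
```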
DataObj Hierarchy
So Far...
- We've looked at individual data objects
  - but an EPrints archive holds many eprints and documents, has many registered users, etc.
  - how do we access them collectively?
- We've seen the get_value and set_value methods for metadata
  - but an archive's metadata is configurable
  - so how do we know what metadata fields an EPrint, User, etc. has?
  - how do we access properties of the fields?
2. Data Collections
DataSets and MetaFields
Dataset
- A collection of data items
- Tells us all the possible types in the collection
  - e.g. an EPrint may be an article or a thesis
- Tells us the fields in each type
  - e.g. article has title, authors, publication...
  - e.g. conference_item has title, authors, event_title, event_date...
- Can also tell us all the fields that apply to a dataset
  - title, authors, publication, event_title...
Dataset Configuration
- ArchiveMetadataFieldsConfig.pm
  - fields in each dataset
  - additional system fields defined in EPrint.pm, User.pm etc.
- metadata-types.xml
  - types in each dataset
  - fields that apply to each type
Datasets in EPrints
- archive
  - EPrints that are live in the main archive
- buffer
  - EPrints that have been submitted for editorial approval
- deletion
  - EPrints that have been deleted from the archive
- inbox
  - EPrints which users are still working on
- eprint
  - all EPrints from archive, buffer, deletion and inbox
Datasets in EPrints
- user
  - all registered Users
- subject
  - all Subjects in the subject tree
- document
  - the Documents belonging to all EPrints in the archive
- subscription
  - the Subscriptions which Users have requested
DataSet Accessors
- id()
  - get the id of the dataset
- count(session)
  - get the number of items in the dataset
- get_item_ids(session)
  - get a list of ids of the items in the dataset
Datasets and MetaFields
- Many DataSet methods return MetaField objects
- A MetaField
  - is a single field in a dataset
  - tells us properties of the field
    - e.g. name, type, input_rows, maxlength, multiple etc.
    - configured in ArchiveMetadataFieldsConfig.pm
  - but not the field value
    - the value is specific to the individual EPrint, User etc.
    - e.g. $eprint->get_value("title")
MetaField Methods
- get_name()
  - get the field name
- get_type()
  - get the field type
- get_property(name)
- set_property(name, value)
  - get the named property, or set it to the given value
MetaField Type Hierarchy
DataSet Accessors
- has_field(fieldname)
  - true if the dataset has a field of that name
- get_field(fieldname)
  - get a MetaField object describing the named field
DataSet Accessors
- get_fields()
  - get a list of MetaFields belonging to the dataset
- get_types()
  - get a list of all types in the dataset
  - e.g. EPrint types: article, book_section, conference_item, monograph, patent, thesis, other
  - e.g. User types: user, editor, admin
- get_type_name(session, type)
  - get a string containing a human-readable name for the specified type in the current language
DataSet Accessors
- get_type_fields(type)
  - get a list of MetaFields belonging to the given type
- get_required_type_fields(type)
  - get a list of the MetaFields which are required for the given type
- field_required_in_type(field, type)
  - true if the given field is required in the given type
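Dataset introspection lets a script discover the configured types and required fields instead of hard-coding them. The hypothetical `required_fields_by_type` below builds a hash from type id to required field names. We are not certain whether `get_types()` and `get_required_type_fields()` return a list or an array reference in every EPrints version, so the sketch accepts either.

```perl
use strict;
use warnings;

# Accept either a plain list or a single array reference (assumption: the
# return style of these DataSet methods may differ between versions).
sub as_list {
    return ( @_ == 1 && ref( $_[0] ) eq "ARRAY" ) ? @{ $_[0] } : @_;
}

# Hypothetical helper: map each type in a dataset to the names of the
# fields that are required for that type.
sub required_fields_by_type {
    my ( $dataset ) = @_;
    my %required;
    foreach my $type ( as_list( $dataset->get_types() ) ) {
        my @fields = as_list( $dataset->get_required_type_fields( $type ) );
        $required{$type} = [ map { $_->get_name() } @fields ];
    }
    return \%required;
}
```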
Rendering DataSets
- render_name(session)
  - get an XHTML fragment containing the name of the dataset in the language of the current session
- render_type_name(session, type)
  - get an XHTML fragment containing the name of the given type in the language of the session
Rendering MetaFields
- render_name(session)
  - get an XHTML fragment containing the name of the field in the current language
  - e.g. from phrases-en.xml:
    <ep:phrase ref="eprint_fieldname_title">Title</ep:phrase>
Rendering MetaFields
- render_input_field(session, value)
  - get some XHTML containing input controls that allow a user to enter data for the field
  - value is the default value
Rendering MetaFields
- render_help(session, type)
  - get some XHTML containing help text for a user entering data for the field
  - if the optional type is specified, help specific to that type is used if available
  - e.g. from phrases-en.xml:
    <ep:phrase ref="eprint_fieldhelp_title">The title of the item.</ep:phrase>
    <ep:phrase ref="eprint_fieldhelp_title.book">The title of the book, usually found on the title page.</ep:phrase>
So Far...
- We know how to access data objects in EPrints
  - EPrint, User, Document...
- We know how to access collections of these objects
  - DataSets
  - MetaFields
- Now, how do we search for items?
3. Searching Your Archive
SearchExpressions
SearchExpression
- The conditions of a single search
- new(data)
  - create a new search expression from the given data, e.g.
    $searchexp = new EPrints::SearchExpression(
        session      => $session,
        dataset      => $dataset,
        custom_order => "title" );
  - results are sorted by title, ascending
Adding Search Fields
- add_field(metafield, value)
  - add a new search field with the given value (search text) to the search expression
  - if the search field already exists in the search expression, its value is replaced
Adding Search Fields
- Example: text search in the title field
  - $searchexp->add_field( $dataset->get_field("title"), "routing", "IN", "ALL" );
Adding Search Fields
- Example: text search across several fields
  - matches the word in title OR abstract
  - $searchexp->add_field( [ $dataset->get_field("title"), $dataset->get_field("abstract") ], "routing", "IN", "ALL" );
Adding Search Fields
- Example: date range search
  - $searchexp->add_field( $dataset->get_field("date"), "2000-2004", "EQ", "ALL" );
Serialising Searches
- serialise()
  - get a text representation of the search expression, for persistent storage
- from_string(string)
  - unserialises the contents of string
  - but only into the fields already existing in the SearchExpression
Rendering SearchExpressions
- render_description()
  - get some XHTML describing the current parameters of the search expression
- render_search_form(help)
  - render an input form for the search expression
  - if help is true, this also renders the help for each search field in the current language
Processing Results
- Carry out a search using:
  - perform_search()
- The results can then be accessed:
  - count()
    - get the number of results
  - get_records(offset, count)
  - get_ids(offset, count)
    - get a list of DataObjs (e.g. EPrint, User) representing the result set, or just their ids
    - optionally specify a range of results to return from the result set using offset and count
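The search steps so far, put together in one Perl function: build the expression, add a field, run it and fetch the results. `search_titles` is hypothetical, and the calling script is assumed to have loaded the EPrints modules. Two further assumptions: our reading that `"IN"` matches words within the field and `"ALL"` requires every word to match, and the final `dispose()` call, which this talk does not cover but which we believe the 2.x SearchExpression provides for releasing cached results (check perldoc).

```perl
use strict;
use warnings;

# Hypothetical helper: find items whose title contains all the given
# words, sorted by title.
sub search_titles {
    my ( $session, $dataset, $words ) = @_;

    my $searchexp = EPrints::SearchExpression->new(
        session      => $session,
        dataset      => $dataset,
        custom_order => "title",
    );

    # Assumed meaning: "IN" = words within the field, "ALL" = every word.
    $searchexp->add_field( $dataset->get_field( "title" ), $words, "IN", "ALL" );

    $searchexp->perform_search();
    my @results = $searchexp->get_records();

    # Not covered in the talk; assumed to release the cached results.
    $searchexp->dispose();

    return @results;
}
```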
Processing Results
- map(function, args)
  - using get_records to fetch results uses a lot of memory if there are 1000s of results
  - map applies the function to each result in turn, without that overhead
  - function is called with args:
    - (session, dataset, dataobj, args)
  - the DataSet object also has a map function
    - creates a SearchExpression over the dataset
    - sets allow_blank = 1
    - passes args to $searchexp->map
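The same kind of traversal with `map`, using a callback in the (session, dataset, dataobj, args) shape described above. `collect_title` and `all_titles` are hypothetical, and the sketch assumes `DataSet::map` takes the session first, followed by the function and its args.

```perl
use strict;
use warnings;

# Callback in the (session, dataset, dataobj, args) shape; here args is an
# array reference that collects one title per item.
sub collect_title {
    my ( $session, $dataset, $eprint, $titles ) = @_;
    push @{ $titles }, $eprint->get_value( "title" );
    return;
}

# Hypothetical helper: gather every title in a dataset one item at a time
# instead of loading all the records with get_records. Assumes the
# DataSet::map argument order (session, function, args).
sub all_titles {
    my ( $session, $dataset ) = @_;
    my @titles;
    $dataset->map( $session, \&collect_title, \@titles );
    return @titles;
}
```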
Aside: Lists in EPrints 3.0
- In EPrints 3.0, searches return a List
  - an ordered collection of DataObjs
  - in fact, any 2.3 function which returns a list (array) of DataObjs returns a List in 3.0
- $list->reorder( $neworder )
- $list->union( $list2 )
- $list->intersect( $list2 )
- $list->remainder( $list2 )
- map over items in the list
  - even arbitrarily constructed ones
4. Scripting Your Repository
Archives and Sessions
Archive
- One EPrints installation can host multiple archives
- An Archive object is a single EPrints archive
  - access archive-specific configuration
- Don't confuse the Archive object with the archive DataSet!
  - $archive->get_dataset("archive")
- Renamed Repository in 3.0
Archive Accessors
- get_id()
  - get the id string of the archive
- get_conf(key, subkeys)
  - get a named configuration setting
  - probably set in ArchiveConfig.pm
  - e.g. get_conf( "stuff", "en", "foo" )
Calling Archive Subs
- call(cmd, params)
  - calls the subroutine named cmd specified in the archive configuration (ArchiveConfig.pm etc.) with the given parameters and returns the result
- can_call(cmd)
  - true if the named cmd exists in the archive configuration
- lets you delegate processing to "user" space
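One way to use this pair is to treat a configuration subroutine as an optional hook: check `can_call` first, then `call`. `call_if_defined` is a hypothetical helper sketched in Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: call an optional subroutine from the archive
# configuration, doing nothing if the archive does not define it.
sub call_if_defined {
    my ( $archive, $cmd, @params ) = @_;
    # Return an empty list when the hook is absent.
    return unless $archive->can_call( $cmd );
    return $archive->call( $cmd, @params );
}
```

For example `call_if_defined( $archive, "my_hook", $eprint )`, where `my_hook` is a made-up subroutine name that an archive may or may not define.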
Session
- Not a session in the traditional Web sense
  - not stateful (although it might be in future!)
  - 3.0 introduces cookie-based authentication
- A global object which provides access to:
  - the current language
  - generic rendering functions
  - CGI parameters (input from forms etc.)
  - the HTTP request
- Always create a session object at the beginning of your script
  - don't forget to terminate it at the end
Creating & Ending a Session
- new(mode, param)
  - set mode to 0 for an online session (CGI script)
    - uses the language from the cookie, the HTTP headers, or the default language
  - set mode to 1 for an offline session (command-line script)
    - param is the id of the archive; uses the default language
- terminate()
  - terminate the session, performing necessary cleanup
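Because a script should terminate its session even when something goes wrong, it can help to wrap the session lifecycle in one function. `with_offline_session` is hypothetical; it assumes the script has loaded `EPrints::Session` and that `new` returns undef when the archive cannot be opened.

```perl
use strict;
use warnings;

# Hypothetical helper: open an offline session, run some code with it, and
# always terminate the session afterwards.
sub with_offline_session {
    my ( $archive_id, $code ) = @_;

    # Mode 1 = offline (command-line) session for the given archive id.
    my $session = EPrints::Session->new( 1, $archive_id );
    die "Could not open archive '$archive_id'\n" unless defined $session;

    my @result = eval { $code->( $session ) };
    my $error  = $@;

    # Clean up even if the code died, then pass the error on.
    $session->terminate();
    die $error if $error;
    return @result;
}
```

Usage: `with_offline_session( $ARGV[0], sub { my ( $session ) = @_; ... } );`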
Web Page Building Blocks
- make_doc_fragment()
  - create an empty XHTML document fragment
  - fill it with things!
- make_text(text)
  - create an XML TextNode
- make_element(name, attrs)
  - create an XHTML Element
  - make_element("p", align => "right") gives <p align="right" />
Web Page Building Blocks
- render_link(uri, target)
  - create an XHTML link
    $link = $session->render_link( "foo.html", "frame1" );
    $link->appendChild( $session->make_text( "Foo" ) );
  - gives <a href="foo.html" target="frame1">Foo</a>
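The building blocks combined: a right-aligned paragraph containing a link, assembled as a node tree. `render_right_aligned_link` is hypothetical, and it calls `render_link` with the URI only, on the assumption that the target argument is optional.

```perl
use strict;
use warnings;

# Hypothetical helper: build <p align="right"><a href="...">text</a></p>
# inside a document fragment, using the session's node constructors.
sub render_right_aligned_link {
    my ( $session, $uri, $text ) = @_;

    my $frag = $session->make_doc_fragment();
    my $p    = $session->make_element( "p", align => "right" );

    # Assumption: target may be omitted.
    my $link = $session->render_link( $uri );
    $link->appendChild( $session->make_text( $text ) );

    # Assemble the tree bottom-up: link inside paragraph inside fragment.
    $p->appendChild( $link );
    $frag->appendChild( $p );
    return $frag;
}
```

The returned fragment can be appended to a page body, or flattened with `tree_to_utf8` for a quick check on the command line.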
Web Page Building Blocks
- Many methods for building input forms, including:
  - render_form(method, dest)
  - render_option_list(params)
  - render_hidden_field(name, value)
  - render_upload_field(name)
  - render_action_buttons(buttons)
  - ...
Web Pages
- build_page(title, body)
  - wraps your XHTML document in the archive template
- send_page()
  - flatten the page and send it to the user
Language
- change_lang(langid)
  - change the session language to the given language ID
- phrase(phraseid, inserts)
  - get the given phrase (as a string) in the current language
  - looks up phraseid in the language-specific phrase file
    - e.g. phrases-en.xml
  - let's look at an example of the inserts parameter...
Language
- html_phrase(phraseid, inserts)
  - render an XHTML phrase in the current language
    $session->html_phrase( "link_to_google",
        link => $session->render_link( "http://www.google.com" ) );
  - gets the phrase link_to_google from phrases-en.xml
    <ep:phrase ref="link_to_google"><ep:pin ref="link">Search Google</ep:pin></ep:phrase>
  - gives <a href="http://www.google.com">Search Google</a>
User Input
- have_parameters()
  - true if parameters (POST or GET) were passed to the CGI script (e.g. from an input form)
- param(name)
  - get the value of a parameter passed to the CGI script
- get_action_button()
  - get the id of the button the user pressed
- client()
  - get the name of the user's browser
Navigating the API