University of Sheffield NLP
GATE development hints

• Reporting bugs
• Submitting a patch
• The user guide
• Continuous integration

Bugs, feature requests
• Use the tracker on SourceForge
  § http://sourceforge.net/projects/gate/support
• Give as much detail as possible
  § GATE version, build number, platform, Java version (1.5.0_15, 1.6.0_03, etc.)
  § Steps to reproduce
  § Full stack trace of any exceptions, including "Caused by…"
• Check whether the bug is already fixed in the latest nightly build
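As a rough illustration of gathering the details a bug report asks for, the sketch below reads the Java version and platform from standard system properties and renders a stack trace with its "Caused by" sections intact. The class name BugReportInfo and its methods are hypothetical helpers, not part of GATE; the GATE version and build number still have to be added by hand.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

final class BugReportInfo {

    /** Java and platform details, read from standard system properties. */
    static String environmentSummary() {
        return "Java version: " + System.getProperty("java.version") + "\n"
             + "Platform: " + System.getProperty("os.name") + " "
             + System.getProperty("os.version") + " ("
             + System.getProperty("os.arch") + ")";
    }

    /** The complete stack trace as text, including every "Caused by" section. */
    static String fullStackTrace(Throwable t) {
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        t.printStackTrace(out);
        out.flush();
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(environmentSummary());
        // A made-up nested failure, only to show the "Caused by" output.
        Exception failure = new RuntimeException("processing failed",
            new IllegalStateException("resource not initialised"));
        System.out.println(fullStackTrace(failure));
    }
}
```

Pasting the whole string returned by fullStackTrace, rather than only its first line, keeps the root cause visible to whoever reads the report.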

Patches
• Use the patches tracker on SourceForge
• Best format is an svn diff against the latest Subversion revision
  § Save the diff as a file and attach it; don't paste the diff into the bug report
• We generally don't accept patches against earlier versions

Patches (2)
• GATE must compile and run on Java 5
  § It is not sufficient to set source="1.5" and target="1.5" but compile on Java 6
  § Those settings don't prevent you calling classes/methods that don't exist in Java 5
• Test your patch on Java 5 before submitting
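One concrete case of the problem, as a small sketch: String.isEmpty() was only added in Java 6, so a patch that calls it compiles on a Java 6 JDK even with source and target set to 1.5, and then fails with a NoSuchMethodError when run on Java 5. The helper class Java5Safe below is hypothetical and simply shows the Java 5 compatible equivalent.

```java
final class Java5Safe {

    /**
     * Java 5 compatible emptiness check. Calling s.isEmpty() instead would
     * compile on a Java 6 JDK but fail at run time on Java 5, because that
     * method does not exist in the Java 5 class library.
     */
    static boolean isEmpty(String s) {
        return s.length() == 0;
    }

    public static void main(String[] args) {
        System.out.println(isEmpty(""));
        System.out.println(isEmpty("GATE"));
    }
}
```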

The User Guide
• Everything in GATE is (theoretically) documented in the GATE User Guide
  § http://gate.ac.uk/userguide
• Every change to the core should be mentioned in the change log
  § http://gate.ac.uk/userguide/chap:changes
• The user guide is written in LaTeX

Updating the user guide
• Lives in Subversion
  § https://gate.svn.sourceforge.net/svnroot/gate/userguide/trunk
• Build requires pdflatex, htlatex (tex4ht package), sed, make, etc.
  § On Windows, use Cygwin
• Download http://gate.ac.uk/sale/big.bib and put it in the directory above the .tex files

Updating the user guide (2)
• Edit the .tex files
• Graphics, screenshots, etc. should be .png
• Check in changes to the .tex files; the PDF and HTML are regenerated automatically by…

Hudson
• Continuous integration platform
• Automatically rebuilds GATE and the user guide (among others) whenever they change
• Also does a clean build of GATE every night
  § Nightly builds published at http://gate.ac.uk/download/snapshots

Hudson
• JUnit test results available for each build
• http://gate.ac.uk/hudson

Running GATE Embedded in Tomcat (or any multithreaded system)
Issues and tricks

Introduction
• Scenario:
  § Implementing a web service (or other web application) that uses GATE Embedded to process requests
  § Want to support multiple concurrent requests
  § Long running process - need to be careful to avoid memory leaks, etc.
• Example used is a plain HttpServlet
  § Principles apply to other frameworks (Struts, Spring MVC, Metro/CXF, Grails…)

Setting up
• GATE libraries in WEB-INF/lib
  § gate.jar + JARs from lib
• Usual GATE Embedded requirements:
  § A directory to be "gate.home"
  § Site and user config files
  § Plugins directory
  § Call Gate.init() once (and only once) before using any other GATE APIs
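The "once and only once" requirement can be sketched with a small synchronized guard. This is a stand-alone illustration using only the standard library: the class InitOnce and the method expensiveInit() are hypothetical, and expensiveInit() merely stands in for the real initialisation call.

```java
final class InitOnce {
    private static boolean initialised = false;
    private static int initCalls = 0;

    /** Stand-in for the real one-off initialisation work. */
    private static void expensiveInit() {
        initCalls++;
    }

    /**
     * Runs the initialisation on the first call only. The method is
     * synchronized, so concurrent callers wait until it has finished.
     * Returns true if this call performed the initialisation.
     */
    static synchronized boolean ensureInitialised() {
        if (initialised) {
            return false;
        }
        expensiveInit();
        initialised = true;
        return true;
    }

    /** How many times the initialisation work has actually run. */
    static synchronized int initCalls() {
        return initCalls;
    }

    public static void main(String[] args) {
        ensureInitialised();
        ensureInitialised();
        System.out.println("initialisation ran " + initCalls() + " time(s)");
    }
}
```

In a web application the container already calls a context listener exactly once at startup, so a guard like this mainly matters when initialisation can be triggered from several entry points.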

Initialisation using a ServletContextListener
• ServletContextListener is registered in web.xml

    <listener>
      <listener-class>gate.web..example.GateInitListener</listener-class>
    </listener>

• Called when the application starts up

    public void contextInitialized(ServletContextEvent e) {
      ServletContext ctx = e.getServletContext();
      File gateHome = new File(ctx.getRealPath("/WEB-INF"));
      Gate.setGateHome(gateHome);
      File userConfig = new File(ctx.getRealPath("/WEB-INF/user.xml"));
      Gate.setUserConfigFile(userConfig);
      // site config is gateHome/gate.xml
      // plugins dir is gateHome/plugins
      try {
        Gate.init();
      } catch(GateException ex) {
        // contextInitialized cannot throw checked exceptions
        throw new RuntimeException("GATE initialisation failed", ex);
      }
    }

GATE in a multithreaded environment
• GATE PRs are not thread-safe
  § Due to design of parameter-passing as JavaBean properties
• Must ensure that a given PR/Controller instance is only used by one thread at a time
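Why parameters passed as bean properties are a problem can be shown with a deliberately simplified stand-in. The classes below are hypothetical and use no GATE APIs; the sketch replays, in a single thread, one unlucky interleaving of two requests that share an instance.

```java
final class SharedParameterDemo {

    /** Stand-in for a processing resource whose input is a bean property. */
    static final class FakeResource {
        private String document;

        void setDocument(String document) {
            this.document = document;
        }

        String execute() {
            return "processed:" + document;
        }
    }

    /** Replays one unlucky interleaving of two requests sharing an instance. */
    static String resultSeenByRequestA() {
        FakeResource shared = new FakeResource();
        shared.setDocument("A");   // request A sets its parameter
        shared.setDocument("B");   // request B overwrites it before A runs
        return shared.execute();   // request A now processes B's document
    }

    public static void main(String[] args) {
        System.out.println(resultSeenByRequestA());
    }
}
```

Because the input lives in a field between the setter call and execute(), nothing short of exclusive use of the instance prevents another thread from replacing it.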

First attempt: one instance per request
• Naïve approach - create new PRs for each request

    public void doPost(request, response) {
      ProcessingResource pr = Factory.createResource(...);
      try {
        Document doc = Factory.newDocument(getTextFromRequest(request));
        try {
          // do some stuff
        }
        finally {
          Factory.deleteResource(doc);
        }
      }
      finally {
        Factory.deleteResource(pr);
      }
    }

• Many levels of nested try/finally: ugly but necessary to make sure we clean up even when errors occur. You will get very used to these…

Problems with this approach
• Guarantees no interference between threads
• But inefficient, particularly with complex PRs (large gazetteers, etc.)
• Hidden problem with JAPE:
  § Parsing a JAPE grammar creates and compiles Java classes
  § Once created, classes are never unloaded
  § Even with simple grammars, eventually OutOfMemoryError (PermGen space)

Second attempt: using ThreadLocals
• Store the PR/Controller in a thread local variable

    private ThreadLocal<CorpusController> controller =
        new ThreadLocal<CorpusController>() {
      protected CorpusController initialValue() {
        return loadController();
      }
    };

    private CorpusController loadController() {
      // ...
    }

    public void doPost(request, response) {
      CorpusController c = controller.get();
      // do stuff with the controller
    }

Better than attempt 1…
• Only initialise resources once per thread
• Interacts nicely with typical web server thread pooling
• But if a thread dies, no way to clean up its controller
  § Possibility of memory leaks
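The once-per-thread behaviour can be checked with a stand-alone sketch in which a StringBuilder stands in for the controller. The class PerThreadResource and its method names are hypothetical; only standard library classes are used.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class PerThreadResource {
    /** Counts how many stand-in resources have been created in total. */
    private static final AtomicInteger created = new AtomicInteger();

    /** Each thread lazily gets its own instance on its first call to get(). */
    private static final ThreadLocal<StringBuilder> resource =
        new ThreadLocal<StringBuilder>() {
            @Override
            protected StringBuilder initialValue() {
                created.incrementAndGet();
                return new StringBuilder();
            }
        };

    /** Handles two simulated requests on each thread; returns instances created. */
    static int createdAfterRequests(int threads) {
        created.set(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    resource.get().append("first request");
                    resource.get().append("second request");
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return created.get();
    }

    public static void main(String[] args) {
        System.out.println("instances created: " + createdAfterRequests(3));
    }
}
```

Each worker thread here ends without anything releasing its instance, which is the clean-up gap noted above: the code that created the resource never gets a hook to delete it when the thread goes away.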

A solution: object pooling
• Manage your own pool of Controller instances
• Take a controller from the pool at the start of a request, return it (in a finally!) at the end
• Number of instances in the pool determines maximum concurrency level

Simple example

    private BlockingQueue<CorpusController> pool;

    public void init() {
      pool = new LinkedBlockingQueue<CorpusController>();
      for(int i = 0; i < POOL_SIZE; i++) {
        pool.add(loadController());
      }
    }

    public void doPost(request, response) {
      CorpusController c = pool.take();
      try {
        // do stuff
      }
      finally {
        pool.add(c);
      }
    }

    public void destroy() {
      for(CorpusController c : pool) Factory.deleteResource(c);
    }

• pool.take() blocks if the pool is empty: use poll() if you want to handle an empty pool yourself
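The poll() alternative might look like the following sketch, which waits a bounded time for a free instance and returns null when the pool stays empty. TimedPool, borrow and giveBack are hypothetical names, and a String stands in for the controller so that the example needs only the standard library.

```java
import java.util.Arrays;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

final class TimedPool<T> {
    private final BlockingQueue<T> idle = new LinkedBlockingQueue<T>();

    TimedPool(Iterable<T> instances) {
        for (T instance : instances) {
            idle.add(instance);
        }
    }

    /** Waits up to timeoutMillis for a free instance; null if none became free. */
    T borrow(long timeoutMillis) {
        try {
            return idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    /** Returns an instance to the pool; call this from a finally block. */
    void giveBack(T instance) {
        idle.add(instance);
    }

    public static void main(String[] args) {
        TimedPool<String> pool = new TimedPool<String>(Arrays.asList("controller-1"));
        String c = pool.borrow(100);
        if (c == null) {
            // Pool exhausted: the caller decides what to do, e.g. report "busy".
            System.out.println("busy");
            return;
        }
        try {
            System.out.println("processing with " + c);
        } finally {
            pool.giveBack(c);
        }
    }
}
```

A servlet using this pattern could answer with an error status when borrow returns null instead of holding the request thread indefinitely.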

Exporting the grunt work: the Spring Framework
• Spring Framework
  § http://www.springsource.org/
  § Handles application startup and shutdown
  § Configure your business objects and connections between them using XML
  § GATE provides helpers to initialise GATE, load saved applications, etc.
  § Built-in support for object pooling
  § Web application framework (Spring MVC)
  § Used by other frameworks (Grails, CXF, …)

Initialising GATE with Spring

    <beans xmlns="http://www.springframework.org/schema/beans"
           xmlns:gate="http://gate.ac.uk/ns/spring">
      <gate:init gate-home="/WEB-INF"
                 plugins-home="/WEB-INF/plugins"
                 site-config-file="/WEB-INF/gate.xml"
                 user-config-file="/WEB-INF/user-gate.xml">
        <gate:preload-plugins>
          <value>/WEB-INF/plugins/ANNIE</value>
        </gate:preload-plugins>
      </gate:init>
    </beans>

Loading a saved application

    <gate:saved-application id="myApp"
                            location="/WEB-INF/application.xgapp"
                            scope="prototype" />

• scope="prototype" means create a new instance each time we ask for it
  § Default is singleton - one and only one instance

Spring servlet example
• Spring provides the HttpRequestHandler interface to manage servlet-type objects with Spring
• Declare an HttpRequestHandlerServlet in web.xml with the same name as the Spring bean

Spring servlet example (2)
• Write the handler assuming single-threaded access
  § Will use Spring to handle pooling for us

    public class MyHandler implements HttpRequestHandler {
      public void setApplication(CorpusController app) { ... }

      public void handleRequest(request, response) {
        Document doc = Factory.newDocument(getTextFromRequest(request));
        try {
          // do some stuff with the app
        }
        finally {
          Factory.deleteResource(doc);
        }
      }
    }

Tying it together
• web.xml

    <!-- set up Spring -->
    <listener>
      <listener-class>
        org.springframework.web.context.ContextLoaderListener
      </listener-class>
    </listener>

    <!-- servlet -->
    <servlet>
      <servlet-name>mainHandler</servlet-name>
      <servlet-class>
        org.springframework.web.context.support.HttpRequestHandlerServlet
      </servlet-class>
    </servlet>

Tying it together (2)
• applicationContext.xml

    <gate:init ... />

    <gate:saved-application id="myApp"
                            location="/WEB-INF/application.xgapp"
                            scope="prototype" />

    <bean id="myHandlerTarget" class="my.pkg.MyHandler" scope="prototype">
      <property name="application" ref="myApp" />
    </bean>

    <bean id="handlerTargetSource"
          class="org.springframework.aop.target.CommonsPoolTargetSource">
      <property name="targetBeanName" value="myHandlerTarget" />
      <property name="minIdle" value="3" />
      <property name="maxIdle" value="3" />
      <property name="whenExhaustedActionName" value="WHEN_EXHAUSTED_BLOCK" />
    </bean>

    <bean id="mainHandler"
          class="org.springframework.aop.framework.ProxyFactoryBean">
      <property name="targetSource" ref="handlerTargetSource" />
    </bean>