Writing a Perl XS SWIG interface to the CLucene C++ text search engine
Peter Edwards
Perl XS and SWIG interface to CLucene C++ text search engine, 9/26/2020

Introduction
  • Peter Edwards ~ background
  • Subject ~ writing a Perl XS SWIG interface to the CLucene C++ text search engine

Aims
  • Give an idea of the process involved in selecting and using an external library from Perl
  • Introduction to extending Perl using XS, SWIG, GNU autotools
  • Entertainment
Audience: What is your background and interest?

Topics
  • Understanding the Problem
  • The Answer (at a high level)
  • Technical Options
  • Investigating Options
  • Writing a perl / C++ Interface
  • Layers and Components
  • Lessons Learned
Side labels on the slide: Process, Extending Perl

Terms
  • Perl ~ Pathologically Eclectic Rubbish Lister
        $_ = "wftedskaebjgdpjgidbsmnjgc"; tr/a-z/oh, turtleneck Phrase Jar!/; print;
  • Perl XS ~ eXternal Subroutine
      - allows a perl program to call a C language subroutine
      - XS is also the “glue” language specifying the calling interface
      - contains complex “perlguts” stuff that will destroy your sanity
  • SWIG ~ Simplified Wrapper and Interface Generator
      - makes it easy to call a C/C++ library from many languages (perl, python, ruby, PHP…)
  • C++ ~ Object Oriented version of C programming language
  • text search ~ boolean searching of stemmed words, wildcards
  • CLucene ~ C++ text search engine based on Java Lucene

Understanding the Problem
  • Recruitment software written in Perl
  • 20,000+ candidate Word CVs/resumes
  • Boolean searching using words or partial words and wildcards, e.g. (“BA” or “MA”) and “literature”
  • Combined with SQL searching, e.g. geographic area, skill profile codes, pay rate
  • Speed < 2 seconds
  • Old system used dtSearch proprietary s/w

The Answer (at a high level)
Load
  • Convert candidate CVs from Word to text using wvWare (OpenOffice) converter
  • Index text against candidate no.
Search
  • Search text -> cand nos -> SQL temp table
  • Normal SQL search on other criteria

Technical Options (at 2003/4)
Proprietary
  • dtSearch ~ cost; hard to get cand nos out; Windows interface when perl app is Web
Open Source
  • Java Lucene ~ slow but good API and power
  • C++ CLucene ~ alpha quality rewrite of Lucene in Visual C++ as degree project by Ben van Klinken
  • Perl CPAN (PLucene etc.) below http://search.cpan.org/modlist/String_Language_Text_Processing

Investigating Perl Options
  • Wrote test harness to load 1000 CVs then do some searches
  • Tried about 5 CPAN modules
  • PLucene search speed okay for small volumes but exponential increase in insert time, >60 seconds per insert
      - Why? Tokenises doc, multi-lingual word stemming, adds doc id to reverse lookup index for each stem token
  • Other modules faster but search options weak
  • Need to look further
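The reverse lookup index mentioned on this slide can be illustrated with a toy version. The following is only a sketch, not PLucene's or CLucene's implementation: the class ToyIndex and the helpers either() and both() are hypothetical names, tokens are split on whitespace and lower-cased, and there is no stemming or wildcard handling. It assumes candidate numbers are non-negative integers.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <iterator>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical toy index: maps a lower-cased token to the set of document
// ids that contain it. Real engines also stem tokens; this sketch does not.
class ToyIndex {
public:
    // Tokenise the text on whitespace and record doc_id against each token.
    void add_document(int doc_id, const std::string& text) {
        assert(doc_id >= 0);  // assumption: ids are non-negative
        std::istringstream in(text);
        std::string word;
        while (in >> word) {
            postings_[normalise(word)].insert(doc_id);
        }
    }

    // Return the ids of all documents containing the term (empty if none).
    std::set<int> lookup(const std::string& term) const {
        auto it = postings_.find(normalise(term));
        return it == postings_.end() ? std::set<int>() : it->second;
    }

private:
    static std::string normalise(std::string word) {
        std::transform(word.begin(), word.end(), word.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        return word;
    }

    std::map<std::string, std::set<int>> postings_;
};

// Boolean OR of two result sets.
std::set<int> either(const std::set<int>& a, const std::set<int>& b) {
    std::set<int> out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::inserter(out, out.end()));
    return out;
}

// Boolean AND of two result sets.
std::set<int> both(const std::set<int>& a, const std::set<int>& b) {
    std::set<int> out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::inserter(out, out.end()));
    return out;
}
```

With this sketch, the earlier query (“BA” or “MA”) and “literature” becomes both(either(idx.lookup("BA"), idx.lookup("MA")), idx.lookup("literature")). Each insert touches one posting set per token, which is why per-token work such as stemming dominates load time.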

Investigating CLucene
  • Wrote similar C++ test harness
  • Speed good: search 20,000 CVs <1 second; load 3 CVs per sec (mostly Word->text)
  • Code written as VC++ degree project and registered at SourceForge
  • Jimmy Pritts changed layout and added GNU autoconf files configure.ac and Makefile.in to let it build cross-platform on Windows, cygwin, Linux
  • Had C DLL interface used by PHP wrapper
  • Decided to write Perl wrapper

Interfacing Perl to C++
  • When I wrote this wrapper, Perl to C++ interfacing via XS or SWIG was tricky and, despite the optimism expressed at http://www.johnkeiser.com/perl-xs-c++.html, I had difficulties mapping the CLucene API to XS
  • Reasons: C++ namespace mangling; object and method mapping; C++ memory garbage collection
  • So I decided to go via the C DLL wrapper to hide this complexity
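To illustrate why a C wrapper hides the difficulties listed on this slide, here is a minimal sketch of the general technique. It is not the CLucene DLL code: demo::Index and the demo_open, demo_add and demo_close functions are hypothetical. The C++ object lives behind an integer handle, the exported functions use extern "C" so their names are not mangled, and object lifetime is managed entirely on the C++ side.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

namespace demo {
// Hypothetical stand-in for a C++ library class; not CLucene's real API.
class Index {
public:
    explicit Index(std::string path) : path_(std::move(path)) {}
    void add(const std::string& text) { if (!text.empty()) ++docs_; }
    int size() const { return docs_; }
    const std::string& path() const { return path_; }
private:
    std::string path_;
    int docs_ = 0;
};
}  // namespace demo

namespace {
// Handle table: callers only ever see the integer key.
std::map<int, std::unique_ptr<demo::Index>> g_open;
int g_next_handle = 1;
}  // namespace

// extern "C" gives these functions unmangled names and plain C signatures,
// so a wrapper generator only has to deal with ints and char pointers.
extern "C" {

// Returns a positive handle, or 0 on failure.
int demo_open(const char* path) {
    if (path == nullptr) return 0;
    int handle = g_next_handle++;
    assert(g_open.count(handle) == 0);  // a fresh handle must be unused
    g_open[handle].reset(new demo::Index(path));
    return handle;
}

// Returns the new document count, or -1 for a bad handle or argument.
int demo_add(int handle, const char* text) {
    auto it = g_open.find(handle);
    if (it == g_open.end() || text == nullptr) return -1;
    it->second->add(text);
    return it->second->size();
}

// Returns 0 on success, -1 if the handle was not open.
int demo_close(int handle) {
    return g_open.erase(handle) == 1 ? 0 : -1;  // unique_ptr frees the object
}

}  // extern "C"
```

The integer "resource" arguments in the CL_* prototypes shown later in the deck suggest a similar handle-based design, though that is an inference from the signatures rather than something the slides state.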

Perl XS
  • Always start with h2xs utility
  • Code is C with macro extensions
  • Write C code (XSUBs)
  • Call internal Perl routines (perlguts) to create variables, allocate arrays…
      - newSViv(IV), sv_setiv(SV*, IV) ~ scalar integer variable
  • Complicated
  • Nyarlathotep / “Crawling Chaos”

Enter SWIG
  • Creates XS for you from a .i definition file
  • Parses C/C++ .h header files to get types and function prototypes
  • Allows for inline C/XS code

Swig XS Sample
From argv.i

    // Creates a new Perl array and places a NULL-terminated char ** into it
    %typemap(out) char ** {
        AV *myav;
        SV **svs;
        int i = 0, len = 0;
        /* Figure out how many elements we have */
        while ($1[len])
            len++;
        svs = (SV **) malloc(len*sizeof(SV *));
        for (i = 0; i < len ; i++) {
            svs[i] = sv_newmortal();
            sv_setpv((SV*)svs[i], $1[i]);
        };
        myav = av_make(len, svs);
        free(svs);
        $result = newRV((SV*)myav);
        sv_2mortal($result);
        argvi++;
    }

Diagram of Layers
  Perl OO Wrapper       ~ CLucene.pm
  Low Level Perl        ~ CLuceneWrap.pm
  SWIG XS C Code        ~ clucene_wrap.c (SWIG generated)
  C DLL Interface       ~ clucene_dll.o
  CLucene C++ Library   ~ clucene.so

CLucene C++ Interface
src/CLucene/search/SearchHeader.h:

    #include "CLucene/StdHeader.h"
    #ifndef _lucene_search_SearchHeader_
    #define _lucene_search_SearchHeader_
    #include "CLucene/index/IndexReader.h"
    …
    using namespace lucene::index;
    namespace lucene{ namespace search{
        //predefine classes
        class Searcher;
        class Query;
        class Hits;

        class HitDoc {
        public:
            float_t score;
            int_t id;
            lucene::document::Document* doc;
            HitDoc* next;
            HitDoc* prev;   // in doubly-linked cache
            HitDoc(const float_t s, const int_t i);
            ~HitDoc();
        };

CLucene C DLL Interface
src/wrappers/dll/clucene_dll.h:

    #ifndef _DLL_CLUCENE
    #define _DLL_CLUCENE
    #include "CLucene/CLConfig.h"
    …
    #ifdef _UNICODE
    //unicode methods
    # define CL_UNLOCK CL_U_Unlock
    # define CL_OPEN CL_U_Open
    # define CL_DOCUMENT_INFO CL_U_Document_Info
    # define CL_ADD_FILE CL_U_Add_File
    …
    CLUCENEDLL_API int CL_U_Unlock(const wchar_t* dir);
    CLUCENEDLL_API int CL_U_Delete(const int resource, const wchar_t* query, const wchar_t* field);
    CLUCENEDLL_API int CL_U_Add_Field(const int resource, const wchar_t* field, const wchar_t* value,
                                      const int value_length, const int store, const int index, const int token);
    …
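The header above selects wide-character functions through macros when _UNICODE is defined. As a rough sketch of that compile-time dispatch pattern, using hypothetical names (DEMO_UNICODE, DEMO_LENGTH, DEMO_TEXT, demo_n_length, demo_u_length) rather than anything from CLucene:

```cpp
#include <cassert>
#include <cstring>
#include <cwchar>

// Hypothetical narrow-character variant of one operation.
int demo_n_length(const char* s) {
    assert(s != nullptr);
    return static_cast<int>(std::strlen(s));
}

// Hypothetical wide-character variant of the same operation.
int demo_u_length(const wchar_t* s) {
    assert(s != nullptr);
    return static_cast<int>(std::wcslen(s));
}

// One public name, bound at compile time to the narrow or wide variant.
#ifdef DEMO_UNICODE
#  define DEMO_LENGTH demo_u_length
#  define DEMO_TEXT(s) L##s   // turns "abc" into L"abc"
#else
#  define DEMO_LENGTH demo_n_length
#  define DEMO_TEXT(s) s
#endif
```

Calling code writes DEMO_LENGTH(DEMO_TEXT("clucene")) and compiles unchanged in either mode. The drawback for a wrapper generator is that it sees the macro names only if it processes a header where they are defined, which is presumably why the deck's SWIG file includes its own clucene_perl.h instead of clucene_dll.h.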

SWIG Definition File
clucene.i

    %module "FulltextSearch::CLuceneWrap"
    %{
    #include "clucene_dllp.h"
    %}
    // our definitions for CLucene variables and functions
    %include "clucene_perl.h"
    //%include "clucene_dll.h" // could use this but then would need to call CL_N_Search not CL_SEARCH etc.
    %include typemaps.i
    %include argv.i
    // helper functions where pointers to result buffers are expected
    // would be better done with a %typemap(out) if I knew enough about perlguts
    %inline %{
    int val_len;
    char * val;
    int CL_GetField1(int resource, char * field) {
        return CL_GETFIELD(resource, field, &val_len);
    }
    …
    %}
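The inline helper on this slide works around functions that return results through pointer arguments. The sketch below shows the same idea in isolation. Everything in it is hypothetical (demo_get_field, demo_get_field1, demo_val, demo_val_len, and the hard-coded "name" field), and it is not the CL_GETFIELD implementation.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical C-style API: returns a pointer to the value and writes its
// length through an out-parameter, which is awkward to call from a script.
const char* demo_get_field(const char* field, int* len) {
    assert(len != nullptr);
    const char* value = (std::strcmp(field, "name") == 0) ? "Peter" : "";
    *len = static_cast<int>(std::strlen(value));
    return value;
}

// Globals that a wrapper generator can expose as ordinary variables.
int demo_val_len = 0;
const char* demo_val = "";

// Helper with a script-friendly signature: results land in the globals.
// Returns 1 when a non-empty value was found, 0 otherwise.
int demo_get_field1(const char* field) {
    demo_val = demo_get_field(field, &demo_val_len);
    return demo_val_len > 0 ? 1 : 0;
}
```

The trade-off is that shared globals make the helper non-reentrant: a second call overwrites the first result, so the caller has to copy the values out before calling again. A typemap, as the slide's own comment notes, would avoid that.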

SWIG-Generated XS
CLuceneWrap.pm

    # This file was automatically generated by SWIG
    package FulltextSearch::CLuceneWrap;
    require Exporter;
    require DynaLoader;
    @ISA = qw(Exporter DynaLoader);
    package FulltextSearch::CLuceneWrapc;
    bootstrap FulltextSearch::CLuceneWrap;
    package FulltextSearch::CLuceneWrap;
    @EXPORT = qw( );

    # ----- BASE METHODS ------
    package FulltextSearch::CLuceneWrap;
    sub TIEHASH {
        my ($classname, $obj) = @_;
        return bless $obj, $classname;
    }
    sub CLEAR { }
    …

    # ------- FUNCTION WRAPPERS --------
    package FulltextSearch::CLuceneWrap;
    *CL_OPEN = *FulltextSearch::CLuceneWrapc::CL_OPEN;
    *CL_CLOSE = *FulltextSearch::CLuceneWrapc::CL_CLOSE;
    …

    # ------- VARIABLE STUBS --------
    package FulltextSearch::CLuceneWrap;
    *clucene_perl = *FulltextSearch::CLuceneWrapc::clucene_perl;
    *NULL = *FulltextSearch::CLuceneWrapc::NULL;
    *val_len = *FulltextSearch::CLuceneWrapc::val_len;
    *val = *FulltextSearch::CLuceneWrapc::val;
    *errstr = *FulltextSearch::CLuceneWrapc::errstr;
    …

SWIG-Generated XS
clucene_wrap.c

    #ifdef __cplusplus
    extern "C" {
    #endif
    XS(_wrap_CL_OPEN) {
        {
            char *arg1 ;
            int arg2 = (int) 1 ;
            int result;
            int argvi = 0;
            dXSARGS;

            if ((items < 1) || (items > 2)) {
                SWIG_croak("Usage: CL_OPEN(path,create);");
            }
            if (!SvOK((SV*) ST(0))) arg1 = 0;
            else arg1 = (char *) SvPV(ST(0), PL_na);
            if (items > 1) {
                arg2 = (int) SvIV(ST(1));
            }
            result = (int)CL_OPEN(arg1,arg2);

            ST(argvi) = sv_newmortal();
            sv_setiv(ST(argvi++), (IV) result);
            XSRETURN(argvi);
            fail:
            ;
        }
        croak(Nullch);
    }

CLucene.pm Perl OO Wrapper
  • Back into the realms of sanity
  • Normal OO package with methods
  • Calls XS wrapper functions

    sub open {
        my $this = shift;
        my %arg = @_;
        my $path = $arg{path} || $this->{path} || confess "path undefined";
        my $create = anyof ( $arg{create}, $this->{create}, 0 );
        # report $path, since $this->{path} is not set until the open succeeds
        $this->{resource} = FulltextSearch::CLuceneWrap::CL_OPEN ( $path, $create )
            or confess "Failed to CL_OPEN $path create $create errstr ".$this->errstrglobal();
        $this->{path} = $path;
        $this;
    }

Build Environment
Uses GNU autotools and m4 macro processor
Definition files
  • configure.ac ~ top level build definitions
  • Makefile.am ~ makefile flags definitions
Programs
  • libtool ~ generalised library building
  • aclocal ~ builds aclocal.m4 from configure.ac
  • autoconf ~ reads configure.ac to create configure script
  • autoheader ~ creates C header defines for configure
  • automake ~ creates Makefile.in from Makefile.am
  • autoreconf ~ manually remake whole tree of GNU build files

Bootstrap shell script

    #!/bin/sh
    # Bootstrap the CLucene installation.
    mkdir -p ./build/gcc/config
    set -x
    libtoolize --force --copy --ltdl --automake
    aclocal
    autoconf
    autoheader
    automake -a --copy --foreign

Autoconf configure.ac file

    dnl Process this file with autoconf to produce a configure script.
    dnl Written by Jimmy Pritts.

    dnl initialize autoconf and automake
    AC_INIT([clucene], [1])
    AC_PREREQ([2.54])
    AC_CONFIG_SRCDIR([src/CLucene.h])
    AC_CONFIG_AUX_DIR([./build/gcc/config])
    AC_CONFIG_HEADERS([config.h])
    AM_INIT_AUTOMAKE

    dnl Check for existence of a C and C++ compilers.
    AC_PROG_CC
    AC_PROG_CXX

    dnl Check for headers
    AC_HEADER_DIRENT

    dnl Configure libtool.
    AC_PROG_LIBTOOL

    dnl option to use UTF-8 as internal 8-bit charset to support characters in Unicode™
    AC_ARG_ENABLE(utf8,
        AC_HELP_STRING([--enable-utf8],
            [UTF-8 as internal 8-bit charset to support characters in Unicode™ (default=no)]),
        [AC_DEFINE([UTF8], [use UTF-8 as internal 8-bit charset to support characters in Unicode™])],
        enable_utf8=no)
    AM_CONDITIONAL(USEUTF8, test x$enable_utf8 = xyes)

    AC_CONFIG_FILES([Makefile src/Makefile examples/demo/Makefile examples/tests/Makefile examples/util/Makefile wrappers/dll/Makefile wrappers/dlltest/Makefile])
    AC_OUTPUT

Makefile.am files

src/Makefile.am:
    AUTOMAKE_OPTIONS = 1.6

./Makefile.am:
    ## Makefile.am -- Process this file with automake to produce Makefile.in
    include_HEADERS = CLucene.h
    INCLUDES = -I$(top_srcdir)
    lsrcdir = $(top_srcdir)/src/CLucene
    SUBDIRS = src wrappers examples .
    lib_LTLIBRARIES = libclucene.la
    libclucene_la_SOURCES =
    include CLucene/analysis/Makefile.am
    include CLucene/analysis/standard/Makefile.am
    include CLucene/debug/Makefile.am
    include CLucene/document/Makefile.am
    include CLucene/index/Makefile.am
    include CLucene/queryParser/Makefile.am
    include CLucene/search/Makefile.am
    include CLucene/store/Makefile.am
    include CLucene/util/Makefile.am
    include CLucene/Makefile.am

src/CLucene/document/Makefile.am:
    documentdir = $(lsrcdir)/document
    dochdir = $(includedir)/CLucene/document
    libclucene_la_SOURCES += $(documentdir)/DateField.cpp
    libclucene_la_SOURCES += $(documentdir)/Document.cpp
    libclucene_la_SOURCES += $(documentdir)/Field.cpp
    doch_HEADERS = $(documentdir)/*.h

Recap
  • We saw how and why I selected an external Perl library
  • We looked at GNU autotools to provide a cross-platform build environment
  • We investigated the layers of code needed to interface perl to a C++ library ~ SWIG, C, XS inline helpers, low and high level Perl modules

Lessons Learned
  • Start off a new external library using GNU autotools and keeping in mind that the API should be easy to use through SWIG
  • Use SWIG not XS to wrap a C/C++ library
  • Always use h2xs to start a Perl extension
  • Open Source feedback and testing are more valuable than you expect (2 emails this week alone)

Where to Get More Information
  • Perl XS
      - http://en.wikipedia.org/wiki/XS_%28Perl%29
      - http://www.perl.com/doc/manual/html/pod/perlguts.html
  • C++ / XS
      - http://www.johnkeiser.com/perl-xs-c++.html
  • SWIG
      - http://en.wikipedia.org/wiki/SWIG
      - http://www.swig.org/
  • Lucene
      - http://en.wikipedia.org/wiki/Lucene
  • CLucene
      - http://sourceforge.net/projects/clucene/
  • Autoconf
      - http://www.gnu.org/software/autoconf/
  • Book
      - “Extending and Embedding Perl”, Jenness & Couzens (Manning, 2002)
Any Questions?
These slides are at http://perl.dragonstaff.com/