The SCANCSV.LUA library
Jaroslav Hajtmar
Apology to English-speaking participants
Sorry, but this talk is only in Czech. My English is probably not good enough to say everything important. I will at least try to provide the slides in English. Thank you for your understanding.
Abstract
Data stored in CSV (Comma Separated Values) files are often used in data processing. The presentation describes the author's library ScanCSV.lua and how it was developed, and demonstrates practical examples of its use in ConTeXt MkIV. The author shows how to create reports, letters, forms, certificates, invitations, cards, business cards, double-sided cards, tables, animations etc. quickly and easily from external CSV text databases. Users of ConTeXt MkIV (and of LuaLaTeX and LuaTeX as well) can use data from external CSV tables in their own documents through TeX macros built on the library, which make the data available in a simple and natural way.
Introduction
• SCANCSV.LUA library: an easy way to use text database data stored in external CSV files in ConTeXt MkIV (it also works in LuaLaTeX and plain LuaTeX).
• Easy creation of LuaTeX documents that process multiple data records (a CSV file as a simple database).
• Varied uses: printing of forms, mail-merge letters, certificates, invitations, cards, business cards, double-sided cards, tables, animations etc.
• Main objectives: easy to use without knowledge of Lua; usable in LuaLaTeX and plain LuaTeX too; access to CSV data through TeX macros built on the library functions (no Lua code needed); motivating other users to use LuaTeX.
CSV data format and SCANCSV.LUA
• Uses of CSV: data exchange, export to CSV (e.g. from a MySQL database), a simpler alternative to XML, easy handling (sorting and editing), spreadsheets (Excel, Calc, Gnumeric, ...)
• Description of the CSV format in general
• CSV format suitable for SCANCSV.LUA:
  - The file must be encoded in UTF-8! (Exported XLS files have to be re-encoded, which is a handicap.)
  - Field separators: basically anything; the default is the semicolon ";" (as in MS Excel).
  - Field delimiters ("spacers"): anything, and the left and right delimiter may differ (most often double quotes); the default is no delimiters!
  - The parsing algorithm of SCANCSV.LUA is very simple (although it can be freely adjusted), which leads to a limitation: if delimiters are set, they must be used around every field, whereas general CSV does not require this.
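The separator and delimiter settings described above can be illustrated with a minimal Lua sketch. It is not the library's actual code: the function name `split_line` and its parameters are hypothetical, and unlike the library it tolerates fields without delimiters. Like the simple algorithm on the slide, it does not cope with a separator inside a delimited field.

```lua
-- Hypothetical helper (not the library's ParseCSVdata()):
-- split one CSV line on a plain-text separator and strip optional
-- left/right delimiters from each field.
local function split_line(line, sep, ld, rd)
  sep = sep or ";"   -- default separator named on the slide
  ld = ld or ""      -- default: no delimiters
  rd = rd or ""
  local fields = {}
  local pos = 1
  while true do
    -- plain find (4th argument true): the separator is not a Lua pattern
    local s, e = string.find(line, sep, pos, true)
    local field = string.sub(line, pos, s and s - 1 or #line)
    -- strip the delimiters only when both surround the field
    if #ld + #rd > 0 and #field >= #ld + #rd
       and string.sub(field, 1, #ld) == ld
       and string.sub(field, #field - #rd + 1) == rd then
      field = string.sub(field, #ld + 1, #field - #rd)
    end
    fields[#fields + 1] = field
    if not s then break end
    pos = e + 1
  end
  return fields
end

-- Usage: default settings, then comma separator with quote delimiters
local a = split_line("1;Petr;Novák")
local b = split_line('"a","b c"', ",", '"', '"')
print(a[2], b[2])
```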
SCANCSV: history, inspiration
• 2005: Petr Olšák's macro scanbase.tex appears. It processes text files in a certain format.
• Petr Olšák modified and generalized scanbase.tex into the macro scancsv.tex, which processes text files in CSV format. I used it in plain TeX until 2008.
• 2008: the macro is adapted for LaTeX (Jaromír Kuben) and for ConTeXt (Petr Olšák). I still use it in ConTeXt MkII today.
• 2010: I began to use ConTeXt MkIV. The original macro does not work there: ConTeXt MkIV works with the UTF-8 character set, which the macro is unable to process.
• March 2010: I became familiar with LuaTeX and the Lua language and started creating the library scancsv.lua. The first version was practically useless.
• July 2010: first really usable version.
• Today: improvements, tuning and new options.
The operating principle of the library
1. Load the library scancsv.lua (a single line of Lua code in the ConTeXt source text).
2. Optionally set the header flag, the column separator and the delimiters (otherwise the default values are used).
3. Open the CSV file (there are several ways).
4. Load a row of the CSV table (manually or in a cycle).
5. Parse the row (split it into column data).
6. Store the column data in TeX macros.
7. Repeat steps 4 to 6 for all rows of the CSV table.
How the first row of the table is processed depends on whether it is a header or not. Once the column data are loaded into the macros, they are available in ConTeXt. Rows can be browsed "manually", with the standard cycles, or with the macros of the library.
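Steps 4 to 7, including the special treatment of a header row, might be sketched in plain Lua as follows. This is only an illustration under stated assumptions: the names `split` and `scan` are hypothetical, the rows come from an in-memory list instead of an opened file, the separator is fixed to ";", and a Lua table stands in for the TeX macros that the library defines.

```lua
-- Hypothetical sketch of the row loop; not the library's own code.
-- Split a line on ";" (assumed separator, no delimiters).
local function split(line)
  local fields = {}
  for field in (line .. ";"):gmatch("(.-);") do
    fields[#fields + 1] = field
  end
  return fields
end

-- Walk over all lines; call action(row, numline) for every data row.
local function scan(lines, has_header, action)
  local names = nil
  local numline = 0
  for i, line in ipairs(lines) do
    local cols = split(line)
    if has_header and i == 1 then
      names = cols              -- the first row names the columns, it is not data
    else
      numline = numline + 1
      local row = {}
      for c, value in ipairs(cols) do
        row[c] = value          -- positional access (the role of \cA, \cB, ...)
        if names and names[c] then
          row[names[c]] = value -- named access (the role of \Surname, ...)
        end
      end
      action(row, numline)
    end
  end
  return numline                -- number of processed data rows
end

-- Usage
scan({ "Surname;Firstname", "Novák;Jan" }, true, function(row, k)
  print(k, row.Firstname, row[1])
end)
```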
Use in "manual" mode
• Load the library: \directlua{dofile("scancsv.lua")}
• Set the header flag (when the file has a header): \setheader (or unset it with \resetheader)
• Open the CSV file: \opencsvfile{file.csv}
• Then, in the source text, we use the macros \cA, \cB, ... (or \Firstname, \Lastname, ... if the first line contains the header Firstname, Lastname, ...). These macros contain the column values of the current CSV row.
• \nextrow: go to the next table row (the macros \cA, \cB, ... or \Firstname, \Lastname, ... are filled with new values)
Main TeX macros for using the library
• \setfiletoscan{CSVFile}: set the name of the CSV file
• \setheader: set the header flag
• \resetheader: unset the header flag
• \setsep{,}, \setld{*}, \setrd{!}: set the column separator and the left and right column delimiters to user (non-default) values
• \resetsep, \resetld, \resetrd: reset them to the default values
• \opencsvfile{CSVFile}, \openheadercsvfile{CSVFile}: open a CSV file
• \nextrow: go to the next row of the CSV file
• \printline, \printall: print the whole current line / the whole CSV table
• \filelineaction, \filelineaction{CSVfile}{to}, \filelineaction{CSVfile}{from}{to}: process the user-defined macro \lineaction in a cycle
TeX macros for accessing column data
CSV file without header (default option, \resetheader): all lines are data lines
  macros: \cA \cB \cC \cD ...
  1;Petr;Novák;19.5.1989;m;Nymburk;U Brány 7
  2;Jan;Novotný;5.7.1991;m;Praha;Uhlířská 178
  3;Zuzana;Vašíčková;13.9.1984;ž;Ostrava;Jánská 14
  ...
CSV file with header (switch on with \setheader): the first line is the header (no data), the rest are data lines
  macros: \cA = \Surname, \cB = \Firstname, \cC = \Birthdate, ...
  Surname;Firstname;Birthdate;Sex;City;Zipcode;Street
  Novák;Jan;14.10.1997;m;Zbečno;27024;Farní 21
  Pospíšilová;Hana;4.1.1996;ž;Zábřeh;78901;Studénky 420
  ...
Columns can also be numbered with Roman numerals: \cI, \cIII, \cIV, ... (default: UserColumnNumbering='XLS')
TeX macros to obtain "system" information
• \csvfilename: name of the currently open CSV file
• \numcols: number of columns of the CSV table
• \numrows: number of processed (offered) lines
• \numline: serial number of the currently loaded row
• \csvreport: report with information on the open CSV file
Hooks for data processing (default: \relax)
• \blinehook, \elinehook: begin line hook and end line hook; these macros are executed before and after the macro \lineaction is processed for a row of the CSV table
• \bfilehook, \efilehook: executed before and after processing the entire CSV table
• \bch, \ech: begin column hook and end column hook; they can be set manually in the Lua code, but because the macros could not be tested, this option is disabled
TeX IF for testing the end of the CSV file
• \ifEOF: TRUE when processing has reached the end of the CSV file
• \ifnotEOF: the opposite of \ifEOF
Using "manual" mode
• In the source code we use the macros \cA, \cB, ... or \Firstname, \Lastname, ... (if the first line contains a header), which contain the column values of the current CSV row.
• \nextrow: go to the next table row (the macros \cA, \cB, ... are filled with new values)
Modifying the behaviour of the library
• Default settings can be changed by editing the file scancsv.lua, in the introductory section of the code.
• While a ConTeXt MkIV (LuaLaTeX) document is being processed, the separator, delimiter and header settings can be changed at any time using TeX macros.
• Different CSV files (with different column separators and delimiters) can be processed in one document.
• Hooks can be used; by default they are \relax.
Main Lua library functions
• ParseCSVdata(): parses the individual records (rows) of the CSV table
• lineaction(): processes the user macro \lineaction over the specified range of lines of the open CSV file
• CreatePageFiles(): creates two CSV files from the open CSV file. They are used to print double-sided cards laid out on the page in a block of R x C cards (the CSV file for the second page is "repositioned" so that the fronts and backs of the cards match)
• Filelineactioncards(): prints the first and second sides of a list of cards from the files created by the previous function
• CSVReport(): returns report information about the open CSV file
• csvfilename(): name of the currently open CSV file
• TMN(s): (TeX Macro Name) the macro name must not contain prohibited characters
• ar2rom(): converts Arabic numbers to Roman numerals; used for "numbering" columns in macro names
• ar2xls(): converts numbers to column names (Excel format)
• ar2colnum(): returns the column designation for the TeX macro according to the setting of a global variable
• printline(): prints the current row of the CSV table
• printall(): prints the whole CSV table
• printallcontext(): prints the whole CSV table in ConTeXt syntax
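The two column-naming conversions listed above can be sketched as follows. The slides do not show how ar2xls() and ar2rom() are implemented, so `to_xls_column` and `to_roman` are hypothetical stand-ins that only illustrate the kind of mapping involved (column 1 gives the suffix of \cA or \cI, and so on).

```lua
-- Hypothetical stand-in for ar2xls(): 1 -> "A", 26 -> "Z", 27 -> "AA", ...
local function to_xls_column(n)
  local name = ""
  while n > 0 do
    local rem = (n - 1) % 26                 -- 0..25 selects the letter A..Z
    name = string.char(65 + rem) .. name     -- 65 is the byte value of "A"
    n = math.floor((n - 1) / 26)
  end
  return name
end

-- Value/symbol pairs in descending order, including subtractive forms
local ROMAN = {
  { 1000, "M" }, { 900, "CM" }, { 500, "D" }, { 400, "CD" },
  { 100, "C" }, { 90, "XC" }, { 50, "L" }, { 40, "XL" },
  { 10, "X" }, { 9, "IX" }, { 5, "V" }, { 4, "IV" }, { 1, "I" },
}

-- Hypothetical stand-in for ar2rom(): 4 -> "IV", 14 -> "XIV", ...
local function to_roman(n)
  local out = ""
  for _, pair in ipairs(ROMAN) do
    while n >= pair[1] do
      out = out .. pair[2]
      n = n - pair[1]
    end
  end
  return out
end

-- Usage: macro names for the fourth column in both numbering styles
print("c" .. to_xls_column(4), "c" .. to_roman(4))
```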
Testing and cycles
Conditions with AND and OR (see Olšák, TBN)
% Condition A AND B
\doloop{
  \ifnum\Id>2 \ifnum\Id<10 \lineaction \fi\fi
  \ifEOF\exitloop\else\nextrow\ifEOF\exitloop\fi\fi
}
% Condition A OR B
\def\AorB{\lineaction}
\doloop{
  \ifnum\Id=1 \AorB
  \else\ifnum\Id>3 \AorB\fi\fi
  \ifEOF\exitloop\else\nextrow\ifEOF\exitloop\fi\fi
}
SCANCSV.LUA and cycles
Examples of ConTeXt cycles:
  \dorecurse{5}{\lineaction\nextrow} % \lineaction for the next 5 rows
  \doloop{\lineaction\nextrow\ifnum\numline>7 \exitloop\fi}
  \doloop{\ifEOF\exitloop\else\lineaction\nextrow\fi}
  \doloop{\lineaction\nextrow \if\Id 3 \exitloop \fi}
Examples of library cycles (only in the test version of SCANCSV.LUA). These macros are built on \doloop to make them easier to use in the source code:
  \doloopwhile{\Trida}{3.A}{\tableaction} % process all rows that meet the criterion
  \doloopuntil{\Trida}{3.A}{\tableaction} % process rows until the criterion is met
  \doloopforall{\lineaction} % run \lineaction for all rows
  \doloopfromto{3}{7}{\lineaction} % run \lineaction for rows 3 to 7
  \doloopaction % without parameters: run \lineaction for all rows
  \doloopaction{\useraction} % run the user macro \useraction for all rows
  \doloopaction{\useraction}{5} % run \useraction for the first 5 rows
  \doloopaction{\useraction}{5}{7} % run \useraction for rows 5 to 7
Practical demonstrations of the use of the library
• Forms, multiple letters, etc.
• Cards, business cards, ...
• Tables
• MetaPost animation
• Use of ConTeXt cycles, IF tests
• SCANCSV.LUA "drifts" (TeX macros in a CSV file, changing \lineaction while the CSV file is processed)
• Samples of work for CTM & TE
Constraints, compatibility, flaws
• SCANCSV.LUA does not handle general CSV files. Reason: the parsing algorithm is very simple. When an item contains the column separator ",", the usual CSV output looks like this: 1,Jan,Novotny,"The Gate 4, 111 50 Prague",... Solution: a better (more general) algorithm; it is enough to change the function ParseCSVdata().
• Occasional problems with expansion. For example, I failed to get SCANCSV.LUA working together with the database module (\usemodule[database]) by Mojca Miklavec.
• Some things work only in ConTeXt.
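One way such a more general algorithm might look is a character-by-character scan that remembers whether it is inside a quoted item. The sketch below is an assumption, not the planned replacement for ParseCSVdata(): the name `parse_quoted` is hypothetical, the separator and the quote are assumed to be single-byte characters, and a doubled quote inside a quoted item is treated as a literal quote, as is common in CSV exports.

```lua
-- Hypothetical sketch of a quote-aware row parser.
local function parse_quoted(line, sep, quote)
  sep = sep or ","
  quote = quote or '"'
  local fields, buf, in_quotes = {}, {}, false
  local i, n = 1, #line
  while i <= n do
    local ch = line:sub(i, i)
    if in_quotes then
      if ch == quote then
        if line:sub(i + 1, i + 1) == quote then
          buf[#buf + 1] = quote   -- doubled quote: keep one literal quote
          i = i + 1
        else
          in_quotes = false       -- closing quote
        end
      else
        buf[#buf + 1] = ch        -- separators inside quotes are plain data
      end
    elseif ch == quote then
      in_quotes = true            -- opening quote
    elseif ch == sep then
      fields[#fields + 1] = table.concat(buf)
      buf = {}
    else
      buf[#buf + 1] = ch
    end
    i = i + 1
  end
  fields[#fields + 1] = table.concat(buf)  -- last field has no trailing separator
  return fields
end

-- Usage with the row from the slide
local row = parse_quoted('1,Jan,Novotny,"The Gate 4, 111 50 Prague"')
print(#row, row[4])
```

Because the data are copied byte by byte, multi-byte UTF-8 characters inside items pass through unchanged.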
Possibilities of improvement...
• Improving and generalizing the parsing algorithm
• Using the library for XML processing?
• Creating a separate module ONLY FOR MKIV (this would remove a number of limitations imposed by LuaLaTeX support)
Thanks...
• Members of the mailing list ntg-context@ntg.nl for advice about ConTeXt and Lua. The library would not exist without their kind assistance. Special thanks to Taco Hoekwater, Hans Hagen and Wolfgang Schuster.
• Members of the mailing list cstex@cs.felk.cvut.cz for advice about TeX and LaTeX, especially Zdeněk Wagner, Vít Zýka, Pavel Stříž and Petr Olšák.
• Pavel Stříž for inspiration, testing, advice and for convincing me to finish the library and present it at this conference.
Discussion