LattesMiner: a Multilingual DSL for Information Extraction
LattesMiner: a Multilingual DSL for Information Extraction from Lattes Platform. 11th Workshop on Domain-Specific Modeling. Alexandre Donizeti Alves, Horacio Hideki Yanasse, Nei Yoshihiro Soma. October 24, 2011
Introduction Lattes Platform is an information system deployed by CNPq (National Council for Scientific and Technological Development) to manage information on science, technology and innovation related to researchers and institutions in Brazil. This platform is the major source of information available on Brazilian researchers.
Introduction: Lattes Platform http://lattes.cnpq.br
Introduction The Lattes CV system, a curricular information system, is the main component of the platform. Currently, the Lattes CV system stores around 2,000 curricula of researchers, lecturers, students and professionals from diverse areas of knowledge.
Introduction: Lattes CV system Jorge Almeida Guimaraes http://buscatextual.cnpq.br/buscatextual
Introduction: Lattes curriculum (English)
Introduction: Lattes curriculum (Portuguese)
Introduction In recent years, many studies have used data extracted from the Lattes Platform about researchers in different areas of knowledge. A common problem reported in these studies is that the curricula and the extracted information had to be obtained manually.
Introduction The Lattes CV system therefore has great potential for high-quality information extraction.
LattesMiner is an internal multilingual DSL for automatic information extraction from Lattes curricula. It is composed of a set of classes written in Java that allows developers to implement their own applications with a high level of abstraction and expressive power.
LattesMiner (architecture)
Data Discovery is used to find the (ID) number of the researchers. Usually, only the name of the researcher is available.
Data Acquisition is responsible for downloading the Lattes curricula of the researchers from the Lattes CV system on the Web.
Data Extraction is the main component of LattesMiner. It is responsible for extracting data from the HTML files. An information extraction technique based on regular expressions was used.
Data Structure: the extracted data can be stored in XML files or in any database using the Data Structure component.
Data Analysis is responsible for the analysis of the extracted data and also for the analysis of the relationships identified.
Data Visualization is responsible for the identification and visualization of the academic social networks. These networks are identified by checking the relationships between researchers.
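The slides state that extraction from the HTML files is based on regular expressions but do not show the patterns. The following is only a minimal sketch of that general technique: the HTML fragment, the `class="journal"` markup, and the names `RegexExtractionSketch` and `extractJournalItems` are all hypothetical and are not taken from LattesMiner or from the real Lattes pages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExtractionSketch {

    // Hypothetical pattern: captures the text inside every <li class="journal"> item.
    // DOTALL lets an item span several lines; ".*?" keeps each match as short as possible.
    private static final Pattern JOURNAL_ITEM =
            Pattern.compile("<li class=\"journal\">(.*?)</li>", Pattern.DOTALL);

    /** Returns the trimmed contents of every journal item found in the HTML. */
    public static List<String> extractJournalItems(String html) {
        List<String> items = new ArrayList<>();
        Matcher m = JOURNAL_ITEM.matcher(html);
        while (m.find()) {
            items.add(m.group(1).trim());
        }
        return items;
    }

    public static void main(String[] args) {
        // Made-up fragment standing in for a downloaded curriculum page.
        String html = "<ul>"
                + "<li class=\"journal\"> Paper A. Journal X, 2010. </li>"
                + "<li class=\"book\">Book B, 2009.</li>"
                + "<li class=\"journal\">Paper C. Journal Y, 2011.</li>"
                + "</ul>";
        for (String item : extractJournalItems(html)) {
            System.out.println(item);
        }
    }
}
```

Keeping each pattern in one place like this is one way a format change in the source pages could be absorbed without touching the code that calls the extractor.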
LattesMiner The LattesMiner class is composed of instances of classes Biodata and Board, in addition to many others not presented here. [Class diagram. Packages: lattes.miner, lattes.miner.br, lattes.miner.en, lattes.miner.ie, lattes.miner.dao. Classes: LattesMiner, Perfil, Banca, Biodata, Board, BiodataIE, BoardIE, BiodataDao, BoardDao.]
LattesMiner was created as a fluent interface, which provides a compact yet easy-to-read representation of the problem domain. Fluent interfaces are implemented using method chaining. LattesMiner also makes use of static factory methods and static imports.
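To make the terms concrete, here is a small sketch of a fluent interface built from a static factory method and method chaining. It is not LattesMiner's implementation: the `FluentSketch` and `Query` classes, the `describe()` terminal step, and the placeholder ID `"0000"` are assumptions made for illustration; only the method names `load`, `biodata`, `address` and `publications` echo the listings in the slides.

```java
import java.util.ArrayList;
import java.util.List;

public class FluentSketch {

    /** Hypothetical query object: each step records itself and returns this. */
    public static final class Query {
        private final String id;
        private final List<String> sections = new ArrayList<>();

        private Query(String id) {
            this.id = id;
        }

        // Returning "this" from every step is what enables method chaining.
        public Query biodata()      { sections.add("biodata");      return this; }
        public Query address()      { sections.add("address");      return this; }
        public Query publications() { sections.add("publications"); return this; }

        /** Terminal step: here it only describes what was requested. */
        public String describe() {
            return id + ":" + String.join(",", sections);
        }
    }

    /** Static factory method: the entry point of the chain. */
    public static Query load(String id) {
        return new Query(id);
    }

    public static void main(String[] args) {
        // Prints "0000:biodata,address,publications".
        System.out.println(load("0000").biodata().address().publications().describe());
    }
}
```

From another class, a static import of `load` would let client code start a chain with a bare `load(...)` call, which is the style the listings in the slides use.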
Case Study For the following examples, researchers in the Computer Science area holding a CNPq Research Productivity Scholarship were considered. The list contains the names of all the researchers. However, their corresponding (ID) numbers are not provided. http://plsql1.cnpq.br/divulg/RESULTADO_PQ_102003.curso
Listing 1 Java application code
    import java.util.*;
    import lattes.util.Util;
    import static lattes.miner.LattesMiner.*;

    public class Listing1 {
        public static void main(String[] args) {
            // Look up the ID of each researcher by name
            List<String> list = new ArrayList<String>();
            for (String name : Util.getList("names.txt"))
                list.add( search(name) );
            // Save the IDs for the next steps
            Util.setList(list, "ids.txt");
        }
    }
Listing 2 Code fragment used to download the Lattes curricula of the researchers.
    dir("cvs");
    for (String id : Util.getList("ids.txt"))
        download(id).save();
Listing 3 This listing shows how to extract data from the Lattes curricula of the researchers.
    props("mysql");
    for (String id : Util.getList("ids.txt")) {
        load(id).biodata().address();
        publications( JOURNAL ).save();
    }
Listing 4 Code fragment illustrating how LattesMiner is used to extract information in different languages.
    for (String id : Util.getList("ids.txt")) {
        // Portuguese
        for (Banca b : carregar(id).bancas().getBancas() ) {
            if ( b.ano() == 2010 )
                System.out.println( b.aluno() );
        }
        // English
        for (Board b : load(id).boards().getBoards() ) {
            if ( b.year() == 2010 )
                System.out.println( b.student() );
        }
    }
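The slides do not show how the Portuguese and English vocabularies are implemented. One possible approach, sketched here purely as an assumption, is for the class of one language to wrap the class of the other and delegate to it. The class bodies, constructors and the sample value `"Maria"` below are hypothetical; only the method names `year`, `student`, `ano` and `aluno` come from Listing 4.

```java
public class MultilingualSketch {

    /** English vocabulary: holds the data (hypothetical implementation). */
    public static class Board {
        private final int year;
        private final String student;

        public Board(int year, String student) {
            this.year = year;
            this.student = student;
        }

        public int year()       { return year; }
        public String student() { return student; }
    }

    /** Portuguese vocabulary: same data, Portuguese names delegating to the English ones. */
    public static class Banca {
        private final Board board;

        public Banca(Board board) {
            this.board = board;
        }

        public int ano()      { return board.year(); }
        public String aluno() { return board.student(); }
    }

    public static void main(String[] args) {
        Board board = new Board(2010, "Maria");
        Banca banca = new Banca(board);
        // Both vocabularies expose the same underlying values.
        System.out.println(board.year() + " " + board.student());
        System.out.println(banca.ano() + " " + banca.aluno());
    }
}
```

With delegation, extraction logic lives in one place and each additional language only adds thin wrapper classes.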
Results SUCUPIRA is a system for the identification and visualization of academic social networks. Shown here is the geographical distribution of the five researchers who have published the most articles in scientific journals.
Results This graph of contacts depicts an academic social network of the five researchers who have published the most articles in scientific journals. Nodes are labeled with the name of the researcher. The color of the edges represents the number of relationships among researchers.
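The slides do not say how the number of relationships behind each edge is computed. As a rough sketch, assuming a relationship means two researchers appearing on the same item (for example, co-authoring an article), the edge weights could be counted as follows. The class name, the `" -- "` key format and the sample names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class CoauthorNetworkSketch {

    /**
     * Counts, for every unordered pair of researchers, how many items they share.
     * Each inner list holds the researchers attached to one item.
     */
    public static Map<String, Integer> edgeWeights(List<List<String>> items) {
        Map<String, Integer> weights = new TreeMap<>();
        for (List<String> researchers : items) {
            // TreeSet removes duplicates and sorts, so "A -- B" and "B -- A" share one key.
            List<String> sorted = new ArrayList<>(new TreeSet<>(researchers));
            for (int i = 0; i < sorted.size(); i++) {
                for (int j = i + 1; j < sorted.size(); j++) {
                    String edge = sorted.get(i) + " -- " + sorted.get(j);
                    weights.merge(edge, 1, Integer::sum);
                }
            }
        }
        return weights;
    }

    public static void main(String[] args) {
        List<List<String>> items = Arrays.asList(
                Arrays.asList("Ana", "Bruno"),
                Arrays.asList("Bruno", "Ana", "Carla"));
        for (Map.Entry<String, Integer> e : edgeWeights(items).entrySet()) {
            System.out.println(e.getKey() + " : " + e.getValue());
        }
    }
}
```

A visualization layer could then map each weight to an edge color, as described on the slide.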
Conclusions Currently, the Lattes curricula are available in HTML format. LattesMiner, however, does not depend on the data format, because it allows users to program their own applications at a high level of abstraction. If the data format is eventually modified, the DSL interface remains the same.
Conclusions An advantage of LattesMiner is that it searches by the name of the researcher. LattesMiner is multilingual. Another advantage is that the extracted data can be stored in a structured format (XML or database), allowing these data to be easily used by other applications.
Future work The next step, already being implemented in the LattesMiner DSL, is statistical analysis of the data.
ACKNOWLEDGMENTS