Job Vacancies Experiment
Boro Nikić
Satellite workshop on Big Data, NTTS 2015

Job Vacancies experiment (1)
- Idea for the experiment: Rome Workshop (May 2014)
- Started by identifying websites that advertise jobs and searching for available APIs for those websites
- The UNECE Task Team consisted of representatives from Austria, Hungary, Italy, the Netherlands, Sweden and Slovenia

Job Vacancies experiment (2)
Goals:
- Overview of the methodologies of calculation of JV statistics at NSIs
- Identification of possible web scraping tools
- Determination of BD methodology of calculation of JV statistics
- Testing the BD quality indicators proposed by the UNECE Quality Task Team

Overview of the methodologies of calculation of JV statistics at NSIs
EU regulation requires quarterly statistics on JV data to be published:
- Totals of advertised JV at national level
- Totals for domains defined by size of units
- Totals for domains defined by NACE activity groups
Documents on the Wiki:
• http://www1.unece.org/stat/platform/pages/viewpageattachments.action?pageId=100303739&metadataLink=true
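
The three kinds of totals listed above can be illustrated with a minimal Python sketch. The record layout (`nace_section`, `employees`, `posts`) and the size-class boundaries are assumptions made for illustration only; they are not taken from the regulation or from any NSI's production system.

```python
from collections import Counter

# Hypothetical vacancy records; field names and values are illustrative only.
vacancies = [
    {"nace_section": "C", "employees": 250, "posts": 3},
    {"nace_section": "C", "employees": 8, "posts": 1},
    {"nace_section": "G", "employees": 40, "posts": 2},
]


def size_class(employees):
    """Map a number of employees to an illustrative (assumed) size class."""
    if employees < 10:
        return "1-9"
    if employees < 50:
        return "10-49"
    if employees < 250:
        return "50-249"
    return "250+"


def jv_totals(records):
    """Return (national total, totals by size class, totals by NACE group)."""
    by_size, by_nace = Counter(), Counter()
    for rec in records:
        by_size[size_class(rec["employees"])] += rec["posts"]
        by_nace[rec["nace_section"]] += rec["posts"]
    return sum(by_nace.values()), dict(by_size), dict(by_nace)


total, by_size, by_nace = jv_totals(vacancies)
```

Both domain breakdowns sum to the national total, which is a cheap consistency check to run on any scraped data set.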

Identification of web scraping tools
Tools:
- http://www.irobotsoft.com/
- https://www.kimonolabs.com

Aim of the IRobot tool
• IRobotSoft for Visual Web Scraping
• IRobotSoft is a visual Web robot software for Web scraping and Web automation. With IRobotSoft, you can scrape tons of data from the deep Web with a single click! You don't need to have computer skills to do this! IRobotSoft is for Everyone! Follow our discussions and become a Web geek!
• for novice data collectors
• for Web testers
• for data experts
Link: http://www.irobotsoft.com/

Basic Steps
1. Define the name of the IRobot
2. Define the name of the Task
3. Copy and paste the link of the desired website into the URL
4. Start Recording Actions
5. Give names to the „scraped“ variables
6. Save the variables
7. Use the option „Repeat Property“
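
IRobotSoft performs these steps through a visual interface. For readers who prefer code, the same idea (visit a page, name the variables to extract, repeat over every advertisement) can be sketched with Python's standard `html.parser`. This is not how IRobotSoft works internally; the HTML markup and the class names `job`, `employer` and `town` are hypothetical.

```python
from html.parser import HTMLParser


class JobAdParser(HTMLParser):
    """Collect the text of elements whose class names a variable of interest."""

    FIELDS = {"employer", "town"}  # hypothetical class names = "scraped" variables

    def __init__(self):
        super().__init__()
        self.records = []   # one dict per advertisement
        self._field = None  # variable currently being read, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls == "job":
            # A new advertisement starts: the "repeat" step.
            self.records.append({})
        elif cls in self.FIELDS and self.records:
            self._field = cls

    def handle_data(self, data):
        if self._field:
            self.records[-1][self._field] = data.strip()
            self._field = None


# Hypothetical page fragment; a real job portal will use different markup.
page = """
<div class="job"><span class="employer">AR PLANE d.o.o.</span><span class="town">Bistrica ob Sotli</span></div>
<div class="job"><span class="employer">Savatech, d.o.o.</span><span class="town">Kranj</span></div>
"""


def scrape(html):
    """Parse an HTML string and return the list of scraped records."""
    parser = JobAdParser()
    parser.feed(html)
    return parser.records


records = scrape(page)
```

The sketch parses a string rather than fetching a live page, so it runs without network access.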

Determination of BD methodology of calculation of JV statistics (1)
- Cleaning of data
- Methodology for the replacement of existing statistics (at the level of the NSI)
- Methodology for the calculation of new statistics (international level)
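
The slides do not spell out the cleaning rules. One plausible cleaning step, shown here purely as an assumption, is removing duplicates when the same vacancy is advertised on several websites. The field names `employer` and `title` are hypothetical.

```python
import re


def normalize(text):
    """Lower-case, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def deduplicate(ads):
    """Keep the first advertisement for each (employer, title) pair."""
    seen, unique = set(), []
    for ad in ads:
        key = (normalize(ad["employer"]), normalize(ad["title"]))
        if key not in seen:
            seen.add(key)
            unique.append(ad)
    return unique


# Hypothetical advertisements scraped from two different portals.
ads = [
    {"employer": "Savatech, d.o.o.", "title": "Welder", "site": "portal-a"},
    {"employer": "Savatech d.o.o", "title": "welder", "site": "portal-b"},
    {"employer": "ARENDA d.o.o.", "title": "Accountant", "site": "portal-a"},
]

cleaned = deduplicate(ads)
```

Exact matching on normalized keys is deliberately conservative: it will miss duplicates whose titles are worded differently.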

Interface with the parameters

Determination of BD methodology of calculation of JV statistics (2)
All the documentation about the experiment can be found at:
http://www1.unece.org/stat/platform/pages/viewpageattachments.action?pageId=100303739&metadataLink=true
Document: Information which could be extracted from the Slovenian Websites and the proposed statistics for the job vacancies.doc

Determination of BD methodology of calculation of JV statistics (3)
One of the steps in the statistical processing of JV data is assigning the ID of the Legal Unit (LeU) from the Business Register. Linking the ID to the „scraped“ unit gives us information about the activity and size of the LeU (according to the number of employees).
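
A minimal sketch of such a linkage is given below. It assumes plain Levenshtein (edit) distance on normalized names; the measure actually used in the experiment is not stated and may also have used address information. The register identifiers `BR-1` to `BR-3` are hypothetical placeholders, not real Business Register IDs.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def key(name):
    """Upper-case a name and keep only letters and digits."""
    return "".join(ch for ch in name.upper() if ch.isalnum())


def best_match(scraped_name, register):
    """Return (distance, register id) of the closest register name."""
    target = key(scraped_name)
    return min((levenshtein(target, key(name)), uid)
               for uid, name in register.items())


# Hypothetical register extract: placeholder id -> name of the legal unit.
register = {
    "BR-1": "AR PLANE d.o.o.",
    "BR-2": "APLANE d.o.o.",
    "BR-3": "ARTPLAN, d.o.o.",
}

dist, unit_id = best_match("AR PLANE d. o. o.", register)
```

Once a unit ID is assigned, activity (NACE code) and size class can be looked up in the register. In practice a maximum acceptable distance would be needed so that poor candidates are sent to manual review rather than accepted automatically.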

„Scraped“ data
Name_LeU | Tel numb | Mob_numb | Town | Street | Streat_numb | Postal_code
AR PLANE d.o.o. | 03-809-4100 | 040 383840 | Bistrica ob Sotli | | |
Savatech, d.o.o. | | | Kranj | | |
ARENDA d.o.o. | | | Ljubljana | | |
Knauf Insulation d.o.o | | | Škofja Loka | Trata | 32 | 4220
AVIAT d.o.o. | | | Trzin | | |
VIP Virant d.o.o | | | Komenda | | |
Further Tel numb value listed on the slide (row not identifiable): 04 5114 219

„Matched“ data
Columns: iskani (index of the searched „scraped“ unit), Name_LeU, Town_BR, id, complete_nmae, nace_code, adress.VID, dist
iskani | Name_LeU | complete_nmae | nace_code
1 | AR PLANE d.o.o. | AR PLANE, korporacijsko upravljanje in pravna pisarna, d.o.o. | 70.220
1 | APLANE d.o.o. | Letalska družba APLANE d.o.o. | 30.300
1 | ARTPLANET | ARTPLANET, zavod za razvoj umetnosti, kulture in kakovosti življenja, Slovenska Bistrica | 72.200
1 | ARTPLAN, d.o.o. | ARTPLAN, proizvodnja in trgovina d.o.o. | 31.010
1 | ARPLAN, ANŽE REZAR s.p. | ARPLAN, projektiranje, inženiring, svetovanje in storitve v gradbeništvu, ANŽE REZAR s.p. | 71.129
1 | AL PLANET, Dejan Janež s.p. | AL PLANET, Stavbno pohištvo iz aluminija, Dejan Janež s.p. | 25.120
1 | AR-AL NET d.o.o. | AR-AL NET, trgovina in posredništvo d.o.o. | 47.910
1 | ARTLINE d.o.o. | ARTLINE, studio za oblikovanje, d.o.o. | 73.110
2 | Savatech, d.o.o. | |
2 | SAVATECH d.o.o. | SAVATECH družba za proizvodnjo in trženje gumenotehničnih proizvodov in pnevmatike, d.o.o. | 22.190
2 | SAITECH d.o.o. | SAITECH podjetje za trgovino in storitve d.o.o. | 43.290
2 | SAVA TMC, d.o.o. | SAVA TURIZEM - TMC, podjetje za upravljanje dejavnosti turizem, d.o.o. | 70.100
2 | ASTECH d.o.o. | ASTECH d.o.o., Inženiring in servisiranje strojnih instalacij | 43.220
2 | AVTECH D.O.O. | AVTECH, SVETOVANJE, ZASTOPSTVO, PROIZVODNJA, D.O.O. | 70.220
2 | SANOTECHNIK d.o.o. | SANOTECHNIK trgovsko podjetje d.o.o. | 46.730
3 | ARENDA d.o.o. | ARENDA, nepremičninska družba, d.o.o. | 68.200
3 | OPTIKA ARENA d.o.o. | OPTIKA ARENA, družba za trgovino in storitve d.o.o. | 47.781
3 | PEKARNA ARENA d.o.o. | PEKARNA ARENA, pekarstvo in trgovina, d.o.o. | 10.710
3 | ARENA SERVIS d.o.o. | ARENA SERVIS, izposojanje šotorov, šankov in gostinske opreme ter gostinske storitve, d.o.o. | 77.390
3 | ADENDA d.o.o. | ADENDA d.o.o. grafične storitve in oblikovanje | 18.130
3 | AGENDA d.o.o. | AGENDA komunikacijski in informacijski inženiring d.o.o. | 62.020
3 | RANDA d.o.o. | RANDA gradbeništvo, storitve in prevozi d.o.o. | 41.200
3 | AGENDA 2003 d.o.o. | AGENDA 2003 premoženjsko svetovanje in računovodske storitve d.o.o. | 69.200
Values of the remaining columns, in the order they appear on the slide (not aligned to rows):
Town_BR: BISTRICAOBSOTLI, ZAGAJ, SOLKAN, SLOVENSKABISTRICA, KRANJ, PROSENIŠKO, SEŽANA, ČENTIBA, MENGEŠ, KRANJ, KRANJ, CELJE, LJUBLJANA, LOGATEC, VIDRGA, MARIBOR, LJUBLJANA, OSLUŠEVCI, MIREN, MARIBOR, LJUBLJANA, LJUBLJANA
id: 3290476000, 3307611000, 3498417000, 6188265000, 3761843000, 3356892000, 6072526000, 5333644000, 1661205000, 5311292000, 1893718000, 1661078000, 3282058000, 5850908000, 6318797000, 5743729000, 5656222000, 6011624000, 1824775000, 1629417000, 1873512000, 3918076000
adress.VID: 1474238, 1034269, 2429891, 1417055, 2404555, 1428363, 2585325, 1617965, 284552, 1490149, 1242548, 499981, 2313488, 1365580, 163187, 1890496, 63849, 2315474, 930791
dist: 0, 8, 15, 21, 25, 26, 28, 28, 0, 21, 21, 25, 25, 27, 0, 10, 10, 10, 16, 16, 20, 24

Testing the BD quality indicators proposed by the Quality Team
The quality framework consists of three quality hyperdimensions: input, throughput and output.
http://www1.unece.org/stat/platform/pages/viewpageattachments.action?pageId=101158888&metadataLink=true

Conclusions (1)
BD could be used as a source:
• for new types of statistics
• for existing statistics
• for validation of existing statistics
In the case of scraping JV data:
• Change of the mode of collection
• Validation of data collected in the traditional way (administrative sources, questionnaires)
• Flash statistics

Conclusions (2)
Before the JV BD source is employed in regular statistical production, the scraping tools, the data manipulation procedures and the resulting statistics must be carefully tested over a period of at least one year in order to ensure the stability of the sources and statistics.
More about the experiment can be found at http://www1.unece.org/stat/platform/display/BDP/Sandbox+Task+Team

Thank you for your attention!