
Categorization of Library Function Call Patterns
Noritoshi Atsumi†, Shinichiro Yamamoto‡, Kiyoshi Agusa†
† Dept. of Information Engineering, Nagoya University
‡ Dept. of Information Science, Aichi Prefectural University

Outline
• Introduction
  – Background
  – The problems on retrieving
• Retrieval of know-how
  – FCDG (Function Call Dependency Graph)
  – Categorization of FCDG
    • same FCDGs
    • similar FCDGs
  – Our System
• Conclusions and Future Works

Background
• The source code of many programs is available
  – it contains know-how for coding
  – this know-how can be retrieved from a source code archive

The know-how for coding
• Library function
  – is used in various programs
    • a primitive function, a common vocabulary
  – is used in certain combinations
    • know-how = combination of library functions
• A developer finds it out by string retrieval with grep or similar tools


The problems on string retrieval
• Too many retrieval results
• The differences among the retrieval results are indistinct
Library function calls in the FreeBSD source tree:
  fopen : 332
  socket : 212
  getopt : 175
Examples of results for fopen:
    fh = fopen("/var/log", "a");
    ftrace = fopen(file, "a");
    if ((fp = fopen(name, "r")) == NULL) {
    if ((fp = fopen(dumpfile, "r")) == NULL) {
    fp = fopen("acp", "w");
It is necessary to categorize the results.

The retrieval results by string
[Figure: retrieval results, each of the form fp = fopen(...)]

Dependencies between library function calls
    if ((sockfd = socket (res->ai_family, res->ai_socktype, res->ai_protocol)) < 0)
        err_sys ("Can't open socket");
    if (udp) { … }
    …
    if (setsockopt (sockfd, SOL_IP, IP_TOS, (void *) &tos, (socklen_t) sizeof (tos)) != 0) {
        err_sys ("Failed setting IP type of service octet");
    }
    if (!ttcp && !icp) {
    …
    if ((errno == EINTR) && (timeout_flag)) {
        printf ("Timeout while connecting\n");
        close (sockfd);
        continue;
    }
    …
    if ((nr < 0 || nr != n) && timeout_flag) {
    …
        close(sockfd);
    }
    …
    close(sockfd);
• A complex control structure and a long code description make the dependencies unclear
How are the library functions combined?

Function Call Dependency Graph [Miura '98]
• Nodes
  – Definition node
    • stores the return value of a library function call in a variable
  – Reference node
    • refers to the return value of a library function call as an argument of a library function call
  – Controlled node
    • depends on the truth value of a condition
  – Control node
    • refers to the return value of a library function call in a condition
• Edges (data and control dependencies)

The Dependencies in FCDG
• Data dependency
  – the return value of function call f is referred to in another function call g
    1. g ( … , f ( ), … ) ;
    2. a = f ( ) ; … ; g ( … , a , … ) ;
• Control dependency
  – whether function call f is executed or not is determined by the condition c
    if ( c ) { f ( ); }
    while ( c ) { f ( ); }
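The two kinds of dependency can be seen in a few lines of standard C. The sketch below is only an illustration: the helper names `length_from` and `length_from_last` are hypothetical and do not come from the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the part of s that starts at the first
 * occurrence of c, or 0 if c does not occur in s. */
size_t length_from(const char *s, int c)
{
    assert(s != NULL);

    /* Data dependency, form 2: the return value of strchr() is first
     * stored in a variable (definition) ... */
    const char *p = strchr(s, c);

    /* Control dependency: whether strlen() is executed is determined by
     * a condition that refers to the return value of strchr(). */
    if (p != NULL) {
        /* ... and later passed to strlen() as an argument (reference). */
        return strlen(p);
    }
    return 0;
}

/* Data dependency, form 1: the return value of strrchr() is used directly
 * as an argument of strlen(), without an intermediate variable.
 * Assumes c occurs in s; the assert documents that precondition. */
size_t length_from_last(const char *s, int c)
{
    assert(s != NULL && strrchr(s, c) != NULL);
    return strlen(strrchr(s, c));
}
```

For the string "abcabc" and the character 'b', `length_from` measures "bcabc" and `length_from_last` measures "bc".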

Example of FCDG
    fd = fopen (fname, "r");
    if (fd != NULL) {
        ptr = fgets(line, sizeof(line), fd);
        if (ptr != NULL) {
            p = strstr(line, NAME);
            if (p != NULL) {
                p++;
                strcpy(name, p);
            }
        }
        fclose(fd);
    }
FCDG (t = the condition is true):
    fopen  → [!= NULL] –t→ fgets
                       –t→ fclose
    fgets  → [!= NULL] –t→ strstr
    strstr → [!= NULL] –t→ strcpy

Example of FCDG (continued)
• The FCDG extracted from the code above (fopen, fgets, strstr, strcpy, fclose and their != NULL checks) is a library function call pattern
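One possible in-memory form of this example graph is a plain edge list. The sketch below is an assumption for illustration, not the representation used by the authors: the type names `Fcdg`, `Edge` and `DepKind` are hypothetical, and each control node (the != NULL check) is folded into a control edge from the function whose return value is tested.

```c
#include <assert.h>
#include <stdio.h>

enum { MAX_EDGES = 16 };

/* Edge kinds follow the slides: data and control dependencies. */
typedef enum { DEP_DATA, DEP_CONTROL } DepKind;

typedef struct {
    const char *from;  /* library function whose return value is used */
    const char *to;    /* dependent library function call             */
    DepKind kind;
} Edge;

typedef struct {
    Edge edges[MAX_EDGES];
    int n_edges;
} Fcdg;

static void add_edge(Fcdg *g, const char *from, const char *to, DepKind kind)
{
    assert(g->n_edges < MAX_EDGES);
    g->edges[g->n_edges].from = from;
    g->edges[g->n_edges].to = to;
    g->edges[g->n_edges].kind = kind;
    g->n_edges++;
}

/* Hand-built graph for the fopen example; nothing is parsed here. */
Fcdg build_fopen_example(void)
{
    Fcdg g = { .n_edges = 0 };
    add_edge(&g, "fopen", "fgets", DEP_DATA);       /* fd passed to fgets()  */
    add_edge(&g, "fopen", "fclose", DEP_DATA);      /* fd passed to fclose() */
    add_edge(&g, "fopen", "fgets", DEP_CONTROL);    /* if (fd != NULL)       */
    add_edge(&g, "fopen", "fclose", DEP_CONTROL);   /* if (fd != NULL)       */
    add_edge(&g, "fgets", "strstr", DEP_CONTROL);   /* if (ptr != NULL)      */
    add_edge(&g, "strstr", "strcpy", DEP_DATA);     /* p passed to strcpy()  */
    add_edge(&g, "strstr", "strcpy", DEP_CONTROL);  /* if (p != NULL)        */
    return g;
}

/* Number of edges of the given kind. */
int count_edges(const Fcdg *g, DepKind kind)
{
    int n = 0;
    for (int i = 0; i < g->n_edges; i++)
        if (g->edges[i].kind == kind)
            n++;
    return n;
}

/* Write one line per edge, e.g. "fopen -> fgets (data)". */
void print_fcdg(const Fcdg *g, FILE *out)
{
    for (int i = 0; i < g->n_edges; i++)
        fprintf(out, "%s -> %s (%s)\n", g->edges[i].from, g->edges[i].to,
                g->edges[i].kind == DEP_DATA ? "data" : "control");
}
```

Calling `print_fcdg(&g, stdout)` on the result of `build_fopen_example()` lists the seven edges. Note that fgets to strstr has only a control edge, because strstr reads `line` rather than the return value of fgets.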


Categorization of same FCDGs
[Figure: the retrieval results (fp = fopen(...)) are grouped step by step so that results with the same FCDG fall into one category]

The problems on categorization of same FCDGs
• Number of function calls
  – fopen : 332, socket : 212, getopt : 175
• Number of FCDG types
  – fopen : 78, socket : 43, getopt : 22
• Same FCDGs
  – those found in many programs include only a few nodes, so they carry little know-how
  – those found in only a few programs include many nodes, so they carry much know-how
• There are many similar FCDGs
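Counting FCDG types amounts to grouping identical graphs. The sketch below shows one simple way to do that under a strong assumption: each FCDG is given as a set of edge labels such as "fopen>fclose", and two FCDGs count as the same when their sorted edge lists are equal. The label format and all names are hypothetical, and the matching used by the authors may differ.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { MAX_EDGES = 8 };

typedef struct {
    const char *edges[MAX_EDGES];  /* edge labels, e.g. "fopen>fclose" */
    int n_edges;
} Fcdg;

static int cmp_str(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sort the edge labels so that equal graphs get equal edge lists. */
void canonicalize(Fcdg *g)
{
    assert(g->n_edges >= 0 && g->n_edges <= MAX_EDGES);
    qsort(g->edges, (size_t)g->n_edges, sizeof g->edges[0], cmp_str);
}

/* Compare two FCDGs that have both been canonicalized. */
int same_fcdg(const Fcdg *a, const Fcdg *b)
{
    if (a->n_edges != b->n_edges)
        return 0;
    for (int i = 0; i < a->n_edges; i++)
        if (strcmp(a->edges[i], b->edges[i]) != 0)
            return 0;
    return 1;
}

/* Canonicalize all graphs, then count how many distinct ones there are. */
int count_fcdg_types(Fcdg *graphs, int n)
{
    int types = 0;
    for (int i = 0; i < n; i++)
        canonicalize(&graphs[i]);
    for (int i = 0; i < n; i++) {
        int seen = 0;
        for (int j = 0; j < i && !seen; j++)
            seen = same_fcdg(&graphs[i], &graphs[j]);
        if (!seen)
            types++;
    }
    return types;
}
```

The pairwise comparison is quadratic in the number of graphs, which is acceptable for a few hundred call sites; hashing the canonical edge list would be the usual next step for larger archives.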


Similarity between FCDGs
• w_ij : weight value of edge (n_i, n_j)
  – an edge that occurs in many FCDGs is a natural dependency
  – otherwise the edge is a characteristic element of the FCDG
• sim(F_x, F_y) : similarity between FCDGs F_x and F_y
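The formulas for w_ij and sim(F_x, F_y) are not reproduced in this text, so the following is a stand-in and not the authors' definition. It assumes an inverse-frequency weight (1 divided by the number of FCDGs that contain the edge, so widespread "natural" edges count less) and a weighted Jaccard ratio for the similarity. All type and function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

enum { MAX_EDGES = 8 };

typedef struct {
    const char *edges[MAX_EDGES];  /* distinct edge labels, e.g. "socket>bind" */
    int n_edges;
} Fcdg;

static int has_edge(const Fcdg *g, const char *e)
{
    for (int i = 0; i < g->n_edges; i++)
        if (strcmp(g->edges[i], e) == 0)
            return 1;
    return 0;
}

/* ASSUMED weight: 1 / (number of FCDGs in the collection containing e).
 * The edge must occur in at least one FCDG of the collection. */
double edge_weight(const Fcdg *all, int n, const char *e)
{
    int df = 0;
    for (int i = 0; i < n; i++)
        df += has_edge(&all[i], e);
    assert(df > 0);
    return 1.0 / df;
}

/* ASSUMED similarity: weight of the shared edges divided by the weight of
 * all edges in either graph (weighted Jaccard). x and y must belong to
 * the collection so that every edge has a weight. */
double sim(const Fcdg *all, int n, const Fcdg *x, const Fcdg *y)
{
    double shared = 0.0, total = 0.0;
    for (int i = 0; i < x->n_edges; i++) {
        double w = edge_weight(all, n, x->edges[i]);
        total += w;
        if (has_edge(y, x->edges[i]))
            shared += w;
    }
    for (int i = 0; i < y->n_edges; i++)
        if (!has_edge(x, y->edges[i]))
            total += edge_weight(all, n, y->edges[i]);
    return total > 0.0 ? shared / total : 1.0;
}
```

With this choice, two graphs that differ only in a rare edge are pushed further apart than two graphs that differ only in a common edge, which matches the intent stated on the slide.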

Categorization of similar FCDGs
[Figure: categories of same FCDGs are grouped further into categories of similar FCDGs]

Extraction of know-how
• From the FreeBSD-4.5 RELEASE source tree (/usr/src/usr.sbin)
  – target programs : C language
  – number of programs : 162, lines : 311,653
• Library functions
  – declared in /usr/include
• Results
  – function calls : fopen : 332, socket : 212, getopt : 175
  – after categorization of same FCDGs : fopen : 78, socket : 43, getopt : 22
  – after categorization of similar FCDGs : fopen : 10, socket : 6, getopt : 264 [sic]


Retrieval System Configuration Diagram
[Diagram: the components are the source code, FCDG extraction, categorization, the DB, and the retrieval system]

Retrieval System of Library Function

Result of Categorization
Pattern 1:
    $1 = socket();
    if ($1 < 0) { }
    $2 = bind($1);
    if ($2 < 0) { }
    $3 = listen($1);
    if ($3 < 0) { }
Pattern 2:
    $1 = socket();
    if ($1 < 0) { }
    $2 = setsockopt($1);
    if ($2 < 0) { }
    $3 = bind($1);
    if ($3 < 0) { }
    $4 = listen($1);
    if ($4 < 0) { }
    close($1);
Pattern 3:
    $1 = socket();
    if ($1 < 0) { }
    $2 = setsockopt($1);
    if ($2 < 0) { }
    $3 = bind($1);
    if ($3 < 0) { }
    $4 = ioctl($1);
    if ($4 < 0) { }
    close($1);
• Frequent dependencies
  – socket – setsockopt
  – socket – bind
  – socket – close
• Return value check


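Filled in with concrete arguments, the socket, setsockopt, bind, listen pattern with its return value checks might look like the sketch below. It assumes a POSIX system; the helper name `open_listener`, the choice of SO_REUSEADDR, and the loopback address are illustrative assumptions and do not come from the analyzed FreeBSD programs.

```c
#include <assert.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open a TCP listening socket on the loopback
 * address. Returns the descriptor, or -1 if any step fails. */
int open_listener(unsigned short port, int backlog)
{
    int on = 1;
    struct sockaddr_in addr;

    assert(backlog > 0);

    int fd = socket(AF_INET, SOCK_STREAM, 0);       /* $1 = socket()        */
    if (fd < 0)                                     /* return value check   */
        return -1;

    /* socket – setsockopt dependency: fd is passed to setsockopt(). */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on) < 0) {
        close(fd);                                  /* socket – close       */
        return -1;
    }

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);                    /* 0 = any free port    */

    /* socket – bind dependency: fd is passed to bind(). */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }

    if (listen(fd, backlog) < 0) {                  /* fd passed to listen() */
        close(fd);
        return -1;
    }
    return fd;  /* the caller closes the descriptor when done */
}
```

Every call after socket() takes the descriptor as an argument (data dependency) and runs only when the preceding check succeeds (control dependency), which is the structure the categorized patterns expose.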

Conclusions
• Extract know-how about the usage of library functions
  – Dependencies between library function calls
  – Extraction of FCDGs
• Retrieve the usage of library functions
  – Categorization of FCDGs
This makes it easy to find the intended usage.

Future Works
• Inter-function dependency analysis
  – to extract many more patterns
• Agent system for coding
  – to guide coding, like the MS Office Assistant