Software Search Systems for the Era of Reusing Existing Software Assets
Katsuro Inoue, Osaka University
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Software Space
SourceForge
• A huge site that supports open source development
• Software search, version control, communication support, ...
• Projects: at least 240,000
• Users: at least 2.6 million
Search Methods
• Keyword search
  – function and class names
  – parameters
  – variables
  – comments
• Program fragment search
  – incomplete structures
  – complete programs
Example of a program fragment (the right edge of the listing is cut off on the slide):
    ...
     @author Ceki Gülcü */
    public class SortAlgo {

      final static String className = SortAlgo.class.getName();
      final static Logger LOG = Logger.getLogger(className);
      final static Logger OUTER = Logger.getLogger(className + ".OUT
      final static Logger INNER = Logger.getLogger(className + ".INNE
      final static Logger DUMP = Logger.getLogger(className + ".DUMP
      final static Logger SWAP = Logger.getLogger(className + ".SWAP

      int[] intArray;

      SortAlgo(int[] intArray) {
        this.intArray = intArray;
      }
    ...
Either way, analyzing and organizing this much data requires substantial computing power.
Related Work (1): Software Search Engines
• Google, Google Code Search (Google)
• Koders (Black Duck)
  – 3 GB OSS, C/C++/C#/... 30 languages
• Krugle (Krugle Enterprise)
  – OSS project support, search engine
• SourceForge (Geeknet Inc.)
• SPARS/J
  – Osaka University; predates the other code search engines
• CodeBroker, Sourcerer, Merobase, Exemplar, Strathcona, Assieme, XSnippet, ...
Related Work (2): Software Component Recommendation
• Historical approach: collect users' histories and usage records, then either
  – provide them as they are, or
  – filter users' patterns (for example by collaborative filtering) before providing them
• Social approach: build a network of developers and users and ask the experts
Computational Software Engineering
• Computation Intensive Software Engineering (CISE)
• To build high-quality software efficiently:
  – use environments with high computing power
  – handle large-scale data
    • open source software
    • development data
    • …
• Examples
  – Search-based software engineering
  – Mining software repositories
  – Empirical approaches to software engineering
The Scale of Software Engineering
SPARS-J
Component Graph
[Figure: a graph spanning System X and System Y; nodes A to I are components and directed edges are use relations]
Node Weights
[Figure: the component graph of System X and System Y with a weight on each node, with values such as 0.2, 0.1 and 0.05]
• Sum of the weights of all nodes = 1 ... (1)
• A node's weight is the indicator of its importance
Edge Weights
[Figure: node A with weight 0.2 passes 0.2 × 1/4 = 0.05 along each outgoing edge (d = 1/4); node B with weight 0.4 collects the weights of its incoming edges]
• d: distribution ratio
• w(A) = sum of the weights of all outgoing edges of A ... (2)
• Sum of the weights of all incoming edges of B = w(B) ... (3)
Definition of the Weights
• Constraints (1) to (3) yield a system of simultaneous equations:
  $W = D^{t} W$
  – W: node weight vector
  – D^t: transposed matrix of distribution ratios
Weight Propagation
[Figure: three-node graph A, B, C, initial state; values shown: 0.34, 0.33, 0.33 and edge weights 0.17, 0.17]
Weight Propagation
[Figure: the same graph after one propagation step; values shown: 0.33, 0.17, 0.5 and edge weights 0.175, 0.175, 0.17]
Weight Propagation
[Figure: the same graph after a further step; values shown: 0.5, 0.175, 0.345 and edge weights 0.25, 0.25, 0.175, 0.345]
Weight Propagation
[Figure: the same graph at convergence; values shown: 0.4, 0.2, 0.2, 0.4]
• The stable weights form an eigenvector
• Component Rank: the ordering of the nodes by weight
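The propagation shown on these slides can be sketched as a simple fixed-point iteration. This is an illustration only, not the SPARS-J implementation: the edge set A→B, A→C, B→C, C→A is an assumption inferred from the numbers in the figures, and the function names are hypothetical.

```python
def propagate_once(uses, weights):
    """One propagation step: every node splits its weight equally
    (distribution ratio d = 1 / out-degree) over the nodes it uses."""
    nxt = {node: 0.0 for node in weights}
    for node, targets in uses.items():
        share = weights[node] / len(targets)  # assumes at least one outgoing edge
        for target in targets:
            nxt[target] += share
    return nxt


def component_rank(uses, initial, tol=1e-9, max_iter=1000):
    """Repeat propagation until no node weight changes by more than tol."""
    weights = dict(initial)
    for _ in range(max_iter):
        nxt = propagate_once(uses, weights)
        delta = max(abs(nxt[n] - weights[n]) for n in weights)
        weights = nxt
        if delta < tol:
            break
    return weights


# Assumed topology of the three-node example (not stated on the slides).
USES = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
START = {"A": 0.34, "B": 0.33, "C": 0.33}

first_step = propagate_once(USES, START)   # A = 0.33, B = 0.17, C = 0.50
stable = component_rank(USES, START)       # approaches A = 0.4, B = 0.2, C = 0.4
```

Under this assumed topology the first step matches the second propagation figure, and the iteration settles on the stable weights 0.4, 0.2, 0.4 shown on the last one.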
Markov Model
[Figure: graph whose edges carry transition probabilities such as 0.02, 0.01, 0.05, 0.03, 0.001, 0.1]
• A Markov chain over the user's focus of attention
• Where does the focus move in a fixed time step?
• A node's weight indicates how much of the time the user's focus stays on that node
Pseudo Use Relations
[Figure: nodes A, B, C with pseudo edges added]
• Added so that the computation converges
• Drawn from each node to every node it is not connected to
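One way to picture pseudo use relations is as extra low-weight edges mixed into the distribution ratios. The sketch below is a guess at the mechanics, not the published definition: it assumes that real edges share a fraction p of a node's weight and pseudo edges share the rest, that a node counts itself among its unconnected targets, and that p = 0.85 (the ratio quoted on the evaluation-system slide). All names are hypothetical.

```python
def distribution_ratios(uses, p=0.85):
    """Build ratios[src][dst] from real use relations plus pseudo edges.

    Assumption: real edges split p equally, pseudo edges (to every node
    that src does not use, src itself included) split 1 - p equally.
    """
    nodes = list(uses)
    ratios = {}
    for src in nodes:
        real = set(uses[src])
        pseudo = [n for n in nodes if n not in real]
        row = {}
        for dst in nodes:
            if not real:                 # no real edges: pseudo edges take everything
                row[dst] = 1.0 / len(nodes)
            elif not pseudo:             # uses every node: nothing left to add
                row[dst] = 1.0 / len(real)
            elif dst in real:
                row[dst] = p / len(real)
            else:
                row[dst] = (1.0 - p) / len(pseudo)
        ratios[src] = row
    return ratios


def rank(ratios, iterations=200):
    """Propagate weights through the ratios for a fixed number of steps."""
    nodes = list(ratios)
    weights = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        weights = {dst: sum(weights[src] * ratios[src][dst] for src in nodes)
                   for dst in nodes}
    return weights


# Hypothetical chain: nothing uses A, and C uses nothing.
CHAIN = {"A": ["B"], "B": ["C"], "C": []}
chain_weights = rank(distribution_ratios(CHAIN))
```

Without the pseudo edges, the weight in this chain would drain out of C and the total would not stay at 1; with them every node keeps a positive weight and the iteration settles.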
Clustering Components
[Figure: a component graph with nodes A, B, C, D, E, F, G becomes a clustered component graph with nodes AD, BF, C, E, G; A and D are merged, and so are B and F]
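The merge shown in the figure can be sketched with a union-find over pairs of components judged similar. This is a minimal illustration: the similarity pairs (A, D) and (B, F) come from the figure, while the use relations in the example and the choice to drop edges inside a cluster are assumptions.

```python
def cluster_labels(components, similar_pairs):
    """Map each component to the label of its cluster (members joined, sorted)."""
    parent = {c: c for c in components}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for a, b in similar_pairs:
        parent[find(a)] = find(b)

    members = {}
    for c in components:
        members.setdefault(find(c), []).append(c)
    return {c: "".join(sorted(members[find(c)])) for c in components}


def clustered_edges(uses, label):
    """Lift use relations to clusters; edges inside one cluster are dropped."""
    edges = set()
    for src, targets in uses.items():
        for dst in targets:
            if label[src] != label[dst]:
                edges.add((label[src], label[dst]))
    return edges


COMPONENTS = ["A", "B", "C", "D", "E", "F", "G"]
LABEL = cluster_labels(COMPONENTS, [("A", "D"), ("B", "F")])

# Hypothetical use relations, only to show how edges are merged.
EXAMPLE_USES = {"A": ["B"], "D": ["F"], "C": ["A"]}
EDGES = clustered_edges(EXAMPLE_USES, LABEL)
```

Here the two edges A→B and D→F collapse into the single edge AD→BF, which is how a copied component stops counting twice.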
Component Rank Evaluation System
• Input: .java files (one file = one component)
• Use relation extraction
  – inheritance
  – method call
  – attribute access
  – abstract class implementation
• Clustering
  – similarity measured by SMMT
  – similarity criterion t: sharing 80% of statements
• Clustered graph construction
  – weight ratio p between real and pseudo edges: 0.85
• Node weight computation
  – equal distribution ratio d to outgoing edges
• De-clustering to the original graph
• Output: (component, rank) pairs
Experiment 1: JDK 1.3.0
• 575,000 lines, 1877 components
• 7 minutes on a PC (Pentium IV, 2 GHz, 2 GB)
  rank   class name                      weight
  1      java.lang.Object                0.16126
  2      java.lang.Class                 0.08712
  3      java.lang.Throwable             0.05510
  4      java.lang.Exception             0.03103
  5      java.io.IOException             0.01343
  6      java.lang.StringBuffer          0.01214
  7      java.lang.SecurityManager       0.01169
  8      java.io.InputStream             0.01027
  9      java.lang.reflect.Field         0.00948
  10     java.lang.reflect.Constructor   0.00936
  ...
  1256   sunw.util.EventListener         0.00011
  ...
Experiment 2: A Company's Library
• A framework for Java application development, together with its applications
• 5 applications + framework
  – 1538 components, 339 clustered nodes
• Framework classes and the classes that define their data structures ranked high
Discussion 1: Weight Computation Model
[Figure: the same graph (nodes A to E) weighted under the Reference Count Model and under the Component Rank Model]
Discussion 2: Clustering Method (1)
• Plain duplicate components are eliminated
[Figure: originals A, B and their copies A, B, used by other components X and Y; after clustering only one A and one B remain, with node weights of 0.25]
Discussion 2: Clustering Method (2)
• Components reused in other environments gain weight
[Figure: originals A, B; a copy of A and a modified component C; other components X and Y; after clustering the nodes are A, B, C, X, Y with weights such as 0.3, 0.2, 0.2 and 0.15]
SPARS-J
• Software Product Archive, Analysis, and Retrieve System for Java
• A source code archive and search system built on a new idea
• Targets Java
• Day-to-day management is fully automatic
• Extracts wide-ranging dependencies using static analysis
• Extracts similar components using metrics
Ranking Method of SPARS-J
• Component Rank (CR)
• Importance of the search terms (TF-IDF)
• CR + TF-IDF
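The slide does not say how CR and TF-IDF are combined, so the following is only a sketch of one possibility: score each component for the query term with a textbook TF-IDF, turn both score lists into rank positions, and order components by the sum of positions. The combination rule, the sample components and the CR values are all assumptions made for illustration.

```python
import math


def tf_idf(term, tokens, all_docs):
    """Textbook TF-IDF: term frequency times log(N / document frequency)."""
    df = sum(1 for doc in all_docs if term in doc)
    if df == 0:
        return 0.0
    tf = tokens.count(term) / len(tokens)
    return tf * math.log(len(all_docs) / df)


def positions(scores):
    """Rank position (1 = best) of each name by descending score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: i + 1 for i, name in enumerate(ordered)}


def combined_order(cr_scores, keyword_scores):
    """Assumed combination: sort by the sum of the two rank positions."""
    cr_pos, kw_pos = positions(cr_scores), positions(keyword_scores)
    return sorted(cr_scores, key=lambda n: (cr_pos[n] + kw_pos[n], n))


# Hypothetical components (token lists) and hypothetical CR weights.
DOCS = {
    "Sorter": ["sort", "array", "sort"],
    "Logger": ["log", "sort", "message"],
    "Util": ["array", "copy", "helper"],
}
CR = {"Util": 0.5, "Sorter": 0.3, "Logger": 0.2}

KEYWORD = {name: tf_idf("sort", toks, list(DOCS.values()))
           for name, toks in DOCS.items()}
RESULT = combined_order(CR, KEYWORD)
```

In this toy data Sorter comes first because it is strong on the query term and second on CR, while Util, which leads on CR alone, drops behind it.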
Overview of SPARS-J
[Figure: SPARS-J architecture. Registration: a set of Java files is analyzed and registered in the repository together with its dependencies and keywords. Search: the user works through a web browser, issuing keyword searches and browsing the package hierarchy, and SPARS-J displays the results]
SPARS-J Search Entry Page
[Screenshot]
Search Results View
[Screenshot]
Package Browser
[Screenshot]
• List of subpackages
• List of classes
• Jump to the line where a method is defined
Component Details (Similar Components)
[Screenshot, from the SPARS technical overview]
Component Details (Using Components)
[Screenshot]
Component Details (Used Components)
[Screenshot]
Component Details (Metrics)
[Screenshot]
Similar Code
Code Similarity: A Hard Problem
• Syntactically similar
• Semantically similar
• Similar as a whole
• Similar in part
• The similarity threshold is …
A Simple Example
    AFG::AFG(JaObject* obj) {
      objname = "afg";
      object = obj;
    }
    AFG::~AFG() {
      for(unsigned int i = 0; i < children.size(); i++)
        if(children[i] != NULL)
          delete children[i];
      ...
      for(unsigned int i = 0; i < nodes.size(); i++)
        if(nodes[i] != NULL)
          delete nodes[i];
    }
Definition of Code Clones
• There is no simple, unified definition (different researchers define it differently)
• There is, however, a rough common understanding:
  – Type 1 clone: identical as character strings once comments, whitespace and the like are ignored
  – Type 2 clone: identical after identifiers are normalized (parameterized)
  – Type 3 clone: semantically identical, or with lines inserted or deleted in between
• Various detection methods
  1. Line-by-line matching (type 1)
  2. Comparison of ASTs (Abstract Syntax Trees) (type 2, 3)
  3. Comparison of PDGs (Program Dependence Graphs) (type 3)
  4. Comparison of metric values (type 3)
  5. Comparison of token sequences (type 2)
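To make the difference between Type 1 and Type 2 concrete, here is a small sketch of token-sequence comparison with identifier normalization, applied to the two loops from the AFG destructor example. It is a toy, not CCFinder: the tokenizer, the keyword list and the `$id` placeholder are assumptions made for illustration.

```python
import re

# Deliberately small keyword set; a real tool would use the full language grammar.
KEYWORDS = {"for", "if", "delete", "unsigned", "int", "NULL"}
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")
IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def tokens(code):
    """Split code into tokens, which discards whitespace differences."""
    return TOKEN.findall(code)


def normalized(code):
    """Replace every non-keyword identifier with the placeholder $id."""
    return ["$id" if IDENTIFIER.fullmatch(t) and t not in KEYWORDS else t
            for t in tokens(code)]


def is_type1(a, b):
    return tokens(a) == tokens(b)


def is_type2(a, b):
    return normalized(a) == normalized(b)


LOOP_CHILDREN = ("for(unsigned int i = 0; i < children.size(); i++) "
                 "if(children[i] != NULL) delete children[i];")
LOOP_NODES = ("for(unsigned int i = 0; i < nodes.size(); i++) "
              "if(nodes[i] != NULL) delete nodes[i];")
```

With these definitions `is_type1(LOOP_CHILDREN, LOOP_NODES)` is false because the identifiers differ, while `is_type2(LOOP_CHILDREN, LOOP_NODES)` is true once `children` and `nodes` are both replaced by the placeholder.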
Application Example 1: FreeBSD, NetBSD, Linux
[Figure]
History of the BSD Unix Operating Systems
[Figure]
Phylogenetic Tree from Cluster Analysis Using the Clone Rate as the Distance
[Figure]
FreeBSD Application Collection (Ports Collection)
• 10.8 GB / 403 M LOC in C
[Figure]
Livieri, S., Higo, Y., Matsushita, M., Inoue, K., "Very-Large Scale Code Clone Analysis and Visualization of Open Source Programs Using Distributed CCFinder: D-CCFinder", International Conference on Software Engineering, Minneapolis, MN (May 2007, to appear)
136 Linux Kernels
• 7.4 GB, 260 M LOC in C
[Figure]
Example of Coding Pattern Extraction
    public void reverseAction(Figure figure) {
      setUndoActivity(createUndoActivity());
      List l = CollectionsFactory.current().createList();
      l.add(figure);
      l.add(((DecoratorFigure)figure).peelDecoration());
      getUndoActivity().setAffectedFigures(new FigureEnumerator(l));
      ((BorderToolUndoActivity)getUndoActivity()).replaceAffectedFigures();
    }
    public void execute() {
      super.execute();
      setUndoActivity(createUndoActivity());
      getUndoActivity().setAffectedFigures(view().selection());
      FigureEnumeration fe = getUndoActivity().getAffectedFigures();
      …
    }
Coding pattern found by sequential pattern mining:
    createUndoActivity()
    setUndoActivity()
    getUndoActivity()
    setAffectedFigures()
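The core check inside sequential pattern mining is whether a candidate pattern occurs, in order but not necessarily contiguously, in enough call sequences. The sketch below shows only that support-counting step; the two call sequences were transcribed by hand from the methods on the slide (constructor calls omitted), and a full miner would also generate the candidates.

```python
def is_subsequence(pattern, sequence):
    """True if pattern's items appear in sequence in the same order (gaps allowed)."""
    remaining = iter(sequence)
    # "call in remaining" advances the iterator, so order is enforced.
    return all(call in remaining for call in pattern)


def support(pattern, sequences):
    """Number of sequences that contain the pattern."""
    return sum(1 for seq in sequences if is_subsequence(pattern, seq))


# Method calls in evaluation order, read off the two methods on the slide.
REVERSE_ACTION = ["createUndoActivity", "setUndoActivity", "current", "createList",
                  "add", "peelDecoration", "add", "getUndoActivity",
                  "setAffectedFigures", "getUndoActivity", "replaceAffectedFigures"]
EXECUTE = ["execute", "createUndoActivity", "setUndoActivity", "getUndoActivity",
           "view", "selection", "setAffectedFigures", "getUndoActivity",
           "getAffectedFigures"]

PATTERN = ["createUndoActivity", "setUndoActivity",
           "getUndoActivity", "setAffectedFigures"]

pattern_support = support(PATTERN, [REVERSE_ACTION, EXECUTE])
```

Both methods support the four-call pattern, while the same calls in reverse order are supported by neither, which is what distinguishes a sequential pattern from a plain set of co-occurring calls.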
Searching for Similar Code Fragments (Using Word Co-occurrence)
• Pairs of identifiers that co-occur within a module are collected transitively and treated as related words
• Related words are determined from the distribution of word co-occurrence counts
Module A:
    ...
    host = host_alloc(...);
    log(...);
    if (!add_host(host))
    {
      // scan_host(host)
      // is missing!
    }
    ...
Module B:
    ...
    node = node_alloc(...);
    if (...) { return; }
    if(!add_node(node))
    {
      // scan_node(node)
      // is missing!
    }
    ...
Summary
What Was Covered
• Software space
  – software component search
• SPARS-J
  – Component Rank
  – keyword search
• Similar code
  – CCFinder
  – similar code fragment search
Resources
• Papers
  – Katsuro Inoue, Reishi Yokomori, Tetsuo Yamamoto, Makoto Matsushita, Shinji Kusumoto: "Ranking Significance of Software Components Based on Use Relations", IEEE Transactions on Software Engineering, Vol. 31, No. 3, pp. 213-225, 2005.
  – Reishi Yokomori, Fumiaki Umemori, Hideo Nishi, Tetsuo Yamamoto, Makoto Matsushita, Shinji Kusumoto, Katsuro Inoue: "Java Software Component Search System SPARS-J" (in Japanese), IEICE Transactions D-I, Vol. J87-D-I, No. 12, pp. 1060-1068, 2004.
  – Katsuro Inoue, Toshihiro Kamiya, Shinji Kusumoto: "Code Clone Detection Methods" (in Japanese), Computer Software, Vol. 18, No. 5, pp. 47-54, September 2001. (http://sel.ist.osaka-u.ac.jp/~lab-db/betuzuri/archive/349.pdf)
  – T. Kamiya, S. Kusumoto, and K. Inoue: "CCFinder: A multi-linguistic token-based code clone detection system for large scale source code", IEEE Transactions on Software Engineering, Vol. 28, No. 7, pp. 654-670, July 2002.
• Web
  – SPARS: http://www.spars.info/
  – CCFinderX: http://www.ccfinder.net/ccfinderxos-j.html
  – CCFinder: http://sel.ics.es.osaka-u.ac.jp/cdtools/index.html