I Know That I Don't Know Anything

  • Slides: 114
The Development and Applications of Knowledge Discovery. I-Jen Chiang (蔣以仁). I Know That I Don't Know Anything.

Why is knowledge management so urgent? "The chief economic priority for developed countries is to raise the productivity of knowledge... The country that does this first will dominate the twenty-first century economically." Peter F. Drucker

From data to knowledge: the knowledge formation process (figure). Stages: Raw Data; Selection/Cleansing and Integration; Target Data (Data Warehouse); Preprocessing; Preprocessed Data; Transformation; Transformed Data; Data Mining; Patterns; Interpretation/Evaluation; Knowledge (Understanding).

BI architecture (figure). Information Sources: Operational DBs, Semistructured Sources. Data is extracted, transformed, loaded, refreshed, etc. into the Data Warehouse Server (Tier 1), which also serves Data Marts. OLAP Servers (Tier 2), e.g., MOLAP and ROLAP, serve the Clients (Tier 3): OLAP, Query/Reporting, Data Mining.

Gaining market intelligence from news feeds. Sreekumar Sukumaran and Ashish Sureka

Integrated BI Systems (figure). Structured Data and Unstructured Data sources: DBMS, File System, XML, EA, Legacy, CMS, Scanned Documents, Email. Components: Text Tagger & Annotator, Intermediate Data, ETL, Complete Data Warehouse (RDBMS). Sreekumar Sukumaran and Ashish Sureka

Knowledge sources and value: web content, news reports, patents, email, documents…
"On average, professional users spend 11 hours per week looking for information. Seventy-one percent said they could not find what they were looking for." ("Information Management Software", Lazard Freres & Co. LLC, February 2001)
"The volume of digitized information will double every year from 2000 to 2005 (an increase to 30 times today's volume)." ("Knowledge Management vs. Information Management", Gartner Group, September 2000)

Find the Evidence. Problems using MEDLINE:
• No articles retrieved
• "Answers" definitively answered years ago: manifestations of renal TB; viral/bacterial bronchitis: duration of symptoms; Legionella: prevalence of relative bradycardia; acute allergic episodes: ? thrombocytopenia
• MEDLINE indexed using a system obtuse to most clinicians
• Too many articles retrieved

Evolution. "To study history one must know in advance that one is attempting something fundamentally impossible, yet necessary and highly important." Father Jacobus (Hesse's Magister Ludi), Das Glasperlenspiel (The Glass Bead Game)

Web search engines
• Offline: crawl web pages and store their content in an internal data structure called an inverted index
• Online: answer queries against that index
Monika Henzinger, Search Technologies for the Internet. Science, Vol. 317, no. 5837, 468–471, 27 July 2007.
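The offline/online split on this slide can be illustrated with a minimal sketch. The page texts, the whitespace tokenizer, and the function names `build_inverted_index` and `search` are assumptions made for illustration; they are not taken from the talk or from Henzinger's article.

```python
from collections import defaultdict

def build_inverted_index(pages):
    """Offline step: map each term to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for term in text.lower().split():
            index[term].add(page_id)
    return index

def search(index, query):
    """Online step: return the ids of pages that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical crawled pages (illustrative data only).
pages = {
    "p1": "jaguar is a big cat",
    "p2": "jaguar is a car maker",
    "p3": "the cat sat",
}
index = build_inverted_index(pages)
print(search(index, "jaguar cat"))
```

A real engine would add ranking, stemming, and positional information; this sketch only shows why lookups are cheap once the index exists.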

Search Engine Problems
• Index comprehensiveness
• Relevance

Deterministic Search
• Search query: Jaguar (animal) vs. Jaguar (automobile)
• Problem: scalability
J. Beall, The Weaknesses of Full-Text Searching. The Journal of Academic Librarianship, 34(5): 438–444, 2008.

Expand

Clustered retrieval
1. Walter Warnick, Problems of Searching in Web Databases. Science, Vol. 316, no. 5829, 1284, June 2007.
2. I-Jen Chiang, Discover the Semantic Topology in High-Dimensional Data. Expert Systems with Applications, 33(1), September 2007.

Gartner 2005 Hype Cycle for Emerging Technologies http://www.gartner.com/resources/130100/130115/gartners_hype_c.pdf

Gartner 2006 Hype Cycle for Emerging Technologies. Highlighted themes: Applications Architecture, Real World Web. Mashups can quickly meet tactical needs with reduced development costs and improved user satisfaction.
• Enables new ways of performing vertical applications that will result in significantly increased revenue or cost savings for an enterprise
• Enables new ways of doing business across industries that will result in major shifts in industry dynamics
http://www.gartner.com/it/page.jsp?id=495475

Knowledge generation (figure). Raw text; stemming & stop words; tokenized text; term weighting, giving a document-term matrix:

      t1    t2    …   tn
d1    w11   w12   …   w1n
d2    w21   w22   …   w2n
…     …     …         …
dm    wm1   wm2   …   wmn

Uses of the matrix: term similarity; sentence selection for summarization; document similarity for clustering; vector centroids for classification; metadata/annotation.
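To make the weights w_ij concrete, here is a minimal sketch that fills the document-term matrix with TF-IDF weights and compares documents by cosine similarity. TF-IDF and cosine are assumptions chosen for illustration (the slide only says "term weighting" and "doc similarity"), and the three toy documents are invented.

```python
import math
from collections import Counter

def tfidf_matrix(docs):
    """Return (terms, rows) where rows[i][j] is the weight of term j in document i."""
    tokenized = [d.lower().split() for d in docs]
    terms = sorted({t for doc in tokenized for t in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for doc in tokenized for t in set(doc))
    rows = []
    for doc in tokenized:
        tf = Counter(doc)
        rows.append([tf[t] * math.log(n_docs / df[t]) for t in terms])
    return terms, rows

def cosine(u, v):
    """Cosine similarity of two weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Toy documents (illustrative only).
docs = ["sun beach surf", "sun beach fun", "data warehouse olap"]
terms, w = tfidf_matrix(docs)
print(cosine(w[0], w[1]), cosine(w[0], w[2]))
```

Rows of `w` can be fed to a clustering or classification step; transposing the matrix gives term vectors for term similarity.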

Text ETL to Mining. Mining target: individual text. Mining unit: texts; category-labeled items extracted from text using NLP. IBM TAKMI (Nasukawa, Nagano, 1999).
Original data, structured part: Call Taker: James; Date: Aug. 30, 2002; Duration: 10 min.; CustomerID: ADC00123.
Original data, unstructured part: "Q: cust sys has stopped working. A: checked cust bios and need it updated. …"
Linguistic analysis (tagging, dependency analysis, named entity extraction, intention analysis) uses a category dictionary and a synonym dictionary.
Resulting category/item metadata: [Call Taker] James; [Date] 2002/08/30; [Duration] 10 min.; [CustomerID] ADC00123; [Noun] Customer; [Software] BIOS; [Subj...Verb] customer system..stop; [SW..Problem] BIOS..need updated.
Output feeds visualization & interactive mining.

Luhn's ideas (1958). "It is here proposed that the frequency of word occurrence in an article furnishes a useful measurement of word significance. It is further proposed that the relative position within a sentence of words having given values of significance furnish a useful measurement for determining the significance of sentences. The significance factor of a sentence will therefore be based on a combination of these two measurements." Luhn, H. P. (1958). The automatic creation of literature abstracts. IBM Journal of Research and Development, 2, 159–165. (van Rijsbergen 79)
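A simplified sketch of the two measurements Luhn describes: word significance from frequency, and sentence significance from how densely significant words sit within the sentence. The stop-word list, the `min_count` threshold, and the use of a single bracket per sentence (Luhn allows several clusters separated by a limited gap) are simplifying assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not Luhn's).
STOP_WORDS = {"the", "a", "an", "of", "in", "is", "and", "to", "for", "on", "are"}

def significant_words(text, min_count=2):
    """Words that are not stop words and occur at least min_count times."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return {w for w, c in counts.items() if c >= min_count}

def sentence_score(sentence, significant):
    """(number of significant words)^2 / length of the span that brackets them."""
    words = re.findall(r"[a-z]+", sentence.lower())
    positions = [i for i, w in enumerate(words) if w in significant]
    if not positions:
        return 0.0
    span = positions[-1] - positions[0] + 1
    return len(positions) ** 2 / span

text = ("Text mining finds patterns in text. "
        "Mining patterns needs clean text. "
        "The weather is nice.")
sentences = [s.strip() for s in text.split(".") if s.strip()]
sig = significant_words(text)
ranked = sorted(sentences, key=lambda s: sentence_score(s, sig), reverse=True)
print(ranked[0])
```

Taking the top-ranked sentences in document order gives an extractive abstract in the spirit of the 1958 paper.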

Information extraction
foodscience.com-Job2
JobTitle: Ice Cream Guru
Employer: foodscience.com
JobCategory: Travel/Hospitality
JobFunction: Food Services
JobLocation: Upper Midwest
Contact Phone: 800-488-2611
DateExtracted: January 8, 2001
Source: www.foodscience.com/jobs_midwest.html
OtherCompanyJobs: foodscience.com-Job1

Internet Collaborative Environment Library catalogs Search engine Locally held data Public repositories Commercial data sources Agency data sources Spiders Search engine Dynamic content Search engine Metasearch Tool Custom content Automated categorization Taxonomy-driven web portal/Security control Personalized access Virtual Reference Email alerts Online collaboration Data/Text Mining Visualization

Text Analysis Spectrum. Techniques: Classification, Clustering, Concept Identification, Entity Extraction, Targeted Facts and Events. The spectrum runs from "What is this document about?" to "Who did what to whom, when, where, etc."

Why is getting dimensional data so hard? Example: "Hank bought plastic explosives from Henry in Tucson yesterday." Named entity extraction (NER engine, FrameNet) over entity types such as People, Weapons, Vehicles, and Dates yields: Hank, Henry, plastic explosives, 11/01/07, Tucson.

Name Extraction via HMMs (figure). Training sentences and answers go to the Training Program, which produces NE Models; text (or speech recognition output) goes to the Extractor, which produces Entities.
Example: "The delegation, which included the commander of the U.N. troops in Bosnia, Lt. Gen. Sir Michael Rose, went to the Serb stronghold of Pale, near Sarajevo, for talks with Bosnian Serb leader Radovan Karadzic." Highlighted entity types: Locations, Persons, Organizations.
An easy but successful HMM application:
• Prior to 1997: no learning approach competitive with hand-built rule systems
• Since 1997: statistical approaches (BBN (Bikel et al. 1997), NYU, MITRE, CMU/JustSystems) achieve state-of-the-art performance

NER

Annotation and Tagging. Annotation types: Date, Acquiring Organization, Acquisition Event, Acquired Organization, Place, Amount.
Input: "On November 16, 2005, IBM announced it had acquired Collation, a privately held company based in Redwood City, California for undisclosed amount."
Text Annotator output to RDBMS:

Date      Organization   Place               Amount
Nov. 16   IBM            Redwood City, CA    Undisclosed

XML output: On <Date>November 16, 2005</Date>, <ACQUIRING ORG>IBM</ACQUIRING ORG> announced it had <ACQUISITION EVENT>acquired</ACQUISITION EVENT> <ACQUIRED ORG>Collation</ACQUIRED ORG>, a privately held company based in <PLACE>Redwood City, California</PLACE> for <AMOUNT>undisclosed</AMOUNT> amount.
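The two outputs on this slide (inline tags and a database row) can both be derived from one list of annotated spans, as in the following sketch. The spans are located by plain substring search rather than by a real annotator, and the tag names `Date`, `AcquiringOrg`, and `AcquiredOrg` are hypothetical; they are written without spaces because XML element names cannot contain spaces.

```python
from xml.sax.saxutils import escape

def find_span(text, phrase, label):
    """Locate phrase in text and return a (start, end, label) annotation."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)

def tag_text(text, annotations):
    """Wrap each annotated span in an inline tag; spans must not overlap."""
    out, cursor = [], 0
    for start, end, label in sorted(annotations):
        out.append(escape(text[cursor:start]))
        out.append(f"<{label}>{escape(text[start:end])}</{label}>")
        cursor = end
    out.append(escape(text[cursor:]))
    return "".join(out)

sentence = "On November 16, 2005, IBM announced it had acquired Collation."
annotations = [
    find_span(sentence, "November 16, 2005", "Date"),
    find_span(sentence, "IBM", "AcquiringOrg"),
    find_span(sentence, "Collation", "AcquiredOrg"),
]

tagged = tag_text(sentence, annotations)                   # inline-tagged text
row = {label: sentence[s:e] for s, e, label in annotations}  # one record for an RDBMS table
print(tagged)
print(row)
```

Keeping annotations as offsets, separate from the text, is what lets the same analysis feed both the tagged document and the structured table.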

What the medical literature tells us
• Source of medical literature: Medline
• Causal associations among diseases, symptoms, and drugs or compounds can be discovered
1. Swanson DR. Searching natural language text by computer. Machine indexing and text searching offer an approach to the basic problems of library automation. Science, 132: 1099–1104, 21 Oct. 1960.
2. Swanson DR. Fish oil, Raynaud's syndrome, and undiscovered public knowledge. Perspect Biol Med, 30(1): 7–18, 1986.
3. Swanson DR. Complementary structures in disjoint science literatures. In A. Bookstein, et al. (Eds.), SIGIR 91: Proceedings of the Fourteenth Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval, Chicago, Oct 13–16, 280–289, 1991.

Migraine?
• Stress is associated with migraines
• Stress can lead to loss of magnesium
• Calcium channel blockers prevent some migraines
• Magnesium is a natural calcium channel blocker
• Spreading cortical depression (SCD) is implicated in some migraines
• High levels of magnesium inhibit SCD
• Migraine patients have high platelet aggregability
• Magnesium can suppress platelet aggregability
Smalheiser, N. R. & Swanson, D. R. Assessing a gap in the biomedical literature: Magnesium deficiency and neurologic disease. Neuroscience Research Communications, 15, 1–9, 1994.
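The paired statements above follow one pattern: migraine is linked to an intermediate concept, and magnesium is linked to the same concept. A minimal sketch of that linking step follows. The two link sets are hand-built from the bullets, plus one unshared concept on each side that is hypothetical and added only to make the intersection non-trivial; the function names and the `min_links` threshold are likewise assumptions.

```python
def linking_concepts(a_links, c_links):
    """Intermediate (B) concepts that appear in both literatures."""
    return sorted(a_links & c_links)

def is_candidate_hypothesis(a, c, direct_pairs, a_links, c_links, min_links=2):
    """A-C is a candidate if the pair is not already directly reported
    and enough intermediate concepts connect the two."""
    if (a, c) in direct_pairs or (c, a) in direct_pairs:
        return False
    return len(a_links & c_links) >= min_links

# Concepts linked to migraine; "aura" is a hypothetical unshared concept.
migraine_links = {"stress", "calcium channel blockers",
                  "spreading cortical depression", "platelet aggregability", "aura"}
# Concepts linked to magnesium; "bone density" is a hypothetical unshared concept.
magnesium_links = {"stress", "calcium channel blockers",
                   "spreading cortical depression", "platelet aggregability", "bone density"}

bridges = linking_concepts(migraine_links, magnesium_links)
print(bridges)
print(is_candidate_hypothesis("migraine", "magnesium", set(),
                              migraine_links, magnesium_links))
```

In practice the link sets would come from co-occurrence counts over MEDLINE titles or abstracts, and candidates would be ranked rather than accepted by a fixed threshold.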

Evidence from the literature (figure). All Migraine Research: migraine. Linking concepts: CCB, PA, SCD, stress. All Nutrition Research: magnesium.

Finding new leads: hypothesis generation (figure). Raynaud's phenomenon and fish oils are connected through intermediate concepts: vasoconstriction, platelet aggregation, blood viscosity. Swanson, D. R. Fish oil, Raynaud's syndrome, and undiscovered public knowledge. Perspect Biol Med, Autumn; 30(1): 7–18, 1986.

Literature processing (figure). Components: MEDLINE citations; MetaMap; NER; UMLS; EBP domain model; annotated citations. Question side: PICO query formulation; question frame; query terms. Retrieval: document retrieval via E-Utilities and Essie. Document side: knowledge extraction; document frame; clinical task classification; strength of evidence classification. Then semantic processing, semantic matching, answer generation, answer. Dina Demner-Fushman

Semantic processing example. Semantic processor: Problem Extractor, Population Extractor, Intervention Extractor, Outcome Extractor, Task Classifier, Strength of Evidence Classifier.
Input: "Amiodarone versus diltiazem for rate control in critically ill patients with atrial tachyarrhythmias. … Patients with atrial fibrillation (n = 57), … were randomly assigned to one of three intravenous treatment regimens. Group 1 received diltiazem … group 2 received amiodarone …. Sufficient rate control can be achieved in critically ill patients with atrial tachyarrhythmias using either diltiazem or amiodarone …"
Output: Task: Therapy. Strength of Evidence: A (RCT).
Dina Demner-Fushman

Outcome extractor (alongside the Problem, Population, and Intervention Extractors). Base classifiers: cue-terms, heuristic, n-gram, Naïve Bayes, position, length. Metaclassifier: multiple linear regression.
Score 0.99: "Sufficient rate control can be achieved in critically ill patients with atrial tachyarrhythmias using either diltiazem or amiodarone."
Score 0.75: "Although diltiazem allowed for significantly better 24-hr heart rate control, this effect was offset by a significantly higher incidence of hypotension requiring discontinuation of the drug."
Training: 275 manually annotated abstracts. Dina Demner-Fushman

Concept clustering (figure). Document collection with frequent term sets {sun}, {beach}, {surf}, {fun}, {sun, beach}; each frequent term set defines a candidate cluster (C1 … C5) of the documents that contain it. Clustering: {C1, C2, C4, C5}. Clustering description: {surf, sun, beach, fun}.
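A minimal sketch of how frequent term sets induce candidate clusters: each term set that occurs in enough documents defines the cluster of documents containing all of its terms. The five toy documents and the support threshold are invented so that the frequent sets match the ones named on the slide, and the brute-force enumeration stands in for an Apriori-style miner; selecting the final clustering from the candidates is not shown.

```python
from itertools import combinations

def frequent_term_sets(docs, min_support, max_size=2):
    """Return {term set: ids of documents containing all its terms}
    for every term set covered by at least min_support documents."""
    vocabulary = sorted(set().union(*docs.values()))
    result = {}
    for size in range(1, max_size + 1):
        for terms in combinations(vocabulary, size):
            cover = {d for d, words in docs.items() if set(terms) <= words}
            if len(cover) >= min_support:
                result[frozenset(terms)] = cover
    return result

# Toy document collection (illustrative only).
docs = {
    "d1": {"sun", "beach", "surf"},
    "d2": {"sun", "beach"},
    "d3": {"sun", "fun"},
    "d4": {"beach", "fun"},
    "d5": {"surf"},
}
clusters = frequent_term_sets(docs, min_support=2)
for terms, cover in clusters.items():
    print(sorted(terms), sorted(cover))
```

Because candidate clusters overlap, a selection step is still needed to pick a subset that covers the collection with little overlap; the term sets of the chosen clusters then serve as the cluster descriptions.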

Anopheles

Document clustering
1. Walter Warnick, Problems of Searching in Web Databases. Science, Vol. 316, no. 5829, 1284, June 2007.
2. I-Jen Chiang, Discover the Semantic Topology in High-Dimensional Data. Expert Systems with Applications, 33(1), September 2007.

Extracting Information From Text (figure). Text; minimal recursion semantics representations; Ontology; Database [Deep Thought EU project]
• Structuring knowledge from text: tagging, compounds, grammatical analysis, ontological interpretation, regular expressions, pattern recognition

Patterns Construction (figure). Labels: Taipei, Tokyo, New York; Repository; Tagging & annotation; CDW; Patterns; Knowledge Repository or structured data.

Knowledge Construction (figure). Domain document collection; statistical & linguistic analyses; manual labor; ontology. [Brasethvik & Gulla, DKE, 38/1, 2001]
• Want to extract prominent concepts/relations from text: tagging, compounds, NP recognition, term frequencies, stopwords, language identification

Patterns Explorer (figure). Labels: Web Browser; Installed from http://...; Hard disk; Windows XP; crashes; is a; Desktop computer; Hard disk size 40 GB; Operating System; Products; Laptop computers; Linux; Macintosh.

Evolution (figure). Labels: Local data; FTP; Gopher; HTML; More structure; Indexing; Search; Relevance Ranking; Latent Semantic Topology; Crawling; WebSQL; Social Network of Hyperlinks; WebL; XML; Clustering; Collaborative Filtering; ScatterGather; Topic Directories; Semi-supervised Learning; Automatic Classification; Web Communities; Web Servers; Topic Distillation; Focused Crawling; Monitor, Mine, Modify; User Profiling; Web Browsers.

Metadata for people, events, time, places, and objects (figure). Labels: applications, refer to / refine; properties, refer to / identify; Conceptual Objects; people; Physical Entities; participate in; affect or refer to; location; Temporal Entities; time, within; place, at.

Resource indexing (figure). Ontology expansion: CIDOC CRM or DC. Background knowledge / authorities: people, events, objects. Thesauri extent; CRM entities. Derived knowledge data (e.g., RDF). Sources and metadata (XML/RDF).

Explicit Events, Object Identity, Symmetry (CIDOC CRM figure).
• E7 Activity "Crimea Conference": P11 participated in (three E39 Actor nodes); P7 took place at E53 Place 7012124; E52 Time-Span February 1945 (P82 at some time within)
• E65 Creation Event: P14 performed by E39 Actor; P94 has created E31 Document "Yalta Agreement"; E52 Time-Span 11-2-1945 (P81 ongoing throughout); P86 falls within
• E38 Image: P67 is referred to by

Rules Extraction
• The formal concept C4 makes the following rules possible:
  R1: t3 → t1 ∧ t6
  R2: t5 → t1 ∧ t6
  R3: t3 ↔ t5
• Interpretation of R1 and R2: the use of term t3 or t5 is always associated with the use of terms t1 and t6.
• Rule R3 expresses the mutual equivalence of the terms {t3, t5}: all documents that contain t3 also contain t5, and vice versa.
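Rules of the kind R1 to R3 can be read off a document-term incidence table: t implies u whenever every document containing t also contains u, and two terms are equivalent when the implication holds both ways. The sketch below checks this directly for single terms; the four toy documents are invented so that t3, t5, t1, and t6 behave as on the slide, and no concept lattice is built.

```python
def term_rules(docs):
    """Return (implications, equivalences) between single terms.

    (t, u) in implications means: every document containing t also contains u.
    """
    terms = sorted(set().union(*docs.values()))
    extent = {t: {d for d, ts in docs.items() if t in ts} for t in terms}
    implications = {
        (t, u)
        for t in terms
        for u in terms
        if t != u and extent[t] and extent[t] <= extent[u]
    }
    equivalences = {(t, u) for (t, u) in implications
                    if (u, t) in implications and t < u}
    return implications, equivalences

# Toy incidence table (illustrative only).
docs = {
    "d1": {"t1", "t6", "t3", "t5"},
    "d2": {"t1", "t6"},
    "d3": {"t1", "t6", "t3", "t5"},
    "d4": {"t2"},
}
implications, equivalences = term_rules(docs)
print(sorted(implications))
print(sorted(equivalences))
```

On this table t3 and t5 each imply both t1 and t6, and t3 and t5 are equivalent, which mirrors R1, R2, and R3.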

Integration of medical records. [Figure: overlapping excerpts of anonymised Royal Marsden NHS Trust records for a patient with carcinoma of the breast: several Patient Case Notes (including Follow Up Staging Clinic, General Surgical, and Chemotherapy Clinic entries) and a Diagnostic Radiology CT Report. Dates shown include 15 Dec 1993, 24 Jan 1997, and 27 Aug 1998.]

What was done… What happened… And why (figure). Semantic network over clinical concepts: Human:1382, Pain:5735, Breast:1492, Clinic:4096, Clinic:1024, Biopsy:1066, Clinic:2010, Radio:1812, Chemo:6502, Mass:1666, Ulcer:1945, Cancer:1914, linked by the relations locus, attends, reason, plans, target, finding, treats, and time.

Draft Schema for Chronicle (figure). Entities: PATIENT, PROBLEM, LOCUS, LOCATION, INVESTIGATION, INTERVENTION, DRUG, REGIME, CONSULT, TIME, PATHOLOGY, Clinical Course. Attributes include Name, Status, Laterality, Age, Sex, Race, Occupation, Family History, Type, Form, Dose, Route, Size, Goal, Doctor, Diagnostic Status, Other Feature. Relations include has-locus, partOf, subpart, locatedAt, causes, treats, target, compare, finding, recommends, involves, about, after, Evidence for Presence / absence.

Real-time clustering: real-time index; metadata of search results

Rule generation

       B          ¬B
A      P(B|A)     P(¬B|A)
¬A     P(B|¬A)    P(¬B|¬A)

       A          ¬A
B      P(A|B)     P(¬A|B)
¬B     P(A|¬B)    P(¬A|¬B)

• Let S be a document set.
A → B:
• P(B|A) >> P(A|B) of S
• P(¬B|A) >> P(¬A|B) of S
• χ²(B|A) > χ²(A|B) of S
B → A:
• P(A|B) >> P(B|A) of S
• P(¬A|B) >> P(¬B|A) of S
• χ²(A|B) > χ²(B|A) of S
The more distinctive the document set, the clearer the rules.
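A minimal sketch of the first condition only: estimate the two conditional probabilities as document-count ratios over S and orient the rule toward the larger one. The `ratio` threshold that stands in for ">>", the toy document set, and the function names are assumptions; the negation and χ² conditions listed above are not implemented here.

```python
def conditional(docs, target, given):
    """Estimate P(target | given) as a document-count ratio over the set S."""
    with_given = [d for d in docs if given in d]
    if not with_given:
        return 0.0
    return sum(1 for d in with_given if target in d) / len(with_given)

def rule_direction(docs, a, b, ratio=1.5):
    """Return 'a -> b', 'b -> a', or None, using only P(b|a) versus P(a|b)."""
    p_b_given_a = conditional(docs, b, a)
    p_a_given_b = conditional(docs, a, b)
    if p_b_given_a > 0 and p_b_given_a >= ratio * p_a_given_b:
        return f"{a} -> {b}"
    if p_a_given_b > 0 and p_a_given_b >= ratio * p_b_given_a:
        return f"{b} -> {a}"
    return None

# Toy document set S: each document is the set of terms it contains.
S = [{"A", "B"}, {"A", "B"}, {"B"}, {"B"}, {"B"}, {"C"}]
print(conditional(S, "B", "A"), conditional(S, "A", "B"))
print(rule_direction(S, "A", "B"))
```

Here every document with A also has B, while most documents with B lack A, so the rule is oriented from A to B.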

Rule Structure (figure). Object 1 (Noun) with Attribute 11 … Attribute 1n (Noun); Relationship; Object 2 (Noun) with Attribute 21 … Attribute 2n (Noun).
1. Object: concrete noun. Relationship: abstract noun. Attribute: concrete or abstract noun.
2. A concrete noun indicates an equivalence or attribute relation; an abstract noun acts as a method.
3. Relationship = null: Object 1 and Object 2 stand in an attribute relation (is A, part of), i.e., Object -- Attribute.
4. Object 1 (Relationship) Object 2: Object 1 and Object 2 are related by Relationship.
5. Object 1 Attribute (Relationship) Object 2: the Attribute of Object 1 is related to Object 2 by Relationship (Attribute Object).
6. Attribute Object 1 (Relationship) Object 2: the Attribute of Object 1 is related to Object 2 by Relationship (Attribute Object).
7. Object 1 (Relationship) Object 2 Attribute: Object 1 is related to the Attribute of Object 2 by Relationship (Attribute Object).

Recursive Rule Construction Properties (objects) object Methods or Utilities

Specialized vocabulary such as regulations and statutes (figure). Labels: attribute or condition; Object 1 (Noun); method or utility; Attribute 1 (Noun) … Attribute n (Noun); Object 2 (Noun); Attribute 1 (Noun); carries an action sense.

Generalize / Specify (figure). Object: attribute: loan (貸款). Object: attribute (condition): Temporary Statute for Post-Earthquake Reconstruction (震災重建暫行條例); disaster-affected households (受災戶). Method: home reconstruction program (重建家園專案). Object: affected households (災戶). Object: attribute: financial institutions (金融機構). Object: attribute: interest (利息). Object: attribute: housing (房屋). Object: attribute: damage (損毀). Object: condition.

Clustering

Tasks in News Detection. News feeds; segmentation; detection (retro, on-line); tracking.

Might be Relevant
World Trade Center and the Pentagon: September 11, 2001.
USS Cole: October 12, 2000. Location: Aden, Yemen. Date: October 12, 2000, 11:18 am (UTC+3). Attack type: suicide bombing. Deaths: 19 (including the 2 perpetrators). Injured: 39. Perpetrator(s): al-Qaeda, carried out by Ibrahim al-Thawr and Abdullah al-Misawa.

The 9/11 attacks
• Preventable: the FBI Minnesota agents and Zacarias Moussaoui's personal computer; the FBI Phoenix memo (George Will)
• Dr. Bhandari (Virtual Gold, Inc): data mining could have prevented the 9/11 tragedy

Generative / Discriminative; Generalize / Specify (figure). Object: attribute: loan (貸款). Object: attribute (condition): Temporary Statute for Post-Earthquake Reconstruction (震災重建暫行條例); disaster-affected households (受災戶). Method: home reconstruction program (重建家園專案). Object: affected households (災戶). Object: attribute: financial institutions (金融機構). Object: attribute: interest (利息). Object: attribute: housing (房屋). Object: attribute: damage (損毀). Object: condition.

The future (NASA). Knowledge management roadmap with timeline markers 2003, 2007, 2010, 2025.
Sharing Knowledge. Enables sharing of essential knowledge to complete Agency tasks.
• Adaptive knowledge infrastructure is in place
• Knowledge resources identified and shared appropriately
• Timely knowledge gets to the right person to make decisions
• Intelligent tools for authoring through archiving
• Cohesive knowledge development between JPL, its partners, and customers
Integrating Distributed Knowledge. Enables seamless integration of systems throughout the world and with robotic spacecraft.
• Instrument design is semi-automatic based on knowledge repositories
• Mission software auto-instantiates based on unique mission parameters
• KM principles are part of Lab culture and supported by layered COTS products
• Remote data management allows spacecraft to self-command
Capturing Knowledge. Enables capture of knowledge at the point of origin, human or robotic, without invasive technology.
• Knowledge gathered anyplace from hand-held devices using standard formats on interplanetary Internet
• Expert systems on spacecraft analyze and upload data
• Autonomous agents operate across existing sensor and telemetry products
• Industry and academia supply spacecraft parts based on collaborative designs derived from JPL's knowledge system
Modeling Expert Knowledge. Enables real-time capture of tacit knowledge from experts on Earth and in permanent outposts.
• Systems model experts' patterns and behaviors to gather knowledge implicitly
• Seamless knowledge exchange with robotic explorers
• Planetary explorers contribute to their successor's design from experience and synthesis
• Knowledge systems collaborate with experts for new research
Missions shown along the timeline: MarsNet; Europa Orbiter; Space Interferometry Mission; Europa Lander/Submersible; Titan Organics: Lander/Aerobot; Neptune Orbiter/Triton Observer; Mars robotic outposts; Comet Nucleus Sample Return; Saturn Ring Observer; Terrestrial Planet Finder; Interstellar missions; Permanent colonies.