Mining Frequent Episodes for Relating Financial Events and Stock Trends

Anny Ng and Ada Wai-chee Fu, PAKDD 2003
Presenter: Ming Jing Tsai
Date: 2004/03/05

Definition
  • Events: financial news, political…
  • e1, e2, e3, …, ek: event types
  • Day record Di: {ei1, ei2, ei3, …, eik}
  • Episode: {e1, e2, e3, …, ek}; it has at least two elements, and at least one ej is a stock event type
  • Window = x days

Definition
  • Window frequency: number of windows that contain an event type
  • DB frequency: number of occurrences of an event type in the DB
  • Frequency of an episode (ex): number of windows in which the first day of the window contains at least one of the event types in the episode
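The three frequencies can be sketched in a few lines of Python. This is only an illustration on the seven-day example database used in these slides, not the authors' code. The function names are hypothetical, and two points are assumptions: a window near the end of the database is simply shorter, and a window counts for an episode only if it also contains every event type of the episode (the slide states only the first-day condition, but this reading reproduces the counts of the worked example).

```python
from collections import Counter

# Example event database from these slides: day number -> event types on that day.
DAYS = {1: "b", 2: "ac", 3: "b", 4: "d", 5: "b", 6: "ca", 7: "d"}
WINDOW = 3


def windows(days, size):
    """One window per start day; windows near the end are shorter (assumption)."""
    ordered = sorted(days)
    return [[set(days[d]) for d in ordered[i:i + size]] for i in range(len(ordered))]


def db_frequency(days):
    """Number of day records in which each event type occurs."""
    return Counter(e for events in days.values() for e in set(events))


def window_frequency(days, size):
    """Number of windows that contain each event type."""
    return Counter(e for w in windows(days, size) for e in set().union(*w))


def episode_frequency(episode, days, size):
    """Windows containing the whole episode (assumption) whose first day
    holds at least one event type of the episode."""
    episode = set(episode)
    return sum(
        1
        for w in windows(days, size)
        if episode <= set().union(*w) and episode & w[0]
    )
```

With these definitions, `episode_frequency("bd", DAYS, WINDOW)` evaluates to 3 and `episode_frequency("ac", DAYS, WINDOW)` to 2, so with min_sup = 3 the first episode is frequent and the second is not.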

Construct event tree
  • Header in descending db-frequency order
  • Event_set pair <(firstday), (remaining days)>
  • Sorted in descending db-frequency order
  • Node <E:C:B>: E = event type, C = count, B = binary bit

Pruning method
  • Prune event types with window frequency < min_sup
  • Remove duplicate event types that appear in both the firstday part and the remaining-days part

An event database

days  events
1     b
2     a, c
3     b
4     d
5     b
6     c, a
7     d

Window = 3, min_sup = 3
Db frequencies: <a:2, b:3, c:2, d:2>

Windows (window = 3, min_sup = 3)

Ordered frequent event types: <b, a, c, d>

window  days included  event_set pair
1       1, 2, 3        <(b), (a, c)>
2       2, 3, 4        <(a, c), (b, d)>
3       3, 4, 5        <(b), (d)>
4       4, 5, 6        <(d), (b, a, c)>
5       5, 6, 7        <(b), (a, c, d)>
6       6, 7           <(a, c), (d)>
7       7              <(d), ()>

Window frequencies: <a:5, b:5, c:5, d:6>
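The event_set pairs in the window table can be reproduced with a short Python sketch. It is an illustration under stated assumptions rather than the authors' implementation: `event_set_pairs` is a hypothetical name, ties in db frequency are broken alphabetically (which gives the order <b, a, c, d>), and the min_sup filter on window frequency is left out because every event type in this example passes it.

```python
from collections import Counter

# Example event database from these slides: day number -> event types on that day.
DAYS = {1: "b", 2: "ac", 3: "b", 4: "d", 5: "b", 6: "ca", 7: "d"}
WINDOW = 3


def event_set_pairs(days, size):
    """Return one (firstday part, remaining-days part) pair per window."""
    ordered_days = sorted(days)
    db_freq = Counter(e for d in ordered_days for e in set(days[d]))
    # Descending db frequency; alphabetical tie-break is an assumption.
    header = sorted(db_freq, key=lambda e: (-db_freq[e], e))
    rank = {e: i for i, e in enumerate(header)}

    pairs = []
    for i in range(len(ordered_days)):
        span = ordered_days[i:i + size]
        first = set(days[span[0]])
        # Event types already in the firstday part are dropped from the rest.
        remaining = set().union(*(set(days[d]) for d in span[1:])) - first
        pairs.append((sorted(first, key=rank.get), sorted(remaining, key=rank.get)))
    return pairs


for number, (first, remaining) in enumerate(event_set_pairs(DAYS, WINDOW), start=1):
    print(number, first, remaining)
```

Window 1 comes out as `['b'] ['a', 'c']`: event type b also occurs on day 3, but it is already in the firstday part, so the duplicate is removed from the remaining-days part.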

Event tree construction (figures): the event tree is built step by step from the event_set pairs, starting at the root {null}, with header b, a, c, d and nodes written {E:C:B}. On the last slide the nodes with bit 0 are {b:3:0}, {a:2:0}, {c:2:0} and {d:2:0}.
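The step-by-step construction on these slides can be sketched as a prefix tree in Python. This is a minimal sketch, not the authors' data structure: the `Node` class, `build_tree` and `dump` are hypothetical names, the header-table links are omitted, and it assumes that bit 0 marks the firstday part and bit 1 the remaining-days part, which is consistent with the node labels shown.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    event: str                     # E: event type
    count: int = 0                 # C: number of event_set pairs through this node
    bit: int = 0                   # B: 0 = firstday part, 1 = remaining days (assumption)
    children: dict = field(default_factory=dict)


# Event_set pairs of windows 1 to 7 from the example.
PAIRS = [
    (["b"], ["a", "c"]),
    (["a", "c"], ["b", "d"]),
    (["b"], ["d"]),
    (["d"], ["b", "a", "c"]),
    (["b"], ["a", "c", "d"]),
    (["a", "c"], ["d"]),
    (["d"], []),
]


def build_tree(pairs):
    """Insert each pair as a path; pairs sharing a prefix share nodes."""
    root = Node(event="null")
    for first, remaining in pairs:
        node = root
        for event, bit in [(e, 0) for e in first] + [(e, 1) for e in remaining]:
            child = node.children.setdefault((event, bit), Node(event, 0, bit))
            child.count += 1
            node = child
    return root


def dump(node, depth=0):
    """Return the tree as indented {E:C:B} lines, one per node."""
    lines = []
    for child in node.children.values():
        lines.append("  " * depth + f"{{{child.event}:{child.count}:{child.bit}}}")
        lines.extend(dump(child, depth + 1))
    return lines


print("\n".join(dump(build_tree(PAIRS))))
```

The children of the root come out as {b:3:0}, {a:2:0} and {d:2:0}, matching the counts on the last construction slide.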

Mining frequent episodes
  • Header table {h0, h1, …, hH}
  • Mine recursively each of the linked lists kept at the header table, from bottom to top
  • The conditional paths can be used to build a conditional event tree
  • Objective 1: find frequent episodes of the form {a} ∪ {hi}, using first-part frequencies
  • Objective 2: find frequent episodes that contain hi and at least two other event types, using db frequencies

Traverse conditional path
  • Remove invalid event types
  • Adjust the counts of the nodes above hi in the path to be equal to that of hi
  • If hi is in the firstdays part, then move all event types in the remainingdays part to the firstdays part
  • Remove hi from the path
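The traversal steps can be tried out on the example with a small Python sketch. It works on flat event_set pairs instead of tree paths, so every count is 1 and the count-adjustment step has nothing to do. Two readings are assumptions, chosen because they reproduce the paths and frequencies on the mining slides: an event type is treated as "invalid" when it is ranked after hi in the header order, and a path whose first part ends up empty is dropped. The names `conditional_paths` and `path_frequencies` are hypothetical.

```python
from collections import Counter

ORDER = ["b", "a", "c", "d"]  # header order from the example
PAIRS = [
    (["b"], ["a", "c"]),
    (["a", "c"], ["b", "d"]),
    (["b"], ["d"]),
    (["d"], ["b", "a", "c"]),
    (["b"], ["a", "c", "d"]),
    (["a", "c"], ["d"]),
    (["d"], []),
]


def conditional_paths(paths, h, order=ORDER):
    """Conditional paths for header h from (first part, remaining part) pairs."""
    # Assumption: event types ranked after h in the header order are invalid.
    allowed = set(order[: order.index(h) + 1])
    result = []
    for first, remaining in paths:
        if h not in first and h not in remaining:
            continue
        first = [e for e in first if e in allowed]
        remaining = [e for e in remaining if e in allowed]
        if h in first:
            # h is in the first part: move the remaining part into the first part.
            first, remaining = first + remaining, []
        first = [e for e in first if e != h]
        remaining = [e for e in remaining if e != h]
        if first:  # assumption: a path with an empty first part is dropped
            result.append((first, remaining))
    return result


def path_frequencies(paths):
    """Return (db frequency, first-part frequency) over a list of paths."""
    db, first_part = Counter(), Counter()
    for first, remaining in paths:
        db.update(first + remaining)
        first_part.update(first)
    return db, first_part


d_paths = conditional_paths(PAIRS, "d")      # event base set {d}
cd_paths = conditional_paths(d_paths, "c")   # event base set {cd}
```

For header d this gives five paths with db frequency 4 and first-part frequency 3 for each of b, a and c, the same numbers as on the "Mining header d" slide.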

Generate frequent episodes
  • When a conditional event tree contains only a single path:
      • any subset of firstpart ∪ event base set
      • any subset of firstpart ∪ any subset of remainingpart ∪ event base set
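The single-path rule can be illustrated with a short Python sketch based on `itertools.combinations`. It is a sketch only: `episodes_from_single_path` is a hypothetical name, the subset taken from the first part is assumed to be non-empty (otherwise the result would be just the event base set), and the counts along the path are ignored.

```python
from itertools import chain, combinations


def subsets(items, min_size=0):
    """All subsets of items with at least min_size elements, as tuples."""
    return chain.from_iterable(
        combinations(items, r) for r in range(min_size, len(items) + 1)
    )


def episodes_from_single_path(first_part, remaining_part, base):
    """Combine subsets of both parts of a single path with the event base set."""
    result = set()
    for f in subsets(first_part, min_size=1):   # assumption: non-empty first-part subset
        for r in subsets(remaining_part):       # the empty subset covers the first rule
            result.add(frozenset(f) | frozenset(r) | frozenset(base))
    return result


# Hypothetical single path <(b), (a)> with event base set {d}.
for episode in sorted(episodes_from_single_path(["b"], ["a"], {"d"}), key=len):
    print(sorted(episode))
```

For the hypothetical path <(b), (a)> and base set {d}, the candidates are {b, d} and {a, b, d}.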

Mining header d (min_sup = 3)

Event base set: {d}

Conditional paths:
  • <(a:1, c:1), (b:1)>
  • <(b:1), ()>
  • <(b:1, a:1, c:1), ()>
  • <(b:1), (a:1, c:1)>
  • <(a:1, c:1), ()>

db frequency: {<b:4, a:4, c:4>}
First_part frequency: {<b:3, a:3, c:3>}
Frequent episodes: {bd, ad, cd}

Recursively mining header c

Event base set: {cd}

Paths from mining header d:
  • <(a:1, c:1), (b:1)>
  • <(b:1), ()>
  • <(b:1, a:1, c:1), ()>
  • <(b:1), (a:1, c:1)>
  • <(a:1, c:1), ()>

Conditional paths for c:
  • <(a:1, b:1), ()>
  • <(b:1, a:1), ()>
  • <(b:1), (a:1)>
  • <(a:1), ()>

db frequency: {<b:3, a:4>}
First_part frequency: {<b:3, a:3>}
Frequent episodes: {bcd, acd}

Recursively mining header a

Event base set: {acd}

Paths from mining header c:
  • <(a:1, b:1), ()>
  • <(b:1, a:1), ()>
  • <(b:1), (a:1)>
  • <(a:1), ()>

Conditional paths for a:
  • <(b:1), ()>

db frequency: {<b:3>}
First_part frequency: {<b:3>}
Frequent episodes: {bacd}

Mining header c (min_sup = 3)

Event base set: {c}

Conditional paths:
  • <(b:1), (a:1)>
  • <(a:1, b:1), ()>
  • <(b:1), (a:1)>
  • <(a:1), ()>

db frequency: {<b:3, a:4>}
First_part frequency: {<b:3, a:2>}
Frequent episodes: {bc}

Recursively mining header a (min_sup = 3)

Event base set: {ac}

Conditional paths:
  • <(b:1), ()>

db frequency: {<b:3>}
First_part frequency: {<b:3>}
Frequent episodes: {bac}

Mining header a (min_sup = 3)

Event base set: {a}

Conditional paths:
  • <(b:1), ()>

db frequency: {<b:3>}
First_part frequency: {<b:3>}
Frequent episodes: {ba}

Experiment (synthetic data)

Dataset 2: T20, I5, M1000, D3K

Experiment (real data)
  • News events from the Internet
      • 121 event types
      • 757 days
  • Stock data
      • Dow Jones, Nasdaq, Hang Seng, 12 top local companies


Experiment (real data)

Episodes:
  • Nasdaq downs, PCCW downs
  • Nasdaq ups, SHK properties flats, HSBC flats
  • China Mobile downs, Nasdaq downs, HK Electric flats

Support: 151, 178