A Computer Aided Detection System For Digital Mammograms

A Computer Aided Detection System For Digital Mammograms Based on Radial Basis Functions and Feature Extraction Techniques
By Mohammed Jirari
Shanghai, China
Sept 3rd, 2005

Why This Project?
• Breast cancer is the most common cancer and the second leading cause of cancer deaths
• Mammographic screening reduces the mortality of breast cancer
• But mammography has a low positive predictive value (PPV): only 35% have malignancies
• The goal of Computer Aided Detection (CAD) is to provide a second reading, thereby reducing the false positive rate

Basic Components of the System
• Preprocessing
  – Cropping
  – Enhancement (Histogram Equalization)
• Feature extraction
• Normalization
• Training
• Testing
• ROC Analysis

What is a Mammogram?
• A mammogram is an x-ray image of the breast. Mammography is the procedure used to generate a mammogram
• The equipment used to obtain a mammogram, however, is very different from that used to perform an x-ray of the chest or bones

Mammograms (cont.)
• In order to get a good image, the breast must also be flattened or compressed
• In a standard examination, two images of each breast are taken: one from the top and one from the side

Mammogram Examples
[Figure: mammogram of a left breast, craniocaudal (from the top) view]
[Figure: mammogram of a left breast, mediolateral oblique (from the side) view]

Purpose of CAD
• Mammography is the most reliable method for early detection of breast cancer
• But, due to the high number of mammograms to be read, the accuracy rate tends to decrease
• Double reading of mammograms has been proven to increase accuracy, but at high cost
• CAD can assist the medical staff in achieving high efficiency and effectiveness
• The physician or radiologist makes the call, not the CAD system

Proposed Method
• The proposed method will assist the physician by providing a second opinion on reading the mammogram, by pointing out an area (if one exists) delimited by its center coordinates and its radius
• If the two readings are similar, no further work is needed
• If they are different, the radiologist will take a second look to make the final diagnosis

Data Used
• The dataset used is the Mammographic Image Analysis Society (MIAS) MINIMIAS database, which contains MedioLateral Oblique (MLO) views of each breast for 161 patients, for a total of 322 images
• Each image is 1024 x 1024 pixels

Preprocessing
• Cropping: removes the black parts of the image (almost 50%) based on a threshold
• Enhancement: histogram equalization accentuates the features to be extracted by increasing the dynamic range of gray levels
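The two preprocessing steps can be sketched in a few lines of dependency-free Python. This is only an illustration under stated assumptions, not the author's implementation: the function names, the threshold value of 20, and the 8-bit gray-level range (0 to 255) are hypothetical, since the slides do not give them.

```python
def crop_dark_border(image, threshold=20):
    """Crop to the bounding box of pixels brighter than `threshold`.

    `image` is a list of rows of integer gray levels. The threshold
    value is an assumption; the slides do not state the one used.
    """
    rows = [r for r, row in enumerate(image) if any(v > threshold for v in row)]
    cols = [c for c in range(len(image[0]))
            if any(row[c] > threshold for row in image)]
    if not rows or not cols:
        return [row[:] for row in image]  # nothing above threshold: keep as-is
    return [row[cols[0]:cols[-1] + 1] for row in image[rows[0]:rows[-1] + 1]]


def equalize_histogram(image, levels=256):
    """Spread gray levels over the full range via the cumulative histogram."""
    hist = [0] * levels
    for row in image:
        for v in row:
            hist[v] += 1

    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    n = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:
        return [row[:] for row in image]  # single gray level: nothing to stretch

    # Lookup table: old gray level -> equalized gray level.
    lut = [max(0, round((c - cdf_min) * (levels - 1) / (n - cdf_min)))
           for c in cdf]
    return [[lut[v] for v in row] for row in image]


# Tiny stand-in for a mammogram: a bright region on a black background.
mammogram = [
    [0,  0,  0, 0],
    [0, 50, 60, 0],
    [0, 70, 80, 0],
    [0,  0,  0, 0],
]
cropped = crop_dark_border(mammogram)
enhanced = equalize_histogram(cropped)
```

In the toy example the four gray levels 50 to 80 are stretched to cover the whole 0 to 255 range, which is the "increased dynamic range" the slide refers to.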

Preprocessing Result
[Figure: original mammogram; the same mammogram after cropping and histogram equalization]

Co-occurrence Matrices to Calculate Features
• A co-occurrence matrix gives the joint probability of occurrence of gray levels a and b for two pixels with a defined spatial relationship in an image
• The spatial relationship is defined in terms of a distance d and an angle θ
• From these matrices, a variety of features may be extracted

Co-occurrence Matrices (cont.)
• In this project, the matrices are constructed at distances d=1 and d=3 and for angles θ=0°, 45°, 90°, 135°
• For each matrix, seven features are extracted
• Can be formally represented as follows: [equation image not preserved]
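As a rough illustration of how a gray-level co-occurrence matrix (GLCM) can be built for a given distance d and angle θ, here is a small pure-Python sketch. Several choices are assumptions rather than details from the slides: the function names, counting each pixel pair in both directions (a symmetric matrix), normalizing counts to probabilities, and measuring diagonal distances in pixel steps.

```python
# (row step, column step) for one unit of distance along each angle.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def cooccurrence_matrix(image, distance, angle, levels, symmetric=True):
    """Return the normalized GLCM of `image` for one (distance, angle) pair.

    `image` is a list of rows of integer gray levels in [0, levels).
    Entry [a][b] estimates the probability that a pixel with gray level a
    has a neighbor with gray level b at the given offset.
    """
    step_r, step_c = OFFSETS[angle]
    dr, dc = step_r * distance, step_c * distance
    n_rows, n_cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]

    for r in range(n_rows):
        for c in range(n_cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < n_rows and 0 <= c2 < n_cols:
                a, b = image[r][c], image[r2][c2]
                counts[a][b] += 1
                if symmetric:
                    counts[b][a] += 1  # count the pair in both directions

    total = sum(sum(row) for row in counts)
    if total == 0:
        return counts  # image too small for this offset
    return [[v / total for v in row] for row in counts]


def mean_matrix(matrices):
    """Element-wise mean of several GLCMs of the same size."""
    k, size = len(matrices), len(matrices[0])
    return [[sum(m[i][j] for m in matrices) / k for j in range(size)]
            for i in range(size)]


# 4x4 toy image with 4 gray levels.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
glcms = {angle: cooccurrence_matrix(image, 1, angle, levels=4)
         for angle in OFFSETS}
glcm_mean = mean_matrix(list(glcms.values()))
```

For a real 8-bit mammogram, `levels` would be 256 and the same call would be repeated with `distance=3`.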

Features Used
• Energy or angular second moment:
• Entropy:
• Maximum probability:
• Inverse difference moment: κ=2, λ=1

Features Used (cont.)
• Homogeneity:
• Inertia or variance:

Features Used (cont.)
• Correlation:
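The seven features named on these slides can be computed from a normalized co-occurrence matrix along the following lines. The slide formulas are not available here, so this sketch uses common textbook definitions, which may differ from the ones the author used (homogeneity in particular has several variants, and the base-2 logarithm for entropy is a choice). The function name `glcm_features` is hypothetical; `kappa` and `lam` stand for the κ and λ of the inverse difference moment.

```python
import math


def glcm_features(p, kappa=2, lam=1):
    """Seven texture features of a normalized co-occurrence matrix `p`.

    `p` is a square list of lists whose entries sum to 1. The definitions
    are assumed textbook forms, not necessarily the author's.
    """
    size = len(p)
    cells = [(i, j, p[i][j]) for i in range(size) for j in range(size)]

    # Marginal means and standard deviations, needed for correlation.
    mu_i = sum(i * v for i, j, v in cells)
    mu_j = sum(j * v for i, j, v in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * v for i, j, v in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * v for i, j, v in cells))
    cov = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)

    return {
        "energy": sum(v * v for _, _, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
        "max_probability": max(v for _, _, v in cells),
        # Off-diagonal entries only, to avoid dividing by zero.
        "inverse_difference_moment": sum(
            v ** lam / abs(i - j) ** kappa for i, j, v in cells if i != j),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
        "inertia": sum((i - j) ** 2 * v for i, j, v in cells),
        # Defined as 0 here when a marginal has no spread.
        "correlation": cov / (sd_i * sd_j) if sd_i > 0 and sd_j > 0 else 0.0,
    }


# Two extreme 2x2 cases: perfectly diagonal and perfectly uniform.
diagonal = glcm_features([[0.5, 0.0], [0.0, 0.5]])
uniform = glcm_features([[0.25, 0.25], [0.25, 0.25]])
```

The diagonal matrix (neighbors always share a gray level) gives zero inertia and a correlation of 1, while the uniform matrix gives the maximum entropy for its size and zero correlation.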

Feature Extraction
• Calculate the co-occurrence matrices at distances d=1 and d=3
• The angles used are θ=0°, 45°, 90°, 135°, with the fifth matrix being the mean of the 4 directions
• For each co-occurrence matrix, the seven statistical features are computed

Example of Calculated Features
Feature          0° GLCM  45° GLCM  90° GLCM  135° GLCM  Mean GLCM
Energy           1.62e+9  1.31e+9   1.73e+9   1.31e+9    1.48e+9
Inertia          2.29e+7  5.42e+7   4.22e+7   5.78e+7    4.43e+7
Entropy          4.76e+6  4.58e+6   4.84e+6   4.55e+6    4.66e+6
Homogeneity      2.98e+5  2.60e+5   3.24e+5   2.55e+5    2.84e+5
Max. Prob.       2.25e+4  1.99e+4   2.25e+4   2.00e+4    2.12e+4
Inv. Diff. Mom.  2.00e+5  1.83e+5   1.93e+5   1.77e+5    1.88e+5
Correlation      9.34e+5  1.16e+6   8.86e+5   1.15e+6    1.02e+6

Radial Basis Network Used
• Radial basis networks may require more neurons than standard feed-forward backpropagation (FFBP) networks
• But they can be designed in a fraction of the time it takes to train FFBP networks
• They work best when many training vectors are available

Radial Basis Network with R Inputs

Radbas Transfer Function Used
a = radbas(n) = e^(-n^2)

The radial basis network consists of 2 layers: a hidden radial basis layer of S1 neurons and a linear output layer of S2 neurons.
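To make the two-layer structure concrete, the following sketch evaluates such a network for one input vector. It is an assumption-laden illustration rather than the author's network: the function `rbf_forward`, its parameter names, and the example weights are hypothetical, and the distance scaling (a hidden neuron outputs 0.5 when the input lies exactly `spread` away from its center) is one common convention that the slides do not confirm.

```python
import math


def radbas(n):
    """Radial basis transfer function from the slide: a = e^(-n^2)."""
    return math.exp(-n * n)


def rbf_forward(x, centers, spread, out_weights, out_biases):
    """Evaluate a two-layer radial basis network on input vector `x`.

    centers     : S1 center vectors, one per hidden neuron
    spread      : width of the radial basis functions
    out_weights : S2 rows of S1 weights for the linear output layer
    out_biases  : S2 output biases
    """
    # Assumed convention: radbas(spread * scale) == 0.5.
    scale = math.sqrt(math.log(2)) / spread

    # Hidden layer: radial basis response to the distance from each center.
    hidden = []
    for center in centers:
        distance = math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, center)))
        hidden.append(radbas(distance * scale))

    # Output layer: plain weighted sum plus bias for each output neuron.
    return [sum(w * h for w, h in zip(weights, hidden)) + bias
            for weights, bias in zip(out_weights, out_biases)]


# Toy network: R=2 inputs, S1=2 hidden neurons, S2=1 output.
centers = [[0.0, 0.0], [0.25, 0.0]]
output = rbf_forward([0.0, 0.0], centers, spread=0.25,
                     out_weights=[[1.0, 2.0]], out_biases=[0.5])
```

In the toy call the input sits on the first center (hidden output 1) and exactly one spread away from the second (hidden output 0.5), so the single output is 1*1 + 2*0.5 + 0.5. A network producing a center (x, y) and a radius would use three output neurons.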

Training
• After normalizing the data, training begins
• The first training set was made up of 212 mammograms with 81 abnormal ones, with features calculated at distances d=1 and d=3
• The second training set was made up of 163 mammograms with 81 abnormal ones, with features calculated at distances d=1 and d=3

Testing
• A mammogram is presented to the trained network and the output is a suspicious area denoted by its center's x and y coordinates and its radius. If the mammogram is considered to be normal, then zeros are returned for the coordinates and radius
• The radiologist can then review his/her original assessment of the patient if some areas flagged by the network were not originally looked at closely
• The whole database is tested and the accuracy is calculated
• The smaller dataset performed better than the larger one, and using d=3 leads to better results than d=1

Results
• 2 training datasets: 163 and 212
• 2 distance measures: 1 and 3
• 3 spreads: 0.1, 0.25, and 0.05
• 3 goals: 0.00003, 0.008, 0.00005
• For 12 possible combinations
• The NN was sensitive to the unbalanced data collection, which had about a 70-30 split in the larger training set. Therefore the smaller dataset was preferred
• Achieving a high recognition % is not that appealing if the TPF is small

Representative Preliminary Results
Network  Goal     Spread     TPF     FPF     # of Neurons  Recognition %  Az
1        0.00003  0.1        0.0163  0.5939  163           0.3323         0.5568
2        0.00005  [missing]  0.7297  0.0     145           0.9068         0.6522
3        0.008    0.25       0.9404  0.1037  102           0.8674         0.9104
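The TPF, FPF, and recognition columns are the kind of quantities that follow from counting correct and incorrect decisions, as in the sketch below. It assumes the usual definitions (TPF = TP/(TP+FN), FPF = FP/(FP+TN), recognition = fraction of all cases classified correctly) and a simple trapezoidal estimate for the area under the ROC curve; the slides do not say how recognition % or Az were actually computed, and the function names are hypothetical.

```python
def confusion_rates(truth, predicted):
    """Return (TPF, FPF, recognition rate) for binary labels.

    `truth` and `predicted` are equal-length sequences where a truthy
    value means "abnormal" and a falsy value means "normal".
    """
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    tn = sum(1 for t, p in pairs if not t and not p)

    tpf = tp / (tp + fn) if tp + fn else 0.0  # sensitivity
    fpf = fp / (fp + tn) if fp + tn else 0.0  # 1 - specificity
    recognition = (tp + tn) / len(pairs)
    return tpf, fpf, recognition


def trapezoid_area(points):
    """Area under a curve given as (FPF, TPF) points, by the trapezoid rule."""
    pts = sorted(points)
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))


# Unbalanced toy set: 2 abnormal cases, 8 normal ones.
truth = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_normal = [0] * 10
tpf, fpf, recognition = confusion_rates(truth, always_normal)
```

A classifier that calls every case normal scores a recognition rate of 0.8 on this toy set while detecting no abnormal case at all, which is why a high recognition % with a small TPF is of little value.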

Future work
• Use more features like standard deviation, skewness, and kurtosis
• Which feature(s) have the most impact:
  – Rank the features from best to worst (single input to NN)
  – Select the most significant feature(s) by using the leave-one-out method
• Determine whether the area is benign or malignant by adding the severity of the abnormality to the training

Future work (cont.)
• Try to reduce false negatives on the basis of region characteristics: size, and difference in homogeneity and entropy
• Use a larger database that contains both MLO and CC views for training, since most commercial CAD systems use hundreds of thousands of mammograms to try and recognize foreign samples

Thank you

Questions