First Aid Pathology Data quality assessment in PHENIX
- Slides: 37
First Aid & Pathology: Data quality assessment in PHENIX
Peter Zwart
Computational Crystallography Initiative, Physical Biosciences Division
Introduction
• PHENIX: software for bio-molecular crystallography
  – Molecular replacement (PHASER)
  – Substructure solution (SOLVE, HYSS)
  – Phasing (SOLVE; PHASER)
  – Model building (RESOLVE; TEXTAL)
  – Refinement (phenix.refine)
  – Ligand building (ELBOW and RESOLVE)
• http://www.phenix-online.org
  – New release in due course
Introduction
• Command line tools
  – Convert reflection files to and from any format
  – Get basic statistics of a reflection file
  – Solve your substructure
  – Compare substructures (site comparison)
  – Characterize your data set
  – With a bit more effort and some knowledge of the CCTBX (crystallographic libraries for Python; Grosse-Kunstleve et al. (2002). J. Appl. Cryst. 35, 126-136):
    • Find Harker sections for a given space group
    • Compute structure factors
    • Write your own direct methods program
• http://cctbx.sf.net (go to the tutorials of the Siena workshop)
Introduction
• PHENIX Strategy
  – Each box represents a simple task
    • Read in reflection file
    • Read in molecular replacement model
    • Do rotation search
    • Do translation search
    • Show solution
  – This allows users to quickly build their own interface for specific strategies
Introduction
• PHENIX Wizard
  – ‘Grey box’
    • Asks questions you are supposed to answer
    • Needs only the most basic information and figures things out itself afterwards
  – Most control parameters can be set by the user though
    • Customizability of underlying parameters for those who want/need it
Introduction
• Structure solution can be enhanced by knowledge of the quality of the merged data
  – Presence or absence of anomalous signal
    • SAD/MAD or MR?
  – Resolution dependent completeness
    • Shall I recollect my low resolution data?
  – Twinning
    • Which refinement target to use?
  – Anisotropy
    • Some regions of reciprocal space might be weak
  – Pseudo centering
    • Can I use this in molecular replacement? Do I expect possible problems in refinement?
  – Wilson plot
    • Is this protein?
Introduction
• To answer these kinds of questions, data sets need to be characterized beyond standard quantities such as Rmerge and nominal resolution
• The characterization presented here is implemented in the program mmtbx.xtriage of the PHENIX suite (version > 1.2; available at sector 5 and 8.2.1/8.2.2 beamlines)
  – Easy to use command-line driven program
Likelihood based Wilson Scaling
• Both Wilson B and nominal resolution determine the ‘looks’ of the map
• Zwart & Lamzin (2003). Acta Cryst. D 50, 2104-2113.
[Figure: maps at Bwil = 9 Å², dmin = 2 Å and at Bwil = 50 Å², dmin = 2 Å]
Likelihood based Wilson Scaling
• Data can be anisotropic
• Traditional ‘straight line fitting’ is not reliable at low resolution
• Solution: likelihood based Wilson scaling
  – Similar to maximum likelihood refinement, but without knowledge of positional parameters
  – Results in an estimate of the anisotropic overall B value
• Zwart, Grosse-Kunstleve & Adams, CCP4 newsletter, 2005.
Likelihood based Wilson Scaling
• Likelihood based scaling is not very sensitive to the resolution cut-off, whereas classic straight line fitting is.
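For reference, the classic straight line fit mentioned above can be sketched as follows. This is a minimal illustration of the textbook Wilson plot, not the likelihood based method implemented in mmtbx.xtriage. It assumes the isotropic model <I> = k Σf² exp(-2B sin²θ/λ²) with sin θ/λ = 1/(2d). The function name `wilson_line_fit`, the synthetic data and the flat `expected_f2` are hypothetical choices made for the example.

```python
import numpy as np

def wilson_line_fit(d_spacing, intensity, expected_f2, n_bins=20):
    """Classical Wilson plot: fit ln(<I>/<sum f^2>) = ln(k) - 2*B*s^2 over resolution shells."""
    s2 = 1.0 / (4.0 * d_spacing**2)           # (sin(theta)/lambda)^2 = 1/(4 d^2)
    order = np.argsort(s2)
    bins = np.array_split(order, n_bins)       # shells with (nearly) equal counts
    x = np.array([s2[b].mean() for b in bins])
    y = np.array([np.log(intensity[b].mean() / expected_f2[b].mean()) for b in bins])
    slope, intercept = np.polyfit(x, y, 1)     # straight line through the shell averages
    return -slope / 2.0, np.exp(intercept)     # (B in A^2, scale factor k)

# Synthetic, idealised acentric data between 10 A and 2 A (assumption: no
# low-resolution deviations from the Wilson model, unlike real protein data).
rng = np.random.default_rng(1)
n = 50_000
s2 = rng.uniform(1.0 / (4 * 10.0**2), 1.0 / (4 * 2.0**2), n)
d = 1.0 / (2.0 * np.sqrt(s2))
true_b, true_k = 30.0, 2.0
expected_f2 = np.ones(n)                       # hypothetical flat sum of f^2
intensity = true_k * expected_f2 * np.exp(-2.0 * true_b * s2) * rng.exponential(1.0, n)

b_fit, k_fit = wilson_line_fit(d, intensity, expected_f2)
print(f"fitted B = {b_fit:.1f} A^2, fitted k = {k_fit:.2f}")
```

On real data the shell averages deviate from the line at low resolution, which is why the fitted B depends on where the data are cut.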
Likelihood based Wilson Scaling
• Anisotropy is easily detected and can be ‘corrected’ for.
  – Useful for molecular replacement and possibly for substructure solution
• Anisotropy correction cleans up your N(Z) plots
Likelihood based Wilson Scaling
• Useful by-products
  – For the ML Wilson scaling an ‘expected Wilson plot’ is needed
  – Obtained from over 2000 high quality experimental datasets
  – ‘Expected intensity’ and its standard deviation obtained
Likelihood based Wilson Scaling
• Resolution dependent problems can be easily/automatically spotted
  – Ice rings
• Empirical Wilson plots are available for protein and DNA/RNA.
[Figure: data is from a DNA structure]
Pseudo Translational Symmetry
• Can cause problems in refinement and MR
  – Incorrect likelihood function due to effects of extra translational symmetry on intensity
• Can cause problems or be helpful during MR
  – Effective ASU is smaller if T-NCS info is used.
• The presence of pseudo centering can be detected from an analysis of the Patterson map.
  – An Fobs Patterson with truncated resolution should reveal a significant off-origin peak.
Pseudo Translational Symmetry
• A database analysis reveals that the heights of the largest off-origin peaks in truncated X-ray data sets are distributed according to:
[Figure: F(Qmax) versus relative peak height Qmax]
Pseudo Translational Symmetry
• 1 - F(Qmax): the probability that the largest off-origin peak in your Patterson map is not due to translational NCS; this is a so-called p-value
• If a significance level of 0.01 is set, all off-origin Patterson vectors larger than 20% of the height of the origin are suspected T-NCS vectors.

  PDBID   Height (% of origin)   P-value (%)
  1sct    77                     9*10^-6
  1ihr    45                     1*10^-3
  1c8u    20                     1
  1ee2    10                     5
Twinning
• Merohedral twinning can occur when the lattice has a higher symmetry than the intensities.
• When twinning does occur, the recorded intensities are the sum of two independent intensities.
  – Normal Wilson statistics break down
• Detect twinning using intensity statistics
Twinning
• Cumulative intensity distribution can be used to identify twinning (acentric data)
[Figure: N(Z) versus Z for pseudo centering, normal and perfectly twinned data]
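The effect of twinning on acentric intensity statistics can be illustrated with a small simulation. This is only a sketch under textbook assumptions: untwinned acentric intensities follow an exponential (Wilson) distribution, and a perfect twin averages two independent intensities. The theoretical curves used are N(Z) = 1 - exp(-Z) for untwinned data and N(Z) = 1 - (1 + 2Z) exp(-2Z) for a perfect twin. All function names are hypothetical and the code is not taken from PHENIX.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Assumption: ideal acentric Wilson statistics, intensities normalised to mean 1.
untwinned = rng.exponential(1.0, n)
# Perfect twin (twin fraction 0.5): average of two independent intensities.
perfect_twin = 0.5 * (rng.exponential(1.0, n) + rng.exponential(1.0, n))

def second_moment_ratio(intensity):
    """<I^2>/<I>^2: 2.0 expected for untwinned, 1.5 for perfectly twinned acentric data."""
    return np.mean(intensity**2) / np.mean(intensity)**2

def empirical_n_of_z(intensity, z):
    """Fraction of reflections with normalised intensity Z = I/<I> not exceeding z."""
    return np.mean(intensity / np.mean(intensity) <= z)

def n_of_z_untwinned(z):
    return 1.0 - np.exp(-z)

def n_of_z_perfect_twin(z):
    return 1.0 - (1.0 + 2.0 * z) * np.exp(-2.0 * z)

print("moment ratio, untwinned   :", round(second_moment_ratio(untwinned), 2))
print("moment ratio, perfect twin:", round(second_moment_ratio(perfect_twin), 2))
for z in (0.2, 0.5, 1.0):
    print(z,
          round(empirical_n_of_z(untwinned, z), 3), round(n_of_z_untwinned(z), 3),
          round(empirical_n_of_z(perfect_twin, z), 3), round(n_of_z_perfect_twin(z), 3))
```

The twinned curve lies below the untwinned one at small Z, because averaging two intensities makes very weak reflections rare.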
Twinning
• Pseudo centering + twinning = N(Z) looks normal
• Anisotropy in diffraction data produces a similar trend to pseudo centering
  – Anisotropy can however be removed
• How to detect twinning in the presence of T-NCS?
  – Partition Miller indices on the basis of detected T-NCS vectors
    • Intensities of subgroups follow normal Wilson statistics (approximately)
  – Use the L-test for twin detection
    • Not very sensitive to T-NCS if partitioning of Miller indices is done properly: N(Z) and Wilson ratio are N
    • No need to know twin laws: not sensitive to pseudo symmetry or certain data processing problems.
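The L statistic referred to above is L = (I1 - I2)/(I1 + I2) for pairs of reflections that are not related by a twin law. A minimal sketch of its first two moments follows. It assumes ideal acentric data and, for simplicity, draws each pair as independent random intensities, whereas in practice the pairs are taken from neighbouring Miller indices. The expected values are <|L|> = 1/2 and <L²> = 1/3 for untwinned data, and 3/8 and 1/5 for a perfect twin. The function `l_moments` is a hypothetical helper, not the xtriage implementation.

```python
import numpy as np

def l_moments(i_a, i_b):
    """Return (<|L|>, <L^2>) for paired intensities with L = (Ia - Ib)/(Ia + Ib)."""
    l = (i_a - i_b) / (i_a + i_b)
    return np.mean(np.abs(l)), np.mean(l**2)

rng = np.random.default_rng(2)
n = 200_000

# Untwinned: both members of a pair follow acentric Wilson statistics.
untw_a, untw_b = rng.exponential(1.0, n), rng.exponential(1.0, n)

# Perfect twin: every observed intensity is the mean of two independent ones.
twin_a = 0.5 * (rng.exponential(1.0, n) + rng.exponential(1.0, n))
twin_b = 0.5 * (rng.exponential(1.0, n) + rng.exponential(1.0, n))

abs_l_untw, l2_untw = l_moments(untw_a, untw_b)
abs_l_twin, l2_twin = l_moments(twin_a, twin_b)
print(f"untwinned   : <|L|> = {abs_l_untw:.3f}, <L^2> = {l2_untw:.3f}")
print(f"perfect twin: <|L|> = {abs_l_twin:.3f}, <L^2> = {l2_twin:.3f}")
```

Because L is a ratio of intensities that are close together in reciprocal space, an overall resolution dependent scale cancels, which is one reason the test is robust.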
Twinning
• A database analysis of high quality, untwinned datasets reveals that the values of the first and second moment of L follow a narrow distribution
• This distribution can be used to determine a multivariate Z-score
  – Large values indicate twinning
Twinning
• Determination of twin laws
  – From first principles
    • No twin law will be overlooked
    • PDB analysis: 36% of structures have at least 1 possible twin law
      – 50.9% merohedral; 48.2% pseudo merohedral; 0.9% both
    • 27% of cases with twin laws are suspected to be twinned
      – 10% of whole PDB(!)
• Determination of twin fraction
  – Fully automated Britton and H analyses as well as an ML estimate of the twin fraction on the basis of the L statistic.
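As an illustration of how a twin fraction can be estimated from twin related pairs, the sketch below uses the mean of H = |I1 - I2|/(I1 + I2). For acentric data H is uniformly distributed on [0, 1 - 2α], so <H> = 1/2 - α. This is the textbook H-test relation only. It is not the automated Britton/H/ML procedure in xtriage, and the function name and simulated data are hypothetical.

```python
import numpy as np

def twin_fraction_from_h(i1, i2):
    """Estimate the twin fraction from twin-related acentric pairs via <H> = 1/2 - alpha."""
    h = np.abs(i1 - i2) / (i1 + i2)
    return 0.5 - np.mean(h)

rng = np.random.default_rng(3)
n = 200_000
alpha_true = 0.2

# Assumption: the true (untwinned) intensities of the two twin-related
# reflections are independent and follow acentric Wilson statistics.
j1, j2 = rng.exponential(1.0, n), rng.exponential(1.0, n)

# Observed intensities are mixtures of the two twin domains.
i1 = (1.0 - alpha_true) * j1 + alpha_true * j2
i2 = alpha_true * j1 + (1.0 - alpha_true) * j2

alpha_est = twin_fraction_from_h(i1, i2)
print(f"estimated twin fraction: {alpha_est:.3f}")
```

Note that a twin law parallel to an NCS axis or to a missed symmetry axis also makes the paired intensities similar, so a large estimate from this relation alone does not prove twinning.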
Conflicting information
• PDBID: 1???
  – Unit cell: 99.5 60.9 70.96 90 134.5 90
  – Space group: C2
  – Twin laws and estimated twin fractions:
    • H, -K, -H-L : 0.44
    • H+2L, -K, -L : 0.01
    • -H-2L, K, H+L : 0.01
  – <I²>/<I>² = 2.10 (theory for untwinned data: 2.0)
    • Data does not appear to be twinned
  – <L> = 0.49 (theory for untwinned data: 0.5); multivariate Z-score of L test: 0.963
    • Data does not appear to be twinned
Conflicting information
• What is going on?
  – Estimated twin fraction is large, but data does not seem to be twinned:
    • Twin law H, -K, -H-L is parallel to an existing NCS axis, or
    • Twin law H, -K, -H-L is a symmetry axis, and the space group symmetry is too low
      – It should be C2 + H, -K, -H-L = F222
      » http://www.phenix-online.org/cctbx
• Need images to make a decision
Anomalous data
• Structure solution via experimental methods (especially SAD) is on the rise.
• How to identify the presence of anomalous signal?
  – <ΔI/I>; <ΔF/F>
    • VERY sensitive to noise
  – <ΔI/σΔI>; <ΔF/σΔF>
    • 2?
  – Measurability
    • Fraction of Bijvoet differences for which
      – ΔI/σΔI > 3 and (I(+)/σI(+) and I(-)/σI(-) > 3)
    • Easy to interpret
      – At 3 Å, 6% of Bijvoet differences are significantly larger than zero
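The measurability defined above can be sketched in a few lines. This is an illustrative reading of the definition on this slide, not the xtriage source. Two assumptions are made and labelled in the code: the absolute Bijvoet difference is used, and σΔI is obtained by standard error propagation as sqrt(σ(+)² + σ(-)²). The function name `measurability` and the toy data are hypothetical.

```python
import numpy as np

def measurability(i_plus, sig_plus, i_minus, sig_minus, cutoff=3.0):
    """Fraction of Bijvoet pairs with |dI|/sigma(dI) > cutoff and both mates above cutoff."""
    delta_i = np.abs(i_plus - i_minus)                 # assumption: absolute difference
    sig_delta = np.sqrt(sig_plus**2 + sig_minus**2)    # assumption: independent errors
    significant = (
        (delta_i / sig_delta > cutoff)
        & (i_plus / sig_plus > cutoff)
        & (i_minus / sig_minus > cutoff)
    )
    return significant.mean()

# Four toy Bijvoet pairs: only the first passes all three criteria.
i_plus = np.array([100.0, 100.0, 10.0, 50.0])
sig_plus = np.array([1.0, 1.0, 5.0, 1.0])
i_minus = np.array([90.0, 99.0, 40.0, 50.0])
sig_minus = np.array([1.0, 1.0, 5.0, 1.0])

m = measurability(i_plus, sig_plus, i_minus, sig_minus)
print(f"measurability = {m:.2f}")
```

In practice the fraction would be computed per resolution shell, which gives the measurability as a function of resolution.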
Anomalous data
• Measurability and <ΔI/σΔI> are closely related
• Measurability more directly translates to the number of ‘useful’ Bijvoet differences in substructure solution/phasing
Anomalous data
• The quality of the data determines the success of structure solution
[Figure: <FOM> and SnB success rate; axis labels: Redundancy, Measurability. Annotations: Weiss, (2000). J. App. Cryst, 34, 130-135; obtained via numerical methods]
Anomalous data
• 6 (partially occupied) iodines in thaumatin at λ = 1.5 Å
• Raw SAD phases, straight after PHASER
[Figure: measurability versus 1/resolution², panels A and B]
Anomalous data
• 6 (partially occupied) iodines in thaumatin at λ = 1.5 Å
• Density modified phases
[Figure: measurability versus 1/resolution², panels A and B]
Anomalous data
• SAD phasing with PHASER
  – Very sensitive residual maps
    • Residual map indicates where a certain type of anomalous scatterer needs to be placed to improve the fit between observed and expected F(+) and F(-)
• Lysozyme soaked with a solution containing (NH4)2(OsCl6)
  – Wilson B: 13.7; dmin = 1.7
  – Data collected at Os L-III edge (f” > 10)
  – Measurability at 3.0 is 67%
    • Anomalous signal is strong
  – Partial structure is large
    • Zheavy²/(Zheavy² + Zprotein²) = 35%
[Figure: PHASER residual map indicating location of main chain atoms]
Anomalous data
• SAD phasing with PHASER
  – Very sensitive residual maps
    • Residual map indicates where a certain type of anomalous scatterer needs to be placed to improve the fit between observed and expected F(+) and F(-)
• Lysozyme soaked with a solution containing (NH4)2(OsCl6)
  – Wilson B: 13.7; dmin = 1.7
  – Data collected at Os L-III edge (f” > 10)
  – Measurability at 3.0 is 67%
    • Anomalous signal is strong
  – Partial structure is large
    • Zheavy²/(Zheavy² + Zprotein²) = 35%
[Figure: raw PHASER SAD phases]
Anomalous data
• Another extreme
  – 2 Fe4S4 clusters in 60 residues
    • Wilson B: 6.5 Å²; dmin = 1.2 Å
    • Measurability at 3.0 Å: 6%
      – Data not terribly strong
    • ZFe²/(ZFe² + ZS² + Zprotein²) = 17%
    • Fe f” = 1.25 e; S f” = 0.35 e
  – PHASER residual map from Fe SAD phases clearly shows S positions
[Figure: SAD on Fe, residual maps indicate S positions (green balls)]
Anomalous data
• Inclusion of sulfurs improves phasing
  – (ZFe² + ZS²)/(ZFe² + ZS² + Zprotein²) = 32%
  – <FOM> = 0.67 (was 0.53)
  – Residual maps show almost all non-hydrogen atoms
  – Inclusion of non-hydrogen atoms results in <FOM> = 0.98.
[Figure: SAD on Fe, S. Residual maps (purple) and FOM weighted Fobs map (blue)]
Discussion & Conclusions
• Software tools are available to point out specific problems
  – mmtbx.xtriage <input_reflection_file> [params]
• Log files are not just numbers, but also contain an extensive interpretation of the statistics
• Knowing the idiosyncrasies of your X-ray data might help you avoid certain pitfalls.
  – Undetected twinning for instance
First Aid
• Analyses at the beamline
• If problems are detected while at the beamline, they can possibly be solved by recollecting data or adapting the data collection strategy.
[Figure: The Surgeon and the Peasant, 1524. Lucas van Leyden]
Pathology/Autopsy
• Analyses at home
[Figure: The anatomical lesson of dr. Nicolaes Tulp, 1632. Rembrandt van Rijn]
Acknowledgements
• Paul Adams, Ralf Grosse-Kunstleve, Pavel Afonine, Nigel Moriarty, Nick Sauter, Michael Hohn
• Cambridge: Randy Read, Airlie McCoy, Laurent Storoni
• Texas A&M University: Jim Sacchettini, Tom Ioerger, Eric McKee
• Los Alamos: Tom Terwilliger, Li Wei Hung, Thirumugan Rhadakanan
• Funding:
  – LBNL (DE-AC03-76SF00098)
  – NIH/NIGMS (P01 GM063210)
  – PHENIX Industrial Consortium
Twinning
[Figure; surviving axis label: <L>]