Enabling Grids for E-sciencE
gLite Error Handling
Steve Fisher for JRA1-UK
www.eu-egee.org
INFSO-RI-508833

Introduction
• Good error handling is appreciated by users
• Bad error handling incurs the wrath of Stephen Burke
INFSO-RI-508833 Errors - Brno

Examples from Stephen B
• From long experience I think there's a hierarchy of bad error messages:
  – 1) Crash, core dump etc.
  – 2) No error message, so you think it worked when it didn't
    § maybe worse than 1
  – 3) Something which might indicate an error or might not, e.g. “No results returned” from R-GMA.
  – 4) A catch-all error which translates to “something went wrong”, e.g. “ERROR: Failed to instantiate Consumer” from RGMA.
  – 5) An error which assumes a particular cause when in fact there are many causes, e.g. “invalid argument” from the lcg-* tools.
  – 6) A message which can only be translated to the real cause by the initiated.

Examples - continued
  – 7) A message which almost tells you what happened, but leaves out some vital information: “couldn't open file” - but which file?!
  – 8) A 50-line dump of everything the code can find, which has the real error buried somewhere in it, e.g. “expired host certificates” with GSI.
  – 9) A message which tells you what went wrong in a way which makes it clear that the code could have recovered itself but didn't bother, e.g. edg-rm giving up when the first replica fails even when there might be 30 others to try.
  – I would include 10), a helpful error message which tells you exactly what went wrong and what to do about it, but I don't think I've ever seen one of those...

So…
• Good error handling is most important when one gLite component calls another
  – Error passed finally back to the user must be
    § Comprehensible
    § Comprehensive
  – It must be easy for the API user to take appropriate action
    § i.e. don’t expect the user to do pattern matching on an error message
• 4 Areas
  – Internal to a service
  – The service interface (WSDL)
  – gLite API
  – Displayed by a gLite provided tool

Internal to a service
• There is no reason to suggest any rules
• Services can preserve their autonomy
• For R-GMA we use a moderately deep exception hierarchy

In the WSDL
• Use a small number of WSDL faults:

    <element name="UnknownResourceException" type="rgma:UnknownResourceException"/>
    <complexType name="UnknownResourceException">
      <sequence>
        <element name="errMsg" type="xsd:string" minOccurs="0"/>
        <element name="errNo" type="xsd:int"/>
      </sequence>
    </complexType>

    <wsdl:message name="UnknownResourceExceptionMessage">
      <wsdl:part name="fault" element="rgma:UnknownResourceException"/>
    </wsdl:message>

    <wsdl:operation name="setTerminationInterval" parameterOrder="resourceId terminationInterval">
      …
      <wsdl:fault name="UnknownResourceException" message="impl:UnknownResourceExceptionMessage">
      </wsdl:fault>
      …
    </wsdl:operation>

R-GMA set of faults
• RGMAException
  – xsd:string errMsg(0..1)
  – xsd:int errNo
  – xsd:string trace(0..1)
• UnknownResourceException
  – xsd:string errMsg(0..1)
  – xsd:int errNo
• RGMASecurityException
  – xsd:string errMsg(0..1)
  – xsd:int errNo

Could generalise
• ServiceException
  – xsd:string errorMessage(0..1)
  – xsd:int errorNumber
  – xsd:string trace(0..1)
    § errorMessage is a free-format string
    § errorNumber is a “small” integer
    § trace is a free-format string
• UnknownResourceException
  – xsd:string errorMessage(0..1)
  – xsd:int errorNumber
• AuthException
  – xsd:string errorMessage(0..1)
  – xsd:int errorNumber
    § Auth rather than Security because of java.lang.SecurityException clashes
• If one service calls another which returns an exception, it is the responsibility of the caller to generate a decent message and error number. Information from the underlying problem can be added to the trace.
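
The rule that the caller must produce its own message and error number, keeping the underlying problem in the trace, can be sketched in Python as below. This is only an illustration under assumptions: the class mirrors the generalised ServiceException fields, while `REGISTRY_UNAVAILABLE`, `lookup_in_registry`, `create_consumer`, the table name and the inner error number 17 are hypothetical names and values invented for the example, not part of R-GMA or gLite.

```python
class ServiceException(Exception):
    """Mirrors the generalised fault: message, small error number, optional trace."""

    def __init__(self, error_message, error_number, trace=None):
        super().__init__(error_message)
        self.error_message = error_message
        self.error_number = error_number
        self.trace = trace


# Hypothetical error number, chosen only for this sketch.
REGISTRY_UNAVAILABLE = 3


def lookup_in_registry(table):
    """Stand-in for a call to another service; it always fails here."""
    raise ServiceException("registry database connection refused", 17)


def create_consumer(table):
    """Calls the other service and translates its failure for our own caller."""
    try:
        return lookup_in_registry(table)
    except ServiceException as inner:
        # The caller writes its own message and picks its own error number;
        # details of the underlying problem go into the trace only.
        detail = f"caused by error {inner.error_number}: {inner.error_message}"
        raise ServiceException(
            f"Cannot create consumer for table '{table}': registry lookup failed",
            REGISTRY_UNAVAILABLE,
            trace=detail,
        ) from inner
```

The user sees a message phrased in terms of what they asked for, and the inner failure stays available for debugging without leaking into the main message.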

API view
• Errors get passed from the Service back to the user in a style appropriate to the language.
  – For Java, C++ and Python use Exceptions matching the WSDL
  – For C we use an object-like thing:

    if (RGMAPrimaryProducer_insert(pp, insert) != 0) {
        fprintf(stderr, "Failed to insert.\n");
        fprintf(stderr, "<%s>\n", RGMA_getException(pp)->errorMessage);
        exit(1);
    }

API Errors
• Additionally some errors can be generated by the API:
  – RemoteException
    § unable to contact the service
  – AuthException
    § same as service returns but this time due to authentication problem
  – ServiceException
    § user does not know what is in the API and what is in the service.
    § from a user perspective the API is the service
• Each API should provide a set of symbolic constants for the error numbers.
  – Changing the error numbers introduces an incompatibility
  – No attempt should be made to interpret the value of the number
  – The error messages are for humans and are subject to change
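
A minimal sketch of the symbolic-constants idea, assuming a Python API: the constant names and values in `ErrorNumber` and the helper `suggest_action` are hypothetical, made up for illustration. The point is that client code branches on the named constant and never on the wording of the message.

```python
from enum import IntEnum


class ErrorNumber(IntEnum):
    """Hypothetical symbolic constants; real names and values are up to each API."""
    INVALID_QUERY = 1
    TABLE_NOT_FOUND = 2
    INTERNAL_ERROR = 3


class ServiceException(Exception):
    def __init__(self, error_message, error_number):
        super().__init__(error_message)
        self.error_message = error_message
        self.error_number = error_number


def suggest_action(exc):
    """Chooses a reaction from the symbolic constant, not from the message text."""
    if exc.error_number == ErrorNumber.TABLE_NOT_FOUND:
        return "create the table first"
    if exc.error_number == ErrorNumber.INVALID_QUERY:
        return "fix the query"
    return "report the problem"
```

Because the decision depends only on the constant, the human-readable message can be reworded freely without breaking callers.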

The 4 types of exception
• AuthException
  – User should ensure that he is authenticated and has the right authorization. He should not get back much information.
• RemoteException
  – Unable to contact the service. You might want to try again.
• UnknownResourceException
  – Try remaking the resource, though you want to wait a little while first or limit the number of attempts
• ServiceException
  – This may be an invalid interaction with the service or it could be a faulty service. Consult the error message.
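
The reactions suggested for the four exception types could be combined in client code roughly as follows. This is a sketch under assumptions: the four classes are bare stand-ins named after the slide, and `call_with_recovery`, its parameters and the default limits are hypothetical.

```python
import time


class AuthException(Exception):
    pass


class RemoteException(Exception):
    pass


class UnknownResourceException(Exception):
    pass


class ServiceException(Exception):
    pass


def call_with_recovery(operation, remake_resource, max_attempts=3, delay=1.0):
    """Runs operation(), retrying only for the two recoverable exception types."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RemoteException:
            # Service unreachable: worth trying again, up to the limit.
            if attempt == max_attempts:
                raise
            time.sleep(delay)
        except UnknownResourceException:
            # Resource has gone: wait a little, remake it, then retry.
            if attempt == max_attempts:
                raise
            time.sleep(delay)
            remake_resource()
        # AuthException and ServiceException are deliberately not caught:
        # retrying will not help, so they go straight to the caller.
```

Limiting the number of attempts and pausing between them follows the advice given for UnknownResourceException; the same limit is applied to RemoteException here purely as a design choice of the sketch.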

CLI
• The CLI will normally trap and handle errors
• Unexpected errors should result in printing the error message but not the trace unless the CLI is being run in debug mode.
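
The CLI rule (always print the message, print the trace only in debug mode) might look like this in a Python tool. The names `format_error` and `run_cli`, the output layout and the exit codes are assumptions made for this sketch.

```python
import sys


class ServiceException(Exception):
    def __init__(self, error_message, error_number, trace=None):
        super().__init__(error_message)
        self.error_message = error_message
        self.error_number = error_number
        self.trace = trace


def format_error(exc, debug=False):
    """Builds the text shown to the user; the trace is included only in debug mode."""
    lines = [f"ERROR {exc.error_number}: {exc.error_message}"]
    if debug and exc.trace:
        lines.append(exc.trace)
    return "\n".join(lines)


def run_cli(command, debug=False):
    """Runs a command, traps service errors and returns a process exit code."""
    try:
        command()
    except ServiceException as exc:
        print(format_error(exc, debug), file=sys.stderr)
        return 1
    return 0
```

A tool built this way would pass its debug flag through to `run_cli`, so ordinary users get one clear line and developers can ask for the full trace.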

Conclusion
• Most of the issues about errors are non-technical
• Error handling needs to be taken seriously with full attention to the messages:
  – Comprehensibility
  – Comprehensiveness
• We should try to agree upon the principles