Ayelet Israeli and Dror G. Feitelson, “The Linux Kernel as a Case Study in Software Evolution”. Journal of Systems and Software 83(3), pp. 485-501, Mar 2010. Presented by Dror Feitelson.
Synopsis • A study of 810 versions of the Linux kernel released over 14 years comparing the evolution of the system to Lehman’s Laws of software evolution. • Conclusion: several laws are supported by the data. • Observation: average complexity is decreasing with time.
Linux Background • First announced August 1991 • First release March 1994 • Dual release scheme until 2003 – Odd versions are development (1.1, 1.3, 2.1, 2.3, 2.5) – Even versions are production (1.0, 1.2, 2.0, 2.2, 2.4) • New release scheme in 2.6 – New version every 2-3 months – Development is distributed, no official releases • Full source code of all versions available online
Linux Kernel Versions • Paper used all 810 versions from March 1994 to August 2008 (all .h and .c files) – 144 production – 429 development – 237 of 2.6 • Unprecedented scale of investigation • Other researchers used only production versions, only .c files, or a sample of versions • Some versions (test kernels and release candidates) were missed
Kernel Version Locations on www.kernel.org (versions 1.0 through 2.6) • v1.0/linux-1.0 • v1.1/v1.1.0 • v1.1/linux-1.1.* • v1.2/linux-1.2.* • v1.3/linux-1.3.* • v1.3/linux-pre2.0.* • v2.0/linux-2.0.* • v2.1/linux-2.1.* • v2.1/linux-2.2.0-pre* • v2.2/linux-2.2.* • v2.3/linux-2.3.99-pre* • v2.4/old-test-kernels/linux-2.4.0-* • v2.4/linux-2.4.* • v2.5/linux-2.5.* • v2.6/pre-releases/linux-2.6.0-test* • v2.6/linux-2.6.* • v2.6/testing/v2.6.*/linux-2.6.*-rc* • v2.6/longterm/v2.6.*/linux-2.6.*
CPP Problems • Kernel is littered with preprocessor directives • Removed them in order to analyze all the code – This is what the developers see • Sometimes this leads to incorrect syntax – Files where this happened were ignored – About 1.5% of the code • Alternative is to perform preprocessing (used by others) – Induces code bloat (macros and #include) – Only one configuration of the system
Evolution Background • Textbooks: software is developed in well-defined phases: – Elicit requirements – Create specifications – Design the system – Implement – Test and correct – Install and maintain • Reality: software evolves: – Start with a small useful project – Users will introduce new requirements – Adapt the system to do what is needed – Needs cannot be anticipated in advance
Three Types of Programs • S-type: derived from well-defined formal specifications • P-type: can’t derive a formal solution, so use an iterative process to find and refine a solution • E-type: a program that becomes embedded in its environment and changes with it; mechanizing an activity changes it and induces new requirements • Continuous evolution instead of development and then maintenance
Early Data • Data on OS/370 – Size in modules – As function of release serial no. • Main results – Steady growth – Ripple effect – Instability in late releases
Lehman's Laws 1) Continuing change (adaptation) 2) Increasing complexity (unless refactored) 3) Self regulation (of rate of change) 4) Invariant work rate (inertia) 5) Conservation of familiarity (of users and developers) 6) Continuing growth (more features) 7) Declining quality (unless maintained) 8) Feedback system (at multiple levels)
The Idea • Lehman used little data from closed-source systems • A lot of data is now available • Use Linux data to see if it supports Lehman’s Laws • In particular try to use software metrics to quantify the laws
Law VI Continuing Growth The functional capability of E-type systems must be continually enhanced to maintain user satisfaction over system lifetime • Law requires new functionality to be added • Can also be interpreted as requiring growth in size • Are the two interpretations equivalent?
Lehman’s Data • Size in modules as function of release number for OS/360 and other systems • Grows, but growth rate often seen to decline – Though not for OS/360 – Turski suggested an inverse square law: $s_{i+1} = s_i + E / s_i^2$ – Idea: effort $E$ is spent on all possible interactions among $s_i$ modules – Leads to a model where size grows as the cube root of the release number, $s_i \propto i^{1/3}$
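A minimal sketch of the inverse square model, assuming its usual form $s_{i+1} = s_i + E / s_i^2$. The starting size and effort value are made up for illustration and are not Lehman's or Turski's data. The sketch iterates the recurrence and compares it with the continuous approximation $s^3 = s_0^3 + 3Ei$, which is where cube-root growth comes from.

```python
def inverse_square_growth(s0, effort, releases):
    """Iterate s_{i+1} = s_i + effort / s_i**2 and return the list of sizes."""
    sizes = [float(s0)]
    for _ in range(releases):
        s = sizes[-1]
        sizes.append(s + effort / (s * s))
    return sizes


def cube_root_approximation(s0, effort, i):
    """Continuous version: ds/di = effort / s**2 integrates to s**3 = s0**3 + 3*effort*i."""
    return (s0 ** 3 + 3.0 * effort * i) ** (1.0 / 3.0)


# Hypothetical parameters, chosen only to show the shape of the curve.
sizes = inverse_square_growth(s0=100, effort=1e5, releases=50)
increments = [b - a for a, b in zip(sizes, sizes[1:])]
approx = cube_root_approximation(100, 1e5, 50)

print(f"size after 50 releases: {sizes[-1]:.1f} (cube-root approximation {approx:.1f})")
print(f"first increment {increments[0]:.2f}, last increment {increments[-1]:.2f}")
```

Each increment is smaller than the one before it, which is the declining growth rate that the model predicts.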
Law VI and Linux • The dominant effect (so we deal with it first) • The easiest to measure and analyze – If interpreted as size • Growth is super-linear (quadratic?) • Explained by positive feedback with growth of developer base • Functional growth is harder to quantify
Godfrey & Tu Linux Data • LOC or tarball size as function of date, 1994-2000 • Focus on development versions • Growth rate seen to increase • Fits quadratic model • Largely verified by others • Also for other (but not all) open-source systems
Release Number vs. Time • Doesn’t matter if releases are regular – In Linux before 2.6 they are not • Changes growth shape if irregular • Question of interleaving multiple versions – Assume version 2.3 was released after 3.0 – If sorted by number their order is reversed – Justified because related to 2.2, not to 3.0
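The interleaving question can be sketched with three made-up releases. The version labels follow the slide's hypothetical example, and the dates are invented. Sorting by date and sorting by version number give different orders.

```python
from datetime import date

# Hypothetical releases: version 2.3 appears after 3.0, as in the slide's example.
releases = [
    ("2.2", date(2000, 1, 1)),
    ("3.0", date(2000, 6, 1)),
    ("2.3", date(2000, 9, 1)),
]


def version_key(version):
    """Turn '2.3' into (2, 3) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


by_date = [v for v, _ in sorted(releases, key=lambda r: r[1])]
by_number = [v for v, _ in sorted(releases, key=lambda r: version_key(r[0]))]

print("by date:  ", by_date)
print("by number:", by_number)
```

Ordering by number keeps 2.3 next to 2.2, the version it is actually derived from.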
Linux Growth Data
Super-linear growth Contradicts Lehman and Turski who claimed growth should slow down due to increasing complexity
Functionality • Previous results hold for all common size metrics • Different results when trying to measure functional growth • System calls are leveling out – Possibly reflects maturity, as predicted by Torvalds • Config options are growing faster – Indicates growth is in internal mechanisms rather than user-visible services
System Calls
Config Options
Law I Continuing Change An E-type system must be continually adapted, else it becomes progressively less satisfactory in use • This means that software must evolve • “Adapt” implies keeping up with a changing environment
Law I and Linux • Change is obviously true – In 2.6 a new version is released every 2-3 months • Change is achieved through growth • Adaptation to changing hardware environment • Hard to distinguish adaptation from growth – Is adding support for sound cards a new feature or adaptation to a changing environment?
Adaptation to New Hardware • Special case of operating system environment • Confined to two subdirectories – arch (supported architectures) – drivers (supported peripherals) • Together about 60% of the code • Grow together with the rest of the system at about the same rate
arch + drivers vs. Whole Kernel
Law II Increasing Complexity As an E-type system is changed its complexity increases and it becomes more difficult to evolve unless work is done to maintain it and reduce the complexity • Functionality costs in complexity • Two-sided law: supported either way
Law II and Linux • Complexity not necessarily increasing • System is largely modular (e.g. no coupling between file systems, scheduler, and drivers) • New functions being added are short and simple • Growing number but reduced fraction of high-MCC functions • Active work to reduce complexity
McCabe Cyclomatic Complexity (MCC) • Introduced by McCabe in 1976 • Essentially counts the number of linearly independent paths through the code • Suggestion: functions with MCC>10 may require refactoring • Easily calculated by counting predicates – All while, for, if, and case statements • Widely used in tools and research • Has been criticized, but no better alternatives
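A rough sketch of the predicate-counting approach, assuming MCC is approximated as one plus the number of while, for, if, and case keywords in a function body. This is an illustration only, not the script used in the paper. The function name `approx_mcc` and the sample C function are made up. Comments and string literals are stripped first so that keywords inside them are not counted.

```python
import re

# Keywords counted as decision points, following the slide: while, for, if, case.
PREDICATE = re.compile(r"\b(while|for|if|case)\b")

# Text that must not be scanned for keywords.
COMMENT_OR_STRING = re.compile(
    r"/\*.*?\*/"            # block comments
    r"|//[^\n]*"            # line comments
    r'|"(?:\\.|[^"\\])*"'   # string literals
    r"|'(?:\\.|[^'\\])*'",  # character literals
    re.DOTALL,
)


def approx_mcc(function_source):
    """Approximate cyclomatic complexity of one C function: predicates + 1."""
    code = COMMENT_OR_STRING.sub(" ", function_source)
    return len(PREDICATE.findall(code)) + 1


EXAMPLE = """
int classify(int x) {
    /* if this comment were counted, the result would be wrong */
    if (x < 0)
        return -1;
    for (int i = 0; i < x; i++) {
        switch (i % 3) {
        case 0: break;
        case 1: break;
        }
    }
    while (x > 10)
        x /= 2;
    return x;
}
"""

print(approx_mcc(EXAMPLE))  # one if, one for, two case, one while, plus 1
```

A keyword count like this ignores operators such as && and ||, which some tools also count, so values can differ between tools.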
Measuring MCC • Use commercial static analysis tool (Klocwork) – Requires compilation of the code – Therefore limited to specific configuration – Some bug and usage problems • Use free tool (pmccabe) – But not in this paper • Write your own script – Simple and what we need – Danger of bugs and not being standard
Results • Total MCC grows with code • But average MCC per function is decreasing
Distribution of MCC
Possible Explanations • Many new functions being added, and they tend to be simpler than the old ones – Indeed, new functions tend to have lower MCC • Code is being actively improved with time
High-MCC Functions • Distribution of MCC values is heavy-tailed • Highest values are in the hundreds – 369 functions with MCC ≥ 100 over the years • Some of these functions evolve – Massive reduction in MCC as in sys32_ioctl – Gradual growth of MCC – Occasional large growth in production version • Very long, but actually not very complex
Tail of MCC Distribution
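An Aside on Heavy Tails • Definition: tail decays as a power law • CDF: $F(x) = \Pr(X \le x)$ • CCDF: $\bar{F}(x) = 1 - F(x) = \Pr(X > x)$ • Heavy tail: $\bar{F}(x) \propto x^{-\alpha}$ for large $x$ • LLCD: log-log complementary distribution plot, $\log \bar{F}(x) = -\alpha \log x + c$, a straight line with slope $-\alpha$ when the tail is a power law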
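A minimal sketch of how an LLCD can be computed and its slope estimated. The samples are synthetic, drawn from a Pareto distribution with a known tail index, and are not MCC data from the kernel. The helper names are made up for illustration.

```python
import math
import random


def llcd_points(samples):
    """Return (log10 x, log10 of the fraction of samples ranked above x) pairs."""
    xs = sorted(samples)
    n = len(xs)
    points = []
    for k, x in enumerate(xs):
        survivors = n - (k + 1)  # samples ranked above position k (ties count as larger)
        if survivors > 0 and x > 0:
            points.append((math.log10(x), math.log10(survivors / n)))
    return points


def least_squares_slope(points):
    """Slope of the ordinary least-squares line through the points."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points)
    return sxy / sxx


# Synthetic heavy-tailed data with tail index alpha = 1.5.
rng = random.Random(0)
alpha = 1.5
samples = [rng.paretovariate(alpha) for _ in range(20000)]

slope = least_squares_slope(llcd_points(samples))
print(f"estimated LLCD slope: {slope:.2f} (expected about {-alpha})")
```

For real data only the tail is expected to be linear, so the fit would normally be restricted to the largest values rather than applied to all points as done here.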
Law VII Declining Quality Unless rigorously adapted and evolved to take into account changes in the operational environment, the quality of an E-type system will appear to be declining • Again can be supported either way • What is “quality”?
Law VII and Linux • Question of how to quantify quality • Quality is most probably not decreasing • It may even be improving
Perceived Quality • If quality declines system will fall out of use • Linux usage is strong and growing • Ergo Linux quality is not declining
Measured Quality • Oman’s Maintainability Index (MI): $MI = 171 - 5.2 \ln(HV) - 0.23\, MCC - 16.2 \ln(LoC) + 50 \sin\sqrt{2.4\, pCM}$ • HV = Halstead’s volume ($N \log_2 n$) – Bits required to write the function • MCC = McCabe Cyclomatic Complexity • LoC = Lines of Code • pCM = percent Comment lines – Interpreted as fraction (0-1) rather than percent
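A small sketch of computing the index, assuming the commonly cited four-metric form of Oman's formula with per-function averages as inputs and the comment share given as a fraction between 0 and 1. The input values below are invented for illustration and are not measurements from the kernel.

```python
import math


def maintainability_index(avg_hv, avg_mcc, avg_loc, comment_fraction):
    """Oman's MI from average Halstead volume, MCC, LoC, and comment fraction (0-1)."""
    return (
        171.0
        - 5.2 * math.log(avg_hv)
        - 0.23 * avg_mcc
        - 16.2 * math.log(avg_loc)
        + 50.0 * math.sin(math.sqrt(2.4 * comment_fraction))
    )


# Hypothetical averages for two versions of a system.
older = maintainability_index(avg_hv=900.0, avg_mcc=6.0, avg_loc=30.0, comment_fraction=0.15)
newer = maintainability_index(avg_hv=700.0, avg_mcc=4.5, avg_loc=25.0, comment_fraction=0.15)

print(f"older: {older:.1f}, newer: {newer:.1f}")
```

With the comment share as a fraction, the argument of the sine stays below π/2, so more comments always raise the index. Read as a percentage from 0 to 100, the same term would oscillate.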
Changes in MI
Law IV Invariant Work Rate The work rate of an organization evolving an E-type software system tends to be constant over the operational lifetime of that system or phases of that lifetime • Large organizations have inertia • What about open source communities?
Law IV and Linux • Work on Linux is growing superlinearly • Fraction of files handled is near constant • Release rate is near constant – 5-10 days per minor release until 2.5 – 2-3 months for new version in 2.6
Interpretation 1: Work Hours • Data not available • Ill-defined: developers typically have other daytime jobs • Nevertheless, work rate is most probably not constant – Growth in developer base – Increased growth rate of code
Interpretation 2: Elements Handled • Suggested by Lehman • Use development versions (+ 1st year of 2.4) • Includes number added (reflects growth) • Absolute number grows with time • Fraction of existing files relatively constant
Interpretation 3: Release Rate • Release rate of development versions 1996-2003 around 3-6/month – Lower in 2.4
Releases per Month
Interpretation 3: Release Rate • Release rate of development versions 1996-2003 around 3-6/month • Production versions have high minor release rate until next development version is forked
Rate of Minor Releases Linear slope = steady release rate
Interpretation 3: Release Rate • Release rate of development versions 1996-2003 around 3-6/month • Production versions have high minor release rate until next development version is forked • Since 2003 (version 2.6) new version every 2-3 months • Conclusion: seems to support constant rate
Law V Conservation of Familiarity In general, the incremental growth (growth rate trend) of E-type systems is constrained by the need to maintain familiarity • Capacity of humans to change constrains the rate of change
Law V and Linux • Rapid development releases imply small change between versions • Production versions branch off from development versions again with small change • Large difference between production versions – So user familiarity is not conserved • Users may continue to use production version for long time – Evidence for need for conservation of familiarity
Law III Self Regulation Global E-type system evolution is feedback regulated • Reflects a balance between forces that demand change, and constraints on what can actually be done
Lehman’s Ripple • Ripple indicates negative feedback control • Or maybe alternation of major/minor releases?
Increments of Growth • Large increment reflects desire to add more new functionality • Small increment reflects need to stabilize • Alternations reflect self regulation • Also seen to some degree in Linux
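One hedged way to look for alternating increments is to compute the size difference between successive releases and measure how often the increments switch between growing and shrinking. The metric name `alternation_fraction` and both size series are made up for illustration. They are not taken from Lehman's data or from Linux.

```python
def increments(sizes):
    """Growth between each pair of successive releases."""
    return [b - a for a, b in zip(sizes, sizes[1:])]


def alternation_fraction(incs):
    """Fraction of consecutive increment changes that flip direction."""
    changes = [b - a for a, b in zip(incs, incs[1:]) if b != a]
    if len(changes) < 2:
        return 0.0
    flips = sum(1 for c, d in zip(changes, changes[1:]) if (c > 0) != (d > 0))
    return flips / (len(changes) - 1)


# Hypothetical size series: one with a ripple, one with steadily growing increments.
rippled = [100, 130, 135, 170, 176, 215, 220]
steady = [100, 110, 125, 145, 170]

print(alternation_fraction(increments(rippled)))
print(alternation_fraction(increments(steady)))
```

A value near 1 means large and small increments alternate almost every release, while a value near 0 means the increments move in one direction.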
Alternating Increments
Law VIII Feedback System E-type evolution processes are multi-level, multi-loop, multi-agent feedback systems • Extension of law III?
Law VIII and Linux • Archetypal open-source system • Continued development based on feedback from users – Defect reports – Bug fixes – Contribution of code • Change of release scheme in 2. 6 reflects need for more rapid dissemination • Hard to quantify
Lehman’s Laws and Linux: Summary • Some laws are two-sided – II (complexity), VII (quality) • Some laws are qualitative – I (adaptation), III (self regulation), V (familiarity), VII (quality), VIII (feedback) • Laws need to be interpreted and quantified – II (complexity), IV (work rate), VII (quality)
Lehman’s Laws and Linux: Summary • I (change): Adaptation to new hardware • II (complexity): Not increasing • III (self regulation): Maybe • IV (work rate): Constant release rate, superlinear growth • V (familiarity): Within production versions • VI (growth): Superlinear • VII (quality): Not decreasing • VIII (feedback): Inherent in open source paradigm