THE IMPORTANCE OF BUG TESTING - Editorial by dethy
_________________________________

1. Software Development Stages
     i. What defines alpha?
    ii. What defines beta?
   iii. What defines stable?
2. Why bug test?
     i. Importance to Client
    ii. Importance to Programmer
3. Development Goals
     i. Software testing vendor's goals
    ii. Public's goal as bug testers
4. Software Testing Strategies
     i. Functional Prototypes
    ii. Designing Test Sets
   iii. Defect Testing
    iv. Acceptance Testing
     v. Structural Prototypes
    vi. Signs to observe
5. Bug Discovery
     i. Alerting the Vendor
    ii. Alerting Clients
6. Final Note
_________________________________

1. Software Development Stages

The following whitepaper discusses 'The Importance of Bug Testing' with
respect to client and vendor environments. Various responsibilities are
placed on either side of product development, and it is necessary to
understand the reasons behind practising secure coding and ethical
conduct.

In the real world, systems (hardware or software) often go through two
stages of release testing:

 * Alpha (in-house)
 * Beta (out-of-house)

What defines alpha ?

The term 'alpha' comes from the first letter of the Greek alphabet (used
as the numeral 1). It was adopted in the early 1960s at IBM as computer
terminology describing a product-cycle checkpoint, and from there the
usage advanced to become a standard throughout the computer industry.
This first phase of testing a product/system is a part of the software
development process. The alpha development stage includes unit testing,
component testing, and system testing, although the term originally
referred to the feasibility and manufacturability evaluation done before
any commitment to design and development.

What defines beta ?

The term 'beta' comes from the second letter of the Greek alphabet (the
numeral 2), with IBM again using the terminology to categorise product
development. In software development, a beta test is the second phase of
software testing, in which a sampling of the intended audience tries the
product out. Beta testing can be considered "pre-release testing" of a
product/system, in which code flaws and logic errors may still be
distributed throughout the program. In recent years, however, beta
versions of software have been distributed to a wide audience on the
World Wide Web, partly to give the program a "real-world" test and
partly to improve functionality by having clients voice their
approval/(dis)satisfaction/comments about the product/system. Often a
product may stay in the beta stage for several years and could be
considered stable, but it is the vendor's choice to keep the product in
beta until further, rarer bugs have been ironed out.

Releasing an advisory may take place after testing a beta release of a
product. As a general rule of thumb, most advisories come out of testing
more stable applications, but it is known that in some cases an advisory
for a beta release is necessary. Sometimes it is the only way to get
vendors to patch the security risks exposed by a bug discovery;
additionally, beta releases may stay in that phase for months or years,
as was previously discussed. Releasing advisories for alpha products is
somewhat nonsensical: alpha releases are known to be highly unstable and
should not be run without caution and hesitancy; most bugs are found at
this stage, so releasing bug alerts publicly at this time is relatively
trivial.

What defines stable ?

Stable is the final outcome of the developed product.
After the beta product has been judged fit for the task, with most of
the known bugs fixed or patched, and the product successfully fulfills
all of its functional requirements, it is then termed 'stable'. Product
testing has been carried out with most (if not all) flaws worked out,
and as much code optimisation as possible has been implemented in the
final software application. This is the accepted version of the product
that is capable of handling data correctly, as needed. This phase aims
to satisfy customers/clients in their needs as consumers and users of
the product.

2. Why bug test ?

It is often asked why people must test programs at all. As ethical
people we would not have to, but the world is not entirely full of
ethical people who ensure that correct data is fed into a system, and
that is why safe practices need to be developed. The only way for this
to take place is through bug testing. There are two perspectives, the
client's and the programmer's, and each has different needs and wants in
terms of the importance of bug testing.

 * Importance to Client
 * Importance to Programmer

From the client's perspective, having a stable program that is
guaranteed to perform its desired task is not only a reflection of the
program but also a reflection of the company itself. A poor product
shines a dim light on the company, which is why a solid, well-tested
product needs to be established through bug testing before manufacturing
takes place. While management is not always aware of product flaws,
company directors assume that every function works smoothly, without any
defects at all. Experience shows, however, that no product/system can be
deemed completely secure without controversy. There will always be bugs
in a program; whether they are found or not is another question. Open
source software, on the other hand, makes it much easier to spot bugs
and code flaws, and active security checking by the public helps create
a much more stable and operable program. This is one of the reasons why
Microsoft products fail consistently when it comes to testing; their
products are not open source, and it is therefore much harder to create
a secure and flexible program without the aid of the programming
community to help audit and optimise the code.

The importance to the client, the purchaser of the software, is without
doubt a key aspect of performing their daily tasks successfully. If a
program were vulnerable to overflows, lacked input checks, or even
lacked encryption, it would quickly become known for its instability,
and product sales would drop dramatically. Customers will purchase
alternative products that perform the same task and that have been
carefully checked by multiple tests, as will be seen in the testing
section of this document.

There is a high level of ethics involved when a programmer is contracted
to develop a program. The programmer is at the top of the chain of
importance in testing and coding a proficient software application.
He/she is responsible for ensuring that all functions of the program
work, and work efficiently; code optimisation should be at its peak,
with security functions in check. The better programs are known to have
been thoroughly tested, with all sorts of data sets properly dealt with
from within the program. Operating systems like Linux are tested every
day by programmers and hackers alike. Yes, security problems do exist in
that environment, but most have now been patched or fixed, pushing Linux
towards being one of the most stable systems currently around.
Sloppy programmers will not care about ethics, and will simply code the
program to function minimally with all its client-side requirements
implemented. Some programmers deem financial security more important
than ethical security - be careful of those whom you contract to fulfil
your programming requirements.

3. Development Goals

Goals should be adopted by programmers to ensure software quality
assurance, but the customer also has a responsibility to communicate
with the programmer once a bug has been found.

Software testing vendor's goals

The most important goal of a programmer is to actually complete a
working program that serves the client-side requirements. Once this
stage has been reached, the more advanced and less well-known methods
should be put into practice: added functionality such as

 * security features
 * help support
 * contact addresses

Added security features are a must, and assure that code quality is
evident within a program. The use of secure functions and
methodologies/implementations should make itself known at this stage.
This is where the gap between sloppy and aware programmers becomes
apparent. All programs should aim for a level of code quality by
utilising the secure function calls of their chosen programming
language, which helps create a more reliable and flexible program. Of
course, one of the only certain ways to determine a program's
reliability is through testing. Testing focuses on the need for rapid
feedback and on the evolving nature of the program under test; this is
where clients/customers come into the picture.

Public's goal as bug testers

Although programmers bear the most responsibility in terms of code
reliability, clients and customers alike need to be prepared to
communicate with the software engineers if a bug or flaw is observed in
a program. If the expected output is different from what is given, it is
time to get in contact by means of a bug discussion list, e-mail, phone
- whatever, but be sure to advise the correct people. Especially if the
bug could lead to increased privileges, it is most important to inform
the product vendor before the public knows about it. This gives the
vendor time to write patches/advisories for their clients before any
harmful damage can be done to their products.

Testing software is always a step in the right direction. Effective bug
testing by customers/clients will force the programmer to improve code
quality and security in future products; that is why we must tolerate
and thank the software task forces out there that make software
vulnerabilities public. One such bug advocacy forum is BUGTRAQ,
http://www.securityfocus.com. When reporting a bug, always be sure you
can reproduce it, and always include a detailed description of *exactly*
how the bug was found and the type of system you tested the software
application on. The more information the better, but be sure not to
obscure or obfuscate the description - get as many of the basic facts
down as possible. In particular, segmentation faults generally cause
core dumps (a memory image of the process, written when any of a variety
of errors terminates it), which hold vast amounts of information for the
programmer to locate where the bug took place. Remember, full disclosure
is bliss.

4. Software Testing Strategies

Developing a program or system effectively needs to be thoroughly
thought out before any raw code is actually written down. One of the
most important methods of establishing functional requirements is
through a storyboard, as a means of a prototype.
Prototypes may consist of a storyboard: a sequence of screens showing
the end-user a typical scenario of using the program/system.

Functional prototypes

This is one of the most useful methods for making sure the programmer
understands just what a program is intended to do. A functional
prototype is a very limited version of the final program; it gives some
idea of the appearance of the final product, but with a lot of the
functions missing. Displaying a simple storyboard to a client or bug
tester is necessary, as they will be able to comment on whether the
'expected output' matches the 'observed output' resulting from running
the program. It will also force the programmer to think through many of
the details of what the program is meant to do.

Designing Test Sets

Creating workable and effective sets of tests is intellectually
challenging. Testing can almost never be exhaustive, and it is quite
possible that not all programming flaws will be found even after very
stringent testing. In the real 'commercial' world, a significant source
of program defects is people running tests and not checking the results
carefully. That is, the programmers actually run the tests but do not
take enough care in reviewing the results to see that the tests revealed
unexpected flaws in the programs. Tests must be convincing and must
demonstrate a successful performance of the program.

In a commercial setting there are many methodologies used to produce a
designed set of tests. One of the first things that should be evaluated
is the main function of the program. This means deciding on a set of
tests that enable you (the programmer) to see whether the code achieves
its desired outcome. All conditions of the program need to be checked
without doubt, statements like:

 * case, loops, if-then-else structures
 * boundary conditions [Ex. the pseudocode: IF $i<100 THEN .. - make
   sure that the values 99, 100 and 101 for $i are properly dealt with;
   a small sketch of such a test is given under 'Defect Testing' below]
 * exercise all parts of the code [Ex. designing a rigorous set of
   tests]

Naturally, several tests in a set will assess the same parts of the
program; grouping them is known as 'equivalence partitioning' of tests,
and although it may seem duplicative, it is standard practice for
economical testing. Perhaps part of the code works in one scenario but
not in another - this needs to be carefully checked. The first thing a
programmer needs to understand is that testing will demonstrate the
presence of bugs, but it will not demonstrate the absence of bugs.
Semantic errors fall into this category - that is, errors in the logic
of the program, which the compiler or interpreter is unable to help you
with.

Testing falls into two broad categories:

 * defect testing
 * acceptance testing

Defect Testing

This type of test tries to detect all the defects the program may have.
All parts of the program should be tested, and if the programmer feels
that one part of the code may not properly deal with unexpected input,
more rigorous tests should be performed on that area of the code. One
key point to remember here is that "nobody knows a program better than
the programmer himself" - the programmer will know which area of the
program is most likely defective, so a designed set of tests should be
exercised on it before a Beta release is produced.
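To make the idea of a designed set of tests concrete, here is a minimal
sketch in C. The routine unit_price() and its figures are hypothetical,
invented purely for illustration; the point is that the test set
deliberately exercises 99, 100 and 101, either side of the IF $i<100
style boundary described above.

#include <stdio.h>

/* Hypothetical routine under test: orders below 100 units pay the
 * full price, orders of 100 or more receive a bulk price. */
static double unit_price(int quantity)
{
    if (quantity < 100)
        return 10.00;   /* full price for small orders  */
    return 9.00;        /* bulk price from 100 units up */
}

/* One test case: an input and the output we expect for it. */
struct test_case {
    int    quantity;
    double expected;
};

int main(void)
{
    /* Boundary tests: check either side of the 'wall', plus an
     * ordinary value. */
    struct test_case tests[] = {
        {   1, 10.00 },  /* an ordinary value       */
        {  99, 10.00 },  /* just below the boundary */
        { 100,  9.00 },  /* on the boundary itself  */
        { 101,  9.00 },  /* just above the boundary */
    };
    size_t i;
    int failures = 0;

    for (i = 0; i < sizeof tests / sizeof tests[0]; i++) {
        double got = unit_price(tests[i].quantity);
        if (got != tests[i].expected) {
            printf("FAIL: quantity %d gave %.2f, expected %.2f\n",
                   tests[i].quantity, got, tests[i].expected);
            failures++;
        }
    }
    printf("%d failure(s)\n", failures);
    return failures ? 1 : 0;
}

A sloppy test set would stop at the ordinary value; a designed one makes
the boundary values explicit, so a later change to the condition is
caught immediately.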
Stemming from defect testing is 'regression testing': the process of
testing changes to a program to make sure that the older program still
works with the newly implemented changes. Regression testing is a normal
part of the program development process and, in the commercial world, is
performed by code testing specialists. Test department coders develop
test scenarios and exercises that will test new units of code after they
have been written; these test cases form what becomes the test bucket.
Before a new version of a software product is released, the old test
cases are run against the new version to make sure that all the old
capabilities still work. The reason they might not work is that changing
or adding new code to a program can easily introduce errors into code
that was not intended to be changed, and this will skew the test
results. Repeated regression testing is a must!

Acceptance Testing

In conjunction with defect testing is acceptance testing. This kind of
designed test set means running an agreed set of tests with an agreed
output. These tests should demonstrate that the code does an agreed task
well enough for both the programmer and the client to be convinced that
the program performs it adequately. In the commercial world, the
acceptance tests are part of the contract, defining what the customer
insists on before any money for the software actually changes hands.

Structural Prototyping

Prototyping of this nature is relatively simple. A structural prototype
is a stripped-down version of a program that shows the structure, in
skeleton form, of the complete version. All major aspects of the code
are written, but routines and subprograms are written only as stubs -
that is, comments or placeholder statements within the program that show
the programmer that the actual routine has been called or executed (a
minimal sketch of such a skeleton appears at the end of this
subsection).

Maintaining effective code - code that is easily interpreted by the
programmer and other developers, and that allows further extension of
the program with ease - follows three classic characteristics:

 * understandability
 * adaptability
 * cohesion

Understandability means that programs which are easier to understand are
considered better designed than ones that do the same task but are
harder to understand. A key to developing stable code is a good
functional prototype that allows the general idea of the program to be
observed before any real coding takes place. It is also worth noting
that better code is clear and neatly presented - spaced out where
necessary, with comments throughout the program to let the reader
understand what is going on internally.

Adaptability effectively means how easy it is to modify areas of the
code to perform alternative tasks. This is directly linked to
understandability: the more understandable the code, the easier the
adaptation.

Cohesion means that a routine or subprogram does one clear task,
apparent to the reader and programmer. A well-defined task should give a
clear indication of what the program is intended to do; this includes
well-chosen names for variables, constants, headers and so on. As small
as this concept may seem, it allows any coder to pick up the source,
quickly scan through it, and understand what the program is about.
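As a minimal sketch of such a structural prototype, consider the
skeleton below. The program and its routine names (a hypothetical report
generator) are invented purely for illustration: the overall structure
and flow are in place, but each routine is only a stub that announces it
has been called.

#include <stdio.h>

/* Stub: the complete version would load records from disk. */
static void load_records(const char *path)
{
    printf("stub: load_records(\"%s\") called\n", path);
}

/* Stub: the complete version would validate the loaded records. */
static void validate_records(void)
{
    printf("stub: validate_records() called\n");
}

/* Stub: the complete version would format and print the report. */
static void print_report(void)
{
    printf("stub: print_report() called\n");
}

int main(void)
{
    /* The main flow is real; the routines it calls are only stubs. */
    load_records("records.dat");
    validate_records();
    print_report();
    return 0;
}

Running the skeleton shows the intended flow of the complete program,
which both the client and the bug tester can comment on before any of
the real routines are written.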
Signs to observe

Whether you are checking the source for bugs or testing the
binary/executable file for the presence of flaws, all of the above tests
need to be considered and exercised. It is most common that bugs present
themselves in boundary conditions. When designing a set of tests, it
cannot be stressed enough that boundaries need to be checked on either
side of their 'walls'.

Other recent flaws that should be checked before a beta release of a
product include the current malpractice of mishandling format control
strings, such as %s. The programmer must employ capable input
routines/parameters to correctly deal with user-supplied input, ensuring
all possible scenarios have been considered before adopting the most
suitable code to perform the given command. This includes the library
calls themselves, such as avoiding the use of getenv(), strcpy() and
sprintf() wherever possible in exchange for more secure, bounded methods
like strncpy() or snprintf(); the 'n' refers to the number of bytes
allowed to be copied into a buffer. Avoid the common mistakes sloppy
programmers make when fetching user-supplied environment variables from
the terminal or environment. Establish your own method of setting or
checking the environment, and make it insusceptible to malformed data
that could possibly lead to unexpected outcomes such as spawning a shell
- a definite security risk, and one that is often observed in many UNIX
environments. (Early ZGV [console graphics viewer] releases were
repeatedly victim to getenv("HOME") problems of this nature.)
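As a minimal sketch of the difference, the fragment below builds a path
from the HOME environment variable. The program, the buffer size and the
file name are hypothetical, chosen only for illustration: the sloppy
approach trusts the length of the user-controlled variable, while the
bounded approach checks it and uses snprintf().

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PATHBUF 256

/* A sloppy version would simply do:
 *
 *     sprintf(out, "%s/.apprc", getenv("HOME"));
 *
 * trusting both that HOME is set and that it fits in the buffer.
 * The bounded version below refuses malformed input and never
 * writes more than 'outlen' bytes. */
static int build_config_path(char *out, size_t outlen)
{
    const char *home = getenv("HOME");

    /* Refuse a missing or oversized variable outright. */
    if (home == NULL || strlen(home) + sizeof "/.apprc" > outlen)
        return -1;

    /* snprintf() - the 'n' variant - is told how big the buffer is. */
    snprintf(out, outlen, "%s/.apprc", home);
    return 0;
}

int main(void)
{
    char path[PATHBUF];

    if (build_config_path(path, sizeof path) == 0)
        printf("config file: %s\n", path);
    else
        fprintf(stderr, "HOME unset or too long - refusing to continue\n");

    return 0;
}

The design choice is simply to treat the environment as hostile input:
check its presence and length before it ever touches a fixed buffer,
rather than trusting whatever the user has exported.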
Another way of using acceptance testing to expose flaws is to take the
proper data set that would normally be input to the program, but send
excessive data to a particular input command. Sending 1024 bytes at a
512-byte buffer will cause an overflow: the acceptance test of sending
256 bytes would be deemed acceptable and would pass, while the 1024-byte
input would not. Sometimes, when a program appears to have a decreased
level of efficiency in its speed or in its processing of the actual
data, this may be directly linked to a heap or stack overflow caused by
corrupted data being entered. It is at this stage that vital tests need
to be conducted by the bug tester for the presence of bugs.

Let's take a real-life example of a program in which I exposed a flaw
not long ago - the WinSMTPD mailer/pop3d daemon, versions 1.06f and 2.X.
After acceptance testing this program, everything worked well: all the
desired tasks of the program were fulfilled and the smtpd/pop3d server
performed its tasks efficiently. Now, here is where defect testing comes
into play. Firstly, to start an SMTP transaction the client needs to
send a 'HELO %s' command, where the format string "%s" is your hostname.
WinSMTPD only allows a fixed buffer of 170 bytes before the expected
output becomes unexpected. By sending 150 bytes after the HELO field,
the program noticeably paused before proceeding to function as normal.
This tells us one of two possibilities:

1. The program has been coded poorly in terms of speed, OR
2. The program does not deal with boundary conditions when excessive
   data is entered.

As it turns out, WinSMTPD was vulnerable to a stack overflow when 170+
bytes were sent to the HELO field. The unexpected output from the
program was:

WINSMTP caused a general protection fault in module WINSMTP.EXE at
0003:00002359.

Registers:
EAX=461e0001 CS=42e7 EIP=00002359 EFLGS=00000246
EBX=00807fe0 SS=4207 ESP=00007e36 EBP=00004141
ECX=00010283 DS=4207 ESI=0000544c FS=05c7
EDX=58600000 ES=461e EDI=00001547 GS=0000

Bytes at CS:EIP:
cb 49 73 49 63 6f 6e 69 63 00 00 58 4c 6f 63 00

Stack dump:
41414141 41414141 41414141 41414141 41414141 41414141 41414141 41414141
41414141 41414141 41414141 41414141 41414141 41414141 41414141 41414141

Obviously this isn't what the programmer had in mind when performing an
SMTP transaction. The 41414141 that appears on the stack is the
hexadecimal value of the character "A", which I had filled the buffer
with. From this general protection fault we, as bug testers and
programmers, are able to ascertain that in this 16-bit program (judged
by the leading zeros within the memory registers) we have successfully
overwritten the EBP register (plus 4 bytes for EIP), and as ethical
programmers/bug testers that is all we need to know to fix or patch this
bug. For an unethical hacker, however, loading up the stack with
malicious data could effectively allow arbitrary code to be executed
from the stack, and anything is possible from there. This is why it is
important to test for bugs, and especially to check the boundaries and
the data that the client/user is allowed to input. Although I approve of
people writing 'proof of concept' exploits to expose the existence of a
bug in a program - I am a firm believer in full disclosure and vouch for
open source - it is neither ethical nor encouraged to run these scripts
without the direct consent of the people you are exploiting. (POC
exploits are necessary in whitehat security firms to prove and
demonstrate a code flaw.)

Data sets and tests fed to the program/system are effectively system
calls executed by active processes. These include different kinds of
programs (e.g. programs that run as daemons and those that do not),
programs that vary widely in size and complexity, and programs with
different purposes. Spawns or fork() calls by applications should
therefore be tested against the maximum process limit being exhausted by
various resource-depleting exploits; this too needs to be prepared for
when writing a heavily used program. Normal test data can be "synthetic"
or "live". Synthetic traces are collected by running a prepared script,
often called a driver program; the program options are chosen solely for
the purpose of exercising the program (acceptance testing), and not to
meet any real user's requests. Live traces are collected during normal
usage of a production computer system (manual, specific code testing;
boundary testing). Both of these methods are often put to the test when
processing software applications en masse.

5. Bug Discovery

So, you think you've found a bug? Then read on - here's what to do next.

Alerting the vendor

If the client or user has somehow stumbled on a logic error or security
vulnerability within the tested (beta/stable) product, it is then
necessary to report the bug to the vendor immediately. More of this was
discussed in the 'Development Goals' section, but the practical form of
a report was not. The bug report should include most, if not all, of the
following information, generally in brief conceptual form:

 * bug synopsis (brief paragraph explaining the vulnerability)
 * description (the sequential steps taken to reproduce the proposed bug)
 * attachments (any relevant materials, such as core dumps, message logs)
 * environment (system specifications and conditions used to test the bug)
 * contact info (how the vendor can contact you for further comments/queries)

Alerting clients

If the proposed bug has been accepted by the vendor as a risk or
vulnerability that could lead to such things as network/software
penetration, increased privileges, or excessive system resource usage,
the vendor should then issue their own advisory publicly, through
mailing-lists, the vendor's URL, and/or by e-mail.
It is now the responsibility of the programmer/manufacturer to provide
sure-fire advice for the client on patching their software/system so
that the vulnerability becomes non-existent. The advisory, after such an
event has occurred, should include the following information:

 * Date (date of advisory release)
 * Affected systems (listing of the environments/settings in which the bug may occur)
 * Description (similar to the client's description, but with more technical inside info)
 * Patch (URL of the patch, or a description of how to correct the bug)
 * Contact (how clients can contact the vendor for more info: phone, e-mail, URL)

Having the above communication link creates a much friendlier atmosphere
between users and vendors, which in effect helps move software
development towards a more stable and reliable community - one that
excels in safe security practices.

6. Final Note

I made a generic resource kit named reskit.tgz earlier this year.
Basically it is just 7 skeletal template scripts coded in Perl for
various purposes of testing network services in a Linux/Unix
environment, such as malformed HTTP 'GET' requests, multiple thread
connections, random data streaming, an ICMP error generator, etc. It is
mainly used as a research and development kit to help spot bugs more
easily, particularly in server/router applications/software; feel free
to expand on the scripts. They can be downloaded in tarball form from:

http://dethy.synnergy.net/reskit.tar

Comments

Main editorial by dethy [ dethy@synnergy.net | www.synnergy.net ]