THE IMPORTANCE OF BUG TESTING - Editorial by dethy
_________________________________

1. Software Development Stages
     i. What defines alpha?
    ii. What defines beta?
   iii. What defines stable?
2. Why bug test?
     i. Importance to Client
    ii. Importance to Programmer
3. Development Goals
     i. Software testing vendor's goals
    ii. Public's goal as bug testers
4. Software Testing Strategies
     i. Functional Prototypes
    ii. Designing Test Sets
   iii. Defect Testing
    iv. Acceptance Testing
     v. Structural Prototypes
    vi. Signs to observe
5. Bug Discovery
     i. Alerting the Vendor
    ii. Alerting Clients
6. Final Note
_________________________________

1. Software Development Stages

The following whitepaper discusses 'The Importance of Bug Testing' with
respect to client and vendor environments. Various responsibilities are
placed on either side of product development, and it is necessary to
understand the reasons behind practising secure coding and ethical
conduct.

In the real world, systems (hardware or software) often go through two
stages of release testing:

 * Alpha (in-house)
 * Beta (out-of-house)

What defines alpha ?

The term 'alpha' comes from the first letter of the Greek alphabet (used
as the numeral 1). It was adopted in the early 1960s at IBM as computer
terminology describing a product-cycle checkpoint, and from there the
usage advanced to become a standard throughout the computer industry.
This first phase of testing a product/system is a part of the software
development process. The alpha development stage includes unit testing,
component testing, and system testing, although the term originally
referred to the feasibility and manufacturability evaluation done before
any commitment to design and development.

What defines beta ?

The term 'beta' comes from the second letter of the Greek alphabet (the
numeral 2), with IBM again using the terminology to categorise product
development. In software development, a beta test is the second phase of
software testing, in which a sampling of the intended audience tries the
product out. Beta testing can be considered "pre-release testing" of a
product/system, in which code flaws and logic errors may still be
distributed throughout the program. In recent years, however, beta
versions of software have been distributed to a wide audience on the
World Wide Web, partly to give the program a "real-world" test and
partly to improve functionality by having clients voice their
approval/(dis)satisfaction/comments about the product/system. Often a
product may stay in the beta stage for several years and could be
considered stable, but it is the vendor's choice to keep the product in
beta until further, rarer bugs have been ironed out.

Releasing an advisory may take place after testing a beta release of a
product. As a general rule of thumb, most advisories come out of testing
more stable applications, but it is known that in some cases an advisory
for a beta release is necessary. Sometimes it is the only way to get
vendors to patch the security risks exposed by a bug discovery;
additionally, beta releases may stay in that phase for months or years,
as was previously discussed. Releasing advisories for alpha products is
somewhat nonsensical: alpha releases are known to be highly unstable and
should not be run without caution and hesitancy; most bugs are found at
this stage, so releasing bug alerts publicly at this time is relatively
trivial.

What defines stable ?

Stable is the final outcome of the developed product.
After the beta product has been judged fit for the task, with most of
the known bugs fixed or patched, and the product successfully fulfills
all of its functional requirements, it is then termed 'stable'. Product
testing has been carried out with most (if not all) flaws worked out,
and as much code optimisation as possible has been implemented in the
final software application. This is the accepted version of the product
that is capable of handling data correctly, as needed. This phase aims
to satisfy customers/clients in their needs as consumers and users of
the product.

2. Why bug test ?

It is often asked why people must test programs at all. As ethical
people we would not have to, but the world is not entirely full of
ethical people who ensure that correct data is fed into a system, and
that is why safe practices need to be developed. The only way for this
to take place is through bug testing. There are two perspectives, the
client's and the programmer's, and each has different needs and wants in
terms of the importance of bug testing.

 * Importance to Client
 * Importance to Programmer

From the client's perspective, having a stable program that is
guaranteed to perform its desired task is not only a reflection of the
program but also a reflection of the company itself. A poor product
shines a dim light on the company, which is why a solid, well-tested
product needs to be established through bug testing before manufacturing
takes place. While management is not always aware of product flaws,
company directors assume that every function works smoothly, without any
defects at all. Experience shows, however, that no product/system can be
deemed completely secure without controversy. There will always be bugs
in a program; whether they are found or not is another question. Open
source software, on the other hand, makes it much easier to spot bugs
and code flaws, and active security checking by the public helps create
a much more stable and operable program. This is one of the reasons why
Microsoft products fail consistently when it comes to testing; their
products are not open source, and it is therefore much harder to create
a secure and flexible program without the aid of the programming
community to help audit and optimise the code.

The importance to the client, the purchaser of the software, is without
doubt a key aspect of performing their daily tasks successfully. If a
program were vulnerable to overflows, lacked input checks, or even
lacked encryption, it would quickly become known for its instability,
and product sales would drop dramatically. Customers will purchase
alternative products that perform the same task and that have been
carefully checked by multiple tests, as will be seen in the testing
section of this document.

There is a high level of ethics involved when a programmer is contracted
to develop a program. The programmer is at the top of the chain of
importance in testing and coding a proficient software application.
He/she is responsible for ensuring that all functions of the program
work, and work efficiently; code optimisation should be at its peak,
with security functions in check. The better programs are known to have
been thoroughly tested, with all sorts of data sets properly dealt with
from within the program. Operating systems like Linux are tested every
day by programmers and hackers alike. Yes, security problems do exist in
that environment, but most have now been patched or fixed, pushing Linux
towards being one of the most stable systems currently around.
Sloppy programmers will not care about ethics, and will simply code the
program to function minimally with all its client-side requirements
implemented. Some programmers deem financial security more important
than ethical security - be careful of those whom you contract to fulfil
your programming requirements.

3. Development Goals

Goals should be adopted by programmers to ensure software quality
assurance, but the customer also has a responsibility to communicate
with the programmer once a bug has been found.

Software testing vendor's goals

The most important goal of a programmer is to actually complete a
working program that serves the client-side requirements. Once this
stage has been reached, the more advanced and less well-known methods
should be put into practice: added functionality such as

 * security features
 * help support
 * contact addresses

Added security features are a must, and assure that code quality is
evident within a program. The use of secure functions and
methodologies/implementations should make itself known at this stage.
This is where the gap between sloppy and aware programmers becomes
apparent. All programs should aim for a level of code quality by
utilising the secure function calls of their chosen programming
language, which helps create a more reliable and flexible program. Of
course, one of the only certain ways to determine a program's
reliability is through testing. Testing focuses on the need for rapid
feedback and on the evolving nature of the program under test; this is
where clients/customers come into the picture.

Public's goal as bug testers

Although programmers bear the most responsibility in terms of code
reliability, clients and customers alike need to be prepared to
communicate with the software engineers if a bug or flaw is observed in
a program. If the expected output is different from what is given, it is
time to get in contact by means of a bug discussion list, e-mail, phone
- whatever, but be sure to advise the correct people. Especially if the
bug could lead to increased privileges, it is most important to inform
the product vendor before the public knows about it. This gives the
vendor time to write patches/advisories for their clients before any
harmful damage can be done to their products.

Testing software is always a step in the right direction. Effective bug
testing by customers/clients will force the programmer to improve code
quality and security in future products; that is why we must tolerate
and thank the software task forces out there that make software
vulnerabilities public. One such bug advocacy forum is BUGTRAQ,
http://www.securityfocus.com. When reporting a bug, always be sure you
can reproduce it, and always include a detailed description of *exactly*
how the bug was found and the type of system you tested the software
application on. The more information the better, but be sure not to
obscure or obfuscate the description - get as many of the basic facts
down as possible. In particular, segmentation faults generally cause
core dumps (a memory image of the process, written when any of a variety
of errors terminates it), which hold vast amounts of information for the
programmer to locate where the bug took place. Remember, full disclosure
is bliss.

4. Software Testing Strategies

Developing a program or system effectively needs to be thoroughly
thought out before any raw code is actually written down. One of the
most important methods of establishing functional requirements is
through a storyboard, as a means of a prototype.
Prototypes may consist of a storyboard: a sequence of screens showing
the end-user a typical scenario of using the program/system.

Functional prototypes

This is one of the most useful methods for making sure the programmer
understands just what a program is intended to do. A functional
prototype is a very limited version of the final program; it gives some
idea of the appearance of the final product, but with a lot of the
functions missing. Displaying a simple storyboard to a client or bug
tester is necessary, as they will be able to comment on whether the
'expected output' matches the 'observed output' resulting from running
the program. It will also force the programmer to think through many of
the details of what the program is meant to do.

Designing Test Sets

Creating workable and effective sets of tests is intellectually
challenging. Testing can almost never be exhaustive, and it is quite
possible that not all programming flaws will be found even after very
stringent testing. In the real 'commercial' world, a significant source
of program defects is people running tests and not checking the results
carefully. That is, the programmers actually run the tests but do not
take enough care in reviewing the results to see that the tests revealed
unexpected flaws in the programs. Tests must be convincing and must
demonstrate a successful performance of the program.

In a commercial setting there are many methodologies used to produce a
designed set of tests. One of the first things that should be evaluated
is the main function of the program. This means deciding on a set of
tests that enable you (the programmer) to see whether the code achieves
its desired outcome. All conditions of the program need to be checked
without doubt, statements like:

 * case, loops, if-then-else structures
 * boundary conditions [Ex. the pseudocode: IF $i<100 THEN .. - make
   sure that the values 99, 100 and 101 for $i are properly dealt with;
   a small sketch of such a test is given under 'Defect Testing' below]
 * exercise all parts of the code [Ex. designing a rigorous set of
   tests]

Naturally, several tests in a set will assess the same parts of the
program; grouping them is known as 'equivalence partitioning' of tests,
and although it may seem duplicative, it is standard practice for
economical testing. Perhaps part of the code works in one scenario but
not in another - this needs to be carefully checked. The first thing a
programmer needs to understand is that testing will demonstrate the
presence of bugs, but it will not demonstrate the absence of bugs.
Semantic errors fall into this category - that is, errors in the logic
of the program, which the compiler or interpreter is unable to help you
with.

Testing falls into two broad categories:

 * defect testing
 * acceptance testing

Defect Testing

This type of test tries to detect all the defects the program may have.
All parts of the program should be tested, and if the programmer feels
that one part of the code may not properly deal with unexpected input,
more rigorous tests should be performed on that area of the code. One
key point to remember here is that "nobody knows a program better than
the programmer himself" - the programmer will know which area of the
program is most likely defective, so a designed set of tests should be
exercised on it before a Beta release is produced.
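To make the idea of a designed set of tests concrete, here is a minimal
sketch in C. The routine unit_price() and its figures are hypothetical,
invented purely for illustration; the point is that the test set
deliberately exercises 99, 100 and 101, either side of the IF $i<100
style boundary described above.

#include <stdio.h>

/* Hypothetical routine under test: orders below 100 units pay the
 * full price, orders of 100 or more receive a bulk price. */
static double unit_price(int quantity)
{
    if (quantity < 100)
        return 10.00;   /* full price for small orders  */
    return 9.00;        /* bulk price from 100 units up */
}

/* One test case: an input and the output we expect for it. */
struct test_case {
    int    quantity;
    double expected;
};

int main(void)
{
    /* Boundary tests: check either side of the 'wall', plus an
     * ordinary value. */
    struct test_case tests[] = {
        {   1, 10.00 },  /* an ordinary value       */
        {  99, 10.00 },  /* just below the boundary */
        { 100,  9.00 },  /* on the boundary itself  */
        { 101,  9.00 },  /* just above the boundary */
    };
    size_t i;
    int failures = 0;

    for (i = 0; i < sizeof tests / sizeof tests[0]; i++) {
        double got = unit_price(tests[i].quantity);
        if (got != tests[i].expected) {
            printf("FAIL: quantity %d gave %.2f, expected %.2f\n",
                   tests[i].quantity, got, tests[i].expected);
            failures++;
        }
    }
    printf("%d failure(s)\n", failures);
    return failures ? 1 : 0;
}

A sloppy test set would stop at the ordinary value; a designed one makes
the boundary values explicit, so a later change to the condition is
caught immediately.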
Stemming from defect testing is 'regression testing': the process of
testing changes to a program to make sure that the older program still
works with the newly implemented changes. Regression testing is a normal
part of the program development process and, in the commercial world, is
performed by code testing specialists. Test department coders develop
test scenarios and exercises that will test new units of code after they
have been written; these test cases form what becomes the test bucket.
Before a new version of a software product is released, the old test
cases are run against the new version to make sure that all the old
capabilities still work. The reason they might not work is that changing
or adding new code to a program can easily introduce errors into code
that was not intended to be changed, and this will skew the test
results. Repeated regression testing is a must!

Acceptance Testing

In conjunction with defect testing is acceptance testing. This kind of
designed test set means running an agreed set of tests with an agreed
output. These tests should demonstrate that the code does an agreed task
well enough for both the programmer and the client to be convinced that
the program performs it adequately. In the commercial world, the
acceptance tests are part of the contract, defining what the customer
insists on before any money for the software actually changes hands.

Structural Prototyping

Prototyping of this nature is relatively simple. A structural prototype
is a stripped-down version of a program that shows the structure, in
skeleton form, of the complete version. All major aspects of the code
are written, but routines and subprograms are written only as stubs -
that is, comments or placeholder statements within the program that show
the programmer that the actual routine has been called or executed (a
minimal sketch of such a skeleton appears at the end of this
subsection).

Maintaining effective code - code that is easily interpreted by the
programmer and other developers, and that allows further extension of
the program with ease - follows three classic characteristics:

 * understandability
 * adaptability
 * cohesion

Understandability means that programs which are easier to understand are
considered better designed than ones that do the same task but are
harder to understand. A key to developing stable code is a good
functional prototype that allows the general idea of the program to be
observed before any real coding takes place. It is also worth noting
that better code is clear and neatly presented - spaced out where
necessary, with comments throughout the program to let the reader
understand what is going on internally.

Adaptability effectively means how easy it is to modify areas of the
code to perform alternative tasks. This is directly linked to
understandability: the more understandable the code, the easier the
adaptation.

Cohesion means that a routine or subprogram does one clear task,
apparent to the reader and programmer. A well-defined task should give a
clear indication of what the program is intended to do; this includes
well-chosen names for variables, constants, headers and so on. As small
as this concept may seem, it allows any coder to pick up the source,
quickly scan through it, and understand what the program is about.
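As a minimal sketch of such a structural prototype, consider the
skeleton below. The program and its routine names (a hypothetical report
generator) are invented purely for illustration: the overall structure
and flow are in place, but each routine is only a stub that announces it
has been called.

#include <stdio.h>

/* Stub: the complete version would load records from disk. */
static void load_records(const char *path)
{
    printf("stub: load_records(\"%s\") called\n", path);
}

/* Stub: the complete version would validate the loaded records. */
static void validate_records(void)
{
    printf("stub: validate_records() called\n");
}

/* Stub: the complete version would format and print the report. */
static void print_report(void)
{
    printf("stub: print_report() called\n");
}

int main(void)
{
    /* The main flow is real; the routines it calls are only stubs. */
    load_records("records.dat");
    validate_records();
    print_report();
    return 0;
}

Running the skeleton shows the intended flow of the complete program,
which both the client and the bug tester can comment on before any of
the real routines are written.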
Signs to observe

Whether you are checking the source for bugs or testing the
binary/executable file for the presence of flaws, all of the above tests
need to be considered and exercised. It is most common that bugs present
themselves in boundary conditions. When designing a set of tests, it
cannot be stressed enough that boundaries need to be checked on either
side of their 'walls'.

Other recent flaws that should be checked before a beta release of a
product include the current malpractice of mishandling format control
strings, such as %s. The programmer must employ capable input
routines/parameters to correctly deal with user-supplied input, ensuring
all possible scenarios have been considered before adopting the most
suitable code to perform the given command. This includes the library
calls themselves, such as avoiding the use of getenv(), strcpy() and
sprintf() wherever possible in exchange for more secure, bounded methods
like strncpy() or snprintf(); the 'n' refers to the number of bytes
allowed to be copied into a buffer. Avoid the common mistakes sloppy
programmers make when fetching user-supplied environment variables from
the terminal or environment. Establish your own method of setting or
checking the environment, and make it insusceptible to malformed data
that could possibly lead to unexpected outcomes such as spawning a shell
- a definite security risk, and one that is often observed in many UNIX
environments. (Early ZGV [console graphics viewer] releases were
repeatedly victim to getenv("HOME") problems of this nature.)
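As a minimal sketch of the difference, the fragment below builds a path
from the HOME environment variable. The program, the buffer size and the
file name are hypothetical, chosen only for illustration: the sloppy
approach trusts the length of the user-controlled variable, while the
bounded approach checks it and uses snprintf().

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PATHBUF 256

/* A sloppy version would simply do:
 *
 *     sprintf(out, "%s/.apprc", getenv("HOME"));
 *
 * trusting both that HOME is set and that it fits in the buffer.
 * The bounded version below refuses malformed input and never
 * writes more than 'outlen' bytes. */
static int build_config_path(char *out, size_t outlen)
{
    const char *home = getenv("HOME");

    /* Refuse a missing or oversized variable outright. */
    if (home == NULL || strlen(home) + sizeof "/.apprc" > outlen)
        return -1;

    /* snprintf() - the 'n' variant - is told how big the buffer is. */
    snprintf(out, outlen, "%s/.apprc", home);
    return 0;
}

int main(void)
{
    char path[PATHBUF];

    if (build_config_path(path, sizeof path) == 0)
        printf("config file: %s\n", path);
    else
        fprintf(stderr, "HOME unset or too long - refusing to continue\n");

    return 0;
}

The design choice is simply to treat the environment as hostile input:
check its presence and length before it ever touches a fixed buffer,
rather than trusting whatever the user has exported.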
Another way of using acceptance testing to expose flaws is to take the
proper data set that would normally be input to the program, but send
excessive data to a particular input command. Sending 1024 bytes at a
512-byte buffer will cause an overflow: the acceptance test of sending
256 bytes would be deemed acceptable and would pass, while the 1024-byte
input would not. Sometimes, when a program appears to have a decreased
level of efficiency in its speed or in its processing of the actual
data, this may be directly linked to a heap or stack overflow caused by
corrupted data being entered. It is at this stage that vital tests need
to be conducted by the bug tester for the presence of bugs.

Let's take a real-life example of a program in which I exposed a flaw
not long ago - the WinSMTPD mailer/pop3d daemon, versions 1.06f and 2.X.
After acceptance testing this program, everything worked well: all the
desired tasks of the program were fulfilled and the smtpd/pop3d server
performed its tasks efficiently. Now, here is where defect testing comes
into play. Firstly, to start an SMTP transaction the client needs to
send a 'HELO %s' command, where the format string "%s" is your hostname.
WinSMTPD only allows a fixed buffer of 170 bytes before the expected
output becomes unexpected. By sending 150 bytes after the HELO field,
the program noticeably paused before proceeding to function as normal.
This tells us one of two possibilities:

1. The program has been coded poorly in terms of speed, OR
2. The program does not deal with boundary conditions when excessive
   data is entered.

As it turns out, WinSMTPD was vulnerable to a stack overflow when 170+
bytes were sent to the HELO field. The unexpected output from the
program was:

WINSMTP caused a general protection fault in module WINSMTP.EXE at
0003:00002359.

Registers:
EAX=461e0001 CS=42e7 EIP=00002359 EFLGS=00000246
EBX=00807fe0 SS=4207 ESP=00007e36 EBP=00004141
ECX=00010283 DS=4207 ESI=0000544c FS=05c7
EDX=58600000 ES=461e EDI=00001547 GS=0000

Bytes at CS:EIP:
cb 49 73 49 63 6f 6e 69 63 00 00 58 4c 6f 63 00

Stack dump:
41414141 41414141 41414141 41414141 41414141 41414141 41414141 41414141
41414141 41414141 41414141 41414141 41414141 41414141 41414141 41414141

Obviously this isn't what the programmer had in mind when performing an
SMTP transaction. The 41414141 that appears on the stack is the
hexadecimal value of the character "A", which I had filled the buffer
with. From this general protection fault we, as bug testers and
programmers, are able to ascertain that in this 16-bit program (judged
by the leading zeros within the memory registers) we have successfully
overwritten the EBP register (plus 4 bytes for EIP), and as ethical
programmers/bug testers that is all we need to know to fix or patch this
bug. For an unethical hacker, however, loading up the stack with
malicious data could effectively allow arbitrary code to be executed
from the stack, and anything is possible from there. This is why it is
important to test for bugs, and especially to check the boundaries and
the data that the client/user is allowed to input. Although I approve of
people writing 'proof of concept' exploits to expose the existence of a
bug in a program - I am a firm believer in full disclosure and vouch for
open source - it is neither ethical nor encouraged to run these scripts
without the direct consent of the people you are exploiting. (POC
exploits are necessary in whitehat security firms to prove and
demonstrate a code flaw.)

Data sets and tests fed to the program/system are effectively system
calls executed by active processes. These include different kinds of
programs (e.g. programs that run as daemons and those that do not),
programs that vary widely in size and complexity, and programs with
different purposes. Spawns or fork() calls by applications should
therefore be tested against the maximum process limit being exhausted by
various resource-depleting exploits; this too needs to be prepared for
when writing a heavily used program. Normal test data can be "synthetic"
or "live". Synthetic traces are collected by running a prepared script,
often called a driver program; the program options are chosen solely for
the purpose of exercising the program (acceptance testing), and not to
meet any real user's requests. Live traces are collected during normal
usage of a production computer system (manual, specific code testing;
boundary testing). Both of these methods are often put to the test when
processing software applications en masse.

5. Bug Discovery

So, you think you've found a bug? Then read on - here's what to do next.

Alerting the vendor

If the client or user has somehow stumbled on a logic error or security
vulnerability within the tested (beta/stable) product, it is then
necessary to report the bug to the vendor immediately. More of this was
discussed in the 'Development Goals' section, but the practical form of
a report was not. The bug report should include most, if not all, of the
following information, generally in brief conceptual form:

 * bug synopsis (brief paragraph explaining the vulnerability)
 * description (the sequential steps taken to reproduce the proposed bug)
 * attachments (any relevant materials, such as core dumps, message logs)
 * environment (system specifications and conditions used to test the bug)
 * contact info (how the vendor can contact you for further comments/queries)

Alerting clients

If the proposed bug has been accepted by the vendor as a risk or
vulnerability that could lead to such things as network/software
penetration, increased privileges, or excessive system resource usage,
the vendor should then issue their own advisory publicly, through
mailing-lists, the vendor's URL, and/or by e-mail.
It is now the responsibility of the programmer/manufacturer to provide
sure-fire advice for the client on patching their software/system so
that the vulnerability becomes non-existent. The advisory, after such an
event has occurred, should include the following information:

 * Date (date of advisory release)
 * Affected systems (listing of the environments/settings in which the bug may occur)
 * Description (similar to the client's description, but with more technical inside info)
 * Patch (URL of the patch, or a description of how to correct the bug)
 * Contact (how clients can contact the vendor for more info: phone, e-mail, URL)

Having the above communication link creates a much friendlier atmosphere
between users and vendors, which in effect helps move software
development towards a more stable and reliable community - one that
excels in safe security practices.

6. Final Note

I made a generic resource kit named reskit.tgz earlier this year.
Basically it is just 7 skeletal template scripts coded in Perl for
various purposes of testing network services in a Linux/Unix
environment, such as malformed HTTP 'GET' requests, multiple thread
connections, random data streaming, an ICMP error generator, etc. It is
mainly used as a research and development kit to help spot bugs more
easily, particularly in server/router applications/software; feel free
to expand on the scripts. They can be downloaded in tarball form from:

http://dethy.synnergy.net/reskit.tar

Comments

Main editorial by dethy [ dethy@synnergy.net | www.synnergy.net ]