It is essential for anyone who wants to be considered a professional in the
areas of software to know several languages and several programming paradigms
. . . it's not a good idea to know just C++, let alone to know just a
single-paradigm language. Much of the inspiration in good programming comes
from having learned and appreciated several programming styles.
--Bjarne Stroustrup, C++ author
Chapter 1: P R O G R A M M I N G L A N G U A G E S C O N C E P T S
This course covers the concepts of high-level programming languages with a
focus on design criteria and evaluation.
The best way to learn a language concept is to code in it.
Topics covered: - syntax & semantics
- variables, identifiers, scope and lifetime
- data types
- expressions and assignment statements
- control statements
- functions, subroutines & parameter passing
- object oriented constructs
- implementation
Why learn programming language concepts?
- understand programming language evolution
- be a better coder
- evaluate and select best language for a particular task
- learn new languages more quickly
- better understand language implementation
Programming languages are designed to solve problems in a particular
programming domain.
- Scientific applications
(large number of floating point computations - Fortran)
- Business applications
(reports, decimal numbers and characters - COBOL)
- Artificial intelligence (symbols rather than numbers - Lisp and Prolog)
- Systems programming (efficient code, OS kernels, device drivers - C)
- Text processing (process line by line, manipulate strings,
pattern matching - SNOBOL and awk)
- Scripting (quick, easy - Perl, Python, Ruby)
- Web (client-side scripting - JavaScript; markup - XHTML,
server-side scripting - PHP, Python)
- General Purpose (C++, Java, Visual Basic, Python)
- Educational (Pascal, Scheme)
l a n g u a g e
e v a l u a t i o n
c r i t e r i a
What makes a language good? Or better than another language?
bubble-sort example
READABILITY
Readability is the ease with which programs can be read and understood
given the provisions of the language.
Given well-written source how easy is it to understand the meaning
of the code?
- Overall simplicity - support for static modularity and
encapsulation via enforcement of scope and lifetime rules;
restrictions on global variables;
a manageable set of features and constructs -- too many features mean
programmers learn different subsets of the language, reducing
readability;
little feature multiplicity -- the number of ways of doing the same
operation; minimal operator overloading --
user-defined overloading can reduce readability
- Orthogonality - having a relatively small set of primitives and
a consistent way of combining them - where every possible
combination is legal with no exceptions
- Control statements - support for all widely used control structures -
while, if, for, repeat, etc.
- Data types and structures - support for data types that improve
readability: boolean, strings, decimal, associative arrays, etc.
- Syntax - support for expressive identifiers; a consistent method for
ending compound statements - unlike C's "{ }", a rich set of
special words such as Ada's "end if" and "end loop", as in this Ada code:
begin
   if Container'Length > 0 then
      loop
         Mid := (Low + High) / 2;
         if Value < Container (Mid) then
            exit when Low = Mid;
            High := Mid - 1;
         elsif Container (Mid) < Value then
            exit when High = Mid;
            Low := Mid + 1;
         else
            return Mid;
         end if;
      end loop;
   end if;
   raise Not_Found;
end Search;
- Form and meaning (consistent constructs;
context-free keywords - unlike "static" in C; support for programming
conventions; forced indentation as in Python rather than
optional as in obfuscated C)
# In Python, code blocks are defined by ':' and indentation:
# consistent indentation is MANDATORY
def sort(list):
    for first in range(len(list)):
        minimum = first
        for index in range(first, len(list)):
            if list[minimum] > list[index]:
                minimum = index
        temp = list[first]
        list[first] = list[minimum]
        list[minimum] = temp
// code in C without indentation will compile but good luck reading it
void sort(int list[], int size){
int first, minimum, temp, index;
for (first = 0; first < size; first++){
minimum = first;
for (index = first; index < size; index++)
if (list[minimum] > list[index])
minimum = index; temp = list[first]; list[first] = list[minimum];
list[minimum] = temp; } }
Some languages are easy to write but difficult to read. Regexes (a language
albeit not a programming language) demonstrate this problem:
^.*[0-9a-z]+?\([a-zA-Z]*\)$ # difficult to proofread and debug
Lisp is equally easy to write but difficult to read (from H2wUc):
(lambda (*<8-]= *<8-[= ) (or *<8-]= *<8-[= ))
(defun :-] (<) (= < 2))
(defun !(!)(if(and(funcall(lambda(!)(if(and '(< 0)(< ! 2))1 nil))(1+ !))
(not(null '(lambda(!)(if(< 1 !)t nil)))))1(* !(!(1- !)))))
WRITABILITY
Writability is a measure of coding efficiency: the ease with which
a coder can learn the language well enough to write code, and how long it
takes to write code once the language is known.
- A language can be easy to read but difficult to write or easy to write but difficult to read
- Related to the number of keywords in the language (too many keywords
means the language may be more difficult to learn, but too few means the
language is not expressive); C is notoriously lean - ANSI C has only 32 keywords
- Code reusability and extensibility - reusing code means quicker coding
- Simplicity and orthogonality (few constructs, small number of
primitives, small set of rules for combining them with no exceptions;
non-orthogonality results in exceptions to the rules)
- Support for abstraction - the ability to define and use complex structures
for operations that allow details to be ignored - C++ String and
STL classes, Perl's array operations
- Expressiveness - a set of relatively convenient ways of specifying
operations - the ability to express conceptual abstractions directly
and simply; Ex. Perl's list assignment ($a, $b) = ($b, $a); swaps two
variables in one statement (see the C++ sketch below)
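As a rough illustration of abstraction and expressiveness (a sketch using only
standard C++ library facilities; the variable names are invented):

// Sketch: abstraction and expressiveness with the C++ standard library.
#include <algorithm>   // std::sort
#include <iostream>
#include <string>
#include <utility>     // std::swap
#include <vector>

int main() {
    std::string a = "Sam", b = "Joe";
    std::swap(a, b);                      // one call expresses the swap directly

    std::vector<int> list = {5, 3, 9, 1};
    std::sort(list.begin(), list.end());  // the sorting algorithm's details are hidden

    std::cout << a << ' ' << b << '\n';
    for (int x : list) std::cout << x << ' ';
    std::cout << '\n';
}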
RELIABILITY
Reliability is conformance to specification under all circumstances
without fault.
- Faults (fatal crashes, freezes, or erroneous results; solid,
intermittent, or transient)
- Fault prevention (the goal is to catch most faults before run-time; static
type checking; use of const in C++ to protect data - see the sketch after this list)
- Exception handling (intercept run-time errors and gracefully
recover - not core dump)
- Aliasing (two or more references to the same
memory location; very problematic; cross-linked pointers in C/C++)
- Pointers (generally problematic - buffer overruns, illegal access,
etc.)
- Readability and writability (natural, intuitive ways of expressing
an algorithm)
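A minimal sketch of fault prevention through static type checking and const,
as mentioned in the fault-prevention bullet above (the function and variable
names are invented for illustration):

// Sketch: catching faults at compile time with static typing and const.
double average(const double* data, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        sum += data[i];
        // data[i] = 0.0;       // compile-time error: data points to const-protected values
    }
    return n > 0 ? sum / n : 0.0;
}

int main() {
    double readings[3] = {1.0, 2.0, 3.0};
    average(readings, 3);          // OK
    // average("not numbers", 3);  // compile-time error: static type checking rejects a char*
}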
COST
The total cost of a language throughout its entire
lifecycle, from development through testing and maintenance.
- Initial cost to develop and implement the language (Ada vs. Ruby)
- Cost to train programmers to use language (writability, widely used?)
- Portability (cost to migrate to new OS and/or new hardware platforms)
- Cost to write programs (coding rate; fitness for particular
application)
- Cost to compile, execute and maintain code (proprietary or GPL ?
compiler and runtime license fees; availability and reliability of
free compilers; access to good documentation and support)
- Reliability (poor product reliability leads to high maintenance costs)
OTHER CRITERIA
- Portability (the ease of porting from one platform to
another; somewhat compiler specific; standardized library
routines across multiple hardware platforms - Java)
- Generality (applicability to a wide range of applications
and problem domains - Java, C++, Python)
- Well-definedness (standards; completeness and precision of
the language's official definition)
- Efficiency (compiled / interpreted runtime performance, library size,
dynamic and static memory management,
code optimization capabilities; benchmark testing)
- Diagnosability (ease of testing and finding faults; compiler
and runtime error messages, debugging capabilities; fault
coverage - % of faults found)
- Maintainability (ease of fixing faults and modifying code)
- Dynamic Memory Management (automatic; no segmentation faults
or memory leaks - C/C++ poor)
- Security (protection against unauthorized intrusions or attacks;
ability to prevent illegal code or data modification -
C is poor - the buffer overflow problem; a sketch follows this list)
- Network Support (support for network features - RPCs, concurrency;
ease of writing network applications)
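A minimal sketch of the buffer-overflow problem noted under Security above
(the buffer size and input string are invented for illustration):

// Sketch: C/C++ perform no bounds checking on character buffers.
#include <stdio.h>
#include <string.h>

int main(void) {
    char name[8];                 /* room for 7 characters plus '\0' */
    const char *input = "a string far longer than eight bytes";
    strcpy(name, input);          /* writes past the end of name - undefined behavior */
    printf("%s\n", name);
    return 0;
}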
Prechelt's Comparison Study
- C/C++, Java, Perl, Python, Rexx, and Tcl
- experiment uses a group of programmers to reduce variation
among individual programmers
- same set of requirements implemented in each language
- Prechelt's criteria: runtime efficiency, memory consumption, source text
length, comment density, program structure, reliability, amount of
effort to write the program
- LOC/hr (coding rate) is a poor metric for measuring the writability of a
language - coding rate is more dependent on the programmer and less
dependent on the language; good coders write code faster regardless of
the language.
- scripting languages are more productive than conventional languages
- "differences between languages tend to be smaller than differences
between programmers.."; poor code can be written in any language
int things = 0;
const int sTuff = 5;
int Stuff[sTuff], stufF[sTuff], STUFF[sTuff];
for (int stuFf = 0; stuFf < sTuff; stuFf++){
for (int stuFF = 0; stuFF < stuFf; stuFF++){
things += (Stuff[stuFf] * stufF[stuFF]) + (STUFF[stuFF] * stufF[stuFf]); } }
l a n g u a g e
d e s i g n
Influences on Language Design
- #1: Computer Architecture (imperative languages model von Neumann
architecture - variables model memory cells; assignment statements
model memory storage; iteration most common control mechanism)
- parallelism attempts to overcome the von Neumann bottleneck; the 5-stage pipeline -
Instruction Fetch, Instruction Decode, EXecute, MEMory access, Write Back - and
the use of cache have not significantly impacted language design
- High-level Programming Methodology - functional,
object oriented, structured programming - has only recently impacted language
design
Evolution of Language Design
- 1950s and early 1960s: Simple applications; maximize machine efficiency
- Late 1960s - early 1970s: need to maximize programmer efficiency -
readability, better control structures; structured programming:
top-down design and step-wise refinement; added stronger type
checking and sophisticated control structures
- Late 1970s: move from process-oriented to data-oriented - data abstraction
- Middle 1980s: Object-oriented programming
Data abstraction + inheritance + polymorphism
Language Classifications
Programming languages are sometimes categorized into "generations":
- first generation - machine code; second generation - assembly;
- third generation - C, Fortran; fourth generation - C++, Java;
- fifth generation - logic and constraint languages such as Prolog
Languages are more importantly
classified into one of several primary design paradigms:
- Imperative (Central features are variables, assignment statements, and
iteration;
Statements are executed sequentially; a program evolves from an
initial state to a final state through a series of
state changes;
Criticism: von Neumann bottleneck
Examples: everything but logic and functional languages)
- Declarative (the opposite of imperative - any language that
does NOT follow a step-by-step algorithm; examples are logic
and functional languages)
- Functional
(No traditional variables, no assignment statements, no iteration;
Main means of making computations is by applying functions to given
parameters;
No side effects - a side effect is any change to the state of the program;
Examples: Lisp, Scheme)
- Procedural (A paradigm based on the concept of a procedure call
-- AKA: subprogram, subroutine, function, method, routine;
structured, modular, scoping (!! reduces spaghetti code !!);
All modern imperative languages are procedural - early Basic and
Fortran were not procedural - hence the term is often replaced with
imperative)
- Logic (Based on symbolic logic and logical inferencing;
Data are facts;
Rule-based - rules are specified in no particular order;
no assignment statement:
z = y + 3 ;
does not store a value in the variable z, but means "z is 3 plus the
value of y" ;
Examples: Prolog and XSLT)
- Object-oriented
(Extends the procedural paradigm; The focus is not on process but
on data and the operations performed on the data;
Classes, objects, and data; Supports
abstraction, encapsulation, inheritance, late binding;
allows loosely coupled components (i.e., ravioli code rather than
spaghetti code);
Java is object-oriented only - C++ is procedural and object-oriented;
Examples: Java, C++, C#, Delphi)
- Scripting ("A slick solution to a difficult problem";
Reduces the complexity and time to solve a problem;
Targets a particular environment;
Interpreted;
Examples: perl, JavaScript, PHP, python, sh, awk)
- Markup (not a programming language per se, but used to specify the layout
of information in Web documents;
Minimal control features;
Examples: CSS, XHTML, XML)
- Multi-Paradigm
(Many modern languages are multi-paradigm - C++, Python, Ruby, Perl,
Scheme, F# - functional, OO)
Language Design Trade-Offs
- Reliability at the expense of cost (improving reliability will increase
development costs but reduce maintenance costs;
Example: Java demands all references to array elements be checked for
proper indexing, but that leads to increased execution costs - see the
sketch after this list)
- Writability at the expense of readability (Examples:
APL provides many powerful operators and a large number of
new symbols - allowing complex computations to be written in a
compact program, but with poor readability;
using the same closing symbol or keyword for multiple control
statements is easy to write but difficult to read -
determining what is terminated requires searching
previous code; Lisp is easy to write but more difficult to read)
- Writability at the expense of reliability (C's pointers are powerful and
flexible but not reliably used)
- Cost at the expense of readability (profit puts
tremendous pressure to get software to market with the
testing phase left to the end user; e.g., at release Windows 2000
had 64,000 documented errors, 32,000 of which were problematic;
the outcome is sloppy and difficult-to-maintain code)
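A hedged C++ analogue of the Java bounds-checking example from the first
trade-off above: std::vector offers both unchecked and checked element access,
so the reliability / execution-cost trade-off is visible in a single program.

// Sketch: checked vs. unchecked access - reliability traded against execution cost.
#include <iostream>
#include <stdexcept>
#include <vector>

int main() {
    std::vector<int> v = {10, 20, 30};

    // int a = v[10];        // unchecked access: no runtime cost, silently undefined behavior

    try {
        int b = v.at(10);    // checked access: every call pays for the bounds test
        (void)b;
    } catch (const std::out_of_range& e) {
        std::cerr << "index error: " << e.what() << '\n';
    }
}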
i m p l e m e n t a t i o n
Three Methods
- Compilation (source code is translated into machine language by
a compiler to produce an executable; slow translation,
fast execution)
- Interpretation (source code is translated and executed line by
line by an interpreter at runtime)
- Hybrid Implementation Systems (part compilation / part interpretation,
or the option to do one or the other)
Preprocessors
- Used in both compilation and interpretation
- Preprocessor macros (instructions) are commonly used to specify that code
from another file is to be included
- A preprocessor processes a program immediately before the program is
compiled to expand embedded preprocessor macros
- A well-known example: the C preprocessor
expands #include, #define, and similar macros (see the sketch below)
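A minimal sketch of that expansion (the macro names are invented; only the
behavior of #include and #define is assumed):

/* Sketch: what the C preprocessor does before compilation. */
#include <stdio.h>           /* textually inserts the contents of stdio.h here */

#define PI 3.14159           /* object-like macro */
#define SQUARE(x) ((x)*(x))  /* function-like macro - expanded by text substitution */

int main(void) {
    double r = 2.0;
    /* after preprocessing, the next line reads: double area = 3.14159 * ((r)*(r)); */
    double area = PI * SQUARE(r);
    printf("%f\n", area);
    return 0;
}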
Compilation and Execution Phases
- Lexical analysis (converts characters in the source program into lexical
units called lexemes)
- Syntax analysis (transform lexical units into parse trees to
represent the syntactic structure of program - multiple
passes - create symbol table for each compilation unit)
- Semantic analysis (check static semantics; generate intermediate code)
- Code generation (translate intermediate code into machine code)
- Linking (collecting system programs and linking them to user programs -
resolving symbols across compilation units)
- Load module (load executable image into memory)
Pure Interpretation
- No translation into loadable machine code
- Easier implementation of programs (run-time errors can be easily and
immediately displayed)
- Slower execution (10 to 100 times slower than compiled programs)
- Often requires more space
- Programmers are now often given the choice to compile or interpret
- Significant comeback with Web scripting languages (JavaScript, PHP)
Hybrid Implementation Systems
- A compromise between compilers and pure interpreters
- A high-level language program is translated to an intermediate language
that allows easy interpretation
- Faster than pure interpretation
- Perl programs are partially compiled to detect errors before
interpretation
- Initial implementations of Java were hybrid; the intermediate form, byte
code, provides portability to any machine that has a byte code
interpreter and a run-time system (together, these are called Java
Virtual Machine)
- Just-in-Time (JIT) hybrid implementation process
- Initially translate programs to an intermediate language then compile
intermediate language into machine code;
Machine code version is kept for subsequent calls
- JIT widely used for Java programs; .NET languages implemented as JIT system
Development Environments
- a collection of tools vs. integrated development environment (IDE)
- Unix (a collection of separate command-line tools - vi, cc, gcc, make,
tar, gzip, ... - not integrated but similar across all Unix platforms)
- Borland JBuilder (IDE for Java)
- Microsoft Visual Studio.NET (A large, complex GUI IDE
for C#, Visual BASIC.NET, Jscript, J#, or C++)
t e r m i n o l o g y
- Programming methodology
- defines the overall
approach a programmer takes to write software in
a programming language. Programming methodology is the theoretical
model that underlies language design.
New methodologies (e.g., object-oriented)
result in new language designs.
- Von Neumann architecture
- the prevalent design of modern computer
systems. In this design data and programs are stored in memory, memory
is separate from the CPU, instructions and data are piped from memory to
the CPU. Execution of machine code on a von Neumann architecture follows
this fetch-execute cycle:
initialize program counter (PC) with address of first instruction
repeat forever
fetch the instruction in PC
increment PC
decode instruction
execute instruction
end repeat
- Von Neumann bottleneck
- this is the primary limiting factor in the
speed of computers today (not processor speed).
The bottleneck occurs because the
connection speed between memory and the processor
is slower than the speed at which instructions
can be executed by the CPU. Parallelism and the use of
cache attempt to solve this problem.
- side effect
- is any modification to the state of a running program.
Side effects may be intentional
(imperative languages depend on side effects in the form of mutable data and
changes to input and output) or unintentional
(generally a fault). A purely functional language has no
side effects. Functional programs will behave the same in any context and
can be executed in parallel without interference. Such programs
are easily verified and optimized.
This is a big advantage and may outweigh the limitations of a
functional language for some applications.
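A small sketch contrasting a function with side effects against a pure one
(the function names and the global variable are invented for illustration):

// Sketch: side effects vs. a pure function.
#include <iostream>

int total = 0;                     // global state

void add_with_side_effect(int x) {
    total += x;                    // side effect: modifies state outside the function
    std::cout << total << '\n';    // side effect: performs output
}

int add_pure(int a, int b) {
    return a + b;                  // no side effects: result depends only on the arguments
}

int main() {
    add_with_side_effect(5);       // behavior depends on (and changes) the global total
    int sum = add_pure(2, 3);      // same arguments always give the same result
    std::cout << sum << '\n';
}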
- pointer
- a reference (address) to a memory location. In C a
pointer is a primitive data type that stores a memory address.
(see code)
int * stuff;                       // stuff can hold the address of an int
int num = 5;
stuff = &num;                      // stuff now points to num
printf("%p %d\n", (void *)stuff, *stuff);
stuff++;                           // what does this do?
stuff = stuff/2;                   // pointer division is not legal - will not compile
- alias
- is two or more references to the same memory location.
Pointers can be used as aliases in C/C++, which most consider
to be the language's biggest liability. Aliasing can
violate both reliability and readability. Assume you have a class Student
that uses dynamic memory allocation for the student's
name:
char * name;
If an overloaded assignment operator is not coded, these statements will produce
cross-linked pointers:
Student a("Sam Spade");
Student b("Joe Smoo");
a = b; // two pointers are now pointing to the same memory location
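A minimal sketch of the usual remedy, assuming Student stores its name in a
char * member as above (the copy constructor is omitted for brevity): an
overloaded assignment operator that deep-copies the string, so a = b no longer
leaves two objects pointing at the same memory.

// Sketch: deep-copy assignment operator prevents cross-linked pointers.
#include <cstring>

class Student {
    char * name;
public:
    Student(const char * n) : name(new char[std::strlen(n) + 1]) {
        std::strcpy(name, n);
    }
    ~Student() { delete [] name; }
    Student& operator=(const Student& other) {
        if (this != &other) {              // guard against self-assignment
            delete [] name;                // release the old string
            name = new char[std::strlen(other.name) + 1];
            std::strcpy(name, other.name); // copy the characters, not the pointer
        }
        return *this;
    }
};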
- orthogonality
- is non-interference; e.g., orthogonal vectors.
Orthogonality in a computer
instruction set means that all instructions can be uniquely combined with all
registers and addressing modes. In high-level languages, orthogonality
means that language primitives can be consistently combined
without exception.
Non-orthogonality makes a language harder to learn.
Complete orthogonality is impossible to achieve unless the language is so
simple as to be useless.
Orthogonality requires a small set of primitives--the
larger the set, the more difficult it is to maintain orthogonality.
For example, the primitive constructs in C/C++ include arithmetic operators
(+, -, *, /) and scalar data types int, float, double, and pointer.
In a completely orthogonal language it should be feasible to combine
all operators with all data types.
C/C++ is not orthogonal because
arithmetic operators do not consistently work on pointers.
int a = 5 ;
int b = 10;
int c = 15 ;
int * aptr = & a;
int * bptr = & b;
int * cptr = & c;
aptr++; // legal but does not behave like increment on integers - advances by sizeof(int)
a = b * c; // OK
aptr = bptr; // OK
aptr = bptr + cptr; // illegal
aptr = bptr * cptr; // illegal
C++ overloaded << and >> operators are non-orthogonal:
they can mean bit shifting or output/input depending on the context
(Stroustrup, what were you thinking).
Other examples of non-orthogonality in C:
1. C has two built-in data structures, arrays and records (structs).
Structs can be returned from functions but arrays cannot.
2. A member of a struct can have any type except void or a structure of the
same type.
3. An array element can be any data type except void or a function.
4. Parameters are passed by value, unless they are arrays, in which case
they are passed by reference.
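A small sketch illustrating points 1 and 4 above (the struct, function, and
variable names are invented):

/* Sketch: structs are copied by value, arrays are not. */
#include <stdio.h>

struct Point { int x, y; };

struct Point make_point(void) {       /* a struct CAN be returned by value */
    struct Point p = {1, 2};
    return p;
}

void touch(struct Point p, int arr[]) {
    p.x = 99;       /* struct parameter: a copy - the caller's Point is unchanged */
    arr[0] = 99;    /* array parameter: decays to a pointer - the caller's array IS changed */
}

int main(void) {
    struct Point pt = make_point();
    int nums[3] = {1, 2, 3};
    touch(pt, nums);
    printf("%d %d\n", pt.x, nums[0]);   /* prints: 1 99 */
    return 0;
}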
- scalar
- a data type that cannot be divided into other data types.
Examples of scalars are int, char, double, char *. Scalars are addressed by
a single memory address. Examples of non-scalars
are arrays, classes, and records. Non-scalars are addressable by more than
one memory address. In the beginning, all data were scalars - arrays were
added next.
Chapter 14.1 - 14.4: E X C E P T I O N H A N D L I N G
The concepts in this chapter
will be investigated hands-on in this week's lab.
Basic Concepts
An exception is an unusual run-time event of some kind, generally an error.
Without exception handling, when
an exception occurs, control is passed to the kernel and the
program terminates.
With exception handling the programmer can trap the exception and gracefully
terminate or continue.
Processing the exception is called *exception handling*
Many languages allow programs to trap input/output errors (including EOF).
Most common exceptions: divide by zero, illegal memory access (dereferencing
a bad pointer, array out-of-bounds), file open errors, input errors (reading
float into char), reading past EOF
An exception is *raised* when its associated event occurs.
The exception handling code unit is called an *exception handler*.
User-defined Exception Handling
A language that does not have exception handling built-in can still
define, detect, raise, and handle exceptions
Pass an argument (flag or error msg) to an error-handling routine
or return a flag (boolean, int) - see the sketch below
Have some mechanism for interpreting what the return flag means (see
Unix strcmp)
Give exception handling utilities global scope
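A hedged sketch of the return-flag approach in C (the function and error codes
are invented for illustration):

/* Sketch: user-defined error handling with return flags. */
#include <stdio.h>

#define OK            0
#define ERR_DIV_ZERO  1          /* the caller must know how to interpret the flags */

int divide(double x, double y, double *result) {
    if (y == 0.0)
        return ERR_DIV_ZERO;     /* "raise" the exception by returning a flag */
    *result = x / y;
    return OK;
}

int main(void) {
    double q;
    if (divide(10.0, 0.0, &q) != OK)        /* "handle" the exception */
        fprintf(stderr, "error: divide by zero\n");
    return 0;
}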
Advantages of Built-in Exception Handling in C++
Added to C++ in 1990 (Design based on CLU, Ada, and ML)
Code reusability, uniformity, promotes readability
Error detection code is tedious to write and clutters the program
Allows you to rethrow back to main, unwind the runtime stack and perform
cleanup (very difficult to code otherwise)
Saves the return value for something other than error handling
Supports dynamic memory management if objects are allocated in a try block
since destructors are called upon exiting the block
Increases chances that a program can recover without a complete crash
Things you should know about exception handling in any language
- How is an exception occurrence bound to an exception handler?
- How and where are exception handlers specified and what is their scope?
- Are there any built-in exceptions?
- Are default exception handlers used if exception handling is not
explicitly coded?
- Can built-in exceptions be explicitly raised?
- Are hardware-detectable errors treated as exceptions that can be handled?
- How can exceptions be disabled, if at all?
- How does exception handling control flow work?
- Where does execution continue, if at all, after an exception handler
completes its execution?
Exception Handling in C++
Basic syntax:
try {
throw
}
catch (formal parameter) {
throw // optional re-throw will propagate through the runtime stack
}
catch (...) { // a generic handler
}
Specific example:
const int DivideByZero = 10;
//....
double divide(double x, double y) {
if(y==0) throw DivideByZero;
return x/y;
}
///...
try {
divide(10, 0);
}
catch(int i) {
if (i==DivideByZero) cerr << "Divide by zero error";
}
// example of error objects
class DivideByZero
{
public:
double divisor;
DivideByZero(double x);
};
DivideByZero::DivideByZero(double x) : divisor(x)
{}
int divide(int x, int y)
{
   if(y==0)
   {
      throw DivideByZero(x);
   }
   return x/y;
}
try
{
divide(12, 0);
}
catch (DivideByZero divZero)
{
cerr<<"Attempted to divide "<<divZero.divisor<<" by zero";
}
Notes on C++ facility:
- catch function is overloaded--formal parameter of each catch must be unique
The formal parameter need not have a variable
It can be simply a type name to distinguish the handler it is in from others
The formal parameter can be used to transfer information to the handler
If formal parameter is an ellipsis it handles all exceptions not yet handled
- A throw without an operand can only appear in a catch block; when it
appears it re-throws the exception to the next available handler by unwinding
the runtime stack (see the sketch after these notes)
- An unhandled exception is thrown to every function on the runtime stack
until it is finally thrown to main; if no handler is found in main the program
terminates with an error message (defeating the purpose of exception handling)
- After a handler completes its execution, control flows to the first statement
after the last handler in the sequence of handlers of which it is an element
- Exceptions are not named; hardware and system software-detectable
exceptions cannot be handled
- Exceptions are bound to handlers through the type of the parameter
(does not promote readability)
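A minimal sketch of re-throw and stack unwinding, referenced in the notes above
(the function names are invented): the intermediate handler does its cleanup
and passes the same exception up the runtime stack.

// Sketch: re-throw with "throw;" unwinds the stack to the next handler.
#include <iostream>
#include <stdexcept>

void inner() {
    throw std::runtime_error("file not found");
}

void middle() {
    try {
        inner();
    } catch (...) {                 // generic handler
        std::cerr << "middle: cleaning up, then re-throwing\n";
        throw;                      // re-throw the same exception to the next handler
    }
}

int main() {
    try {
        middle();
    } catch (const std::runtime_error& e) {
        std::cerr << "main: handled - " << e.what() << '\n';
    }
    // execution continues here, after the last handler in the sequence
}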