It is essential for anyone who wants to be considered a professional in the
areas of software to know several languages and several programming paradigms
. . . it's not a good idea to know just C++, let alone to know just a
single-paradigm language. Much of the inspiration in good programming comes
from having learned and appreciated several programming styles.
--Bjarne Stroustrup, C++ author
Chapter 1: P R O G R A M M I N G L A N G U A G E S C O N C E P T S
This course covers the concepts of high-level programming languages with a
focus on design criteria and evaluation.
The best way to learn a language concept is to code in it.
Topics covered: - syntax & semantics
- variables, identifiers, scope and lifetime
- data types
- expressions and assignment statements
- control statements
- functions, subroutines & parameter passing
- object oriented constructs
- implementation
Why learn programming language concepts?
- understand programming language evolution
- be a better coder
- evaluate and select best language for a particular task
- learn new languages more quickly
- better understand language implementation
Programming languages are designed to solve problems in a particular
programming domain.
- Scientific applications
(large number of floating point computations - Fortran)
- Business applications
(reports, decimal numbers and characters - COBOL)
- Artificial intelligence (symbols rather than numbers - Lisp and Prolog)
- Systems programming (efficient code, OS kernels, device drivers - C)
- Text processing (process line by line, manipulate strings,
pattern matching - SNOBOL and awk)
- Scripting (quick, easy - Perl, Python, Ruby)
- Web (client-side scripting - JavaScript; markup - XHTML,
server-side scripting - PHP, Python)
- General Purpose (C++, Java, Visual Basic, Python)
- Educational (Pascal, Scheme)
l a n g u a g e
e v a l u a t i o n
c r i t e r i a
What makes a language good? Or better than another language?
bubble-sort example
READABILITY
Readability is the ease with which programs can be read and understood
given the provisions of the language.
Given well-written source how easy is it to understand the meaning
of the code?
- Overall simplicity - support for static modularity and
encapsulation via enforcement of scope and lifetime rules;
restrictions on global variables;
a manageable set of features and constructs -- too many features mean
programmers learn different subsets of the language, reducing
readability;
little feature multiplicity -- the number of ways of doing the same
operation; minimal operator overloading --
user-defined overloading can reduce readability
- Orthogonality - having a relatively small set of primitives and
a consistent way of combining them - where every possible
combination is legal with no exceptions
- Control statements - support for all widely used control structures -
while, if, for, repeat, etc.
- Data types and structures - support for data types that improve
readability: boolean, strings, decimal, associative arrays, etc.
- Syntax - support for expressive identifiers; a consistent method for
ending compound statements - unlike C's "{ }", a rich set of
special words such as Ada's "end if" and "end loop", as in this Ada code:
begin
   if Container'Length > 0 then
      loop
         Mid := (Low + High) / 2;
         if Value < Container (Mid) then
            exit when Low = Mid;
            High := Mid - 1;
         elsif Container (Mid) < Value then
            exit when High = Mid;
            Low := Mid + 1;
         else
            return Mid;
         end if;
      end loop;
   end if;
   raise Not_Found;
end Search;
- Form and meaning (consistent constructs;
context-free keywords - unlike "static" in C; support for programming
conventions; forced indentation as in Python rather than
optional as in obfuscated C)
# In Python, code blocks are defined by ':' and indentation:
# consistent indentation is MANDATORY
def sort(list):
    for first in range(len(list)):
        minimum = first
        for index in range(first, len(list)):
            if list[minimum] > list[index]:
                minimum = index
        temp = list[first]
        list[first] = list[minimum]
        list[minimum] = temp
// code in C without indentation will compile but good luck reading it
void sort(int list[], int size){
int first, minimum, temp, index;
for (first = 0; first < size; first++){
minimum = first;
for (index = first; index < size; index++)
if (list[minimum] > list[index])
minimum = index; temp = list[first]; list[first] = list[minimum];
list[minimum] = temp; } }
Some languages are easy to write but difficult to read. Regexes (a language
albeit not a programming language) demonstrate this problem:
^.*[0-9a-z]+?\([a-zA-Z]*\)$ # difficult to proofread and debug
Lisp is equally easy to write but difficult to read (from H2wUc):
(lambda (*<8-]= *<8-[= ) (or *<8-]= *<8-[= ))
(defun :-] (<) (= < 2))
(defun !(!)(if(and(funcall(lambda(!)(if(and '(< 0)(< ! 2))1 nil))(1+ !))
(not(null '(lambda(!)(if(< 1 !)t nil)))))1(* !(!(1- !)))))
WRITABILITY
Writability is a measure of coding efficiency: the ease with which
a coder can learn the language well enough to write code, and how long it
takes to write code once the language is known.
- A language can be easy to read but difficult to write or easy to write but difficult to read
- Related to the number of keywords in the language (too many keywords
means the language may be more difficult to learn, but too few means the
language is not expressive); C is notoriously lean - ANSI C has only 32 keywords
- Code reusability and extensibility - reusing code means quicker coding
- Simplicity and orthogonality (few constructs, small number of
primitives, small set of rules for combining them with no exceptions;
non-orthogonality results in exceptions to the rules)
- Support for abstraction - the ability to define and use complex structures
for operations that allow details to be ignored - C++ String and
STL classes, Perl's array operations
- Expressiveness - a set of relatively convenient ways of specifying
operations - the ability to express conceptual abstractions directly
and simply; Ex. Perl's list assignment ($a, $b) = ($b, $a); swaps two
variables in one statement (see the C++ sketch below)
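As a rough illustration of abstraction and expressiveness (a sketch using only
standard C++ library facilities; the variable names are invented):

// Sketch: abstraction and expressiveness with the C++ standard library.
#include <algorithm>   // std::sort
#include <iostream>
#include <string>
#include <utility>     // std::swap
#include <vector>

int main() {
    std::string a = "Sam", b = "Joe";
    std::swap(a, b);                      // one call expresses the swap directly

    std::vector<int> list = {5, 3, 9, 1};
    std::sort(list.begin(), list.end());  // the sorting algorithm's details are hidden

    std::cout << a << ' ' << b << '\n';
    for (int x : list) std::cout << x << ' ';
    std::cout << '\n';
}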
RELIABILITY
Reliability is conformance to specification under all circumstances
without fault.
- Faults (fatal crashes, freezes, or erroneous results; solid,
intermittent, or transient)
- Fault prevention (the goal is to catch most faults before run-time; static
type checking; use of const in C++ to protect data - see the sketch after this list)
- Exception handling (intercept run-time errors and gracefully
recover - not core dump)
- Aliasing (two or more references to the same
memory location; very problematic; cross-linked pointers in C/C++)
- Pointers (generally problematic - buffer overruns, illegal access,
etc.)
- Readability and writability (natural, intuitive ways of expressing
an algorithm)
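A minimal sketch of fault prevention through static type checking and const,
as mentioned in the fault-prevention bullet above (the function and variable
names are invented for illustration):

// Sketch: catching faults at compile time with static typing and const.
double average(const double* data, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        sum += data[i];
        // data[i] = 0.0;       // compile-time error: data points to const-protected values
    }
    return n > 0 ? sum / n : 0.0;
}

int main() {
    double readings[3] = {1.0, 2.0, 3.0};
    average(readings, 3);          // OK
    // average("not numbers", 3);  // compile-time error: static type checking rejects a char*
}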
COST
The total cost of a language throughout its entire
lifecycle, from development through testing and maintenance.
- Initial cost to develop and implement the language (Ada vs. Ruby)
- Cost to train programmers to use language (writability, widely used?)
- Portability (cost to migrate to new OS and/or new hardware platforms)
- Cost to write programs (coding rate; fitness for particular
application)
- Cost to compile, execute and maintain code (proprietary or GPL ?
compiler and runtime license fees; availability and reliability of
free compilers; access to good documentation and support)
- Reliability (poor product reliability leads to high maintenance costs)
OTHER CRITERIA
- Portability (the ease of porting from one platform to
another; somewhat compiler specific; standardized library
routines across multiple hardware platforms - Java)
- Generality (applicability to a wide range of applications
and problem domains - Java, C++, Python)
- Well-definedness (standards; completeness and precision of
the language's official definition)
- Efficiency (compiled / interpreted runtime performance, library size,
dynamic and static memory management,
code optimization capabilities; benchmark testing)
- Diagnosability (ease of testing and finding faults; compiler
and runtime error messages, debugging capabilities; fault
coverage - % of faults found)
- Maintainability (ease of fixing faults and modifying code)
- Dynamic Memory Management (automatic; no segmentation faults
or memory leaks - C/C++ poor)
- Security (protection against unauthorized intrusions or attacks;
ability to prevent illegal code or data modification -
C is poor - the buffer overflow problem; a sketch follows this list)
- Network Support (support for network features - RPCs, concurrency;
ease of writing network applications)
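A minimal sketch of the buffer-overflow problem noted under Security above
(the buffer size and input string are invented for illustration):

// Sketch: C/C++ perform no bounds checking on character buffers.
#include <stdio.h>
#include <string.h>

int main(void) {
    char name[8];                 /* room for 7 characters plus '\0' */
    const char *input = "a string far longer than eight bytes";
    strcpy(name, input);          /* writes past the end of name - undefined behavior */
    printf("%s\n", name);
    return 0;
}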
Prechelt's Comparison Study
- C/C++, Java, Perl, Python, Rexx, and Tcl
- experiment uses a group of programmers to reduce variation
among individual programmers
- same set of requirements implemented in each language
- Prechelt's criteria: runtime efficiency, memory consumption, source text
length, comment density, program structure, reliability, amount of
effort to write the program
- LOC/hr (coding rate) is a poor metric for measuring the writability of a
language - coding rate is more dependent on the programmer and less
dependent on the language; good coders write code faster regardless of
the language.
- scripting languages are more productive than conventional languages
- "differences between languages tend to be smaller than differences
between programmers.."; poor code can be written in any language
int things = 0;
const int sTuff = 5;
int Stuff[sTuff], stufF[sTuff], STUFF[sTuff];
for (int stuFf = 0; stuFf < sTuff; stuFf++){
for (int stuFF = 0; stuFF < stuFf; stuFF++){
things += (Stuff[stuFf] * stufF[stuFF]) + (STUFF[stuFF] * stufF[stuFf]); } }
l a n g u a g e
d e s i g n
Influences on Language Design
- #1: Computer Architecture (imperative languages model von Neumann
architecture - variables model memory cells; assignment statements
model memory storage; iteration most common control mechanism)
- parallelism attempts to overcome the von Neumann bottleneck; the 5-stage pipeline -
Instruction Fetch, Instruction Decode, EXecute, MEMory access, Write Back - and
the use of cache have not significantly impacted language design
- High-level Programming Methodology - functional,
object oriented, structured programming - has only recently impacted language
design
Evolution of Language Design
- 1950s and early 1960s: Simple applications; maximize machine efficiency
- Late 1960s - early 1970s: need to maximize programmer efficiency -
readability, better control structures; structured programming:
top-down design and step-wise refinement; added stronger type
checking and sophisticated control structures
- Late 1970s: move from process-oriented to data-oriented - data abstraction
- Middle 1980s: Object-oriented programming
Data abstraction + inheritance + polymorphism
Language Classifications
Programming languages are sometimes categorized into "generations":
- first generation - machine code; second generation - assembly;
- third generation - C, Fortran; fourth generation - C++, Java;
- fifth generation - logic and constraint languages such as Prolog
Languages are more importantly
classified into one of several primary design paradigms:
- Imperative (Central features are variables, assignment statements, and
iteration;
Statements are executed sequentially; a program evolves from an
initial state to a final state through a series of
state changes;
Criticism: von Neumann bottleneck
Examples: everything but logic and functional languages)
- Declarative (the opposite of imperative - any language that
does NOT follow a step-by-step algorithm; examples are logic
and functional languages)
- Functional
(No traditional variables, no assignment statements, no iteration;
Main means of making computations is by applying functions to given
parameters;
No side effects - a side effect is any change to the state of the program;
Examples: Lisp, Scheme)
- Procedural (A paradigm based on the concept of a procedure call
-- AKA: subprogram, subroutine, function, method, routine;
structured, modular, scoping (!! reduces spaghetti code !!);
All modern imperative languages are procedural - early Basic and
Fortran were not procedural - hence the term is often replaced with
imperative)
- Logic (Based on symbolic logic and logical inferencing;
Data are facts;
Rule-based - rules are specified in no particular order;
no assignment statement:
z = y + 3 ;
does not store a value in the variable z, but means "z is 3 plus the
value of y" ;
Examples: Prolog and XSLT)
- Object-oriented
(Extends the procedural paradigm; The focus is not on process but
on data and the operations performed on the data;
Classes, objects, and data; Supports
abstraction, encapsulation, inheritance, late binding;
allows loosely coupled components (i.e., ravioli code rather than
spaghetti code);
Java is object-oriented only - C++ is procedural and object-oriented;
Examples: Java, C++, C#, Delphi)
- Scripting ("A slick solution to a difficult problem";
Reduces the complexity and time to solve a problem;
Targets a particular environment;
Interpreted;
Examples: perl, JavaScript, PHP, python, sh, awk)
- Markup (not a programming language per se, but used to specify the layout
of information in Web documents;
Minimal control features;
Examples: CSS, XHTML, XML)
- Multi-Paradigm
(Many modern languages are multi-paradigm - C++, Python, Ruby, Perl,
Scheme, F# - functional, OO)
Language Design Trade-Offs
- Reliability at the expense of cost (improving reliability will increase
development costs but reduce maintenance costs;
Example: Java demands all references to array elements be checked for
proper indexing, but that leads to increased execution costs - see the
sketch after this list)
- Writability at the expense of readability (Examples:
APL provides many powerful operators and a large number of
new symbols - allowing complex computations to be written in a
compact program, but with poor readability;
using the same closing symbol or keyword for multiple control
statements is easy to write but difficult to read -
determining what is terminated requires searching
previous code; Lisp is easy to write but more difficult to read)
- Writability at the expense of reliability (C's pointers are powerful and
flexible but not reliably used)
- Cost at the expense of readability (profit puts
tremendous pressure to get software to market with the
testing phase left to the end user; e.g., at release Windows 2000
had 64,000 documented errors, 32,000 of which were problematic;
the outcome is sloppy and difficult-to-maintain code)
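A hedged C++ analogue of the Java bounds-checking example from the first
trade-off above: std::vector offers both unchecked and checked element access,
so the reliability / execution-cost trade-off is visible in a single program.

// Sketch: checked vs. unchecked access - reliability traded against execution cost.
#include <iostream>
#include <stdexcept>
#include <vector>

int main() {
    std::vector<int> v = {10, 20, 30};

    // int a = v[10];        // unchecked access: no runtime cost, silently undefined behavior

    try {
        int b = v.at(10);    // checked access: every call pays for the bounds test
        (void)b;
    } catch (const std::out_of_range& e) {
        std::cerr << "index error: " << e.what() << '\n';
    }
}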
i m p l e m e n t a t i o n
Three Methods
- Compilation (source code is translated into machine language by
a compiler to produce an executable; slow translation,
fast execution)
- Interpretation (source code is translated and executed line by
line by an interpreter at runtime)
- Hybrid Implementation Systems (part compilation / part interpretation,
or the option to do one or the other)
Preprocessors
- Used in both compilation and interpretation
- Preprocessor macros (instructions) are commonly used to specify that code
from another file is to be included
- A preprocessor processes a program immediately before the program is
compiled to expand embedded preprocessor macros
- A well-known example: the C preprocessor
expands #include, #define, and similar macros (see the sketch below)
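A minimal sketch of that expansion (the macro names are invented; only the
behavior of #include and #define is assumed):

/* Sketch: what the C preprocessor does before compilation. */
#include <stdio.h>           /* textually inserts the contents of stdio.h here */

#define PI 3.14159           /* object-like macro */
#define SQUARE(x) ((x)*(x))  /* function-like macro - expanded by text substitution */

int main(void) {
    double r = 2.0;
    /* after preprocessing, the next line reads: double area = 3.14159 * ((r)*(r)); */
    double area = PI * SQUARE(r);
    printf("%f\n", area);
    return 0;
}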
Compilation and Execution Phases
- Lexical analysis (converts characters in the source program into lexical
units called lexemes)
- Syntax analysis (transform lexical units into parse trees to
represent the syntactic structure of program - multiple
passes - create symbol table for each compilation unit)
- Semantic analysis (check static semantics; generate intermediate code)
- Code generation (translate intermediate code into machine code)
- Linking (collecting system programs and linking them to user programs -
resolving symbols across compilation units)
- Load module (load executable image into memory)
Pure Interpretation
- No translation into loadable machine code
- Easier implementation of programs (run-time errors can be easily and
immediately displayed)
- Slower execution (10 to 100 times slower than compiled programs)
- Often requires more space
- Programmers are now often given the choice to compile or interpret
- Significant comeback with Web scripting languages (JavaScript, PHP)
Hybrid Implementation Systems
- A compromise between compilers and pure interpreters
- A high-level language program is translated to an intermediate language
that allows easy interpretation
- Faster than pure interpretation
- Perl programs are partially compiled to detect errors before
interpretation
- Initial implementations of Java were hybrid; the intermediate form, byte
code, provides portability to any machine that has a byte code
interpreter and a run-time system (together, these are called Java
Virtual Machine)
- Just-in-Time (JIT) hybrid implementation process
- Initially translate programs to an intermediate language then compile
intermediate language into machine code;
Machine code version is kept for subsequent calls
- JIT widely used for Java programs; .NET languages implemented as JIT system
Development Environments
- a collection of tools vs. integrated development environment (IDE)
- Unix (a collection of separate command-line tools - vi, cc, gcc, make,
tar, gzip, ... - not integrated but similar across all Unix platforms)
- Borland JBuilder (IDE for Java)
- Microsoft Visual Studio.NET (A large, complex GUI IDE
for C#, Visual BASIC.NET, Jscript, J#, or C++)
t e r m i n o l o g y
- Programming methodology
- defines the overall
approach a programmer takes to write software in
a programming language. Programming methodology is the theoretical
model that underlies language design.
New methodologies (e.g., object-oriented)
result in new language designs.
- Von Neumann architecture
- the prevalent design of modern computer
systems. In this design data and programs are stored in memory, memory
is separate from the CPU, instructions and data are piped from memory to
the CPU. Execution of machine code on a von Neumann architecture follows
this fetch-execute cycle:
initialize program counter (PC) with address of first instruction
repeat forever
fetch the instruction in PC
increment PC
decode instruction
execute instruction
end repeat
- Von Neumann bottleneck
- this is the primary limiting factor in the
speed of computers today (not processor speed).
The bottleneck occurs because the
connection speed between memory and the processor
is slower than the speed at which instructions
can be executed by the CPU. Parallelism and the use of
cache attempt to solve this problem.
- side effect
- is any modification to the state of a running program.
Side effects may be intentional
(imperative languages depend on side effects in the form of mutable data and
changes to input and output) or unintentional
(generally a fault). A purely functional language has no
side effects. Functional programs will behave the same in any context and
can be executed in parallel without interference. Such programs
are easily verified and optimized.
This is a big advantage and may outweigh the limitations of a
functional language for some applications.
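A small sketch contrasting a function with side effects against a pure one
(the function names and the global variable are invented for illustration):

// Sketch: side effects vs. a pure function.
#include <iostream>

int total = 0;                     // global state

void add_with_side_effect(int x) {
    total += x;                    // side effect: modifies state outside the function
    std::cout << total << '\n';    // side effect: performs output
}

int add_pure(int a, int b) {
    return a + b;                  // no side effects: result depends only on the arguments
}

int main() {
    add_with_side_effect(5);       // behavior depends on (and changes) the global total
    int sum = add_pure(2, 3);      // same arguments always give the same result
    std::cout << sum << '\n';
}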
- pointer
- a reference (address) to a memory location. In C a
pointer is a primitive data type that stores a memory address.
(see code)
int * stuff;                       // stuff can hold the address of an int
int num = 5;
stuff = &num;                      // stuff now points to num
printf("%p %d\n", (void *)stuff, *stuff);
stuff++;                           // what does this do?
stuff = stuff/2;                   // pointer division is not legal - will not compile
- alias
- is two or more references to the same memory location.
Pointers can be used as aliases in C/C++, which most consider
to be the language's biggest liability. Aliasing can
violate both reliability and readability. Assume you have a class Student
that uses dynamic memory allocation for the student's
name:
char * name;
If an overloaded assignment operator is not coded, these statements will produce
cross-linked pointers:
Student a("Sam Spade");
Student b("Joe Smoo");
a = b; // two pointers are now pointing to the same memory location
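A minimal sketch of the usual remedy, assuming Student stores its name in a
char * member as above (the copy constructor is omitted for brevity): an
overloaded assignment operator that deep-copies the string, so a = b no longer
leaves two objects pointing at the same memory.

// Sketch: deep-copy assignment operator prevents cross-linked pointers.
#include <cstring>

class Student {
    char * name;
public:
    Student(const char * n) : name(new char[std::strlen(n) + 1]) {
        std::strcpy(name, n);
    }
    ~Student() { delete [] name; }
    Student& operator=(const Student& other) {
        if (this != &other) {              // guard against self-assignment
            delete [] name;                // release the old string
            name = new char[std::strlen(other.name) + 1];
            std::strcpy(name, other.name); // copy the characters, not the pointer
        }
        return *this;
    }
};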
- orthogonality
- is non-interference; e.g., orthogonal vectors.
Orthogonality in a computer
instruction set means that all instructions can be uniquely combined with all
registers and addressing modes. In high-level languages, orthogonality
means that language primitives can be consistently combined
without exception.
Non-orthogonality makes a language harder to learn.
Complete orthogonality is impossible to achieve unless the language is so
simple as to be useless.
Orthogonality requires a small set of primitives--the
larger the set, the more difficult it is to maintain orthogonality.
For example, the primitive constructs in C/C++ include arithmetic operators
(+, -, *, /) and scalar data types int, float, double, and pointer.
In a completely orthogonal language it should be feasible to combine
all operators with all data types.
C/C++ is not orthogonal because
arithmetic operators do not consistently work on pointers.
int a = 5 ;
int b = 10;
int c = 15 ;
int * aptr = & a;
int * bptr = & b;
int * cptr = & c;
aptr++; // legal but does not behave like increment on integers - advances by sizeof(int)
a = b * c; // OK
aptr = bptr; // OK
aptr = bptr + cptr; // illegal
aptr = bptr * cptr; // illegal
C++ overloaded << and >> operators are non-orthogonal:
they can mean bit shifting or output/input depending on the context
(Stroustrup, what were you thinking).
Other examples of non-orthogonality in C:
1. C has two built-in data structures, arrays and records (structs).
Structs can be returned from functions but arrays cannot.
2. A member of a struct can have any type except void or a structure of the
same type.
3. An array element can be any data type except void or a function.
4. Parameters are passed by value, unless they are arrays, in which case
they are passed by reference.
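A small sketch illustrating points 1 and 4 above (the struct, function, and
variable names are invented):

/* Sketch: structs are copied by value, arrays are not. */
#include <stdio.h>

struct Point { int x, y; };

struct Point make_point(void) {       /* a struct CAN be returned by value */
    struct Point p = {1, 2};
    return p;
}

void touch(struct Point p, int arr[]) {
    p.x = 99;       /* struct parameter: a copy - the caller's Point is unchanged */
    arr[0] = 99;    /* array parameter: decays to a pointer - the caller's array IS changed */
}

int main(void) {
    struct Point pt = make_point();
    int nums[3] = {1, 2, 3};
    touch(pt, nums);
    printf("%d %d\n", pt.x, nums[0]);   /* prints: 1 99 */
    return 0;
}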
- scalar
- a data type that cannot be divided into other data types.
Examples of scalars are int, char, double, char *. Scalars are addressed by
a single memory address. Examples of non-scalars
are arrays, classes, and records. Non-scalars are addressable by more than
one memory address. In the beginning, all data were scalars - arrays were
added next.
Chapter 14.1 - 14.4: E X C E P T I O N H A N D L I N G
The concepts in this chapter
will be investigated hands-on in this week's lab.
Basic Concepts
An exception is an unusual run-time event of some kind, generally an error.
Without exception handling, when
an exception occurs, control is passed to the kernel and the
program terminates.
With exception handling the programmer can trap the exception and gracefully
terminate or continue.
Processing the exception is called *exception handling*
Many languages allow programs to trap input/output errors (including EOF).
Most common exceptions: divide by zero, illegal memory access (dereferencing
a bad pointer, array out-of-bounds), file open errors, input errors (reading
float into char), reading past EOF
An exception is *raised* when its associated event occurs.
The exception handling code unit is called an *exception handler*.
User-defined Exception Handling
A language that does not have exception handling built-in can still
define, detect, raise, and handle exceptions
Pass an argument (flag or error msg) to an error-handling routine
or return a flag (boolean, int) - see the sketch below
Have some mechanism for interpreting what the return flag means (see
Unix strcmp)
Give exception handling utilities global scope
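A hedged sketch of the return-flag approach in C (the function and error codes
are invented for illustration):

/* Sketch: user-defined error handling with return flags. */
#include <stdio.h>

#define OK            0
#define ERR_DIV_ZERO  1          /* the caller must know how to interpret the flags */

int divide(double x, double y, double *result) {
    if (y == 0.0)
        return ERR_DIV_ZERO;     /* "raise" the exception by returning a flag */
    *result = x / y;
    return OK;
}

int main(void) {
    double q;
    if (divide(10.0, 0.0, &q) != OK)        /* "handle" the exception */
        fprintf(stderr, "error: divide by zero\n");
    return 0;
}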
Advantages of Built-in Exception Handling in C++
Added to C++ in 1990 (Design based on CLU, Ada, and ML)
Code reusability, uniformity, promotes readability
Error detection code is tedious to write and clutters the program
Allows you to rethrow back to main, unwind the runtime stack and perform
cleanup (very difficult to code otherwise)
Saves the return value for something other than error handling
Supports dynamic memory management if objects are allocated in a try block
since destructors are called upon exiting the block
Increases chances that a program can recover without a complete crash
Things you should know about exception handling in any language
- How is an exception occurrence bound to an exception handler?
- How and where are exception handlers specified and what is their scope?
- Are there any built-in exceptions?
- Are default exception handlers used if exception handling is not
explicitly coded?
- Can built-in exceptions be explicitly raised?
- Are hardware-detectable errors treated as exceptions that can be handled?
- How can exceptions be disabled, if at all?
- How does exception handling control flow work?
- Where does execution continue, if at all, after an exception handler
completes its execution?
Exception Handling in C++
Basic syntax:
try {
throw
}
catch (formal parameter) {
throw // optional re-throw will propagate through the runtime stack
}
catch (...) { // a generic handler
}
Specific example:
const int DivideByZero = 10;
//....
double divide(double x, double y) {
if(y==0) throw DivideByZero;
return x/y;
}
///...
try {
divide(10, 0);
}
catch(int i) {
if (i==DivideByZero) cerr << "Divide by zero error";
}
// example of error objects
class DivideByZero
{
public:
double divisor;
DivideByZero(double x);
};
DivideByZero::DivideByZero(double x) : divisor(x)
{}
int divide(int x, int y)
{
   if(y==0)
   {
      throw DivideByZero(x);
   }
   return x/y;
}
try
{
divide(12, 0);
}
catch (DivideByZero divZero)
{
cerr<<"Attempted to divide "<<divZero.divisor<<" by zero";
}
Notes on C++ facility:
- catch function is overloaded--formal parameter of each catch must be unique
The formal parameter need not have a variable
It can be simply a type name to distinguish the handler it is in from others
The formal parameter can be used to transfer information to the handler
If formal parameter is an ellipsis it handles all exceptions not yet handled
- A throw without an operand can only appear in a catch block; when it
appears it re-throws the exception to the next available handler by unwinding
the runtime stack (see the sketch after these notes)
- An unhandled exception is thrown to every function on the runtime stack
until it is finally thrown to main; if no handler is found in main the program
terminates with an error message (defeating the purpose of exception handling)
- After a handler completes its execution, control flows to the first statement
after the last handler in the sequence of handlers of which it is an element
- Exceptions are not named; hardware and system software-detectable
exceptions cannot be handled
- Exceptions are bound to handlers through the type of the parameter
(does not promote readability)
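A minimal sketch of re-throw and stack unwinding, referenced in the notes above
(the function names are invented): the intermediate handler does its cleanup
and passes the same exception up the runtime stack.

// Sketch: re-throw with "throw;" unwinds the stack to the next handler.
#include <iostream>
#include <stdexcept>

void inner() {
    throw std::runtime_error("file not found");
}

void middle() {
    try {
        inner();
    } catch (...) {                 // generic handler
        std::cerr << "middle: cleaning up, then re-throwing\n";
        throw;                      // re-throw the same exception to the next handler
    }
}

int main() {
    try {
        middle();
    } catch (const std::runtime_error& e) {
        std::cerr << "main: handled - " << e.what() << '\n';
    }
    // execution continues here, after the last handler in the sequence
}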