Chapter 7 - Expressions and Assignment Statements

resources:
C ref man
C/C++ precedence & associativity chart
hw07.c sample C code

Expressions and assignment statements are the fundamental means of computation in imperative languages. An expression is any construct that can legally appear on the right side of an assignment statement. An expression can be as simple as a literal constant

     num =  5.787;
     str = "hello";

or can involve any number of the operators in the language. Supported operators vary by language. C/C++ include these low-level bitwise operators:

        << - shift left
        >> - shift right
        & - bitwise and
        | - bitwise or
        ^ - bitwise exclusive or

and a comma (sequence) operator:


  while(read_string(s), s.len() > 5)
  {
     //do something
  }

Scripting languages and Java support string concatenation (see Java Expressions). The power of an imperative language is related to the number and type of operators that are supported. The more strongly typed a language is, the more complicated are the constraints placed on expressions (compare C with Java).

The two primary types of expressions in modern languages are expressions that return a number (arithmetic) and expressions that return true or false (relational and boolean expressions). In C (before C99 added _Bool), even relational and boolean expressions return a number, an int that is 0 or 1, since there is no boolean type.

ARITHMETIC EXPRESSIONS

arithmetic computation influenced development of programming languages

expressions consist of operators, operands, and parentheses

in most modern languages, the return value from a function call can be an operand

 
Design Issues
o  operator precedence and associativity rules
o  order of operand evaluation (issue if side-effects)
o  operand evaluation side effects 
o  operator overloading
o  mode mixing expressions (e.g. float with integer)

Operators
  Arity (number of operands)
  unary:   ++
  binary:  +
  ternary: ?:    (a>5) ? b++ : b--;   <= ternary conditional 

Operator Precedence Rules 

   Typical precedence levels: C++ chart

      parentheses
      postfix ++, --
      prefix ++, --
      unary +, -
      *,/,%
      +, -
      =

Operator Associativity Rules (see chart)
o associativity sets evaluation order of adjacent operators of equal precedence 
o binary operators typically associate left to right
o unary operators typically associate right to left
o Sample code in C


Ternary Conditional Expressions

   average = (count == 0) ? 0 : sum / count
   means:
     if (count == 0) 
          average = 0
     else 
          average = sum /count
			
Operand Evaluation Order

o  Variables: fetch the value from memory
o  Constants: sometimes a fetch from memory; sometimes the constant is in the 
	machine language instruction
o  Parenthesized expressions: evaluate all operands and operators first
o  postfix v. prefix increment/decrement operators: 
   y = x * z++;  the current value of z is used to evaluate the expression 
                 (i.e., y = x * z) then z is incremented 
   y = x * ++z;  z is incremented first

Unwanted Functional Side Effects 

o  A side-effect is anything that changes the environment of a program during
   execution; imperative languages are built on side-effects; a functional
   side-effect occurs in the time between a function call and a function
   return

o  There is a potential "unwanted side-effect" when a function changes a 
   two-way parameter or a non-local (global or static) variable:

      int b, a = 10;
      b = a + fun(&a);  /* assume fun changes a to 5 and returns it*/
      What is the value of b?
      If 'a' becomes 5 before addition, then b=10, otherwise b=15
         
o  Solution 
   1. Disallow functional side effects in the language: no two-way parameters
      or non-local references in functions. Disadvantage: inflexibility

   2. Demand operand evaluation order be fixed in language definition
      Disadvantage: limits some compiler optimizations

OVERLOADED OPERATORS

o  Use of an operator for more than one purpose is called operator overloading
o  Some are easy to understand (e.g., + for int and float)
o  Some are not (*  in C/C++ is both multiplication and pointer dereferencing)
o  Loss of readability if meaning is not intuitive
o  Avoided by use of new symbols (e.g., Pascal's div for integer division)
o  C++ and Ada allow user-defined overloaded operators
o  Potential problems: 
   Users can define nonsense operations 
   Can increase code complexity

   Example in JavaScript of overloaded + operator:
   
      // JavaScript is dynamically typed but does not behave like Perl
      stuff = prompt('Enter an integer or a string:');
      myInt = 5;
      // + is overloaded to accept numbers or strings
      // by default myInt is coerced to a string - unlike perl 
      area.innerHTML = stuff + ' + ' + myInt + '=' + stuff + myInt; 
      // if stuff is an integer OK, otherwise returns NaN
      // the parens around parseInt MUST be there or + is string concat
      area2.innerHTML = 
      stuff + ' + parseInt(' + myInt + ')=' + (parseInt(stuff)+myInt); 



o  Advantage: 
   overload '=' operator to prevent cross-linked pointers in C++; (see C++ example)

TYPE CONVERSIONS

A narrowing conversion converts an object to a type that reduces precision or range of values of original type e.g., float to int or int to short

A widening conversion converts an object to a type that increases precision or the range of values of original type e.g., int to float or short to int. There are some standards but see /usr/include/limits.h for the limits on your specific compiler. limits.h for sleipnir is shown below:

  
   
                   Type  Bytes  Bits                Range

            short int    2      16          -32,768 -> +32,767          
   unsigned short int    2      16                0 -> +65,535          
         unsigned int    4      32                0 -> +4,294,967,295   
                  int    4      32   -2,147,483,648 -> +2,147,483,647   
             long int    4      32   -2,147,483,648 -> +2,147,483,647   
          signed char    1       8             -128 -> +127
        unsigned char    1       8                0 -> +255
                float    4      32
               double    8      64
          long double   12      96


  (see types.c)

Mixed Mode
o  A mixed-mode expression contains operands of different types
o  A coercion is an implicit type conversion made by the compiler or 
   runtime system
   Disadvantage: coercions decrease the compiler's ability to detect type errors
o  In most languages, numeric types are coerced using widening conversions
o  In C++, polymorphism uses implicit coercions from derived to base class
   (upcasting) (see C++ code)


Explicit Type Conversions
o  Called casting in C-based languages. Examples:
     C: int sum = 100; int num = 15;
        float avg = (float) sum / num;
     C++: static_cast <int>(num)


Errors in Expressions
o  Inherent limitations of arithmetic; e.g., division by zero
o  Limitations of computer arithmetic; e.g. overflow
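o  overflow is either ignored by the run-time system or gives compiler-specific results (in C, signed integer overflow is undefined behavior); for example, with a 32-bit two's complement int: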

  
         num = __INT_MAX__: 2147483647
         01111111111111111111111111111111
         num2 = num + 1:  -2147483648
         10000000000000000000000000000000
         num + num2 = -1	
         11111111111111111111111111111111

RELATIONAL AND BOOLEAN EXPRESSIONS

Relational Expressions

o  consist of relational operators and operands of various types
o  evaluate to some boolean representation (e.g. T or 1)
o  operator symbols vary among languages; e.g. not equal: !=, /=, .NE., <>, #
o  relational expressions are a type of boolean expression
o bitwise boolean operators are not boolean expressions (do not evaluate to T/F)

Boolean Expressions
o a boolean expression evaluates to T or F (or some representation of T/F)  
o  boolean operators are: and, or, not, xor 
o  most modern languages use C notation: && is AND, || is OR, ! is NOT (no XOR)
o  operands are also boolean expressions; e.g. ((5 > 3) || (7 == 3)) is true  

Languages Without a Boolean Type 
o  C (before C99 added _Bool) has no boolean type--it uses int type: 0 is false and nonzero is true
o  For C's relational expressions, associativity is L to R: 
           a < b < c;  (legal code) 
o  a and b are compared, producing 0 or 1; the result (0 or 1) is compared w/ c

Operator Precedence 
o  Precedence in C-like languages:
      !
      <, >, <=, >=
      ==, !=
      &&
      ||

SHORT CIRCUIT EVALUATION

If the result of an expression can be determined without evaluating all operands, you can stop evaluation; i.e., short-circuit evaluation. Example:

     (13*a) * (b/13-1)       # if 'a' is zero, no need to evaluate (b/13-1)

A disjunctive boolean expression (clauses separated by ORs) can be short-circuited after the first true in the expression:

     ( (5 < 7) || (A > B) || (C == D) )     # stop at (5 < 7)

A conjunctive boolean expression (separated by ANDs) can be short-circuited after the first false in the expression:

     ( (5 > 7) && (A > B) && (C == D) )     # stop at (5 > 7)

This is a problem with non-short-circuit evaluation:

	int LIST[MAX];
	int index = 0;
	while ( (index < MAX) && (LIST[index] != value) )
		index++;

When index == MAX, evaluating LIST[index] is an out-of-bounds access (an exception in languages that check subscripts)

C, C++, Java use short-circuit evaluation for the usual Boolean operators (&& and ||)

bitwise Boolean operators (& | ^ ) are NOT short circuit

side effects in operands interact with short-circuit evaluation: an operand that is skipped never performs its side effect

	    ((stuff[index++] == 99) || (index < SIZE))   #  what index do you mean?

ASSIGNMENT STATEMENTS

Expressions can be part of a condition statement ((num+num2/5)>num3) or part of an output statement (cout << num + num2). But the most common use of expressions is to be the right hand side of an assignment statement. BNF syntax for an assignment statement:

	<target_var> <assign_operator> <expression>

The assignment operator differs by language
' = ' FORTRAN, BASIC, PL/I, C, C++, Java, Perl, php,...
' := ' ALGOL, Pascal, Ada
*The use of '=' in C-like languages is problematic since '=' means equality in mathematics

Conditional Target on Assignment

Supported by Perl and C++, but not by C or Java, where a conditional expression cannot be an assignment target. The parentheses are required:

  
	(flag ? total : subtotal) = 0

  means:	

	if (flag)
		total = 0
	else
		subtotal = 0

Compound Assignment Operators

A compound assignment statement is a shorthand method of assignment introduced in ALGOL 68 and adopted by C and the C-like languages. Examples:

 a += b    # is shorthand for a = a + b
 a *= b    # is shorthand for a = a * b

Unary Assignment Operators

Unary assignment operators in C-based languages combine increment and decrement operations with assignment; ex.

	sum = ++count ( count incremented then assigned to sum )
	sum = count++ ( count assigned to sum then incremented )
	count++ (count incremented)

*Modifying a variable more than once in the same statement is undefined:

  count = 5; 
  count = -count++;   // what does this mean? it parses as -(count++), but count is modified twice, so its final value is undefined in C

Assignment as an Expression

In C-like languages, an assignment statement itself can be an expression; i.e., the result of the assignment becomes an operand for the expression; e.g.,:

	   while ((ch = getchar())!= EOF){. . .}

ch = getchar() is carried out; the result is assigned to ch, and the value of the assignment becomes the left-hand operand of the != operator. The disadvantage shows up in languages that accept a non-Boolean value as a condition and use '=' for assignment (as in C and C++):

   int n = 0; 
   if ( n = 0 )            # assignment mistaken for an equality test; n = 0 yields 0, so the branch is never taken
      cout << "made it!";

Mixed Mode Assignment

Assignment statements, just like expressions, can be mixed-mode to varying degrees (depending on the type rules of the language):

	int a, b;
	float c;
	c = a / b;

In Java, only widening assignment coercions are supported (see docs)

In Ada, there is no assignment coercion

C/C++ supports both widening and narrowing coercions in assignments (for a narrowing coercion the compiler may issue a warning, which can be ignored)