C++

From DrugPedia: A Wikipedia for Drug discovery

Revision as of 04:49, 8 September 2008 by Harinder


An Introduction to C++

Welcome to the inaugural edition of the ObjectiveViewPoint column! Here we will touch on many aspects of object-orientation. The word object has surfaced in more ways than you can count. There are OOPLs (Object-Oriented Programming Languages), OODBs (Object-Oriented Databases), OOA (object-oriented analysis), and OOD (object-oriented design). We are sure you can come up with some OOisms of your own.

Our goal in this column is to explore object-orientation through practical object-oriented programming. This time, we look at C++, but in the future we will explore other areas of object-orientation. Learning an object-oriented language (a whole new way of programming) will pave the way for many exciting topics down the road.

Our intended audience ranges from humble beginners to seasoned hackers. We assume that you have programmed in at least one procedural language, such as C or Pascal. Even if you are familiar with C++, please stay with us; you may learn some interesting new language features. We will also illustrate our points with many self-contained examples that you may later wish to incorporate into your own programs.

C++: A Historical Perspective

We begin our journey of C++ with a little history. C, the predecessor to C++, has become one of the most popular programming languages. Originally designed for systems programming, C enables programmers to write efficient code and provides close access to the machine. C compilers, found on practically every Unix system, are now available with most operating systems.

During the 1980s and into the 1990s, an explosive growth in object-oriented technology began with the introduction of the Smalltalk language. Object-Oriented Programming (OOP) began to replace the more traditional structured programming techniques. This explosion led to the development of languages which support programming with objects. Many new object-oriented programming languages appeared: Object-Pascal, Modula-2, Mesa, Cedar, Neon, Objective-C, LISP with the Common Lisp Object System (CLOS), and, of course, C++. Although many of these languages appeared in the 1980s, many ideas of OOP were taken from Simula-67. Yes! OOP has been around since 1967.

C++ originated with Bjarne Stroustrup. In the simplest sense, if not the most accurate, we can consider it to be a better C. Although it is not an entirely new language, C++ represents a significant extension of C abilities. We might then consider C to be a subset of C++. C++ supports essentially every desirable behavior and most of the undesirable ones of its predecessor, but provides general language improvements as well as adding OOP capability. Note that using C++ does not imply that you are doing OOP. C++ does not force you to use its OOP features. You can simply create structured code that uses only C++'s non-OOP features.

C++: A Better C

The designers of C++ wanted to add object-oriented mechanisms without compromising the efficiency and simplicity that made C so popular. One of the driving principles for the language designers was to hide complexity from the programmer, allowing her to concentrate on the problem at hand.

Because C++ retains C as a subset, it gains many of the attractive features of the C language, such as efficiency, closeness to the machine, and a variety of built-in types. A number of new features were added to C++ to make the language even more robust, many of which are not used by novice programmers. By introducing these new features here, we hope that you will begin to use them in your own programs early on and gain their benefits. Some of the features we will look at are the role of constants, inline expansion, references, declaration statements, user-defined types, overloading, and the free store.

Most of these features can be summarized by two important design goals: strong compiler type checking and a user-extensible language.

By enforcing stricter type checking, the C++ compiler makes us acutely aware of data types in our expressions. Stronger type checking is provided through several mechanisms, including function argument type checking, conversions, and a few other features we will examine below.

C++ also enables programmers to incorporate new types into the language, through the use of classes. A class is a user-defined type. The compiler can treat new types as if they are one of the built-in types. This is a very powerful feature. In addition, the class provides the mechanism for data abstraction and encapsulation, which are key to object-oriented programming. As we examine some of the new features of C++ we will see these two goals resurface again and again.

A NEW FORM FOR COMMENTS.

It is always good practice to provide comments within your code so that it can be read and understood by others. In C, comments were placed between the tokens /* and */ like this:

/* This is a traditional C comment */

C++ supports traditional C comments and also provides an easier comment mechanism, which only requires an initial comment delimiter:

// This is a C++ comment

Everything after the // and to the end of the line is a comment.

THE CONST KEYWORD.

In C, constants are often specified in programs using #define. The #define directive is essentially a macro expansion facility; for example, with the definition:

#define PI 3.14159265358979323846

the preprocessor will substitute 3.14159265358979323846 wherever PI is encountered in the source file. C++ allows any variable to be declared a constant by adding the const keyword to the declaration. For the PI constant above, we would write:

const double PI = 3.14159265358979323846;

A const object may be initialized, but its value may never change. The fact that an object will never change allows the compiler to ensure that constant data is not modified and to generate more efficient code. Since each const element also has an associated type, the compiler can also do more explicit type checking.
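To make the contrast concrete, here is a minimal sketch placing the two forms side by side. The macro name PI_MACRO and the helper CircleArea() are hypothetical, introduced only for this illustration; the commented-out assignment marks a line the compiler would reject.

```cpp
#include <cassert>

// Preprocessor constant: PI_MACRO is replaced by its text before compilation,
// so the compiler never sees the name itself.
#define PI_MACRO 3.14159265358979323846

// Typed constant: the compiler knows PI is a double and rejects any write to it.
const double PI = 3.14159265358979323846;

// Hypothetical helper that uses the constant like any other double.
double CircleArea(double radius) {
    return PI * radius * radius;
}

void ConstDemo() {
    assert(PI == PI_MACRO);          // same value; only PI is a named, typed object
    assert(CircleArea(1.0) == PI);
    // PI = 3.0;                     // would not compile: PI is const
}
```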

A very powerful use of const is found when it is combined with pointers. By declaring a "pointer to const", we ensure that the pointer cannot be used to change the pointed-to object. As an example, consider:

int i = 10; const int *pi = &i;
*pi = 15; // Not allowed! pi points to a const int

It is not possible to change the value of i through the pointer because *pi is constant. A pointer used in this way can be thought of as a read-only pointer; the pointer can be used to read the data to which it points, but the data cannot be changed via the pointer. Read-only pointers are often used by class member functions to return a pointer to private data stored within the class. The pointer allows the user to read, but not change, the private data.

Unfortunately, the user can still modify the data pointed at by the read-only pointer by using a type cast. This is called "casting away the constness". Using the above example, we can still change the value of i like this:

// Cast away the constness of the pi pointer and modify i
*((int*) pi) = 15;

By returning a pointer to const, we are telling users to keep their hands off internal data. The data can still be modified, but only with extra work (the type cast). So, in most cases users will realize they are not to modify that data, but can do so at their own risk.

There are two ways to add the const keyword to a pointer declaration. Above, when const comes before the *, what the pointer points to is constant: it is not possible to change the variable pointed to by the pointer. When const comes after the *, like this:

int i = 10; int j = 11; int* const ptr = &i; // Pointer initialized to point to i

the pointer itself becomes constant. This means that the pointer cannot be changed to point to some other variable after it has been initialized. In the above example, the pointer ptr must always point at the variable i. So, statements such as:

ptr = &j; // Not allowed, since the pointer is const!

are not allowed and are caught by the compiler. However, it is possible to modify the variable that the pointer points to:

*ptr = 15; // This is ok, what is pointed at is not const

If we want to prevent modification of what the pointer points to and prevent the value of the pointer from being changed, we must provide a const on both sides of the * like this:

const int * const ptr = &i;

Remember that adding const to a declaration simply invokes extra compile-time type checking; it does not cause the compiler to generate any extra code. Another advantage of const is that the constant is available to a symbolic debugger, while preprocessor symbols generally are not.
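The following sketch gathers the pointer declarations discussed above into one function so that the legal and illegal operations can be compared. The function name ConstPointerDemo() is our own, and lines that would not compile are left commented out. The const_cast near the end is the C++-style spelling of the C-style cast shown earlier; like that cast, it is only well-defined here because i itself was not declared const.

```cpp
#include <cassert>

void ConstPointerDemo() {
    int i = 10;
    int j = 11;

    const int* pi = &i;          // pointer to const int: *pi is read-only
    // *pi = 15;                 // would not compile
    *((int*) pi) = 15;           // C-style cast away the constness, as in the text
    assert(i == 15);             // well-defined only because i itself is not const
    pi = &j;                     // the pointer itself may still be re-pointed
    assert(*pi == 11);

    int* const ptr = &i;         // const pointer to int: always points at i
    // ptr = &j;                 // would not compile
    *ptr = 20;                   // the pointed-to int may be modified
    assert(i == 20);

    const int* const cptr = &i;  // neither pointer nor pointee can change through cptr
    // *cptr = 25; cptr = &j;    // neither would compile
    *const_cast<int*>(cptr) = 25;  // C++-style equivalent of the C-style cast
    assert(i == 25);
}
```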

INLINE EXPANSION

Another common use of the C #define macro expansion facility is to avoid function call overhead for small functions. Some functions are so small that the overhead of invoking the function call takes more time than the body of the function itself. C++ provides the inline keyword to ask the compiler to expand the body of the function in place rather than generate the code for calling the routine. For example, the macro

#define max(x, y) ((x)>(y)?(x):(y))

can be replaced, for integers, by the C++ inline function

inline int max (int x, int y) { return (x > y ? x : y); }

When a similar function is needed for multiple types, the C++ template mechanism can be used.

Macro expansion can lead to notorious results when encountering an expression with side effects, such as

max (f(x), z++);

which, after macro expansion becomes:

((f(x)) > (z++) ? (f(x)) : (z++));

Depending on which argument is larger, either f(x) is evaluated twice or z is incremented twice. Such errors are avoided when using the inline mechanism, because each argument of an inline function is evaluated exactly once.
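A small sketch can make the double evaluation visible. Here the macro is named MAX (an illustrative name, chosen so that it can coexist with the inline max()), and the hypothetical helpers MacroIncrements() and InlineIncrements() simply report how many times their z argument was incremented.

```cpp
#include <cassert>

// Classic C macro: an argument may be evaluated more than once.
#define MAX(x, y) ((x) > (y) ? (x) : (y))

// Inline function: each argument is evaluated exactly once, and types are checked.
inline int max(int x, int y) { return x > y ? x : y; }

// Hypothetical helper: how many times does the macro increment z?
int MacroIncrements(int a, int z) {
    int start = z;
    (void)MAX(a, z++);    // expands to ((a) > (z++) ? (a) : (z++))
    return z - start;
}

// Hypothetical helper: the same call through the inline function.
int InlineIncrements(int a, int z) {
    int start = z;
    (void)max(a, z++);
    return z - start;
}

void MaxDemo() {
    assert(max(3, 7) == 7);
    assert(MacroIncrements(5, 1) == 1);   // 5 > 1: z++ evaluated once
    assert(MacroIncrements(0, 1) == 2);   // 0 is not greater: z++ evaluated twice
    assert(InlineIncrements(0, 1) == 1);  // inline: argument evaluated once
}
```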

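When defining a C++ class, the body of a class member function can also be specified. A member function defined inside the class definition is implicitly inline. As with any inline function, whether a particular call is actually expanded in place is left to the compiler; some older compilers, for instance, declined to inline functions containing loops (e.g., while). For example: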

class A {
    int a;
public:
    A() : a(0) { }               // inline
    int Value() { return a; }    // inline
};

Since the code for both the constructor A() and the member function Value() is specified as part of the class definition, both are inline functions, and the compiler may expand the code between the braces in place wherever these functions are invoked.

REFERENCES

Unlike C, C++ provides true call-by-reference through the use of reference types. A reference is an alias, or another name, for an existing object. References are similar to pointers, but a reference must be initialized when it is declared and cannot later be made to refer to a different object. For example, let's declare an integer:

int n = 10;

and then declare a reference to it:

int& r = n;

Now r is an alias for n; both identify the same object and can be used interchangeably. Hence, the assignment

r = -10;

changes the value of both r and n to -10.

It is important to note that initialization and assignment are completely different for references. A reference must have an initializer. Initialization operates only on the reference itself; the initialization

int& r = n;

establishes the correspondence between the reference and the data object that it names. Assignment, by contrast, behaves as we expect any assignment to: it operates through the reference on the object referred to. The assignment,

r = -10;

is the same for references as for any other lvalue, and simply assigns a new value to the designated data object.
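The difference between initialization and assignment is easiest to see when a second variable is assigned through a reference. In this sketch (ReferenceDemo() is an illustrative name), assigning m through r copies m's value into n; it does not make r refer to m.

```cpp
#include <cassert>

void ReferenceDemo() {
    int n = 10;
    int& r = n;        // initialization: r becomes another name for n
    r = -10;           // assignment: writes through r into n
    assert(n == -10);

    int m = 42;
    r = m;             // still assignment: copies 42 into n; r is NOT reseated to m
    assert(n == 42);
    assert(&r == &n);  // r and n are the same object
    m = 0;
    assert(n == 42);   // changing m afterwards does not affect n
}
```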

C programmers know that C uses the call-by-value parameter mechanism. To let a function modify the caller's variables, pointers to those variables must be passed as the value. For example, a routine Swap(), which swaps its parameters, would be written like this in C:

void Swap (int* a, int* b) { int tmp; tmp = *a; *a = *b; *b = tmp; }

The routine would be invoked like this:

int x = 1; int y = 2; Swap (&x, &y);

C programmers are all too familiar with what happens when one of the ampersands is forgotten; the program usually ends with a core dump!

Now consider the C++ version of Swap() which makes use of true call-by-reference.

void Swap (int& a, int& b) { int tmp; tmp = a; a = b; b = tmp; }

The routine would be invoked like this:

int x = 1; int y = 2; Swap (x, y);

The compiler ensures that the parameters of Swap() will be passed by reference. In C, a run-time error often results if the value of a variable is passed instead of its address. References eliminate these errors and are syntactically more pleasing.
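Both versions can be compared in one sketch. The C version is renamed SwapPtr() here, a name chosen only so that both functions can appear side by side; otherwise the bodies follow the two Swap() routines above.

```cpp
#include <cassert>

// C style: the caller must pass addresses explicitly.
void SwapPtr(int* a, int* b) { int tmp = *a; *a = *b; *b = tmp; }

// C++ style: the reference parameters alias the caller's variables.
void Swap(int& a, int& b) { int tmp = a; a = b; b = tmp; }

void SwapDemo() {
    int x = 1, y = 2;
    SwapPtr(&x, &y);   // in C++, omitting an & here is a compile-time type error
    assert(x == 2 && y == 1);
    Swap(x, y);        // no & needed: x and y are passed by reference
    assert(x == 1 && y == 2);
}
```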

Another use for references is as return types. Consider this routine:

int& FindByIndex (int* theArray,int index) { return theArray[index]; }

Note that FindByIndex() returns a reference to the element in the array rather than its value. The expression FindByIndex(A, i) yields a reference to the ith element of the array A. Because a reference is an lvalue, the call can be used on the left-hand side of an assignment, so we can write:

FindByIndex(A, i) = 25;

which will assign 25 to the ith element of the array A.

Note that if FindByIndex() is made inline, the overhead due to the function call is eliminated. Inline functions that return references are attractive for the sake of efficiency.
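A minimal sketch of FindByIndex() in use follows, declared inline as suggested above. Like the original, it performs no bounds checking, so the caller must supply a valid index; the demo array A and the function FindByIndexDemo() are illustrative.

```cpp
#include <cassert>

// Returns a reference to theArray[index], so a call can appear on either side of "=".
// No bounds checking: index must be valid for theArray.
inline int& FindByIndex(int* theArray, int index) { return theArray[index]; }

void FindByIndexDemo() {
    int A[5] = { 0, 1, 2, 3, 4 };
    int i = 3;
    FindByIndex(A, i) = 25;          // writes through the returned reference
    assert(A[3] == 25);
    int value = FindByIndex(A, 1);   // on the right-hand side it simply reads
    assert(value == 1);
}
```

One caution applies to any function that returns a reference: the referred-to object must outlive the call, so a function should never return a reference to one of its own local variables.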

DECLARATIONS AS STATEMENTS.

In a C++ program, a declaration can be placed wherever a statement can appear, which can be anywhere within a program block. Any initializations are done each time their declaration statement is executed. Suppose we are searching a linked list for a certain key:

int IsMember (const int key) {
    int found = 0;
    if (NotEmpty()) {
        List* ptr = head;          // Declaration
        while (ptr && !found) {
            int item = ptr->data;  // Declaration
            ptr = ptr->next;
            if (item == key) found = 1;
        }
    }
    return found;
}

By putting declarations closer to where the variables are used, you write more legible code.
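IsMember() relies on a list type that the column does not show. The sketch below supplies one under stated assumptions: a node struct List with data and next fields, and a hypothetical IntList class providing head and NotEmpty(). Only IsMember() itself comes from the text; PushFront() exists just to build a list for the demo.

```cpp
#include <cassert>

// Hypothetical node type, matching the field names used by IsMember().
struct List {
    int data;
    List* next;
};

// Hypothetical container supplying the head pointer and NotEmpty().
class IntList {
public:
    IntList() : head(nullptr) {}
    IntList(const IntList&) = delete;             // copying would double-delete nodes
    IntList& operator=(const IntList&) = delete;
    ~IntList() {
        while (head) {
            List* doomed = head;
            head = head->next;
            delete doomed;
        }
    }
    void PushFront(int value) { head = new List{value, head}; }
    bool NotEmpty() const { return head != nullptr; }

    int IsMember(const int key) const {
        int found = 0;
        if (NotEmpty()) {
            List* ptr = head;            // declared where it is first needed
            while (ptr && !found) {
                int item = ptr->data;    // declared and initialized on every iteration
                ptr = ptr->next;
                if (item == key) found = 1;
            }
        }
        return found;
    }

private:
    List* head;
};

void IsMemberDemo() {
    IntList list;
    assert(!list.NotEmpty());
    list.PushFront(3);
    list.PushFront(7);
    assert(list.IsMember(7) == 1);
    assert(list.IsMember(5) == 0);
}
```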

IMPROVED TYPE SYSTEM.

Through the use of classes, user-defined types may be created, and if properly defined, C++ will behave as if they are one of the built-in types: int, char, float, and double. It is possible to define a Vector type and perform operations such as addition and multiplication just as easily as is done with ints:

// Define some arrays of doubles
double a[3] = { 11, 12, 13 };
double b[3] = { 21, 22, 23 };

// Initialize vectors from the double arrays
Vector v1 = a;
Vector v2 = b;

// Add the two vectors
Vector v3 = v1 + v2;

The Vector class has been defined with all of the appropriate arithmetic operations so that it can be treated as a built-in type. It is even possible to define conversion operators; for example, we can define the conversion of a Vector to a double to yield the magnitude, or norm, of the Vector:

double norm = (double) v3;
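The column does not show the definition of Vector, so the following is only one possible sketch of a three-component version that supports the statements above. The array-reference constructor, the component accessor operator[], and the choice of the Euclidean norm for the double conversion are our assumptions. This Vector is separate from the two-component class presented later in the column.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical three-component Vector; the column does not show its definition.
class Vector {
public:
    // Converting constructor: makes "Vector v1 = a;" work when a is a double[3].
    Vector(const double (&init)[3]) {
        for (int k = 0; k < 3; ++k) c[k] = init[k];
    }
    // Component-wise addition, so v1 + v2 reads like arithmetic on ints.
    Vector operator+(const Vector& other) const {
        double sum[3];
        for (int k = 0; k < 3; ++k) sum[k] = c[k] + other.c[k];
        return Vector(sum);
    }
    // Conversion operator: (double) v yields the Euclidean norm.
    operator double() const {
        return std::sqrt(c[0] * c[0] + c[1] * c[1] + c[2] * c[2]);
    }
    // Read-only access to one component.
    double operator[](int k) const { return c[k]; }

private:
    double c[3];
};

void VectorDemo() {
    double a[3] = { 11, 12, 13 };
    double b[3] = { 21, 22, 23 };
    Vector v1 = a;
    Vector v2 = b;
    Vector v3 = v1 + v2;
    assert(v3[0] == 32 && v3[1] == 34 && v3[2] == 36);
    double norm = (double) v3;            // calls operator double()
    assert(norm == std::sqrt(3476.0));    // 32*32 + 34*34 + 36*36 = 3476
}
```

Because operator double() is not marked explicit, a Vector will also convert silently wherever a double is expected; since C++11, explicit may be applied to a conversion operator to restrict it to casts such as (double) v3.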

OVERLOADING.

One of the many strengths of C++ is the ability to overload functions and operators. By overloading, the same function name or operator symbol can be given several different definitions. The number and types of the arguments supplied to a function or operator tell the compiler which definition to use. Overloading is most often used to provide different definitions for member functions of a class. But overloading can also be used for functions that are not a member of any class.

Suppose we need to search different types of arrays for a certain value. We can provide implementations for searching arrays of integers, floats, and doubles:

int Search ( const int* data, const int key);

int Search ( const float* data, const float key);

int Search ( const double* data, const double key);

The compiler will ensure that the correct function is called based on the types of the arguments passed to Search(). When arguments do not exactly match the formal parameter types, the compiler will perform implicit type conversions (e.g., int to float) in an attempt to find a match.
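Here is a sketch of the three overloads. The declarations above do not say how Search() learns the array length, so this version adds a count parameter and returns the index of the first match, or -1; both choices are assumptions of the sketch. The last call in SearchDemo() passes a double key with a float array: only the float overload is viable, so the key is implicitly converted to float.

```cpp
#include <cassert>
#include <cstddef>

// Each overload returns the index of the first element equal to key, or -1.
int Search(const int* data, std::size_t count, const int key) {
    for (std::size_t k = 0; k < count; ++k)
        if (data[k] == key) return static_cast<int>(k);
    return -1;
}

int Search(const float* data, std::size_t count, const float key) {
    for (std::size_t k = 0; k < count; ++k)
        if (data[k] == key) return static_cast<int>(k);
    return -1;
}

int Search(const double* data, std::size_t count, const double key) {
    for (std::size_t k = 0; k < count; ++k)
        if (data[k] == key) return static_cast<int>(k);
    return -1;
}

void SearchDemo() {
    int ia[] = { 4, 8, 15 };
    float fa[] = { 1.5f, 2.5f };
    double da[] = { 0.25, 0.5 };
    assert(Search(ia, 3, 15) == 2);      // int version
    assert(Search(fa, 2, 2.5f) == 1);    // float version
    assert(Search(da, 2, 0.75) == -1);   // double version, key absent
    assert(Search(fa, 2, 1.5) == 0);     // double key 1.5 converted to float
}
```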

Overloading is most often used for member functions and operators of classes. Most classes have overloaded constructors, for there is often more than one way to create a given object. All of the built-in types also have operators such as addition, subtraction, multiplication, and division. In fact, we can mix different types and still add them together:

int i = 1; char c = 'a'; float f = -1.0; double d = 100.0; int result = i + c + f + d;

The compiler applies the type conversions appropriate for the above calculation. When we define our own types, we can inform the compiler which operations and type conversions can be applied to our type. The compiler will allow our type to blend in with the built-in types. We will see more examples of this when we look at classes in detail.

A FREE STORE IS PROVIDED.

In C, variables are placed in the free store by using the sizeof operator to determine the needed allocation size and then calling malloc() with that size. Variables are removed from the free store by calling free(). With classes, using malloc() and free() becomes tedious. C++ provides the operators new and delete, which can allocate not only built-in types but also user-defined types. This provides a uniform mechanism for allocating and deallocating memory from the free store.

For example, to allocate an integer:

int *pi; pi = new int;
*pi = 1;

and to allocate an array of 10 ints:

int *array = new int [10]; for (int i=0;i < 10; i++) array[i] = i;

Just as with malloc(), the memory returned by new for a built-in type such as int is not initialized; only static memory has a default initial value of zero.

Suppose we have defined a type for complex numbers, called complex. We can dynamically allocate a complex number as follows:

complex* pc = new complex (1, 2);

In this case, the complex pointer pc will point to the complex number 1 + 2i.

All memory allocated using new should be deallocated using delete. However, delete takes on different forms depending on whether the variable being deleted is an array or a simple variable. For the complex number above, we simply call delete:

delete pc;

Delete calls the destructor for the object to be deleted. However, to delete each element of an array, you must explicitly inform delete that an array is to be deleted:

delete [] array;

The C++ implementation keeps track of the number of objects in an array allocated with new [] and retrieves this information when deleting the array. The empty bracket pair tells the compiler to call the class destructor for each element in the array.

Be careful: applying delete to a pointer that was not obtained from new results in undefined program behavior. However, it is safe to apply the delete operator to a null pointer.
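The sketch below exercises these rules with a hypothetical class Tracked that counts its live objects, so that the constructor and destructor calls made by new, delete, and delete [] can be checked.

```cpp
#include <cassert>

// Hypothetical class that counts how many of its objects currently exist.
class Tracked {
public:
    static int live;
    Tracked() { ++live; }
    ~Tracked() { --live; }    // run by delete and, once per element, by delete []
};
int Tracked::live = 0;

void FreeStoreDemo() {
    int* pi = new int;        // like malloc(): the int is uninitialized
    *pi = 1;
    delete pi;

    Tracked* one = new Tracked;        // one constructor call
    Tracked* many = new Tracked[10];   // ten constructor calls
    assert(Tracked::live == 11);

    delete one;               // single-object form
    delete [] many;           // array form: ten destructor calls
    assert(Tracked::live == 0);

    Tracked* none = nullptr;
    delete none;              // deleting a null pointer is a no-op
}
```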

New and delete are global C++ operators and can be redefined (e.g., if it is desirable to trap every memory allocation). This is useful in debugging, but is not recommended for general programming. More often, the operators new and delete are overridden by providing new and delete operators for a specific class.

When C++ allocates memory for an object of a user-defined class, the new operator for that class is used if it exists; otherwise the global new is used. Most often, programmers define new for certain classes to achieve improved memory management (e.g., reference counting for a class).
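A minimal sketch of a class-specific operator new follows. The class Counted is hypothetical and merely counts its allocations before delegating to the global ::operator new; a real class would more likely manage a pool or free list. Allocations of other types, such as int, still go to the global operator.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical class that supplies its own operator new and operator delete.
class Counted {
public:
    static int allocations;

    static void* operator new(std::size_t size) {
        ++allocations;
        return ::operator new(size);   // delegate to the global operator new
    }
    static void operator delete(void* p) noexcept {
        ::operator delete(p);          // delegate to the global operator delete
    }

    int value = 0;
};
int Counted::allocations = 0;

void CountedDemo() {
    Counted* c = new Counted;          // uses Counted::operator new
    assert(Counted::allocations == 1);
    delete c;                          // uses Counted::operator delete
    int* plain = new int(5);           // built-in type: global operator new
    assert(Counted::allocations == 1);
    delete plain;
}
```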

The Class: Data Encapsulation, Data Hiding, and Objects

Like a C structure, a C++ class is a data type. An object is simply an instantiation of a class. C++ classes have additional capabilities as the following example should show:

Vector v1(1,2);

Vector v2(2,3);

Vector vr;

vr = v1 + v2;

Vector is a class; v1, v2, and vr are objects of class Vector. v1 and v2 are given initial values through their constructor, and vr is initialized through its constructor to certain default values. The example illustrates a major strength of C++: we can define functions on a class as well as data members. Here, an overloaded addition operator makes our expression involving Vectors seem much more natural than the equivalent C code:

Vector v1, v2, vr;

add_vector( &vr , &v1, &v2 );

The ability to define these member functions allows us to have a constructor for Vector, code that creates an object of class Vector. The constructor ensures proper initialization of our Vectors.

Though not illustrated in the above example, a class can limit the use of its data members and member functions by non-member code. This is encapsulation. If class K defines member M as private, then only members of class K can use M. Defining M as public means any other class or function can use M.

Let's take a look at a trivial implementation of Vector that will show us a little about constructors, operators, and references.

#include <iostream>
using std::cout;
using std::endl;
using std::ostream;

class Vector {
public:
    // A component of 100 or more is rejected, and both components are set to zero.
    Vector(double new_x = 0.0, double new_y = 0.0) {
        if ((new_x < 100.0) && (new_y < 100.0)) {
            x = new_x;
            y = new_y;
        } else {
            x = 0;
            y = 0;
        }
    }

    // Returns a new Vector, so the constructor's bounds check also applies to sums.
    Vector operator+(const Vector& v) const {
        return Vector(x + v.x, y + v.y);
    }

    void PrintOn(ostream& os) const {
        os << "[" << x << ", " << y << "]";
    }

private:
    double x, y;
};

int main() {
    Vector v1, v2, v3(0.0, 0.0);

    v1 = Vector(1.1, 2.2);
    v2 = Vector(1.1, 2.2);
    v3 = v1 + v2;

    cout << "v1 is ";
    v1.PrintOn(cout);
    cout << endl;

    cout << "v2 is ";
    v2.PrintOn(cout);
    cout << endl;

    cout << "v3 is ";
    v3.PrintOn(cout);
    cout << endl;
}

Encapsulation of x and y means that they cannot be altered without the help of specific member functions. Any member function or data member of Vector can use x and y freely. For everyone else, the member functions provide a strict interface. They ensure a particular behavior in our objects. In the example above, no Vector can be created that has an x or y component that exceeds 100. If at some point code tries to do this, then the constructor performs bounds-checking and sets x and y both to zero. In a normal C structure we can simply do the following:

Vector v1;

InitVector( & v1, 99 , 99 );

v1.x = 1000;

InitVector() closely approximates a C++ constructor. Assume it tries to behave like the Vector() constructor in the program above. This C code demonstrates how, without encapsulation, we can easily violate the rules set up in our pseudo-constructor. In class Vector, both x and y are private; as a result, they can only be accessed by member functions. If our goal is to prevent x and y from exceeding 100, we simply have all accessor functions perform bounds-checking. In fact, once a Vector has been created, there is no way to modify its x or y directly: they are private, and no member function other than the constructor sets their values. Notice how the constructor Vector() limits our Vector component values. Because the addition operator returns a new object, it relies on the constructor to check for overflow.

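Nothing forces an overloaded operator to keep its usual meaning; we could have made + do multiplication instead. Though giving an operator an unconventional meaning is atypical, it can be quite useful. For example, C++ comes standard with a streams library that uses the << operator, normally a left shift, to provide output.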

There is one more useful thing about the addition operator: we don't have to pass the addresses of its arguments. Its parameter is declared as a reference (the & in a declaration, which is not the same as the address-of operator & in an expression). Recall that references let us use the same calling syntax as call-by-value while still referring to the caller's object; here the const qualifier additionally guarantees that the argument is not modified. Passing by reference also avoids the overhead of copying the argument into a new object, and the caller needs no explicit & or pointer indirection.
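To see the copy that a reference parameter avoids, consider this sketch with a hypothetical CopyCounter type whose copy constructor counts how often it runs. Passing by value copies the argument into the parameter; passing by const reference does not.

```cpp
#include <cassert>

// Hypothetical type that counts how often it is copied.
struct CopyCounter {
    static int copies;
    CopyCounter() {}
    CopyCounter(const CopyCounter&) { ++copies; }
};
int CopyCounter::copies = 0;

void TakeByValue(CopyCounter c) { (void)c; }            // parameter is a fresh copy
void TakeByConstRef(const CopyCounter& c) { (void)c; }  // parameter aliases the argument

void CopyDemo() {
    CopyCounter obj;
    TakeByValue(obj);
    assert(CopyCounter::copies == 1);
    TakeByConstRef(obj);
    assert(CopyCounter::copies == 1);   // no additional copy was made
}
```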

However, the most powerful OOP extension C++ provides is probably inheritance. Classes can inherit data and functions from other classes. For this purpose we can declare members in the base class as protected: not usable publicly but usable by derived classes.
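Inheritance is a topic for later columns, but a brief sketch shows the role of protected. The classes Point and MovablePoint are hypothetical: MovablePoint inherits x and y from Point and may modify them because they are protected, while code outside the class hierarchy cannot.

```cpp
#include <cassert>

// Hypothetical base class: x and y are protected, so only Point and classes
// derived from it can use them directly.
class Point {
public:
    Point(double new_x = 0.0, double new_y = 0.0) : x(new_x), y(new_y) {}
    double X() const { return x; }
    double Y() const { return y; }

protected:
    double x, y;
};

// Hypothetical derived class: inherits x, y, X() and Y(), and adds MoveBy().
class MovablePoint : public Point {
public:
    MovablePoint(double new_x, double new_y) : Point(new_x, new_y) {}
    void MoveBy(double dx, double dy) { x += dx; y += dy; }  // allowed: protected
};

void InheritanceDemo() {
    MovablePoint p(1.0, 2.0);
    p.MoveBy(0.5, -1.0);
    assert(p.X() == 1.5 && p.Y() == 1.0);
    // p.x = 3.0;   // would not compile: x is protected
}
```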

In conclusion, we looked at some of the features that make C++ a better C. C++ provides stronger type checking by checking arguments to functions and reduces syntactic errors through the use of reference types. Programmers can also add new types to the language by defining classes. Although we have only taken a brief look at classes, we will see more abstract discussion of C++ object-orientation as well as general OOP concepts in upcoming columns.