C++ Container Mechanisms

Posted on
Programming Containers Algorithms C++

Below are my notes on section 3 of “A tour of C++”

Libraries

The C++ standard library provides several facilities to the programmer, in order to simplify the given task. These include:

  • Basic Runtime Language Support
  • C standard Library
  • Strings and I/O streams
  • Framework of containers
  • Support for numerical computation
  • Support for REGEX matching
  • Support for Concurrent processing
  • Utilities to sopport template metaprorgamming
  • Smart pointers
  • Special-purpose containers

Strings

String concatenation is efficient given it has a move operator. To append to the end of the string, you can use the operator +=. A string is mutable, so operations like string replacement and character substitution and assignment are supported.

string name = "Niels Stroustrup"
void m3(string& name){
  string s = name.substr(6, 10); // s = Stroustrup
  name.replace(0, 5, "nicholas"); // name = nicholas Stroustrup
  name[0] = 'N'; // name = Nicholas Stroustrup
}

Stream I/O

The standard library provides formatted character input and output through the iostream library.

Output

The I/O stream library defines output for every built-in-type. The operator << is used as an output operator for objects of type ostream.

Input

The standard library uses istream for input. It deals with input from every built-in-type and can easily be extended to cope with user-defined types. The operator >> is used as a input operator.

Strings I/O

To read sequences of characters, we can use string. By default, a string terminates reading when a whitespace character is reached.

If you want to read an entire line, the function getline() can be used. This function discards the newline that terminates the line.

I/O of user-defined types

The I/O library allows programmers to define I/O for their own types. For example:

struct Entry{
  string name;
  int number;
};

ostream& operator<<(ostream& os, const Entry& e){
  return os << "{ " << e.name << " | " << e.number <<  " }";
}

istream& operator>>(istream& is, Entry& e){
  // Read {"name", number} pair.
  char c, c2;
  if(is>>c && c=='{' && is>>c2 && c2=='"'){ // start to match {"
    string name;
    while(is.get(c) && c!='"'){
      name+=c;
    }
    if(is>>c && c==','){
      int number = 0;
      if(is>>number>>c && c=='}'){ // Read the number and a }
        e = {name, number};        // Assign to the entry
        return is;
      }
    }
  }
  is.setf(ios_base::failbit); // register the failure in the stream
  return is;
}

An input operator returns a reference to its istream. This way, we can find out if the operation has succeeded.

The is>>c skips a whitespace by default, but is.get(c) does not. Therefore this entry-input operator skips whitespaces outside of the name string, but not within it.

Containers

A class with the main purpose of holding objects is called a container.

Vector

A vector is a sequence of contiguously stored elements of a given type.

We can initialize a vector with a set of values of its element types:

Vector<Entry> phone_book = {
  {"David Hume", 123456},
  {"Karl Popper", 2345678},
  {"Bertrand Arthur William Russell", 3456789}
};

We can also print the contents of the vector by accessing it’s elements through the operator [].

One of the most useful operations on a vector is push_back(), which adds a new element at the end of a vector, increasing it’s size() by 1.

Assigning a vector involves copying it’s contents. (v2 = v1) Such operations might seem harmless, but if a vector is very big, they can be very expensive to complete. In these cases, references, pointers or move operations are encouraged.

Elements

A vector contains the elements assigned to each container, therefore a assignment such as: v[0] = 2 actually inserts 2, (and not a reference to the value) into the vector’s v[2] container. This makes for fast, compact containers.

Range-checking

By default, vector doesn’t guarantee range-checking. This is a common problem. To overcome this, you can implement a vector template which returns a out_of_range exception if it is accessed out of range.

template<typename T>
class Vec : public std::vector<T> {
  public:
    using vector<T>::vector; // Use the constructor from vector
                             // under the name Vec
    T& operator[](int i){ return vector<T>::at(i); } // range-checked
    const T& operator[](int i) const {  return vector<T>::at(i); } // const range-checked
}

Vec inherits everything from vector except for the subscript operations that it re-defines to do range-checking. The at() operation is a vector subscript operation that throws an exception type out_of_range if it’s argument is out of the vector’s range.

List

The standard library offers a doubly-linked list called list.

We use a list for sequences where we want to insert and delete elements without moving other elements.

list<Entry> phone_book = {
  {"David Hume", 123456},
  {"Karl Popper", 2345678},
  {"Bertrand Arthur William Russell", 3456789}
};

To search for a element in a list, we search for the value we are looking for instead of it’s index (Done in vector).

To insert or delete a element of a list, we use a iterator. A list iterator identifies a element of a list and can be used to iterate through a list. Every standard library container provides the functions begin() and end() which return an iterator to the first and to one-past-the-last element respectively.

Adding and removing elements from a list is done by the provided functions:

void f(const Entry& ee, list<Entry>::iterator p, list<Entry>::iterator q){
  phone_book.insert(p, ee);   // Add ee before the element refered to by p
  phone_book.erase(q);        // Remove the element refered to by q
}

When all we want is a sequence of element, we have a choice between using a vector and a list. Unless we have a reason to do so, use a vector. A vector performs better for traversal (find() and count()) and for searching and sorting operations (binary_search() and sort()).

Map

The standard library’s take on a search tree. (Balanced Binary Tree) Is a container of pairs of values optimized for lookup

map<Entry> phone_book = {
  {"David Hume", 123456},
  {"Karl Popper", 2345678},
  {"Bertrand Arthur William Russell", 3456789}
};

When indexed for a value of it’s first type (key) the map returns the value of it’s second type (value). To obtain a name’s phone number, we could simply do: return phone_book[key]; provided surrounding logic. The cost of a lookup is O(log(n)) where n is the number of elements in the map.

If a key isn’t found, it will insert it and assign the default type’s value to it’s value. If we don’t want to insert a key, and we only want to get the value of a key, we have to use find()

Unordered_map

If the order the elements are arranged in isn’t important, we should use the unordered_map. This container uses a hashed lookup rather than a ordering function, such as <.

Unordered_map

unordered_map<Entry> phone_book = {
  {"David Hume", 123456},
  {"Karl Popper", 2345678},
  {"Bertrand Arthur William Russell", 3456789}
};

Like map, we can access a element using the operator [] like so: return phone_book[key];

The standard-library unordered_map provides a hash function for strings. If we need to lookup other values, we can provide our own.

Algorithms

A data structure is useless without useful algorithms to operate upon it. The standard library provides a useful set of these.

If more than one element is written (For example, in a unique_copy(vec.begin(), vec.end(), lst.begin())) the elements following that initial element will be overwritten. Thus to avoid errors, lst must have at least as many elements as unique values in vec. If we would have wanted to insert values into a new container, we could have written:

list<Entry>f(vector<Entry>& vev){
  list<Entry> res;
  sort(vec.begin(), vec.end());
  unique_copy(vec.begin(), vec.end(), back_inserter(res)); // Append to res
  return res;
}

back_inserter() appends elements to a container extending the container to make room for them. The combination of the standard containers and back_inserter() eliminate the need to use error-prone explicit C-style realloc() memory management.

Use of iterators

Many algorithms return iterators. When we use find(), it returns a iterator to the object found or end() if not found. A example of the use of find() to check if a element is in a container could be written like so:

bool has_c(const string& s, const char c){
  return find(s.begin(), s.end(), c)!=s.end();
}

Another use of a iterator could be to find out the locations of specific characters in a string:

vector<string::iterator> find_all(string& s, const char c){
  vector<string::iterator> res;
  for(auto i = s.begin(); i != s.end(); ++i){
    if(*i == c){
      res.push_back(i);
    }
  }
  return res;
}

We could test for the find_all() integrity like so:

bool test(){
  string s{"Mary had a little lamb"};
  for(auto i : find_all(s, 'a'))
    if(*i != 'a')
      return false;
  return true;
}

Generalizing find_all():

template<typename T>
using iterator<T> = typename T::iterator;

template<typename T, typename V>
vector<iterator<T>> find_all(const T& t, const V v){
  vector<iterator<T>> res;
  for(auto i = t.begin(); i != t.end(); ++i){
    if(*i == c){
      res.push_back(i);
    }
  }
  return res;
}

Iterators are used to separate algorithms and containers. An algorithm iterates on it’s data through iterators and knows nothing about the container on which elements are stored. Conversely, a container knows nothing about the algorithms operating on it’s elements. All it does, is supply iterators upon request (begin(), end()). The result is very general and flexible software.

Iterator types

There are as many different types of iterators as containers the iterators can operate in. A iterator on a vector, might be as simple as a pointer to the element’s location, but a iterator on a linked list, cannot, because each link doesn’t know where the next link resides.

There is however a common notation on iterators. Operator ++ always refers to the next element. Operator * yields the element to which the iterator refers. Each container knows it’s iterator type and makes them available under the conventional names iterator and const_iterator. For example, list<Entry>::iterator is the general iterator type for list<Entry>.

Stream iterators

An Input stream produces a sequence of values and we write a sequence of values to an output stream, consequently, the notion of iterators can be usefully applied to Input and Output.

To make a ostream_iterator we have to specify the stream used and the type of object written to it. For example, we can define an iterator that refers to the standard output stream cout.

ostream_iterator<string> oo {cout};

The effect of assigning to *oo is to write the assigned value to cout.

int main(){
  ostream_iterator<string> oo {cout};
  *oo = "Hello, ";  // Equivalent to "cout << "Hello, ";
  ++oo;
  *oo = "World!\n"; // Equivalent to "cout << "World!\n";
  return 0;
}

The ++oo is done to mimic writing writing into an array through a pointer.

Similarly, an istream_iterator is something that allows us to treat an input stream as a read-only container.

istream_iterator<string> ii {cin};

istream_iterators are represented by pairs, representing a sequence, so we must provide a istream_iterator to indicate the end of input. This is the default istream_iterator:

istream_iterator<string> eos {};

Typically, istream_iterators and ostream_iterators are not used directly, but are passed as arguments to algorithms.

// A simple program to read a file, sort the words read, eliminate duplicates and write the result to another file.

int main(){
  string from, to;
  cin >> from >> to;                      // Get source and target file names

  ifstream is {from};                     // Input stream for file "from"
  istream_iterator<string> ii {is};       // Input iterator for stream
  istream_iterator<string> eos {};        // Input sentinel

  ofstream os {to};
  ostream_iterator<string> oo {os, "\n"}; // Output iterator for stream

  vector<string> b {ii, eos};             // b is a vector initialized from input [ii:eos)
  sort(b.begin(), b.end());               // sort the buffer

  unique_copy(b.begin(), b.end(), oo)     // copy buffer to output, discard replicated values

  return !is.eof() || !os;                // Return error states
}

Predicates

We often want to make the action a parameter to the algorithm. For example, the find() algorithm provides a convenient way of looking for a specific value. A more general variant looks for an element that fulfills a specified requirement. A predicate.

Algorithm overview

A list of standard algorithms that provide a general purpose interface to mutate and operate over containers.

algorithms

Container algorithms

A sequence is defined by a pair of iterators [begin:end). This is general and flexible, but often, we want to sort the contents of a container.

We can easily provide a shorthand for this:

namespace Estd{
  using namespace std;

   template<class T>
   void sort(T& t){
    sort(t.begin(), t.end());
   }

   template<class T, class Pred>
   void sort(T& t, Pred p){
    sort(t.begin(), t.end(), p);
   }
   ...
}