An interesting problem arose while practicing for ICPC, attempting The Net.

The problem was inherently a shortest path problem using breadth-first search; however, the interesting aspect was the input which was an absolute string representation of an adjacency list.

Objective of this Analysis

We want to explore a plaintext representation of adjacency lists to quickly reconstruct it for an in-memory structure in another language (in this case, C++).

The Plaintext Representation

The first line of the input given determines the number of vertices in our graph. The following lines list each vertex on its own line with their adjacent vertices listed after the "-". Each vertex line uses a notation that reminisces of adjacency list entries: vertex-adjacent_1,adjacent_2,adjacent_3,adjacent_n.

6
1-2,3,4
2-1,3
3-1,2,5,6
4-1,5
5-3,4,6
6-3,5

Delimiters play a critical role here in separation of persistent data (plaintext). Incidentally, delimiters are critical components of parsing plaintext data into in-memory structures.

Transforming the Plaintext Representation into an In-Memory Structure

Consider how the data is separated: "-" separates vertices from their adjancency list; "," separates individual vertices in the adjacency list. Subsequently, we can construct a parse tree based on the formal grammar that we have just defined:

Adjacency List Parse Tree for the first entry

Easy Parsing through Delimiters

From the parse tree above, we can see that data can easily be separated according to the delimiters starting from the top-most delimiter, the hyphen. Afterwards, we can subdivide the lower-level strings if necessary such as the comma-separated list on the right-hand-side.

Note that a subtle delimiter here aside from the hyphen and comma is the newline which separates (vertex, adjacency list) pairs.

After dissecting this plaintext form of data, we can obtain our individual pieces of information and store it into our in-memory structure which is inherently the adjacency list node per entry:

Adjacency List Node

  • Vertex (1)
  • Adjacency List
    1. Vertex (2)
    2. Vertex (4)
    3. Vertex (5)

The above adjacency list node appears to be a much more feasible representation as an in-memory structure. Consider it in C++:

#include <vector>

class AdjacencyListNode {
public:
    int data;
    vector<int> adjacency_list;
};

Delimiter-Separated Parsing in C++

In order to adhere to the process that we have outlined above, we need to program a few utility functions to help us parse the input data. Specifically, we need to be able to split strings and then convert from a string to an integer.

#include <sstream>
#include <string>
#include <vector>
#include <algorithm>
#include <stdexcept>

using namespace std;

// Splits a string into a sequence of tokens based on the delimiter
vector<string> split(const string& s, char delim) {
    vector<string> elems;
    istringstream iss(s);
    string elem;
    while (getline(iss, elem, delim))
        elems.push_back(elem);
    return elems;
};

// Converts strings to another primitive data type (in this case, integers)
template <typename T>
T str_to_type(const string& s) {
   istringstream iss(s);
   T value;
   if (!(iss >> value)) throw runtime_error("invalid type conversion");
   return value;
};

With the parse tree in the previous section, we can see that two splits are necessary using the "-" and "," delimiters. Afterwards, each data is considered a string fragment; subsequently, we need to parse them all as integers. Fortunately, the transform STL algorithm can apply a function, str_to_type<int>, to all the string fragments in a container for us.

Parsing the Data

string data = "1-2,4,5";

// Parse the data into a list of tokens
vector<string> vertex = split(data, '-');
vector<string> adjacencies = split(vertex[1], ',');

The code above will read split the adjacency list entry string into the set of tokens as shown in the parse tree in the previous section.

// Read the tokens into an in-memory structure
AdjacencyListEntry entry;
entry.data = atoi(vertex[0].c_str());
transform(adjacencies.begin(), adjacencies.end(), 
    back_inserter(entry.adjacency_list), str_to_type);

After retrieving the tokens, we must construct the in-memory data structure as our objective defines.

At this point, we have completed our goal and we can quickly test its correctness by translating it back into plaintext representation:

// Print out the in-memory structure
cout << entry.data << "-";
copy(entry.adjacency_list.begin(), entry.adjacency_list.end(), 
    ostream_iterator<int>(cout, ","));

The above code should print out 1-2,4,5,. The extra "," should exist as the algorithm defines; however, it's easy to see that the in-memory structure can successfully translate back to its plaintext representation.

The Final Source Code

#include <iostream>
#include <sstream>
#include <vector>
#include <iterator>
#include <algorithm>
#include <string>
#include <stdexcept>

using namespace std;

class AdjacencyListEntry {
public:
    int data;
    vector<int> adjacency_list;
};

// Splits a string into a sequence of tokens based on the delimiter
vector<string> split(const string& s, char delim) {
    vector<string> elems;
    istringstream iss(s);
    string elem;
    while (getline(iss, elem, delim))
        elems.push_back(elem);
    return elems;
};

// Converts strings to another primitive data type (in this case, integers)
template <typename T>
T str_to_type(const string& s) {
   istringstream iss(s);
   T value;
   if (!(iss >> value)) throw runtime_error("invalid type conversion");
   return value;
};

int main(string argc, char* argv[]) {
    string data = "1-2,4,5";

    // Parse the data into a list of tokens
    vector<string> vertex = split(data, '-');
    vector<string> adjacencies = split(vertex[1], ',');

    // Read the tokens into an in-memory structure
    AdjacencyListEntry entry;
    entry.data = atoi(vertex[0].c_str());
    transform(adjacencies.begin(), adjacencies.end(), 
              back_inserter(entry.adjacency_list), str_to_type<int>);

    // Print out the in-memory structure
    cout << entry.data << "-";
    copy(entry.adjacency_list.begin(), entry.adjacency_list.end(), 
         ostream_iterator<int>(cout, ","));

    return 0;
};

Conclusions

The data structure is complex yet the plaintext representation is fairly straightforward. It's possible then that all data can easily be separated and distinguishable using delimiter-separated values. A product of this can be seen through the CSV and TSV file formats.

Consequently, the plaintext representation of data is the most ideal way to represent it because of its portability; however, it isn't always easy to parse in another language.