Facebook puzzle Breathalyzer

To safeguard against the dreaded phenomenon of wall posting while drunk, Facebook is implementing a feature that detects when post content is too garbled to have been done while sober and informs the user that they breathalyzer-puzzleneed to take an online breathalyzer test before being allowed to post.

My Solution:
Your are given input sentence and dictionary.
Your program must print out the minimum number of changes necessary to turn all words in the input wall post into accepted words as defined by the word list file. Words may not be joined together, or separated into multiple words. A change in a word is defined as one of the following:
Replacing any single letter with another letter.
Adding a single letter in any position.
Removing any single letter.
All the 3 rules above shouts for Levenshtein distance, which is a metric for measuring the amount of difference between the two sequences.

After doing lot of search for the data structure to solve this problem I came across with the interesting datastructure. A BK-tree is a metric tree suggested by Walter Austin Burkhard and Robert M. Keller [1] specifically adapted to discrete metric spaces. For simplicity, let us consider integer discrete metric d(x,y). Then, BK-tree is defined in the following way. An arbitrary element a is selected as root node. The root node may have zero or more subtrees. The k-th subtree is recursively built of all elements b such that d(a,b) = k. BK-trees can be used for approximate string matching in a dictionary [2].

BK-Trees were first proposed by Burkhard and Keller in 1973, in their paper “Some approaches to best match file searching”. The only copy of this online seems to be in the ACM archive, which is subscription only. Further details, however, are provided in the excellent paper “Fast Approximate String Matching in a Dictionary”. Good Explanation about the BK-Tree can be found at Damn Cool Algorithms, Part 1: BK-Trees

Here is how I coded the solution.

Interface declaration for BK-Tree is as follows:

class BkTree {
    public:
        BkTree();
        ~BkTree();
        void insert(std::string m_item);
        int  getWithinDistance(std::string center, size_t k);
        std::vector<std::string> wordVector;
        void solve();
        int BuildDictionary();
        int ReadInputFile(const char *inpFileName);
    private:
        size_t EditDistance( const std::string &s, const std::string &t );
        struct Node {
            std::string m_item;
            size_t m_distToParent;
            Node *m_firstChild;
            Node *m_nextSibling;
            Node(std::string x, size_t dist);         
            ~Node();
        };
        Node *m_root;
    protected:
};

EditDistance is implemented below using dynamic programming approach.

size_t BkTree::EditDistance( const std::string &left, const std::string &right ) {
    size_t asize = left.size();
    size_t bsize = right.size();
    std::vector<size_t> prevrow(bsize + 1);
    std::vector<size_t> thisrow(bsize + 1);
    for(size_t i = 0; i <= bsize; i++)
        prevrow[i] = i;
    for(size_t i = 1; i <= asize; i++) {
        thisrow[0] = i;
        for(size_t j = 1; j <= bsize; j++) {
            thisrow[j] = std::min(prevrow[ j - 1] + size_t(left[ i - 1] != right[ j - 1]), 
                    1 + std::min(prevrow[j],thisrow[ j - 1]) );
        }
        std::swap(thisrow,prevrow);
    }
    return prevrow[bsize];
}

Build the dictionary using BK-Tree.

int BkTree::BuildDictionary) {
    std::ifstream dictFile("/var/tmp/twl06.txt");
    if ( dictFile.peek() == EOF ) {
        return -1;
    } 
    std::string line; 
    if (dictFile.is_open()) {
        while (! dictFile.eof() ) {  
            std::getline (dictFile,line);
            trim(line);
            std::transform(line.begin(), line.end(), line.begin(), ::tolower);
            insert(line);
        }
        dictFile.close();
    } else {
        return EXIT_FAILURE;
    }
    return 0;
}

Try getting a new word from a dictionary using a BK-Tree based dictionary. Start with Edit Distance equals to 0, if found no changes are required and keep incrementing Edit Distance till you find some word in the dictionary.

int BkTree::getWithinDistance(std::string center, size_t k) {
    if( !m_root ) return 0;
    int found = 0;
    std::queue< Node* > nodeQueue;
    nodeQueue.push( m_root );
    while( !nodeQueue.empty() ) {
        Node *t = nodeQueue.front(); 
        nodeQueue.pop();
        size_t d = EditDistance( t->m_item, center );
        if( d <= k ) { 
            found++; 
            break;
        }
        Node *pChild = t->m_firstChild;
        while( pChild ) {
            if( d - k <= pChild->m_distToParent && pChild->m_distToParent <= d + k )
                nodeQueue.push(pChild);
            pChild = pChild->m_nextSibling;
        }
    }
    return found;
}

Complete code with tested cases are uploaded on my .

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s