# Lunch break coding - Word Break Problem

## Simple programming puzzle to break up words according to dictionary.

Posted on August 15, 2011 haskell, puzzle, algorithm .

Another funny little programming puzzle that I found here describes the Word Break Problem.

Given an input string and a dictionary of words, segment the input string into a space-separated sequence of dictionary words if possible. For example, if the input string is “applepie” and dictionary contains a standard set of English words, then we would return the string “apple pie” as output.

So let’s give it a shot at using haskell:

First we need some imports and a convenient data type for the dictionary that we are fed:

``````import Data.List(inits)
import qualified Data.Set as S
import Maybe(isJust)
import Control.Applicative((<\$>))
import System(getArgs)
type Dictionary = S.Set String``````

Next the formulation of the problem itself:

``````breakWords ::  String -> Dictionary -> Maybe [String]
breakWords xs dict = splittUp xs [] where
splittUp :: String -> [String] -> Maybe [String]
splittUp [] rs = Just \$ reverse rs -- done, give back result
splittUp xs rs = safeHead \$ filter isJust solutions where
solutions = [splittUp rest (match:rs) | (match,rest) <- findMatches xs dict]``````

And of course the helper functions we need for this to work:

``````findMatches :: String -> Dictionary -> [(String,String)]
findMatches xs dict =
[(x,drop (length x) xs) | x <- reverse \$ inits xs, x `S.member` dict]

Now drop into ghci for a quick test to validate the implementation:

```*Main> S.fromList . lines <\$> readFile "/usr/share/dict/words" >>=
print . breakWords "sometimesitisjustsosimple"
Just ["sometimes","it","is","just","so","simple"]
```

Seems to work alright!
This implementation will give us exactly one valid solution (trying first to find the largest possible matches).
If we are interested in all possibilities, this is just a small tweak:

``````breakWords2 xs dict = splittUp xs [] where
splittUp [] rs = [Just \$ reverse rs] -- done, give back result
splittUp xs rs = filter isJust \$ concat solutions where
solutions = [splittUp rest (match:rs) | (match,rest) <- findMatches xs dict]``````

Full source-code is available here.