Someone posted a question on StackOverflow.com asking how to split words that have been concatenated together (likethis). This sounded like fun, so I spent an hour or two putting together a solution.
As it turns out, this is a common problem in information retrieval, where you might be dealing with, say, German (and Germansconcatenateeverything), and you need to split strings to get at the individual terms. So this is a naive “compound splitter” (that’s the technical term). For how the pros do it, consider reading the following description of a compound splitter for Swedish: http://www.nada.kth.se/theory/projects/xcheck/rapporter/sjoberghkann04.pdf
But for a quick-and-dirty “my evening with Perl” approach, read on.