This service restores punctuation in unsegmented English text. It works best on Europarl-style text. If you have any questions or problems, e-mail ottokar.tilk@phon.ioc.ee. The service can also be used by sending the text directly with HTTP POST, e.g.:
curl -d "text=hello%20world" http://bark.phon.ioc.ee/punctuator
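The same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the function names `build_request` and `punctuate` are hypothetical, and it assumes the service replies with plain text encoded as UTF-8, which is not stated above. Only the URL and the `text` form field are taken from the curl command.

```python
import urllib.parse
import urllib.request

# Endpoint taken from the curl example above.
PUNCTUATOR_URL = "http://bark.phon.ioc.ee/punctuator"


def build_request(text, url=PUNCTUATOR_URL):
    """Build a form-encoded POST request with a single 'text' field."""
    # quote_via=quote encodes spaces as %20, matching the curl example.
    body = urllib.parse.urlencode(
        {"text": text}, quote_via=urllib.parse.quote
    ).encode("ascii")
    return urllib.request.Request(url, data=body, method="POST")


def punctuate(text, url=PUNCTUATOR_URL, timeout=30):
    """Send the text and return the response body.

    Assumption: the reply is plain text in UTF-8.
    """
    request = build_request(text, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `punctuate("hello world")` would then send the same form data as the curl command. Building the request separately from sending it makes it possible to inspect the encoded body without contacting the server.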
We used roughly the first 80% of lines from the Europarl v7 monolingual English corpus as training data, the next 10% as development data and the last 10% as test data (preprocessing script here). The training set size was about 40 million words. The corpus was obtained from the IWSLT 2012 TED task web page.
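A contiguous split of this kind can be sketched as follows. This only illustrates the 80/10/10 division by line position; it is not the preprocessing script mentioned above, and the function name `split_corpus_lines` and the exact boundary rounding are assumptions.

```python
def split_corpus_lines(lines, train_pct=80, dev_pct=10):
    """Split lines by position into (train, dev, test) parts.

    The first train_pct percent of lines go to training, the next
    dev_pct percent to development, and the remainder to test.
    Boundaries are rounded down (an assumption).
    """
    n = len(lines)
    train_end = n * train_pct // 100
    dev_end = train_end + n * dev_pct // 100
    return lines[:train_end], lines[train_end:dev_end], lines[dev_end:]
```

Because the split is by position rather than by random sampling, the three parts keep the original line order and do not overlap.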
Try an example of a few random sentences from our test set.