Pattern web mining module

CLiPS (Computational Linguistics & Psycholinguistics) has released a new Python module for web mining called Pattern.

Quoting from the CLiPS site:

It bundles tools for data retrieval (Google + Twitter + Wikipedia API, web spider, HTML DOM parser), text analysis (rule-based shallow parser, WordNet interface, syntactical + semantical n-gram search algorithm, tf-idf + cosine similarity + LSA metrics) and data visualization (graph networks).

Visit their site to get the download.
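
To give a feel for the "tf-idf + cosine similarity" metrics mentioned in the quote, here is a minimal sketch in plain Python. It does not use Pattern's own API; the function names (tokenize, tfidf_vectors, cosine_similarity), the whitespace tokenizer, and the particular tf-idf weighting (relative term frequency times log of inverse document frequency) are all assumptions made for illustration, and Pattern's implementation may well differ.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (an assumption): lowercase and split on whitespace.
    return text.lower().split()


def tfidf_vectors(documents):
    """Return one sparse tf-idf vector (a dict of term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # tf = relative frequency in the document, idf = log(N / df).
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    docs = [
        "web mining with python",
        "data mining with python",
        "graph networks",
    ]
    vecs = tfidf_vectors(docs)
    # The first two documents share terms, so they score above zero;
    # the third shares none with the first, so it scores zero.
    print(cosine_similarity(vecs[0], vecs[1]))
    print(cosine_similarity(vecs[0], vecs[2]))
```

Terms that appear in every document get a weight of zero under this weighting, which is the point of the idf factor: words common to the whole collection say little about what makes one document similar to another.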

Kaggle hosts data mining competitions

Kaggle has hosted several data mining competitions similar to the Netflix Prize, and it recently announced a new and much bigger one. It’s called the Heritage Health Prize, and the prize has been set at $3M. The goal is to predict when a person will need to go to the hospital before they actually make a visit. Here’s some more info from O’Reilly Radar. And here is Anthony Goldbloom of Kaggle announcing the contest at the Strata Conference…