beachdog's Journal: Review of Natural Language Processing With Python by...
Natural Language Processing with Python.
Analyzing Text with the Natural Language Toolkit,
by Steven Bird, Ewan Klein and Edward Loper.
Published by O'Reilly Media Inc., c. 2009, price $44.99
Reviewed by Lee McKusick
07/22/09
Natural Language Processing with Python is about scanning text samples in human languages such as English, Persian, or Chinese with computer routines, and performing tasks such as counting word frequencies, parsing sentences, and further analyses that begin the difficult task of finding limited kinds of meaning in pieces of text.
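The simplest task named above is counting word frequencies. Here is a minimal sketch of it that uses only the Python standard library rather than NLTK itself. The regular-expression tokenizer and the function name `word_frequencies` are assumptions made for illustration. NLTK provides its own tokenizers and a `FreqDist` class for this job.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count lowercase word tokens in text (hypothetical helper, not NLTK's API)."""
    # Crude tokenizer (an assumption): runs of letters and apostrophes only.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

sample = "The cat sat on the mat. The mat was flat."
freqs = word_frequencies(sample)
print(freqs.most_common(2))  # → [('the', 3), ('mat', 2)]
```

A real tokenizer has to handle punctuation, numbers, and non-Latin scripts such as Persian or Chinese. That is one reason the toolkit ships several of them.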
The book has a companion website, www.nltk.org.
This book is addressed to a broad academic community:
The first audience is liberal arts students.
The second audience is computer science students.
The third audience is teachers and researchers worldwide.
This book tries hard to be a high-quality introduction to natural language processing.
Natural language processing itself is one of the great problems of computing. One of the pleasures of this book is that the authors carefully outline some of the great problems in computer science that are central to natural language processing. The discussion of each problem starts from the texts and programs provided in the toolkit, so liberal arts students are included right from the start. The discussions include references for further reading to the classics of the field, such as Knuth.
Natural language processing is also a field of some interest and utility to linguists, critics, historians, students of language and rhetoric, and students of 20th-century philosophy. The book covers this dimension too, with a good sequence of examples and references.
I remember reading the philosopher Wittgenstein (his writings vintage 1943), where he did thought experiments of putting words in a tray. This is a provocative way of thinking about meaning, and it could lead to some interesting Toolkit projects.
The fourth audience for this book might be the programmer seeking an interesting opportunity:
Is this a book that might help me write a project-specific text analysis engine? I have been wishing for a way to clarify and reorganize the Ubuntu Forums website with a structured language query tree.
Would the NLTK be useful if I wanted to write a search engine?
The first piece of good news is that the executable code of the NLTK is licensed under the Apache license, which means its components can be reused in a new project. The website of the Natural Language Toolkit is licensed under a Creative Commons license. Both license statements are linked from:
http://code.google.com/p/nltk/
The second piece of good news is that a site scraper project already exists with NLTK lead author Steven Bird as a contributor. The Google Code website for this Site Scraper is:
http://code.google.com/p/sitescraper/
Would the NLTK be useful if I wanted to figure out the vocabulary a specific group of people uses to talk about a specific subject? A really fascinating item in chapter 6 of this book is the "Maximum Entropy Classifier". Here, for the first time in print, I found a formula for entropy that I can understand and reproduce with a pocket calculator.
Entropy is a key concept in Shannon's classic article on information theory. I sometimes feel very disappointed that computers are not doing much with information. Computers and the Internet move data very well, but they are not doing much in the way of "information processing".
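The entropy formula praised above is Shannon's entropy of a label distribution: H = −Σ P(l) × log₂ P(l), summed over each distinct label l. A minimal sketch of the pocket-calculator exercise in plain Python follows. It assumes the labels are hashable values, and the function name `entropy` is illustrative rather than taken from NLTK. The code uses the equivalent form Σ P(l) × log₂(1/P(l)) so that a single repeated label prints as 0.0 rather than -0.0.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of the distribution of values in `labels`."""
    counts = Counter(labels)
    total = len(labels)
    # Equivalent to -sum(P(l) * log2 P(l)), with P(l) = count / total.
    return sum((n / total) * math.log2(total / n) for n in counts.values())

print(entropy(["male", "female", "male", "female"]))  # → 1.0 (two equal labels: one bit)
print(entropy(["male", "male", "male"]))              # → 0.0 (no uncertainty)
print(entropy(["yes", "yes", "no", "maybe"]))         # → 1.5
```

Each term is small enough to check by hand. For the last example, 0.5 × 1 + 0.25 × 2 + 0.25 × 2 = 1.5 bits.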
Does the Natural Language Toolkit summarize the state of the art in natural language processing on a computer?
This toolkit embodies the divide-and-conquer approach to language processing. Tasks worked on in the later chapters include making connections between two sentence statements, attempting to pick out factual phrases from text, and attempting to parse statements in symbolic logic.
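The first of those tasks, deciding whether one sentence supports another (often called recognizing textual entailment), can be illustrated with a very simple baseline. The baseline measures how many of the hypothesis's content words also appear in the text. This is a hypothetical sketch in plain Python, not NLTK's implementation. The stopword list and the function names `content_words` and `overlap_score` are assumptions made for illustration. Real systems go further, for example by feeding features like this into a trained classifier.

```python
import re

# Hypothetical, tiny stopword list; real systems use a much larger one.
STOPWORDS = {"a", "an", "the", "is", "was", "of", "in", "on", "to", "and"}

def content_words(sentence):
    """Lowercase word tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS}

def overlap_score(text, hypothesis):
    """Fraction of the hypothesis's content words that also occur in the text."""
    hyp = content_words(hypothesis)
    if not hyp:
        return 0.0
    return len(hyp & content_words(text)) / len(hyp)

text = "Steven Bird, Ewan Klein and Edward Loper wrote a book about NLTK."
print(overlap_score(text, "Ewan Klein wrote a book."))   # → 1.0
print(overlap_score(text, "Ewan Klein wrote a novel."))  # → 0.75
```

This reflects the divide-and-conquer style in miniature: tokenizing, filtering, and comparing are separate steps that can each be improved independently.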
Update, 8-10-2009: Corrected this journal entry to note the Apache license and the Google Code URLs. A copy of this review is posted in the reviews section of the O'Reilly website.