Book tags: Text Analysis, TextMining, Computing, IR, Computer Science, Data Mining, NLP, Programming
Published on 2024-12-27
Taming Text pdf epub mobi txt e-book download 2024
It is no secret that the world is drowning in text and data. This causes real problems for everyday users who need to make sense of all the information available, and software engineers who want to make their text-based applications more useful and user-friendly. Whether you're building a search engine for a corporate website, automatically organizing email, or extracting important nuggets of information from the news, dealing with unstructured text can be a daunting task.
Taming Text is a hands-on, example-driven guide to working with unstructured text in the context of real-world applications. This book explores how to automatically organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization. The book guides you through examples illustrating each of these topics, as well as the foundations upon which they are built.
Grant Ingersoll is an independent consultant developing search and natural language processing tools. Prior to being a consultant, he was a Senior Software Engineer at the Center for Natural Language Processing at Syracuse University with 11 years of hands-on experience developing Java applications, many of which have been spent working on text processing applications. At the Center and, previously, at MNIS-TextWise, Grant worked on a number of text processing applications involving information retrieval, question answering, clustering, summarization, and categorization. Grant is a committer, as well as a speaker and trainer, on the Apache Lucene Java project and a co-founder of the Apache Mahout machine-learning project. He holds a master's degree in computer science from Syracuse University and a bachelor's degree in mathematics and computer science from Amherst College.
Thomas Morton writes software and performs research in the area of text processing and machine learning. He has been the primary developer and maintainer of the OpenNLP text processing project and Maximum Entropy machine learning project for the last 5 years. He received his doctorate in Computer Science from the University of Pennsylvania in 2005, and has worked in several industry positions applying text processing and machine learning to enterprise class development efforts. Currently he works as a software architect for Comcast Interactive Media in Philadelphia.
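I finished this ages ago; it is quite thorough as an introduction to text analysis and processing.
NLP-related; it mainly introduces tools and libraries and leans toward the practical.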
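I read through the MEAP; seven of the nine chapters are complete, and the writing has been going on for nearly three years. I came with high hopes and left disappointed. The subject is difficult to begin with, and the goals are not clearly defined. Since this is an engineering book, it might as well have been written as "Taming Text with Solr, UIMA and OpenNLP", bringing together results from engineering practice. In short, this book can be skipped.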
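The writing is not especially fluent. It introduces Solr/Lucene, OpenNLP, and many other open-source tools, and covers NLP-related problems fairly comprehensively.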
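As I always say: if an English edition exists, never read the Chinese one, especially for technical books. The translation has far too many elementary errors. I read less than one chapter of the Chinese edition and already ran into plenty of pitfalls. Let the complaints begin. Chinese edition, pages 77 and 81: 3.6.1 数量判定 ("quantity judgment") and 3.6.2 判断数量 ("judging quantity"). Is this some kind of word game? Just swap the words around and call it done?! The corresponding English headings are 3.6.1 Judging qualit...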
A practice-oriented book; the theory is somewhat lacking. Most importantly, it only covers Java, whereas Python is arguably the mainstream for NLP now. ...