Apache Tika™ 4.0.0-alpha-1 is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.