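Download and install a Java Runtime Environment (JRE) and the Apache Tika server.
Start the Tika server by running the following command on the command line: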
java -jar tika-server.jar
Use the Tika server by sending it HTTP requests. You can do this from a programming language such as Java or Python. The following is a Java example:
import java.io.*; import java.net.*; import java.nio.charset.StandardCharsets; import java.nio.file.*;
// The /tika endpoint expects an HTTP PUT whose body is the raw document.
HttpURLConnection connection = (HttpURLConnection) new URL("http://localhost:9998/tika").openConnection();
connection.setRequestMethod("PUT");
connection.setDoOutput(true);
// Ask for plain text back; no Content-Type is set, so Tika detects the file type itself.
connection.setRequestProperty("Accept", "text/plain");
// Send the file as bytes; converting a .docx to a String would corrupt it.
try (OutputStream out = connection.getOutputStream()) { Files.copy(Paths.get("example.docx"), out); }
// Print the extracted text returned by the server.
try (BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
    String line;
    while ((line = in.readLine()) != null) { System.out.println(line); }
}
After the code runs, the server parses the specified file and returns the extracted text.
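
If the request is needed in more than one place, it may help to wrap it in a small helper that also checks the HTTP status and sets timeouts. The following is only a sketch under stated assumptions: the class name TikaClient, the method extractText, and the timeout values are hypothetical and not part of Tika; it assumes Java 11 or later and a server whose /tika endpoint accepts a PUT with the raw file as the body.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class; the name is illustrative only.
final class TikaClient {

    // Sends the file to <serverBase>/tika via PUT and returns the response body as text.
    static String extractText(String serverBase, Path file) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(serverBase + "/tika").openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        conn.setConnectTimeout(5_000);   // assumed value: 5 s to connect
        conn.setReadTimeout(60_000);     // assumed value: 60 s for parsing to finish
        conn.setRequestProperty("Accept", "text/plain");
        try {
            // Stream the raw bytes of the file as the request body.
            try (OutputStream out = conn.getOutputStream()) {
                Files.copy(file, out);
            }
            // Fail loudly on anything other than 200 instead of reading an error page.
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Tika server returned HTTP " + status);
            }
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumes a Tika server on the default port and example.docx in the working directory.
        System.out.println(extractText("http://localhost:9998", Path.of("example.docx")));
    }
}
```

Calling TikaClient.extractText("http://localhost:9998", Path.of("example.docx")) should then return the extracted text as a single string, or throw an IOException if the server is unreachable or rejects the request.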