Lucene keeps on blowing my mind, but finding out how to do rudimentary things with it is not always simple.
Suppose you want to delete a document so that the index no longer returns it in search results. As far as I understand (after some painful research), this involves six steps [of course, there is definitely more than one way to do this, and I am by no means a Lucene expert]:
1. Find the document’s id. That is the id Lucene assigned to the document, not one you gave it.
2. Get a Directory object for the index directory.
3. Get an IndexReader for that directory.
4. Unlock that directory.
5. Delete the document.
6. Close the IndexReader object.
Each step is almost its own procedure.
1. Find the document’s id
This is the most elaborate step. You need to search your index for the document you wish to delete; I did that by running a query against the index.
(This sample query prints the file names and ids of documents that match on a field called “contents”):
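import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

Directory fsDir = FSDirectory.getDirectory(indexDir, false); // false: open the existing index, do not create (and erase) one
IndexSearcher is = new IndexSearcher(fsDir);
Query query = QueryParser.parse(search_term, "contents", new StandardAnalyzer());
Hits hits = is.search(query);
System.out.println("Found " + hits.length() + " document(s) that matched query '" + search_term + "':");
for (int i = 0; i < hits.length(); i++) {
    Document doc = hits.doc(i);
    System.out.println(doc.get("filename") + " score: " + hits.score(i) + " id: " + hits.id(i)); // hits.id(i) is Lucene's internal document number
}
is.close();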
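To make the difference between Lucene's internal id and your own "filename" field concrete, here is a toy, stdlib-only model. It is not Lucene's API: ToyIndex, add() and search() are hypothetical names, and the substring match is a crude stand-in for an analyzed query. The point it shows is that the index hands out document numbers itself, in the order documents were added, and a search gives you back those numbers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical toy index (NOT Lucene's API). Documents get consecutive
// internal numbers in the order they are added; the caller never picks them.
class ToyIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();

    // Adds a document and returns the internal number assigned to it.
    int add(Map<String, String> fields) {
        docs.add(fields);
        return docs.size() - 1;
    }

    // Returns the internal numbers of documents whose field contains the term
    // (case-insensitive substring match, a stand-in for real query parsing).
    List<Integer> search(String field, String term) {
        List<Integer> ids = new ArrayList<>();
        String needle = term.toLowerCase(Locale.ROOT);
        for (int id = 0; id < docs.size(); id++) {
            String value = docs.get(id).get(field);
            if (value != null && value.toLowerCase(Locale.ROOT).contains(needle)) {
                ids.add(id);
            }
        }
        return ids;
    }

    // Looks up a stored field by internal number, like hits.doc(i).get(...).
    String get(int id, String field) {
        return docs.get(id).get(field);
    }
}
```

Searching for "lucene" in a three-document toy index returns the internal numbers of the matches, and you would then look up the file name from that number, not the other way around.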
As you can see, finding the id involves the Hits object, which holds the precious id(int hit_position) method: given a position in the result list, it returns Lucene's internal id for that document.
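The position you pass to id() and the id it returns are two different numbers, which is easy to mix up. The toy class below is a hypothetical stand-in for a ranked hit list (not Lucene's Hits class): hits are ordered best score first, while each hit carries the id the index assigned to it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy ranked hit list (NOT Lucene's Hits class).
// Position n is the rank in the result list; id(n) is the document's
// internal number, which has nothing to do with its rank.
class ToyHits {
    private static final class Hit {
        final int docId;
        final float score;

        Hit(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    private final List<Hit> hits = new ArrayList<>();

    ToyHits(int[] docIds, float[] scores) {
        for (int i = 0; i < docIds.length; i++) {
            hits.add(new Hit(docIds[i], scores[i]));
        }
        // Best match first, as a search result list is ordered.
        hits.sort((a, b) -> Float.compare(b.score, a.score));
    }

    int length() { return hits.size(); }

    // Internal document id of the hit at position n (0 = best match).
    int id(int n) { return hits.get(n).docId; }

    float score(int n) { return hits.get(n).score; }
}
```

With documents 3, 7 and 12 scoring 0.2, 0.9 and 0.5, the best hit (position 0) is document 7, so the value you want to delete is id(0), not 0.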
Now that you have the id, you can proceed and start the real deletion process:
2. Get a Directory object
As in the query above, you get a Directory object from FSDirectory.getDirectory(). That is easy enough.
3. Get an IndexReader object
IndexReader is an abstract class, so to get a concrete implementation you call its static factory method:
IndexReader ir = IndexReader.open(fsDir);
where fsDir is the Directory object we created in step 2.
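The "abstract class plus static open()" combination is the plain factory-method pattern. As far as I can tell, Lucene's open() picks a concrete reader depending on how many segments the index has; the sketch below only mirrors that idea with made-up class names (ToyReader, SingleSegmentReader, MultiSegmentReader are hypothetical, not Lucene classes).

```java
// Hypothetical toy of the factory pattern behind IndexReader.open(...):
// callers only ever see the abstract type, and the static factory
// decides which concrete subclass to instantiate.
abstract class ToyReader {
    // Number of document slots visible through this reader.
    abstract int maxDoc();

    static ToyReader open(int[] segmentSizes) {
        // One segment: read it directly; several: wrap them in a composite.
        if (segmentSizes.length == 1) {
            return new SingleSegmentReader(segmentSizes[0]);
        }
        return new MultiSegmentReader(segmentSizes);
    }
}

final class SingleSegmentReader extends ToyReader {
    private final int size;

    SingleSegmentReader(int size) { this.size = size; }

    @Override
    int maxDoc() { return size; }
}

final class MultiSegmentReader extends ToyReader {
    private final int[] sizes;

    MultiSegmentReader(int[] sizes) { this.sizes = sizes.clone(); }

    @Override
    int maxDoc() {
        int total = 0;
        for (int s : sizes) {
            total += s;
        }
        return total;
    }
}
```

The caller's code stays the same (`ToyReader r = ToyReader.open(...)`) no matter which subclass comes back, which is why you never write `new IndexReader(...)`.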
4. Unlock the Directory
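Lucene uses file locks to protect the index against concurrent updates. Deleting a document needs the write lock, and if a lock is in the way, the IndexReader class will be happy to remove it for you through its static unlock() method:
IndexReader.unlock(fsDir);
Be careful with this one: unlock() forcibly removes the lock, so it is only safe when no other process or thread is currently using the index (for example, to clean up after a program that crashed while holding the lock).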
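Why is a forced unlock risky? A minimal sketch of a file-based lock makes it visible: the lock is just a file whose existence means "someone is writing", and unlocking deletes that file regardless of who created it. This is a toy model, not Lucene's locking code; the class name and the lock file name "write.lock" are assumptions for illustration.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical toy file lock (NOT Lucene's implementation; the file name
// "write.lock" is assumed). Holding the lock = the lock file exists.
class ToyFileLock {
    private final Path lockFile;

    ToyFileLock(Path dir) {
        this.lockFile = dir.resolve("write.lock");
    }

    // Try to take the lock; fails if the file already exists,
    // e.g. because a writer crashed and never removed it.
    boolean obtain() throws IOException {
        try {
            Files.createFile(lockFile); // atomic create-if-absent
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }

    boolean isLocked() {
        return Files.exists(lockFile);
    }

    // Forcible unlock: deletes the file no matter who created it, so it
    // is only safe when nobody else is actually using the index.
    void forceUnlock() throws IOException {
        Files.deleteIfExists(lockFile);
    }
}
```

A stale lock blocks every later writer until someone force-unlocks it; if the "stale" lock actually belonged to a live writer, the forced unlock would let two writers modify the index at once.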
5. Delete the document
Finally, we ask the IndexReader to delete the document using the id we found in step 1, which we intuitively put in a variable called docId:
ir.delete(docId);
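What does "delete" mean for a document number? Lucene's IndexReader exposes maxDoc(), numDocs() and isDeleted(int), which suggests the model below: the document is not physically removed, its number is just flagged, so the ids of the other documents do not shift. This is a simplified, hypothetical sketch (ToyDeletions is not a Lucene class).

```java
import java.util.BitSet;

// Hypothetical toy of flag-based deletion (a simplification, NOT Lucene's
// code): deleted documents keep their slot, and a bit set marks them.
class ToyDeletions {
    private final int maxDoc;                 // all slots, deleted or not
    private final BitSet deleted = new BitSet();

    ToyDeletions(int maxDoc) { this.maxDoc = maxDoc; }

    void delete(int docId) {
        if (docId < 0 || docId >= maxDoc) {
            throw new IllegalArgumentException("docId " + docId + " out of range");
        }
        deleted.set(docId); // flag only; other ids stay where they are
    }

    boolean isDeleted(int docId) { return deleted.get(docId); }

    int maxDoc() { return maxDoc; }

    // Live documents = all slots minus the flagged ones.
    int numDocs() { return maxDoc - deleted.cardinality(); }
}
```

Deleting the same id twice is harmless in this model, and maxDoc() stays the same while numDocs() goes down by one.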
6. Close the IndexReader object
Nothing happens until you close the IndexReader object: until then, the deletion is not written to the index. Easy enough, so close it:
ir.close();
Voila.
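The "nothing happens until close()" behavior of step 6 can be modeled as a reader that buffers its deletions and only publishes them when it is closed. This is a simplified sketch under that assumption (ToyStore and ToyDeletingReader are hypothetical, not Lucene classes).

```java
import java.util.BitSet;

// Hypothetical toy of commit-on-close (a simplification, NOT Lucene's code).
// The store holds the deletions that other readers would see.
class ToyStore {
    final BitSet committedDeletes = new BitSet();
}

class ToyDeletingReader implements AutoCloseable {
    private final ToyStore store;
    private final BitSet pending = new BitSet(); // buffered, not yet visible
    private boolean closed;

    ToyDeletingReader(ToyStore store) { this.store = store; }

    void delete(int docId) {
        if (closed) {
            throw new IllegalStateException("reader is closed");
        }
        pending.set(docId);
    }

    // Publishes the buffered deletions, then refuses further use.
    @Override
    public void close() {
        if (closed) {
            return;
        }
        store.committedDeletes.or(pending);
        pending.clear();
        closed = true;
    }
}
```

Using try-with-resources guarantees the close, and therefore the commit, even if something throws in between; in the original code a try/finally around ir.close() serves the same purpose.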