Deleting a document with Lucene

Lucene keeps on blowing my mind, but finding out how to do rudimentary things with it is not too simple.
Suppose you want the index to no longer show a document that you deleted. As far as I understand, after some painful research, this involves six steps: [of course, there is definitely more than one way to do this, and I am by no means a Lucene expert]

1. Find the document’s id. That is the id Lucene, not you, gave the document.
2. Get a Directory object for the index directory.
3. Get an IndexReader for that directory
4. Unlock that directory
5. Delete the document
6. Close the IndexReader object

Each step is almost its own procedure.
1. Find the document’s id
This is the most elaborate step. You need to search your index for the document you wish to delete. To do so, I ran a query against the index.
(This sample query will show you the names and ids of documents that match on a field called “contents”):

Directory fsDir = FSDirectory.getDirectory(indexDir, false);
IndexSearcher is = new IndexSearcher(fsDir);
Query query = QueryParser.parse(search_term, "contents", new StandardAnalyzer());
Hits hits = is.search(query);
System.out.println("Found " + hits.length() + " document(s) that matched query '" + search_term + "':");
for (int i = 0; i < hits.length(); i++) {
    Document doc = hits.doc(i);
    // hits.id(i) is the Lucene-assigned id needed for the deletion
    System.out.println(doc.get("filename") + " score: " + hits.score(i) + " id: " + hits.id(i));
}

Finding the id, as you see, involves the Hits object, which holds the precious id(int hit_position) method that returns the id.

Now that you have the id, you can proceed and start the real deletion process:

2. Get a Directory object
Similar to what we did above, you get a Directory object from the FSDirectory. That is easy enough.

3. Get an IndexReader object
The IndexReader is an abstract class, so in order to get the concrete implementation for it, you instantiate it using a call like:
IndexReader ir = IndexReader.open(fsDir);
where fsDir is the Directory object we created in step 2.

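4. Unlock the Directory
Lucene uses file locks to protect the index while it is being updated. If a lock is still held on the directory, the delete will fail, so I unlock it first. Note that unlock is a static method on IndexReader and that it removes the lock forcibly, so only use it when you are sure nothing else is writing to the index:
IndexReader.unlock(fsDir);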

5. Delete the document
Finally, we ask the IndexReader to delete the document using the id we found in step 1 - which we intuitively put in a variable called docId:
ir.delete(docId);

6. Close the IndexReader object
Nothing will happen unless you close the IndexReader object - the document will not be deleted. Easy enough, close it then:
ir.close();
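
To make the order of the steps concrete, here is a minimal sketch in plain Java. It does not use any Lucene classes: ToyIndex, its nested Reader, and all of their methods are hypothetical names made up for illustration. It only models the behaviour described above, under the assumption that the index hands out the ids, that a delete is merely recorded by the reader, and that the index changes only once the reader is closed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy stand-in for an index. NOT a Lucene class; all names are hypothetical.
class ToyIndex {
    private final List<String> contents = new ArrayList<>();
    private final Set<Integer> deleted = new HashSet<>();

    // The index, not the caller, assigns the id (compare step 1).
    int add(String text) {
        contents.add(text);
        return contents.size() - 1;
    }

    // Returns the ids of live documents whose text contains the term.
    List<Integer> search(String term) {
        List<Integer> ids = new ArrayList<>();
        for (int id = 0; id < contents.size(); id++) {
            if (!deleted.contains(id) && contents.get(id).contains(term)) {
                ids.add(id);
            }
        }
        return ids;
    }

    // Compare step 3: the reader is obtained from the index.
    Reader openReader() {
        return new Reader(this);
    }

    // Toy stand-in for a reader that records deletions until it is closed.
    static class Reader {
        private final ToyIndex index;
        private final Set<Integer> pending = new HashSet<>();

        Reader(ToyIndex index) {
            this.index = index;
        }

        // Compare step 5: the delete is only recorded here.
        void delete(int docId) {
            pending.add(docId);
        }

        // Compare step 6: closing applies the recorded deletions to the index.
        void close() {
            index.deleted.addAll(pending);
            pending.clear();
        }
    }
}
```

In this toy, a search run after delete() but before close() still returns the document; only after close() does it disappear from the results.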

Voila.
