Categories
Java

Jakarta Commons Net needs Jakarta Oro

I am working on a project that uses the FTP capabilities of the Jakarta Commons Net package. The Net package was originally written by an outfit called ORO and most kindly donated to Jakarta for maintenance. When running a demo application I wrote with the Net package, I encountered a class-not-found error. It turns out that you also need the Net package's sister package, now called Jakarta Oro, on your application classpath.
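
A quick way to confirm the diagnosis is to probe the classpath at startup, before any FTP code runs. The sketch below is only an illustration: the post does not record which class was reported missing, so `org.apache.oro.text.regex.Perl5Compiler` (a class shipped in the ORO jar) is used as an assumed probe, and `OroClasspathCheck` is a made-up helper name.

```java
/** Hypothetical helper: checks whether a class can be found on the classpath. */
final class OroClasspathCheck {

    // Assumption: any class from the ORO jar works as a probe; this one is
    // not necessarily the class that the original error reported.
    static final String ORO_PROBE_CLASS = "org.apache.oro.text.regex.Perl5Compiler";

    /** Returns true if the named class can be located, without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, OroClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (isOnClasspath(ORO_PROBE_CLASS)) {
            System.out.println("Jakarta Oro found on the classpath.");
        } else {
            System.err.println("Jakarta Oro is missing; add its jar next to the Commons Net jar.");
        }
    }
}
```

Run it with the same classpath as the real application, otherwise the answer tells you nothing about the failing setup.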

There, now use this information.

Categories
Java

Cocoon source missing blocks

I am taking an XML/XSLT class, and to spare myself the need to share the class machine with other XSLT newbies like me, all of us exhausting its resources to death, I decided to go ahead and install Cocoon.

It appears the good folks at Apache decided that providing binary builds is not a great idea, so I had to get the source and build it using the included build.bat (or build.sh) files. I downloaded the considerable file (44-46 MB) and ran the script. To my disappointment, the build did not produce the webapp that Cocoon is supposed to become.

I then tried creating the webapp using the
build webapp
instruction that is reported around the Internet.

I then received the ever so cryptic
%COCOON_HOME%\src\blocks\stx\java not found

I failed to find any answers at the time (although I have since), and that was very frustrating.

It appears that the missing folder – blocks\stx\java – is an empty folder. Some decompression programs merrily skip creating a directory if there is nothing in it, and so I was totally puzzled. So, as the link above suggests, just create a java folder and the problem is solved.
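
If you would rather script the fix than create the folder by hand, something along these lines should do. It is a minimal sketch: `CocoonStxDirFix` is a made-up name, and it assumes the Cocoon source root is passed as an argument or set in a `COCOON_HOME` environment variable.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: recreates the empty directory the unzip tool skipped. */
final class CocoonStxDirFix {

    /** Creates src/blocks/stx/java under the given Cocoon root if it is absent. */
    static Path ensureStxJavaDir(Path cocoonHome) throws IOException {
        Path dir = cocoonHome.resolve(Paths.get("src", "blocks", "stx", "java"));
        // createDirectories is a no-op when the directory already exists.
        return Files.createDirectories(dir);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the source root comes from the first argument or COCOON_HOME.
        String home = args.length > 0 ? args[0] : System.getenv("COCOON_HOME");
        if (home == null) {
            System.err.println("Pass the Cocoon source root or set COCOON_HOME.");
            return;
        }
        System.out.println("Ensured " + ensureStxJavaDir(Paths.get(home)));
    }
}
```

After that, run the build again.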

Categories
Java Web Development

POI / TextMining.org Error when extracting text from a Word File

My client was experiencing difficulties when trying to index Word files into Lucene.
I am using the text extraction library from TextMining.org but the issue occurs also when using Apache POI (which TextMining.org is related to).

The exception being thrown is:

Exception while extracting Word file: Invalid header signature

After opening one of the questionable files I found out that they were actually RTF files saved with a .doc extension. Only after saving the file under a different name (using Save As…) and explicitly specifying the Word Document format was the file properly saved, and its text was then extracted successfully.
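
Such files can be caught before they reach the extractor by looking at the first few bytes. A real binary Word document is an OLE2 compound file and starts with the bytes D0 CF 11 E0 A1 B1 1A E1, while an RTF file starts with the text `{\rtf`. The sketch below is my own illustration and not part of POI or the TextMining.org library; `DocFormatSniffer` and its `Kind` enum are made-up names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical helper: guesses a file's real format from its leading bytes. */
final class DocFormatSniffer {

    enum Kind { OLE2, RTF, UNKNOWN }

    // Signature at the start of every OLE2 compound document (binary .doc).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // RTF files are text and begin with "{\rtf".
    private static final byte[] RTF_MAGIC = "{\\rtf".getBytes(StandardCharsets.US_ASCII);

    /** Classifies a buffer holding the first bytes of a file. */
    static Kind sniff(byte[] head) {
        if (startsWith(head, OLE2_MAGIC)) return Kind.OLE2;
        if (startsWith(head, RTF_MAGIC)) return Kind.RTF;
        return Kind.UNKNOWN;
    }

    /** Reads up to 8 bytes from the file and classifies them. */
    static Kind sniff(String path) throws IOException {
        byte[] head = new byte[OLE2_MAGIC.length];
        int n = 0;
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            while (n < head.length) {
                int r = in.read(head, n, head.length - n);
                if (r < 0) break; // file shorter than the signature
                n += r;
            }
        }
        return sniff(Arrays.copyOf(head, n));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            System.out.println(path + ": " + sniff(path));
        }
    }
}
```

Anything reported as RTF can then be routed to an RTF-aware extractor or flagged for re-saving, instead of failing during indexing.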

Also, make sure that Word is not using the Fast Save option as it will also cause issues when extracting text.
