Friday, August 22, 2008

Importing spreadsheet data into Derby/JavaDB

Adding spreadsheet data to Apache Derby



I have a number of Spreadsheets with data in them, and the trick is to get them into a database so I can make some sense out of them (or be able to tell that they do not make sense, as the case may be). And I figured this would be a decent time to work on a new database. I could have used SQLite that I've used in the past. But I decided to use a server-class database, just because I've never used one. So I choose Derby (because I can still use it as if it were a single-site database). Also, there is some sensitive data, so I needed to password protect it.

Import CSV files

Creating CSV files
The first trick is always getting the data into the database. I opened up the spreadsheet in OpenOffice.org, then copied the row names to text file. That text file turned into a CREATE TABLE statement. The first problem was some of the names were reserved words, in particular 'Case'.

Next, I removed extra rows from the top of the database, including some instructions and the column headers. To make the comma-separated-values text work, I checked the formatting for all the columns. In particular, I formated the dollar figures to remove currency symbols, and insured that fields intended to be numbers were explicitely formated as numbers.


Then I exported the data to text (CSV) with double-quotes (") to delimit text, and semi-colons (;) to delimit fields.

From the CSV file, I removed empty rows and some extraneous text. The big one was where a space " " was in a cell that should have been empty. I got rid of those by searching for [;" ";] where there should have been a [ ;; ] that represented an empty field.

Import CSV files using ij


Finally, was the actual import. I had earlier added the Derby scripting tool, ij, into my path by putting

# add DERBY environmental variables
export DERBY_HOME=/usr/share/javadb
export PATH="$DERBY_HOME/bin:$PATH"


into my .bashrc file. In my script file, I had added the following above the CREATE TABLE SQL text:

connect 'jdbc:derby:Sample;create=true';
DROP TABLE Sampletable;

Then at the bottom, I entered the data into the newly created table by adding the following after the CREATE TABLE statement

CALL SYSCS_UTIL.SYSCS_IMPORT_TABLE (null,'Sampletable','Sampledata.csv',';',null,null,0);

At this point, I ran

Select * from Sampletable;

in ij just to make sure it got in.

Setting a password



Next question was could I put a password on this. From within ij I created a database user with password, then set the database to require a password.

call syscs_util.syscs_set_database_property ('derby.user.testuser', 'test');
call syscs_util.syscs_set_database_property ('derby.authentication.provider', 'BUILTIN');
call syscs_util.syscs_set_database_property ('derby.connection.requireAuthentication', 'true');

Exiting from ij, then reconnecting using

connect 'jdbc:derby:Sample' user 'testuser' password 'test';

confirmed that the connection worked.

Connecting through Eclipse



The next task was to ensure that I could use the data through other tools, specifically Eclipse and OpenOffice.org. In Eclipse, I had BIRT installed. It gives a few choices for perspectives and I chose the Choose Report Design perspective. The perspective then gives the choice of 'Databases' or 'ODA Data Sources'. I right-clicked on 'Databases' and then chose 'Derby Embedded Database.' The next screen asked for a connection profile name and description. Next was the driver details. I clicked on the option box with the three dots, and looked at available drivers. These were not set properly, so I set the embedded driver to /usr/share/javadb/lib/derby.jar and a client driver to /usr/share/javadb/lib/derbyclient.jar. Filling in the Username 'testuser' and password 'test', then clicking on 'Test Connection' showed that this succedded.

In Eclipse, under 'Databases', the 'Sample' database shows up. Under 'Sample' was 'Schemas'. Under the 'APP' schema, undert 'Tables' was my 'SampleTable', waiting to be found.


Connecting using OpenOffice.org



Before starting, I opened OpenOffice.org and selected Tools -> Options. In the window that comes up, in the right I opened up the OpenOffice.org tree (click on '+' if it does not show) then selected 'Java'. On the right, there is a button for 'Class Path' Click on that button, and I added the derby.jar and derbyclient.jar archives to the Classpath. These were at /usr/share/javadb/lib/derby.jar and /usr/share/javadb/lib/derbyclient.jar respectively.

Using OpenOffice.org, I selected File -> New -> Database. In the Database Wizard that comes up, I picked 'Connect to an existing database', with 'JDBC' as the chosen option.



In the 'Set up a connection to a JDBC database' screen I entered

Datasource URL: jdbc:derby:/path/to/Sample
JDBC driver class: org.apache.derby.jdbc.EmbeddedDriver

Clicking on 'Test Class' confirmed the driver was in the Classpath as set earlier. The next screen is 'Set up the user authentication' I leave the 'User name' blank, but check the box labeled 'Password required'

On the next screen, 'Decide how to proceed after saving the database', click 'Yes, register the database for me' which allows OpenOffice.org to access the database (or you just have an odb file that includes database connection parameters) and 'Open the database for editing' ('Create tables using the table wizard' is not useful now since there already are datatables).

OpenOffice.org then asks for a name for the .odb file it will create to maintain the connection to the database (as well as any queries created in OpenOffice.org). The database can be accessed by clicking on any icon ('Tables', 'Queries', 'Forms', 'Reports'). OpenOffice.org will then connect to the database (asking for the database username and password along the way. Entering 'testuser' for username and 'test' for password does it). Then the database can be used from the GUI, which includes a graphical Query Design view or the usual SQL.

No comments: