Importing External Data to db4o

2

Written on Monday, December 24, 2007 by Edwin Sanchez

From time to time, we may experience requirements involving exporting data from one source to another. Today, I will show you how an external data can be exported in db4o.

The following technologies will be used:
· Db4o 6.4
· .Net 2.0 using C#


This post will discuss the following:
· Reading a text file using StreamReader
· Extracting data from a line of record into db4o objects
· Use List Find method to check for existence


Data can come from many different sources. It can be an XML file, Excel File, from an RDBMS, from a web service or simply from a text file. I’m going to show you today about reading a text file as our source data. First, let’s take a look at the format of our sample text file:


“Department””Position””Name””Salary”
“Information Technology””Team Leader””Jun Marfil””30000.00”
“Information Technology””Programmer””Dino De Guzman””25000.00”
“Human Resources””Benefits Supervisor””Shyrel Morco””15000.00”

From the data above, we can see that we need a department object, position object and employee object. The column delimiter is the pipe symbol () and it’s obvious that rows are separated by carriage return and line feed.

In order to process this data, we need to read each line one by one. Each line of record should be read based on the column delimiter, interpreted and then written to a db4o database.

Now, in order to read the loaded text file line by line, we need the StreamReader. To parse a line of record, we will use the Split string method to put the data in an array of string columns.

Before we jump into code, let’s define our classes.


public class Department
{
private string _name;

public Department(string name)
{
_name = name;
}

public string Name
get
{
return _name;
}
set
{
_name = value;
}
}

public class Position
{
private string _name;

public Position(string name)
{
_name = name;
}

public string Name
get
{
return _name;
}
set
{
_name = value;
}

}

public class Employee
{
private string _employeeName;
private double _salary;
private Department _department;
private Position _position;

public Employee(string name, double salary,
Department department,
Position position)
{
_name = name;
_salary = salary;
_department = department;
_position = position;
}

public string EmployeeName
get
{
return _employeeName;
}
set
{
_EmployeeName = value;
}

public double Salary
get
{
return _salary;
}
set
{
_salary = value;
}

public Department EmployeeDepartment
get
{
return _department;
}
set
{
_department = value;
}

public Position EmployeePosition
get
{
return _position;
}
set
{
_position = value;
}
}

Now let’s look into the part of our program of parsing text. We assume that the database is already open at this point.

private void ImportTextFile (IObjectContainer db)
{
string employeeName;
double salary;
string employeePosition;
string employeeDepartment;
string record[];
string rawRecord;
List<Position> positionList = new List<Position>();
List<Department> departmentList = new List<Department>();
Employee employee;

try
{
using (StreamReader sr = File.OpenText(“sample.txt”))
{
while ((rawRecord = sr.ReadLine()) != null)
{
// remove the quotes
rawRecord = rawRecord.Replace("\"", "");

// put into array
record = rawRecord.Split(‘"’);

//avoid the header line
if(record[0] == “Department”)
{
//do nothing
}
else
{ //get column values
employeeDepartment = record[0];
employeePosition = record[1];
employeeName = record[2];
salary = Convert.ToDouble(record[3]);
}
positionList = AddToList(positionList, employeePosition, db);
departmentList = AddToList(departmentList, employeeDepartment, db);

employee = new Employee(employeeName, salary, employeeDepartment, employeePosition);
db.set(employee);
}
db.Commit();
}
}
catch(Exception ex)
{
db.rollback();
throw new Exception(“Error processing text file”,ex);
}
finally
{
sr.Close();
db.Close();
}
}


private List<Position> AddToList(List<Position> list, string name,
IObjectContainer db)
{
Position position;

// search if the name being inserted is already in the list
position = list.Find(delegate(Position pos)
{
return pos.Name == name;
});

if (position == null)
{
position = new Position(name);
list.Add(position);
db.Set(position);
db.Commit();
}
return list;
}

private List<Department> AddToList(List<Department> list, string name,
IObjectContainer db)
{
Department department;

department = list.Find(delegate(Department dept)
{
return dept.Name == name;
});

if (department == null)
{
department = new Department(name);
list.Add(department);
db.Set(department);
db.Commit();
}
return list;
}



Now let’s recap. We have made 3 classes for our objects and it’s pretty straightforward. The method we define for processing the file used several objects worth our concern. We used the StreamReader to process our file and the Readline is the one we used reading the file line by line. We did not try to load all the lines in memory since this will become troublesome when our file is several MBs and GBs big. We used the Split string method in order to parse the string based on the pipe symbol delimiter and put it in an array. Then, we store each and every array element in a variable. Another notable method is the AddToList method. We used the generic List<t> instead of IList so we can make use of the Find method. We need this so we avoid inserting duplicate departments and positions. After parsing the values, we are now ready to save an Employee object in our database.

Our code is rather simple at this point. One point of improvement is to put some visuals to the user like a progress bar to indicate the status of the processing. Or instead of the AddToList method is to use db4o callbacks to check for duplicates prior to commit.

That’s how simple it is to import data from an external data source like a text file into db4o. I hope you find something useful in this post.


If you enjoyed this post Subscribe to our feed

2 Comments

  1. Anonymous |

    Great! just what I was looking for - beginner.

     
  2. Ants |

    I ran across your blog entry trying to do a quick survey on options for importing data into db4o. Like you, I'm a db4o fan.

    I encountered the following issues looking over the code you posted:

    1) The two AddToList() methods, return the input list. I think your intent was for it return the item found, or item constructed in the list.

    2) Employee constructor takes (string, double, Department, Position) but you are passing in (string, string, string, string) in the ImportTextFile() function. I assume that the intention was to pass in the department and position found in the list, otherwise a constructor that takes strings for the department and position will end up constructing brand new instances of Department and Position which will be different from those that were constructed (or found) in the lists.

    3) You call sr.Close() in your finally block, but sr is out of scope which will cause a compilation error. Additionally the using(StreamReader sr ...) would have taken care of doing the Close()/Dispose() for you.

    4) You say that you don't read the entire file into memory in case it's multiple megabytes of data, but by keeping a list of positions and departments, you are effectively going to have multiple megabytes of data if all the positions and departments are unique strings causing AddToList() to always construct a new object to keep in the respective lists.

     

Post a Comment