Performance Guide to Tune your Application and db4o Database for Report Generation

0

Written on Monday, January 14, 2008 by Edwin Sanchez

Reports drive businesses to be better. Behind those humongous data are processes that will need to generate summaries, listings, graphs and more. In addition to what we know on how to supercharge our db4o database to top speed, below are the two main questions I ask myself when I am designing my application using db4o database for speed of generating reports:

  • Will the report be produced real-time or not?

    This is a question I ask my users before I finalize the data source of the report. If this will be a periodic report (weekly, monthly, etc.), what I find the best design strategy for speed is to persist only the needed information in the particular report in one object, have an automated process run at night (or weekends depending on the requirement) to update the data. When reporting comes, my application will query the summarized data based on user criteria. This way, the user will perceive a faster generation of report based on lots of data. However, if the report is needed on demand real-time based on the latest data, that’s the only way I will need to query directly from the raw objects.
  • Will the volume of raw data be very large?

    Upon initial design stage of a system, I try to find out if the volume of data can be very large that querying it whether native queries or SODA will still be slow and unacceptable to the user. When I perceive this to be the case, I will design my db4o database into separate files. This will increase processing and loading time since the data store is smaller. (Designing which goes to what file is something to be taken seriously, though. It can make things better or worst depending on the design) One example is when you need to generate a monthly report out of humongous raw objects. You can design separate files each based by month and year. If separate monthly files are your way to go, I have found using the following to be effective in my case:

    FlushFileBuffers(false) – “Wait a minute! Isn’t this considered a dangerous practice?” Yes, it is. This is included in the Dangerous Practices in the Reference Documentation. You may not agree with me but here’s the idea why. The monthly report files are generated separately each month. So a damage one will not affect the other. Added to that, when it gets corrupted due to some factors, you can still reproduce it by triggering the data processing for the month and year concerned.
    Enable Field Indexes on Key Fields – This should be clear why. This will increase performance when you retrieve your summary data later. But remember, just the key fields only. You may degrade processing performance by adding unnecessary indexes.
    Disable Query Evaluation on Fields Not Used for Querying – as the documentation states, all fields are evaluated by default. Specifying which of the fields will not be evaluated will increase performance
    Unicode(false) – if your report is in plain English and not using double byte characters, turn Unicode support off. This is on by default.
    Query Evaluation Mode to SnapShot – after processing has taken place, it’s time for the users to view the output. Snapshot is best for the client-server setting if your memory.
    Blocksize(8) – I have found following this recommended setting increases performance. Changing it to a higher value makes processing slower.
    When Processing, Open or Create the File Using OpenFile. Use OpenServer to Open it later for shared use – Single-user access is still the fastest when processing data. And since the user doesn’t need the output at processing time, OpenFile will be just right. When the time comes that the users need to view the report in your application, that’s the only time this will be hosted using OpenServer.

    Separating data in different files will simplify my query unlike one database for everything. And therefore will increase the performance. A more complex query will more likely to run slowly. Backing up the monthly data is also simpler in this setup. And chances of reaching the database file size limit will be low.

I have presented here my general considerations in this type of scenario which is reporting. There are other factors to consider that may only be applicable to your situation. I hope this general guide will help you.

If you enjoyed this post Subscribe to our feed

No Comment

Post a Comment