After discovering DuckDB by chance and giving it a try, it was time to get a real project done.
I did some experiments and found a way to integrate DuckDB with FileMaker, which was the topic of my presentation at the dotfmp conference in Berlin two weeks ago. Since the response was quite exciting and the feedback from other FileMaker developers was encouraging, I decided to adapt an existing solution to the possibilities of DuckDB.
Get ready for a deep dive
In my first experiments with DuckDB I had only scratched the surface. Now it was time for a real deep dive…
A few months ago I implemented a project with FileMaker to answer a strategic business question that had been open until then.
It was a really complex scenario for analyzing a huge dataset. The approach included the following steps; the first two are sketched in SQL right after the list:
- unpacking nested groups of labels
- normalizing multiline keys
- aggregating on missing relational data
- generating data based on if-then-else logic
- calculating statistical weights from aggregated results of the whole dataset
- normalizing those weights across subsets of the data in several dimensions
- and finally extrapolating the whole dataset via a cross product of labels, weights and several measures
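To give a flavor of what such steps look like in DuckDB, here is a minimal sketch of the first two. All file, table and column names are invented for illustration; the real queries were of course far more involved.

```sql
-- Minimal sketch (invented names): normalize multiline keys and
-- unpack nested label groups, then aggregate over the combination.
WITH keys_normalized AS (
    -- split a multiline key field into one row per key
    -- (the separator is assumed to be a newline here)
    SELECT id, unnest(string_split(multiline_keys, chr(10))) AS key_value
    FROM read_csv_auto('export.csv')
),
labels_unpacked AS (
    -- unpack a nested group of labels into individual rows
    SELECT id, unnest(string_split(label_group, ';')) AS label
    FROM read_csv_auto('export.csv')
)
SELECT k.key_value, l.label, count(*) AS n
FROM keys_normalized AS k
JOIN labels_unpacked AS l USING (id)
GROUP BY ALL;
```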
The original analysis I created in FileMaker required rather complicated logic, including:
- generating metadata with scripts
- multiple loops over all dimensions and millions of records
- calculating intermediate results
- writing status flags back to the data in order to filter aggregations
When it finally worked, it took more than 10 hours of processing time. The final results then had to be manually transferred into a huge Excel document to display multiple pivot tables with the results in different dimensions.
The impact of the results was really worth it. But now the decision makers wanted to have this report on a regular basis.
This was an ideal candidate to find out whether DuckDB could live up to the promises I had read about. It took me some days to understand the logic of the approach to this task and to discover more and more of the rich toolset DuckDB offers. Finally the new project got traction, and I learned some really neat tricks that were only possible thanks to the powerful and flexible SQL dialect that is unique to DuckDB. It extends classic SQL with some awesome features: logic for structures, conditionals, nesting, pivoting, regular expressions, string functions, date functions and much more.
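A small, hypothetical taste of those extensions; the orders.csv file and its columns are made up for illustration:

```sql
-- Illustrative only; the CSV file and its columns are invented.
CREATE TEMP TABLE orders AS SELECT * FROM read_csv_auto('orders.csv');

SELECT
    if(amount > 1000, 'large', 'small') AS size_class,  -- compact conditional
    regexp_extract(code, '^[A-Z]+')     AS prefix,      -- regular expressions
    date_trunc('month', order_date)     AS month,       -- date functions
    sum(amount)                         AS total
FROM orders
GROUP BY ALL        -- shorthand for listing all non-aggregated columns
ORDER BY month;

-- pivoting is even a first-class statement:
PIVOT orders ON year USING sum(amount);
```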
Unleashing the power
To make a long story short: Not only was it possible to rework the entire project within a short period of time, but the end result was far more advanced than I could ever have dreamed of.
| FileMaker | DuckDB |
|---|---|
| 12 Tables, 25 TOs | 1 Query |
Using DuckDB, it was possible to turn all the data processing into one giant SQL statement consisting of a series of SELECT commands to import all the existing data from CSV files, generate the calculated data and output the results in exactly the form that was needed. Thanks to its in-memory capabilities, the processing time was reduced from 10 hours to less than 4 seconds(!!).
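The rough shape of such a statement, reduced to a toy example with placeholder names (not the actual project), looks like this: read the CSVs, derive intermediate results in CTEs and write the final form straight back out.

```sql
-- Toy version of the pipeline shape (placeholder names):
COPY (
    WITH raw AS (
        SELECT * FROM read_csv_auto('transactions.csv')
    ),
    weights AS (
        -- intermediate results live in CTEs instead of helper tables
        SELECT region,
               sum(amount) / (SELECT sum(amount) FROM raw) AS weight
        FROM raw
        GROUP BY region
    )
    SELECT r.*, w.weight
    FROM raw AS r
    JOIN weights AS w USING (region)
) TO 'result.csv' (HEADER, DELIMITER ',');
```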
Really amazing!
| FileMaker | DuckDB |
|---|---|
| 10 hours | 4 seconds |
With this power under the hood, I finally went one step further and had DuckDB generate a new dataset consisting of a cross product of the original data with the sliced subdivisions of labels and weights that were calculated as a temp table during the aggregation process. This new dataset consists of about 30 million rows of data; creating it took less than a second, and writing 2.8 gigabytes of data to a new CSV file took less than 3 seconds. And no, I did not use a supercomputer or some scaled-out cloud service: all this happened on my local machine, a MacBook Pro with an M1 CPU.
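A sketch of that extrapolation step, again with invented names: in reality the weights would come out of the aggregation process, and the cross join expands the measures before the export.

```sql
-- Invented names; the weights would really come from the aggregation step.
CREATE TEMP TABLE sliced_weights AS
SELECT * FROM (VALUES ('A', 0.4), ('B', 0.6)) AS t(label, weight);

COPY (
    SELECT m.*, w.label, m.amount * w.weight AS extrapolated
    FROM read_csv_auto('measures.csv') AS m
    CROSS JOIN sliced_weights AS w       -- the cross product blows up the rows
) TO 'extrapolated.csv' (HEADER);
```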
Awesome!
Go one better…
Now, with the new result set, there is a foundation on which to build another project on top of this processing pipeline and bring further enhancements to our existing reporting. I had built an OLAP tool in our FileMaker solution to make sense of transactional data by filtering and aggregating across more than 20 dimensions. But since we have collected about 25 years of transactions, which means tens of millions of FileMaker records, processing in FileMaker got slower and slower over the years. To compensate, we migrated the server hardware every few years to the fastest machines available at the time.
A typical report takes several minutes to select a subset of the data with a find request and then calculate the results with summary fields and sorted sub-summary layouts, which is indeed a nice and flexible way of doing this in FileMaker. But as processing times grew over the years, these powerful features were used less and less by the users who need the results to make business decisions on the data. We ended up creating some scripted reports that are processed overnight once a month and sent automatically to our controlling team as a PDF file.
Now we see some opportunities to advance our OLAP tool towards “real time” analytics with the help of DuckDB. It could then produce final results instantly at the push of a button in the FileMaker user interface. That would be a huge advancement in delivering an interactive tool for finding answers to strategic questions on the fly.
It comes down to the simple fact that big CSV files holding shadow tables of all the data can serve as a database of their own, since DuckDB is able to process these files at blazing speed. It would be nice to take the integration of DuckDB and FileMaker a few steps further for more elegant interactions, but the possibilities already discovered via the REST API are enough to unfold this analytical power and build new ways of interacting with the business data that already exists inside the FileMaker solution.
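Querying those files needs no import step at all; DuckDB reads them in place, glob patterns included (paths and columns below are placeholders):

```sql
-- CSV shadow tables queried directly, no import required.
SELECT customer_id, sum(amount) AS revenue
FROM 'shadow/transactions_*.csv'   -- glob across all shadow files
GROUP BY ALL
ORDER BY revenue DESC
LIMIT 10;
```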
Conclusion
FileMaker is a great tool that enables us to build critical business solutions by storing and managing data. This is why we often call it a “database”. In fact, we know it is much more than that, but it has its roots in this category of database software. If you think of DuckDB as just another database, you will miss the point. Yes, it’s true that DuckDB is also in the category of database software, but it has a whole different approach to dealing with data.
While FileMaker is optimized for reading and writing rows of data, DuckDB is optimized for processing and aggregating columns of data, which is where its unbelievable speed comes from. While FileMaker is optimized for transactions and access by multiple users on a server, DuckDB is optimized for single processes on a single machine, without any features for record locking or network access from multiple users or devices. It just reads, transforms and writes data in very flexible and powerful ways. So each of these two databases has its very own feature set, and they are ideal in combination because they complement each other nicely.
FileMaker and DuckDB are a winning team! There are many more use cases to be discovered; the journey has only just begun.
Further reading:
- Discover DuckDB’s feature set in the documentation
- Integrating DuckDB with FileMaker
- DuckDB in Action (Book Preview)