Unlocking Proprietary Data with PostgreSQL Foreign Data Wrappers Pat Patterson

Unlocking Proprietary Data
with PostgreSQL Foreign
Data Wrappers
Pat Patterson
Principal Developer Evangelist
[email protected]
@metadaddy
Agenda
 Foreign Data Wrappers
 Writing FDW’s in C
 Multicorn
 Database.com FDW for PostgreSQL
 FDW in action
Why Foreign Data Wrappers?
 External data sources look like local tables!
– Other SQL database
• MySQL, Oracle, SQL Server, etc
– NoSQL database
• CouchDB, Redis, etc
– File
– LDAP
– Web services
• Twitter!
Why Foreign Data Wrappers?
 Make the database do the work
– SELECT syntax
• DISTINCT, ORDER BY etc
– Functions
• COUNT(), MIN(), MAX() etc
– JOIN external data to internal tables
– Use standard apps, libraries for data analysis,
reporting
Foreign Data Wrappers
 2003 - SQL Management of External Data (SQL/MED)
 2011 – PostgreSQL 9.1 implementation
– Read-only
– SELECT-clause optimization
– WHERE-clause push-down
• Minimize data requested from external source
 Future Improvements
– JOIN push-down
• Where two foreign tables are in the same server
– Support cursors
FDW’s in PostgreSQL
 ‘Compiled language’ (C) interface
 Implement a set of callbacks
typedef struct FdwRoutine
{
NodeTag type;
/* These functions are required. */
GetForeignRelSize_function GetForeignRelSize;
GetForeignPaths_function GetForeignPaths;
GetForeignPlan_function GetForeignPlan;
ExplainForeignScan_function ExplainForeignScan;
BeginForeignScan_function BeginForeignScan;
IterateForeignScan_function IterateForeignScan;
ReScanForeignScan_function ReScanForeignScan;
EndForeignScan_function EndForeignScan;
/* These functions are optional. */
AnalyzeForeignTable_function AnalyzeForeignTable;
} FdwRoutine;
FDW’s in PostgreSQL
 Much work!
• CouchDB FDW
• https://github.com/ZhengYang/couchdb_fdw/
• couchdb_fdw.c > 1700 LoC
Multicorn
 http://multicorn.org/
 PostgreSQL 9.1+ extension
 Python framework for FDW’s
 Implement two methods…
Multicorn
from multicorn import ForeignDataWrapper
class ConstantForeignDataWrapper(ForeignDataWrapper):
def __init__(self, options, columns):
super(ConstantForeignDataWrapper,
self).__init__(options, columns)
self.columns = columns
def execute(self, quals, columns):
for index in range(20):
line = {}
for column_name in self.columns:
line[column_name] =
'%s %s' % (column_name, index)
yield line
Database.com FDW for PostgreSQL
 OAuth login to Database.com / Force.com
– Refresh on token expiry
 Force.com REST API
– SOQL query
• SELECT firstname, lastname FROM Contact
 Request thread puts records in Queue, execute()
method gets them from Queue
 JSON parsing – skip embedded metadat
 < 250 lines code
Demo
Conclusion
 Foreign Data Wrappers make the whole world look like
tables!
 Writing FDW’s in C is hard!
– Or, at least, time consuming!
 Writing FDW’s in Python via Multicorn is easy!
– Or, at least, quick!
 Try it for yourself!
Resources
 http://wiki.postgresql.org/wiki/SQL/MED
 http://wiki.postgresql.org/wiki/Foreign_data_wrappers
 http://multicorn.org/
 https://github.com/metadaddy-sfdc/Database.comFDW-for-PostgreSQL