ETL

Data extraction, transformation, and loading (ETL) is a required feature of a data quality solution. An effective data quality solution should be able to extract data from multiple sources, integrate it, and transform it before applying data quality processes. Once the data has been standardized, it can be loaded into multiple target systems such as a CRM system, a data warehouse, and other upstream or downstream applications.

OpenDQ is fully integrated with one of the most comprehensive extraction, transformation, and loading solutions on the market.

ETL features of the OpenDQ solution:

Source data systems:

  • Fixed-width and delimited data files
  • Excel plug-in to read from Excel worksheets
  • Read XML files
  • Relational databases, including all major databases with JDBC/ODBC connectivity
  • Unstructured data files
  • Adaptor to read unstructured data from PDF, MS Word and PowerPoint
  • Xbase Input
  • Optional EDI plug-in for X12 and EDIFACT message formats
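
As an illustration of what reading fixed-width and delimited sources into a common row format involves, here is a minimal Python sketch. It is not OpenDQ code: the column layout, field names, and sample records are hypothetical, and it uses only the Python standard library.

```python
import csv
import io

# Hypothetical fixed-width layout: (field name, start offset, end offset).
FIXED_LAYOUT = [("id", 0, 4), ("name", 4, 14), ("city", 14, 24)]


def read_fixed_width(text, layout=FIXED_LAYOUT):
    """Slice each non-empty line into fields by column position."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        rows.append({name: line[start:end].strip() for name, start, end in layout})
    return rows


def read_delimited(text, delimiter=","):
    """Parse delimited text whose first line is a header row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text), delimiter=delimiter)]


# Hypothetical sample data, padded to match FIXED_LAYOUT.
fixed_sample = "\n".join([
    "0001" + "Ann".ljust(10) + "Boston".ljust(10),
    "0002" + "Raj".ljust(10) + "Austin".ljust(10),
])
delimited_sample = "id|name|city\n0003|Mei|Denver\n"

# Both sources end up as the same list-of-dicts row format.
rows = read_fixed_width(fixed_sample) + read_delimited(delimited_sample, delimiter="|")
```

Once both sources produce the same row structure, later transformation and data quality steps do not need to know where a record came from.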

Target data systems:

  • Excel plug-in to write to Excel spreadsheets
  • Output XML files
  • Fixed-width and delimited data files
  • Relational databases, including all major databases with JDBC/ODBC connectivity

Transformation features:

  • Over 70 pre-built functions for the most commonly used transformations
  • JavaScript-based editor for custom transformation logic
  • Look up reference data using database tables, CSV files, or Excel worksheets
  • Call stored procedures and functions in all major databases
  • Database joins, sort, sequence numbers, constants, group by, filter, etc.
  • Row de-normalizer, row flattener, value mapper, select values
  • Option to use external Java programs via API calls
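
To make a few of the transformations listed above concrete (filter, sequence numbers, value mapper, reference lookup, and group by), here is a small Python sketch. It does not use OpenDQ's pre-built functions or its editor; the lookup tables, field names, and sample rows are all hypothetical.

```python
from collections import defaultdict
from itertools import count

# Hypothetical reference data and value map, stand-ins for a lookup table.
STATE_LOOKUP = {"MA": "Massachusetts", "TX": "Texas"}
GENDER_MAP = {"M": "Male", "F": "Female"}


def transform(rows):
    """Filter, number, and standardize rows."""
    seq = count(1)
    out = []
    for row in rows:
        if not row.get("state"):  # filter: drop rows with no state code
            continue
        out.append({
            "seq": next(seq),  # sequence number
            "name": row["name"].strip().title(),  # standardize casing
            "gender": GENDER_MAP.get(row["gender"], "Unknown"),  # value mapper
            "state_name": STATE_LOOKUP.get(row["state"], row["state"]),  # lookup
        })
    return out


def count_by(rows, key):
    """Group rows by a key and count the members of each group."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
    return dict(totals)


source_rows = [
    {"name": " ann lee ", "gender": "F", "state": "MA"},
    {"name": "RAJ PATEL", "gender": "M", "state": "TX"},
    {"name": "no state", "gender": "X", "state": ""},
    {"name": "mei chen", "gender": "F", "state": "MA"},
]

clean_rows = transform(source_rows)
state_counts = count_by(clean_rows, "state_name")
```

Keeping each step as a small function mirrors how such transformations are usually chained in an ETL flow, where each step receives rows and passes rows on.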

Unstructured Data:

  • Extract data from text, log, email files and other documents
  • Uses advanced natural language processing techniques to extract data of interest
  • Integrates data extracted from unstructured sources seamlessly into other structured data streams
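
The following Python sketch illustrates the general idea of turning free-form text into structured rows. It deliberately swaps in a plain regular expression rather than the natural language processing techniques mentioned above, and the log format and sample lines are hypothetical.

```python
import re

# Hypothetical log format: "<date> <time> <LEVEL> <message>".
LOG_LINE = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<message>.*)"
)


def extract_records(text):
    """Return one dict per line that matches the pattern; skip the rest."""
    records = []
    for line in text.splitlines():
        match = LOG_LINE.match(line)
        if match:
            records.append(match.groupdict())
    return records


sample_log = (
    "2024-01-05 10:15:00 ERROR Load failed for customer file\n"
    "free-form note that does not match the pattern\n"
    "2024-01-05 10:16:30 INFO Retry succeeded\n"
)

records = extract_records(sample_log)
```

The extracted dictionaries have the same shape as rows read from a structured source, which is what allows them to be merged into a structured stream.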

Workflow, scheduling and notification features:

  • Workflow engine that executes a series of jobs
  • Sequencing and prioritizing of jobs based on business rules
  • Comprehensive logging of number of rows, error rows, exceptions, etc.
  • Notifications on Success/Failure of each overall process via email
  • Option to attach log files associated with the jobs and the process
  • Capability to use conditional processing based on the outcome of a job
  • Capability to run external shell processes such as zip utilities, PGP encryption/decryption
  • Option to run the jobs on a scheduler or in an interactive mode