Managing your Data Pipeline for Analytics and Machine Learning
Test Your Knowledge About Managing Data Along the Entire Pipeline, from Acquisition to Analytics
This quiz will take about 2-3 minutes to complete.
QUESTION 1 / 9
Failing to take a holistic approach to your data pipeline can yield dark, unusable data or, worse, compel your organization to make critical business decisions based on inaccurate data.
True
False
QUESTION 2 / 9
Data engineering involves managing a changing array of data sources, establishing repeatable processes at scale, and maintaining control and governance.
True
False
QUESTION 3 / 9
Data engineering best practices mean that your data pipeline doesn’t need to be reproducible, consistent, and production-ready.
True
False
QUESTION 4 / 9
To manage your data pipeline effectively, your tools must have the right connectivity to both traditional and emerging sources of structured, semi-structured, and unstructured data. When evaluating potential vendors, which questions are important to ask?
Can you connect to a variety of SaaS and packaged applications, as well as cloud data sources?
Can you use a variety of APIs to connect to different web services?
Can you read a variety of industry-standard data streams and feeds, queues, and complex XML files?
All of the above.
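For a concrete sense of the connectivity Question 4 describes, here is a minimal Python sketch, not any vendor's interface, that touches all three kinds of sources. The file names and the API endpoint are hypothetical placeholders.

import csv
import xml.etree.ElementTree as ET

import requests

# 1. Structured data: a flat file exported from a packaged application.
with open("orders.csv", newline="") as f:
    orders = list(csv.DictReader(f))

# 2. Semi-structured data: JSON returned by a SaaS web service API
#    (hypothetical endpoint).
resp = requests.get("https://api.example.com/v1/customers", timeout=10)
resp.raise_for_status()
customers = resp.json()

# 3. Semi-structured data: a complex XML document, such as an industry feed.
tree = ET.parse("product_feed.xml")
skus = [item.findtext("sku") for item in tree.iter("item")]

print(len(orders), len(customers), len(skus))

A tool that cannot do the equivalent of all three out of the box will leave gaps in your pipeline.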
QUESTION 5 / 9
Which of the following is NOT a potential pitfall when evaluating potential data integration vendors?
Vendors may support only MySQL or PostgreSQL and may not support popular options such as Microsoft SQL Server.
Vendors may support only IBM Netezza, Teradata, and possibly Oracle Exadata and may not support bulk loading for newer data warehouses such as SAP HANA and Greenplum Database.
Vendors may support on-premises analytic databases such as Infobright, Greenplum Database, and HP Vertica while skipping cloud databases such as Amazon Redshift.
Vendors may not offer ad hoc analysis via a desktop interface.
Vendors may not connect to schema-less NoSQL stores, Hadoop ecosystems, and in-memory databases.
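To illustrate the bulk-loading pitfall in Question 5, here is a minimal sketch of a parallel bulk load into Amazon Redshift with its COPY command, using the psycopg2 driver. The cluster address, credentials, S3 path, and IAM role are hypothetical placeholders.

import psycopg2

conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="analytics",
    user="loader",
    password="REPLACE_ME",  # placeholder; use a secrets manager in practice
)
with conn, conn.cursor() as cur:
    # COPY pulls files from S3 in parallel, far faster than row-by-row INSERTs.
    cur.execute("""
        COPY sales_staging
        FROM 's3://example-bucket/sales/2024/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-loader'
        FORMAT AS CSV
        IGNOREHEADER 1;
    """)
conn.close()

A vendor that supports only single-row inserts against a warehouse like this will bottleneck the entire pipeline.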
QUESTION 6 / 9
Stand-alone data prep tools may lack the flexibility to blend both traditional and unstructured data sources and may also be incompatible with other tools. When evaluating the right end-to-end data pipeline solution, which capabilities are important?
Comprehensive data discovery, data access for a range of users, and visual examination.
Quality metrics and statistical analysis.
Automated templates for blending, enriching, and shaping data.
Labels for data sets that provide additional context.
All of the above.
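As a rough illustration of the blending, enriching, and shaping that Question 6 describes, here is a minimal pandas sketch; the file names, columns, and derived metric are hypothetical assumptions, not a prescribed template.

import pandas as pd

crm = pd.read_csv("crm_accounts.csv")       # traditional, structured source
clicks = pd.read_json("clickstream.json")   # semi-structured web source

# Blend: join web activity onto CRM accounts by a shared key.
blended = clicks.merge(crm, on="account_id", how="left")

# Enrich: derive a simple engagement metric for added context.
blended["session_minutes"] = blended["session_seconds"] / 60

# Shape: aggregate to one labeled row per account, ready for analytics.
per_account = (
    blended.groupby("account_id")
    .agg(sessions=("session_id", "nunique"),
         minutes=("session_minutes", "sum"))
    .reset_index()
)
print(per_account.head())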
QUESTION 7 / 9
Vendors that provide a fixed library of analytics options may not have the flexibility you need. Which of the following capabilities would NOT help you leverage the best of predictive analytics and embed analytics into your existing business processes?
Inline visualization along data pipelines
Ad hoc analysis via both a web and desktop interface
Built-in web conferencing tools for improved collaboration
Open APIs to easily embed data integration and analytics
The ability to analyze virtual data sets
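The "open APIs" option in Question 7 is easiest to picture with a sketch. Below is a minimal, hypothetical example, using the Flask framework, of exposing a pipeline's analytic output over HTTP so another application can embed it; the route and metric values are assumptions, not any vendor's actual API.

from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical in-memory result produced by an upstream pipeline run.
REGION_METRICS = {"emea": {"revenue": 1.2e6}, "apac": {"revenue": 9.8e5}}

@app.route("/api/v1/metrics/<region>")
def metrics(region: str):
    # Serve the latest analytic output so other applications can embed it.
    data = REGION_METRICS.get(region.lower())
    if data is None:
        return jsonify(error="unknown region"), 404
    return jsonify(region=region.lower(), **data)

if __name__ == "__main__":
    app.run(port=5000)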
QUESTION 8 / 9
Which options can help speed up the pipeline from raw data to analytics and business insights?
Capabilities such as monitoring, auditing, automated data blending, and error handling.
Ability to execute the same transformations on micro-batches or streams of data.
An end-to-end solution that includes data ingestion, transformation, and analytics.
All of the above.
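The second option in Question 8, executing the same transformations on micro-batches or streams, can be sketched in a few lines of Python; the record layout and cleansing rule below are hypothetical assumptions.

import itertools
from typing import Iterable, Iterator

def transform(records: Iterable[dict]) -> Iterator[dict]:
    # Cleanse and standardize records one at a time, skipping bad rows.
    for rec in records:
        if rec.get("amount") is None:
            continue
        yield {"id": rec["id"], "amount": round(float(rec["amount"]), 2)}

# Micro-batch: a finite list loaded from a file or staging table.
batch = [{"id": 1, "amount": "19.99"}, {"id": 2, "amount": None}]
print(list(transform(batch)))  # -> [{'id': 1, 'amount': 19.99}]

# Stream: the identical function consumes an unbounded generator.
def ticker() -> Iterator[dict]:
    n = 0
    while True:
        n += 1
        yield {"id": n, "amount": "5.00"}

for rec in itertools.islice(transform(ticker()), 3):
    print(rec)

Because the transformation is written once against an iterable, the same logic runs unchanged whether the input is a bounded batch or an endless stream.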
QUESTION 9 / 9
Much of the innovation in the last few years around data management, especially with big data, has taken place in the open source community. Accordingly, to stay flexible, consider a vendor’s extensibility and scalability features.
True
False