Data profiling jobs help you assess the composition, organization,
and quality of Hadoop tables. They help you recognize patterns,
identify sparseness (missing or null values) in the data, and
calculate frequency distributions and basic statistics. Data
profiling can also help you identify redundant data across tables
and dependencies between columns. All of these tasks are critical
to effective planning and monitoring.
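
To make these measurements concrete, the following is a minimal Python sketch of the kinds of per-column metrics a profile typically reports: null counts, distinct counts, value frequencies, character patterns, and basic statistics. It is illustrative only and is not how the profile directives are implemented. The function name profile_column, the metric names, and the sample values are all assumptions introduced for this example.

```python
from collections import Counter
import re
import statistics


def profile_column(values):
    """Return simple profile metrics for one column.

    `values` is a list of column values; None marks a missing value.
    Hypothetical helper for illustration only.
    """
    present = [v for v in values if v is not None]

    def pattern(value):
        # Generalize a value to a pattern: letters become 'A', digits become '9'.
        text = re.sub(r"[A-Za-z]", "A", str(value))
        return re.sub(r"\d", "9", text)

    # Basic statistics apply only to numeric values (bool is excluded on purpose).
    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]

    return {
        "count": len(values),
        "null_count": len(values) - len(present),
        "distinct_count": len(set(present)),
        "top_values": Counter(present).most_common(3),
        "patterns": Counter(pattern(v) for v in present).most_common(3),
        "mean": statistics.mean(numeric) if numeric else None,
        "min": min(numeric) if numeric else None,
        "max": max(numeric) if numeric else None,
    }


# Made-up sample columns: postal codes with one missing and one malformed value.
postal_codes = ["27513", "27513", None, "2751A"]
amounts = [10, 20, 30, None]

postal_profile = profile_column(postal_codes)
amount_profile = profile_column(amounts)
```

In this sketch, a pattern that occurs only once (such as 9999A among values that otherwise match 99999) points to a possible data-entry error, and a high null_count relative to count indicates a sparse column.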
The profile directives enable you to generate and view reports for
one or more Hadoop tables. The reports display sample data, column
information, and measurements of data quality. You create profile
reports with the Profile Data directive and use the Saved Profile
Reports directive to access and manage profile reports.
Here’s an example of a profile report: