For most directives
in SAS Data Loader,
data sources are Hive schemas that contain one or more tables. Data
sources are defined in Hive by your Hadoop administrator. If you do
not see the data source or table that you need, contact your Hadoop
administrator. If needed, the administrator can add a new Hive schema
and set appropriate user permissions to read and write data.
In some cases, data
sources are not based on Hive schemas. For example, data sources for
the Copy Data to Hadoop directive are RDBMS connections. Data sources
for the Import a File directive are delimited files that are stored
in the shared folder of the vApp.
When you open a directive
to create a job that runs in Hadoop, you select a data source and
a source table that is contained within that data source. If the directive
produces output tables, you then select a data source and a target
table at the end of the directive.
To protect your data,
target tables do not overwrite source tables. Target tables are not
required to be new tables each time you run your job. You can overwrite
target tables that you created in previous job runs.
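The overwrite rule above can be illustrated with a small model. This is a minimal sketch, not SAS Data Loader's implementation: the `TableRef` class, the `check_target` function, and the `existing` set of tables from previous job runs are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableRef:
    """Hypothetical identifier: a data source (Hive schema) plus a table name."""
    schema: str
    table: str


def check_target(source, target, existing):
    """Decide what selecting `target` would mean for a job that reads `source`.

    Returns "overwrite" if the target already exists (for example, from a
    previous job run) and "create" if it is new. Raises ValueError if the
    target is the source table itself, because a target table must not
    overwrite a source table.
    """
    if target == source:
        raise ValueError("target table must differ from the source table")
    return "overwrite" if target in existing else "create"
```

In this model, rerunning a job with the same target name returns "overwrite", while selecting the source table as the target is rejected.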
As the
data is processed in each task in the job, you can view a sample of
the data that is produced in each task.
A typical Source Table
task includes a graphical view of the tables in the selected data
source.
SAS Table Viewer icon
Click to open the selected
table in the SAS Table Viewer, which provides column information and
sample data for the table.
View Data Sample icon
Click to display the
first 100 rows of source data, as that data has been transformed up
to that point in the job.
Click the View List
icon to display data sources or tables as a list. When you view tables,
the list format displays the table name and description, along with
the dates on which the table was last profiled and last modified.
Note: The last modified date is displayed only when the Identify each
table as "new" when created or modified setting is selected on the
General Preferences panel of the Configuration window. For more
information, see General Preferences Panel.
Click the View Grid
icon to display data sources or tables in a grid.
Click to view profile
information for the selected table. If a profile exists for a table,
PROFILED appears beneath the table name.
Click to select a source
table from another data source.
Click to choose from
a list of recently used tables. If you select a table from a different
data source, the source table information is adjusted accordingly.
The table that you selected is automatically highlighted.
Enter text in the search
field to filter the list of data sources or tables. The search feature
filters according to name when applied to data sources and according
to name and description when applied to tables.
Click to return to
the top of the page when viewing a long list of data sources or tables.
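The difference between the two search behaviors described above can be sketched as follows. This is an illustrative model only: the function names and the dictionary layout are hypothetical, and case-insensitive substring matching is an assumption, because this section does not state how the search field matches text.

```python
def filter_data_sources(names, query):
    """Keep data sources whose name contains the query.

    Assumption: matching is a case-insensitive substring test.
    """
    q = query.casefold()
    return [name for name in names if q in name.casefold()]


def filter_tables(tables, query):
    """Keep tables whose name or description contains the query.

    Each table is a dict with a "name" key and an optional "description"
    key (a hypothetical layout used only for this sketch).
    """
    q = query.casefold()
    return [
        t for t in tables
        if q in t["name"].casefold()
        or q in t.get("description", "").casefold()
    ]
```

Under these assumptions, a query can match a table through its description alone, whereas a data source is matched only by its name.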
Tip
If you frequently work with
the same data source across multiple directives, you can have SAS Data Loader
select the most recently used schema automatically. This can help
you select source tables and target tables more quickly. To enable
this feature, open the Configuration window and
complete the following steps:
- Click General Preferences.
- Select Automatically select the most recently selected hive schema.