An SQL procedure query that joins multiple Google BigQuery tables might result in slow performance when the following is true about the query:
- The query includes both inner and outer joins.
- The query uses a LIBNAME statement to reference the BigQuery tables.
Currently, a mix of inner and outer joins is not supported with SAS/ACCESS® Interface to Google BigQuery. As a result, the entire query is not passed to the database. Instead, data is read into SAS, and the query processing is done by SAS. If the system option SASTRACE= is enabled, the following message is written to the location specified with the SASTRACELOC= system option:
SAS_SQL: Engine does not allow mix of an outer join with other kind of join.
Workarounds
There are two possible workarounds.
One workaround is to use explicit pass-through and pass the query to the database, using Google BigQuery specific SQL syntax.
Another workaround is to rewrite the query to replace the inner join of two tables with a sub-query that includes a left join along with a WHERE clause.
Here is an example query:
proc sql;
create table test as
select a.x, b.y, c.z
from
x.t1 a inner join x.t2 b
on a.x=b.x
left join x.t3 c
on a.x=c.x;
quit;
Here is a rewrite of the query using explicit pass-through:
proc sql;
connect to bigquery(...connection options specified here...);
create table test as
select *
from connection to bigquery
(select a.*,b.y,c.z
from t1 a inner join t2
on a.x=b.x
left join t3 c
on a.x=c.x);
disconnect from bigquery;
quit;
Here is a rewrite of the query to replace the inner join with a left join and WHERE clause:
proc sql;
create table test as
select d.x, d.y, c.z
from (select a.*, b.y
from x.t1 a left join x.t2 b
on a.x=b.x
where a.x=b.x) d
left join x.t3 c
on d.x=c.x;
quit;
Operating System and Release Information
SAS System | SAS/ACCESS Interface to Google BigQuery | Linux for x64 | 9.4 | 9.4 | 9.4 TS1M6 | 9.4 TS1M8 |
SAS System | SAS/ACCESS Interface to Google BigQuery (on SAS Viya) | Linux for x64 | V.03.04 | 2021.1.1 | Viya | Viya |
*
For software releases that are not yet generally available, the Fixed
Release is the software release in which the problem is planned to be
fixed.