Snowflake, R, and dbplyr

Author

Nick DiQuattro

Published

February 3, 2020

Recently we switched our data warehouse at work to Snowflake and have been pretty happy with the speed boost it provides. As an aside, we’ve been less happy with the insistence of ALL CAPS for column names. Anyway, I had already built an internal R package for our Postgres setup, so I ventured to make it work with Snowflake. In short, it took some doing and I wanted to share what I found worked best in case you find yourself in a similar situation.

The Rub

Snowflake does offer a tutorial on connecting R (including dplyr!) to their service, but it depends on rJava and RJODBC, which was overhead I wanted to avoid. Luckily, Snowflake also offers straight ODBC drivers too!

The main challenge with using these drivers has been choosing between the two software packages that are available to maintain ODBC drivers: unixodbc and iODBC. Snowflake suggests using iODBC, but may support unixODBC while R Studio suggests unixODBC, by may support iODBC. After trying out both, I have found that the iODBC route is less painful. While I was able make everything work with unixODBC, it involved editing config files by hand. Which may be fine for you and I, but I have less technical folks at my company to worry about.

Happily, the iODBC route works pretty smoothly, so that is what I will outline here. Note that it apparently causes trouble to have both installed, so you might want to remove unixODBC first if you already have it installed.

For reference, if entering odbcinst -j into your console works then you have unixODBC installed. If iodbctest works then you have iODBC installed.

iODBC

Most of these steps can be done by visiting the relevant websites, downloading the package installers, double clicking, etc., but we’ll outline how to do it from the command line here.

First, we need to install iODBC

brew install libiodbc

Next, we download and install the latest odbc drivers from Snowflake. Note you may need to change the version number of the drivers to match what is in latest/.

# Download file to current working directory
curl -O https://sfc-repo.snowflakecomputing.com/odbc/mac64/latest/snowflake_odbc_mac-2.20.3.dmg

# Run .pkg
hdiutil attach snowflake_odbc_mac-2.20.3.dmg
sudo installer -package /Volumes/snowflakeODBC/snowflakeODBC.pkg -target /
hdiutil detach /Volumes/snowflakeODBC/

Data Source Names

With the drivers installed, we can now store your Snowflake credentials as a Data Source Name (DSN) to make it easy to connect from R in your code.

First, we tell iODBC where to look for your DSNs. Entering the following in your console adds a line to the bottom of your .zshrc that sets a persistent environment variable. If you are using bash, replace .zshrc with .bashrc.

echo 'export ODBCINI=$HOME/.odbc.ini' >> .zshrc
#echo 'export ODBCINI=$HOME/.odbc.ini' >> .bashrc

You will need to restart your terminal for this to take effect.

Create or edit your DSN files.

There should now be a .odbc.ini and .odbcinst.ini in your home directory. If not, create them!

Your .odbcinst.ini should look similar to this:

[ODBC Drivers]
Snowflake = Installed

[Snowflake]
Driver = /opt/snowflake/snowflakeodbc/lib/universal/libSnowflake.dylib

Your .odbc.ini should look like the following. Note that you can change the name in the [] (in the example it’s snowflake) to whatever you want to use to call this connection in your code.

[ODBC Data Sources]
snowflake = Snowflake

[snowflake]
Driver      = /opt/snowflake/snowflakeodbc/lib/universal/libSnowflake.dylib
Description = Typical user settings for Nick
UID         = <your_username>
PWD         = <your_password>
Server      = <your_server>.snowflakecomputing.com
Role        = <your_snowflake_role>
Database    = <your_database>
Warehouse   = <your_warehouse>
Schema      = <your_default_schema>
Locale      = en-US
Tracing     = 0

odbc R Package

The last step is to install the odbc R package from source. The install needs to be from source so that it can catch you are using iODBC and adjust.

install.packages("odbc", type = "source")

At this point you can restart your R session and run the following commands to make sure R sees your connection options.

odbc::odbcListDrivers()
odbc::odbcListDataSources()

Finally, we can connect and try out some queries.

con <- DBI::dbConnect(odbc::odbc(), "snowflake")

Future

In future posts I hope to share some custom dbplyr translations of lubridate / stringr functions I created to work with Snowflake’s functions. Until then, I hope this helps!