Recently we switched our data warehouse at work to Snowflake and have been pretty happy with the speed boost it provides. As an aside, we’ve been less happy with the insistence of ALL CAPS for column names. Anyway, I had already built an internal R package for our Postgres setup, so I ventured to make it work with Snowflake. In short, it took some doing and I wanted to share what I found worked best in case you find yourself in a similar situation.
Snowflake does offer a tutorial on connecting R (including dplyr!) to their service, but it depends on rJava and RJODBC, which was overhead I wanted to avoid. Luckily, Snowflake also offers straight ODBC drivers too!
The main challenge with using these drivers has been choosing between the two software packages that are available to maintain ODBC drivers: unixodbc and iODBC. Snowflake suggests using iODBC, but may support unixODBC while R Studio suggests unixODBC, by may support iODBC. After trying out both, I have found that the iODBC route is less painful. While I was able make everything work with unixODBC, it involved editing config files by hand. Which may be fine for you and I, but I have less technical folks at my company to worry about.
Happily, the iODBC route works pretty smoothly, so that is what I will outline here. Note that it apparently causes trouble to have both installed, so you might want to remove unixODBC first if you already have it installed.
For reference, if entering
odbcinst -j into your console works then you have unixODBC installed. If
iodbctest works then you have iODBC installed.
Most of these steps can be done by visiting the relevant websites, downloading the package installers, double clicking, etc., but we’ll outline how to do it from the command line here.
First, we need to install iODBC
brew install libiodbc
Next, we download and install the latest odbc drivers from Snowflake. Note you may need to change the version number of the drivers to match what is in
# Download file to current working directory curl -O https://sfc-repo.snowflakecomputing.com/odbc/mac64/latest/snowflake_odbc_mac-2.20.3.dmg # Run .pkg hdiutil attach snowflake_odbc_mac-2.20.3.dmg sudo installer -package /Volumes/snowflakeODBC/snowflakeODBC.pkg -target / hdiutil detach /Volumes/snowflakeODBC/
Data Source Names
With the drivers installed, we can now store your Snowflake credentials as a Data Source Name (DSN) to make it easy to connect from R in your code.
First, we tell iODBC where to look for your DSNs. Entering the following in your console adds a line to the bottom of your
.zshrc that sets a persistent environment variable. If you are using bash, replace
echo 'export ODBCINI=$HOME/.odbc.ini' >> .zshrc #echo 'export ODBCINI=$HOME/.odbc.ini' >> .bashrc
You will need to restart your terminal for this to take effect.
Create or edit your DSN files.
There should now be a
.odbcinst.ini in your home directory. If not, create them!
.odbcinst.ini should look similar to this:
[ODBC Drivers] Snowflake = Installed [Snowflake] Driver = /opt/snowflake/snowflakeodbc/lib/universal/libSnowflake.dylib
.odbc.ini should look like the following. Note that you can change the name in the
 (in the example it’s
snowflake) to whatever you want to use to call this connection in your code.
[ODBC Data Sources] snowflake = Snowflake [snowflake] Driver = /opt/snowflake/snowflakeodbc/lib/universal/libSnowflake.dylib Description = Typical user settings for Nick UID = <your_username> PWD = <your_password> Server = <your_server>.snowflakecomputing.com Role = <your_snowflake_role> Database = <your_database> Warehouse = <your_warehouse> Schema = <your_default_schema> Locale = en-US Tracing = 0
odbc R Package
The last step is to install the
odbc R package from source. The install needs to be from source so that it can catch you are using iODBC and adjust.
install.packages("odbc", type = "source")
At this point you can restart your R session and run the following commands to make sure R sees your connection options.
## name attribute value ## 1 Snowflake <NA> <NA>
## name description ## 1 snowflake Snowflake
Finally, we can connect and try out some queries.
con <- DBI::dbConnect(odbc::odbc(), "snowflake")
In future posts I hope to share some custom dbplyr translations of lubridate / stringr functions I created to work with Snowflake’s functions. Until then, I hope this helps!