Azure Databricks Series: Resolving Key-Based Authentication Issues in Azure Storage Access

Accessing a storage account from Azure Databricks using the storage account key.

# Using Storage key
from pyspark.sql import SparkSession

# Create a Spark Session
spark = SparkSession.builder.getOrCreate()

# Define the storage account and container details
storage_account_name = '<Storage_Account>'
storage_account_key = '<Storage_Account_key>'
container_name = '<container-name>'

# Define the configuration for the Azure Storage account
spark.conf.set(
    f"fs.azure.account.key.{storage_account_name}.dfs.core.windows.net",
    storage_account_key
)

# Define the path to the CSV file
file_path = f"abfss://{container_name}@{storage_account_name}.dfs.core.windows.net/customer/customers.csv"

# Read the CSV file into a DataFrame
df = spark.read.format('csv').option('header', 'true').load(file_path)

# Display the DataFrame
display(df)
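The `abfss://` path above follows a fixed pattern: container, then storage account, then the blob path inside the container. As a small illustration (the account, container, and helper name here are hypothetical, not from the article), the URI can be built with a plain Python helper so the pieces are composed consistently across notebooks:

```python
def abfss_path(container: str, account: str, relative_path: str) -> str:
    """Build an abfss:// URI for an ADLS Gen2 path.

    Format: abfss://<container>@<account>.dfs.core.windows.net/<path>
    """
    # Strip any leading slash so we never emit a double slash after the host
    return f"abfss://{container}@{account}.dfs.core.windows.net/{relative_path.lstrip('/')}"

# Example with made-up names:
print(abfss_path("raw", "mystorageacct", "customer/customers.csv"))
# abfss://raw@mystorageacct.dfs.core.windows.net/customer/customers.csv
```

In a real notebook you would also avoid hardcoding the key; Databricks secret scopes (`dbutils.secrets.get`) are the usual place to keep it.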

Accessing a storage account from Azure Databricks using Azure AD authentication (service principal).

# Using App registration and Azure AD
# Import required libraries
from pyspark.sql import SparkSession

# Create a Spark session (already available as `spark` in Databricks notebooks)
spark = SparkSession.builder.getOrCreate()

storage_account_name = '<Storage_Account>'
container_name = '<container-name>'

# Set up the OAuth configuration for the service principal
spark.conf.set(f"fs.azure.account.auth.type.{storage_account_name}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage_account_name}.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage_account_name}.dfs.core.windows.net",
               "<your-service-principal-client-id>")
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage_account_name}.dfs.core.windows.net",
               "<your-service-principal-client-secret>")
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage_account_name}.dfs.core.windows.net",
               "https://login.microsoftonline.com/<Tenant-ID>/oauth2/token")

file_path = f"abfss://{container_name}@{storage_account_name}.dfs.core.windows.net/customer/customers.csv"

# Read the CSV file into a DataFrame
df = spark.read.format('csv').option('header', 'true').load(file_path)

# Display the DataFrame
display(df)
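Since all five OAuth settings share the same `<key>.<account>.dfs.core.windows.net` naming scheme, one sketch (function and argument names are my own, not from the article) is to generate them as a dict and apply them in a loop, which keeps the account name in exactly one place:

```python
def oauth_confs(account: str, client_id: str, client_secret: str, tenant_id: str) -> dict:
    """Return the Spark conf entries for ADLS Gen2 OAuth (client credentials) auth."""
    suffix = f"{account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": client_id,
        f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }

# In a Databricks notebook you would then apply them:
# for key, value in oauth_confs(account, cid, secret, tid).items():
#     spark.conf.set(key, value)
```

This is only a convenience wrapper around the same `spark.conf.set` calls shown above.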

Regards,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties and confer no rights.
