
Archive for January, 2024

When I set @query_result_separator to either a space or a comma, the data is output in a single column, separated by the chosen character. To get the output back into its original columns, use a TAB character (press the Tab key) as the value for @query_result_separator.

EXEC msdb.dbo.sp_send_dbmail
    @profile_name = 'profile_integration_errors',
    @recipients = 'x@YXZ.com',
    @query = 'SELECT * FROM ExceptionReport',
    @subject = 'Exception Report Attached',
    @body = 'Please find the report attached',
    @body_format = 'HTML',
    @query_result_header = 1,
    @query_attachment_filename = 'ExceptionReport.csv',
    @attach_query_result_as_file = 1,
    @query_result_separator = '	', -- a literal TAB character between the quotes
    @query_result_width = 32767,
    @query_result_no_padding = 1;

Read Full Post »

Instead of hard coding the credentials in the Notebook, we can reference them in Azure Key Vault (AKV).

I authenticate my API using a header and a key, so I created two secrets in AKV to store the API header and the API key.

I access my ADLS gen 2 using the ADLS access key, so I created another secret in AKV to store the ADLS access key.

To establish a connection from Azure Databricks to ADLS, I use an Azure Key Vault-backed scope. A scope is a collection of secrets identified by a name.

Steps to configure the Azure Key Vault-backed scope for Azure Databricks:

Go to https://<databricks-instance>#secrets/createScope. This URL is case sensitive; scope in createScope must be uppercase. To get the databricks-instance, launch the Azure Databricks workspace and copy the workspace URL; this is the databricks-instance.

Replace <databricks-instance> with the URL you have just copied.

Enter the Scope Name and set the Manage Principal to All Users.

Go to Azure Key Vault to get the DNS Name and Resource ID, then enter them on the scope creation page.

Hit the Create button to create the scope.

To test, open a Notebook in Databricks and then enter the python code below:
display(dbutils.secrets.listScopes())

To get the secrets from Azure Key Vault, use the scope to retrieve them. See the python code below in the Notebook:
adls_account_key = dbutils.secrets.get(scope = "akv_databricks_secretscope", key = "adlsAccountKey")
apiKey = dbutils.secrets.get(scope = "akv_databricks_secretscope", key = "APIUserKey")
apiHeader = dbutils.secrets.get(scope = "akv_databricks_secretscope", key = "APIHeader")

headers= {apiHeader: apiKey}

To access ADLS gen 2, use the python code below. Replace <access key> with adls_account_key and <storage-account> with your storage account name.

spark.conf.set("fs.azure.account.key.<storage-account>.dfs.core.windows.net","<access key>")
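As an illustration, here is a minimal sketch of the same call with the secret retrieved from the scope; the storage account name is a placeholder:

# "mystorageaccount" is a placeholder; adls_account_key comes from the secret scope above
spark.conf.set("fs.azure.account.key.mystorageaccount.dfs.core.windows.net", adls_account_key)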

To access a container in the ADLS gen 2:

abfss://container_name@storage_account.dfs.core.windows.net/folder_path/file_name
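For example, a sketch of the path to a config file in the bronze container (the storage account name and file path are assumptions for illustration):

# Placeholder storage account name and file path
config_json = "abfss://bronze@mystorageaccount.dfs.core.windows.net/config/api_config.json"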

My API config file in JSON is stored in the bronze container of the ADLS. To read this config file and cache the parsed results, use the python code below:

config_json_data = spark.read.json(config_json, multiLine=True, mode="FAILFAST").cache()

My config file looks like below:

{
    "sites": ["corporate", "olympicdam", "porphyry", "thunderbox", "wilkiecreek", "paradigm"],
    "base_url": "https://data.fleet4d.net"
}

To retrieve the base_url from the config file, use the python code below:

var_base_url = config_json_data.select("base_url").collect()[0]["base_url"]

To loop through each site, use the python code below:

# Collect the parsed config into a row so its sites can be looped over
item = config_json_data.collect()[0]

for i in range(len(item["sites"])):
    var_site = item["sites"][i]
    var_url = f'{var_base_url}/{var_site}/masterdata'

To call the API and get its response back:

import requests

response = requests.request("GET", var_url, headers=headers)

Convert the response to string:

response_text=response.text

The API response is in XML form. To get each item in the tree and write to ADLS:

import xml.etree.ElementTree as ET

# Parse the XML response
root = ET.fromstring(response_text)

# Column names for the Spark dataframe, matching the attributes extracted below
schema = ["id", "itemid", "name", "description", "type", "subtype",
          "ipaddress", "vendor", "enabled", "modified"]
data = []

# Iterate through each device element of the root
for child in root.findall('device'):
    id = child.attrib.get('id')
    itemid = child.attrib.get('itemid')
    name = child.attrib.get('name')
    description = child.attrib.get('description')
    typeX = child.attrib.get('type')
    subtype = child.attrib.get('subtype')
    ipaddress = child.attrib.get('ipaddress')
    vendor = child.attrib.get('vendor')
    enabled = child.attrib.get('enabled')
    modified = child.attrib.get('modified')

    # Create a row of values and add it to the list
    values = [id, itemid, name, description, typeX, subtype, ipaddress, vendor, enabled, modified]
    data.append(values)

# Create Spark dataframe from the list of rows
df = spark.createDataFrame(data, schema)

import os

# Write to ADLS gen 2 as csv
'''
You get multiple files in a folder because Spark writes each shuffle partition in-place out to a "part..." file
to avoid network I/O. You can use coalesce to bring all the shuffles into a single partition and write it out
to a single file, but be mindful of the performance implications.
'''
df.coalesce(1).write.mode('overwrite').format('csv').option("header", True).save(data_file_path)

# Stop Spark from writing a _SUCCESS file on future writes
spark.conf.set("spark.databricks.io.directoryCommit.createSuccessFile", "false")

# Set the _SUCCESS file path
dir_path = os.path.dirname(data_file_path)
success_file_path = f'{dir_path}/_SUCCESS'

# Remove the _SUCCESS file if it exists
if os.path.exists(success_file_path):
    os.remove(success_file_path)

The Spark write splits the csv into parts and creates _committed, _started and _SUCCESS files, placing them all in a folder. To prevent the csv from being split into parts, use coalesce in your code, as shown above.

Read Full Post »

Using Access Keys

  • When a storage account is created, you get two 512-bit access keys

Access Method

abfss://container_name@storage_account.dfs.core.windows.net/folder_path/file_name

Access via python notebook

spark.conf.set("fs.azure.account.key.<storage-account>.dfs.core.windows.net", "<access key>")
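As a rough sketch, once the access key is set you can read a file directly from a container; all names below are placeholders:

# Placeholder container, storage account and file names
df = spark.read.csv("abfss://container_name@mystorageaccount.dfs.core.windows.net/folder_path/file_name.csv", header=True)
display(df)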

References:

ABFS-Driver 

Acronyms

  • ABFS (Azure Blob File System)

Databricks Scope

Instead of directly putting your secret value in the code, place the key inside a scope and use the scope name to get the value of the key.
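For example, a minimal sketch using the scope and key names from earlier in this post:

# Hard-coding the secret (avoid):
# adls_account_key = "my-plain-text-key"

# Retrieving it from the Key Vault-backed scope instead:
adls_account_key = dbutils.secrets.get(scope = "akv_databricks_secretscope", key = "adlsAccountKey")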

Read Full Post »

If you see [REDACTED] in the output of your code, it means that Databricks has replaced the secret value with [REDACTED] so it is not displayed in plain text.

https://docs.databricks.com/en/security/secrets/redaction.html
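For example, printing a secret retrieved with dbutils (using the scope and key names from the earlier post) shows the placeholder rather than the actual value:

adls_account_key = dbutils.secrets.get(scope = "akv_databricks_secretscope", key = "adlsAccountKey")
print(adls_account_key)   # output: [REDACTED]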

Read Full Post »

I set up a build pipeline on my SQL Server database repository and I am using Visual Studio 2022.

My build pipeline failed to run with the error below:

##[error]C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\MSBuild\Microsoft\VisualStudio\v16.0\SSDT\Microsoft.Data.Tools.Schema.SqlTasks.targets(573,5): Error MSB4181: The "SqlBuildTask" task returned false but did not log an error.

The pipeline picked up MSBuild.exe from the Visual Studio 2019 path instead of the Visual Studio 2022 path.

To fix this, set the Agent Specification to windows-2022, instead of windows-latest.

If you are using the Microsoft-hosted agent windows-latest or windows-2022 with your task, it will use VS2022 by default. If you’d like to use VS2019, change the agent to windows-2019.

Read Full Post »

Use the Power Query below and then select the user table.

Syntax:

ActiveDirectory.Domains(optional forestRootDomainName as nullable text) as table

Example:

Source = ActiveDirectory.Domains("company.internal")

Expand the organizationalPerson column to get the user attributes.

https://learn.microsoft.com/en-us/powerquery-m/activedirectory-domains

Read Full Post »

I’ve got a Power BI report that uses DirectQuery against a SQL Server database.

According to this article, https://learn.microsoft.com/en-us/power-bi/connect-data/refresh-data,

“Power BI doesn’t import data over connections that operate in DirectQuery mode. Instead, the semantic model returns results from the underlying data source whenever a report or dashboard queries the semantic model. Power BI transforms and forwards the queries to the data source.

Because Power BI doesn’t import the data, you don’t need to run a data refresh. However, Power BI still performs tile refreshes and possibly report refreshes, as the next section on refresh types explains. A tile is a report visual pinned to a dashboard, and dashboard tile refreshes happen about every hour so that the tiles show recent results. You can change the schedule in the semantic model settings.”

Image source : https://learn.microsoft.com/en-us/power-bi/connect-data/refresh-data

So, it’s important to set up an automatic page refresh on the report itself. To set it, open the pbix file and then go to the page’s Format properties.

https://learn.microsoft.com/en-us/power-bi/create-reports/desktop-automatic-page-refresh

Read Full Post »