Generating natural language descriptions for charts, also called ‘chart-to-text’, is a growing field in data visualization. This blog post discusses some of the solutions that can automate the summarization of Power BI visuals. While the solutions are not tied to any specific functional use case, a few generic ones are accessibility scenarios (for example, reading a report aloud from text generated as a summary), integration with voice bots, and ready-to-consume summaries for top management presentations.
Here is how the following five approaches help in implementing a solution to generate Power BI-based commentary:
- PBI OOB- Smart narratives
- PBI custom visual- ARRIA
- PBI embedding- Non-ML approach
- Chart2Text- PyTorch approach
- Azure OpenAI Service
The following chart covers the key pros and cons of each of these five approaches.
Figure: An overview of five approaches to implementing a solution that generates Power BI-based commentary
- PBI OOB- Smart narratives
The smart narratives visual is available as a standard visual in Power BI Desktop. It automatically generates a highly customizable summary of the report's visuals, which can be easily edited using the text box commands.
If report users are allowed to personalize visuals, they can switch any visual in a published report to a smart narrative while consuming the report and generate the summary on demand.
Here is the narrative generated for a simple dataset:
The ‘Get Insights’ feature in the Power BI service also generates a summary and delivers deeper insights about the current report page or the selected visual. This kind of summary helps with identifying anomalies and trends and with KPI analysis. As of January 2022, the ‘Get Insights’ feature is available only on Premium workspaces.
- PBI custom visual- ARRIA
ARRIA Power BI is a custom visual available in the marketplace that generates the summary by leveraging Arria’s Insight APIs. Hence, the user needs an Arria account to gain access to the summary generated by the solution.
Like smart narratives, this approach requires the creation of a new, dedicated visual in Power BI to get the summary based on the chosen metrics and dimensions.
One of the available solutions, NLG Apps in the ARRIA visual, narrates the selected data attributes in a Power BI dataset. The visual generates the narratives after a simple configuration setup.
- PBI embedding- non-ML approach
Power BI reports and dashboards can be embedded within external applications. The idea here is to leverage the embedding APIs and SDK libraries to build a user experience that integrates the interactive Power BI reports with other scripting libraries, generating a customizable visual summary without altering the reports’ visual experience.
To explore this option, the powerbiclient Python package was used with a Jupyter notebook. This package lets us embed Power BI reports in Jupyter notebooks and export data from the visuals of the embedded report to the notebook for in-depth data exploration.
Once the report is embedded in the notebook, one can interact with it and extract the needed details for generating the summary.
The underlying data can be exported and made available as a pandas DataFrame. By applying the wide range of data exploration functions available in pandas, one can generate detailed narratives for the underlying data.
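The export step can be sketched roughly as follows. This is a minimal sketch rather than the exact code used for this post: `export_visual_frame` is a hypothetical helper name, and the workspace, report, page, and visual identifiers in the commented usage are placeholders. It relies on the powerbiclient `Report.export_visual_data` method returning the visual's data as CSV text.

```python
from io import StringIO

import pandas as pd


def export_visual_frame(report, page_name, visual_name, rows=100):
    """Export one visual's data and parse it into a pandas DataFrame.

    `report` is expected to be an embedded powerbiclient Report whose
    export_visual_data method returns the visual's data as CSV text.
    """
    csv_text = report.export_visual_data(page_name, visual_name, rows=rows)
    return pd.read_csv(StringIO(csv_text))


# Hypothetical notebook usage (all IDs and names below are placeholders):
#   from powerbiclient import Report
#   from powerbiclient.authentication import DeviceCodeLoginAuthentication
#   auth = DeviceCodeLoginAuthentication()
#   report = Report(group_id="<workspace-id>", report_id="<report-id>", auth=auth)
#   report  # render the report in a cell and wait for it to finish loading
#   df = export_visual_frame(report, "<page-name>", "<visual-name>")
```

Keeping the parsing in a small function makes it easy to reuse the same step for every visual on a page.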
For example, the following figure illustrates data exported from a visual on the right:
The smart narratives text generated for the same visual is given below for comparison:
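To illustrate how pandas functions can turn exported data into commentary, here is a small sketch. The function name `describe_measure`, the sentence template, and the sample data are all made up for illustration; a real narrative would likely cover more statistics such as trends and outliers.

```python
import pandas as pd


def describe_measure(df, category, measure):
    """Build a short plain-text commentary for one measure split by one category."""
    # Aggregate the measure per category, largest first.
    totals = df.groupby(category)[measure].sum().sort_values(ascending=False)
    total = float(totals.sum())
    top_name, bottom_name = totals.index[0], totals.index[-1]
    top_value, bottom_value = float(totals.iloc[0]), float(totals.iloc[-1])
    share = 100 * top_value / total
    return (
        f"Total {measure} is {total:,.0f} across {len(totals)} {category} values. "
        f"{top_name} leads with {top_value:,.0f} ({share:.0f}% of the total), "
        f"while {bottom_name} is lowest at {bottom_value:,.0f}."
    )


# Illustrative data only, not taken from the report in the figure.
sales = pd.DataFrame(
    {"Region": ["East", "West", "North", "East"], "Sales": [100, 250, 50, 100]}
)
print(describe_measure(sales, "Region", "Sales"))
```

Because the template is plain Python, the wording and the statistics it reports can be adjusted freely, which is the main attraction of this non-ML route.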
- Chart2Text- PyTorch approach
There are some AI/ML solutions available for generating natural language explanations from chart data. One such solution can be found in the following GitHub repo: https://github.com/JasonObeid/Chart2Text
It is a PyTorch-based solution with many training datasets. It also provides a pre-trained model for direct consumption.
This model takes the chart title and data as input for generating the summary.
The extracted chart data is formatted with the following pattern and passed as input to the model.
"AXIS_LABEL|VALUE|AXIS_TYPE|CHART_TYPE"
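As a rough sketch of that formatting step, the helper below flattens (x, y) pairs into tokens of this pattern. Several details are assumptions and should be checked against the repository's own preprocessing scripts: the helper name `to_chart2text_tokens`, the `x`/`y` axis-type codes, the `bar_chart` chart-type string, joining tokens with spaces, and replacing spaces inside a field with underscores.

```python
def to_chart2text_tokens(rows, x_label, y_label, chart_type):
    """Flatten (x, y) pairs into 'AXIS_LABEL|VALUE|AXIS_TYPE|CHART_TYPE' tokens."""

    def clean(value):
        # Assumption: spaces inside a field are replaced so that tokens
        # can be separated from each other by single spaces.
        return str(value).strip().replace(" ", "_")

    tokens = []
    for x_value, y_value in rows:
        tokens.append(f"{clean(x_label)}|{clean(x_value)}|x|{chart_type}")
        tokens.append(f"{clean(y_label)}|{clean(y_value)}|y|{chart_type}")
    return " ".join(tokens)


# Illustrative input only.
print(to_chart2text_tokens([("East", 100), ("West", 250)], "Region", "Sales", "bar_chart"))
```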
Since this method needs significant training to improve the accuracy of the output summary, the training aspect was not explored in depth here. This approach, however, provides a platform-agnostic solution and can address the challenges of automated summary generation for various use cases. Hence, such solutions are worth exploring in the future.
- Azure OpenAI Service
The Azure OpenAI Service provides language models to power applications. It provides options to fine-tune language models to meet specific needs for a variety of use cases, from summarization to content generation. The service runs on Azure. As of now, it is available only in private preview.
In a proof of concept (POC) done by HCLTech, we generated the summary in two steps. The first step generates descriptive commentary on the tabular data passed in, and the output of this step is passed to the second step to produce the final summary.
This solution needs only a few sample datasets, each paired with a summary, for quick training. The engine learns how to generate summaries from these few examples and then starts generating results for new datasets.
The engine used for generating the description is davinci-instruct-beta-v3.
The engine used for generating the final summary is davinci.
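The two-step flow with a handful of examples can be sketched as below. This is not the POC code: the prompt layout, the tag names, and both function names are hypothetical, and the call to the service is deliberately abstracted behind a `complete(engine, prompt)` callable that the reader would supply, so no particular client API is assumed.

```python
def build_few_shot_prompt(examples, new_input, input_tag, output_tag):
    """Assemble a few-shot prompt: worked examples first, then the new input."""
    parts = [
        f"{input_tag}: {source}\n{output_tag}: {target}"
        for source, target in examples
    ]
    # The final block leaves the output empty for the model to complete.
    parts.append(f"{input_tag}: {new_input}\n{output_tag}:")
    return "\n\n".join(parts)


def two_step_summary(table_text, describe_examples, summary_examples, complete):
    """Run the description step, then feed its output to the summary step.

    `complete(engine, prompt)` is a caller-supplied function that sends the
    prompt to the named engine and returns the generated text.
    """
    description = complete(
        "davinci-instruct-beta-v3",
        build_few_shot_prompt(describe_examples, table_text, "Data", "Description"),
    )
    return complete(
        "davinci",
        build_few_shot_prompt(
            summary_examples, description.strip(), "Description", "Summary"
        ),
    )
```

Passing the completion function in as an argument also makes the flow easy to try out with a stub before connecting it to the real service.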
Compared to the other approaches, the Azure OpenAI approach produces results quickly for any unknown dataset while consuming very few training samples. This can be helpful in many use cases, such as delivering summaries via a chatbot and developing screen narration apps that describe the insights of BI solutions for visually impaired users.
Conclusion
I hope the above gives the reader a quick glimpse into the five options that I have come across and explored. For the sake of brevity, I have refrained from going into the next level of technical detail, leaving it to the readers to discover. For any data-informed enterprise, the ability to generate automated, and hence readily consumable, reports is a huge step toward becoming truly digital. It is important to note that since this is a relatively new area, enterprises would need to let the technology mature further before it is ready for production scale. We are slowly, but surely, headed in the right direction.
We continue to closely watch Microsoft's plans for exposing the smart narratives and Get Insights features via APIs/SDKs, which would eventually help in creating a scalable solution for summary generation use cases with optimal effort.