Usage
pandasai is built on top of the pandas API. Its goal is to make dataframes conversational using Large Language Models (LLMs).
Installation
To use pandasai, first install it with pip from PyPI. The library is under active development, so check for new versions regularly.
pip install pandasai
It is recommended to create a virtual environment with your preferred environment manager, e.g. conda, Poetry, etc.
Getting Started
Below is a simple example to get started with pandasai.
import pandas as pd
from pandasai import PandasAI
# Sample DataFrame
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832, 1745433788416, 1181205135360, 1607402389504, 1490967855104, 4380756541440, 14631844184064],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12]
})
# Instantiate an LLM
from pandasai.llm.openai import OpenAI
llm = OpenAI(api_token="YOUR_API_TOKEN")
pandas_ai = PandasAI(llm, conversational=False)
pandas_ai.run(df, prompt='Which are the 5 happiest countries?')
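To sanity-check an answer from the model, the same question can be answered with plain pandas. The sketch below is not part of pandasai. It reuses the country and happiness_index columns from the sample data and assumes that "happiest" means the highest happiness_index.

```python
import pandas as pd

# Same countries and happiness values as the sample DataFrame above
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy",
                "Spain", "Canada", "Australia", "Japan", "China"],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12],
})

# Assumption: "happiest" means the five largest happiness_index values
top5 = df.nlargest(5, "happiness_index")

print(top5["country"].tolist())
# ['Canada', 'Australia', 'United Kingdom', 'Germany', 'United States']
```

If the model's answer lists different countries, the generated code is worth inspecting.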
Generate an OpenAI API Token
Users need to generate their own API token (YOUR_API_TOKEN in the example above). Follow the steps below to generate an API token with OpenAI.
- Go to https://openai.com/api/ and sign up with your email address or connect your Google Account.
- Go to View API Keys on the left side of your Personal Account Settings.
- Select Create new Secret key.
API access to OpenAI is a paid service, so you have to set up billing. Read the Pricing information before experimenting.
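Once a token exists, it is usually safer to keep it out of source code. The sketch below reads it from an environment variable and passes it through the api_token argument used in the getting-started example. The helper names load_api_token and build_llm are hypothetical, and the variable name OPENAI_API_KEY is an assumption. The later examples call OpenAI() with no arguments, which suggests the token can be picked up from the environment, but check the library documentation for the exact name.

```python
import os


def load_api_token(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the token stored in env_var, or raise a clear error if it is missing."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your OpenAI API token."
        )
    return token


def build_llm():
    """Create the LLM wrapper with a token taken from the environment."""
    # Imported lazily so load_api_token() works even without pandasai installed
    from pandasai.llm.openai import OpenAI

    return OpenAI(api_token=load_api_token())
```

With the token exported in your shell, llm = build_llm() can replace the hard-coded OpenAI(api_token="YOUR_API_TOKEN") call.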
Demo in Google Colab
Try out PandasAI in your browser:
Examples
Other examples are included in the repository along with samples of data.
Working with CSV
Example of using PandasAI with a CSV file
import pandas as pd
from pandasai import PandasAI
from pandasai.llm.openai import OpenAI
df = pd.read_csv("data/Loan payments data.csv")
llm = OpenAI()
pandas_ai = PandasAI(llm, verbose=True)
response = pandas_ai.run(df, "How many loans are from men and have been paid off?")
print(response)
# Output: 247 loans have been paid off by men.
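An answer like this one can be cross-checked with a direct pandas filter. The sketch below uses a tiny stand-in DataFrame rather than the real CSV. The column names Gender and loan_status and the value PAIDOFF are assumptions about the file's schema, so adjust them to match your data.

```python
import pandas as pd

# Hypothetical stand-in for the CSV; column names and values are assumptions
loans = pd.DataFrame({
    "Gender": ["male", "female", "male", "male", "female"],
    "loan_status": ["PAIDOFF", "PAIDOFF", "COLLECTION", "PAIDOFF", "COLLECTION"],
})

# Count rows where both conditions hold
is_male = loans["Gender"] == "male"
is_paid_off = loans["loan_status"] == "PAIDOFF"
paid_off_by_men = int((is_male & is_paid_off).sum())

print(paid_off_by_men)  # 2 for this toy data
```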
Working with a Pandas DataFrame
Example of using PandasAI with a Pandas DataFrame
import pandas as pd
from data.sample_dataframe import dataframe
from pandasai import PandasAI
from pandasai.llm.openai import OpenAI
df = pd.DataFrame(dataframe)
llm = OpenAI()
pandas_ai = PandasAI(llm, verbose=True, conversational=False)
response = pandas_ai.run(df, "Calculate the sum of the gdp of north american countries")
print(response)
# Output: 20901884461056
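The reported sum can be reproduced by hand. The sketch below assumes that data.sample_dataframe holds the same country and gdp values as the getting-started example, and that "North American" covers the United States and Canada, the only such countries in that data.

```python
import pandas as pd

# Assumption: same values as the sample DataFrame in the getting-started example
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy",
                "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832,
            1745433788416, 1181205135360, 1607402389504, 1490967855104,
            4380756541440, 14631844184064],
})

# Assumption: the North American countries present in this data
north_america = ["United States", "Canada"]

total_gdp = int(df.loc[df["country"].isin(north_america), "gdp"].sum())
print(total_gdp)  # 20901884461056
```

The result matches the output shown above, which is consistent with the model having selected those two countries.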
Plotting
Example of using PandasAI to generate a chart from a Pandas DataFrame
import pandas as pd
from data.sample_dataframe import dataframe
from pandasai import PandasAI
from pandasai.llm.openai import OpenAI
df = pd.DataFrame(dataframe)
llm = OpenAI()
pandas_ai = PandasAI(llm)
response = pandas_ai.run(
    df,
    "Plot the histogram of countries showing for each the gdp, using different colors for each bar",
)
print(response)
# Output: check out images/histogram-chart.png
Working with multiple dataframes
Example of using PandasAI with multiple Pandas DataFrames
import pandas as pd
from pandasai import PandasAI
from pandasai.llm.openai import OpenAI
employees_data = {
    'EmployeeID': [1, 2, 3, 4, 5],
    'Name': ['John', 'Emma', 'Liam', 'Olivia', 'William'],
    'Department': ['HR', 'Sales', 'IT', 'Marketing', 'Finance']
}
salaries_data = {
    'EmployeeID': [1, 2, 3, 4, 5],
    'Salary': [5000, 6000, 4500, 7000, 5500]
}
employees_df = pd.DataFrame(employees_data)
salaries_df = pd.DataFrame(salaries_data)
llm = OpenAI()
pandas_ai = PandasAI(llm, verbose=True, conversational=False)
response = pandas_ai.run(
    [employees_df, salaries_df],
    "Who gets paid the most?",
)
print(response)
# Output: Olivia gets paid the most.
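To answer this question the two DataFrames have to be combined on their shared key. A minimal pandas sketch of that join, using the same data, is shown below. It illustrates the kind of operation involved and is not necessarily the code the model generates.

```python
import pandas as pd

employees_df = pd.DataFrame({
    "EmployeeID": [1, 2, 3, 4, 5],
    "Name": ["John", "Emma", "Liam", "Olivia", "William"],
    "Department": ["HR", "Sales", "IT", "Marketing", "Finance"],
})
salaries_df = pd.DataFrame({
    "EmployeeID": [1, 2, 3, 4, 5],
    "Salary": [5000, 6000, 4500, 7000, 5500],
})

# Join on the shared EmployeeID column, then pick the row with the highest salary
merged = employees_df.merge(salaries_df, on="EmployeeID")
top_earner = merged.loc[merged["Salary"].idxmax(), "Name"]

print(top_earner)  # Olivia
```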