Intro to Data Analysis - Data Reading
By Ondiek Elijah
Originally posted on the Lux Tech Academy blog.
With today's technological advances, data is arguably the most important resource for institutions, organizations, and other entities. As a result, there is an urgent need to put the available data to good use.
Data analytics focuses on processing existing datasets and performing statistical analysis on them. It develops techniques to capture and organize data so as to uncover actionable insights into ongoing problems, and it also determines the best way to communicate those insights.
Data analysis is a branch of data analytics that businesses use to examine data and draw conclusions from it. It typically proceeds through data gathering, data cleaning, data analysis, and data interpretation, steps that help you understand what your data is telling you.
Source — Stack Overflow
As an introduction to data analysis, this post shows you how to read data stored in various formats, such as CSV files, JSON files, or databases.
Reading data from a CSV file
To read data from a comma-separated values (CSV) file into a DataFrame, we use the pandas read_csv function.
read_csv accepts numerous parameters, and which ones you need depends on the nature of your dataset and what you are trying to do. Apart from the mandatory file path, the most frequently used parameters include sep, delimiter, header, and index_col.
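To see what header and index_col do without needing a file on disk, here is a minimal sketch that feeds a small CSV string to read_csv through io.StringIO. The cereal names and values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample data: a header row followed by three records
raw = "name,calories,protein\nCorn Flakes,100,2\nBran Chex,90,2\nCheerios,110,6\n"

# Default (header=0): the first line supplies the column names
df = pd.read_csv(io.StringIO(raw))

# header=None: every line is treated as data and columns are numbered 0, 1, 2
df_no_header = pd.read_csv(io.StringIO(raw), header=None)

# index_col="name": use the name column as the row labels
df_indexed = pd.read_csv(io.StringIO(raw), index_col="name")

print(list(df.columns))                        # ['name', 'calories', 'protein']
print(len(df_no_header))                       # 4 (the header line became a data row)
print(df_indexed.loc["Cheerios", "calories"])  # 110
```

Setting index_col is handy when a column already uniquely identifies each row, since you can then look rows up by label with .loc.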
Read a comma-separated file
The sep parameter (short for separator) tells read_csv which character separates the fields in each row of the file. If sep is not given, read_csv assumes the separator is a comma.
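from pyforest import *  # pyforest lazily imports pandas as pd

# sep is omitted, so read_csv assumes a comma separator
df = pd.read_csv("cereal.csv")
df.head()  # first 5 rows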
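A quick way to confirm that the default separator is a comma, and that delimiter is simply an alias for sep, is to parse small in-memory samples (hypothetical data) in a few different ways:

```python
import io

import pandas as pd

# Hypothetical samples holding the same data with different separators
comma_text = "name,calories\nCorn Flakes,100\nCheerios,110\n"
semicolon_text = "name;calories\nCorn Flakes;100\nCheerios;110\n"

# No sep given: read_csv falls back to ','
df_default = pd.read_csv(io.StringIO(comma_text))

# sep and delimiter do the same job; delimiter is an alias for sep
df_sep = pd.read_csv(io.StringIO(semicolon_text), sep=";")
df_delim = pd.read_csv(io.StringIO(semicolon_text), delimiter=";")

# Forgetting sep on a semicolon file leaves each row as a single column
wrong = pd.read_csv(io.StringIO(semicolon_text))

print(df_default.equals(df_sep))  # True
print(df_sep.equals(df_delim))    # True
print(wrong.shape)                # (2, 1)
```

The last case is a useful symptom to recognize: if df.head() shows a single column whose name contains your separator, the sep value is probably wrong.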
Read a tab-separated file
from pyforest import *

df = pd.read_csv("cereal_tab.csv", sep='\t')
df.head()
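pandas also provides read_table, which behaves like read_csv except that its separator defaults to a tab. The sketch below, using a hypothetical in-memory sample, shows both calls producing the same DataFrame:

```python
import io

import pandas as pd

# Hypothetical tab-separated sample
tab_text = "name\tcalories\tprotein\nCorn Flakes\t100\t2\nCheerios\t110\t6\n"

# read_csv needs the tab spelled out...
df_csv = pd.read_csv(io.StringIO(tab_text), sep="\t")

# ...while read_table uses '\t' as its default separator
df_table = pd.read_table(io.StringIO(tab_text))

print(df_csv.equals(df_table))  # True
print(df_table.shape)           # (2, 3)
```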
Read a semicolon-separated file
from pyforest import *

df = pd.read_csv("cereal_semicolon.csv", sep=';')
df.head()
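When you do not know in advance which separator a file uses, read_csv can try to detect it: passing sep=None together with engine="python" makes pandas guess the delimiter using Python's built-in csv.Sniffer. Sniffing is a heuristic and can guess wrong on unusual files, so passing sep explicitly remains the safer choice when you know it. A minimal sketch on hypothetical samples:

```python
import io

import pandas as pd

# Hypothetical samples: same data, three different separators
samples = {
    "comma": "name,calories\nCorn Flakes,100\nCheerios,110\n",
    "tab": "name\tcalories\nCorn Flakes\t100\nCheerios\t110\n",
    "semicolon": "name;calories\nCorn Flakes;100\nCheerios;110\n",
}

columns_by_sample = {}
for label, text in samples.items():
    # sep=None asks the python engine to sniff the delimiter
    df = pd.read_csv(io.StringIO(text), sep=None, engine="python")
    columns_by_sample[label] = list(df.columns)

print(columns_by_sample)
```

Each sample should come back with the columns name and calories; the C engine does not support sep=None, which is why engine="python" is set.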
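Reading data from SQL databases
This section covers reading data from relational (SQL) databases with pandas: MySQL, PostgreSQL, and SQLite. Each example creates a SQLAlchemy engine with create_engine and passes it to pandas.read_sql.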
from pyforest import *
from sqlalchemy import create_engine

# provide a connection string/URL
# (the mysql+mysqlconnector dialect needs the mysql-connector-python driver installed)
db_connection_str = "mysql+mysqlconnector://mysql_username:mysql_user_password@localhost/mysql_db_name"

# produce an Engine object based on a URL
db_connection = create_engine(db_connection_str)

# read SQL query or database table into a DataFrame
df = pd.read_sql('SELECT * FROM table_name', con=db_connection)

# return the first 5 rows of the dataframe
df.head()
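Writing credentials straight into the URL string breaks as soon as a password contains a character such as @ or /. Assuming SQLAlchemy 1.4 or later, URL.create builds the URL from separate parts and percent-encodes those characters for you. The credentials below are placeholders, not real values:

```python
from sqlalchemy.engine import URL

# Placeholder credentials; the password deliberately contains '@' and '/'
url = URL.create(
    drivername="mysql+mysqlconnector",
    username="mysql_username",
    password="p@ss/word",
    host="localhost",
    database="mysql_db_name",
)

# hide_password=False shows the encoded password ('@' -> %40, '/' -> %2F)
rendered = url.render_as_string(hide_password=False)
print(rendered)

# The URL object can be passed to create_engine() in place of the string:
# db_connection = create_engine(url)
```

Building the URL object does not contact the server or load the driver, so it is a cheap way to check what the connection string will look like.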
Source — Stack Overflow
from pyforest import *
from sqlalchemy import create_engine

# produce an Engine object based on a PostgreSQL database URL
# (no host is given, so this connects to a server on the local machine)
engine = create_engine("postgresql:///psql_dbname")

# read SQL query or database table into a DataFrame;
# "user" is double-quoted because USER is a reserved word in PostgreSQL
df = pd.read_sql('select * from "user"', con=engine)

# return the first 5 rows of the dataframe
df.head()
from pyforest import *
from sqlalchemy import create_engine

# connect to a database
engine = create_engine("sqlite:///database.db")

# read database data into a pandas DataFrame
df = pd.read_sql('select * from user', engine)

# return the first 5 rows of the dataframe
df.head()
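Because SQLite ships with Python, the whole round trip can be tried without a database server. The sketch below swaps the SQLAlchemy engine for a standard-library sqlite3 connection to an in-memory database (pandas.read_sql accepts a sqlite3 connection directly). The user table and its rows are made up, and the filter value is passed through params rather than formatted into the SQL string:

```python
import sqlite3

import pandas as pd

# In-memory database with a small hypothetical "user" table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO user (name, age) VALUES (?, ?)",
    [("Ann", 31), ("Ben", 24), ("Cid", 45)],
)
conn.commit()

# sqlite3 uses the '?' placeholder style; params fills it in safely,
# and index_col turns the id column into the row index
df = pd.read_sql(
    "SELECT id, name, age FROM user WHERE age > ? ORDER BY id",
    conn,
    params=(30,),
    index_col="id",
)

print(df)
conn.close()
```

Using params instead of string formatting keeps user-supplied values from being interpreted as SQL. Note that the placeholder style depends on the driver (sqlite3 uses ?, while other drivers may use %s or named placeholders).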
Reading data from JSON files
Reading data from a JSON file is as simple as reading data from a CSV file. The pandas.read_json function converts JSON into a pandas object with ease. Its first parameter, path_or_buf, must be a valid JSON str, path object, or file-like object. The function also accepts a number of other parameters that control how the JSON is parsed.
from pyforest import *

df = pd.read_json('cereal_default.json')
df.head()
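Two of those other parameters come up often: orient describes how the JSON is laid out, and lines=True reads JSON Lines (one object per line). A minimal sketch on hypothetical in-memory data; the strings are wrapped in io.StringIO because recent pandas versions deprecate passing literal JSON strings to read_json:

```python
import io

import pandas as pd

# A list of records: one JSON object per row (hypothetical data)
records_json = '[{"name": "Cheerios", "calories": 110}, {"name": "Bran Chex", "calories": 90}]'
df_records = pd.read_json(io.StringIO(records_json), orient="records")

# Column-oriented layout: {column: {row_label: value}}
columns_json = '{"name": {"0": "Cheerios", "1": "Bran Chex"}, "calories": {"0": 110, "1": 90}}'
df_columns = pd.read_json(io.StringIO(columns_json), orient="columns")

# JSON Lines: one object per line, read with lines=True
jsonl = '{"name": "Cheerios", "calories": 110}\n{"name": "Bran Chex", "calories": 90}\n'
df_lines = pd.read_json(io.StringIO(jsonl), lines=True)

print(df_records.equals(df_lines))  # True
print(df_columns.shape)             # (2, 2)
```

If read_json raises an error or returns an oddly shaped DataFrame, a mismatch between the file's layout and orient is a likely cause.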
If you enjoyed this article, please leave a comment, like it, share it, and follow me on Twitter @dev_elie.