Python for Data Analysis

Author: Wes McKinney
Subtitle: A Basic Tutorial on Big Data Analysis with Python
Publisher: O'Reilly Media
Publication date: 2013-06
ISBN: 9781549329784
Category: Computing

Description

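This book is mainly about using pandas to tie together SciPy and NumPy; data processing with pandas was a popular topic at PyCon 2012. Another powerful tool is Sage, which integrates many open-source packages behind a unified Python interface.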

Python for Data Analysis is concerned with the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python. It is also a practical, modern introduction to scientific computing in Python, tailored for data-intensive applications. This is a book about the parts of the Python language and libraries you’ll need to effectively solve a broad set of data analysis problems. This book is not an exposition on analytical methods using Python as the implementation language.

Written by Wes McKinney, the main author of the pandas library, this hands-on book is packed with practical case studies. It’s ideal for analysts new to Python and for Python programmers new to scientific computing.

Use the IPython interactive shell as your primary development environment

Learn basic and advanced NumPy (Numerical Python) features

Get started with data analysis tools in the pandas library

Use high-performance tools to load, clean, transform, merge, and reshape data

Create scatter plots and static or interactive visualizations with matplotlib

Apply the pandas groupby facility to slice, dice, and summarize datasets

Measure data by points in time, whether it’s specific instances, fixed periods, or intervals

Learn how to solve problems in web analytics, social sciences, finance, and economics, through detailed examples

About the Author

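Wes McKinney is a veteran data analysis expert with deep knowledge of Python libraries including NumPy, pandas, matplotlib, and IPython, and extensive hands-on experience with them. He has written many widely reposted articles on data analysis in Python and is regarded as one of the authoritative figures in the Python and open-source communities. He developed pandas, the well-known open-source Python library for data analysis. Before founding Lambda Foundry, a company focused on enterprise data analysis, he was a quantitative analyst at AQR Capital Management.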

Table of Contents

Chapter 1 Preliminaries

What Is This Book About?

Why Python for Data Analysis?

Essential Python Libraries

Installation and Setup

Community and Conferences

Navigating This Book

Acknowledgements

Chapter 2 Introductory Examples

1.usa.gov data from bit.ly

MovieLens 1M Data Set

US Baby Names 1880-2010

Conclusions and The Path Ahead

Chapter 3 IPython: An Interactive Computing and Development Environment

IPython Basics

Using the Command History

Interacting with the Operating System

Software Development Tools

IPython HTML Notebook

Tips for Productive Code Development Using IPython

Advanced IPython Features

Credits

Chapter 4 NumPy Basics: Arrays and Vectorized Computation

The NumPy ndarray: A Multidimensional Array Object

Universal Functions: Fast Element-wise Array Functions

Data Processing Using Arrays

File Input and Output with Arrays

Linear Algebra

Random Number Generation

Example: Random Walks

Chapter 5 Getting Started with pandas

Introduction to pandas Data Structures

Essential Functionality

Summarizing and Computing Descriptive Statistics

Handling Missing Data

Hierarchical Indexing

Other pandas Topics

Chapter 6 Data Loading, Storage, and File Formats

Reading and Writing Data in Text Format

Binary Data Formats

Interacting with HTML and Web APIs

Interacting with Databases

Chapter 7 Data Wrangling: Clean, Transform, Merge, Reshape

Combining and Merging Data Sets

Reshaping and Pivoting

Data Transformation

String Manipulation

Example: USDA Food Database

Chapter 8 Plotting and Visualization

A Brief matplotlib API Primer

Plotting Functions in pandas

Plotting Maps: Visualizing Haiti Earthquake Crisis Data

Python Visualization Tool Ecosystem

Chapter 9 Data Aggregation and Group Operations

GroupBy Mechanics

Data Aggregation

Group-wise Operations and Transformations

Pivot Tables and Cross-Tabulation

Example: 2012 Federal Election Commission Database

Chapter 10 Time Series

Date and Time Data Types and Tools

Time Series Basics

Date Ranges, Frequencies, and Shifting

Time Zone Handling

Periods and Period Arithmetic

Resampling and Frequency Conversion

Time Series Plotting

Moving Window Functions

Performance and Memory Usage Notes

Chapter 11 Financial and Economic Data Applications

Data Munging Topics

Group Transforms and Analysis

More Example Applications

Chapter 12 Advanced NumPy

ndarray Object Internals

Advanced Array Manipulation

Broadcasting

Advanced ufunc Usage

Structured and Record Arrays

More About Sorting

NumPy Matrix Class

Advanced Array Input and Output

Performance Tips

Appendix Python Language Essentials

The Python Interpreter

The Basics

Data Structures and Sequences

Functions

Files and the Operating System

Excerpts

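Array slices are views on the original data. This means the data is not copied, and any modification to the view is reflected directly in the source array.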
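
The view behavior described in this excerpt can be checked with a short NumPy sketch. The array contents and the variable names (`arr`, `view`, `independent`) are illustrative assumptions, not taken from the book.

```python
import numpy as np

arr = np.arange(10)      # source array holding 0..9
view = arr[5:8]          # basic slicing returns a view, not a copy
view[:] = 12             # writing through the view changes the source

# Both objects refer to the same underlying buffer.
shares_memory = np.shares_memory(arr, view)

# An explicit copy is independent of the source array.
independent = arr[5:8].copy()
independent[:] = 0       # arr is left untouched by this assignment
```

When an independent array is needed, calling `.copy()` on the slice, as in the last lines, avoids modifying the source by accident.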

records = [json.loads(line) for line in open(path)]
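
The one-liner above assumes that `json` is already imported and that `path` points to a file with one JSON object per line. A self-contained sketch of the same idea follows; the helper name `read_json_lines` and the sample records are hypothetical, and the `with` block is an addition that closes the file once reading finishes.

```python
import json
import os
import tempfile


def read_json_lines(path):
    """Parse a file holding one JSON object per line into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        # Skip blank lines so a trailing newline does not raise an error.
        return [json.loads(line) for line in f if line.strip()]


# Made-up sample data, written to a temporary file for the demonstration.
sample = [
    '{"tz": "America/New_York", "c": "US"}',
    '{"tz": "Europe/Warsaw", "c": "PL"}',
]
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("\n".join(sample) + "\n")
    path = tmp.name

records = read_json_lines(path)
os.remove(path)  # clean up the temporary file
```

Each element of `records` is a plain dict, so individual fields can be read with ordinary key access such as `records[0]["tz"]`.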
