Python Data Science Stack

Comprehensive data analysis and visualization with pandas, numpy, and matplotlib

# Python Data Science

This document provides comprehensive guidelines for python data science development and best practices.

---

## NumPy Fundamentals

1. **N-dimensional**
   - N-dimensional array operations
   - Implement proper n-dimensional array operations
   - Follow best practices for optimal results

2. **Broadcasting**
   - Broadcasting and vectorization
   - Implement proper broadcasting and vectorization
   - Follow best practices for optimal results

3. **Mathematical**
   - Mathematical functions and linear algebra
   - Implement proper mathematical functions and linear algebra
   - Follow best practices for optimal results

4. **Random**
   - Random number generation
   - Implement proper random number generation
   - Follow best practices for optimal results

5. **Array**
   - Array indexing and slicing
   - Implement proper array indexing and slicing
   - Follow best practices for optimal results

---

## Pandas Data Manipulation

6. **DataFrame**
   - DataFrame and Series operations
   - Implement proper dataframe and series operations
   - Follow best practices for optimal results

7. **Data**
   - Data loading from various sources (CSV, JSON, SQL)
   - Implement proper data loading from various sources (csv, json, sql)
   - Follow best practices for optimal results

8. **Data**
   - Data cleaning and preprocessing
   - Implement proper data cleaning and preprocessing
   - Follow best practices for optimal results

9. **Missing**
   - Missing data handling strategies
   - Implement proper missing data handling strategies
   - Follow best practices for optimal results

10. **Data**
   - Data type optimization
   - Implement proper data type optimization
   - Follow best practices for optimal results

---

## Data Exploration

11. **Descriptive**
   - Descriptive statistics and summaries
   - Implement proper descriptive statistics and summaries
   - Follow best practices for optimal results

12. **Data**
   - Data profiling and quality assessment
   - Implement proper data profiling and quality assessment
   - Follow best practices for optimal results

13. **Correlation**
   - Correlation analysis
   - Implement proper correlation analysis
   - Follow best practices for optimal results

14. **Outlier**
   - Outlier detection and treatment
   - Implement proper outlier detection and treatment
   - Follow best practices for optimal results

15. **Exploratory**
   - Exploratory data analysis (EDA)
   - Implement proper exploratory data analysis (eda)
   - Follow best practices for optimal results

---

## Data Transformation

16. **Filtering**
   - Filtering and querying data
   - Implement proper filtering and querying data
   - Follow best practices for optimal results

17. **Groupby**
   - Groupby operations and aggregations
   - Implement proper groupby operations and aggregations
   - Follow best practices for optimal results

18. **Pivot**
   - Pivot tables and crosstabs
   - Implement proper pivot tables and crosstabs
   - Follow best practices for optimal results

19. **Merging**
   - Merging and joining datasets
   - Implement proper merging and joining datasets
   - Follow best practices for optimal results

20. **Reshaping**
   - Reshaping data (melt, stack, unstack)
   - Implement proper reshaping data (melt, stack, unstack)
   - Follow best practices for optimal results

---

## Time Series Analysis

21. **DateTime**
   - DateTime indexing and resampling
   - Implement proper datetime indexing and resampling
   - Follow best practices for optimal results

22. **Time-based**
   - Time-based operations and rolling windows
   - Implement proper time-based operations and rolling windows
   - Follow best practices for optimal results

23. **Seasonal**
   - Seasonal decomposition
   - Implement proper seasonal decomposition
   - Follow best practices for optimal results

24. **Trend**
   - Trend analysis and forecasting
   - Implement proper trend analysis and forecasting
   - Follow best practices for optimal results

25. **Working**
   - Working with different time zones
   - Implement proper working with different time zones
   - Follow best practices for optimal results

---

## Data Visualization

26. **Matplotlib**
   - Matplotlib for basic plotting
   - Implement proper matplotlib for basic plotting
   - Follow best practices for optimal results

27. **Seaborn**
   - Seaborn for statistical visualizations
   - Implement proper seaborn for statistical visualizations
   - Follow best practices for optimal results

28. **Plotly**
   - Plotly for interactive charts
   - Implement proper plotly for interactive charts
   - Follow best practices for optimal results

29. **Best**
   - Best practices for effective visualization
   - Implement proper best practices for effective visualization
   - Follow best practices for optimal results

30. **Dashboard**
   - Dashboard creation with Streamlit
   - Implement proper dashboard creation with streamlit
   - Follow best practices for optimal results

---

## Statistical Analysis

31. **Hypothesis**
   - Hypothesis testing
   - Implement proper hypothesis testing
   - Follow best practices for optimal results

32. **Confidence**
   - Confidence intervals
   - Implement proper confidence intervals
   - Follow best practices for optimal results

33. **Regression**
   - Regression analysis with statsmodels
   - Implement proper regression analysis with statsmodels
   - Follow best practices for optimal results

34. **ANOVA**
   - ANOVA and chi-square tests
   - Implement proper anova and chi-square tests
   - Follow best practices for optimal results

35. **Distribution**
   - Distribution fitting and testing
   - Implement proper distribution fitting and testing
   - Follow best practices for optimal results

---

## Machine Learning Integration

36. **Scikit-learn**
   - Scikit-learn for traditional ML
   - Implement proper scikit-learn for traditional ml
   - Follow best practices for optimal results

37. **Feature**
   - Feature engineering and selection
   - Implement proper feature engineering and selection
   - Follow best practices for optimal results

38. **Model**
   - Model evaluation and cross-validation
   - Implement proper model evaluation and cross-validation
   - Follow best practices for optimal results

39. **Pipeline**
   - Pipeline creation for reproducibility
   - Implement proper pipeline creation for reproducibility
   - Follow best practices for optimal results

40. **Hyperparameter**
   - Hyperparameter tuning
   - Implement proper hyperparameter tuning
   - Follow best practices for optimal results

---

## Big Data Tools

41. **Dask**
   - Dask for parallel computing
   - Implement proper dask for parallel computing
   - Follow best practices for optimal results

42. **Vaex**
   - Vaex for out-of-core processing
   - Implement proper vaex for out-of-core processing
   - Follow best practices for optimal results

43. **Apache**
   - Apache Spark with PySpark
   - Implement proper apache spark with pyspark
   - Follow best practices for optimal results

44. **Memory**
   - Memory optimization techniques
   - Implement proper memory optimization techniques
   - Follow best practices for optimal results

45. **Chunked**
   - Chunked data processing
   - Implement proper chunked data processing
   - Follow best practices for optimal results

---

## Jupyter Notebook Best Practices

46. **Notebook**
   - Notebook organization and structure
   - Implement proper notebook organization and structure
   - Follow best practices for optimal results

47. **Code**
   - Code cell optimization
   - Implement proper code cell optimization
   - Follow best practices for optimal results

48. **Markdown**
   - Markdown documentation
   - Implement proper markdown documentation
   - Follow best practices for optimal results

49. **Version**
   - Version control for notebooks
   - Implement proper version control for notebooks
   - Follow best practices for optimal results

50. **Reproducible**
   - Reproducible research practices
   - Implement proper reproducible research practices
   - Follow best practices for optimal results

---

## Data Pipeline Development

51. **ETL**
   - ETL pipeline creation
   - Implement proper etl pipeline creation
   - Follow best practices for optimal results

52. **Data**
   - Data validation and quality checks
   - Implement proper data validation and quality checks
   - Follow best practices for optimal results

53. **Automated**
   - Automated data processing workflows
   - Implement proper automated data processing workflows
   - Follow best practices for optimal results

54. **Error**
   - Error handling and logging
   - Implement proper error handling and logging
   - Follow best practices for optimal results

55. **Scheduling**
   - Scheduling with Apache Airflow
   - Implement proper scheduling with apache airflow
   - Follow best practices for optimal results

---

## Performance Optimization

56. **Vectorization**
   - Vectorization over loops
   - Implement proper vectorization over loops
   - Follow best practices for optimal results

57. **Memory**
   - Memory usage optimization
   - Implement proper memory usage optimization
   - Follow best practices for optimal results

58. **Parallel**
   - Parallel processing with multiprocessing
   - Implement proper parallel processing with multiprocessing
   - Follow best practices for optimal results

59. **Cython**
   - Cython for performance-critical code
   - Implement proper cython for performance-critical code
   - Follow best practices for optimal results

60. **Profiling**
   - Profiling and bottleneck identification
   - Implement proper profiling and bottleneck identification
   - Follow best practices for optimal results

---

## Database Integration

61. **SQL**
   - SQL queries with pandas
   - Implement proper sql queries with pandas
   - Follow best practices for optimal results

62. **SQLAlchemy**
   - SQLAlchemy for database connections
   - Implement proper sqlalchemy for database connections
   - Follow best practices for optimal results

63. **NoSQL**
   - NoSQL database integration
   - Implement proper nosql database integration
   - Follow best practices for optimal results

64. **Data**
   - Data warehousing concepts
   - Implement proper data warehousing concepts
   - Follow best practices for optimal results

65. **Cloud**
   - Cloud database connectivity
   - Implement proper cloud database connectivity
   - Follow best practices for optimal results

---

## Deployment & Production

66. **API**
   - API development with FastAPI
   - Implement proper api development with fastapi
   - Follow best practices for optimal results

67. **Containerization**
   - Containerization for reproducibility
   - Implement proper containerization for reproducibility
   - Follow best practices for optimal results

68. **Cloud**
   - Cloud deployment strategies
   - Implement proper cloud deployment strategies
   - Follow best practices for optimal results

69. **Monitoring**
   - Monitoring data pipelines
   - Implement proper monitoring data pipelines
   - Follow best practices for optimal results

70. **A/B**
   - A/B testing frameworks
   - Implement proper a/b testing frameworks
   - Follow best practices for optimal results

---

## Summary Checklist

- [ ] Core principles implemented
- [ ] Best practices followed
- [ ] Performance optimized
- [ ] Security measures in place
- [ ] Testing strategy implemented
- [ ] Documentation completed
- [ ] Monitoring configured
- [ ] Production deployment ready

---

Follow these comprehensive guidelines for successful python data science implementation.
Python Data Science Stack - Cursor IDE AI Rule