- Course Overview
- Course Outline
This course gives learners an in-depth introduction to data science: what data is, how it affects businesses, how it is handled in different ways, and how it can be used for prediction.
This course suits learners who are new to data science and do not yet know what it is. It also describes the purpose of data science, how it is used, and its advantages.
Chapter 1) Introduction to Data Science
Topics Covered:
- Data Science
- Data Scientist
- Roles of Data Scientist
- Learning the application domain
- Communicating with data users
- Seeing the big picture of a complex system
- Knowing how data can be represented
- Data transformation and analysis
- Visualization and presentation
- Attention to quality
- Ethical reasoning
- What is Data?
- History of Data
- Types of Data
- Structured data
- Semi-Structured data
- Unstructured data
- Difference between Structured, Unstructured and Semi-structured data
- Identifying Data Problems
- Approach
Chapter 2) Introduction to Big Data
Topics Covered:
- Big Data
- What Does Big Data Look Like?
- Three V’s
- Volume
- Velocity
- Variety
- Appropriate Data
- Why Big Data
- Gather Data
- Setting The Goal
- Growth of Big Data Sources
- Transportation, logistics, retail, utilities, and telecommunications
- Health care
- Government
- Entertainment media
- Life sciences
- Video surveillance
- Deep Dive into Big Data Sources
- Financial transactions
- Smart instrumentation
- Mobile telephony
- Importance of Public Information
- Focus Points
Chapter 3) Introduction to Apache Kafka
Topics Covered:
- Introduction
- Why Kafka
- How Organizations Handle Data Flow: A Mess
- Apache Kafka: A Distributed System
- Kafka Origin
- Why Kafka was developed
- Decoupling Producers and Consumers
- Basics
- Broker Replication
- Producer Basics
- Consumer Basics
- Distributed Consumption
- Topic
- Topic, Partition and Segments
- Logs
- Applications of Kafka
- Application 1: Royal Bank of Canada (RBC)
- Application 2: Twitter
- Application 3: LinkedIn
- Application 4: Netflix
- Advantages
- Disadvantages
- Kafka Clients
Chapter 4) Introduction to Distributed Data Processing
Topics Covered:
- Processing approaches
- Distributed Data Processing
- Why is DDP Increasing?
- DDP today
- Benefits of DDP
- Drawbacks of DDP
- Reasons for DDP
- Client/Server Architecture (C/S)
- Intranets
- Extranets
- Distributed applications
- Other forms of DDP
- Database types for distributed data
- Networking Implications
- Availability
- Performance
- Trends in Distributed Systems and computing
- The Modern Internet
- Pervasive Networking and The Modern Internet
- Mobile and Ubiquitous Computing
- Example
- Health Care Systems (HCS)
- Issues of HCS
- Distributed Multimedia Systems
- Demands of Distributed Multimedia Systems
- Distributed Computing As Utility
- Enablers and Advantages
- The precursor to Cloud, Grid
- Open Challenges in Distributed Computing
Chapter 5) Introduction to Machine Learning
Topics Covered:
- Overview of Machine Learning
- Machine Learning
- Machine Learning Process Lifecycle
- Traditional Machine Learning
- Learning Dimensions
- Supervised Learning
- Supervised learning problems
- Unsupervised Learning
- Unsupervised learning problems
- Reinforcement Learning
- Supervised Learning
- Machine Learning Extended
- Classification Task
- Supervised learning classifier
- Unsupervised learning hierarchy
- Unsupervised learning classifier
- Dimensionality Reduction
- Dimensionality Reduction Algorithms
- Ensemble Methods
- Ensemble Methods Algorithms
- Instance-Based Learning Algorithms
- Machine Learning Tools and Frameworks
- Machine Learning Applications