Saturday, March 5, 2016

Hook - Book Notes

Hook - Notes
  • Trigger -
    • Internal - Emotional
    • External  - Social
  • Action - Behaviour Driven 
    • Motivation - Energy for Action 
    • Ability - Ease of performing the action
  • Reward - Variable Kicks
    • Tribe - Social Gratification
    • Hunt - Game Playing
    • Self - Personal Gratification (mastery, completion)
  • Investment - For next Cycle
    • Next Trigger - Urge to get excitement
    • Store Value - Reputation, Followers

Sunday, February 21, 2016

Digital Architecture

Business Centric Principles: 
  • Driven by Strategy
    • Digital Fitness Assessment 
    • Digital Generation Design 
    • Digital Strategy and Planning
  • Using Technology
    • Digital Asset Management 
    • Digitized Enterprise Operations 
    • Internet of things 
    • Digital Architecture 
  • Experiencing through Designing (redesign) 
    • Service Design 
    • Interaction Design 
    • User Centered Design Thinking 
Customer Centric Principles
  • Interaction
  • Intelligence
  • Integration 
IT Centric Principles
  • Flexible Systems and Services  
    • Reactive - Asynchronous Systems
    • Modularization - API and Microservices Architecture
    • Cloud Elasticity
  • Velocity and Insightful Data 
    • Event Driven - Lambda Architecture (Nginx, Spark, Kafka, NoSQL) 
    • Predictive Modelling - Machine Learning and Deep Learning Systems
  • Agile Engineering Process 
    • Platform Automation - DevOps 
    • Agile Teams  

Monday, December 28, 2015

ML Algorithms - Cheat Sheets

Mind-map of Algorithms:

    1. Regression Algorithms

    • Ordinary Least Squares Regression (OLSR)
    • Linear Regression
    • Logistic Regression
    • Stepwise Regression
    • Multivariate Adaptive Regression Splines (MARS)
    • Locally Estimated Scatterplot Smoothing (LOESS)

    2. Instance-based Algorithms

    • k-Nearest Neighbour (kNN)
    • Learning Vector Quantization (LVQ)
    • Self-Organizing Map (SOM)
    • Locally Weighted Learning (LWL)

    3. Regularization Algorithms

    • Ridge Regression
    • Least Absolute Shrinkage and Selection Operator (LASSO)
    • Elastic Net
    • Least-Angle Regression (LARS)

    4. Decision Tree Algorithms

    • Classification and Regression Tree (CART)
    • Iterative Dichotomiser 3 (ID3)
    • C4.5 and C5.0 (different versions of a powerful approach)
    • Chi-squared Automatic Interaction Detection (CHAID)
    • Decision Stump
    • M5
    • Conditional Decision Trees

    5. Bayesian Algorithms

    • Naive Bayes
    • Gaussian Naive Bayes
    • Multinomial Naive Bayes
    • Averaged One-Dependence Estimators (AODE)
    • Bayesian Belief Network (BBN)
    • Bayesian Network (BN)

    6. Clustering Algorithms

    • k-Means
    • k-Medians
    • Expectation Maximisation (EM)
    • Hierarchical Clustering

    7. Association Rule Learning Algorithms

    • Apriori algorithm
    • Eclat algorithm

    8. Artificial Neural Network Algorithms

    • Perceptron
    • Back-Propagation
    • Hopfield Network
    • Radial Basis Function Network (RBFN)

    9. Deep Learning Algorithms

    • Deep Boltzmann Machine (DBM)
    • Deep Belief Networks (DBN)
    • Convolutional Neural Network (CNN)
    • Stacked Auto-Encoders

    10. Dimensionality Reduction Algorithms

    • Principal Component Analysis (PCA)
    • Principal Component Regression (PCR)
    • Partial Least Squares Regression (PLSR)
    • Sammon Mapping
    • Multidimensional Scaling (MDS)
    • Projection Pursuit
    • Linear Discriminant Analysis (LDA)
    • Mixture Discriminant Analysis (MDA)
    • Quadratic Discriminant Analysis (QDA)
    • Flexible Discriminant Analysis (FDA)

    11. Ensemble Algorithms

    • Boosting
    • Bootstrapped Aggregation (Bagging)
    • AdaBoost
    • Stacked Generalization (blending)
    • Gradient Boosting Machines (GBM)
    • Gradient Boosted Regression Trees (GBRT)
    • Random Forest

    12. Other Algorithms

    • Computational intelligence (evolutionary algorithms, etc.)
    • Computer Vision (CV)
    • Natural Language Processing (NLP)
    • Recommender Systems
    • Reinforcement Learning
    • Graphical Models 
    http://antontarasenko.com/2015/12/28/machine-learning-for-economists-an-introduction/

    Cheat Sheet of ML Algorithm:


    http://eferm.com/machine-learning-cheat-sheet/
    http://eferm.com/wp-content/uploads/2011/05/cheat3.pdf 

    Azure ML Cheat Sheet: 

     Machine Learning Algorithm cheat sheet: Learn how to choose a Machine Learning algorithm.
    • https://azure.microsoft.com/en-us/documentation/articles/machine-learning-algorithm-cheat-sheet/
    • https://azure.microsoft.com/en-in/documentation/articles/machine-learning-algorithm-choice/ 

    Other ML Algorithm Collections:

    • http://scikit-learn.org/stable/tutorial/machine_learning_map/
    • https://dzone.com/refcardz/machine-learning-predictive
    • http://www.lauradhamilton.com/machine-learning-algorithm-cheat-sheet
    • http://www.datasciencecentral.com/m/blogpost?id=6448529%3ABlogPost%3A434612 

Friday, December 11, 2015

ML - Course Notes

    2 - Linear Regression:
    • Problem 
      • Given a data set plotted on a graph (x, y) 
      • Find a hypothesis that represents the data's behaviour
    • Hypothesis 
      • f(x) : represents the data's behaviour
      • f(x) = a + b Xi 
        • a and b control the prediction
        • They are the intercept and slope of the line that fits the data
    • Cost Function 
      • Minimize the Cost Function
        • J(T) = 1/2m * { Sum (Yi - f(Xi)) ^2 } : m is the total number of data points
        • J(T) = 1/2m * { Sum (Yi - (a + b Xi)) ^2 }
    • Gradient Descent (a runnable sketch follows the Terminology list below)
      • Mechanism to find the values of a and b where the cost function is lowest 
      • Incrementally minimizes the multivariate cost function 
      • Can converge to a global or a local minimum 
      • Repeat until convergence
        • Pj := Pj - Alpha * d/dPj (J(P0, P1))
        • For linear regression this works out to: Pj := Pj - Alpha * 1/m * Sum (f(Xi) - Yi) * Xij
        • Pj : the values of a, b  
        • Alpha : step size; with a fixed Alpha the steps shrink naturally as the gradient approaches zero
        • Update P0 and P1 simultaneously 
    • Formula 
      • Hypothesis
        • f(x) = T0 + T1 X1 + T2 X2 + ... 
      • Cost Function 
        • J(T) = 1/2m * { Sum (1-m) (f(x) - Y) ^2 }
          • Minimize the Cost Function
      • Gradient Descent: 
        • Tj := Tj - Alpha * d/dTj (J(T)) 
          • Repeat until convergence
          • Compute all Tj updates before assigning them (simultaneous update) 
          • Iterate until the update step is effectively zero 
    • Terminology 
      • X - Feature Set (No of rooms) 
      • Y - Output Class (Price of House)
      • T - Parameters over Feature Set  
      • f(x) - Hypothesis 
      • J(T) - Cost function 
      • Alpha - Gradient descent step size (learning rate)
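
    A minimal NumPy sketch of the update above. Variable names a, b and alpha follow these notes; the data set is made up for illustration:

        import numpy as np

        # Toy data set: x = number of rooms, y = price (illustrative values only)
        x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
        y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

        a, b = 0.0, 0.0   # parameters (P0, P1)
        alpha = 0.01      # learning rate (step size)
        m = len(x)

        for _ in range(5000):            # repeat until convergence
            f = a + b * x                # hypothesis f(x) = a + b*x
            # partial derivatives of J = 1/(2m) * sum((f(x) - y)^2)
            grad_a = (f - y).sum() / m
            grad_b = ((f - y) * x).sum() / m
            a, b = a - alpha * grad_a, b - alpha * grad_b   # simultaneous update

        print(a, b)   # b approaches ~2, the slope of the toy data
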
    6 - Logistic Regression 
    • Definition 
      • A classification algorithm, despite the "regression" name 
      • Gives the probability of a value belonging to a class, bounded between 0 and 1 
      • Draws a decision boundary to classify a set of data of any shape: line, circle, oval or any asymmetric shape 
    • Problem 
      • A data set with a classification for each data point 
    • Formula (sketch after this section)
      • Hypothesis
        • f(x) = 1 / (1 + e^-(T' x)) 
      • Cost Function 
        • Cost = -log (f(x)) for y = 1; Cost = -log (1 - f(x)) for y = 0
        • J(T) = -1/m * Sum (1-m) { Y log (f(x)) + (1 - Y) log (1 - f(x)) }  
          • Minimize the Cost
      • Gradient Descent: 
        • T := T - Alpha * d/dT (J(T)) 
          • Repeat until convergence
          • Compute all updates before assigning them 
          • Iterate until the update step is effectively zero
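
    A sketch of the sigmoid hypothesis and cross-entropy cost above, on a toy, made-up data set (the first column of X is the bias term):

        import numpy as np

        def f(X, T):
            # hypothesis: f(x) = 1 / (1 + e^-(T'x))
            return 1.0 / (1.0 + np.exp(-X @ T))

        def cost(X, y, T):
            # J(T) = -1/m * sum( y*log(f(x)) + (1-y)*log(1-f(x)) )
            h = f(X, T)
            return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

        X = np.array([[1, 0.5], [1, 1.5], [1, 3.0], [1, 4.0]])   # bias + one feature
        y = np.array([0, 0, 1, 1])
        T = np.zeros(2)

        for _ in range(5000):                          # repeat until convergence
            T -= 0.1 * X.T @ (f(X, T) - y) / len(y)    # gradient descent step

        print(cost(X, y, T), f(X, T).round(2))         # cost shrinks; outputs stay in (0, 1)
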
    7 - Regularization:
    • Types of Irregularity 
      • Under-fitting / High Bias: the learned hypothesis is too general and does not fit the training set well
      • Over-fitting / High Variance: the learned hypothesis fits the training set very well but fails to generalize to new data points.  
    • Mechanics:
      • Reduce the number of features 
        • Manually reduce the number of features
        • Model selection algorithm 
      • Regularize 
        • Keep all features but minimize the magnitude of their parameters 
        • Works well when there are many features, each contributing a little to y 
      • Regularization Factor 
        • Increase Lambda to decrease the influence of higher-order features / polynomials
        • Decrease Lambda to increase their influence
    • Formula (sketch after this section)
      • Cost Function: 
        • Linear: J(T) = 1/2m * { Sum (1-m) (f(x) - Y) ^2 + Lambda * Sum (1-n) Tj^2 }
        • Logistic: J(T) = -1/m * Sum (1-m) { Y log (f(x)) + (1 - Y) log (1 - f(x)) } + Lambda/2m * Sum (1-n) Tj^2  
          • Minimize the cost function
      • Gradient Descent: 
        • Tj := Tj * (1 - Alpha * Lambda/m) - Alpha * 1/m * Sum (1-m) (f(x) - Y) Xj
          • Repeat until convergence 
    • Terminology 
      • Lambda - Regularization Factor 
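
    A sketch adding the Lambda penalty to the linear-regression cost and gradient above (the bias term T0 is conventionally not regularized; data and Lambda value are illustrative):

        import numpy as np

        def cost(X, y, T, lam):
            # J(T) = 1/(2m) * [ sum((X@T - y)^2) + lam * sum(T[1:]^2) ]
            m = len(y)
            err = X @ T - y
            return (err @ err + lam * (T[1:] @ T[1:])) / (2 * m)

        def step(X, y, T, alpha, lam):
            m = len(y)
            grad = X.T @ (X @ T - y) / m
            grad[1:] += (lam / m) * T[1:]   # shrink every parameter except the bias
            return T - alpha * grad

        X = np.column_stack([np.ones(5), np.arange(5.0)])   # bias + one feature
        y = np.array([0.1, 2.2, 3.9, 6.1, 8.0])
        T = np.zeros(2)
        for _ in range(5000):
            T = step(X, y, T, alpha=0.05, lam=1.0)
        print(T, cost(X, y, T, lam=1.0))
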
    8 - Neural Networks: 
    Representation 

    • Non-Linear Classification: 
      • Feature sets can be very large - e.g. when every pixel of an image is a feature, how do we compute? 
      • Neuron Model: processing unit with logic 
      • Uses different activations/weights to build combinations of features of varying degree
    • Mechanics:
      • Each algorithm is designed for a certain purpose
      • Objective - to get the right combination of values and their influence (Theta) 
      • Mechanism : define feature functions
        • Evaluate (function values)
        • Analyze (lowest cost function)
        • Decide (optimize via gradient descent)
    • Neural Network for CV :
      • NN Algorithm Objective
        • To identify a certain type of objects in an Image – Coke Bottle based on one feature Color Intensity
      • Approach 
        • Input Images with Outputs of their Class (X,Y) 
        • Select a Feature > Get its Feature Vector for each Pixel (X)
          • Intermediate levels - break the function down into finer aspects 
        • Develop a function that computes the output (Y) from the feature vector and parameters
      • Input Data Set (X)
        • X – Feature Vector – Feature Value in Vector for Each Pixel
          • Image with Coke bottles and Non-Coke bottles
          • Feature Vector : Colors, Dimension, Position, Sizes - (Relative & absolute) ,
          • Here it is done for one Feature Vector; Same to be done for other Features Set
      • Output Data Set (Y)
        • Y – No of Output Classes
          • Binary Class (Coke bottles - Yes / No)
          • Multi Class (Bottles - Coke, Pepsi, ..)
      • Algorithm
        • Functions
          • f(X) = Theta Transpose X
          • Z2 = g (f(x)) for previous level
          • Each level has Activation Factor Theta to control participation
        • At each level, the function calculates the feature-value contribution of the previous level in order to reduce the error (improve the accuracy) of the function
          • Hidden units break the function down further to become finer-grained on the feature
        • Level Propagation
          • L1 – Works with Bigger size of image with identifying lines, Shades, etc.
          • L2 – Breaking the previously fed image, work with smaller part of it to identify finer features
      • Strategy
        • Collapse & Construct:
          • Collapse the whole image at each level to find finer features
          • Reconstruct the image using identified features
        • Deep Learning
          • For deeper features (pixel-level feature deduction), deeper algorithms are designed 
    • Terminology
      • Activation Function – function at each hidden unit, with a factor controlling each node's participation in the layer
      • Error Calculation – output value of the function – Y
    Learning: 
    • Formula 
      • Hypothesis
        • f(x) = T1 X1 + T2 X2 + T3 X3
      • Cost Function 
        • J(T) = -1/m * Sum (1-m) Sum (1-K) { Y log (f(x)) + (1 - Y) log (1 - f(x)) } + Lambda/2m * Sum T^2
          • Minimize the Cost
      • Gradient Descent: 
        • T := T - Alpha * d/dT (J(T)) 
          • Compute all updates before assigning them 
          • Iterate until the update step is effectively zero 
    • Algorithms (forward-pass sketch below)
      • Propagation - evaluates the cost function and gradients along each path taken
      • Forward Propagation – calculating function values forward, from left to right
      • Backward Propagation – calculating the error function backward, from right to left
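
    A minimal forward-propagation sketch matching "Z2 = g(f(x))" above, with random weights and a sigmoid activation (shapes and data are made up):

        import numpy as np

        def g(z):
            # sigmoid activation function
            return 1.0 / (1.0 + np.exp(-z))

        rng = np.random.default_rng(0)
        X = rng.random((4, 3))                 # 4 examples, 3 input features
        y = np.array([0.0, 1.0, 1.0, 0.0])

        Theta1 = rng.standard_normal((3, 5))   # input layer (L1) -> hidden layer (5 units)
        Theta2 = rng.standard_normal((5, 1))   # hidden layer -> output layer

        # Forward propagation: calculate function values left to right
        a1 = X
        a2 = g(a1 @ Theta1)                    # hidden activations: Z2 = g(f(x))
        a3 = g(a2 @ Theta2).ravel()            # output layer

        error = a3 - y                         # error = output value of function - Y
        print(error)                           # back-propagation would push this right to left
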
    10 - Tuning Algorithm: 
    • Measures (what each remedy fixes)
      • More training data 
        • Fixes over-fitting 
      • Decreasing the feature set 
        • Fixes over-fitting  
      • Increasing the feature set 
        • Fixes under-fitting 
      • Decreasing polynomial degree 
        • Fixes over-fitting 
      • Increasing polynomial degree  
        • Fixes under-fitting 
      • Decreasing Lambda (regularization) 
        • Fixes under-fitting 
      • Increasing Lambda (regularization) 
        • Fixes over-fitting 
    • Learning Algorithm Type 
      • Over-fitting (High Variance)
      • Under-fitting (High Bias)
    • Approach for Training Data Set Division
      • Training and Test Data : 70/30
      • Training, Cross Validation and Test Data: 60/20/20  
    • Algorithm
      • Calculate Cost Functions 
      • Drawing Error Function - Cost function (Error) vs 
        • Polynomial (x + x^2 + x^3) - Increasing / Decreasing 
        • Training data Set - Increasing / Decreasing 
        • Lambda - Increasing / Decreasing 
      • Learning Curve 
        • Error vs Training data set 
    • Neural network 
      • Smaller network - prone to under-fitting - computationally cheap
      • Deeper network - prone to over-fitting - computationally expensive 
    11 - System Design: 
    • Approach 
      • Define Feature Set 
      • Create Feature Vector 
      • Train with Data Set  
    • Error Metrics
      • Precision = True Positive / Predicted Positive  
      • Recall = True Positive / Actual Positive  
      • Threshold for qualifying as a positive result (sketch after this section)
      • To predict with high confidence > Raise the threshold > High Precision and Low Recall 
      • To predict with high coverage > Lower the threshold > Low Precision and High Recall 
    • Amount Of Data 
      • Using many parameters > Jtrain is small > Low Bias
      • Using very large data > Jtrain ≈ Jtest > Low Variance
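
    A small sketch of the precision/recall trade-off above: as the qualifying threshold rises, precision goes up and recall goes down (scores and labels are made up):

        import numpy as np

        def precision_recall(scores, y_true, threshold):
            pred = scores >= threshold                   # qualify as positive
            tp = np.sum(pred & (y_true == 1))            # true positives
            precision = tp / max(pred.sum(), 1)          # TP / predicted positive
            recall = tp / max((y_true == 1).sum(), 1)    # TP / actual positive
            return precision, recall

        scores = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.95])   # classifier scores
        y_true = np.array([0, 0, 1, 1, 0, 1])

        for t in (0.3, 0.5, 0.9):
            print(t, precision_recall(scores, y_true, t))
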
    12 - Support Vector Machine 
    • Find the hyperplane with the maximum margin from the closest data points (the support vectors)

    13 - Unsupervised Learning: 
    • Definition 
      • Ability of a function to find clear pattern based on distance and density   
      • Clustering Algorithm Approach 
    • K-Mean Algorithm 
      • Optimization Objective 
        • Minimize the mean squared distance of assigned data points from their centroid
      • Algorithm (runnable sketch below)
        • Select K random centroids in the space 
        • Assign each point X to its closest centroid 
        • Recompute each centroid as the mean of the points assigned to it 
        • The result becomes the new centroid
        • Repeat until assignments stop changing
      • Terms 
        • K - number of randomly selected cluster centroid points in space 
        • Ci - index of the centroid closest to Xi
        • Mu k - mean of the points assigned to cluster K 
      • Cost Function 
        • J = 1/m * Sum (1-m) || Xi - Mu(Ci) ||^2  
      • Selecting the K Value 
        • Elbow Method - plot K vs the cost function
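
    A runnable k-means sketch following the steps above (toy 2-D data; k and the seed are arbitrary):

        import numpy as np

        def kmeans(X, k, iters=100, seed=0):
            rng = np.random.default_rng(seed)
            mu = X[rng.choice(len(X), k, replace=False)]      # K random centroids
            for _ in range(iters):
                # assign each point to its closest centroid
                c = np.argmin(((X[:, None] - mu[None]) ** 2).sum(-1), axis=1)
                # move each centroid to the mean of its assigned points
                mu = np.array([X[c == j].mean(axis=0) if np.any(c == j) else mu[j]
                               for j in range(k)])
            return c, mu

        rng = np.random.default_rng(1)
        X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
        labels, centroids = kmeans(X, k=2)
        print(centroids)   # one centroid near (0, 0), the other near (5, 5)
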
    14 - Dimensionality Reduction:
    • Definition 
      • To reduce the feature set (the number of dimensions, not the number of examples)
    • Approach  
      • Project the data onto fewer dimensions, choosing directions that retain the most variance in the projected data 
      • Continue until the objective is met
    • Algorithm - PCA (Principal Component Analysis) (numpy sketch below) 
    • Note 
      • PCA minimizes the perpendicular distance to the projection direction, vs Linear Regression, which minimizes the vertical distance along the output axis
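
    A minimal PCA sketch via SVD in NumPy (mean-normalize, then project onto the top-k directions; the data is random for illustration):

        import numpy as np

        def pca(X, k):
            Xc = X - X.mean(axis=0)                # feature scaling: mean normalization
            U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
            return Xc @ Vt[:k].T                   # project onto top-k principal directions

        X = np.random.randn(100, 5)
        Z = pca(X, k=2)
        print(Z.shape)   # (100, 2): feature set reduced from 5 dimensions to 2
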
    15 - Anomaly Detection:  
    • Definition 
      • To find outliers in a data set
      • Use Case: 
        • Malfunctioning of Critical Systems (Aircraft) 
        • Fraud Detection in Public Systems (banks, etc.) 
        • Error Condition Detection (Computer Hardware, Data center) 
    • Algorithm 
      • Choose a Feature Set which could be indicative of Anomaly 
      • Calculate the Gaussian function - captures the central tendency 
        • Mean = 1/m * Sum (1-m) Xi 
        • Variance = 1/m * Sum (1-m) (Xi - Mean) ^ 2
        • Standard Deviation = sqrt(Variance) 
        • Gaussian density GF(X) = Prod (1-n) P(Xj; Mean, Variance) 
      • GF(X) < epsilon (threshold) > flag as an anomaly (sketch below) 
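
    A sketch of the Gaussian anomaly detector above (the epsilon value is an assumption; in practice it is chosen on a labelled validation set):

        import numpy as np

        def fit_gaussian(X):
            mean = X.mean(axis=0)                  # 1/m * sum(Xi)
            var = ((X - mean) ** 2).mean(axis=0)   # 1/m * sum((Xi - mean)^2)
            return mean, var

        def density(x, mean, var):
            # product over features of the univariate Gaussian p(x; mean, var)
            p = np.exp(-((x - mean) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)
            return np.prod(p)

        X = np.random.randn(1000, 2)               # "normal" operating data
        mean, var = fit_gaussian(X)
        eps = 1e-4                                 # threshold (illustrative)
        print(density(np.array([0.1, -0.2]), mean, var) < eps)   # False: normal point
        print(density(np.array([6.0, 6.0]), mean, var) < eps)    # True: anomaly
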

    Appendix:

    NLP vs NN CV:
    • ML Algorithm Objective
      • Same objective as with CV, but for NLP these variations can be defined
      • Uses an entropy-based decision tree algorithm
    • Input
      • Labeled Data
    • Feature Functions for NLP
      • Semantic position of different words within a sentence
      • A sentiment algorithm defines negative word positions with reference to other constructs of the sentence 
    Naive Bayes Algorithm:
    • Classifier Algorithm - 
      • Fast Convergence 
      • Independent Feature Simulation
      • Document Classification
    • Based on the Posterior Probability formulation
      • Built from prior probabilities 
      • Ability to classify new data as belonging to a class 
    • Process
      • Calculate Frequency of Each Term in Document 
      • Calculate Probability of Each Term of being in a Class 
      • Calculate Naive Bayes Equation for Each Class 
    • Applied to NLP (sklearn sketch after the references)
      • P (Class | Words in Document) = P (Words | Class) * P (Class) / P (Words)
    • Ref 
      • http://www.analyticsvidhya.com/blog/2015/09/naive-bayes-explained/
      • http://stackoverflow.com/questions/10059594/a-simple-explanation-of-naive-bayes-classification
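
    A scikit-learn sketch of the process above for document classification (the toy corpus and labels are made up):

        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.naive_bayes import MultinomialNB

        docs = ["free offer win money", "meeting agenda notes",
                "win a free prize now", "project status meeting"]
        labels = ["spam", "ham", "spam", "ham"]

        vec = CountVectorizer()
        X = vec.fit_transform(docs)   # step 1: frequency of each term per document
        clf = MultinomialNB()
        clf.fit(X, labels)            # steps 2-3: per-class term probabilities and class priors

        print(clf.predict(vec.transform(["free money now"])))   # -> ['spam']
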
    Decision Tree:
    • Entropy 
      • In a decision tree, the significance of a feature is evaluated via a probability calculation at that feature's node.
      • Used to build the decision logic at each node of the tree 
    • Information Gain: 
      • The reduction in entropy from splitting on a feature; lets the tree decide which feature best leads toward a class 
    • Random Forest 
      • Randomly divides the dataset across multiple trees and then ensembles them. It uses the mode of the classes (classification) or the mean prediction (regression) of the individual trees.
      • http://www.slideshare.net/DerekKane/data-science-v-decision-tree-random-forests  
    Ensemble: 
    • Boosting vs Bagging - Ensemble (sklearn sketch below)
      • Both are ensemble techniques that combine weak learners into a stronger combined learner, reducing variance.
      • Bagging - bootstrap the data set (different combinations, sampled with replacement) to train multiple models, then use voting to select the output. 
      • Boosting - use the whole/original dataset for every model, boost the weight of examples the weaker models get wrong, and take a weighted average. 
      • Stacking - (like boosting) apply another model to learn the weights of each model  
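
    A scikit-learn sketch comparing the techniques above on a synthetic data set (model and data parameters are arbitrary):

        from sklearn.datasets import make_classification
        from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                                      RandomForestClassifier)
        from sklearn.model_selection import cross_val_score
        from sklearn.tree import DecisionTreeClassifier

        X, y = make_classification(n_samples=500, random_state=0)

        models = {
            # Bagging: bootstrapped samples + voting, reduces variance
            "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=50),
            # Boosting: reweights hard examples, combines weak learners
            "boosting": AdaBoostClassifier(n_estimators=50),
            "random forest": RandomForestClassifier(n_estimators=50),
        }
        for name, model in models.items():
            print(name, cross_val_score(model, X, y, cv=5).mean())
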
    Terminology: 
    • Terminology 
      • Decision Factor / Attribute 
      • Feature Set - set of all the features (X with subscript index) 
      • Feature Vector - all values of a feature in a vector (X with superscript index)
      • Feature Scaling - mean normalization - normalize values around the mean of the data set
    • Gaussian Model 
      • Weight Initialization in Neural Network -

Monday, November 30, 2015

Data Pipeline Processing

    Big Data - Tech Stack:
    • Analytical Tool 
      • R Tool 
    • Machine Learning 
      • Apache Mahout 
    • Query System 
      • Hive 
      • Pig
    • Cache System  
      • Memcache 
      • Redis 
    • Data Serialize
      • Apache Avro 
    • Processing System (Batch)
      • Apache Hadoop
        • Map Reduce / HDFS
    • Processing (RT)
      • Apache Spark 
      • Apache Storm
    • Message System 
      • Apache Kafka 
    • NOSQL 
      • Apache Cassandra
      • Apache HBase 
    Storm:
    • Nugget - Distributed Reliable Real Time Data Processing System
    • Meant for real-time streaming data vs the batch processing of Hadoop
    • Difference from Hadoop - tasks run continuously vs tasks that run to completion
    • Reads data from messaging queues
    • Fail-over - on failed execution, it restarts the task on another node
    • Reliability - based on the Spout's ability to replay a tuple to the Bolts 
    • Tuple - the unit of data in an unbounded stream, with a schema
    • Spout - consumes a data stream from an external source and emits tuples
    • Bolt - processing unit that consumes tuples and emits new streams
    • Topology - the graph of Spouts and Bolts; runs as Worker > Executor > Task
    • A Worker can execute tasks of both Bolts and Spouts.
    • Parallelism is defined by the number of executors running for each Bolt and Spout. 
    Kafka:
    • Nugget - Distributed Reliable Scalable Messaging System
    • Producer - publishes messages to topics
    • Consumer - subscribes to topics and reads messages
    • Topics - channels through which producers and consumers exchange messages in pub/sub fashion.
    • Partition - queue management under a topic. Each topic has at least one partition.
    • Broker - group of servers serving messaging through topics. Each partition has a leader and followers to provide resiliency. (sketch below)
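
    A minimal producer/consumer sketch with the kafka-python package (the broker address and topic name are assumptions for illustration):

        from kafka import KafkaConsumer, KafkaProducer

        # Producer publishes messages to a topic
        producer = KafkaProducer(bootstrap_servers="localhost:9092")
        producer.send("clickstream", b"user42:/home")   # topic name is illustrative
        producer.flush()

        # Consumer subscribes to the same topic in pub/sub fashion
        consumer = KafkaConsumer("clickstream",
                                 bootstrap_servers="localhost:9092",
                                 auto_offset_reset="earliest")
        for msg in consumer:
            print(msg.partition, msg.offset, msg.value)   # partition + offset within it
            break
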
    Rhadoop:
    • R keeps all its data in RAM, which restricts its scalability.
    • RHadoop 
      • An offering from Revolution Analytics that adds scalability to R program processing.
    • Has 3 components
      • rmr2 - Map/Reduce > Hadoop Streaming > R functions
      • rhbase - HBase Thrift gateway > HBase
      • rhdfs - HDFS
        • Allows R to access HDFS and move R data frames to HDFS
    • Hadoop Streaming 
      • A project that makes it possible to write MapReduce jobs in any programming language (word-count sketch below). 
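
    A classic word-count sketch for Hadoop Streaming: two stand-alone Python scripts wired together through stdin/stdout (paths and file names are illustrative):

        # mapper.py - emit (word, 1) for each word read from stdin
        import sys
        for line in sys.stdin:
            for word in line.split():
                print(word + "\t1")

        # reducer.py - Hadoop sorts by key, so equal words arrive together
        import sys
        from itertools import groupby
        pairs = (line.rstrip("\n").split("\t") for line in sys.stdin)
        for word, group in groupby(pairs, key=lambda p: p[0]):
            print(word + "\t" + str(sum(int(n) for _, n in group)))

    Invoked along the lines of: hadoop jar hadoop-streaming.jar -input in/ -output out/ -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py
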

Solution Architecture - Framework & Structure

Solution Architecture : Framework 

    Business:
    • Inputs 
      • Requirements  - Problem (VOC) / Future Vision 
    • Structure 
      • Business 
        • Opportunity - Profit, Market share, Position  
        • Pain Point - 
      • KPI 
        • Marketing - Customer 
        • Financial - Bottom line / Top Line 
        • Operations - Engineering / Operations  
    • Approach 
      • Articulation - Current System, Pain Point, Vision, Solution, Benefits (Value Prop ) 
      • Approach - What is; What If, What Wow, What Works (Design thinking)  
    • Output 
      • Business Architecture
    Domain:
    • Input 
      • Requirement - Business 
    • Structure 
      • Processes and Sub Process
      • System and Sub Systems 
      • Business Use Case 
    • Approach 
      • Articulation - Use Case Scenarios / Example 
    • Output 
      • Domain Architecture (Functional)
    Architecture:
    • Inputs 
      • Business and Domain Architecture 
    • Structure 
      • Architecture Diagram - Logical, Application and Data
      • Principles & Considerations - NFRs 
      • Process & System
    • Approach 
      • Articulation - Top Down 
      • Approach - Methodology (Layered; Component) 
    • Output 
      • System Architecture (Logical/Application)  
      • Information Architecture (Data) 
    Technology:
    • Input
      • Requirement - Business and Architecture 
    • Structure
      • Constraint & Criteria  
      • Options Evaluation & Fitment
    • Approach 
      • Articulation - Reason of Being, Evaluation, Benefits
      • Approach - 
    • Output 
      • Technology Architecture  

Solution Architecture : Structure

    • Process 
      1. Business Strategy 
      2. Business Objectives 
      3. Business Operating Model 
      4. Enterprise Architecture
        1. Business Process 
        2. Business System 
        3. Solution Architecture 
      5. Solution Delivery 
      6. Operation and Management Architecture 
    • Artifacts 
      • Prime Artifacts: 
        • Business Architecture 
        • Information System Architecture 
          • Data Architecture 
          • Solution Architecture 
        • Technology Architecture  
      • Extended Artifacts 
        • Implementation Architecture 
        • Management and Operations Architecture 
    • Solution Architecture - Views
      • Core 
        • Business View - Purpose
        • Functional View - Stakeholders
        • Data View - Entity, Role and interfaces
      • Extended 
        • Technical View - Structure, Operation and Development
        • Implementation View - Artifacts and Execution
        • Operation and Management View - Process, Support and Operation
    • Constraints
      • Parameters for Comparing Options (Spider Chart)
      • Core 
        • Enterprise Architecture 
        • Solution Architecture Views 
        • New / Existing System 
        • Degree of Automation 
      • Extended 
        • Resource 
        • Finance 
        • Timescale 
        • Expected Life 
    • Principles: 
      • Quality Attributes

Recommendation Systems

    Chap 1: Introduction
    • Algorithmic Personalization:
      • Understanding customer preference, interest and intent
      • Provide them Relevant Tips to help them
    • Taxonomy: 
      • Data 
        • Limited data from experience and the feedback loop 
      • Compute 
        • Limited compute to learn and adapt simultaneously 
      • Interest 
        • Conflicting interests when inferring the customer's real interests 
      • Action 
        • Incongruity between the truly desired action and the action taken 
      • Content 
        • Getting relevant content to serve up customers' intentions   
    • Differences
      • Prediction - suggestion of a rating for an item, based on certain past actions 
        • Helps quantify items
      • Recommendation - suggestion of items (top N) in a certain category 
        • Provides a good set of choices to start with
    Non Personalized Recommendation:

    Chap 2: Information Retrieval 
    • Concept 
      • Item to Item Attributes
        • Similarity, Adjacency
      • Customer to Customer Attributes 
        • Similarity, Adjacency
    • Algorithm 
    • Formula 
    • Pros 
    • Cons 
    Chap 3: Content Based Filtering (CBF)
    • Concept 
      • Based on building attributes around products and the customer's preferences for them 
      • Built using likes, purchases, clicks 
    • Algorithm 
      • Calculate the customer's affinity from each transaction
      • Assign a weight to each attribute (positive or negative)  
      • Decay the old profile; keep introducing new ratings with higher weight
    • Formula 
      • TFIDF (Term Frequency - Inverse Document Frequency) (sketch after this chapter)
        • Term Frequency - number of occurrences of the term in the document
        • Inverse Document Frequency - how few documents contain the term  
          • # of occurrences in doc * log ( Total # Docs / # Docs containing the term )
        • Used to filter stop words and bring out the core word set 
    • Pros 
      • Structured way of building Customer Profile 
      • Based on Content 
      • Simpler Computation  
      • Query Based System ; A Case Based System
    • Cons 
      • Attribute and weight factors 
        • Too many attributes can confuse the algorithm 
        • The attribute structure is rigid and requires manual filtration
      • Cold Start Problem - without an interaction model, there is nothing to recommend. 
      • Recency Factor - does not adjust quickly to changed user behaviour 
      • Computationally Expensive - requires recalculation at each change in rating or transaction 
      • Can't handle abstract concepts - it is an exact science
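
    A tiny sketch of the TFIDF formula above on toy tokenized documents (the corpus is made up):

        import math

        docs = [["coke", "bottle", "red"],
                ["pepsi", "bottle", "blue"],
                ["red", "wine", "bottle"]]

        def tfidf(term, doc, corpus):
            tf = doc.count(term)                     # occurrences in this doc
            df = sum(term in d for d in corpus)      # docs containing the term
            return tf * math.log(len(corpus) / df)   # rare term -> high score

        print(tfidf("bottle", docs[0], docs))   # 0.0: in every doc, acts like a stop word
        print(tfidf("coke", docs[0], docs))     # ~1.1: distinctive core term
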
    Chap 4: User to User Collaborative Filtering (U-U CF) 
    • Concept
      • Recommend by looking at rating similarity between users for a set of items. 
      • (User * Rating) 
        • To find similar users by looking at their rating patterns around a set of items
        • To predict user rating for a set of items (without rating) 
        • Rows for similarity of rating pattern between users 
      • Based on User to User similarity - Which tends to change frequently  and widely
    • Formula (sketch after this chapter)
      • Prediction(u, i) = Mean(u) + Sum over neighbours n { Weight(u, n) * (Rating(n, i) - Mean(n)) } / Sum |Weight(u, n)| 
      • Weighted Factor = similarity between the two users' rating patterns (commonly the Pearson correlation)
    • Algorithm 
      • Selecting Customer set using
        • Similarity Factor
        • Neighbor Factor 
    • Pros
      • Suits cases where a high number of user ratings is available
    • Cons 
      • Sparsity of Data:
        • Low number of user ratings/reviews per product item   
        • Small set of users available to predict a rating for an item (unless adjustments are taken into consideration) 
      • Computationally Expensive:
        • Needs to be computed regularly as user behaviour changes
        • A higher number of users makes this more expensive
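
    A sketch of the prediction formula above; using Pearson correlation as the weighting factor is an assumption where these notes leave it blank (the ratings matrix is made up, NaN = unrated):

        import numpy as np

        R = np.array([[5, 4, np.nan, 1],
                      [4, 5, 3, 1],
                      [1, 2, 4, 5]], dtype=float)   # users x items

        def pearson(u, v):
            mask = ~np.isnan(u) & ~np.isnan(v)      # co-rated items only
            return np.corrcoef(u[mask], v[mask])[0, 1]

        def predict(R, user, item):
            mu = np.nanmean(R[user])
            num = den = 0.0
            for other in range(len(R)):
                if other == user or np.isnan(R[other, item]):
                    continue
                w = pearson(R[user], R[other])                     # similarity weight
                num += w * (R[other, item] - np.nanmean(R[other]))
                den += abs(w)
            return mu + num / den if den else mu

        print(predict(R, user=0, item=2))   # fill user 0's missing rating for item 2
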
    Chap 5: Evaluation
    • Methods
      • Accuracy Metrics 
        • Mean Absolute Error = Sum |P - R| / # Ratings
        • Mean Squared Error = Sum (P - R)^2 / # Ratings
        • Root Mean Squared Error = sqrt( Sum (P - R)^2 / # Ratings ) (sketch at the end of this chapter)
      • Error Metrics
      • Decision Support 
      • User/Usage Centric Metrics 
    • Prediction vs Top N 
      • Decision Support 
      • Accuracy vs Ranking
      • Focus - Locally vs Comparatively 
    • Accuracy Matrix 
      • Prediction Accuracy - Estimating Preference
      • Decision Support Accuracy  - Finding useful/good things
      • Rank Accuracy  - Estimate Relative Preferences 
    • Testing 
      • Live vs Dead Recs
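
    A quick sketch of the three accuracy metrics above (predicted and actual ratings are made up):

        import numpy as np

        P = np.array([3.5, 4.0, 2.0, 5.0])   # predicted ratings
        R = np.array([4.0, 3.5, 2.0, 4.0])   # actual ratings

        mae = np.mean(np.abs(P - R))         # Mean Absolute Error
        mse = np.mean((P - R) ** 2)          # Mean Squared Error
        rmse = np.sqrt(mse)                  # Root Mean Squared Error
        print(mae, mse, rmse)
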
    Chap 6: Item to Item Collaborative Filtering (I-I CF)   
    • Concept
      • Recommend by looking at rating similarity between items across a set of users. 
      • (User * Rating) 
        • To find similar items by looking at rating patterns around a set of users 
        • To predict Items rating for a set of user
        • Columns for similarity of rating pattern between Items  
      • Based on Item to Item similarity - Which does not change much over time
    • Algorithm 
      •  
    • Formula
      •  
    • Pros
      • Computationally Economical: 
        • Can be pre-computed and need not be recomputed frequently, as item-to-item similarity stays stable
      • Sparsity of Data: 
        • Can work with fewer ratings
    • Cons  
      •  
    Chap 7: Dimensional Reduction Recommendation  
    • Concept 
      • Instead of computing over the full user-to-item rating matrix, it works with a reduced set of features. 
    • Algorithm 
      • Identify concepts instead of keywords (from Information Retrieval)
      • Calculate their weights for the equation 
      • Calculate user ratings for the concepts (K features) 
      • Calculate item content for the concepts (K features) - Information Filtering
    • Formula 
      • Singular Value Decomposition (SVD) (numpy sketch after this chapter): 
        • Breaking the matrix around k feature vectors 
          • User Matrix for K Feature Vector 
          • Item Matrix for K Feature Vector
          • Weight Diagonal Matrix for K Feature
    • Pros 
      • Work on Concept instead of Keyword intensity 
      • Computationally economical at run-time
        • Time complexity O(m*n + n^3)
        • Expensive in totality (the factorization itself is costly) 
    • Cons 
      • Model Refresh Frequency 
        • Depends on user ratings, so it will require periodic re-computation as ratings change 
      • Tolerance for Missing Values 
        • Assumes the matrix is full
        • Missing values need to be imputed (e.g. with the mean)
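
    A NumPy sketch of the SVD decomposition above: break the (already imputed) ratings matrix into user, weight and item matrices around k concept features:

        import numpy as np

        R = np.array([[5, 4, 1, 1],
                      [4, 5, 1, 2],
                      [1, 1, 5, 4],
                      [2, 1, 4, 5]], dtype=float)   # users x items, missing values imputed

        U, s, Vt = np.linalg.svd(R, full_matrices=False)
        k = 2                                        # keep k "concept" features
        R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]   # user * weight * item matrices
        print(np.round(R_hat, 1))                    # low-rank reconstruction of ratings
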
    Chap 8: Other Recommended Viewpoint  
    • Context-Aware Recommendation
      • Types 
        • Personal Context (Mood, preferences, ....)
        • External Context (Weather, office, driving, location)
        • Social Context (People around you)
      • Interface 
        • Live Interaction 
        • Mobile Interfaces 
        • Implicit Behaviour vs Explicit Rating 
      • Technique 
        • User, Item, Context and Rating
        • Pre Context Filtering 
          • Filter on Context 
          • Recommend on U, I, R 
        • Post Context Filtering 
          • Recommend UIR 
          • Filter on Context 
        • Modelled Context Filtering 
          • Considering all four 
          • Building Multi Dimensional Model Processing  
    • Netflix Recommender
      • Learning to Rank 
        • Pairwise and List-wise Approaches
      • Core 
        • Category - Personalized 
        • Rating - Personalized  
      • Function 
        • Popularity 
        • Rating (Implicit and Explicit) 
      • Formula
        • Linear Regression - Determine
          • f(u, v) = w * p(v)
            • w * f ( P * R) 
            • P - Popularity 
            • R - Rating 
          • Weight - Giving preference between axis 
          • Determine Weight - Classification = Logistic Regression
        • Classification -
        • Decision trees -
        • Gradient Descent - 
      • Notes 
        • Learning from implicit actions 
        • Explicit ratings can be noisy (corrupted)
    • LinkedIn Recommender
      • Types 
        • Content Filtering 
        • Collaboration Filtering (SVD)
        • Popularity (Trending)
        • Social
      • Approach 
        • Feature Extraction 
        • Entity Resolution 
        • Meta Data Enrichment  
      • Technique 
        • Interaction Splits 
          • Selecting Model based on Features - Using Decision Tree Mechanism 
          • Algorithm Families
            • Decision Tree
            • Simple Tree - Learner Regression
          • Model Coefficient
            • Demographics Models 
      • Evaluation
        • Model Fitting Technique Matrices(Quantitatively) :
          • CV Error (Cross Validation)
          • Precision@ K
          • AUC
          • PR-AUC
          • RMSE
          • Multivariate Testing
        • A/B Testing
          • Presentation Biased Effect
          • Impression Discounting - discounting impressions the user did not respond to 
          • Effect of other A/B testing
            • Role > Divide A/B Testing
            • Lazy Orthogonal Multivariate Testing - quantifies the effect of other tests.
          • Novelty Effect:
            • A new algorithm gets a spike in interaction - ignore it during evaluation
            • Burn-in Period - let usage get normalized
          • Network Effect:
            • One cluster of customers being affected has effects on others (effects propagate through the network)
          • Power Analysis:
            • To determine the duration and the amount of traffic allocated to run the test.
            • Depends on:
              • Variance of the metric
              • Sample Size
              • Effect Size
      • Notes 
        • Feature vs Models 
      • Technology
        • Mahout / R Hadoop
    • Dialogue Based Recommended (Critique / Case Based) 
      • Useful 
        • Large-cost items - where the purchase demands research and a longer cycle 
      • Technique 
        • Breaks the Category in Features 
        • Ask user Requirement and Importance for Each 
        • Internally assign weights to interrelations 
        • Based on this, Recommend  
        • Adjusted Recommendation - If not liked by Users 
    Appendix:  
    Terminology: 
    • Unary Rating 
      • Only a positive signal (e.g. a purchase or click), represented as a series of 1s 
    • Analytic Types 
      • Exploratory
      • Descriptive 
    • Vector 
      • A characteristic variable that shows a behaviour - can be a set of values representing it 
    • Model
    Machine Learning
    • Feature - Each Token Occurrence