Using R for Business Analytics
Matthew A. Lanham ([email protected])
Doctoral Candidate, Department of Business Information Technology
Merchandise Data Scientist, Advance Auto Parts, Inc.
MatthewALanham.com
SEDSI 2015. Copyright © 2015, MatthewALanham.com

Outline
State of Business Analytics
• What is Business Analytics?
• Putting the BA domains together
• INFORMS CAP framework
What is R and RStudio
• Brief history of R
• Installing R & RStudio
I. Business Problem (Question) Framing
II. Analytics Problem Framing
III. Data
IV. Methodology (Approach) Selection
V. Model Building
VI. Deployment
VII. Model Life Cycle Management
Conclusions
Resources & References
Appendix A: Free R Programming course
Time labels on the slide: 15 mins, 45 mins, fun on your own

Business Analytics
What is Analytics?
• “…the extensive use of data, statistical and quantitative analysis, exploratory and predictive models, and fact-based management to drive decisions and actions (Davenport & Harris, 2007).”
• “Refers to the skills, technologies, applications, and practices for continuous iterative exploration and investigation of past business performance to gain insight and drive business planning (Wikipedia, 2014).”
The three types of analytics:
• Descriptive Analytics (What has happened?): Exploratory Data Analysis (EDA); uni- and multivariate summaries; clustering, segmentation, profiling
• Predictive Analytics (What will happen?): forecasting/pattern recognition; classification; ensemble modeling
• Prescriptive Analytics (What is the best course of action?): optimization/heuristics; simulation; computational stochastic optimization

Integrating the BA Domains Together
Big Picture Idea
• Example based on my dissertation research and industry work for a Fortune 500 retailer

Growth in Business Analytics Programs
• Since 2010, 35 analytics/business analytics programs have been established in schools of business, and another 29 analytics/data science programs have been established in other colleges within universities (Rappa, 2014).
[Figure: Number of new analytics/data science programs per year, 2002 to 2015; y-axis: number of academic programs]
• Note: These numbers do not include the analytics/data science concentrations or tracks added to legacy programs such as engineering, information systems, and statistics.

INFORMS Certified Analytics Professional (CAP)
Analytics is a Process
• “Analytics doesn’t take anything away from Operations Research (OR). Outside our community, OR seen as a toolkit but Analytics seen as a process.” – Anne Robinson, Verizon (President of INFORMS)
• Similar to CRISP-DM
INFORMS 7 CAP Domain Areas
I. Business Problem (Question) Framing
II. Analytics Problem Framing
III. Data
IV. Methodology (Approach) Selection
V. Model Building
VI. Deployment
VII. Model Life Cycle Management
CRISP-DM phases shown alongside: Business Understanding; Data Understanding; Data Preparation; Modeling; Evaluation/Deployment
https://www.informs.org/Certification-Continuing-Ed/Analytics-Certification/Candidate-Handbook

INFORMS Certified Analytics Professional (CAP)
INFORMS 7 Domain Area Framework, created via Job Task Analysis
1. Domains = 7 different domains (industry independent)
2. Tasks = things that analytics professionals need to do
3. Knowledge = things that professionals need to know to do those tasks
Idea
• Create T-shaped professionals (1990s) rather than I-shaped professionals
• The “T” represents
1. Breadth of skills across the top
2. Depth in one area, represented by the vertical bar
[Figure: T-shaped professional; labels: Analytics & Technologies; Assortment Planning (Dissertation Topic)]
T-shaped professionals can
1. work more easily in interdisciplinary teams than those with less breadth
2. be more effective than those without depth

About the R Language
What is R?
• R was developed by New Zealand professors Robert Gentleman and Ross Ihaka, who wanted a better statistical software platform for their students (Ihaka & Gentleman, 1996).
• It’s more than just statistics today!
Can my students use it?
• R is an open-source (i.e. FREE!) and freely accessible software language under the GNU General Public License, version 2 (Free Software Foundation, 1991).
• R works with Windows, Macintosh, Unix, and Linux operating systems.
• It has a nice balance of object-oriented and functional programming constructs, and unlike most commercial software, the majority of packages contain many knobs to allow for tuning and customization of a procedure (Hornick & Plunkett, 2013).
What makes it better than the others?
• As of 2014 there were 5800 available user-developed packages (also referred to as libraries) (Cortez).
• There are 72 packages with functions for nearly any machine learning methodology. You will not find many of these techniques in the commercial packages.
• There is also a growing community of research on prescriptive (optimization) analytics (Cortez). As of 2014, there are more than 60 available packages for optimization and mathematical programming (Theussl, 2014).

R Popularity
TIOBE Programming Community index
• An indicator of the popularity of programming languages.
• “The TIOBE index lists various of these statistical programming languages available, e.g. Julia (position #126), LabView (#63), Mathematica (#80), MATLAB (#24), S (#84), SAS (#21), SPSS (#104) and Stata (#110). Most of these languages are getting more popular every month. The clear winner of the pack is the open source programming language R. This month it jumped to position 12, while being at position 15 last month.”
Source: TIOBE Index for November 2014, http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html

R Popularity
2013 KDnuggets.com Survey
• “What programming/statistics languages you used for an analytics / data mining / data science work during the past year?”
• R (60.9%), Python (38.8%), and SQL (36.6%) were the top three languages used by practitioners.
[Figure: Languages used over past year by analytics professionals; y-axis: percentage of respondents; annotations: “hybrids”, “BDA”]

Technologies in Demand
O’Reilly’s 2013 Data Science Salary Survey
• “What commercial languages and software have you used for an analytics, big data, data mining, or data science during the past 12 months for a real project”
[Figure: Language/Software Used for Real Projects Over Past Year; x-axis: respondent percentage, split by data role and non-data role; items: SAS/SPSS, Ruby, Mahout, D3, Tableau, JavaScript, Network/Graph, Java, Hadoop, Excel, Python, R, SQL]
• As in the KDnuggets annual survey on analytics software, SQL, R, and Python were the three most popular (source: O’Reilly).

Download and Install R and RStudio
1. Download and install the latest version of R from the CRAN website (http://cran.r-project.org/).
• Video for Windows Users (https://www.youtube.com/watch?v=LII6of-5Odw)
• Video for Mac Users (https://www.youtube.com/watch?v=xokJUwn0mis)
2. Next, download the popular R IDE, RStudio (http://www.rstudio.com/products/rstudio/download/).
• Video for Windows/Mac (https://www.youtube.com/watch?v=7rFMLnm3sAE)
We will be using the Windows version, but the functionality on Mac is expected to be the same. This may not be the case for Linux users.

Brief Intro to R Programming
• Show the basics of R in RStudio

I. Business Problem (Question) Framing
Understanding and Discussing the Problem
• Try to make the assignments or projects big-picture focused
• Taught well in TQM, Software Engineering, and Project Mgmt. courses
• I spend a lot of time here and revisit this domain area regularly to make sure we are creating a solution for the real problem(s)
• I recently interviewed several statistics and computer science graduates to work with me, and most had a difficult time with this

Understanding and Discussing the Problem
Retail Example
Every period a category manager (CM, a.k.a. merchant) for an auto parts retailer reviews her product category, removing SKUs that are not selling as expected and adding SKUs that have higher potential to sell.
Historically, the CM has gauged selling potential for all products using a forecast generated by the company’s commercial inventory and planning system, which was designed for grocery and clothing retailers.
[Figure labels: New Inventory Added; Unproductive Inventory Removed]
The CM believes the forecast is too basic, does not account for unique aspects of the auto parts business, and thus does not give her adequate decision support. She believes she could improve her category performance if she could analyze SKU demand in other ways, but is at a loss as to where to begin.

Understanding and Discussing the Problem
Retail Example
Some unique aspects of auto parts retailers compared with other retail segments such as grocery stores and clothing stores:
• the products do not expire,
• there are few seasonal trends among products,
• product lines do not have major adjustments over seasons, and, most interestingly,
• it is common for many products stocked within a store to sell only one or two units per year, especially products stocked in the backroom of the store.
Tasks 1/2 – Requirements elicitation/Stakeholders
• She would like to have another measure that gauges the selling potential of a SKU within a particular store and more easily differentiates one SKU from another.
• She is the project owner, as this is her category and we are only focusing on her “Wigit” category. However, stores and commercial customers are also impacted by this project.
Task 3 – Decisions?/Data?
• She has control over all SKUs in the Wigit category and which stores each Wigit will be stocked in.
• She does not control how many SKUs are stocked in a store. That is a firm decision supported by the firm’s commercial inventory and replenishment system.
• Measures of a store’s demographics, a SKU’s characteristics, previous POS behavior, etc. are available, and she (the domain expert) analyzes them to help her make stocking decisions

Understanding and Discussing the Problem
Tasks 4/5 – General problem specification
• Decisions – Which SKUs should be added to a particular store (yes/no)?
• Objectives – She wants to increase the Wigit category’s sales
• Constraints/Parameters
• She does not want to add SKUs to a store that have “low” potential to sell
• She has a fixed initial budget ($) for wigits
• Stores have a finite amount of shelf space dedicated to wigits
• Business KPIs/Benefits – Improve the wigit assortment by reducing non-working inventory (NWI) and increasing category store sales
Task 6 – Stakeholder agreement
• We review the stated tasks as we understand them, and she agrees we should proceed to developing an analytical solution

II. Analytics Problem Framing
Task 1 – Analytics problem formulation
• Generate probability estimates for store-SKU combinations
Task 2 – Drivers?
• Higher probability predictions will lead to a greater chance of being incorporated in the wigit assortment for a store
Task 3 – Key assumptions?
• The store-SKU probability estimates generated will be similar to the historical proportions that actually sold
• Not all store-SKU combinations exist historically, but we can estimate them based on a validated model of actual historical store-SKU combinations that were stocked for at least one year
Task 4 – Success metrics?
• Higher probability predictions will lead to a greater chance of being incorporated in the wigit assortment for a store
Task 5 – Stakeholder agreement
• She believes having a store-SKU probability is exactly what would help her with her assortment planning decision and wants us to proceed

III. Data
Task 1 – Data requirements
• Obtain variables believed to be important to the probability of selling, based on her domain knowledge
• Examples: application counts, demographics
Task 2 – Acquire data
Task 3 – Get data ready for analysis
• Store-SKU probability estimates generated will be similar to historical
Task 4 – Document & Report findings
• Higher probability predictions will lead to a greater chance of being incorporated in the wigit assortment for a store
Task 5 – Refine business problem
• She believes having a store-SKU probability is exactly what would help her with her assortment planning decision and wants us to proceed

Data
Task 3 – Get data ready for analysis

Data
In RStudio
• Double-click the data and it will show you the first 1000 records in a table

Data
In RStudio
• This is nice if you just want to make an initial visual inspection of the data without using the head() function.
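The same first look can be taken from the console. The following is a minimal sketch, not the dataset used in the talk: the data frame `store_sku` and the column `UNITS_SOLD` are hypothetical stand-ins (only the names `STORE_NUMBER` and `DC` appear on the slides).

```r
# Hypothetical stand-in for the store-SKU dataset; values are made up
store_sku <- data.frame(
  STORE_NUMBER = c(101, 101, 102, 103, 103, 104),
  DC           = c(1, 1, 2, 2, 3, 3),
  UNITS_SOLD   = c(0, 2, 1, 0, 5, 1)
)

head(store_sku, 3)   # first three rows
str(store_sku)       # column types as R currently understands them
dim(store_sku)       # number of rows and columns
# View(store_sku)    # in RStudio, opens the same table viewer as double-clicking
```

Note that `str()` reports `DC` and `STORE_NUMBER` as numeric here, which is the kind of type issue a quick inspection is meant to surface.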
Data
In RStudio
• Generic summary() function
• Five-number summary() for a specific variable
• Generating a quick basic plot

Data
Reproducible Research in R
Idea: make analytic data and code available so that others may reproduce findings
1. Make data available
2. Make computations/code available
These allow others to validate your work and come to the same conclusions you reached
• This can be difficult in Excel unless you make sure to script everything using VBA and do not point-and-click on anything
• Golden rule of reproducible research: Script everything!
• Think about your own research process… or working with multiple people (Excel is a nightmare to me in these situations)
knitr
• Developed by Yihui Xie (http://yihui.name/knitr/)
Pros
• A variety of documentation languages can be used (R Markdown, LaTeX, HTML)
• Uses the R programming language, but others are allowed
• Can export to PDF, HTML, Word
• Can be used to create manuals, short/medium-length technical documents, tutorials, reports (especially if generated periodically), and data preprocessing documents/summaries
Cons (not good for)
• Long research articles
• Complex, time-consuming computations
• Documents that require precise formatting

Data
knitr
• File -> New File -> R Markdown
What’s going to happen here
1. You write the R Markdown document (.Rmd)
2. knitr produces a Markdown document (.md)
3. knitr converts the Markdown document into HTML (by default)
.Rmd -> .md -> .html
* Note: You only change the .Rmd file
• All we are going to do is paste our code into chunks like this one
```{r}
# our code here
```

Data
knitr
• Here we can see the .html document of our analysis so far.
• Setting echo=FALSE hides the R code in the output file but still runs the code and returns the results. echo=TRUE is the default
• Your document will not be “knitted” if your code has an error, which helps ensure reproducibility
• Also, you can customize your knitr file and make it look really nice for documentation and presentation purposes

Data
RPubs
• A FREE place to publish your analysis (https://rpubs.com/)
• This might be a great place for you to check your students’ work or projects; they can just send you a URL
• Also, this is a good venue for students to begin a portfolio of analytics projects to show to employers

Data
Back to our dataset
• You can create your own functions in R. I created one called DataQualityReport.R and will use it to check our dataset quality
• Once you load your function, it is available to you during your current R session. It will show up in RStudio in your global environment
• We can see there are some issues in our dataset so far. We need to make sure R knows the types of our variables. DC is really a factor/categorical variable. Its numeric value means nothing.
Same for STORE_NUMBER.

Data
Defining the correct data type
• We only had to change a handful of these. Other options you might use: as.integer(), as.ordered(), as.character(), as.numeric()

Data
Creating a couple of custom functions that use R packages
• Some variables have missing data, so let’s impute these values
• Quickly walk them through the basics of the Impute.R code

Data
Task 4 – Document & Report findings
• Higher probability predictions will lead to a greater chance of being incorporated in the wigit assortment for a store
• Using knitr
1. Create a data definitions table and review it with the domain expert or other stakeholders to make sure the data is measuring what you expect
2. Impute values, remove oddities, delete or correct observations as needed
3. Summarize the dataset with tables, statistics, and graphical summaries
Task 5 – Refine business problem
• She believes having a store-SKU probability is exactly what would help her with her assortment planning decision and wants us to proceed

IV. Methodology Selection
Task 1 – Identify possible methods
• Here we are doing predictive modeling, so something under this domain
• Several possible classification algorithms are available
Task 2 – Select software tools
• Here we chose R because it provides all the functionality we need to give the CM the solution she needs
• We will take advantage of the caret package in R
Task 3 – Test approaches
• We will perform cross-validation, with verification and validation based on an 80/20 rule (80% to train, 20% to test)
Task 4 – Select approaches
• Let’s pick a couple of basic classification methods, such as logistic regression and a classification tree
• We will choose the model that has the best test accuracy and use the probabilities from that model

V. Model Building
Task 1 – Identify model structures
• Here we are doing predictive modeling, so something under this domain
• Several possible classification algorithms are available
Task 2 – Run and evaluate models
• Here we chose R because it provides all the functionality we need to give the CM the solution she needs
Task 3 – Integrate the models
• We will perform cross-validation, with verification and validation based on a 60/40 rule (60% to train, 40% to test)
Task 4 – Document and communicate findings with stakeholders
• Let’s pick a couple of basic classification methods, such as logistic regression and a classification tree
• We will choose the model that has the best test accuracy and use the probabilities from that model

Model Building
Caret package
• The caret package gives users one unifying framework, using just one function to predict, without having to specify all the possible options.
• Currently 180 different R modeling packages have been integrated with caret
• Check out http://topepo.github.io/caret/index.html for more information

Model Building
Caret package
• Partition data into training and testing sets
• As an example, I next rebalance the training data. There are various ways one might want to rebalance.

Model Building
Caret package
• Now let’s fit a logit model using the training data
• Fit a classification tree

Model Building
Caret package
Pretty horrible results – but just for demonstration purposes.
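The partition-and-fit steps from the preceding caret slides can be sketched as follows. This is an illustration under stated assumptions, not the code shown in the talk: the data are simulated, the names `sim`, `SOLD`, `APP_COUNT`, and `POPULATION` are hypothetical, the 80/20 split is used, the rebalancing step is skipped, and the caret and rpart packages are assumed to be installed.

```r
library(caret)

set.seed(2015)
n <- 500
# Simulated stand-in data; every column name here is hypothetical
sim <- data.frame(
  APP_COUNT  = rpois(n, 20),
  POPULATION = rnorm(n, 50, 10)
)
p_true   <- plogis(-3 + 0.1 * sim$APP_COUNT + 0.02 * sim$POPULATION)
sim$SOLD <- factor(ifelse(runif(n) < p_true, "yes", "no"), levels = c("no", "yes"))

# Stratified 80/20 partition into training and testing sets
idx      <- createDataPartition(sim$SOLD, p = 0.8, list = FALSE)
train_df <- sim[idx, ]
test_df  <- sim[-idx, ]

# 5-fold cross-validation on the training set for both models
ctrl <- trainControl(method = "cv", number = 5)

logit_fit <- train(SOLD ~ ., data = train_df, method = "glm",
                   family = "binomial", trControl = ctrl)
tree_fit  <- train(SOLD ~ ., data = train_df, method = "rpart",
                   trControl = ctrl)

# Test-set accuracy of each model
acc <- c(
  logit = mean(predict(logit_fit, newdata = test_df) == test_df$SOLD),
  tree  = mean(predict(tree_fit,  newdata = test_df) == test_df$SOLD)
)
acc

# Keep the more accurate model and use its predicted probability of selling
best  <- if (acc[["logit"]] >= acc[["tree"]]) logit_fit else tree_fit
probs <- predict(best, newdata = test_df, type = "prob")[, "yes"]
```

Both models go through the same `train()` and `predict()` calls, which is the unifying interface the slides describe; only the `method` string changes.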
The data isn’t real.
[Figures: Logit results; CART results]

VI. Deployment
Task 1 – Perform business validation of model
• This does not mean providing false answers or a simpler model, but we might need charts, tables, words, etc. so that stakeholders can process the information we are trying to convey
• May require revisiting the business problem framing and analytics problem framing
• Ensure the answer is tied to the question
• Requires knowledge of the client and their business
• Provide explanations in their language/context
Task 2 – Deliver report with findings
• Ensure the findings refer back to the problem
• State/display findings so as to be understood: not our language but the stakeholder’s language
• Use appropriate visual techniques to reinforce
Task 3 – Create production requirements
Task 4 – Create production model/system
• We will use the Shiny package here

Deployment
Task 5 – Support deployment tasks
• The job isn’t done here: how has the model helped them or hindered them?
• Stakeholders may request a laundry list of things they want here; negotiate what is necessary
• Provide them a roadmap showing that in month 1 they’ll get A, in month 2 they’ll get B and C, and so on
• Make allies, not adversaries
• Present strategy
• Ensure understanding of how this relates to the original problem statement
• Make sure the answers you continue to provide are useful to them
• Identify those affected by deployment
• Periodically review with the client

Deployment
Shiny package in R
• Shiny allows one to create web applications without having to know all the typical web development stuff such as JavaScript, CSS, etc.
• Shiny uses a reactive programming paradigm, meaning reactive expressions update their values whenever the reactive values they depend on change
• The default style is based on Twitter’s Bootstrap theme, but can be customized
• Check out the examples to get an idea of what your students could do

Deployment
Walk through one basic example
Shiny applications need only 2 components:
1) A user-interface script (ui.R) controls the layout and appearance of your app
• Specifies the types of inputs and outputs as well as their positions
2) A server script (server.R) contains the instructions that your computer needs to build your app
• Data processing functions, the outputs, and any reactive objects
[Figures: ui.R; server.R]

Deployment
Basic Shiny Example
Moving the value will dynamically change the histogram

Deployment
Basic Shiny Example

Deployment
Let’s create our platform
Steps
1. Create a folder for your web app called “webapp”
2. Put the server.R and ui.R files in this folder
3. In RStudio, set R’s working directory to where this folder is located via setwd("C:/URBA/")
4. Load the Shiny package in R via library(shiny)
5. Run the web app using the runApp() function, passing the folder name as a string: runApp("webapp")

Deployment
Check these out!
• http://shiny.rstudio.com/gallery/

VII. Life Cycle Management
Task 1 – Document initial structure
• Very important to the client and to future business to document the initial structure
• Prepare client instructions
• Ensure client understanding
Task 2 – Track model quality
• Follow up and review to ensure the model is returning the same answer
• Review the question as well
• Investigate and make changes if answer is
Task 3 – Recalibrate and maintain the model
• Set a schedule to review and refit model(s) as new data become available, new attributes are found that have predictive power, etc.
Task 4 – Support training activities
• Deliver documented model
• Ensure understanding of all affected
• Don’t just ask people if they understand: most people will say yes even if they don’t really
• Ask open-ended questions and get them to say what their understanding is
• Develop training/informational presentations (something more detailed than this presentation)

Life Cycle Management
Task 5 – Evaluate the business benefits
• Follow up to ensure that the model is consistent
• Investigate any changes in model response
• Ensure that changes are included in documentation and training
• Say specifically what has changed and what they can expect to see
• Over-communicate and expect to repeat yourself (just like talking to students)

Conclusions
Pros
1. Many resources
• http://www.r-bloggers.com/
• Springer’s Use R! series
2. Great graphics capabilities
• ggplot2 (not Tableau or SAS Visual Analytics, but pretty nice)
3. Descriptive, predictive, prescriptive, and BDA capabilities
Cons
1. Memory
• R works in memory, so if you’re working with large data sets (>5 GB) you’ll probably need more than 8 GB of RAM
• I have 24 GB of RAM and rarely have issues
2. Package Quality
• ~85 percent of the packages I use are high quality
• Some authors code inefficiently, which may cause the user to run out of memory or increase run time even on small data sets

References & Resources
The books outlined in red are free online (click the book to go to the URL).
• Descriptive & Predictive Analytics
• Prescriptive Analytics
• R books on anything from SpringerLink’s Use R! series
• R Programming (free .pdf downloads through university)

References & Resources
R blogs
• http://www.r-bloggers.com/
R books
• So many; Google “Springer UseR!”

R Programming
• If you like video instruction, check out Coursera’s (free) R programming course: https://www.coursera.org/course/rprog
Topics
• Data Structures
• Subset Data: partial matching, remove missing values
• Vector Operations
• Control Structures: if, for, while, repeat, next, return
• Functions
• Scoping Rules
• Dates and Times
• Loop functions
• Debugging
• Profiling
• Simulation
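A few of the topics listed above, sketched in base R. This is only an illustration; the vector, list, and function names are made up and do not come from the course or the talk.

```r
# Subsetting and removing missing values
units <- c(3, NA, 0, 7, NA, 1)
clean <- units[!is.na(units)]   # drop missing values
big   <- clean[clean > 2]       # logical subsetting

# Partial matching of list element names with $
sku <- list(description = "wigit", price = 9.99)
sku$desc                        # partially matches "description"

# Control structures: for, if, next
total <- 0
for (u in clean) {
  if (u == 0) next              # skip SKUs with no sales
  total <- total + u
}

# A function with a default argument
sell_rate <- function(x, na.rm = TRUE) {
  mean(x > 0, na.rm = na.rm)    # share of entries with at least one unit sold
}
sell_rate(units)
```

Partial matching with `$` is convenient at the console but easy to misread in scripts, so spelling out the full element name is usually the safer habit.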