Before a dataset can be analysed in R, it often has to be manipulated or transformed in various ways. For years, manipulating data in R required more programming than the analysis itself. That has improved dramatically with the dplyr package, which provides programmers with an intuitive vocabulary for data management and analysis tasks. Hadley Wickham [-@dplyr], the original creator of the dplyr package, refers to it as a Grammar of Data Manipulation.
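
To give a flavour of that vocabulary, here is a minimal sketch that chains a few dplyr verbs. It assumes the dplyr package is installed and uses the `mtcars` dataset that ships with R; the column names `mean_mpg` and `n_cars` and the 100 hp threshold are chosen purely for illustration.

```r
library(dplyr)

# Each verb performs one step; the pipe (%>%) passes the result along.
mpg_by_cyl <- mtcars %>%
  filter(hp > 100) %>%                     # keep cars with more than 100 hp
  group_by(cyl) %>%                        # one group per cylinder count
  summarise(mean_mpg = mean(mpg),          # average fuel economy per group
            n_cars   = n()) %>%            # number of cars per group
  arrange(desc(mean_mpg))                  # most economical group first

mpg_by_cyl
```

Read top to bottom, the pipeline states what is done to the data rather than how to loop over it, which is the sense in which dplyr acts as a grammar.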
You can learn R with the datasets that come with it when you install it on your machine. But sometimes you want to use real data that you or someone else has already gathered. One of the critical steps in data processing is importing data stored in a particular format into the R workspace. Data import refers to reading data from the working directory into the workspace. In this chapter you will learn how to import common file types into R.
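
As a small illustration of importing a file, the sketch below writes a made-up CSV file to a temporary location and reads it back with base R's `read.csv()`. The file contents and the names `csv_path` and `scores` are hypothetical; in practice you would point `read.csv()` at a file in your own working directory.

```r
# Create a small, made-up CSV file in a temporary location
csv_path <- tempfile(fileext = ".csv")
writeLines(c("student,score", "A,71", "B,58", "C,64"), csv_path)

# Import the file into the workspace as a data frame
scores <- read.csv(csv_path, stringsAsFactors = FALSE)

# Inspect the structure of the imported data
str(scores)
```

Once imported, `scores` lives in the workspace like any built-in dataset and can be explored with the usual tools.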
The National Examinations Council of Tanzania publishes Primary and Secondary Education Examination Results, and the National Library Services archives these results. While they are a fantastic resource for historical primary and secondary school results, these records are painful to analyze using software because the grade results are organized in an untidy, messy way.
You need to work on this column of the results to obtain a clean, correctly formatted dataset for exploration and modelling.
tidyverse
While the base R packages include many useful functions and data structures that you can use to accomplish a wide variety of data science tasks, the add-on tidyverse package supports a comprehensive data science workflow, as illustrated in Figure 1.
Figure 1: Schematic drawing of the data science workflow
The tidyverse is a coherent system of packages, each designed to address a specific component of the workflow. Most of the packages in the tidyverse were developed by Hadley Wickham [-@tidyverse] and many other contributors.
Introduction
This chapter provides a brief explanation of the fundamental vector model. You will become familiar with the theory behind the vector model and the disciplines in which it predominates before seeing how it is implemented in R.
A vector is the most basic data structure in R. It is a sequence of elements of the same data type. If the elements are of different data types, they are coerced to a common type that can accommodate all the elements.
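
The coercion rule is easiest to see by combining values of different types with `c()`. The following is a minimal base R sketch; the variable names are illustrative only.

```r
# Elements of a single type keep that type
nums <- c(1.5, 2, 3)
class(nums)            # "numeric"

# Mixing types triggers implicit coercion to the most general type
mixed_num_lgl <- c(1, TRUE, FALSE)   # logicals become numbers (TRUE -> 1)
mixed_chr     <- c(1, "a", TRUE)     # everything becomes character

class(mixed_num_lgl)   # "numeric"
class(mixed_chr)       # "character"
mixed_chr              # "1" "a" "TRUE"
```

Because coercion happens silently, it is worth checking `class()` after building a vector from mixed input, since a single stray string turns a whole numeric vector into text.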