by Sachin Mehta, Wesley Lee, and Sam Crow
View the Project on GitHub CSE512-16S/fp-sacmehta-samcrow-wesleytlee
This project visualizes admission data from the department of Computer Science and Engineering.
Team members: Sachin Mehta, Wesley Lee, and Sam Crow
Of the approximately 3000 students who take CSE 142 each year, 2000 (67%) continue to CSE 143 and only 750 (25%) end up applying to the CSE major. We want to understand these dynamics better: which students continue through this pipeline, and how do they differ from students who do not? Our visualization serves as a framework for exploring these questions and provides preliminary answers by combining flow charts of students through the CSE pipeline with statistical tests for mean differences in covariates between groups.
A standard HTTP server can provide the files that the visualization uses.
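One option is to run python -m SimpleHTTPServer 9000 in the directory that contains the files (with Python 3, the equivalent command is python3 -m http.server 9000) and then open a web browser to http://localhost:9000/.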
The left half of the page shows the filters, Sankey diagram, and summary table for group 1. The right half of the page shows group 2.
The visualization provides two types of filters: (i) global filters and (ii) local filters. Global filters are applied to the overall data. For example, to compare the admission statistics for male and female students, you would select Male in the Group 1 filters and Female in the Group 2 filters. The global filters then return only the relevant records for each group: male records for Group 1 and female records for Group 2. Once the global filters are applied, you might want to compare one particular stage, for example the variables for male and female students between CSE 142 and CSE 143. In that case, you would select that stage in both groups using the local filters.
With filters, users can select subsets of the data to show in groups 1 and 2.
Global filters are adaptive: users choose the filter they need and then add it to the visualization. To add a filter:
Local filters are listed below the Sankey diagram and are presented as a drop-down list. Users can select a stage in each group and then compare the data. The summary tables and difference statistics update dynamically.
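The two filter levels can be sketched in a few lines of Python. This is only an illustration of the idea, not the project's actual code: the record layout, the field names gender and stages, and the function names apply_global_filter and apply_local_filter are all assumptions made for the example.

```python
# Hypothetical student records; field names are assumptions for illustration.
students = [
    {"id": 1, "gender": "Male", "stages": ["CSE 142", "CSE 143"]},
    {"id": 2, "gender": "Female", "stages": ["CSE 142"]},
    {"id": 3, "gender": "Female", "stages": ["CSE 142", "CSE 143", "Applied"]},
    {"id": 4, "gender": "Male", "stages": ["CSE 142"]},
]


def apply_global_filter(records, **criteria):
    """Keep records whose fields match every key=value pair in criteria."""
    return [r for r in records
            if all(r.get(key) == value for key, value in criteria.items())]


def apply_local_filter(records, stage):
    """Keep records of students who reached the given pipeline stage."""
    return [r for r in records if stage in r["stages"]]


# Global filters define the two groups.
group1 = apply_global_filter(students, gender="Male")
group2 = apply_global_filter(students, gender="Female")

# Local filters narrow each group to one stage for comparison.
group1_143 = apply_local_filter(group1, "CSE 143")
group2_143 = apply_local_filter(group2, "CSE 143")
```

Keeping the two steps separate mirrors the page layout: the global filter decides who belongs to each group, and the local filter only changes which stage of that group is being compared.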
To show the difference between group 1 and group 2, we compute a t-test statistic for each variable. Because many variables are tested at once, we apply a Bonferroni correction to account for multiple comparisons. With the t-test scores, users can easily identify which variables differ significantly between the two groups and which do not.
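As a rough sketch of this kind of comparison, the Python below computes a t statistic per variable and flags it against a Bonferroni-adjusted significance level (alpha divided by the number of variables tested). Several things here are assumptions rather than the project's method: the unequal-variance (Welch) form of the statistic, the alpha of 0.05, the normal approximation used for the p-value so that only the standard library is needed, and the sample data and variable names.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(a, b):
    """Welch's t statistic for the difference in means of two samples."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


def compare_groups(group1, group2, alpha=0.05):
    """Return {variable: (t, significant)} with a Bonferroni-adjusted alpha."""
    adjusted_alpha = alpha / len(group1)  # Bonferroni: divide by test count
    results = {}
    for name in group1:
        t = welch_t(group1[name], group2[name])
        # Assumption: two-sided p-value from a normal approximation,
        # which is reasonable only for large samples.
        p = 2 * (1 - NormalDist().cdf(abs(t)))
        results[name] = (t, p < adjusted_alpha)
    return results


# Hypothetical data: one list of values per variable in each group.
g1 = {"grade": [3.1, 3.4, 3.6, 3.8, 3.5, 3.3], "interest": [4, 5, 4, 5, 4, 5]}
g2 = {"grade": [2.1, 2.4, 2.0, 2.3, 2.2, 2.5], "interest": [4, 5, 5, 4, 4, 5]}
results = compare_groups(g1, g2)
```

With these made-up values, grade is flagged as significantly different while interest, whose means are equal in both groups, is not.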
The t-test scores are shown in a bar chart. We color-encode the scores to help users identify the key variables; this color encoding was suggested during the poster demonstration. High values (those above a particular threshold) are drawn in a dark color and low values in lighter shades. Each group is represented by a different color.
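One way such an encoding could work is sketched below. The threshold of 2.0, the hex colors, and the rule that the sign of the score decides which group's color is used are all assumptions for illustration; the text above does not specify them.

```python
# Hypothetical palette: one hue per group, dark and light shades of each.
PALETTE = {
    "group1": {"dark": "#08519c", "light": "#9ecae1"},
    "group2": {"dark": "#a63603", "light": "#fdae6b"},
}


def bar_color(t_score, threshold=2.0):
    """Pick a bar color from the magnitude and sign of a t-test score."""
    # Assumption: a positive score means group 1 has the larger mean.
    group = "group1" if t_score >= 0 else "group2"
    # Scores beyond the threshold get the dark shade, others the light one.
    shade = "dark" if abs(t_score) > threshold else "light"
    return PALETTE[group][shade]
```

For example, under these assumptions bar_color(3.1) selects the dark group 1 color and bar_color(-0.5) selects the light group 2 color.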
In addition, we provide summary statistics for each group: the mean and standard deviation of each variable. These can be useful when trying to identify why people dropped out at a particular stage. For example, the mean responses to the survey questions at a given stage can help explain why people drop out.
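A minimal sketch of how such a summary table could be computed is shown here. The record layout, the variable names, and the choice to skip missing values and use the sample standard deviation are assumptions for the example, not details taken from the project.

```python
from statistics import mean, stdev


def summarize(records, variables):
    """Return {variable: (mean, sample standard deviation)} for one group."""
    summary = {}
    for var in variables:
        # Assumption: missing answers are stored as None and skipped.
        values = [r[var] for r in records if r.get(var) is not None]
        summary[var] = (mean(values), stdev(values))
    return summary


# Hypothetical records for one group at one stage.
group = [
    {"grade": 3.0, "hours": 10},
    {"grade": 4.0, "hours": 14},
    {"grade": 3.5, "hours": None},
]
table = summarize(group, ["grade", "hours"])
```

Note that stdev needs at least two values per variable, so a real implementation would have to handle very small groups separately.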
Please note that the bar chart and summary tables are updated dynamically.
Click here for poster.
Click here for paper.