My project is about Dream of the Red Chamber, a great novel and perhaps the greatest in China's long literary history. The book is also shrouded in mystery: no one knows for certain who wrote the last 40 chapters. Some believe they were written by Cao Xueqin, the author of the first 80 chapters, while others believe they were written by another author, Gao E. I want not only to investigate who wrote the last 40 chapters but also to analyze the relationships among the characters. The novel has hundreds of characters, so I think a quantitative analysis is worthwhile. To visualize word frequencies in the novel, I ran the LDA topic models from my course, Data 88: Language Modeling and Text Analysis. The results show significant differences between the first 80 chapters and the last 40, which supports the theory that the two parts were not written by the same author. I also applied Principal Component Analysis (PCA) to function-word vectors built from the corpus. Together, these methods may help reveal differences in writing style. Lastly, I built the character relationship network in Gephi by importing the edge list produced in the course's Jupyter notebooks.
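
As a rough illustration of the last step, here is a minimal sketch of how a character co-occurrence edge list could be built for Gephi. It is not the course notebook's code: the function names `build_edge_list` and `to_gephi_csv`, the choice of the paragraph as the co-occurrence window, and the toy sentences are all assumptions made for this example. The only thing taken from Gephi itself is the `Source`, `Target`, `Weight` column layout that its spreadsheet importer recognizes.

```python
import csv
import io
from collections import Counter
from itertools import combinations


def build_edge_list(paragraphs, names):
    """Count how often each pair of character names shares a paragraph."""
    weights = Counter()
    for text in paragraphs:
        # Names found in this paragraph, sorted so each pair has one canonical order.
        present = sorted(name for name in names if name in text)
        for source, target in combinations(present, 2):
            weights[(source, target)] += 1
    return weights


def to_gephi_csv(weights):
    """Serialize the weighted pairs as CSV with Source, Target, Weight columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(weights.items()):
        writer.writerow([source, target, weight])
    return buffer.getvalue()


# Toy input: hypothetical sentences, not text from the novel.
sample_paragraphs = [
    "Baoyu and Daiyu talk in the garden.",
    "Baoyu visits Baochai.",
    "Daiyu writes a poem while Baoyu watches.",
]
sample_names = ["Baoyu", "Daiyu", "Baochai"]

edges = build_edge_list(sample_paragraphs, sample_names)
print(to_gephi_csv(edges))
```

Saving the printed text to a `.csv` file gives something Gephi can import as an undirected, weighted edge table. Plain substring matching is the simplest possible choice here and would miscount when one character's name is contained in another's, so a real run on the novel would need a more careful name matcher.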

Screenshots