Ritika Pandey

Doctoral Candidate | Data Scientist

I am a PhD candidate in the Department of Computer Science at Indiana University Purdue University Indianapolis, with a research focus on developing data science and machine learning techniques for modeling social harm data.

LinkedIn | GitHub | ResearchGate | Resume

Education

PhD Candidate

(Aug 2020 - Aug 2023)

Major: Computer and Information Science
Indiana University Purdue University Indianapolis
Advisor: Dr. George Mohler

MS

(Jan 2018 - Aug 2020)

Major: Computer and Information Science
Indiana University Purdue University Indianapolis
Advisor: Dr. George Mohler

BTech

(Aug 2013 - May 2017)

Major: Computer Science and Engineering
Bipin Tripathi Kumaon Institute of Technology

Experience

Indiana University Purdue University Indianapolis

Research Assistant

February 2018 - Present

  • Design, develop and improve novel machine learning models aimed at social harm & criminal justice applications.
  • Impact: Investigated the role of topic modeling & proposed key metrics (topic coherence, Gini coefficient) for detecting crime hotspots, allowing for more targeted police intervention.
  • Mentoring: Guided & collaborated with Undergraduate Research Interns (REU) to analyze Reddit data for insights into modern drug culture & to provide tools with potential applications in combating the opioid crisis.
  • Tools/Stacks: Python, Text Mining, Graph Mining, Tableau, statistical analysis, data visualization, LDA, NMF.

Roche

Data Science Intern - Research and Development

Summer 2022, Summer 2021

  • Ideate and apply innovative analytics & machine learning techniques to assess an additional component for a blood glucose system that can help with therapy management for diabetic patients.
  • Modeling: Build a boosted neural network for multi-class classification & perform feature engineering to derive valuable insights for model optimization.
  • Tools/Stacks: JMP, Python, Boosted Neural Networks, Feature Engineering, JSL, hyperparameter tuning, DoE, data visualization.

Navient

IT Intern (Data Analytics) - Infrastructure

Summer 2019

  • Built a server-based analytical model facilitating prediction of the application & chargeback associated with each server, keeping a human in the loop.
  • Resolving inconsistencies: Mined and analyzed server information from various data sources & synchronized it across all platforms.
  • Worked closely with the application development team & influenced the development trajectory in migrating from spreadsheets to a front-end application.
  • Tools/Stacks: Python, HeidiSQL, SCCM, NEAR (Navient Enterprise Application Repository).

Projects

Rewiring police officer training networks to reduce forecasted use of force

January 2021 - Present

Research has shown that police officer-involved shootings, misconduct and excessive use of force complaints exhibit network effects, where officers are at greater risk of being involved in these incidents when they socialize with officers who have a history of use of force and misconduct. In this work, we first construct a network survival model for the time-to-event of use of force incidents involving new police trainees. The model includes network effects of the diffusion of risk from field training officer (FTO) to trainee. We then introduce a network rewiring algorithm to maximize the expected time to use of force events upon completion of field training. We study several versions of the algorithm, including constraints that encourage demographic diversity of FTOs. Using data from Indianapolis, we show that rewiring the network can increase the expected time (in days) to a recruit's first use of force incident by 10%. We then discuss the potential benefits and challenges associated with implementing such an algorithm in practice.

MBTI Personality Prediction

August 2020 - December 2020

Over the past few years, psychologists and psychological studies have paid considerable attention to social media and online communities. These platforms attract a very large number of users, and the content users post expresses a wide range of behaviors. Although there are various methods to predict personality types, Myers-Briggs personality prediction is considered the most reliable and popular method. In this study, we perform personality prediction using machine learning and deep learning techniques that may aid psychologists in gaining better insights into different personality types of interest, and the private sector in assessing potential hires to improve an organization's culture. The objective of this research was to see whether the 16 personality types could be successfully predicted by machine learning techniques. Using different models and techniques, there were promising results, with some models achieving scores over 75 percent. With future work, there is potential for these models to achieve higher accuracy and be used in other behavioral studies.

Homicide Investigation Analysis

November 2019 - June 2020

Homicide investigations generate large and diverse data in the form of witness interview transcripts, physical evidence, photographs, DNA, etc. Homicide case chronologies are summaries of these data created by investigators that consist of short text-based entries documenting specific steps taken in the investigation. A chronology tracks the evolution of an investigation, including when and how persons involved and items of evidence became part of a case. In this article we discuss a framework for creating knowledge graphs of case chronologies that may aid investigators in analyzing homicide case data and also allow for post hoc analysis of the key features that determine whether a homicide is ultimately solved. Our method consists of 1) performing named entity recognition to determine witnesses, suspects, and detectives from chronology entries, 2) using keyword expansion to identify documentary, physical, and forensic evidence in each entry, and 3) linking entities and evidence to construct a homicide investigation knowledge graph. We compare the performance of several choices of methodologies for these sub-tasks using homicide investigation chronologies from Los Angeles, California. We then analyze the association between network statistics of the knowledge graphs and homicide solvability.

Addiction Analysis

Summer 2018

Increasing rates of opioid drug abuse and heightened prevalence of online support communities underscore the necessity of employing data mining techniques to better understand drug addiction using these rapidly developing online resources. In this work, we obtained data from Reddit, an online collection of forums, to gather insight into drug use/misuse using text snippets from users’ narratives. Specifically, using users’ posts, we trained a binary classifier which predicts a user’s transitions from casual drug discussion forums to drug recovery forums. We also proposed a Cox regression model that outputs likelihoods of such transitions. In doing so, we found that utterances of select drugs and certain linguistic features contained in one’s posts can help predict these transitions. Using unfiltered drug-related posts, our research delineates drugs that are associated with higher rates of transitions from recreational drug discussion to support/recovery discussion, offers insight into modern drug culture, and provides tools with potential applications in combating the opioid crisis.

Crime Topic Modeling

January 2018 - April 2018

Non-negative matrix factorization (NMF) topic modeling has recently been introduced for the categorization and analysis of crime report text. Topic modeling in this context allows for more nuanced categories of crime compared to official UCR categorizations. In this project, we suggest two metrics for the evaluation of crime topic models: coherence and spatial concentration. The importance of space comes into play through Weisburd’s law of crime concentration, which states that a large percentage of crime occurs in a small area of a city. We investigate the extent to which topic models that improve coherence lead to higher levels of crime concentration. Through analyzing a dataset of crime reports from Los Angeles, CA, we find that Latent Dirichlet Allocation (LDA) generates crime topics with both higher coherence and crime concentration. While NMF improves the coherence compared to UCR categorization, the spatial concentration is not as high. These findings have important implications for hotspot policing.

Publications

  • Ritika Pandey, Jeremy Carter, James Hill, George Mohler, "Rewiring police officer training networks to reduce forecasted use of force", 2023. Under Review. ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '23). Association for Computing Machinery, New York, NY, USA.
  • Ritika Pandey, P. Jeffrey Brantingham, Craig D. Uchida and George Mohler, "Building knowledge graphs of homicide investigation chronologies", 2020. International Conference on Data Mining Workshops (ICDMW), Sorrento, Italy, 2020, pp. 790-798, doi: 10.1109/ICDMW51313.2020.00115.
  • John Lu, Sumati Sridhar, Ritika Pandey, Mohammad Al Hasan, and George Mohler, "Investigate Transitions into Drug Addiction through Text Mining of Reddit Data", 2019. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '19). Association for Computing Machinery, New York, NY, USA, 2367–2375. https://doi.org/10.1145/3292500.3330737
  • Ritika Pandey, George Mohler, "Evaluation of crime topic models: topic coherence vs spatial crime concentration", 2018. IEEE International Conference on Intelligence and Security Informatics (ISI), Miami, FL, USA, 2018, pp. 76-78, doi: 10.1109/ISI.2018.8587384