PI: Christos Faloutsos
Co-PI(s): N/A
University: Carnegie Mellon University
Industry partner: Marinus Analytics
Given millions of escort advertisements, how can we spot human trafficking (HT) and other organized behavior? The key insight is that ads for multiple trafficking victims are often similar, since the average controller posts ads for 4-6 victims at once. However, text similarity also picks up other modi operandi (M.O.s) in escort ads, such as spam, scams, and massage parlors. How can we separate these behaviors, and what other insights about traffickers can we glean by inspecting escort ad data?

We are continuing our collaboration with Marinus Analytics, a Pittsburgh-based, woman-owned company that creates tools to enable HT detection. In 2020, thanks to PITA funding, we jointly developed InfoShield, an algorithm that finds groups of similar advertisements based on their text; it was showcased in the news through CMU and WXPI. Marinus has hired an intern to incorporate InfoShield into their pipeline. In 2021, we developed TrafficVis, an interactive tool that enables domain experts to quickly label clusters of ads as spam, scam, trafficking, and more, providing an estimated 86x speedup over manual labeling. TrafficVis won the Best Poster Honorable Mention at IEEE VIS 2021. Our work on both projects has been showcased in many talks that Marinus has given to crime analysts, including talks for the International Association of Crime Analysts, the National Cyber-Forensics and Training Alliance, and the Association of Crime and Intelligence Analysts.

Through our long-standing collaboration with Marinus Analytics, we have worked on various datasets and subproblems towards HT detection. To best serve Marinus' needs, we are currently focusing on the intersection of data mining and visualization, since the results of our algorithms are only actionable if investigators can understand them. Through our weekly meetings, we have jointly distilled the following future research tasks: (a) M.O. detection, i.e., separating spam, scam, and HT clusters; (b) spatio-temporal analysis, i.e., finding multiple traffickers moving in tandem; (c) visualization of time-evolving evidence graphs, enabling law enforcement to see patterns more quickly; and (d) synthetic data generation, so that other researchers can collaborate on HT without disclosing PII of possible victims.
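
To make the key insight concrete (ads posted by the same controller tend to share text), the following is a minimal illustrative sketch of grouping ads by text overlap. It is not InfoShield and makes no claim about how InfoShield works: it swaps in a simple, well-known technique (Jaccard similarity over word 3-gram shingles, merged with union-find). The function names, the 0.5 threshold, and the sample ads are all assumptions invented for illustration.

```python
from itertools import combinations


def shingles(text, n=3):
    """Return the set of word n-grams (shingles) in a lowercased ad."""
    words = text.lower().split()
    if len(words) < n:
        # Very short ads are represented by a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_similar_ads(ads, threshold=0.5):
    """Group ad indices whose shingle sets overlap at or above `threshold`.

    Hypothetical helper: similar pairs are merged transitively with union-find.
    """
    parent = list(range(len(ads)))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    sets = [shingles(ad) for ad in ads]
    for i, j in combinations(range(len(ads)), 2):
        if jaccard(sets[i], sets[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(ads)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Invented sample ads: the first two differ only in one name.
sample_ads = [
    "new in town sweet and friendly call anna today",
    "new in town sweet and friendly call bella today",
    "relaxing deep tissue massage open late walk ins welcome",
]
groups = group_similar_ads(sample_ads)
```

On the sample above, the first two ads end up in one group and the third stays alone. The pairwise comparison is quadratic in the number of ads, so this sketch only suits small samples; a corpus of millions of ads would need a scalable approach. Note also that grouping by text alone does not distinguish trafficking from spam or scams, which is why M.O. detection is listed as a separate task.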