Identifying Anonymous Online Message Senders: A Proposal Toward a Linguistic Fingerprint Biometric Database (LFBD)

55 Pages Posted: 27 Dec 2017

Refat Aljumily

Freelance Researcher and Lecturer

Date Written: December 26, 2017

Abstract

Cyber forensics is an emerging field, and its scope stands to benefit from further contributions. Criminals and terrorists often use online messaging services, over the internet or through various websites, to commit illegal acts or to carry out actions that put people's lives at risk. Online anonymity lets any internet user send a message that cannot be traced back to him or her, especially when the account is registered under a false name and other false information. This paper proposes establishing a database for identifying anonymous online message senders based on a representation of aspects of their writing. The proposed database can be used in conjunction with other forensic tools to support digital forensic investigators by generating ideas and hypotheses about anonymous online message senders. The hope is that developing such a database will not only help identify anonymous online message senders so that they can be traced or prosecuted, but will also yield information useful in solving a range of online criminal activities. To test the applicability of the proposed database, the author built a simple database of 221 participants under extreme conditions: short samples of about 2 to 3 lines (35-40 words), single-topic and single-genre data sets, and a large number of participants. The author used syntactic, word-based, and character-based identifiers to represent and define the linguistic profile of each participant. The author also experimented with various data analysis and adjustment methods, including length adjustment, standardization, dimensionality reduction, and clustering models. To select style identifiers, the author applied the term frequency-inverse document frequency (TF.IDF) technique, except when centroid analysis was used. Further, the author validated the test results at each stage of the analysis and found that the stylometric test identified the authorship of anonymous online messages with an accuracy of 60% for function word usage and 50% for part-of-speech frequencies. Although these results do not provide persuasive support for the proposed database, more thorough testing is still needed with an expanded profile of more than 100 words for each participant.
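
As a rough illustration of the kind of pipeline the abstract describes (word-based identifiers, length adjustment, standardization, and comparison of profiles), the following is a minimal sketch only. It is not the author's implementation: the function-word list, the function names, and the nearest-profile rule based on Euclidean distance are all assumptions made for the example, and the TF.IDF selection, dimensionality reduction, and clustering steps are omitted.

```python
import math
import re
from collections import Counter

# Hypothetical function-word list; the list used in the paper is not given here.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "for"]


def relative_frequencies(text):
    """Length adjustment: function-word counts divided by total word count."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def standardize(rows):
    """Z-score each column; a column with zero variance becomes all zeros."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / n)
            for c, m in zip(cols, means)]
    return [[(x - m) / s if s > 0 else 0.0
             for x, m, s in zip(row, means, stds)]
            for row in rows]


def nearest_profile(known, anonymous_text):
    """Return the known sender whose standardized profile is closest
    (Euclidean distance) to the anonymous message, plus all distances."""
    names = list(known)
    rows = [relative_frequencies(known[name]) for name in names]
    rows.append(relative_frequencies(anonymous_text))
    z = standardize(rows)
    target = z[-1]
    distances = {name: math.dist(vec, target)
                 for name, vec in zip(names, z[:-1])}
    return min(distances, key=distances.get), distances


if __name__ == "__main__":
    # Toy profiles invented for the example; not data from the study.
    known = {
        "A": "the cat sat in the hat and the dog is in the fog",
        "B": "it is a gift for a friend that is to be a king",
    }
    best, dists = nearest_profile(known, "the bird and the fish in the dish")
    print(best, dists)
```

With samples as short as the 35-40 words used in the study, most function-word counts are zero or one, so profiles of this kind are very sparse. That sparsity is consistent with the abstract's call for testing on profiles longer than 100 words.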

Keywords: Biometrics, Forensic Stylometry, Authorship Identification, Style Identifiers, Hierarchical Modeling, SOM U Matrix

Suggested Citation

Aljumily, Refat, Identifying Anonymous Online Message Senders: A Proposal Toward a Linguistic Fingerprint Biometric Database (LFBD) (December 26, 2017). Available at SSRN: https://ssrn.com/abstract=3093279 or http://dx.doi.org/10.2139/ssrn.3093279

Refat Aljumily (Contact Author)

Freelance Researcher and Lecturer

131 Ashfield close
Newcastle upon Tyne
NEWCASTLE UPON TYNE, Tyne and Wea NE4 6RL
United Kingdom
