Getting Started with MONK

If this is your first time using MONK, please have a look at the tutorials. If you just want a quick look around, you can log in as "guest" with the password "guest", but note that your work can be changed by the next person to log in. For a private workspace, sign up for a login account, then log in to create, compare, and analyze worksets. Look for context-sensitive help to guide you as you explore.

An Introduction to MONK

This instance of the MONK Project includes approximately 525 works of American literature from the 18th and 19th centuries, as well as 37 plays and 5 works of poetry by William Shakespeare. The American literary texts have been generously provided by libraries at Indiana University, the University of North Carolina at Chapel Hill, and the University of Virginia; the Shakespeare texts were provided by Martin Mueller. A larger collection, including about a thousand works of British literature from the 16th through the 19th centuries, provided by the Text Creation Partnership (EEBO and ECCO) and ProQuest (Chadwyck-Healey Nineteenth-Century Fiction), will be available in July of 2009 to users at CIC (Big Ten) institutions.

MONK provides these texts along with tools to enable literary research through the discovery, exploration, and visualization of patterns. Users typically start a project with one of the toolsets that has been predefined by the MONK team. Each toolset is made up of individual tools (e.g., a search tool, a browsing tool, a rating tool, and a visualization tool), and these tools are applied to worksets of texts selected by the user from the MONK datastore. Worksets and results can be saved for later use or modification, and results can be exported in some standard formats (e.g., CSV files).

Each of these texts is normalized (using Abbot, a complex XSL stylesheet) to a TEI schema designed for analytic purposes (TEI-A). Each text has then been "adorned" (using MorphAdorner) with tokenization, sentence boundaries, standard spellings, parts of speech, and lemmata before being ingested (using Prior) into a database. The database provides Java access methods for extracting data for many purposes: searching for objects; direct presentation in end-user applications as tables, lists, concordances, or visualizations; getting feature counts and frequencies for analysis by data-mining and other analytic procedures; and getting tokenized streams of text for working with n-gram and other collocation analyses, repetition analyses, and corpus query-language pattern-matching operations. Finally, MONK's quantitative analytics (naive Bayesian analysis, support vector machines, Dunning's log-likelihood, and raw frequency comparisons) are run through the SEASR environment.
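
To give a feel for what a Dunning's log-likelihood comparison measures, here is a minimal Python sketch of the standard G2 statistic for one word across two worksets. It is an illustration only, not MONK's or SEASR's actual implementation: the function name, its parameters, and the sample counts are all hypothetical, and it assumes you already have a word's count and the total token count for each workset.

```python
import math


def log_likelihood(count_a, total_a, count_b, total_b):
    """Dunning's G2 for one word in two worksets (hypothetical helper).

    count_a, count_b: occurrences of the word in workset A and workset B.
    total_a, total_b: total number of tokens in each workset (must not both be 0).
    """
    # 2x2 contingency table: rows are worksets, columns are
    # "this word" versus "every other token".
    observed = [
        [count_a, total_a - count_a],
        [count_b, total_b - count_b],
    ]
    row_totals = [total_a, total_b]
    grand_total = total_a + total_b
    col_totals = [count_a + count_b, grand_total - (count_a + count_b)]

    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            # Expected count if the word were equally frequent in both worksets.
            e = row_totals[i] * col_totals[j] / grand_total
            if o > 0:  # a zero cell contributes nothing to the sum
                g2 += o * math.log(o / e)
    return 2.0 * g2


# Made-up counts: a word seen 50 times in 1,000 tokens versus 10 times in 1,000.
score = log_likelihood(50, 1000, 10, 1000)
print(round(score, 2))
```

A score near zero means the word is about equally frequent in both worksets; larger scores mean the difference in frequency is less likely to be due to chance. The score alone does not say which workset overuses the word, so in practice it is read alongside the raw relative frequencies.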