Pina Gondaliya: 09/20/21

Hello Readers!

Welcome to my blog. This post is a thinking activity on Digital Humanities. Our syllabus includes a unit on Digital Humanities, for which we completed the course Introduction to Digital Humanities offered by HarvardX on edX. After that, we had to complete one thematic activity from the CLiC Activity Book. This task was assigned by Dr. Dilip Barad, Bhavnagar University.

Chapter 1

Library collections and the research journey, part 1

A library is often the first place, or among the first places, that you might turn to when looking for data and other sources that you want to incorporate into your research. Libraries face many new and exciting challenges today as they grapple with how to help researchers produce digital scholarship, and how they will store and disseminate that digital scholarship when it's finished.

We want to introduce you to some of the ways in which libraries approach these questions and challenges. At the same time, and as you will see, these questions are often approached collaboratively and in conversation between library staff and researchers. These videos show a conversation between Laura Wood, Associate University Librarian for Research and Education, and Stephen Osadetz, Assistant Professor of English. Think about the ways in which you might call upon library staff or library collections as you pursue your own research interests.

Art museum collections, part 1

Similar to libraries, art museums face many unique and interesting questions about how to create and store digital scholarship. As you will see in the following videos, many times the digital scholarship conducted within a museum is done in the spirit of creating new meaning or providing new connections between viewers and art objects. In many cases, data about museum collections provides new meaning, or reveals otherwise hidden meaning, about an individual piece of art or an entire collection of art.

What is Data?

A simplified picture of the traditional way academics work with information is that we have our sources, which we collect, study, and interpret. Then, we write the results, which we publish as articles or books. A digital scholarship workflow looks similar. You have sources in the form of data that you have to acquire, analyze, and interpret. Then, you present the results in the form of books, but also website visualizations or some other electronic medium.

Digital methods are not without their challenges, though. Oftentimes, for instance, researchers will start a project and only realize part-way into their work that they need to visualize their data digitally. For people in this situation, acquiring data can be a problem if the data they use is not in a format that can easily be translated to something digitally useful.

When it comes to data science work, the vast majority of your time is spent managing data. Likewise, researchers embarking on a digital scholarship project often find that much of their time is spent cleaning up data and ensuring that they accurately understand what the data is representing. For instance, if a researcher is dealing with a million rows of tabular data that have been collected over years, there are often differences between entries that will need to be reconciled in order to perform large-scale analyses. Examples of this could include the formatting of dates or the abbreviation of names.
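The date clean-up mentioned above can be sketched in a few lines of Python. This is only a minimal sketch, not the method used in the course: the sample entries and the list of accepted formats are assumptions made up for illustration, and a real dataset would probably need more formats.

```python
from datetime import datetime

# Hypothetical date strings, as they might appear in rows collected over many years.
RAW_DATES = ["1789-04-30", "04/30/1789", "30 April 1789", "April 30, 1789"]

# Formats we assume occur in the data (an assumption, not an exhaustive list).
KNOWN_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d %B %Y", "%B %d, %Y"]


def normalize_date(text):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    return None


normalized = [normalize_date(entry) for entry in RAW_DATES]
print(normalized)
```

Entries that match none of the formats come back as `None`, so they can be reviewed by hand instead of being silently guessed.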

Chapter 2

👉Introduction to lesson 2

DIGITAL HUMANITIES PROJECTS, TOOLS, AND QUESTIONS THEY SUPPORT

List tools of data analysis that can be applied to text in any language, space, networks, images, and statistical analysis. Evaluate existing digital platforms based on features that can be used for data analysis within such fields as literature, history, art, and music.

Project: Visualizing Broadway

This lesson's major focus is showing you in more detail how specific people created and delivered their own digital humanities projects. The examples in this section include a discussion about each project and then a demonstration of some of the tools that were used to create that project.

As you learn more about these projects and their corresponding tools, keep in mind tools or ideas that you might find useful to your own work and interests.

Tool for Network Analysis: Gephi

Derek Miller will now show and discuss one of the tools that he used in his project Visualizing Broadway, which is called Gephi. This tool is used to perform network analysis on a dataset. Network analysis is one very common digital humanities research method.
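To give a feel for what a network analysis measures, here is a small Python sketch that counts how many connections each person has (their "degree"). It does not use Gephi or the Visualizing Broadway data; the names and links below are invented placeholders.

```python
from collections import defaultdict

# Hypothetical edge list: each pair means two people worked on the same show.
EDGES = [
    ("Person A", "Person B"),
    ("Person A", "Person C"),
    ("Person B", "Person C"),
    ("Person C", "Person D"),
]


def degree_by_node(edges):
    """Count the distinct neighbours of each node in an undirected network."""
    neighbours = defaultdict(set)
    for left, right in edges:
        neighbours[left].add(right)
        neighbours[right].add(left)
    return {node: len(linked) for node, linked in neighbours.items()}


degrees = degree_by_node(EDGES)
most_connected = max(degrees, key=degrees.get)
print(degrees, most_connected)
```

A tool like Gephi computes measures of this kind and draws the network as well; the sketch only shows the counting idea.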

Tool for text analysis

In this series of videos you will learn about the Text Encoding Initiative (TEI), and the XML framework that allows for the mark-up and display of text in various online formats. TEI and XML will be described in the context of medieval manuscripts, which can be analyzed in many ways once they are encoded digitally using XML tagging.

One of the main characteristics of this type of tagging is the format for the "open" and "closed" tags. A closed tag is a repeat of the open tag with a "/" at the beginning. Because these tags do not include additional instructions that tell a computer how to display the content within the tags, additional lines of code, called "CSS," are added to control how the text looks on a given webpage. CSS stands for cascading style sheets. Keep an eye out for this structure within the videos. XML files will also be discussed further in Lesson 3.
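The open and closed tag structure described above can be checked with the XML parser in Python's standard library. The fragment below is made up in the spirit of TEI and is not a valid TEI document; the element names `line` and `name` are only illustrative.

```python
import xml.etree.ElementTree as ET

# Made-up fragment: every open tag such as <name> has a matching closed tag </name>.
SNIPPET = "<line>Here begins the book of <name>Arthur</name> the king.</line>"


def is_well_formed(markup):
    """Return True if every open tag in the markup is properly closed."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


root = ET.fromstring(SNIPPET)

# The parser exposes the text inside each pair of tags.
tagged_names = [element.text for element in root.iter("name")]
plain_text = "".join(root.itertext())

print(root.tag, tagged_names)
print(plain_text)
print(is_well_formed("<name>Arthur"))  # closed tag is missing here
```

Note that nothing in the fragment says how the name should look on a page; that is the job left to CSS.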

Tools for Geospatial Analysis

Humanists make maps for the same reason those in public health, public policy, and urban development make maps. They make maps because they believe that the “where” matters, and our job as humanists is to explain why where matters. This section will give you a sense of the process of taking humanistic information sources that you're used to working with, such as letters, documents from archives, imperial decrees, travel accounts, postcards, and historical maps themselves, and transforming that familiar material into data, and specifically into spatial data. This process often strikes many humanists as the least familiar and perhaps the least comfortable part of the work. That's because you're taking something that is complex and nuanced and separating it out into its constituent parts, which can feel a bit reductive. But you can also think of it as a process of distillation or disaggregation: identifying all of the pieces that make your source complex and separating them out so that we can either put them right back together again in the same way or build something new with them.

The process is complex and it's important to keep in mind three major rules of data organization. These are transparency, consistency, and the most important, the golden rule, one piece of information in each cell of your spatial data. If you follow those rules and if you're consistent and transparent in your method you will create beautiful spatial data and you will eventually find answers to the question, “Why does where matter?”
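The golden rule can be illustrated with a short Python sketch. The rows below are invented, and the column names (`sent_from`, `place`, `country`, `year`) are my own assumptions; the point is only to show one cell holding three pieces of information being split into three cells.

```python
import csv
import io

# Hypothetical rows that break the golden rule: place, country and year share one cell.
MESSY = "\n".join([
    "id,sent_from",
    "1,Paris; France; 1789",
    "2,Vienna; Austria; 1791",
])


def split_cells(messy_csv):
    """Give each piece of information its own column."""
    rows = []
    for row in csv.DictReader(io.StringIO(messy_csv)):
        place, country, year = [part.strip() for part in row["sent_from"].split(";")]
        rows.append({"id": row["id"], "place": place, "country": country, "year": year})
    return rows


tidy = split_cells(MESSY)

# Write the tidy rows back out as CSV text.
output = io.StringIO()
writer = csv.DictWriter(
    output, fieldnames=["id", "place", "country", "year"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(tidy)
print(output.getvalue())
```

Splitting on a fixed separator only works when the data is consistent, which is why consistency sits next to the golden rule in the list above.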

More Digital humanities projects examples

Cleveland Historical

Cleveland Historical is a free mobile app that showcases historic people, places, and moments in Cleveland, Ohio. The app incorporates layered, map-based, multimedia presentations. The tours have been created by a variety of groups and individuals, all contributing their curated tour to the app. This mobile app is one of a few projects created by the Cleveland State University Center for Public History + Digital Humanities.

Digital Transgender Archive

The purpose of the Digital Transgender Archive (DTA) is to increase the accessibility of transgender history by providing an online hub for digitized historical materials, born-digital materials, and information on archival holdings throughout the world. Based in Worcester, Massachusetts at the College of the Holy Cross, the DTA is an international collaboration among more than fifty colleges, universities, nonprofit organizations, public libraries, and private collections. By digitally localizing a wide range of trans-related materials, the DTA expands access to trans history for academics and independent

Chapter 3

👉Introduction to Lesson 3

ACQUIRING, CLEANING, AND CREATING DATA

Understanding Shapes of Data

You may be thinking about this idea "the shapes of data" for the first time. All this really means is that the structure around words and numbers can enable different types of analysis by either a human or a computer reading the text. Read on to learn about the three major categories: unstructured, structured, and semi-structured data.

Unstructured Data

Unstructured data is data that is not organized into distinct, pre-defined semantic units. Typically, this means textual data, which could be a written account, literary work, newspaper article, or anything else represented as text. Human beings can process unstructured data by reading it, understanding its contents, and making inferences using context and common sense – all tasks which are extremely difficult for computers. Unstructured data is hard for computers to process in meaningful ways.

Example:

George Washington was the first President of the United States, serving from April 30th, 1789 through March 4, 1797. He was succeeded by John Adams, who remained president until 1801, standing down on March the 4th.

Structured Data

Structured data is data organized according to some particular data model, which explicitly defines the structure of the data. For example, key data about the presidents of the United States might be structured according to a model requiring that, for each president, we record one row of a table with the president's name, the date they took office (“Took office”), and the date they left office (“Left office”).

Because this data is structured, any software processing can – and should – assume that each row of the table represents one individual president. Additionally, software can assume that the data in the “Took office” column is always one single date (as opposed to say a number, or name, or description, or time of day) representing the date that the president took office. This makes structured data easy to process. For instance, we could immediately calculate the length of time each president was in office by subtracting the date “Took office” from “Left office” in each row. This simple operation would be far harder to perform computationally on unstructured data, even if it contained all of the same information.
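The subtraction described above is easy to try in Python. This is a minimal sketch: the two rows use the dates given in the unstructured example earlier, while the field names (`name`, `took_office`, `left_office`) are my own choice for illustration.

```python
from datetime import date

# One row per president, built from the dates in the unstructured example above.
PRESIDENTS = [
    {"name": "George Washington", "took_office": date(1789, 4, 30), "left_office": date(1797, 3, 4)},
    {"name": "John Adams", "took_office": date(1797, 3, 4), "left_office": date(1801, 3, 4)},
]


def days_in_office(row):
    """Subtract the "Took office" date from the "Left office" date."""
    return (row["left_office"] - row["took_office"]).days


terms = {row["name"]: days_in_office(row) for row in PRESIDENTS}
print(terms)
```

Because every row has the same shape, the same one-line calculation works for every president without reading any sentences.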

Semi-structured Data

Semi-structured data is data which does not conform to a formal data model, but uses formal constructs to indicate separate semantic elements within the data. One method of indicating semantic elements is by adding special codes, or “markup,” to what is otherwise unstructured text. For example, we could add pairs of codes like “<name>” and “</name>” around each occurrence of a proper name, and “<date>” and “</date>” around each occurrence of a date to make some aspects of an otherwise unstructured document more easily processable computationally. This approach can be used to avoid ambiguities and appeals to context and common sense when processing content written in natural languages like English. In the following example, note how the “<date>” codes add an additional element representing each date in a common standardized form:

<name>George Washington</name> was the first president of the <placeName>United States</placeName>, serving from <date when="1789-04-30">April 30th, 1789</date> through <date when="1797-03-04">March 4, 1797</date>. He was succeeded by <name>John Adams</name>, who remained president until <date when="1801">1801</date>, standing down on <date when="1801-03-04">March the 4th</date>.

Although not as simple to process as structured data, semi-structured data can be used to add information to unstructured data that allows for effective processing in many applications.
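As a small illustration of that kind of processing, the marked-up sentence above can be read with Python's standard XML parser. This is only a sketch: the surrounding `<p>` element is my addition, needed so that the parser sees a single root element.

```python
import xml.etree.ElementTree as ET

# The marked-up example from above, wrapped in a <p> element (our addition).
MARKED_UP = (
    "<p><name>George Washington</name> was the first president of the "
    "<placeName>United States</placeName>, serving from "
    '<date when="1789-04-30">April 30th, 1789</date> through '
    '<date when="1797-03-04">March 4, 1797</date>. He was succeeded by '
    "<name>John Adams</name>, who remained president until "
    '<date when="1801">1801</date>, standing down on '
    '<date when="1801-03-04">March the 4th</date>.</p>'
)

root = ET.fromstring(MARKED_UP)

# Collect every tagged name, and every date with its standardized "when" value.
names = [element.text for element in root.iter("name")]
dates = [(element.text, element.get("when")) for element in root.iter("date")]

print(names)
for written, standardized in dates:
    print(written, "->", standardized)
```

The program never has to work out that “March the 4th” means 4 March 1801; the `when` attribute already says so.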

Shape of Data Summary

There are three broad categories of data: structured, unstructured, and semi-structured. Data that is organized according to an explicit, well-defined data model is called structured data. Unstructured data has no distinct, pre-defined units. Data that does not conform to a data model, but uses formal constructs, such as mark-up codes to indicate internal semantic elements, is called semi-structured data.

Unit 3.3 File types and definition

Understanding file types

File Types and Definitions

Data can be stored in a variety of different file types. Read on to learn about the major file types that you will encounter in Digital Humanities projects, along with the advantages and limitations of using each one. Specifically, we will cover plain text, CSV, Text, JSON, HTML, XML, Binary, MP3, and WAV file types.

HTML

Hypertext Markup Language (HTML) is a format used to represent complex documents. For instance, the vast majority of web pages are primarily represented in HTML. Based upon the foundation of plain text, and unlike data-centric formats such as JSON, HTML uses a document-focused approach in which markup, meaning special codes representing additional information, is added to what is essentially a plain text document. In HTML, these codes primarily consist of tags, which are pieces of code providing additional data to elements of a document.

Tags in HTML are always enclosed between the two symbols “<” and “>”, and normally come in pairs: an opening tag and a corresponding closing tag. The sequence of characters appearing immediately after the opening or closing symbols specifies the type of element being described. Different elements are used to express different types of information. In HTML, elements are often used to convey formatting information. For example, the “b” element causes whatever text appears between its opening and closing tags to be displayed in bold. Opening tags can also contain additional data in attribute-value pairs. Closing tags always have the same element type as the corresponding opening tag. In addition to representing general formatting information, an important use of tags in HTML is to provide machine-readable references to other documents. In particular, hyperlinks, represented using the “a” element, contain an attribute specifying a reference to another web document, and are displayed by a web browser as clickable links directing the user to the specified location.
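To see how software can read these machine-readable references, here is a minimal Python sketch that collects the targets of hyperlinks using the standard library's HTML parser. The sample HTML and the example.org address are placeholders invented for this post.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every "a" element."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (attribute, value) pairs from the opening tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


# Made-up snippet with one bold element and one hyperlink.
SAMPLE = '<p>Read <b>more</b> on <a href="https://example.org/page">this page</a>.</p>'

collector = LinkCollector()
collector.feed(SAMPLE)
print(collector.links)
```

The “b” element is simply skipped here, because it carries formatting information and no reference to another document.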

Binary Files

File types can be separated into two classes: text-based formats, and binary (i.e. non-text-based) formats. Text-based formats represent their contents on the most basic level as a sequence of characters; binary formats typically use a mixture of representations for storing data. The two are easily distinguished in practice: when viewed in a text editor, the contents of any text-based format file will appear as a coherent mixture of characters and a relatively small number of codes. By contrast, any file in a binary format when opened in a text editor will typically display large amounts of garbled, unintelligible material, if it can be opened at all.

Binary files represent their contents using methods other than character encodings. For example, a text file containing the content “123” stores a sequence of three numbers, one for each of the characters “1”, “2”, and “3”, as defined by an encoding such as UTF-8. A binary file will instead often represent the numeric value 123 directly as a single number. Because this representation is different from any character encoding, a text editor typically cannot display the content in a meaningful way regardless of which character encoding is used. A binary file generally requires special software designed to process the specific format of the binary file.
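The “123” example can be reproduced in Python. This is a small sketch; storing the number in a single unsigned byte is my own choice for illustration, since real binary formats store numbers in many different ways.

```python
import struct

# Text representation: one byte per character "1", "2", "3" in UTF-8.
text_bytes = "123".encode("utf-8")

# Binary representation: the value 123 packed directly into one unsigned byte.
binary_bytes = struct.pack("B", 123)

print(list(text_bytes))
print(list(binary_bytes))

# What a text editor would show if it read the binary byte as a character.
print(binary_bytes.decode("utf-8"))
```

The text version takes three bytes (49, 50, 51), while the binary version takes a single byte with the value 123. Read as text, that byte is the character “{”, which has nothing to do with the number it was meant to store.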

Unit 3.4 Ways of creating data

Introduction to Digital Data creation

Digital Humanities data is often generated from physical, analog sources. For instance, if you wanted to digitize a rare book from the 16th century, you might retrieve the physical book, photograph the pages, and perform optical character recognition (OCR) to recognize written characters and transform the images into a text file format. In this lesson, we will explore several different modes of digitization that are common within the Digital Humanities, ranging from text digitization to archaeological artifact digitization, along with examples of their use.

Digitizing Objects

Many humanities practitioners, such as archaeologists and art historians, study the broader world of objects that humans create, as well as their texts. Such objects may include the pictures people take, the physical artworks they produce, or the everyday utilitarian objects they use. Digital Humanities practitioners may want to digitize objects for a variety of reasons. For instance, some practitioners seek to understand the properties of objects at an aggregate or statistical scale via computational strategies. In this case, digitization of objects might involve collecting tabular data that contains measurements about an artifact and digitally representing it in a relational database.
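As a rough sketch of what “tabular data in a relational database” can look like, the Python example below stores a few measurements in an in-memory SQLite database and summarizes them. The table name, the columns, and all of the measurements are invented for illustration.

```python
import sqlite3

# Hypothetical measurements for three artifacts; names and numbers are invented.
ARTIFACTS = [
    ("bowl", "ceramic", 12.5, 340.0),
    ("jar", "ceramic", 30.0, 1250.0),
    ("blade", "bronze", 18.2, 95.5),
]

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE artifact (kind TEXT, material TEXT, height_cm REAL, weight_g REAL)"
)
connection.executemany("INSERT INTO artifact VALUES (?, ?, ?, ?)", ARTIFACTS)

# Aggregate view: number of objects and average height per material.
summary = connection.execute(
    "SELECT material, COUNT(*), AVG(height_cm) FROM artifact "
    "GROUP BY material ORDER BY material"
).fetchall()
connection.close()

print(summary)
```

With only three rows this is trivial, but the same query works unchanged on thousands of artifacts, which is the aggregate scale the paragraph above has in mind.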

Others digitize objects so as to present them to broader audiences. The objects they seek to represent, for instance, may be at risk of being destroyed and necessitate digital preservation at the very least. Additionally, objects in museums may only be accessible to a small group of people, making it necessary to digitize the objects in order to share them more broadly with the world. For the purposes of presentation, it is often most effective to take high resolution 2-dimensional images, or 3-dimensional scans of the objects, so they may be experienced in similar ways as the original analog artifact.

Digitizing Audio/visual information

Whether they study live music, or seek to visually capture the human experience, other digital humanities practitioners digitally record information about the world around them via images, video, and audio. It might also be necessary to convert analog forms (i.e. paper photographs, tapes, or vinyl records) to digital in order to allow for unique digital analyses of these different forms. Digital forms also facilitate the presentation to (and preservation for) new audiences through popular services such as YouTube. Furthermore, high-resolution scans of images, or digital representation of audio and video may allow researchers to teach machine learning models to identify particular features in the data that the human eye/ear could not detect or broad-scale patterns that would be time-consuming to identify manually.

Example:

One example of digitizing audio/visual information is making a documentary film, to record visual and audio information, or podcast, to record audio, about an experience or topic. Alternatively, digitizing a historical film to preserve the film and present it to new audiences on the YouTube platform is another example of digitizing audio/visual information.

Unit 3.5 Getting Data

Different ways of getting data

There are a variety of different ways to get Digital Humanities data. In this unit of the course, we will explore several of the most common approaches. For instance, we will discuss how to identify online data repositories and download data from them. We will also discuss how you might access data stored in relational databases, as well as data accessible via Web APIs. For those cases where online data is not easily accessible via APIs or repository downloads, we additionally discuss the use of web scraping to get data. Finally, at the end of the unit, you will have the opportunity to practice getting structured data about books via the Google Books API.
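To show roughly what getting data from a Web API involves, here is a Python sketch for the Google Books API. It builds a search address for the public volumes endpoint and then reads titles and authors from a response. To keep it runnable offline, the response used here is a short hand-written stand-in, not real API output, and the helper names are my own; the `items` and `volumeInfo` fields follow my understanding of the API's documented response shape and should be checked against the official documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://www.googleapis.com/books/v1/volumes"


def build_search_url(query, max_results=5):
    """Build the address for a volume search."""
    return BASE_URL + "?" + urlencode({"q": query, "maxResults": max_results})


def fetch_volumes(query):
    """Call the API over the network (defined for completeness, not run here)."""
    with urlopen(build_search_url(query)) as response:
        return json.load(response)


def titles_and_authors(payload):
    """Pull the title and author list out of each returned volume."""
    results = []
    for item in payload.get("items", []):
        info = item.get("volumeInfo", {})
        results.append((info.get("title"), info.get("authors", [])))
    return results


# Hand-written stand-in for an API response, so the sketch runs without a network.
SAMPLE_RESPONSE = json.loads(
    '{"totalItems": 1, "items": [{"volumeInfo": '
    '{"title": "Bleak House", "authors": ["Charles Dickens"]}}]}'
)

search_url = build_search_url("charles dickens")
books = titles_and_authors(SAMPLE_RESPONSE)
print(search_url)
print(books)
```

Replacing `SAMPLE_RESPONSE` with `fetch_volumes("charles dickens")` would send a real request.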

Unit 3.6 Digital Ethics

Copyright

All researchers should understand the basics of copyright and intellectual property, and the following videos aim to provide those basics. From the perspective of data and other source material that you want to use in your digital humanities project, to how you want to publish or share your project with others, there are a wide variety of copyright considerations that will be helpful for you to know about.

Chapter 4

👉Introduction to Lesson 4

THE COMMAND LINE

A Command-Line Interface or command language interpreter (CLI), also known as command-line user interface, console user interface and character user interface (CUI), is a means of interacting with a computer program where the user (or client) issues commands to the program in the form of successive lines of text (command lines). A program which handles the interface is called a command language interpreter or shell.

A shell prompt indicates that the terminal is ready to receive a command. This prompt is typically shown as a dollar sign, $. You type a command after the shell prompt.

A command is an action that you want your computer to carry out, and it is often abbreviated using just a few letters, such as mkdir for the command to make a new directory.

A directory is a folder or taxonomy of files saved in a specific location on your computer. Files saved on your computer's desktop, for example, are saved in the desktop directory.

A file system refers to the way in which files are named, stored, and retrieved within your computer. File systems can differ across Mac, Windows, and Linux-based operating systems. File systems also include metadata about files such as date created, date modified, last date of access, last backup, file size, and access permissions.
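The ideas of directories and file metadata can also be explored from Python, which asks the file system the same questions a shell would. This is a minimal sketch; the directory name `notes` and the file name `draft.txt` are made up, and everything happens inside a temporary folder that is deleted afterwards.

```python
import os
import tempfile
from datetime import datetime

with tempfile.TemporaryDirectory() as workspace:
    # Similar in spirit to typing `mkdir notes` at the shell prompt.
    notes_dir = os.path.join(workspace, "notes")
    os.mkdir(notes_dir)

    # Save a small file inside the new directory.
    file_path = os.path.join(notes_dir, "draft.txt")
    with open(file_path, "w", encoding="utf-8") as handle:
        handle.write("Digital Humanities")

    # Ask the file system for metadata about the file.
    info = os.stat(file_path)
    size_in_bytes = info.st_size
    modified = datetime.fromtimestamp(info.st_mtime)
    listing = os.listdir(notes_dir)

print(listing, size_in_bytes, modified)
```

Which metadata fields are available, and how precise they are, can differ between Mac, Windows, and Linux file systems.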

Unit 4.2 Introduction to command line functions

Optional installing a virtual machine

If you would like to experiment using the command line, one option is to install a virtual machine running Ubuntu, a free Linux operating system (for more information about the operating system, consult the online manual here). The videos in this lesson are based on this Ubuntu operating system. If you try to use these command line functions within a different operating system (Mac or Windows, for example) some of the necessary commands may be different than what you see in the videos.

This step is not required for answering most of the graded questions in this lesson. Nearly all graded questions can be answered by watching the videos. In previous versions of this course, we attempted to guide all students through the Ubuntu installation so that the command line interface would be the same for everyone. Due to updates in the software, however, this became overly complicated.

If you use a Mac operating system, you can access the terminal window by searching for "terminal" on your computer.

Chapter 5

👉Introduction to Lesson 5

Working with tools: Voyant

Introduction to Voyant

The command line and other programmatic tools are useful for automating batch operations, manipulating files, automatically writing to spreadsheets, customizing summary statistics, and any number of programmatic tasks. However, there are easier ways to calculate standard summary statistics and create great visualizations for your text analyses.

Voyant Tools (voyant-tools.org), for instance, is a free online tool that automatically calculates summary statistics for large bodies of text data. Voyant also provides an interface for extending these calculations with visualizations, such as word clouds and bar charts, to display your findings. While not as customizable as the command line or another programmatic interface, Voyant provides most of the tools you need for an exploratory analysis of text data, either at the individual text or corpus level.
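For comparison, here is what a very small programmatic version of such summary statistics might look like in Python. It is not how Voyant works internally; the sample sentence is invented and the simple word-splitting rule is an assumption.

```python
import re
from collections import Counter

# Invented sample text standing in for a document or corpus.
SAMPLE_TEXT = (
    "The fire burned low. She sat by the fire and read by the light of the fire."
)


def summarize(text, top=3):
    """Count running words (tokens), distinct words (types) and the most frequent words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return {
        "tokens": len(words),
        "types": len(counts),
        "top": counts.most_common(top),
    }


summary = summarize(SAMPLE_TEXT)
print(summary)
```

Voyant gives these numbers, and the visualizations built on them, without writing any code, which is the trade-off described above.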

Q. 2 Complete one thematic activity from the CLiC Activity Book. Write your interpretations.

CLiC (Corpus Linguistics in Context) is a web app specifically designed for the corpus linguistic study of literary texts. While CLiC shares much of its functionality with other corpus tools — similarly to what is described in the Programming Historian’s lesson ‘Corpus Analysis with AntConc’ — it also contains additional features that are particularly relevant to literary analysis. These include the ability to search subsets of the text – such as character speech – and a sorting function that goes beyond alphabetic sorting: the ‘KWICGrouper’, which this post focuses on. The CLiC web app has been developed as part of the CLiC Dickens project for the analysis of patterns in 19th century fiction, particularly novels by Charles Dickens. CLiC currently contains 15 Dickens novels and 29 novels by other 19th century authors and a corpus of 19th century children’s literature will soon be added.

👉Activity 13.1 The social importance of the fire-place

1. Search for fire in DNov (Dickens’s Novels) in “All text”. This should give you over 1700 hits – too many to analyse in detail so that it’s best to use the KWICGrouper for narrowing down the search.

2. Use the KWICGrouper and left-right sorting to search through the concordance. First set the “Search in span” from L5 to L1 by dragging the slider. Next “Search for types” by typing in back. Hit Return.

3. You should see a highlighted set of concordance lines showing phrases such as back to the fire, like the one below.

Let's start with a simple concordance of fire in Dickens's novels in CLiC. This gives us an overview of how the word is used: see concordance 1. (Fireplace is, curiously, much less frequent and used in a different way!)

This will give you a set of concordance lines. In this activity, we learn how the word is used and how frequently it occurs.
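The idea behind the concordance and the KWICGrouper steps can be sketched in Python. This is not CLiC's own code; the function names are mine, and the two sample sentences are invented rather than quoted from Dickens. The sketch finds every occurrence of the search word, keeps five words of context on each side, and then keeps only the lines where “back” appears in the left context (positions L5 to L1).

```python
import re


def kwic(text, node, span=5):
    """Return (left context, node word, right context) for each hit of the node word."""
    words = re.findall(r"[A-Za-z']+", text)
    lines = []
    for index, word in enumerate(words):
        if word.lower() == node:
            left = words[max(0, index - span):index]
            right = words[index + 1:index + 1 + span]
            lines.append((left, word, right))
    return lines


def group_by_left_word(lines, target):
    """Keep only the lines where the target word occurs in the left context."""
    return [line for line in lines if target in [w.lower() for w in line[0]]]


# Invented sample sentences, not quotations from the novels.
SAMPLE = (
    "He stood with his back to the fire and said nothing. "
    "The fire was nearly out when she returned."
)

concordance = kwic(SAMPLE, "fire")
matches = group_by_left_word(concordance, "back")

for left, node, right in matches:
    print(" ".join(left), "[" + node + "]", " ".join(right))
```

In CLiC the same narrowing is done by dragging the slider and typing the word, and the matching lines are highlighted instead of filtered out.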

Citation

Course. edX. (n.d.). Retrieved September 20, 2021, from https://learning.edx.org/course/course-v1:HarvardX+DigHum_01+2T2021/home.

Google. (n.d.). Digital humanities kirschenbaum_ade150.pdf. Google Drive. Retrieved September 20, 2021, from https://drive.google.com/file/d/1Iq5T2ghtV15YdarHhlRIDME1Opliktdu/view?usp=sharing.

Wikimedia Foundation. (2021, September 8). Digital humanities. Wikipedia. Retrieved September 20, 2021, from https://en.m.wikipedia.org/wiki/Digital_humanities.

Thinking Activity:Digital Humanities