Content analysis

1. What is content?

The content that is analysed can be in any form to begin with, but is often converted into written words before it is analysed. The original source can be printed publications, broadcast programs, other recordings, the internet, or live situations. All this content is something that people have created. You can’t do content analysis of (say) the weather – but if somebody writes a report predicting the weather, you can do a content analysis of that.

Content analysis is a research tool used to determine the presence of certain words, themes, or concepts within some given qualitative data (i.e. text). Using content analysis, researchers can quantify and analyze the presence, meanings and relationships of such words, themes, or concepts. Researchers can then make inferences about the messages within the texts, the writer(s), the audience, and even the culture and time surrounding the text.

Definition 1: “Any technique for making inferences by systematically and objectively identifying special characteristics of messages.” (from Holsti, 1968)

Definition 2: “An interpretive and naturalistic approach. It is both observational and narrative in nature and relies less on the experimental elements normally associated with scientific research (reliability, validity and generalizability).” (from Ethnography, Observational Research, and Narrative Inquiry, 1994-2012)

Definition 3: “A research technique for the objective, systematic and quantitative description of the manifest content of communication.” (from Berelson, 1952)

All this is content...

• Print media: newspaper items, magazine articles, books, catalogues
• Other writings: web pages, advertisements, billboards, posters, graffiti
• Broadcast media: radio programs, news items, TV programs
• Other recordings: photos, drawings, videos, films, music
• Live situations: speeches, interviews, plays, concerts
• Observations: gestures, rooms, products in shops

Another way of looking at content is to divide content into two types: media content and audience content. Just about everything in the above list is media content. But when you get feedback from audience members, that’s audience content. Audience content can be either private or public. Private audience content includes:

• open-ended questions in surveys
• interview transcripts
• group discussions.

Public audience content comes from communication between all the audience members, such as:

• letters to the editor
• postings to an online discussion forum
• listeners’ responses in talkback radio.

Content analysis has six main stages:

1. Selecting content for analysis
2. Units of content
3. Preparing content for coding
4. Coding the content
5. Counting and weighting
6. Drawing conclusions

Selecting content for analysis

Content is huge: the world contains a near-infinite amount of content. Even when you do analyse the whole of something you will usually want to generalize those findings to a broader context. Content analysis therefore involves sampling. The body of information you draw the sample from is often called a corpus – Latin for body (typically between 100 and 2000 items).
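Drawing the sample can be as simple as a repeatable random draw from a list of item identifiers. The sketch below is only an illustration: the item names, the corpus size of 1500 and the sample size of 200 are made-up values, and simple random sampling is just one of several possible sampling designs.

```python
import random

def draw_sample(corpus_ids, sample_size, seed=2019):
    """Return a simple random sample of item identifiers, without replacement."""
    rng = random.Random(seed)  # fixed seed so the same draw can be repeated
    return rng.sample(corpus_ids, sample_size)

# Hypothetical corpus: 1500 numbered items (for example, program segments)
corpus_ids = [f"item-{n:04d}" for n in range(1, 1501)]
sample = draw_sample(corpus_ids, 200)
print(len(sample), sample[:3])
```

Fixing the seed means a colleague can reproduce exactly the same sample later, which helps when the selection has to be documented.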

The need for a focus

An example of a focus is: "We’ll do a content analysis of a sample of programs (including networked programs, and songs) broadcast on Radio Lukole in April 2019, with a focus on describing conflict and the way it is managed."

Units of content

To be able to count content, your corpus needs to be divided into a number of units, roughly similar in size. There’s no limit to the number of units in a corpus, but in general the larger the unit, the fewer units you need. If the units you are counting vary greatly in length, and if you are looking for the presence of some theme, a long unit will have a greater chance of including that theme than will a short unit.
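The effect of unit length can be made concrete with a small calculation. It assumes, purely for illustration, that every word independently has the same small chance p of signalling the theme, so a unit of n words contains the theme at least once with probability 1 - (1 - p)^n. Real text is not this simple, and the value of p below is invented.

```python
def chance_of_theme(unit_length_words, p_per_word=0.002):
    """Probability that a unit mentions the theme at least once,
    under the assumed independent-words model."""
    return 1 - (1 - p_per_word) ** unit_length_words

# Compare a short paragraph, an article, and a long document
for n in (50, 500, 5000):
    print(n, round(chance_of_theme(n), 3))
```

Under this model the chance rises steadily with length, which is why units of roughly similar size make counts of "theme present" easier to compare.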

Units of media content

Depending on the size of your basic unit, you’ll need to take a different approach to coding. The main options are (from shortest to longest):

• A word or phrase. If you are studying the use of language, words are an appropriate unit (you can also group synonyms together, and include phrases). Though a corpus may have thousands of words, software can count them automatically.

• A paragraph, statement, or conversational turn: up to a few hundred words.
• An article. This might be anything from a short newspaper item to a magazine article or web page: usually between a few hundred and a few thousand words.
• A large document. This can be a book, an episode of a TV program, or a transcript of a long radio talk.

The longer the unit, the more difficult and subjective is the work of coding it as a whole. Consider breaking a document into smaller units, and coding each small unit separately. However, if it’s necessary to be able to link different parts of the document together, this won’t make sense.

Units of audience content

The types of audience content most commonly produced from research data are:

1. Open-ended responses to a question in a survey (usually all on one large file).
2. Statements produced by consensus groups (often on one small file).
3. Comments from in-depth interviews or group discussions (usually a large text file from each interview or group).

Preparing content for coding

Before content analysis can begin, the content needs to be preserved in a form that can be analysed. For print media, the internet, and mail surveys (which are already in written form) no transcription is needed. However, radio and TV programs, as well as recorded interviews and group discussions, are often transcribed before the content analysis can begin.

Transcribing recorded speech

If you’ve never tried to transcribe an interview by writing out the spoken words, you probably don’t think there’s anything subjective about it. But as soon as you start transcribing, you realize that there are many styles, and many choices within each style. What people say is often not what they intend. They leave out words, use the wrong word, stutter, pause, and correct themselves mid-sentence. At times the voices are inaudible. Do you then guess, or leave a blank? Should you add "stage directions" - that the speaker shouted or whispered, or somebody else was laughing in the background?

Ask three or four people (without giving them detailed instructions) to transcribe the same tape of speech, and you’ll see surprising differences. Even when transcribing a TV or radio program, with a professional announcer reading from a script, the tone of voice can change the intended meaning.

The main principle that emerges from this is that you need to write clear instructions for transcription, and ensure that all transcribers (if there is more than one) closely follow those instructions. It’s useful to have all transcribers begin by transcribing the same text for about 30 minutes. They then stop and compare the transcriptions. If there are obvious differences, they then repeat the process, and again compare the transcriptions. After a few hours, they are coordinated.
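When transcribers compare their versions of the same passage, a rough agreement score can show whether another round of coordination is needed. The sketch below uses Python's standard `difflib.SequenceMatcher` on word lists; the three transcriptions and the names are invented, and the similarity ratio is only one possible measure, not a standard from this presentation.

```python
from difflib import SequenceMatcher
from itertools import combinations

def word_agreement(text_a, text_b):
    """Similarity (0 to 1) between two transcriptions, compared word by word."""
    words_a = text_a.lower().split()
    words_b = text_b.lower().split()
    return SequenceMatcher(None, words_a, words_b).ratio()

# Hypothetical transcriptions of the same few seconds of tape
transcripts = {
    "Ana": "well I I think the the price is too high",
    "Ben": "well I think the price is too high",
    "Cai": "well I think the price is [inaudible] high",
}

# Compare every pair of transcribers
for (name_a, text_a), (name_b, text_b) in combinations(transcripts.items(), 2):
    print(name_a, name_b, round(word_agreement(text_a, text_b), 2))
```

A low score for one pair points to where the instructions are being read differently, for example whether repeated words and inaudible passages are written down.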

Conversion into computer-readable form

If your source is print media, and you want a text file of the content (so that you can analyse it using software), a quick solution is to scan the text with OCR software.

Live coding

If your purpose in the content analysis is very clear and simple, an alternative to transcription is live coding. For this, the coders play back the tape of the radio or TV program or interview, listen for perhaps a minute at a time, then stop the tape and code the minute they just heard. This works best when several coders are working together.

Coding content

Coding in content analysis is the same as coding answers in a survey: summarizing responses into groups, reducing the number of different responses to make comparisons easier. Thus you need to be able to sort concepts into groups, so that in each group the concepts are both:

• as similar as possible to each other, and
• as different as possible from concepts in every other group.
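Once a human coder has assigned each unit to a group, the counting itself is mechanical. The sketch below assumes a tiny, invented coding frame and five made-up coded answers; it only tallies codes and rejects any code that is not in the frame, it does not do the coding.

```python
from collections import Counter

# Hypothetical coding frame: code -> what belongs in that group
CODING_FRAME = {
    "PRICE": "comments about cost",
    "SERVICE": "comments about staff or service",
    "OTHER": "anything that fits no other group",
}

def tally_codes(coded_units, frame=CODING_FRAME):
    """Count how many units received each code; reject codes outside the frame."""
    unknown = {code for _, code in coded_units if code not in frame}
    if unknown:
        raise ValueError(f"codes not in coding frame: {sorted(unknown)}")
    return Counter(code for _, code in coded_units)

# (unit number, code) pairs as assigned by a human coder (made up)
coded_units = [(1, "PRICE"), (2, "SERVICE"), (3, "PRICE"), (4, "OTHER"), (5, "PRICE")]
counts = tally_codes(coded_units)
print(counts.most_common())
```

A large "OTHER" count is usually a sign that the groups need revising.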

Examples that demonstrate content questioning

Example 1: TV violence

An "interview" with a violent episode in a TV program might "ask" it questions such as:

• How long did you last, in minutes and seconds?
• What program were you shown on? On what channel, what date, and what time?
• Was that program local or imported? Series or one-off?
• What was the nature of the violent action?
• How graphic or realistic was the violence?
• What were the effects of the violence on the victim/s?
• Who did the violent act: heroes or villains? Men or women? Young or old? People of high or low social status?
• Who were the victim/s: heroes or villains? Men or women? (etc.)
• To what extent did the violent action seem provoked or justified?

Example 2: Newspaper coverage of asylum seekers

The project’s purpose is to evaluate the success of a public relations campaign designed to improve public attitudes towards asylum seekers. The evaluation is done by "questioning" stories in news media: mainly newspapers, radio, and TV. For newspaper articles, six sets of questions are asked of each story:

1. Media details: The name of the newspaper, the date, and the day of the week.

2. Exact topic of the news story: Recorded in two forms: a one-line summary (averaging about 10 words) and a code, chosen from a list of about 15 main types of topic on this issue. Codes are used to count the number of occurrences of stories on each main type of topic.

3. Apparent source of the story: This can include anonymous reporting (apparently by a staff reporter), a named staff writer, another named source, a spokesperson, and unknown sources. If the source is known, it is entered in the database.

4. Favourability of story towards asylum seekers: To overcome subjectivity, we can ask several judges (chosen to cover a wide range of ages, sexes, occupations, and knowledge of the overall issue) to rate each story on this 6-point scale:

1 = Very favourable
2 = Slightly favourable
3 = Neutral
4 = Slightly unfavourable
5 = Very unfavourable
6 = Mixed: both favourable and unfavourable

When calculating averages, the "6" codes are considered equivalent to "3". The range (the difference between the highest and lowest judges' ratings) is also recorded, so that each story with a large range can be reviewed.

5. How noticeable the story was: This is complex, because many factors need to be taken into account. However, to keep the project manageable, we consider just three factors. For newspapers, these factors are:

• The space given to the story (column-centimetres and headline size)
• Its position in the issue and the page (the top left of page 1 is the ideal)
• Whether there’s a photo (a large colour one is best).

Each of these three factors is given a number of points ranging from 0 (hardly noticeable at all) up to 3 (very noticeable indeed). The three scores are then added together, to produce a total of at most 9. We can then add 1 more point if there’s something that makes the story more noticeable than the original score would suggest (e.g. a reference to the story elsewhere in the issue, or when this topic is part of a larger story).
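The arithmetic in items 4 and 5 can be sketched as two small functions. The function names and the example numbers are invented for illustration. One point is an assumption rather than something the text states: the range is computed after "6" has been recoded to "3", since the text only specifies that recoding for averages.

```python
from statistics import mean

NEUTRAL, MIXED = 3, 6  # scale codes from item 4

def favourability(ratings):
    """Return (average, range) of one story's judge ratings on the 1-6 scale."""
    if not ratings or any(r not in (1, 2, 3, 4, 5, 6) for r in ratings):
        raise ValueError("ratings must be a non-empty list of integers 1 to 6")
    # "Mixed" (6) is treated as neutral (3). Using the recoded values for the
    # range as well is an assumption made for this sketch.
    adjusted = [NEUTRAL if r == MIXED else r for r in ratings]
    return mean(adjusted), max(adjusted) - min(adjusted)

def noticeability(space, position, photo, bonus=False):
    """Add three 0-3 factor scores, plus one optional extra point (maximum 10)."""
    factors = {"space": space, "position": position, "photo": photo}
    for name, score in factors.items():
        if score not in (0, 1, 2, 3):
            raise ValueError(f"{name} must be an integer from 0 to 3, got {score!r}")
    return sum(factors.values()) + (1 if bonus else 0)

# Four judges rate one story; the third judge found it "mixed"
average, spread = favourability([2, 3, 6, 1])
print(average, spread)                                   # 2.25 2

# A long story, fairly well placed, with no photo
print(noticeability(space=3, position=2, photo=0))       # 5
```

Stories whose range exceeds some agreed threshold (the threshold itself is a project decision) can then be listed for review.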

Quantitative software for content analysis

Several software packages are available that have been written specifically for content analysis of documents, particularly when the units are small and clearly separated. These programs count:

• Word frequencies: how often each word occurs in a document. Often, the commonest words (stopwords) can be excluded: words like "in" and "the", which add little meaning but get in the way of the analysis. Also commonly included is lemmatization or stemming, which means combining words with the same stem, such as intend, intended, intends, intending, intention, intentions, etc.
• Category frequencies: synonyms are first grouped into categories, and the program shows how many times each category occurs in the document.

• Concordance, or KWIC (key word in context): showing each word in the document, in alphabetical order, and its context in the document.
• Cluster analysis, which groups together words used in similar contexts.
• Co-word citation: looks at the occurrence of pairs of words. The US government's Echelon Project, which tries to spy on the entire world's email, is said to use this method. If you write "bomb" and "chocolate" in the same sentence, no problem - but writing "bomb" near "Moslem" may get you rendered by the US military. (Oops!)
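Two of the counts above, concordance and co-word pairs, are simple enough to sketch in a few lines. This is not how any of the named packages works internally; the tokenizer, the context width, the choice of the sentence as the co-occurrence window and the sample text are all assumptions made for illustration. The concordance here looks up one keyword rather than listing every word alphabetically.

```python
import re
from collections import Counter
from itertools import combinations

def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())

def kwic(text, keyword, width=2):
    """Return each occurrence of keyword with `width` words of context per side."""
    words = tokenize(text)
    lines = []
    for i, word in enumerate(words):
        if word == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left} [{word}] {right}".strip())
    return lines

def co_words(text):
    """Count pairs of distinct words that occur in the same sentence."""
    pairs = Counter()
    for sentence in re.split(r"[.!?]+", text):
        unique = sorted(set(tokenize(sentence)))
        pairs.update(combinations(unique, 2))
    return pairs

text = ("The council opened a new shelter. Refugees praised the shelter. "
        "The council met refugees.")
for line in kwic(text, "shelter"):
    print(line)
print(co_words(text)[("council", "shelter")])
```

Each pair is stored in alphabetical order, so look-ups such as `("council", "shelter")` need the words in that order.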

Among the widely used quantitative programs for content analysis are:

• General Inquirer from Harvard - the first software of this type: very powerful, but perhaps not as user-friendly as some successors.
• VBPro - still a DOS program, but widely used. There are several other VB Pros (Visual Basic and Visitor's Book) but this is the text analysis one by Mark M Miller.
• Wordsmith. This is Mike Scott's Wordsmith Tools, one of several software packages of this name.

• Textpack - a system for computer-aided quantitative content analysis, originally designed for the analysis of open-ended questions in surveys, but extended to cover many aspects of content analysis. Windows only, in English and Spanish versions. From ZUMA in Cologne.
• TACT - Text Analysis Computing Tools. A text analysis and retrieval system, with 16 separate programs, designed for literary studies - but also useful for social research. Good old DOS software.
• Textstat - a simple text analysis tool, from the Dutch Linguistics group at the Free University of Berlin. A freeware program for the analysis of texts; it runs under Windows. It can read a website and put the pages into its own corpus. Then it can do word frequencies and concordance.

The key point about these programs is that, as long as your unit of content is very short (a word, or a small group of words), these programs do the content analysis for you. You do no coding (coding frames either don't apply or are built in), you need no judges, and the tedious (tiresome) drudgery (hard work) of large-unit content analysis just doesn't apply. All you have to do is interpret the results. So use these programs when possible - but the problem is that for any sort of subtlety (delicacy) in content analysis, there's no substitute for human judgement - and that takes time.
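To give a feel for what "doing the analysis for you" means at word level, here is a minimal sketch of word and category frequencies. It is not taken from any of the packages named above: the stopword list, the two categories and the sample text are invented, and no lemmatization is attempted, so "fight" and "fighting" are listed separately in the category dictionary.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; real lists are much longer
STOPWORDS = {"a", "and", "in", "of", "the", "to", "was", "were"}

# Hypothetical category dictionary: category -> words that count towards it
CATEGORIES = {
    "conflict": {"fight", "fighting", "dispute", "clash"},
    "resolution": {"agreement", "peace", "mediation"},
}

def word_frequencies(text, stopwords=STOPWORDS):
    """Count every word in the text except stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stopwords)

def category_frequencies(word_counts, categories=CATEGORIES):
    """Add up the counts of all words belonging to each category."""
    return {name: sum(word_counts[w] for w in members)
            for name, members in categories.items()}

text = ("The dispute led to fighting in the market. "
        "Mediation by the elders brought peace, and the dispute was settled.")
freqs = word_frequencies(text)
print(freqs.most_common(3))
print(category_frequencies(freqs))
```

The counting is automatic, but choosing which words belong in each category is still a human judgement, which is the limitation the text describes.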

Qualitative software

When your content units are large chunks of text, and codes may overlap (as in transcripts of interviews, and magazine articles) you need quite a different type of software. This is not automatic, because part of the coding job is to decide where each theme begins and ends: "units" are not used in this context. This software displays the text (from a computer file) on screen. The text can be coded in two ways:

(1) by coding chunks of text right on the screen
(2) by (a) printing out the text, (b) writing codes on the printout, then (c) going back and coding it on the computer.

Most people prefer the second method - partly because you usually can't see the whole context of a text on a small computer screen. All of this software is fairly complex, and mostly not cheap. It will probably take you at least a full week to learn how to use it. You may need to take a course, lasting for several days. Though this software saves time in the long run, it's probably not worthwhile to buy it and learn it for a single content analysis project - unless that project is a very large one, with coding likely to take a month or more.

Commonly used software of this type includes:

• Nud*ist and Nvivo (both from QSR)
• Atlas TI
• EZ-Text
• Kwalitan

Software for automatic content analysis

There's software that's so smart it will do the content analysis for you. Just find a set of text files, already in computer format, start up this software, tell it where to find the coding frame, and relax for a few seconds while the content analysis is done for you. Two examples of this are KEDS/TABARI and CAMEO (Conflict and Mediation Event Observations). The catch here is the limited scope. CAMEO is designed to study political conflict and might not work very well in a different context.

Heavy-duty database software (e.g. Microsoft Access and MySQL) can also handle simple content analysis, but this software can be very difficult to set up. Filemaker Pro and Lotus Approach are the simplest general database programs that have been tried. A hint: don't let computer experts talk you into something that's difficult to use. They might think it's simple, because they've been using it for years. If anybody recommends that you use a program in this "heavy duty" group for content analysis, ask if they will promise to help you with all the problems you encounter. Unless the answer is a very clear Yes, I suggest you use simpler software, such as a spreadsheet.

Links

There are several pages on the Web that have many links on content analysis software. Try these...

• Software for Content Analysis - A Review, by Will Lowe: 21 programs compared in detail, grouped into dictionary-based analysis, development environments, and annotation aids (PDF file, 18 pages).
• Content Analysis Guidebook Online, by Kimberly A. Neuendorf.

Coming to conclusions

An important part of any content analysis is to study the content that is not there: what was not said. This sounds impossible, doesn’t it? How can you study content that’s not there? Actually, it’s not hard, because there’s always an implicit comparison. The content you found in the analysis can be compared with the content that you (or the audience) expected - or it can be compared with another set of content. It’s when you compare two corpora (plural of corpus) that content analysis becomes most useful. This can be done either by doing two content analyses at once (using different corpora but the same principles) or comparing your own content analysis with one that somebody else has done. If the same coding frame is used for both, it makes the comparison much easier.

Comparisons can be:

• chronological (e.g. this year’s content compared with last)
• geographical (your content analysis compared with a similar one in another area)
• media-based (e.g. comparing TV and newspaper news coverage)
• program content vs. audience preferences
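Whatever the kind of comparison, the two corpora rarely contain the same number of units, so it helps to compare shares rather than raw counts. The sketch below assumes both corpora were coded with the same coding frame; the topic codes and story counts are made up for a chronological comparison.

```python
def code_shares(counts):
    """Convert raw code counts into percentages of the corpus."""
    total = sum(counts.values())
    return {code: 100 * n / total for code, n in counts.items()}

def compare_corpora(counts_a, counts_b):
    """Percentage-point difference (A minus B) for every code in either corpus."""
    shares_a, shares_b = code_shares(counts_a), code_shares(counts_b)
    codes = sorted(set(shares_a) | set(shares_b))
    return {c: shares_a.get(c, 0.0) - shares_b.get(c, 0.0) for c in codes}

# Made-up numbers of stories per topic code, same coding frame in both years
this_year = {"crime": 30, "health": 50, "sport": 20}
last_year = {"crime": 45, "health": 30, "sport": 25}

for code, diff in compare_corpora(this_year, last_year).items():
    print(f"{code}: {diff:+.1f} percentage points")
```

A code that appears in only one corpus shows up with a share of zero in the other, which is one simple way of spotting "content that is not there".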

When used by itself, content analysis can seem a shallow technique. But it becomes much more useful when it’s done together with audience research. You will be in a position to make statements along the lines of "the audience want this, but they’re getting that." When backed by strong data, such statements are very difficult to disagree with.