Motivation and Computer-Assisted Language Learning

Rodney E. Tyson

Studies on East-West Cultures (동서 문화 연구), 2, 137-146. Published by the Research Institute on East-West Cultures, Hong Ik University, Seoul, 1994.

Introduction

Language teachers and researchers have probably always felt, intuitively at least, that motivation plays an important part in second language learning. Gardner and Lambert (1959, cited in Gardner, 1988) finally demonstrated through multivariate analysis that both language aptitude and social motivation were related to achievement in second language acquisition (SLA). Many studies have since replicated or built on that finding, and now it is generally recognized that motivation is an important factor in SLA, independent of aptitude.
Although they disagree to some extent about its exact role, several of the major theories of SLA include motivation as either a direct or an indirect factor in language learning. This paper begins by discussing the role assigned to motivation in three prominent models of SLA--Krashen's (1978, 1981) Monitor Model, Schumann's (1978, 1986) Acculturation Model, and Gardner's (1979) Socio-Educational Model. The remainder of the paper addresses the question of whether computer-assisted language learning (CALL) can contribute to increasing motivation among second language learners, and speculates as to how CALL activities might be integrated into language programs in order to provide motivation to learners.



The Role of Motivation in SLA

The Monitor Model
The Acculturation Model
The Socio-Educational Model

Does CALL Increase Motivation?


Discussion

It seems that computers, used judiciously in classrooms, can be fun, exciting, stimulating--and therefore motivating--to language learners. Computer games and activities, for example, can perhaps lower affective filters and, therefore, as Krashen and others would suggest, allow more language acquisition (or learning) to take place. Computer communications and pair or group activities centered around computer tasks can sometimes increase the amount of interaction that takes place among learners as well as between individual learners and the language material. On the other hand, these are some of the same arguments that have been made in the past for such innovations as listening labs, video, and many other approaches, methods, and techniques of all kinds. All of these can help make learning more interesting or even fun, and they certainly have their places in language programs, but obviously none of them, including computers, is an ultimate solution.
Instead, language teachers and administrators have to consider carefully how computers are going to fit into their overall programs, without assuming that computers in themselves are automatically motivating to students. The CALL activities must be well-designed and appropriate for the goals of the course, the students' levels, and even personalities and interests if they are expected to provide motivation. In addition, research needs to be aimed at finding out specifically what it is about computers and CALL that students find motivating so that this can be worked into language programs.
At the same time, there are some features of computers that are motivating for other than purely affective reasons. Word processing, for example, makes it easier for students to manipulate text and revise what they have written, which has already been shown to improve students' attitudes toward writing and encourage them to write more and spend more time revising. Well-written CALL materials can also save time by individualizing lessons to a student's personal ability, needs, and interests. They can also present material through several media and at a rate that the student can control. Computers can provide immediate feedback on correctness and can give immediate and individualized help when a student has problems or questions. All of these things are potentially motivating to serious students interested in developing proficiency in a language as quickly as possible, and provide easy ways for learners to put themselves into a number of language learning situations.
All three models of second language acquisition discussed earlier predict that learners who actively seek out opportunities to interact with the target language and/or speakers of the target language will ultimately be more successful. Computers and CALL activities definitely have the potential to make it easier for highly-motivated students to increase interactions--both with the language and with other learners or target language speakers through computer communications. In addition, easy accessibility to computer training that produces noticeable positive results may act to increase the motivation of less-motivated students, as well as provide a practical means for students to maintain their level of proficiency in a foreign language once their formal training ends. Finally, computers may contribute to lowering students' affective filters, and the Monitor Model and the Acculturation Model predict that this in itself should positively influence the actual amount of language acquisition that takes place. Obviously, then, the use of computers in language learning and teaching has much to offer in terms of motivation.


How to Translate a Shakespearean Play into Modern Language

These are general directions for beginning to translate one of Shakespeare's plays into modern language and/or a modern setting: choosing a play, deciding how to translate it, and starting work on the project. The actual rewriting is up to you.

Steps

1. Decide which play you want to translate. This could be more difficult than it sounds. You need to like this play a *lot*, because you're going to be rewriting it, and that means you're going to be *reading* it over and over and over again - word by word, line by line.
2. Read the whole play through at least twice so you understand the setting and the characters. If you don't understand who they are and why they're doing the things they're doing, you won't be able to translate with any degree of success, and your finished product won't be recognizable as an 'updated' version of the original.
3. Familiarize yourself with Elizabethan vernacular and slang. You don't have to be exhaustive about this. There are books, and if you ask at the library they'll show you where to find them. Or you can look on the Web by running a search for Shakespeare + slang or Elizabethan + slang, and try to get a feel for how the English language was used in Shakespeare's time.
4. Decide what your new setting is going to be. You may or may not have thought about this yet. If you're moving the story up into a more modern era, you're going to have to decide where in that era it would have happened, and to whom, and how, and why. And in order to do that, you're going to have to figure out what the modern equivalent of the play's situation is. Romeo and Juliet is a very easy and obvious one, so it gets redone a lot this way - for that reason, unless you have a really original idea, it isn't recommended that you use Romeo and Juliet.
5. Familiarize yourself with the vernacular and slang of the setting you picked. In other words, if you're resetting The Merchant of Venice in Spanish Harlem in the 1970s, you need to go figure out how people in Spanish Harlem talked in the 1970s and what their pop-cultural references would have been. This is where accuracy *is* important; your audience has to be able to recognize some things or the revised play won't make sense to them.
6. Start with one scene. You don't necessarily have to start with the first one. Pick a scene that, for you, sets the tone for the rest of the play, then sit down and start playing with it. The easiest way to do this is to picture in your head how the scene would look if you were seeing it performed as per your translation, and then start changing things in the play itself to match your vision. Once you have this first scene converted to your liking, you can move on to the next one. You don't have to work on them in order.
7. Keep your revisions organized. You can of course do this any way you're comfortable with, including by sitting down with a notebook and pencil and a copy of the play, but at first it might be easiest to copy and paste the scene you're starting with into a word processing program, change the font color to anything but black, and then double- or triple-space between all the lines. Type your revised lines, stage directions, and so on in black in the space under the originals. (A simple script, sketched below, can set up such a working file.)


Tips

* If all you want to do is convert the language but not the play's setting, follow the above instructions but without changing anything but the dialog. You will still have to pick a setting, in your own mind at least, in order to keep your conversion of the language consistent.
* Since Word documents over a certain size can become almost impossible to navigate around in, you might consider making each scene a separate document. It will be easy enough to cut and paste them all back together into one document once you're finished.
* Things to keep in mind about Shakespeare: We think of Shakespeare as being highbrow, a writer for educated people and intellectuals; he was actually writing the equivalent, for his time, of lowbrow sitcoms and pop-culture movies of the week. He was crude and politically incorrect. He was writing for the great unwashed masses, trying to make them pick his theater over the bear-baiting going on down the street. He was also very, very talented and a great storyteller, and that's why his stuff has stayed popular as long as it has. The language may go out of style, but the stories never will.
* Remember, this isn't rocket science! This is Art, and Art is subjective - there is no one 'correct' way your finished product has to look, and you don't have to get it perfect the first time through. Write. Convert. Have fun!

Warnings

* Be prepared for the fact that some of your translations - unless you're deliberately cleaning this thing up for "G" audiences - could turn out to be equivalent to words you only hear in Quentin Tarantino movies.



How to Speak Shakespearean English

To many, Shakespearean English seems arcane and hard to understand. At its core, however, it is still English. Not only that, but it sounds remarkably intelligent. It thus makes sense to learn how to speak it. Luckily, this is surprisingly easy.

Steps

1. Read a Shakespearean play in the original if you can. Hamlet, Othello, and Romeo and Juliet are good candidates. This will give you an idea of how the language is used and also increase your vocabulary with older forms and uses of words.
2. Replace questions of the form "Can I?" with phrases such as "I do beseech you" or "I prithee". This archaic form sounds particularly Elizabethan and has the benefit of being more polite.
3. Work on greetings. In modern times, we are satisfied with "Hello" or "How are you". To make this sound more Shakespearean, a simple form may be "Good greetings, my lord/lady" or, if you truly wish to know how the other is doing, try "How now, [Name]?". Feel free to add clauses along the lines of "and may you be well". You can respond with "Likewise to you", remembering to refer to "my lord" or "my lady". A more polite and flowery response could be "All of God's greetings upon you".
4. Work on your farewells. Farewells can be much improved from the modern "Bye!". A very simple, no-thinking-required approach might be "Fare thee well", but this can be improved further by considering how your conversation ended. Are you saying goodbye to someone for a long time? Try "Fare thee well in your travels, and may by fate we meet again." Modify your goodbyes to fit the situation.
5. Add in more-or-less superfluous adverbs such as "humbly" - they make your speech more flowery, which is the main effect.
6. Shorten "it" to just "'t". For example, "it was" becomes "'twas" and "do it" becomes "do't". (A toy sketch after these steps automates a few substitutions like this one.)
7. Master the forms of "thou": use "thou" as the subject, "thee" as the object, and "thy" for the possessive ("thine" before vowels or the letter H).
8. Clearly mark off opinions with "methinks" and "forsooth".
9. Refine your cursing. Replace "F*ck" with "Fie, fie on't" and "damned" with "accursed". Other adjectives can be replaced with "traitorous", "lecherous", or "thieving". You can also refer to those of humble origin, or anyone acting servant-like, as "knavish".
10. Freely use the following words: "Anon", "As you will", "By your leave", "Carouse", "Chide", "Cutpurse", "E'en", "E'er", "Fie", "Gramercy", "Maid" or "Maiden", "Marry!", "Mayhap", "Morrow", "Ne'er", "Nonpareil", "Oft", "In faith", "Perchance", "Poppet", "Pray pardon me", "Pray tell", "Privy", "Stay", "'Swounds!", "Tosspot", "Verily", "Wench", "Wherefore", "Yonder".
11. Fix your verbs: add "-st" to singular second-person verbs and "-th"/"-eth" to singular third-person verbs. For example, "How dost thou" and "How doth he".
12. Use "shall". It can be used to express obligation, and also in the first person. Remember that when used with "thee" or "thou", "will" becomes "wilt" and "shall" becomes "shalt".
13. If you need to break up with someone, take a few hints from Hamlet (Act 3, scene 1, lines 114-121).
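Several of the steps above are mechanical substitutions, so the core idea can be sketched as a small rewriting table. This toy example (Python) applies only a handful of the rules and ignores grammatical case entirely, so treat it as an illustration, not a real translator:

    import re

    # Toy "Shakespearizer": a few of the mechanical substitutions from
    # the steps above. Word choice and grammar still need human judgment.
    RULES = [
        (r"\bit was\b", "'twas"),                    # step 6
        (r"\bdo it\b", "do't"),                      # step 6
        (r"\bI think\b", "methinks"),                # step 8
        (r"\byou\b", "thou"),                        # step 7 (crude: ignores case)
        (r"\byour\b", "thy"),                        # step 7
        (r"\bhello\b", "Good greetings, my lord"),   # step 3
        (r"\bgoodbye\b", "Fare thee well"),          # step 4
    ]

    def shakespearize(text: str) -> str:
        for pattern, replacement in RULES:
            text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
        return text

    print(shakespearize("Hello! I think it was a fine day. Goodbye!"))
    # -> Good greetings, my lord! methinks 'twas a fine day. Fare thee well!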


Tips

* You can improve further by speaking in iambic pentameter, but this is extremely hard to do off the cuff without practice.
* Rhyme is unnecessary and often makes it hard to speak properly. Furthermore, it is often silly and nullifies the effect of sounding smart. Only rhyme if you are sure it is tasteful.

Warnings

* Speaking like Shakespeare will require frequent references to God. You don't have to believe in the Christian God, or in any, to use such figures of speech. However, if this offends you in any way, feel free to leave such references out.



How to Read and Understand Romeo and Juliet

Having trouble reading Romeo and Juliet? Can't understand it? Here are steps on how to read it.

Steps

1. Go to your local bookstore and purchase a book that tells the story in simpler words. These books are often called "Shakespeare Made Easy" and cost around $6.99.
2. Look at the two sides of the book. You'll notice that one side is written in Shakespearean language and the other side is translated into modern English. Read the Shakespearean side, and when you come across a difficult spot, look to the other side for help.
3. Go to "CliffsNotes" online if you don't want to purchase the book. They have excellent pages on each act, descriptions of the characters, analyses, and quizzes.
4. Read with a partner. If you read alone you might miss a deeper meaning; if you read in a group or with a friend, you gain more thoughts and insights than your own.
5. Go and see the play or movie. This way you can see what is going on and get the main idea.
6. Get a little information on Shakespeare before you read the book. This way you will understand his time. (Look up Shakespeare, the Globe Theater, etc.)
7. Be able to recognize poetic forms. William Shakespeare was a genius in the art of writing, and he structured almost every line in some poetic form. If you know what a sonnet or a couplet is, the book will seem a lot more meaningful. Here is the sonnet he used as the prologue to the story:

"Two households, both alike in dignity,
In fair Verona, where we lay our scene,
From ancient grudge break to new mutiny,
Where civil blood makes civil hands unclean.
From forth the fatal loins of these two foes
A pair of star-cross'd lovers take their life;
Whose misadventured piteous overthrows
Do with their death bury their parents' strife.
The fearful passage of their death-mark'd love,
And the continuance of their parents' rage,
Which, but their children's end, nought could remove,
Is now the two hours' traffic of our stage;
The which if you with patient ears attend,
What here shall miss, our toil shall strive to mend."


Tips

* A sonnet is a 14-line poem; Shakespeare's sonnets rhyme in the format ABAB CDCD EFEF GG (see the sketch after these tips).
* If you watch the movie or play it might be changed around a bit, so on an exam you might get an answer wrong. It's always best to read the book.
* If you read the book with Shakespeare Made Easy, do not read only the modern side. If you take a test, you might get a few questions wrong; for example, if they give you a quote from the original side, you will have no clue what it says!
* If you work in a group, pick members who won't goof off; that way you will actually spend your time deciphering this story.
* Get a detailed plot outline for the story. If you know what's going to happen, you will have a better time following along.
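To make the rhyme-scheme tip concrete, here is a small sketch (Python) that pairs each of the prologue's fourteen line-ending words with its label in the standard Shakespearean scheme:

    # Label the prologue's rhymes with the Shakespearean sonnet scheme.
    SCHEME = "ABABCDCDEFEFGG"
    rhyme_words = ["dignity", "scene", "mutiny", "unclean",
                   "foes", "life", "overthrows", "strife",
                   "love", "rage", "remove", "stage",
                   "attend", "mend"]

    for label, word in zip(SCHEME, rhyme_words):
        print(f"{label}: ...{word}")
    # Lines sharing a letter rhyme with each other, e.g. scene/unclean (B),
    # foes/overthrows (C), attend/mend (G).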

Warnings

* Don't cheat and rely only on the CliffsNotes. That takes away the fun and appreciation of learning about Shakespeare.


How to Work Through a Novel

Steps

1. Find some time. Set aside a time each day, maybe just a half hour or a full hour, just for reading. Maybe spend some time reading before you go to bed or on a weekend. Even if it's just that fifteen-minute lunch break at work, set aside that little bit.
2. Find a quiet and secluded place to read. This will help you focus on the book and also ensure that you will not be disturbed as you read. Try your local library; they often have corners where one can pull up a chair and enjoy a decent novel.
3. Read your first chapter. Don't just skim through it; read it. Keep a notebook handy and write down the characters introduced and any words you don't understand if it helps you remember what happened. Or, at the end of each chapter, write a chapter summary so that you can look back and remember what happened last!
4. Take it one chapter at a time, or if that overwhelms you, just take it four or five pages at a time. Split your reading up into sections: three or four pages, a chapter or two, and so on. If you get overwhelmed, you're likely to give up altogether.
5. Reread, if needed. Don't be ashamed to reread a chapter if you forget something that you want to remember! When you reread, be sure to focus on the main idea of the chapter, paragraph, or page.


Tips

* Take your time; if you read through a novel in two days, you need to reread it. It takes time to grasp the concepts in a book. Not to mention that you won't enjoy it as much if you skim through it!



How to Understand and Perform Shakespeare

Steps

1. Get into the right mindset. Feel the mood of the play. Never assume that this is too difficult for you. Going through the text step by step will enable you to understand it.
2. Read the play, or at least a summary. It helps to know what's going on. Look at other performances of Shakespeare; these are plays, they were written to be performed, and you can get ideas on how to say that one line you can't figure out.
3. Get used to the old-fashioned language. Whenever you hear the word "thou" or "thee", that means "you". When you hear "art", that means "are". When you hear anything that ends in "-st", don't freak out; Shakespeare adds "-st" to many second-person verbs, thus "mayst not" = "may not". Shakespeare also likes to drop syllables to make the line flow more smoothly; for example, "o' th'" translates to "of the".
4. Study some examples. "For in my sight, she uses thee kindly, but thou liest in thy throat." translates to "From what I see, she is kind to you, but you lie." "No, faith, I'll not stay a jot longer!" translates to "No, I will not stay here a second longer!" And "What light through yonder window breaks, it is the east and Juliet is the sun." translates to "Juliet is as radiant with beauty as the sun is radiant with light." Remember, people were people 400 years ago, and even though they talked fancier, they still had emotions like we do.
5. Familiarize yourself with stylistic devices used in poetry, like similes, oxymorons, metaphors, etc. These are usually discussed in literature classes at school, and knowing what they mean will make the whole process of reading Shakespeare's work less intimidating. Look them up on the Internet and try to find examples of them in your text.
6. If you're performing Shakespeare, be sure to enunciate (speak out entire words rather than a slur of words). The way Shakespeare puts sentences together is very unorthodox, so it will be very easy for the audience to lose track of what you're actually saying. By enunciating clearly and fully, the audience can piece the words together much more easily.
7. When performing, remember that your character is an actual person with actual feelings, not just a fancy-talking English person. Your character has feelings just like you.


Tips

* Shakespeare is not at all different from modern entertainment. It may be in an unfamiliar dialect, but it has sex, violence, drugs, and even low-brow humor. It is not a "higher" form of literature; remember that it was written to entertain mostly uneducated and illiterate people.
* Consider watching the movie "Renaissance Man" with Danny DeVito. His character has to teach a bunch of soldiers how to understand Shakespeare. The pieces used in the movie are "Hamlet" and "Henry V". The strategy he uses is quite good, and the movie is entertaining, too.
* Learn to enjoy watching Shakespeare plays or movies. One of the easiest ways to understand Shakespeare is to fill your ears with the language and get used to it.
* Watch or read the work with a person who understands the material and have them "translate" for you.
* Spend time deciphering little bits that you like, such as Hamlet's or Macbeth's very famous speeches. It may be difficult at first, but it gets easier.
* You will begin to recognize the background of many of Shakespeare's expressions that have turned into modern cliches, like "brave new world" and "sea change" (both from "The Tempest").
* Find a good quote and use it. "Neither a borrower nor a lender be", from Polonius' speech in "Hamlet", goes down very well at the bank or the stock market.
* Ask a librarian, teacher, professor, or a smart adult to help you decipher Shakespeare's text.

Warnings

* Shakespeare may be a little grown-up for some people.
* Don't get frustrated or confused.
* Never get mad while learning.



Anglo-Saxon literature

Poetry

There are two types of Old English poetry: the heroic, the sources of which are pre-Christian Germanic myth, history, and custom; and the Christian. Although nearly all Old English poetry is preserved in only four manuscripts—indicating that what has survived is not necessarily the best or most representative—much of it is of high literary quality. Moreover, Old English heroic poetry is the earliest extant in all of Germanic literature. It is thus the nearest we can come to the oral pagan literature of Germanic culture, and is also of inestimable value as a source of knowledge about many aspects of Germanic society. The 7th-century work known as Widsith is one of the earliest Old English poems, and thus is of particular historic and linguistic interest.

Beowulf , a complete epic, is the oldest surviving Germanic epic as well as the longest and most important poem in Old English. It originated as a pagan saga transmitted orally from one generation to the next; court poets known as scops were the bearers of tribal history and tradition. The version of Beowulf that is extant was composed by a Christian poet, probably early in the 8th cent. However, intermittent Christian themes found in the epic, although affecting in themselves, are not integrated into the essentially pagan tale. The epic celebrates the hero's fearless and bloody struggles against monsters and extols courage, honor, and loyalty as the chief virtues in a world of brutal force.

The elegiac theme, a strong undercurrent in Beowulf, is central to Deor, The Wanderer, The Seafarer, and other poems. In these works, a happy past is contrasted with a precarious and desolate present. The Finnsburgh fragment, The Battle of Maldon, and The Battle of Brunanburh (see Maldon and Brunanburh), which are all based on historical episodes, mainly celebrate great heroism in the face of overwhelming odds. In this heroic poetry, all of which is anonymous, greatness is measured less by victory than by perfect loyalty and courage in extremity.

Much of the Old English Christian poetry is marked by the simple belief of a relatively unsophisticated Christianity; the names of two authors are known. Cædmon—whose story is charmingly told by the Venerable Bede, who also records a few lines of his poetry—is the earliest known English poet. Although the body of his work has been lost, the school of Cædmon is responsible for poetic narrative versions of biblical stories, the most dramatic of which is probably Genesis B.

Cynewulf, a later poet, signed the poems Elene, Juliana, and The Fates of the Apostles; no more is known of him. The finest poem of the school of Cynewulf is The Dream of the Rood, the first known example of the dream vision, a genre later popular in Middle English literature. Other Old English poems include various riddles, charms (magic cures, pagan in origin), saints' lives, gnomic poetry, and other Christian and heroic verse.

The verse form for Old English poetry is an alliterative line of four stressed syllables and an unfixed number of unstressed syllables broken by a caesura and arranged in one of several patterns. Lines are conventionally end-stopped and unrhymed. The form lends itself to narrative; there is no lyric poetry in Old English. A stylistic feature in this heroic poetry is the kenning, a figurative phrase, often a metaphorical compound, used as a synonym for a simple noun, e.g., the repeated use of the phrases whale-road for sea and twilight-spoiler for dragon (see Old Norse literature ).
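As a rough illustration of the alliterative principle (nothing like a full Old English scansion, which also tracks stress patterns and vowel classes), one can check whether content words on either side of the caesura share an initial sound. The sketch below is in Python, and the line it tests is an invented modern-English example, not a quotation:

    # Rough alliteration check across a caesura (marked here by "||").
    # Function words are skipped; only initial letters are compared.
    STOPWORDS = {"the", "a", "of", "in", "and", "to", "was", "on", "over"}

    def alliterates(line: str) -> bool:
        first, _, second = line.partition("||")
        def initials(half: str) -> set:
            return {w[0].lower() for w in half.split()
                    if w.lower() not in STOPWORDS}
        return bool(initials(first) & initials(second))

    # "whale-road" is a kenning for the sea, as noted above.
    print(alliterates("weary went the warrior || wide on the whale-road"))  # True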

Prose

Old English literary prose dates from the latter part of the Anglo-Saxon period. Prose was written in Latin before the reign of King Alfred (reigned 871-99), who worked to revitalize English culture after the devastating Danish invasions ended. As hardly anyone could read Latin, Alfred translated or had translated the most important Latin texts. He also encouraged writing in the vernacular. Didactic, devotional, and informative prose was written, and the Anglo-Saxon Chronicle, probably begun in Alfred's time as an historical record, continued for over three centuries. Two preeminent Old English prose writers were Ælfric, Abbot of Eynsham, and his contemporary Wulfstan, Archbishop of York. Their sermons (written in the late 10th or early 11th cent.) set a standard for homiletics.

A great deal of Latin prose and poetry was written during the Anglo-Saxon period. Of historic as well as literary interest, it provides an excellent record of the founding and early development of the church in England and reflects the introduction and early influence there of Latin-European culture.


Plagiarism and falsified data slip into the scientific literature: a report

The challenges of scientific integrity

Scientific progress is conveyed primarily through peer-reviewed publications. These publications are the primary source of information for everyone involved in scientific research, allowing them to understand the current scientific models and consensus and making them aware of new ideas and new techniques that may influence the work they do. Because of this central role, the integrity of the peer review process is essential. When misinformation makes its way into the literature, it may not only influence career advancement and funding decisions; it can actually influence which experiments get done and how they are interpreted. Bad information can also cause researchers to waste time in fruitless attempts to replicate results that never actually existed.

Despite the danger represented by research fraud, instances of manufactured data and other unethical behavior have produced a steady stream of scandal and retractions within the scientific community. This point has been driven home by the recent retraction of a paper published in the journal Science and the recognition of a few individuals engaged in dozens of acts of plagiarism in physics journals. Ars has interviewed a number of the people involved in both of these cases, and we discuss their impact on the field and the prospects for preventing similar problems in the future.
Plagiarists run amok

Recently, Ars was informed that a number of papers with a set of overlapping authors were being withdrawn from the arXiv, a repository of publications and drafts in the physical sciences. We confirmed that several papers were no longer available and that their entries now lead to text that states, "This paper has been removed by arXiv administrators because it plagiarizes... " followed by a list of the sources of the plagiarized material.

In at least one case, the final publication had been withdrawn but an earlier draft version was still available. Comparisons of the text (PDF) with the sources it was plagiarized from reveal the blatant nature of the fraud. Section 1 of that paper begins with an extensive copying of the introduction of a 2003 paper (PDF; copying starts with the second sentence of the introduction). Section 3 of the fraudulent work begins with a similarly large excerpt from the introduction of a different publication (PDF) that also dates from 2003. Although the arXiv has acted on the plagiarism, the fraudulent publication currently remains available at the Journal of High Energy Physics.
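Copying this blatant is straightforward to flag automatically. A standard generic approach (our illustration; there is no indication of what tooling the arXiv actually uses) is to shingle both texts into word n-grams and measure set overlap, sketched here in Python:

    # Generic sketch: flag verbatim overlap via word 5-gram shingles.
    def shingles(text: str, n: int = 5) -> set:
        words = text.lower().split()
        return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

    def jaccard_overlap(a: str, b: str) -> float:
        sa, sb = shingles(a), shingles(b)
        return len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0

    # Invented example sentences; real checks run over whole sections.
    suspect = ("we study the thermodynamics of rotating black holes "
               "in anti de sitter space")
    source = ("in this paper we study the thermodynamics of rotating "
              "black holes in flat space")
    print(f"5-gram Jaccard overlap: {jaccard_overlap(suspect, source):.2f}")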

Ars contacted an arXiv administrator, who put us in touch with faculty at the Middle East Technical University in Ankara, Turkey, home of the authors of the fraudulent publications. Dr. Ozgur Sarioglu spoke on behalf of a group of METU faculty that also included Atalay Karasu, Ayse Karasu, and Bayram Tekin. They provided a PDF of the Journal of High Energy Physics article, marked up to reveal the source of much of the text. It contains material from at least a dozen different peer-reviewed works; the original material seems limited to most of the abstract and a handful of mathematical derivations that rephrase equations published elsewhere.

According to Dr. Sarioglu, two of the authors of this paper were graduate students with a prodigious track record of publication: over 40 papers in a 22-month span. Dr. Karasu, who sat on the panel that evaluated their oral exams, became suspicious when their knowledge of physics didn't appear to be consistent with this level of output. Discussions with Dr. Tekin revealed that the students also did not appear to possess the language skills necessary for this level of output in English-language journals (METU conducts its instruction in English).

This caused these faculty members to go back and examine their publications in detail, at which point the plagiarism became clear. "All they had done was literally take big chunks of others' work using the 'copy and paste' technique," Dr. Sarioglu said, "steal from here and there to cook up an Intro which is basically the same stuff in all their manuscripts, carry out some really trivial calculations such as taking derivatives of some simple functions, and write up the results in the format of a paper." The department chair was informed and started an internal investigation; the university's Ethics Committee has since become involved.

In the meantime, the faculty and administration at METU are attempting to do some damage control. The university's president personally sent a letter to the Journal of High Energy Physics requesting that the paper be withdrawn—a request that, as noted above, has yet to be acted upon. Meanwhile, the faculty members mentioned above are working with the arXiv administrators to ensure that any plagiarized work is removed.

How will this impact the field? Professor Paul Ginsparg at Cornell, who helped establish the arXiv, suggests that the impact will be minor. Because the fraudulent work was necessarily so derivative, it did not have a high profile or influence. "There's little effect on science," Dr. Ginsparg said, "since the people who produce high quality work don't need to plagiarize, and the people who do need to plagiarize don't produce high enough quality work to affect anything." Sarioglu is less sure, as the full extent of the plagiarism remains unclear. Most of the publications had additional authors beyond the two graduate students at the center of the scandal, and the investigations are just beginning to explore the larger connections. "All the work they had published on gr-qc [general relativity-quantum cosmology] plagiarizes something. Looking into these things we also found other cases—there are about 20 people who we know are plagiarizers."


The scope of literature

Literature is a form of human expression. But not everything expressed in words—even when organized and written down—is counted as literature. Those writings that are primarily informative—technical, scholarly, journalistic—would be excluded from the rank of literature by most, though not all, critics. Certain forms of writing, however, are universally regarded as belonging to literature as an art. Individual attempts within these forms are said to succeed if they possess something called artistic merit and to fail if they do not. The nature of artistic merit is less easy to define than to recognize. The writer need not even pursue it to attain it. On the contrary, a scientific exposition might be of great literary value and a pedestrian poem of none at all.

The purest (or, at least, the most intense) literary form is the lyric poem, and after it comes elegiac, epic, dramatic, narrative, and expository verse. Most theories of literary criticism base themselves on an analysis of poetry, because the aesthetic problems of literature are there presented in their simplest and purest form. Poetry that fails as literature is not called poetry at all but verse. Many novels—certainly all the world’s great novels—are literature, but there are thousands that are not so considered. Most great dramas are considered literature (although the Chinese, possessors of one of the world’s greatest dramatic traditions, consider their plays, with few exceptions, to possess no literary merit whatsoever).

The Greeks thought of history as one of the seven arts, inspired by a goddess, the muse Clio. All of the world’s classic surveys of history can stand as noble examples of the art of literature, but most historical works and studies today are not written primarily with literary excellence in mind, though they may possess it, as it were, by accident.

The essay was once written deliberately as a piece of literature: its subject matter was of comparatively minor importance. Today most essays are written as expository, informative journalism, although there are still essayists in the great tradition who think of themselves as artists. Now, as in the past, some of the greatest essayists are critics of literature, drama, and the arts.

Some personal documents (autobiographies, diaries, memoirs, and letters) rank among the world’s greatest literature. Some examples of this biographical literature were written with posterity in mind, others with no thought of their being read by anyone but the writer. Some are in a highly polished literary style; others, couched in a privately evolved language, win their standing as literature because of their cogency, insight, depth, and scope.

Many works of philosophy are classed as literature. The Dialogues of Plato (4th century BC) are written with great narrative skill and in the finest prose; the Meditations of the 2nd-century Roman emperor Marcus Aurelius are a collection of apparently random thoughts, and the Greek in which they are written is eccentric. Yet both are classed as literature, while the speculations of other philosophers, ancient and modern, are not. Certain scientific works endure as literature long after their scientific content has become outdated. This is particularly true of books of natural history, where the element of personal observation is of special importance. An excellent example is Gilbert White's Natural History and Antiquities of Selborne (1789).

Oratory, the art of persuasion, was long considered a great literary art. The oratory of the American Indian, for instance, is famous, while in Classical Greece, Polymnia was the muse sacred to poetry and oratory. Rome’s great orator Cicero was to have a decisive influence on the development of English prose style. Abraham Lincoln’s Gettysburg Address is known to every American schoolchild. Today, however, oratory is more usually thought of as a craft than as an art. Most critics would not admit advertising copywriting, purely commercial fiction, or cinema and television scripts as accepted forms of literary expression, although others would hotly dispute their exclusion. The test in individual cases would seem to be one of enduring satisfaction and, of course, truth. Indeed, it becomes more and more difficult to categorize literature, for in modern civilization words are everywhere. Man is subject to a continuous flood of communication. Most of it is fugitive, but here and there—in high-level journalism, in television, in the cinema, in commercial fiction, in westerns and detective stories, and in plain, expository prose—some writing, almost by accident, achieves an aesthetic satisfaction, a depth and relevance that entitle it to stand with other examples of the art of literature.



COMPARABILITY OF CONVENTIONAL AND COMPUTERIZED
TESTS OF READING IN A SECOND LANGUAGE

Yasuyo Sawaki
University of California, Los Angeles

ABSTRACT

Computerization of L2 reading tests has been of interest among language assessment researchers for the past 15 years, but few empirical studies have evaluated the equivalence of the construct being measured in computerized and conventional L2 reading tests and the generalizability of computerized reading test results to other reading conditions. In order to address various issues surrounding the effect of mode of presentation on L2 reading test performance, the present study reviews the literature in cognitive ability testing in educational and psychological measurement and the non-assessment literature in ergonomics, education, psychology, and L1 reading research. Generalization of the findings to computerized L2 assessment was found to be difficult: The nature of the abilities measured in the assessment literature does not necessarily involve language data; mode of presentation studies in the non-assessment literature involving L2 readers are scarce; and there are limitations in the research methodologies used. However, the literature raises important issues to be considered in future studies of mode of presentation in language assessment.

INTRODUCTION

Reading from computer screens is becoming more and more common in our daily lives as the amount of reading material available online rapidly increases. This influence has been seen in the field of language assessment, where computerized testing, such as computer-based tests (CBTs) and computer-adaptive tests (CATs, a special case of computer-based testing in which the items administered are tailored to the individual examinee's ability on the construct being measured), is attracting the attention of researchers, language learners, and test users alike, as exemplified by the implementation of CATs at institutional levels in the past 15 years (Kaya-Carton, Carton, & Dandolini, 1991; Larson, 1987; Madsen, 1991; Stevenson & Gross, 1991; Young, Shermis, Brutten, & Perkins, 1996). Despite the rapid growth of demand in this area, development and implementation of this new mode of testing are still in their initial stages. Therefore, sufficient empirical data that would allow researchers to look into the soundness of computerized language tests with regard to construct validity and fairness are yet to be available.
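For readers unfamiliar with the mechanics, the adaptive logic that distinguishes a CAT from a fixed-form CBT can be sketched in a few lines of Python. The step-halving update below is a toy stand-in for the IRT-based ability estimation that operational CATs actually use:

    # Toy CAT loop: administer the unused item whose difficulty is
    # closest to the current ability estimate, then nudge the estimate.
    def run_cat(item_difficulties, answers_correctly, n_items=5):
        ability, step = 0.0, 1.0
        remaining = list(item_difficulties)
        for _ in range(n_items):
            item = min(remaining, key=lambda d: abs(d - ability))
            remaining.remove(item)
            if answers_correctly(item):
                ability += step  # right answer: probe harder items
            else:
                ability -= step  # wrong answer: probe easier items
            step /= 2            # settle the estimate over time
        return ability

    # Simulated examinee with true ability 0.7 on this arbitrary scale.
    print(run_cat([-2, -1, 0, 0.5, 1, 1.5, 2], lambda d: d <= 0.7))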

One issue which requires prompt investigation is the effect of mode of presentation on comparability of the information obtained from computerized and paper-and-pencil (P&P) tests. In their comprehensive summary of issues surrounding CATs in L2 contexts, Chalhoub-Deville and Deville (1999) point out the scarcity of comparability research in L2 language tests and the importance of conducting comparability studies in local settings to detect any potential test-delivery-medium effect when a conventional test is converted to a computerized test. In terms of L2 reading comprehension tests in particular, the current move toward computerized testing is proceeding without sufficient empirical evidence that reading from a computer screen is the same as reading in print for L2 readers. Since presence of a mode effect on reading comprehension test performance would seriously invalidate score interpretation of computerized reading tests, language assessment researchers have discussed the necessity of examining (a) the degree to which computerized reading comprehension tests measure the same construct as P&P tests and (b) the extent to which results of computerized reading tests can be generalized to other contexts (Alderson, 2000; Bachman, 2000). In order to seek future directions in investigating the effect of mode of presentation on L2 reading test performance, the present study reviews two distinct areas of previous literature: (a) studies that address general construct validity issues of computerized tests in cognitive ability as well as language assessment; and (b) studies that shed light on the effects of mode of presentation on reading performance conducted mainly in ergonomics, education, psychology, and L1 reading research.

ASSESSMENT LITERATURE

In order to support construct validity of computerized tests such that the construct being measured is not being affected by the mode of presentation, the equivalence of corresponding conventional and computerized test forms must be established from various directions. In this section, potential task changes caused by a shift to the computer administration mode will be reviewed first. Then, the criteria that have been used to evaluate cross-mode equivalence of test forms and various psychometric and statistical issues, such as stability of item parameter estimates and linking tests across modes, will be summarized. This section will close with a discussion of the impact of mode of presentation on examinees, namely, the interaction of test taker characteristics with testing conditions and the comparability of decisions made across modes.

Comparability of Tasks Across Modes of Presentation

As the first step in establishing the equivalence of computerized and conventional test forms, the content covered by the two tests should be comparable. To achieve this goal, several promising algorithms to control for content coverage have been implemented in L2 CATs in the last decade (for summaries of recent developments in content balancing algorithms, see Chalhoub-Deville & Deville, 1999, and Eignor, 1999). Even when the content coverage in a given computerized test is carefully controlled to mirror the test content specification, potential "task change" may still occur across modes of presentation, as pointed out by Green (1988). A task change is the possibility that the nature of a test task may be altered when the item is presented in a different mode, which may in turn induce unexpected changes in item difficulty. Green states, "If computer presentation changes tasks, so that the correlation between scores on the computer and conventional versions is low, then validity is threatened" (p. 78).

Greaud and Green (1986) reported low cross-mode correlations in a speeded clerical skills test, which may indicate a task change caused by a shift to the CAT format. They investigated the effect of mode of presentation on the numerical operations (NO) and coding speed (CS) subtests of the Armed Services Vocational Aptitude Battery (ASVAB) administered to applicants for the U.S. military services. Fifty college students took short versions of the two subtests. The CAT versions were completed faster by the subjects, who did better on the CAT versions in general. Moreover, when the average number of correct responses per minute was used as the test score, the between-mode correlation coefficients for the coding speed subtest remained low to moderate when corrected for attenuation, while the within-mode correlations for both subtests and the between-mode correlations for the numerical operations subtest were high. Possible explanations provided by the authors were that (a) "marking a bubble" on an answer sheet in a P&P test and "pressing a button" to enter an answer on a CAT may require different motor skills (p. 33); and (b) keeping track of the location of the items presented as a group was part of the task in the highly-speeded P&P test, while it was not the case for the CAT version, where items were displayed one by one on a computer screen (pp. 31-32).
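The "corrected for attenuation" figures here use the classical disattenuation formula, which adjusts an observed correlation for the unreliability of both measures. In the usual textbook notation (not reproduced from Greaud and Green's paper), in LaTeX:

    \hat{\rho}_{XY} = \frac{r_{XY}}{\sqrt{r_{XX}\, r_{YY}}}

where $r_{XY}$ is the observed cross-mode correlation and $r_{XX}$, $r_{YY}$ are the two forms' reliabilities. For example, an observed correlation of .60 between forms each with reliability .80 disattenuates to $.60 / \sqrt{.80 \times .80} = .75$.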

Results of Mead and Drasgow's (1993) meta-analysis concurred with Greaud and Green's (1986) findings regarding potential presentation mode effects on speeded test performance. In their meta-analysis of 159 correlations obtained in previous mode-of-presentation studies in cognitive ability assessment, Mead and Drasgow found that, after correcting for measurement error, the estimated cross-mode correlations were .97 and .72 for timed power tests and speeded tests, respectively. Based on these results, the authors concluded that mode of presentation may affect speeded tests but not timed power tests. Susceptibility of speeded tests to presentation mode effects, however, was not supported by Neuman and Baydoun (1998). In their study of mode effects on a speeded clerical test, consistently high cross-mode correlations were found between the P&P and computer modes for the instrument's subtests, and structural equation modeling suggested that the constructs being measured in the P&P and CBT versions of the tests were equivalent.

Another source of a task change may be differences in test administration conditions across modes of presentation. Spray, Ackerman, Reckase, and Carlson (1989) argued that presentation mode effects on test performance found in previous research may be partly due to differences in the flexibility of test administration conditions. In their comparative study of P&P and CBT versions of three end-of-unit tests for the Ground Radio Repair Course at a Marine Corps Communication-Electronics School, Spray et al. allowed test takers to skip items and to review and change answers after completing the test. This is not permitted on many other computerized tests. As a result, mean scores and cumulative score distributions for the raw scores across modes on this test were not significantly different between the P&P and computerized testing groups. Additionally, no item bias due to presentation mode effects was found. Based on their findings, the authors concluded that P&P and computer-based test results would be equivalent when the same test-taking condition flexibility is maintained across modes.

Psychometric Equivalence of Conventional and Computerized Tests

Criteria for Equivalence Between P&P and Computerized Tests. In response to the growing interest in converting conventional P&P tests to computerized forms in cognitive ability assessment over the last two decades, the 1985 version of the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1985) raised concerns about the parallelism of test forms when conventional and computerized tests are used interchangeably. A year later, the American Psychological Association published Guidelines for Computer-based Tests and Interpretations (APA, 1986), which set forth the widely used criteria for achieving psychometric equivalence of P&P and computerized tests (Bugbee, 1996; Mead & Drasgow, 1993). The Guidelines specifies the psychometric equivalence of P&P and computerized tests as follows:

When interpreting scores from the computerized versions of conventional tests, the equivalence of scores from computerized versions should be established and documented before using norms or cutting scores obtained from conventional tests. Scores from conventional and computer administrations may be considered equivalent when (a) the rank orders of scores of individuals tested in alternative modes closely approximate each other, and (b) the means, dispersions, and shapes of the score distributions are approximately the same, or have been made approximately the same by rescaling the scores from the computer mode. (p. 18)

In the Guidelines, criterion "a" is considered to be a prerequisite for achieving psychometric equivalence of P&P and computerized tests, while rescaling methods can be used to place the P&P and computerized test scores into the same scale when criterion "b" is not met. Conversely, if "b" but not "a" is met, then test forms cannot be equivalent, despite the fact that the two tests can still be transformed to have similar distributions.
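As a concrete (and deliberately simplified) illustration, criteria "a" and "b" could be screened on paired scores as below (Python). The numeric thresholds are arbitrary choices for the sketch, since the Guidelines sets no cutoffs, and the comparison of distribution shapes is omitted:

    from statistics import mean, stdev
    from scipy.stats import spearmanr

    # Screen the two Guidelines criteria on examinees' paired scores.
    def roughly_equivalent(pp_scores, cbt_scores,
                           min_rank_r=0.90, max_std_mean_diff=0.10):
        rank_r, _ = spearmanr(pp_scores, cbt_scores)  # criterion (a): rank order
        pooled_sd = (stdev(pp_scores) + stdev(cbt_scores)) / 2
        std_mean_diff = abs(mean(pp_scores) - mean(cbt_scores)) / pooled_sd
        return rank_r >= min_rank_r and std_mean_diff <= max_std_mean_diff  # (b)

    pp = [12, 15, 18, 21, 24, 27, 30]    # invented paired scores
    cbt = [13, 14, 19, 20, 25, 26, 31]
    print(roughly_equivalent(pp, cbt))   # True for this toy data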

Some variations of these criteria also exist. After reviewing previous mode of presentation studies on CBTs in particular, Bugbee (1996) claimed that the equivalence criteria could be altered, depending on how a CBT is used. For example, if a CBT is used as an alternative for a conventional form, then demonstrating high correlation and nearly equal means and variances between the modes may suffice. If a CBT is to be used as an exchangeable form, however, then satisfying the criteria for parallel tests in classical test theory, which requires equal means and standard deviations across modes and equal correlations with a criterion measure, should be pursued.

In addition to the criteria suggested by the Guidelines, Steinberg, Thissen, and Wainer (1990) described how structural equation modeling can be used for investigating the equivalence of the number and loading of latent factors across modes for construct validation of CATs. Quite a few studies have utilized factor analysis or structural equation modeling approaches in order to investigate factorial similarity as part of the cross-mode equality requirements (Green, 1988; Moreno, Wetzel, McBride, & Weiss, 1984; Neuman & Baydoun, 1998; Staples & Luzzo, 1999; Van de Vijver & Harsveld, 1994). For example, in a study of the ASVAB, Green (1988) gave P&P and CAT versions of the test to 1,500 Navy recruits and conducted an exploratory factor analysis to compare the underlying factor structure of the two forms of the test. Due to the similarity of the obtained underlying factor structures, Green (1988) concluded that construct validity of the CAT version of the ASVAB seemed to be supported.

Stability of Item Parameter Estimates. Green, Bock, Humphreys, Linn, and Reckase (1984) and Henning (1991) argue that there is no guarantee that item parameter estimates, such as item difficulty and discrimination, will remain constant across modes. A promising strategy would be to recalibrate item parameters when sufficient data become available from a CAT to see if the P&P estimates are invariant across modes. Then, items with unstable parameter estimates can be reconsidered. For example, Stone and Lunz (1994) examined the stability of item parameter estimates for multiple-choice items in a medical technologist certification exam administered by the Board of Registry. The item parameter estimates in this study were obtained by using item response theory (IRT), which specifies by a mathematical function the relationship between the observed examinee performance and the unobservable examinee abilities considered to underlie it (Hambleton & Swaminathan, 1985). When the equivalence of item difficulty parameters obtained in the P&P and CAT forms of the certification exam was evaluated in terms of the standardized differences, Stone and Lunz found that, although the text-only items showed a strong trend of parameter estimation equivalence, items with graphics tended to be less stable than text-only items. Further investigation of the items suggested that the significantly different difficulty estimates obtained across modes seemed to be accounted for by different picture quality as well as by the image and character sizes used in the CAT mode.
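For reference, the kind of "mathematical function" involved is typified by the three-parameter logistic model (a standard IRT formulation; the paper does not say which model the Board of Registry exam used). In LaTeX notation:

    P_i(\theta) = c_i + \frac{1 - c_i}{1 + e^{-a_i(\theta - b_i)}}

Here $\theta$ is the examinee's latent ability, $b_i$ is item $i$'s difficulty, $a_i$ its discrimination, and $c_i$ a lower asymptote reflecting guessing. Cross-mode stability of item parameters then amounts to asking whether the $a_i$ and $b_i$ estimated from P&P data match those estimated from CAT data.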

Linking Tests Across Modes. When a computerized test is used as an alternative or replacement for a conventional test, the score relationship between the computerized and conventional test forms must be established. This can be achieved in two steps. First, qualitative and quantitative analyses of equivalence of the construct being measured and psychometric properties between the test forms must be examined. Second, if sufficient evidence based on the analysis supports equivalence of the test forms, then the forms can be placed onto the same scale and appropriately linked using conventional test equating methods (Staples & Luzzo, 1999). A variety of test equating methods have been used in conventional tests to link separate forms of a test built to the same test specifications, but the stringent criteria required for equating are not likely to be satisfactorily met when equating is attempted across modes. A concern here is that a CAT has a different pattern of measurement accuracy from a conventional non-adaptive test. A CAT is often designed to have equal measurement accuracy across a score scale, while a conventional test is not (Green et al., 1984; Kolen & Brennan, 1995). This is one of the main reasons why researchers have questioned the feasibility of equating a CAT to a P&P test. This point was challenged, however, by Green et al. (1984), Wainer (1993), and Wainer, Dorans, Green, Mislevy, Steinberg, and Thissen (1990), who argued that, since ideal test equating, as defined by traditional testing literature, may not be achieved across modes, calibrating rather than equating of test scores should be sought.

Linn's (1993) definition and description of calibrating as a less stringent form of test linking than equating could be employed for establishing a statistical relationship between conventional and computerized tests. According to Linn, calibrating, unlike equating, can be applied to link tests that are designed to measure the same construct but are not built to the same specifications and are associated with different score reliability patterns. Moreover, under calibration, failing to satisfy the stringent equating assumptions does not prevent researchers from employing conventional statistical equating procedures. The differences between equating and calibrating are that calibrated forms cannot be treated as exchangeable, and that the potential decrease in the stability of test linking results across time and samples requires close monitoring of calibration results. Using Linn's criteria, mathematics achievement scores obtained from statewide tests and the National Assessment of Educational Progress (NAEP) have been successfully calibrated by means of an equating method called equipercentile equating (Ercikan, 1997; Linn & Kiplinger, 1995). Other large-scale testing programs have also calibrated CATs and P&P tests, utilizing various conventional equating methods (e.g., Lunz & Bergstrom, 1995; Segall, 1997). For interpreting such calibration results, Pommerich, Hanson, Harris, and Sconing's (2000) guidelines for interpreting linking results between tests built to different specifications would be useful.
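
A minimal sketch of the equipercentile idea, under assumed score distributions: each raw score on one form is mapped to the score on the other form that has the same percentile rank (operational implementations add smoothing and more careful handling of ties).

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical raw scores (60 items) from a P&P form and a computerized form.
scores_pp = rng.binomial(60, 0.55, size=2000)
scores_cbt = rng.binomial(60, 0.60, size=2000)

def percentile_rank(scores, x):
    """Percentile rank of raw score x, using the midpoint convention."""
    return 100 * (np.mean(scores < x) + np.mean(scores == x) / 2)

def equipercentile(x, from_scores, to_scores):
    """Map score x on one form to the same-percentile score on the other."""
    return np.percentile(to_scores, percentile_rank(from_scores, x))

for x in (25, 33, 40):
    print(f"P&P {x} -> CBT {equipercentile(x, scores_pp, scores_cbt):.1f}")
```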

Impact of Introduction of Computerized Tests to Examinees

Interaction of Examinee Characteristics and Testing Conditions. Another concern related to construct validity of CBTs is the effect of examinee backgrounds on test performance and attitudes toward new forms of language tests. Investigation of these issues is important because a test score obtained from a computerized test should reflect the construct of interest only. That is, if the test score represents both language ability and computer familiarity, for example, then valid generalization of test scores across modes is no longer possible. A set of studies has focused on computer familiarity and its potential effects on performance on CBTs and CATs.

Oltman (1994) investigated the effects of the complexity of mouse manipulation on performance in the reading and math subtests of the Computer-Based Academic Skills Assessments for the Praxis Series: Professional Assessment for Beginning Teachers. Two types of mouse manipulation were required by the tasks involved: "simple" (items requiring a single click to mark an answer) and "complex" (items requiring more than one click to mark an answer). The reading and math subtests were given to 333 minority (Hispanic, Native American, and Black) and 148 white university students who were not experienced computer mouse users. An ANOVA showed a significant interaction effect of ethnic group and task type, suggesting that the minority students, who took longer and scored lower than the white students, were affected by the complexity of the task types. However, the interaction effect accounted for only 1.2% of the total variance, which led Oltman to conclude that the difference was statistically significant but not pronounced enough to be of practical importance.
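
The "1.2% of the total variance" figure is an effect-size statement of the kind sketched below: the interaction sum of squares expressed as a share of the total (eta squared). The simulated data and variable names are hypothetical, not Oltman's.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(1)
# Hypothetical scores for a 2 (group) x 2 (task complexity) design.
df = pd.DataFrame({
    "group": np.repeat(["minority", "white"], 200),
    "task": np.tile(np.repeat(["simple", "complex"], 100), 2),
})
df["score"] = (50 + 2 * (df["group"] == "white")
               - 3 * (df["task"] == "complex")
               + rng.normal(0, 10, len(df)))

# Two-way ANOVA; eta squared = each effect's SS over the total SS.
anova = sm.stats.anova_lm(ols("score ~ C(group) * C(task)", data=df).fit(),
                          typ=2)
anova["eta_sq"] = anova["sum_sq"] / anova["sum_sq"].sum()
print(anova)
```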

Taylor, Jamieson, Eignor, and Kirsch (1998) conducted a large-scale study that investigated the effects of computer familiarity on examinees' performance on the CBT version of the TOEFL after providing examinees with computer familiarity training and making adjustments for ability level as measured by the P&P TOEFL. A CBT version of the TOEFL was administered at 12 worldwide sites to a sample of TOEFL examinees comparable to the examinee population of the operational TOEFL. The examinees were classified into either "computer familiar" or "computer unfamiliar" groups based on their responses to a computer familiarity scale (Eignor, Taylor, Kirsch, & Jamieson, 1998; Kirsch, Jamieson, Taylor, & Eignor, 1998). Because the extremely large sample size makes even a small difference in means statistically significant, the authors evaluated Cohen's (1988) measure of practical importance as well as the results of statistical tests of significance. Results differed depending on how language ability was treated. Before adjustments were made for ability, performance differences between the familiarity groups were statistically and practically significant. After adjustments were made for ability as measured by the P&P TOEFL, however, only one examinee background variable (the number of times the TOEFL had been taken) significantly interacted with computer familiarity on the TOEFL reading subtest, barely reaching practical significance. The effect of familiarity estimated by an alternative differential item functioning approach was an average difference of 1.3 points on the TOEFL total score. Thus, the researchers concluded that computer familiarity does not play a major role in CBT TOEFL performance.
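
Practical importance in this sense is typically judged with a standardized mean difference such as Cohen's d; a minimal sketch with made-up group summaries (Cohen's conventions treat d of around .2 as small):

```python
import math

def cohens_d(m1, s1, n1, m2, s2, n2):
    """Standardized mean difference using a pooled standard deviation."""
    pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2)
                          / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

# Hypothetical summaries for computer-familiar vs. computer-unfamiliar groups.
d = cohens_d(m1=550, s1=60, n1=1200, m2=545, s2=62, n2=1100)
print(f"d = {d:.2f}")  # well below .2, i.e., negligible in practical terms
```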

In terms of examinees' reactions to new forms of testing, Madsen's (1991) study is one of the few that provide details based on a self-report questionnaire. Madsen administered an attitude questionnaire on the CAT version of an ESL placement test of reading, structure, and listening at Brigham Young University. He found that although students' reactions to the new test were generally positive, differences in attitudes were observed across language groups. The Spanish speakers in his study reported that it was easier to read on the computer screen than in print and that they were interested in and willing to take the CAT in the future. The Japanese students' reactions, on the other hand, were rather negative. They claimed that it was more difficult to read on the screen and reported anxiety about taking the CAT, even though the Japanese subjects were "more experienced" computer users than the Spanish-speaking students. Madsen therefore concluded that experience with computers does not reduce test anxiety and that effects of examinee language background on affect must be investigated more closely.

Comparability of Decisions. Since mode of presentation may also affect decisions made about examinees, comparability of decisions must be investigated as part of the effect of mode of presentation as well. Some equating studies for large-scale testing programs have addressed this issue. Segall's (1997) equating study for the ASVAB involved an investigation of differential item functioning across the gender and ethnic backgrounds of U.S. military applicants, based on equated scores and the calculation of a series of conditional probabilities. The purpose of the study was to see what proportion of female applicants would be affected by selection decisions based on the concurrent use of CAT and P&P forms. Lunz and Bergstrom (1995) also investigated the relationship between the cut score and the standard errors of IRT ability estimates to calculate how many examinees of the Board of Registry medical technologist certification exam would change their pass/fail status depending on the mode of presentation. In each case, such a small proportion of examinees was affected (0.07% and 2.2% in the ASVAB and Board of Registry studies, respectively) that the effect of mode of presentation on the selection and certification decisions based on the tests was not of practical importance.
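
The logic of such a calculation can be sketched as follows, with all numbers assumed: for each examinee, the chance that the pass/fail decision would flip on re-administration depends on the distance between the ability estimate and the cut score relative to the estimate's standard error.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
cut = 0.0  # assumed cut score on the IRT theta scale

# Hypothetical ability estimates and standard errors for 10,000 examinees.
theta = rng.normal(0.5, 1.0, 10_000)
se = np.full_like(theta, 0.25)

# Probability that an estimate from another administration (e.g., the other
# mode) falls on the other side of the cut, assuming unbiased estimation.
p_flip = np.where(
    theta >= cut,
    norm.cdf((cut - theta) / se),      # current passers who might fail
    1 - norm.cdf((cut - theta) / se),  # current failers who might pass
)
print(f"expected share with changed pass/fail status: {p_flip.mean():.2%}")
```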

Examples of such decision-making comparisons can also be seen in language assessment placement testing. Hicks (1986) investigated the comparability of the Multi-level TOEFL (a form of CAT) and the conventional P&P TOEFL. The within-level Pearson correlations between the Multi-level and P&P TOEFL scores after correction for attenuation were high, ranging from .79 to .95, when a strict branching criterion was used. Moreover, placement of examinees into three different levels was highly similar across the two modes. Hicks therefore concluded that examinees were assigned to their appropriate levels when the items in the Multi-level TOEFL were branched into levels, using the P&P TOEFL as the criterion, and that virtually the same information was obtained by administering the Multi-level TOEFL.
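
Correction for attenuation divides the observed correlation by the square root of the product of the two forms' reliabilities, estimating the correlation between the underlying true scores; a minimal sketch with assumed values:

```python
import math

def disattenuate(r_xy, rel_x, rel_y):
    """Spearman's correction for attenuation."""
    return r_xy / math.sqrt(rel_x * rel_y)

# Hypothetical observed cross-form correlation and form reliabilities.
print(f"corrected r = {disattenuate(0.72, 0.85, 0.88):.2f}")
```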

Contrary to Hicks (1986), however, Stevenson and Gross (1991) found that placement decisions were considerably altered for a locally developed standardized ESL placement test targeted at grade school pupils in the Montgomery County public school district in Maryland. The results showed that the CAT version generally placed the students into higher working levels than the conventional version did, while the rank ordering of the students was similar across modes. Stevenson and Gross interpreted the observed difference as favorable, attributing the change to the dramatically higher CAT performance of the 6th and 7th graders, who had previously been disadvantaged by taking a common P&P test that included items too difficult for all grade levels.

Finally, Fulcher (1999) addressed potential presentation mode effects on placement decisions for an ESL placement test intended to place candidates into upper intermediate and advanced ESL courses at a UK university. As part of his analysis of the 80-item multiple-choice grammar test given in P&P and Web-based forms, Fulcher utilized an ANCOVA with the P&P score as the covariate. The purpose of the study was to investigate potential biases in CBT performance associated with candidates' computer familiarity, attitudes toward taking tests on the Internet, and background information (age, gender, L1, and field of study). A significant effect on candidates' CBT performance was found only for L1. Meanwhile, separate one-way ANOVAs of the P&P and CBT tests with the final placement groups as the independent variable revealed that the mean scores of the final placement groups were significantly different on the CBT but not on the P&P form. These findings indicated that the CBT provides better information for placement decisions, but also that the CBT may place certain L1 groups (East Asian students in this case) into lower levels.
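
The design can be sketched as follows with hypothetical variable names: the CBT score is regressed on the P&P covariate plus a background factor, so the background effect is tested after adjusting for P&P performance.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(3)
n = 300
# Hypothetical data: P&P score (covariate), L1 group, and CBT outcome.
df = pd.DataFrame({
    "pp": rng.normal(50, 10, n),
    "l1": rng.choice(["Spanish", "Japanese", "Arabic"], n),
})
df["cbt"] = (5 + 0.9 * df["pp"] - 3 * (df["l1"] == "Japanese")
             + rng.normal(0, 5, n))

# ANCOVA: test the L1 effect on CBT scores with the P&P score as covariate.
print(sm.stats.anova_lm(ols("cbt ~ pp + C(l1)", data=df).fit(), typ=2))
```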

Summary of Assessment Literature

Although the criteria used for assessing the equivalence of test forms across modes seem to be sufficiently standardized, with the Guidelines (APA, 1986) as the base, the empirical findings as to the comparability of conventional and computerized tests are rather mixed. On one hand, the reported stability of item parameter estimates and the factorial similarity of test forms across modes suggest that the constructs being measured by conventional and computerized forms are comparable. Similarly, the effect of examinees' characteristics, such as computer familiarity, does not seem to manifest itself in test scores. Moreover, linking tests across modes seems to be feasible when sufficient care is taken with regard to the content comparability of tests across modes and the interpretation of test linking results. On the other hand, empirical findings regarding effects of mode of presentation on speeded test performance are mixed, although, since L2 reading tests used as selection, diagnostic, achievement, and placement tests, for example, are often designed as timed power tests rather than speeded tests, these findings may not be an issue for L2 reading tests. Inconsistencies can also be seen in the impact of computerization on placement decisions based on conventional and computerized tests. The seriousness of such inconsistencies should be evaluated in terms of the stakes of a test in the local context. As Fulcher (1999) argued, potential misplacement of candidates in lower levels may not be a source of great concern in ESL contexts, because such misplacements can often be detected, and necessary arrangements can quickly be made by language instructors.

MODE OF PRESENTATION AND READING

Unfortunately, little empirical investigation of the effects of mode of presentation on reading comprehension has been done in L2 reading research. Yessis (2000) addressed cross-mode differences in L2 reading rate and comprehension in an advanced ESL course at a North American university. In his study, 44 undergraduate and graduate students participated in weekly timed and paced reading exercises on paper, while another 9 students performed these exercises on computer. Toward the end of the quarter, the participants read two 1,000-word passages at the 8th grade readability level, one on paper and the other on computer, and answered 10 multiple-choice reading comprehension questions after each passage. A series of mixed model regression analyses showed that when the order of presentation modes and passages was counterbalanced and language ability differences were controlled by entering the participants' ESL placement scores into the equation, the mode differences in comprehension and speed were not significant. Moreover, while the computer practice group read more slowly than the paper practice group on the second occasion, they performed significantly better. Yessis pointed out that the observed performance differences between practice groups might be due to differences in practice conditions: the paper practice group followed the pace set by their instructors, while the computer practice group was allowed to set their own pace, which might have led the computer practice group to focus more on the content. Based on the participants' responses to a computer attitude questionnaire, Yessis also found that a positive attitude was a significant predictor of better comprehension, but not of reading speed. Finally, a chi-square analysis of the pausal protocols of 9 students who participated in a follow-up study showed that the frequencies of the various reading strategies used by the participants were not significantly different across the modes of presentation.
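
The follow-up strategy comparison amounts to a chi-square test of independence on a mode-by-strategy frequency table; a minimal sketch with invented counts and strategy categories:

```python
from scipy.stats import chi2_contingency

# Hypothetical strategy frequencies from pausal protocols, cross-tabulated
# by mode (rows: paper, screen) and strategy category (columns).
table = [
    [34, 21, 15, 9],   # paper
    [30, 25, 12, 11],  # screen
]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.3f}")
```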

Although Yessis' (2000) study provides insight into how the L2 reading process may or may not be affected by presentation mode, no other empirical studies against which to evaluate his findings are available in the L2 reading literature.

Accuracy and Speed of Reading in a First Language

An extensive body of literature in ergonomics, education, psychology, and L1 reading has directly compared the processing of computer-presented and paper-presented text in a first language. The studies reviewed here are deemed to have implications for construct validation of CBTs in particular because the computer-based reading tasks utilized in these studies were not adaptive to participants' reading ability.

Dillon (1992) extensively reviewed ergonomic studies on the effect of mode of presentation (paper vs. computer screen) on reading. He classified numerous studies according to their focus of investigation (outcome or process) as well as factors that potentially accounted for often-reported differences in reading outcome and process across modes. These are outlined in Tables 1 and 2.

Table 1. Factors Previously Investigated in Mode of Presentation Research (Dillon, 1992)

Outcome measures
- Reading speed: task completion time
- Accuracy of reading: accuracy of proofreading (e.g., identification of spelling mistakes)
- Fatigue: visual fatigue and eye strain
- Comprehension: level of reading comprehension of texts
- Preference: paper vs. computer presentation of texts

Process measures
- Eye movement: frequency and duration of eye fixations
- Manipulation: manipulation techniques (e.g., turning pages with fingers, placing a finger as a location aid, flipping through pages while browsing through a document)
- Navigation: devices that let the reader know the present location in the document (e.g., a table of contents)

Table 2. Factors That Potentially Account for the Differences in Reading Outcome and Process Across Modes (Dillon, 1992)

Basic ergonomic factors
- Orientation: orientation of text/screen presentation (e.g., vertical vs. horizontal)
- Visual angle: angle created by the length of the lines presented on the computer screen and the distance between the screen and the reader's eyes
- Aspect ratio: ratio of width to height of computer displays
- Dynamics: screen filling style and duration (e.g., rate and direction of text scrolling)
- Flicker: frequency at which the phosphor surface of the screen is scanned to generate characters that appear stable
- Image polarity: positive polarity (dark characters presented on a light background) vs. negative polarity (light characters presented on a dark background)
- Display: fonts (e.g., character size, line spacing, character spacing)
- Anti-aliasing: addition of various gray levels to individual characters in order to perceptually eliminate the jagged appearance of character edges and display sharp, continuous characters
- User characteristics: degree of user familiarity with computer systems, reading speed, reading strategy, and susceptibility to external stress
- Interaction of display characteristics: interaction of the above variables

Manipulation facilities
- Scrolling vs. paging: scrolling (moving the text up and down the screen smoothly by a fixed increment to reveal information currently out of view) vs. paging (moving the text up and down in complete screens, in a manner similar to turning the pages of printed texts)
- Display size: number of lines that can be displayed on a computer screen at one time
- Text splitting across screens: splitting of paragraphs mid-sentence across successive screens
- Window format: single vs. multi-window format (whether two windows can be presented simultaneously to display different parts of a single document)
- Search facilities: various means of manipulating and locating information in a document (e.g., word/term searches, checking references, locating relevant sections)
- Input devices: tracker ball, mouse, function keyboard, joystick, light pen, etc.
- Icon design: facilities that allow rapid and easy manipulation of the text as well as access to the document through numerous routes (e.g., boxes, arrows, circles, buttons)

The main conclusions of Dillon's literature review can be summarized as follows:

1. It is difficult to draw firm conclusions from the empirical findings of the studies reviewed due to various concerns, such as the limited scope of the studies, the unique nature of the procedures used, unclear participant selection criteria, insufficient control of variables of interest, and the use of unrealistic reading tasks (e.g., proofreading for misspellings). However, the literature review suggests that reading from computer screens is, in fact, different from reading in print and that reading computer-presented texts generally takes longer than reading printed materials.

2. The effects of mode of presentation on process measures listed in Table 1 are not yet clear because no adequate empirical method to measure reading processes has yet been established.

3. Differences between modes seem to be caused by interactions of individually non-significant effects, and it is, therefore, impossible to attribute differences to any single factor. Moreover, in a long text that does not fit into one screen and therefore requires scrolling or paging, factors that determine the quality of visual image presented to readers as well as availability and quality of text manipulation facilities listed in Table 2 become important.

One limitation of the ergonomics studies reviewed by Dillon (1992) is that many of them studied proofreading rather than reading comprehension, whereas reading comprehension is more relevant to language assessment. Additional empirical studies utilizing reading comprehension tasks have been conducted primarily in psychology, education, and L1 reading research, nine of which are listed in the appendix. These studies were selected for review here because they included (a) paper-based and computer-based reading conditions without the manipulation or navigation facilities available only on computers (which are feasible for relatively long texts and often beyond the scope of language assessment), and (b) reading comprehension and/or reading speed as dependent variables, which are widely studied as outcome measures of information processing in mode of presentation research. Most of these studies were conducted in the 1980s, and given the advancement of computer technology over the past two decades, currently available equipment may yield different results. However, more recent empirical studies meeting the above selection criteria were not available. The studies are therefore reviewed here to provide a historical perspective on mode effects on reading performance as well as baseline information for future research designs. The factors of focus, procedures used, and main findings of the studies are summarized in the appendix.

As shown in the appendix, these studies were conducted under widely different conditions, and the following issues should be kept in mind when interpreting their results:

1. Ergonomic factors. The ergonomic factors raised by Dillon (1992), such as display characteristics (e.g., character fonts, size, and line spacing) and features of computer displays (e.g., display size, resolution, image polarity, upper-case only or mixed character use, flicker, and orientation), appear to have varied across these studies, but such features were not always reported in sufficient detail.

2. Time limit. Heppner et al.'s (1985) study is the only one in which the experiment was conducted under a timed, conventional testing condition. None of the other studies set a time limit.

3. Characteristics of participants. Four of the studies (Feldmann & Fish, 1988; Reinking, 1988; Reinking & Schreiner, 1985; Zuk, 1986) involved grade-school children, while the other studies consisted primarily of traditional-age college students or older learners. Moreover, participants' computer familiarity backgrounds were mixed in these studies, and descriptions of their language backgrounds were not provided.

4. Characteristics of reading texts and tasks. Lengths of reading texts ranged from 90-200 word passages at the shortest (Fish & Feldmann, 1987) to a chapter of an introductory psychology textbook at the longest (McGoldrick et al., 1992). Comprehension tasks involved information search in a textbook chapter (McGoldrick et al., 1992) as well as reading for details and general semantic content in passages of conventional length. Multiple-choice was the preferred item format, although some studies also utilized open-ended or short-answer reading comprehension questions (McGoldrick et al., 1992) or form-completion tasks (Feldmann & Fish, 1988; Fish & Feldmann, 1987). Availability of the text while answering comprehension questions also differed across studies: some allowed reviewing the text while responding to the questions (Heppner et al., 1985; McGoldrick et al., 1992), while others did not (Belmore, 1985; Reinking, 1988; Reinking & Schreiner, 1985).

5. Characteristics of experimental designs. Only two of the studies (Reinking, 1988; Zuk, 1986) counterbalanced the order of text presentation, while the others either presented the texts in the same order or did not report whether the order effect was controlled.

6. Definition of reading speed. Belmore (1985) and Reinking (1988) measured time spent on reading assigned texts only. Others included time required to complete the reading comprehension tasks as well (Fish & Feldmann, 1987; McKnight et al., 1990; Zuk, 1986).

7. Distracters. The main focus of Zuk's (1986) study was to investigate elementary school children's attention to reading tasks. Thus, a Walt Disney cartoon was played continuously as a distracter while the 3rd and 5th graders worked on the reading tasks in his study.

The findings of these studies are as follows. In terms of the level of reading comprehension, six of the nine studies reported that comprehension was similar across the modes (Feldmann & Fish, 1988; Fish & Feldmann, 1987; McGoldrick et al., 1992; McKnight et al., 1990; Reinking, 1988; Zuk, 1986), while one favored paper (Heppner et al., 1985), and two showed interactions -- one with the passage (Belmore, 1985) and the other with text difficulty and the type of text manipulation (Reinking & Schreiner, 1985). The similarity of reading comprehension levels across the modes is consistent with the finding of Dillon's literature review described above. Meanwhile, the results of the two studies that showed interaction effects between mode of presentation and other factors are difficult to interpret. In Belmore's (1985) study, comprehension of the first set of passages favored print, but the effect disappeared for the second set. As Belmore pointed out, the fixed order of passage presentation makes it difficult to separate potential order and/or practice effects. In Reinking and Schreiner's (1985) study, 5th and 6th graders scored lower on the passages designated as easier by standard readability formulas than on the other set of passages, which had higher readability estimates. This may suggest, as the authors pointed out, that text characteristics not captured by readability formulas may have affected the text difficulty.
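
Readability formulas of the kind mentioned here combine surface features such as sentence length and word length. As one example, the Flesch-Kincaid grade level is computed as below; the counts are invented, and the studies cited may have used other formulas.

```python
def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level from simple surface counts of a passage."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Hypothetical counts for a short passage.
print(f"grade level = {flesch_kincaid_grade(180, 12, 240):.1f}")
```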

Findings on reading speed in the studies are rather mixed. In three studies (Belmore, 1985; McGoldrick et al., 1992; Zuk, 1986), reading took longer on screen than on paper; three others reported that reading rates were not significantly different across modes (Feldmann & Fish, 1988; Fish & Feldmann, 1987; McKnight et al., 1990); and two studies (Belmore, 1985; Fish & Feldmann, 1987) reported a gain in computer-based reading speed as the experiments proceeded, indicating that after a reasonable amount of exposure to the screen-based reading tasks, the effect of mode on reading rate may diminish.

Although only three of the studies in the appendix that investigated reading speed reported that reading from computer screens was slower than reading from print, quite a few studies, including those reviewed by Dillon (1992), replicated the finding that reading is slower on screen than in print. Some studies have attempted to explain why this might be the case.

First, Oborne and Holton (1988) attributed the often-reported differences in reading speed to insufficient control of extraneous variables in previous empirical studies. When they controlled the orientation of text presentation, retinal distances from the computer screens, image polarity, and page layout, no significant differences were found in either reading speed or comprehension scores, regardless of mode and image polarity. However, the strict control of extraneous variables in this study makes it difficult to generalize the results to real-life reading contexts. For example, it is unlikely that in real life readers would use book stands to present printed text vertically or keep an equal distance between their retinas and the texts across modes, as attempted in this study. It has been widely accepted by ergonomists that computer-presented texts are read at greater distances than conventional paper texts (Dillon, 1992; Gould, Alfaro, Finn, Haupt, & Minuto, 1987).

Second, limitations in the research methodology used in previous studies may be another source of the observed reading rate differences across modes. Hansen, Doring, and Whitlock (1978) investigated how subjects spent their time while taking a computer-based test in an introductory computer science course. Although the results may not be directly applicable to research on reading performance, the authors' explanations of why their subjects took longer to complete the CBT deserve closer attention. In their study, 7 participants took the computer-based test, 4 of whom were videotaped. Two sources accounted for the extra time spent on the CBT: (a) computer system requirements, that is, time used to go back to the table of contents to select the next task and time taken by the computer to generate and display problems; and (b) participants' unfamiliarity with computers. These factors may no longer be relevant, considering the powerful computers and the characteristics of computer users in the 21st century. However, it is worth noting that the 4 videotaped participants expressed discomfort with the testing condition and took significantly longer to finish the test than those who were not videotaped. Moreover, when participants' answers were marked on the screen, they were afraid that their answers would be seen by the proctor. The authors suspected that this might have contributed to the longer work time of the videotaped participants, one of whom reported in the post-hoc questionnaire that "…with PLATO you are 'broadcasting' your answer to the world" (Hansen et al., 1978, p. 514). Although the videotapes used by Hansen et al. provided valuable information as to why the CBT took longer, videotaping must be reconsidered in future designs, since it may be intrusive for participants, and such discomfort could seriously affect the reliability of the data.

As a possible third explanation, a series of extensive empirical studies that focused on the image quality of text presented on computer screens implies that graphic quality may affect early stages of visual information processing rather than later cognitive processing, and that improved image quality may therefore facilitate reading rate on computers. IBM researchers investigated a wide range of variables that could account for reading rate differences (Gould, Alfaro, Barnes, Finn, Grischkowsky, & Minuto, 1987; Gould, Alfaro, Finn, et al., 1987). Gould, Alfaro, Barnes, et al. (1987) investigated the effects of potentially important variables, such as task variables (e.g., paper orientation and visual angle), display variables (e.g., dynamic characteristics of CRT displays, quality of CRT displays, image polarity, and fonts), and reader characteristics (e.g., familiarity with computer-based reading and age), on proofreading and reading comprehension performance in 10 separate quasi-experimental studies utilizing ANOVA designs. When the variables were studied separately, no single variable was strong enough to account for the rather sizable reading rate differences of approximately 25% found in previous research. In six further experiments, Gould, Alfaro, Finn, et al. (1987) independently or simultaneously manipulated image quality factors selected on the basis of the above results, such as character font and size, polarity, anti-aliasing, page layout, screen resolution, and flicker. These authors concluded that the combination of positive image polarity, high display resolution, and anti-aliasing seemed to have eliminated the reading rate differences across the modes, suggesting that image quality may play a crucial role.
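
Anti-aliasing, in the sense manipulated by Gould and colleagues, adds intermediate gray levels along character edges so that they appear continuous. The toy sketch below illustrates the principle by rendering a hard diagonal edge on a fine binary grid and averaging it down to coarse gray pixels; it is purely illustrative and bears no relation to their apparatus.

```python
import numpy as np

# Render a hard diagonal edge on a fine binary grid (1 = ink, 0 = background).
fine = np.fromfunction(lambda y, x: (x > y).astype(float), (64, 64))

# Average each 8x8 block down to one coarse pixel: block means become gray
# levels that smooth the jagged staircase of the original binary edge.
coarse = fine.reshape(8, 8, 8, 8).mean(axis=(1, 3))
print(np.round(coarse, 2))
```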

The studies by Gould and his associates share a concern with those reviewed by Dillon, however: most of the reading tasks used were very short proofreading tasks looking for misspellings, so the relevance of the proposed combination of image quality variables to reading comprehension tasks remains unclear. For example, Feldmann and Fish's (1988) study, reviewed in the appendix, provides counterevidence to the findings of Gould and his associates. In Feldmann and Fish's study, computer-based reading comprehension tasks were presented only in upper case with negative polarity on a then-commercially available computer display whose quality was undesirable according to Gould and his associates. Even under this condition, no rate or comprehension difference across modes was found.

Furthermore, a study conducted by Ziefle (1998) challenged the position of Gould and his associates that performance differences may diminish when screen resolution is improved. When the same computer monitor was used across experimental conditions and variables associated with the character sets (size and color of fonts and backgrounds) were strictly controlled, Ziefle found that both proofreading speed and accuracy were still superior in the paper condition. Computer monitors that display text of equal or better quality than those used in Ziefle's experiments are already commercially available. The mixed results of the above studies seem to suggest, however, that even state-of-the-art computer technology, where the use of high-resolution monitors with positive polarity and anti-aliasing has quickly become the standard, may not provide the comfort of paper-based reading.

Summary of Mode of Presentation and Reading Literature

The general trends found in these studies indicate that comprehension of computer-presented texts is, at best, as good as that of printed texts, and that reading speed may or may not be affected by mode of presentation. Issues prominent in the studies conducted in the 1980s, such as participants' unfamiliarity with computers and the computer system requirements of the time, which made computer presentation of text slow, are quickly becoming less of a concern because of rapid advancements in computer technology. Other explanations proposed in previous studies for differential performance between paper-based and computer-based reading remain pertinent in the 2000s: insufficient control of extraneous variables, uncomfortable test-taking conditions induced by videotaping during test sessions, and the graphic quality of text and its effects on visual information processing. Although the methodological concerns raised by previous researchers will facilitate the design of future studies, strict control of extraneous variables may limit the generalization of research findings to practical test-taking conditions. Moreover, the mixed results obtained regarding the visual explanations suggest that discussion along this line is still inconclusive.

DISCUSSION

Several conceptual and empirical issues raised in the course of the development of mode of presentation research deserve further consideration. Belmore (1985) and Oborne and Holton (1988) explicitly questioned attempts made in previous studies to closely replicate paper-based reading conventions in computer-based conditions. For example, Belmore pointed out that computers are usually introduced in education with the expectation that they would enhance learning and instruction; and computer functions that are not available in text should, therefore, be incorporated whenever existing instructional material is computerized. For example, experiments conducted by Reinking (1988) and Reinking and Schreiner (1985) included two
In addition to the criteria suggested by the Guidelines, Steinberg, Thissen, and Wainer (1990) described how structural equation modeling can be used for investigating the equivalence of the number and loading of latent factors across modes for construct validation of CATs. Quite a few studies have utilized factor analysis or structural equation modeling approaches in order to investigate factorial similarity as part of the cross-mode equality requirements (Green, 1988; Moreno, Wetzel, McBride, & Weiss, 1984; Neuman & Baydoun, 1998; Staples & Luzzo, 1999; Van de Vijver & Harsveld, 1994). For example, in a study of the ASVAB, Green (1988) gave P&P and CAT versions of the test to 1,500 Navy recruits and conducted an exploratory factor analysis to compare the underlying factor structure of the two forms of the test. Due to the similarity of the obtained underlying factor structures, Green (1988) concluded that construct validity of the CAT version of the ASVAB seemed to be supported.

Stability of Item Parameter Estimates. Green, Bock, Humphreys, Linn, and Reckase (1984) and Henning (1991) argue that there is no guarantee that item parameter estimates, such as item difficulty and discrimination, will remain constant across modes. A promising strategy would be to recalibrate item parameters when sufficient data become available from a CAT to see if the P&P estimates are invariant across modes. Then, items with unstable parameter estimates can be reconsidered. For example, Stone and Lunz (1994) examined stability of item parameter estimates for multiple-choice items in a medical technologist certification exam administered by the Board of Registry. The item parameter estimates in this study were obtained by using item response theory (IRT), which specifies by a mathematical function the relationship between the observed examinee performance and the unobservable examinee abilities considered to underlie it (Hambleton & Swaminathan, 1985). When the equivalence of item difficulty parameters obtained in the P&P and CAT forms of the certification exam was evaluated in terms of the standardized differences, Stone and Lunz found that, although the text-only items showed a strong trend of parameter estimation equivalence, items with graphics tended to be less stable than text-only items. Further investigation of the items suggested that the significantly different difficulty estimates obtained across modes seemed to be accounted for by different picture quality as well as by image and character sizes used across the CAT mode.

Linking Tests Across Modes. When a computerized test is used as an alternative or replacement for a conventional test, the score relationship between the computerized and conventional test forms must be established. This can be achieved in two steps. First, qualitative and quantitative analyses of equivalence of the construct being measured and psychometric properties between the test forms must be examined. Second, if sufficient evidence based on the analysis supports equivalence of the test forms, then the forms can be placed onto the same scale and appropriately linked using conventional test equating methods (Staples & Luzzo, 1999). A variety of test equating methods have been used in conventional tests to link separate forms of a test built to the same test specifications, but the stringent criteria required for equating are not likely to be satisfactorily met when equating is attempted across modes. A concern here is that a CAT has a different pattern of measurement accuracy from a conventional non-adaptive test. A CAT is often designed to have equal measurement accuracy across a score scale, while a conventional test is not (Green et al., 1984; Kolen & Brennan, 1995). This is one of the main reasons why researchers have questioned the feasibility of equating a CAT to a P&P test. This point was challenged, however, by Green et al. (1984), Wainer (1993), and Wainer, Dorans, Green, Mislevy, Steinberg, and Thissen (1990), who argued that, since ideal test equating, as defined by traditional testing literature, may not be achieved across modes, calibrating rather than equating of test scores should be sought.

Linn's (1993) definition and description of calibrating as a less stringent form of test linking method, as compared to test equating, could be employed for establishing a statistical relationship between conventional and computerized tests. According to Linn, calibrating, unlike equating, can be applied to link tests designed to measure the same construct but not built to the same specification and associated with different score reliability patterns. Moreover, failing to satisfy the stringent equating assumptions does not keep researchers from employing conventional statistical equating methods. The differences between equating and calibrating are that the calibrated forms cannot be exchangeable, and the potential decrease in stability of test linking results across time and samples requires close monitoring of calibration results. By using Linn's criteria above, mathematics achievement scores obtained from statewide tests and the National Assessment of Educational Progress (NAEP) have been successfully calibrated by means of an equating method called equipercentile equating (Ercikan, 1997; Linn & Kiplinger, 1995). Other large-scale testing programs have also calibrated CATs and P&Ps, utilizing various conventional equating methods (e.g., Lunz & Bergstrom, 1995; Segall, 1997). For interpreting these test calibrating results, Pommerich, Hanson, Harris, and Sconing's (2000) guidelines for interpreting linking results between tests built to different test specifications would be useful.

Impact of Introduction of Computerized Tests to Examinees

Interaction of Examinee Characteristics and Testing Conditions. Another concern related to construct validity of CBTs is the effect of examinee backgrounds on test performance and attitudes toward new forms of language tests. Investigation of these issues is important because a test score obtained from a computerized test should reflect the construct of interest only. That is, if the test score represents both language ability and computer familiarity, for example, then valid generalization of test scores across modes is no longer possible. A set of studies has focused on computer familiarity and its potential effects on performance on CBTs and CATs.

Oltman (1994) investigated the effects of complexity of mouse manipulation on performance in reading and math subtests of the Computer-Based Academic Skills Assessments for the Praxis Series: Professional Assessment for Beginning Teachers. Two types of mouse manipulation were required by the tasks involved: "simple" (items which require a single click to mark an answer) and "complex" (items which require more than one click to mark an answer). The reading and math subtests were given to 333 minority (Hispanic, Native American, and Black) and 148 white university students who were not experienced computer mouse users. An ANOVA analysis showed a significant interaction effect of ethnic group and task type, suggesting that minorities, who took longer and scored lower than white students, were affected by the complexity of the task types. However, the interaction effect accounted for only 1.2% of the total variance, which led Oltman to conclude that the difference was statistically significant but not so pronounced as to be considered of practical importance.

Taylor, Jamieson, Eignor, and Kirsch (1998) conducted a large-scale study that investigated the effects of computer familiarity on examinees' performance on the CBT version of the TOEFL after providing examinees with computer familiarity training and making adjustments for the P&P TOEFL ability level. A CBT version of the TOEFL was administered at 12 worldwide sites to a sample of TOEFL examinees, which was comparable to the examinee population of the operational TOEFL. The examinees were classified into either "computer familiar" or "computer unfamiliar" groups based on their responses to a computer familiarity scale (Eignor, Taylor, Kirsch, & Jamieson, 1988; Kirsch, Jamieson, Taylor, & Eignor, 1998). Because of the extremely large sample size used, which makes even a small difference in means statistically significant, the authors evaluated Cohen's (1988) practical importance measure as well as results of statistical tests of significance. Results differed depending on how language ability was treated. Before adjustments were made for ability, differences on performance between the familiarity groups were statistically and practically significant. However, after adjustments were made for ability as measured by the P&P TOEFL, only the examinee background (number of times TOEFL taken) significantly interacted with computer familiarity on the TOEFL reading subtest, barely reaching practical significance. The effect of familiarity estimated by an alternative differential item functioning approach was an average of 1.3 point difference on the TOEFL total score. Thus, the researchers concluded that computer familiarity does not play a major role in CBT TOEFL performance.

In terms of examinees' reactions to new forms of testing, Madsen's (1991) study is one of the few studies that provides details based on a self-report questionnaire. Madsen administered an attitude questionnaire on the CAT version of an ESL placement test of reading, structure and listening at Brigham Young University. He found that although students' reactions to the new test were generally positive, differences in attitudes were observed across language groups. Spanish speakers in his study reported that it was easier to read on the computer screen than in print and that they were interested in and willing to take the CAT in the future. On the other hand, Japanese students' reactions were rather negative. They claimed that it was more difficult to read on the screen and reported anxiety about taking the CAT, even though the Japanese subjects were "more experienced" users of computers than the Spanish-speaking students. Thus, Madsen concluded that experience with computers does not reduce test anxiety, and effects of examinee language background on affect must be investigated more closely.

Comparability of Decisions. Since mode of presentation may also affect decisions made about examinees, the comparability of decisions must be investigated as part of the mode effect as well. Some equating studies for large-scale testing programs have addressed this issue. Segall's (1997) equating study for the ASVAB included an investigation of differential item functioning across gender and ethnic background of U.S. military applicants, based on equated scores and a series of conditional probabilities. The purpose was to determine what proportion of female applicants would be affected by selection decisions based on the concurrent use of CAT and P&P forms. Lunz and Bergstrom (1995) likewise investigated the relationship between the cut score and the standard errors of IRT ability estimates to calculate how many examinees of the Board of Registry medical technologist certification examination would change pass/fail status depending on the mode of presentation. In each case, such a small proportion of examinees was affected (0.07% and 2.2% in the ASVAB and Board of Registry studies, respectively) that the effect of mode of presentation on the selection and certification decisions based on the tests was judged not to be of practical importance.
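
The logic behind the Lunz and Bergstrom calculation can be illustrated simply: given an IRT ability estimate with a known standard error, the chance that an examinee's true standing falls on the other side of the cut score is a normal tail probability, which shrinks rapidly as the estimate moves away from the cut. A minimal sketch with hypothetical values, not figures from either study:

    from statistics import NormalDist

    # Probability that an examinee whose ability estimate (theta_hat)
    # lies near the cut score would be classified differently, given
    # the standard error (se) of the estimate. Values are hypothetical.
    def p_other_side_of_cut(theta_hat, se, cut):
        z = abs(theta_hat - cut) / se
        return 1.0 - NormalDist().cdf(z)

    # An estimate 1.2 standard errors above the cut: ~11.5% chance the
    # examinee's true standing is actually below it.
    print(f"{p_other_side_of_cut(0.30, 0.25, 0.0):.3f}")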

Examples of such decision-making comparisons can also be seen in language assessment placement testing. Hicks (1986) investigated the comparability of the Multi-level TOEFL (a form of CAT) and the conventional P&P TOEFL. The within-level Pearson correlations between the Multi-level and P&P TOEFL scores, after correction for attenuation, were high, ranging from .79 to .95 when a strict branching criterion was used. Moreover, placement of examinees into three levels was highly similar across the two modes. Hicks therefore concluded that examinees were assigned to appropriate levels when the items in the Multi-level TOEFL were branched into levels using the P&P TOEFL as the criterion, and that the Multi-level TOEFL yielded virtually the same information as the conventional form.
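
The correction for attenuation that Hicks applied is the classical one: the observed correlation is divided by the square root of the product of the two tests' reliabilities, estimating what the correlation would be if both measures were perfectly reliable. A minimal sketch, with hypothetical reliability values rather than those of the two TOEFL forms:

    import math

    # Classical correction for attenuation:
    #     r_corrected = r_xy / sqrt(r_xx * r_yy)
    # Reliabilities below are hypothetical, not those of the TOEFL forms.
    def disattenuate(r_xy, rel_x, rel_y):
        return r_xy / math.sqrt(rel_x * rel_y)

    # An observed correlation of .75 between tests with reliabilities
    # of .90 and .88 corresponds to a disattenuated correlation of ~.84.
    print(f"{disattenuate(0.75, 0.90, 0.88):.2f}")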

In contrast to Hicks (1986), however, Stevenson and Gross (1991) found that placement decisions were considerably altered for a locally developed standardized ESL placement test for grade-school pupils in the Montgomery County public school district in Maryland. The CAT version generally placed the students into higher working levels than the conventional version, although the rank ordering of the students was similar across modes. Stevenson and Gross interpreted the difference favorably, attributing it to the dramatically higher CAT performance of the 6th and 7th graders, who had previously been disadvantaged by taking a common P&P test containing items too difficult for their grade levels.

Finally, Fulcher (1999) addressed potential presentation mode effects on placement decisions for an ESL placement test intended to place candidates into upper-intermediate and advanced ESL courses at a UK university. As part of his analysis of the 80-item multiple-choice grammar test, given in P&P and Web-based forms, Fulcher ran an ANCOVA with the P&P score as the covariate to investigate potential biases on CBT performance associated with candidates' computer familiarity, attitudes toward taking tests on the Internet, and background (age, gender, L1, and field of study). A significant main effect on CBT performance was found only for L1. Meanwhile, separate one-way ANOVAs of the P&P and CBT scores with final placement group as the independent variable revealed that the mean scores of the placement groups differed significantly on the CBT but not on the P&P form. These findings indicated that the CBT provides better information for placement decisions, but also that it may place certain L1 groups (East Asian students in this case) into lower levels.
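
Fulcher's design, testing background variables against CBT performance while holding P&P performance constant, follows a standard ANCOVA layout. A minimal sketch using statsmodels; the data frame and column names are hypothetical, not Fulcher's actual variables:

    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    # ANCOVA in the spirit of Fulcher (1999): CBT score as the outcome,
    # P&P score as the covariate, and a background factor (here L1) as
    # the independent variable. All data below are hypothetical.
    df = pd.DataFrame({
        "cbt": [62, 55, 70, 48, 66, 59, 73, 51],
        "pp":  [60, 57, 68, 50, 64, 58, 71, 53],
        "l1":  ["Spanish", "Japanese", "Spanish", "Japanese",
                "Spanish", "Japanese", "Spanish", "Japanese"],
    })

    model = smf.ols("cbt ~ pp + C(l1)", data=df).fit()
    print(anova_lm(model, typ=2))  # F test for L1, adjusted for P&P score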

Summary of Assessment Literature

Although the criteria used for assessing the equivalence of test forms across modes seem to be sufficiently standardized, with the Guidelines (APA, 1986) as the base, the empirical findings on the comparability of conventional and computerized tests are rather mixed. The reported stability of parameter estimates and the factorial similarity of test forms across modes suggest that the construct measured by conventional and computerized forms is comparable, and the effect of examinee characteristics, such as computer familiarity, does not seem to manifest itself in test scores. Linking tests across modes also appears feasible when sufficient care is taken with the content comparability of the forms and the interpretation of linking results. Empirical findings on the effects of mode of presentation on speeded test performance, however, are mixed. Since L2 reading tests used for selection, diagnostic, achievement, and placement purposes are typically designed as timed power tests, findings related to speeded tests may be of limited relevance to them. Inconsistencies can also be seen in the impact of computerization on placement decisions based on conventional and computerized tests, and the seriousness of such inconsistencies should be evaluated in terms of the stakes of a test in the local context. As Fulcher (1999) argued, potential misplacement of candidates into lower levels may not be a source of great concern in ESL contexts, because such misplacements can often be detected, and necessary arrangements can quickly be made by language instructors.

MODE OF PRESENTATION AND READING

Unfortunately, little empirical investigation of the effects of mode of presentation on reading comprehension has been done in L2 reading research. Yessis (2000) addressed cross-mode differences in L2 reading rate and comprehension in an advanced ESL course at a North American university. In his study, 44 undergraduate and graduate students participated in weekly timed and paced reading exercises on paper, while another 9 students performed the exercises on computer. Toward the end of the quarter, the participants read two 1,000-word passages at the 8th-grade readability level, one on paper and the other on computer, and answered 10 multiple-choice reading comprehension questions after each passage. A series of mixed model regression analyses showed that when the order of presentation modes and passages was counterbalanced and language ability differences were controlled by entering the participants' ESL placement scores into the equation, mode differences in comprehension and speed were not significant. Moreover, while the computer practice group read more slowly than the paper practice group on the second occasion, they performed significantly better. Yessis pointed out that the observed performance differences between practice groups might be due to differences in practice conditions: the paper practice group followed a pace set by their instructors, while the computer practice group was allowed to set their own pace, which might have led them to focus more on the content than the paper practice group did. Based on the participants' responses to a computer attitude questionnaire, Yessis also found that a positive attitude was a significant predictor of better comprehension, but not of reading speed. Finally, a chi-square analysis of the pausal protocols of 9 students who participated in a follow-up study showed that the frequencies of various reading strategies used by the participants did not differ significantly across modes of presentation.
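
Because each of Yessis' participants read one passage in each mode, mode is a within-subject factor, and placement score enters as a covariate; a mixed model with a random intercept per participant captures this structure. A minimal sketch with hypothetical data and column names (Yessis' actual model specification may differ):

    import pandas as pd
    import statsmodels.formula.api as smf

    # Mixed model regression in the spirit of Yessis (2000): one
    # comprehension score per participant per mode, ESL placement score
    # as a covariate, random intercept per participant. Hypothetical data.
    df = pd.DataFrame({
        "subject":       [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
        "mode":          ["paper", "screen"] * 6,
        "placement":     [80, 80, 65, 65, 72, 72, 90, 90, 77, 77, 84, 84],
        "comprehension": [8, 7, 6, 6, 7, 8, 9, 9, 7, 7, 8, 9],
    })

    model = smf.mixedlm("comprehension ~ mode + placement",
                        data=df, groups=df["subject"]).fit()
    print(model.summary())  # fixed effect of mode, adjusted for placement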

Although Yessis' (2000) study provides insight into how the L2 reading process may or may not be affected by presentation mode, other empirical L2 reading studies that would allow us to evaluate his findings are not available in the L2 reading literature.

Accuracy and Speed of Reading in a First Language

An extensive body of literature in ergonomics, education, psychology, and L1 reading has directly compared the processing of computer-presented and paper-presented text in a first language. The studies reviewed here are deemed to have particular implications for the construct validation of CBTs because the computer-based reading tasks they used were not adaptive to participants' reading ability.

Dillon (1992) extensively reviewed ergonomic studies on the effect of mode of presentation (paper vs. computer screen) on reading. He classified numerous studies according to their focus of investigation (outcome or process) as well as factors that potentially accounted for often-reported differences in reading outcome and process across modes. These are outlined in Tables 1 and 2.

Table 1. Factors Previously Investigated in Mode of Presentation Research (Dillon, 1992)

Outcome measures
  Reading speed: task completion time
  Accuracy of reading: accuracy of proofreading (e.g., identification of spelling mistakes)
  Fatigue: visual fatigue and eye strain
  Comprehension: level of reading comprehension of texts
  Preference: paper vs. computer presentation of texts

Process measures
  Eye movement: frequency and duration of eye fixations
  Manipulation: manipulation techniques (e.g., turning pages with fingers; placing a finger as a location aid; flipping through pages while browsing a document)
  Navigation: devices that let the reader know the present location in the document (e.g., table of contents)

Table 2. Factors That Potentially Account for the Differences in Reading Outcome and Process Across Modes (Dillon, 1992)

Basic ergonomic factors
  Orientation: orientation of text/screen presentation (e.g., vertical vs. horizontal)
  Visual angle: angle created by the length of lines presented on the computer screen and the distance between the screen and the reader's eyes
  Aspect ratio: ratio of width to height of computer displays
  Dynamics: screen filling style and duration (e.g., rate and direction of text scrolling)
  Flicker: frequency of scanning the phosphor surface of the screen to generate characters that appear stable
  Image polarity: positive image polarity (dark characters presented on a light background) vs. negative polarity (light characters presented on a dark background)
  Display: fonts (e.g., character size, line spacing, character spacing)
  Anti-aliasing: addition of various gray levels to individual characters in order to perceptually eliminate the jagged appearance of character edges and display sharp, continuous characters
  User characteristics: degree of user familiarity with computer systems, reading speed, reading strategy, and susceptibility to external stress
  Interaction of display characteristics: interaction of the above variables

Manipulation facilities
  Scrolling vs. paging: scrolling (moving the text up and down the screen smoothly by a fixed increment to reveal information currently out of view) vs. paging (moving text up and down in complete screens, in a manner similar to turning the pages of printed text)
  Display size: number of lines that can be displayed on a computer screen at one time
  Text splitting across screens: splitting of paragraphs mid-sentence across successive screens
  Window format: single vs. multi-window format (whether two windows can be presented simultaneously to display different parts of a single document)
  Search facilities: various means of manipulating and locating information in a document (e.g., word/term searches, checking references, locating relevant sections)
  Input devices: tracker ball, mouse, function keyboard, joystick, light pen, etc.
  Icon design: facilities that allow rapid and easy manipulation of the text as well as access to the document through numerous routes (e.g., boxes, arrows, circles, buttons)

The main conclusions of Dillon's literature review can be summarized as follows:

1. It is difficult to draw any firm conclusions from empirical findings based on the studies reviewed due to various concerns, such as the limited scope of the studies, the unique nature of the procedures used, the unclear participant selection criteria, insufficient control of variables of interest, and the use of unrealistic reading tasks (e.g., proofreading for misspelling). However, the literature review suggests that reading from computer screens is, in fact, different from reading in print and that reading computer-presented texts generally takes longer than reading printed materials.

2. The effects of mode of presentation on process measures listed in Table 1 are not yet clear because no adequate empirical method to measure reading processes has yet been established.

3. Differences between modes seem to be caused by interactions of individually non-significant effects, and it is therefore impossible to attribute the differences to any single factor. Moreover, for a long text that does not fit on one screen and therefore requires scrolling or paging, the factors listed in Table 2 that determine the quality of the visual image presented to readers, as well as the availability and quality of text manipulation facilities, become important.

One limitation of the ergonomics studies reviewed by Dillon (1992) is that many of them studied proofreading rather than reading comprehension, while reading comprehension is more relevant to language assessment. Additional empirical studies utilizing reading comprehension tasks have been conducted primarily in psychology, education, and L1 reading research, nine of which are listed in the appendix. These studies were selected for review because they included (a) experimental conditions for paper-based and computer-based reading without manipulation or navigation facilities available only on computers (facilities suited to relatively long texts and often beyond the scope of language assessment), and (b) reading comprehension and/or reading speed as dependent variables, the most widely studied outcome measures in mode-of-presentation research. Most of these studies were conducted in the 1980s, and given the advancement of computer technology over the past two decades, currently available equipment may yield different results. However, more recent empirical studies meeting the above selection criteria were not available. The studies are therefore reviewed here to provide a historical perspective on mode effects in reading performance as well as baseline information for future research designs. The factors of focus, procedures used, and main findings of the studies are summarized in the appendix.

As shown in the appendix, these studies were conducted in widely different conditions, and the following issues should also be remembered when interpreting their results:

1. Ergonomic factors. The ergonomic features raised by Dillon (1992), such as display characteristics (e.g., character font, size, and line spacing) and features of computer displays (e.g., display size, resolution, image polarity, upper-case-only or mixed character use, flicker, and orientation), varied across these studies, but they were not always reported in sufficient detail.

2. Time limit. Heppner et al.'s (1985) is the only study whose experiment was conducted under a timed, conventional testing condition. None of the other studies set a time limit.

3. Characteristics of participants. Four of the studies (Feldmann & Fish, 1988; Reinking, 1988; Reinking & Schreiner, 1985; Zuk, 1986) involved grade-school children, while the others consisted primarily of traditional-age college students or older learners. Moreover, participants' computer familiarity was mixed in these studies, and descriptions of their language backgrounds were not provided.

4. Characteristics of reading texts and tasks. The reading texts ranged from 90-200 words per passage at the shortest (Fish & Feldmann, 1987) to a chapter of an introductory psychology textbook at the longest (McGoldrick et al., 1992). Comprehension tasks involved information search in a textbook chapter (McGoldrick et al., 1992) as well as reading for details and general semantic content in passages of lengths typical of reading textbooks. Multiple-choice was the preferred item format, although some studies also utilized open-ended or short-answer reading comprehension questions (McGoldrick et al., 1992) or form-completion tasks (Feldmann & Fish, 1988; Fish & Feldmann, 1987). The availability of the text while answering comprehension questions also differed across studies: some allowed reviewing the text while responding to the questions (Heppner et al., 1985; McGoldrick et al., 1992), while others did not (Belmore, 1985; Reinking, 1988; Reinking & Schreiner, 1985).

5. Characteristics of experimental designs. Only two of the studies (Reinking, 1988; Zuk, 1986) counterbalanced the order of text presentation; the others either presented the texts in the same order or did not report whether the order effect was controlled.

6. Definition of reading speed. Belmore (1985) and Reinking (1988) measured time spent on reading assigned texts only. Others included time required to complete the reading comprehension tasks as well (Fish & Feldmann, 1987; McKnight et al., 1990; Zuk, 1986).

7. Distracters. The main focus of Zuk's (1986) study was to investigate elementary school children's attention to reading tasks. Thus, a Walt Disney cartoon was played continuously as a distracter while the 3rd and 5th graders worked on the reading tasks in his study.

The findings of these studies are as follows. In terms of level of reading comprehension, six of the nine studies reported that comprehension was similar across the modes (Feldmann & Fish, 1988; Fish & Feldmann, 1987; McGoldrick et al., 1992; McKnight et al., 1990; Reinking, 1988; Zuk, 1986), one favored paper (Heppner et al., 1985), and two showed interactions -- one with the passage (Belmore, 1985) and the other with text difficulty and type of text manipulation (Reinking & Schreiner, 1985). The similarity of reading comprehension across the modes is consistent with the finding of Dillon's literature review described above. The results of the two studies that showed interaction effects between mode of presentation and other factors, however, are difficult to interpret. In Belmore's (1985) study, comprehension of the first set of passages favored print, but the effect disappeared for the second set; as Belmore pointed out, the fixed order of passage presentation makes it difficult to separate potential order and/or practice effects. In Reinking and Schreiner's (1985) study, 5th and 6th graders scored lower on the passages designated as easier by standard readability formulas than on the other set of passages, which had higher readability estimates. This may suggest, as the authors pointed out, that text characteristics not captured by readability formulas affected the actual difficulty of the texts.
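
The Reinking and Schreiner anomaly (lower scores on passages a formula rated easier) underlines that readability formulas are surface measures built on sentence and word length. As one common example (the study does not specify which formula was used), the Flesch-Kincaid grade level can be sketched as follows:

    # Flesch-Kincaid grade level, one standard readability formula. It
    # uses only surface features (words per sentence, syllables per
    # word), which is why two passages with similar formula estimates
    # can still differ in actual difficulty. Counts are hypothetical.
    def flesch_kincaid_grade(words, sentences, syllables):
        return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

    # A 1,000-word passage with 60 sentences and 1,400 syllables comes
    # out at roughly grade 7.4.
    print(f"{flesch_kincaid_grade(1000, 60, 1400):.1f}")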

Findings on reading speed in the studies are rather mixed. In three studies (Belmore, 1985; McGoldrick et al., 1992; Zuk, 1986), reading took longer on screen than on paper; three others reported that reading rates were not significantly different across modes (Feldmann & Fish, 1988; Fish & Feldmann, 1987; McKnight et al., 1990); and two studies (Belmore, 1985; Fish & Feldmann, 1987) reported a gain in computer-based reading speed as the experiments proceeded, indicating that after a reasonable amount of exposure to the screen-based reading tasks, the effect of mode on reading rate may diminish.

Although only three of the studies in the appendix that investigated reading speed reported that reading from computer screens was slower than reading from print, quite a few studies, including those reviewed by Dillon (1992), have replicated a reading speed advantage for print. Some studies have attempted to explain why this might be the case.

First, Oborne and Holton (1988) attributed the often-reported differences in reading speed to insufficient control of extraneous variables in previous empirical studies. When they controlled the orientation of text presentation, retinal distance from the text, image polarity, and page layout, no significant differences were found in either reading speed or comprehension scores, regardless of mode and image polarity. However, the strict control of extraneous variables in this study makes it difficult to generalize the results to real-life reading contexts. It is unlikely, for example, that readers would use book stands to present printed text vertically, or keep an equal distance between their eyes and the text across modes, as was done in this study. It is widely accepted by ergonomists that computer-presented texts are read at greater distances than conventional paper texts (Dillon, 1992; Gould, Alfaro, Finn, Haupt, & Minuto, 1987).

Second, limitations in the research methodology of previous studies may be another source of the observed reading rate differences across modes. Hansen, Doring, and Whitlock (1978) investigated how the subjects in their study spent their time while taking a computer-based test in introductory computer science. Although these results may not be directly applicable to research on reading performance, the authors' explanations of why their subjects took longer to complete the CBT deserve closer attention. In their study, 7 participants took the computer-based test, 4 of whom were videotaped. Two sources of extra time on the CBT were identified: (a) computer system requirements, that is, time used to go back to the table of contents to select the next task and time taken by the computer to generate and display problems; and (b) participants' unfamiliarity with computers. These factors may no longer be relevant, considering the powerful computers and the characteristics of computer users of the 21st century. However, it is worth noting that the 4 videotaped participants expressed discomfort with the testing condition and took significantly longer to finish the test than those who were not videotaped. Moreover, when participants' answers were marked on the screen, they were afraid that their answers would be seen by the proctor. The authors suspected that this might have contributed to the longer work time of the videotaped participants, one of whom reported in the post hoc questionnaire that "…with PLATO you are 'broadcasting' your answer to the world" (Hansen et al., 1978, p. 514). Although the videotapes provided valuable information about why the CBT took longer, videotaping should be reconsidered as a method, since it may be intrusive for participants, and such discomfort could seriously affect the reliability of the data.

As a possible third explanation, a series of extensive empirical studies focusing on the image quality of text presented on computer screens implies that the graphic quality of text may affect early stages of visual information processing rather than later cognitive processing, and that improved image quality may therefore facilitate reading rate on computers. IBM researchers investigated a wide range of variables that could account for reading rate differences (Gould, Alfaro, Barnes, Finn, Grischkowsky, & Minuto, 1987; Gould, Alfaro, Finn, et al., 1987). Gould, Alfaro, Barnes, et al. (1987) independently investigated the effects of potentially important variables, such as task variables (e.g., paper orientation and visual angle), display variables (e.g., dynamic characteristics of CRT displays, display quality, image polarity, and fonts), and reader characteristics (e.g., familiarity with computer-based reading and age), on proofreading and reading comprehension performance in 10 separate quasi-experimental studies utilizing ANOVA designs. They failed to find any single variable strong enough to account for the rather sizable reading rate differences of approximately 25% found in previous research. In six further experiments, Gould, Alfaro, Finn, et al. (1987) manipulated, independently or simultaneously, image quality factors selected on the basis of those results, such as character font and size, polarity, anti-aliasing, page layout, screen resolution, and flicker. They concluded that the combination of positive image polarity, high display resolution, and anti-aliasing seemed to eliminate the reading rate differences across the modes, suggesting that image quality may play a crucial role.

The studies by Gould and his associates, however, share a concern with those reviewed by Dillon: most of the reading tasks used were very short proofreading tasks searching for misspellings. The relevance of the proposed combination of image quality variables to reading comprehension tasks thus remains unclear. For example, Feldmann and Fish's (1988) study, reviewed in the appendix, provides counterevidence to the findings of Gould and his associates. In their study, computer-based reading comprehension tasks were presented only in upper case, with negative polarity, on a then-commercially available computer display whose quality was undesirable by the standards of Gould and his associates. Even under these conditions, no difference in reading rate or comprehension across modes was found.

Furthermore, a study conducted by Ziefle (1998) challenged the position of Gould and his associates that performance differences may diminish when screen resolution is improved. When the same computer monitor was used across experimental conditions and variables associated with character sets (size and color of fonts and backgrounds) were strictly controlled, Ziefle found that both proofreading speed and accuracy were still superior in the paper condition. Computer monitors that display text of equal or better quality than those used in Ziefle's experiments are already commercially available. The mixed results of the above studies seem to suggest, however, that even state-of-the-art computer technology, where high-resolution monitors with positive polarity and anti-aliasing have quickly become the standard, may not provide the comfort of paper-based reading.

Summary of Mode of Presentation and Reading Literature

The general trends found in these studies indicate that comprehension of computer-presented texts is, at best, as good as that of printed texts, and that reading speed may or may not be affected by mode of presentation. Issues salient in the studies of the 1980s, such as participants' unfamiliarity with computers and computer system requirements that made the presentation of text slow, are quickly becoming less of a concern because of rapid advances in computer technology. Other explanations proposed in previous studies that remain pertinent in the 2000s include insufficient control of extraneous variables, uncomfortable test-taking conditions induced by videotaping during test sessions, and the graphic quality of text and its effects on visual information processing. Although the methodological concerns raised by previous researchers will facilitate the design of future studies, strict control of extraneous variables may limit the generalizability of research findings to practical test-taking conditions. Moreover, the mixed results obtained regarding visual explanations suggest that discussion along this line is still inconclusive.

DISCUSSION

Several conceptual and empirical issues raised in the course of the development of mode of presentation research deserve further consideration. Belmore (1985) and Oborne and Holton (1988) explicitly questioned attempts made in previous studies to closely replicate paper-based reading conventions in computer-based conditions. Belmore pointed out that computers are usually introduced in education with the expectation that they will enhance learning and instruction, and that computer functions not available on paper should therefore be incorporated whenever existing instructional material is computerized. For example, experiments conducted by Reinking (1988) and Reinking and Schreiner (1985) included two
