Journal of the Institute for Second Language Development

September 2007

A Case Study Of A Japanese Learner In The UK

Darren Elliott (MA ELT, DELTA)


I looked at three writing samples: a drafted MA assignment with N's corrections; a short, set writing task about reading and listening skills; and a collection of informal email correspondence.


Again, the speaking samples ranged from the formal (a recording of a presentation given as part of an MA assignment) to the informal (an interactive speaking activity, recorded as he told a story to other MA students at dinner). The interview and a pronunciation test reading provided further data.


Reading assessment came largely from N's own analysis in interview and writing.


N's own evaluation of his listening, given in interview and in writing, formed the basis for this part of the assessment, supplemented by observation of his performance in MA seminars and lectures.

To define two of the major considerations in test construction: reliability is the achievement of consistency in scoring. To put it simply, if a group of students take a test on a Wednesday, they should achieve the same results on Thursday (Hughes, 2003: 36). A wide range of factors might affect this, from the environmental (the temperature of the room, the comfort of the furniture) to test features (item types, clarity of the rubric) to scorer characteristics (standardisation procedures, sufficient marking time and training).

A test can assert a certain degree of validity if it tests what it is meant to test and if there is 'correspondence between the test and the non-test activities that the scores are expected to reflect' (Luoma, 2004: 184).

In testing it is commonly accepted that 'there is a trade off between [reliability and validity]' (Alderson et al., 1995: 187). That is, one might make claims for the reliability of a carefully designed multiple-choice vocabulary test, as scorer consistency is easier to achieve. However, the validity of such a test is limited as a measure of the examinee's ability to understand and produce the vocabulary in real-life situations. Weir (1988: 34) argues that, for a fuller learner profile, the largest possible sample of data should be collected. Scale often makes this impractical, but with only one subject I have been able to focus on the data in a way which an institution designing regular assessment could not.

The assessment of validity can be further broken down into a number of types. Baxter focuses on three of these sub-divisions: content validity, construct validity and face validity (Baxter, 1997: 18).

Content validity assesses the fit between the syllabus and the test (Underhill, 1987: 106). In this case, there is no syllabus of language study, so I am testing N's ability to fulfil his academic obligations in English. If content validity need not be linked to a course, but can be based purely on authentic use, then this method of testing can be deemed to have content validity.

Construct validity is seen by many as validity itself, the central notion into which other types of validity feed, although it '…essentially involves assessing to what extent the test is successfully based on its underlying theory' (Alderson et al., 1995: 182). It is, according to Weir (1988: 27), often linked to a close fit between the test and the teaching that precedes it. N attended a pre-sessional EAP course in preparation for the MA, taught by some of the same tutors who guided him on the MA, so the assessment here is closely tied to his prior instruction.

I would like to turn finally to the question of face validity: simply put, whether the test appears to test what it is trying to test (Baxter, 1997: 20). If test takers and scorers cannot see how a test relates to the ability being tested, they will lose confidence and the test itself will suffer. Face validity 'is not a scientific notion' (Hughes, 2003: 33) and cannot really be measured as such, although test designers can judge the reactions of scorers and examinees through interviews, questionnaires and other such methods. The methodology under discussion here is transparent in what it aims to test, and is directed specifically at the learner. N was happy to participate in the study because the benefits to him were clear, so face validity is high.

The main problem with focusing on authentic samples rather than a standardised test is practicality. It has been difficult to mine the rich data collected in this study for strengths and weaknesses, and a practice test with an answer key would no doubt have been a simpler option. However, Tarone & Yule (1989: 77) point out the danger of variability in testing: no matter how well designed a test, it is hard to correlate accuracy in tests precisely with production in authentic discourse. That is why I focused almost solely on samples from my subject's real life.

Evaluation and analysis of the learner's strengths and weaknesses

Productive Skills: Writing


Strengths

Task fulfilment and effect on the reader

Each of the samples fulfils its communicative purpose to a certain extent. The emails, although error-prone, create a strong impression by appealing directly to the reader (D37 "Right?", D40 "YOU know the reason").

Appropriacy of style and genre

Again, the writer has a strong understanding of the conventions of email and uses capitalisation and vocabulary for emphasis or dramatic effect (D25 "but what they mailed me back is totally THE SAME", D41 "Just because they are tamed"). These techniques are not used in the semi-formal writing, done for the purposes of this study, or the assignment draft, where they would have been inappropriate. The writer has done especially well to select the appropriate tone for Appendix C without any clear guidance. This is what I had hoped for from an advanced English learner.

Organisation, cohesion and layout

Paragraphing is accomplished, as expected for the level. In the academic assignment, the writer uses a number of cohesive devices to link his ideas accurately and clearly (B70 "However", B73 "That is", B86 "As a result", B97 "To cover these factors"). These and other linkers are both inter- and intra-sentential. The range of linkers in the emails is more limited (D19, 21, 23, 27 "so") but appropriate for the genre. The reader is still able to follow the chronological order of the story, and the repetition of language (lexical cohesion) helps to convey the writer's frustration.

Range of lexis and grammar

N attempts a broad range of both complex and simple structures in his academic writing, as is to be expected at this level. The range of lexis in the assignment is appropriate for an advanced learner, with use of collocations and set phrases.

Accuracy of lexis and grammar

Lexis is used accurately for the most part, and grammatical errors do not interfere with communication. N has been very careful to proofread his assignment and, with the help of his tutor, to correct his grammatical errors, as seen in the second part of Appendix B (B235–D505).


Weaknesses

Task fulfilment and effect on the reader

Appendix B demonstrates that it is difficult for the writer to answer the question fully and clearly for the reader in academic texts at this level. This will be discussed further under the category of organisation.

Appropriacy of style and genre

Although generally appropriate, occasional stylistic techniques in Appendix B are too informal for the genre (B27 "Why is this so?", a rhetorical question). The switching between the theoretical and the practical is somewhat abrupt and sometimes jars.

Organisation, cohesion and layout

At sentence and paragraph level, all the texts are well organised and punctuated for the genre and level. However, as stated above, it is difficult for the reader to follow the argument in Appendix B due to the overall arrangement of ideas. This is something N knows is challenging for him. The problem is set out at the beginning of the piece, but no solution is offered until the end, in the Japanese style. The assignment would benefit from restructuring: bringing the proposals to the fore, discussing each in turn, and restating them at the end.

Range of lexis and grammar

Reporting verbs are overused for this level ("state" at B21, 33, 42, 48, 80, 122, 134 etc., with similar overuse of "assert").

Accuracy of lexis and grammar

A number of grammatical errors come up in Appendix B, and although errors are more acceptable in informal emails, tense mistakes (B20 "I noticed that no book has arrived", B13 "Before I start travelling this time") are quite surprising at this level. The grammatical corrections in the latter part of Appendix B provide a wealth of data; for example, at this level one would expect fewer mistakes with passive forms (B256, 265). Lexis is selected well for the most part, although word forms are sometimes problematic (D11 "frustrating").



All Rights Reserved Copyright © JISLD - Journal of The Institute for Second Language Development