Lund University, Dept. of Linguistics and Phonetics
Working Papers 53, 61-79

Lexical diversity and lexical density in speech and writing: a developmental perspective

Victoria Johansson

Introduction

Literature about early, pre-school lexical development often mentions vocabulary development.
As an example, the reader of Handbook of child language (Fletcher and MacWhinney 1.
is referred to 'vocabulary development' when looking up the term 'lexical development'.
The same term is used by e.g. Dromi 1999 in her overview of early lexical development, and the index in David Crystal's The Cambridge encyclopedia of language refers to 'vocabulary' from the index entry 'lexicon'.
This article will compare two measures that often have been used to describe later lexical development: lexical diversity and lexical density.
Lexical diversity is a measure of how many different words are used in a text, while lexical density provides a measure of the proportion of lexical items (nouns, verbs, adjectives and some adverbs) in the text.
Both measures have the advantage of being easy to operationalise, and also practical to apply in computer analyses of large data corpora.
Further, both lexical diversity and lexical density have been shown to be significantly higher in writing than in speaking (Ure 1971; Halliday 1985).
One conclusion from this could be that the two measures are interchangeable, and that we will encounter a similar developmental pattern independent of the measure used for describing lexical development.
It is, however, theoretically possible that a text has high lexical diversity (contains many different word types), but low lexical density (contains many pronouns and auxiliaries rather than nouns and lexical verbs), or, vice versa, that a text has low lexical diversity (the same words or phrases are repeated over and over) but high lexical density (the words that are repeated are nouns, adjectives or verbs).
Lexical diversity is often used as an equivalent to lexical richness (e.g., by Daller, van Hout & Treffers-Daller 2003). However, Malvern et al. 2004 begin their book about lexical diversity by discussing the difference between lexical diversity and lexical richness, stating (along the lines of Read) that the lexical diversity measure is only one part of the multidimensional feature of lexical richness. Other factors proposed by Read are lexical sophistication, number of errors, and lexical density. I side with Read and Malvern et al.: neither lexical diversity nor lexical density is the one and only measure.
However, both measures are easily accessible and easy to apply to corpora of different kinds.
No doubt they also provide important insights into the texts, and as long as the measures are not used as the only way to judge a text qualitatively, they are very useful.
The aim and outline of the study

This study focuses on developmental patterns in terms of the measures lexical diversity and lexical density. I will examine whether these measures are sensitive to genre (narrative vs. expository) and modality (writing vs. speech). Another goal is to investigate to what extent the two measures are related.

The article starts with a theoretical background on the two measures, followed by a presentation of the data, then moves on to statistical analyses presented measure by measure and age group by age group, and ends with a general discussion and a conclusion.
Lexical diversity

The more varied the vocabulary of a text, the higher its lexical diversity. For a text to be highly lexically diverse, the speaker or writer has to use many different words, with little repetition of the words already used.
The type-token ratio

The traditional lexical diversity measure is the ratio of different words (types) to the total number of words (tokens), the so-called type-token ratio, or TTR (Lieven 1978; Bates, Bretherton & Snyder).
A problem with the TTR measure is that text samples containing large numbers of tokens give lower values for TTR and vice versa. The reason for this is that the number of word tokens can increase infinitely, and although the same is true for word types, it is often necessary for the writer or speaker to re-use several function words in order to produce one new word type. This implies that a longer text in general has a lower TTR value than a shorter text, which makes it especially complicated to use TTR in developmental comparisons, e.g. between age groups, where the number of word tokens often increases with age. Gayraud 2000 compares TTR and the number of word tokens and shows that although the number of word tokens increases substantially with the speaker's or writer's age, the TTR drops.
One consequence of this is that TTR can only be used when comparing texts of equal length. In spite of this, TTR is still used for comparing text production, for instance between children's texts, or between various groups with language impairment. For instance, TTR is part of the SALT (Systematic Analysis of Language Transcripts) programs, a set of computer programs developed by Miller and Chapman in order to quantify developmental aspects of speech for typically as well as atypically developing children (Miller & Klee).
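The length effect described above can be illustrated with a minimal sketch. The function name and the two toy texts are hypothetical and serve only as an illustration; tokenisation is reduced to splitting on whitespace, which is a simplifying assumption.

```python
def type_token_ratio(tokens):
    """Number of distinct word forms (types) divided by the number of tokens."""
    if not tokens:
        raise ValueError("TTR is undefined for an empty text")
    return len(set(tokens)) / len(tokens)


# Two hypothetical toy texts: the longer one has to re-use function words.
short_text = "the dog chased the cat".split()
long_text = ("the dog chased the cat and then the cat chased the dog "
             "and the dog ran to the house").split()

ttr_short = type_token_ratio(short_text)
ttr_long = type_token_ratio(long_text)

print(f"short text: {len(short_text)} tokens, TTR = {ttr_short:.2f}")
print(f"long text:  {len(long_text)} tokens, TTR = {ttr_long:.2f}")
```

Even in this toy case the longer text receives the lower TTR, although it contains more different words in absolute terms.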
A variant of the TTR measure is the so-called index of Guiraud. This measure divides the number of types by the square root of the number of tokens. Other proposed variants are Advanced TTR and Guiraud Advanced, used for instance by Daller et al. 2003. Vermeer 2000 discusses TTR and various other measures, and their use in both first and second language acquisition. She concludes her discussion by proposing that lexical richness can be more successfully measured by exploring the degree of difficulty for the words in a text, as measured by their frequency in everyday life.
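As a brief sketch, the index of Guiraud in its usual definition (types divided by the square root of tokens, which equals TTR multiplied by the square root of the number of tokens) can be computed as follows. The function name and the toy input are hypothetical.

```python
import math


def guiraud_index(tokens):
    """Index of Guiraud: number of types / square root of number of tokens."""
    if not tokens:
        raise ValueError("the index is undefined for an empty text")
    return len(set(tokens)) / math.sqrt(len(tokens))


# Hypothetical toy input: 3 types among 4 tokens.
toy_tokens = ["a", "b", "a", "c"]
guiraud = guiraud_index(toy_tokens)
ttr = len(set(toy_tokens)) / len(toy_tokens)

print(guiraud)                            # types / sqrt(tokens)
print(ttr * math.sqrt(len(toy_tokens)))   # the same value, via TTR
```

Dividing by the square root rather than by the raw token count dampens the penalty for longer texts, but does not remove the length dependence.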
Theoretical vocabulary

Other ways around the TTR problem have been proposed and used. One is the so-called theoretical vocabulary (see e.g. Broeder, Extra & van Hout). The principle behind this measure is to pick a number of words (e.g. 100 words) from a text at random, and calculate the number of word types in the sample. The theoretical vocabulary takes into account all possible ways of choosing 100 words from the text. In this way, one can compare texts of different lengths, with the only restriction that the shortest text limits the maximal number of random words to be picked.
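One way to take all possible samples into account without enumerating them is to compute the expected number of types in a random sample drawn without replacement. The sketch below does this with binomial coefficients; it is an illustration under that assumption and is not claimed to reproduce how any particular program calculates theoretical vocabulary. The function name is hypothetical.

```python
from collections import Counter
from math import comb


def theoretical_vocabulary(tokens, sample_size=100):
    """Expected number of types in a random sample of `sample_size` tokens.

    A type with frequency f is absent from the sample with probability
    C(N - f, n) / C(N, n), where N is the text length and n the sample size.
    """
    n_tokens = len(tokens)
    if not 0 < sample_size <= n_tokens:
        raise ValueError("sample_size must be between 1 and the text length")
    total = comb(n_tokens, sample_size)
    expected = 0.0
    for freq in Counter(tokens).values():
        p_absent = comb(n_tokens - freq, sample_size) / total
        expected += 1.0 - p_absent
    return expected


# Hypothetical toy text: 'a' occurs twice, 'b' and 'c' once each.
toy = ["a", "a", "b", "c"]
expected_types = theoretical_vocabulary(toy, sample_size=2)
print(expected_types)
```

For the toy text there are six possible pairs, one of which contains a single type and five of which contain two, so the expected value is 11/6.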
Johansson 1999 uses theoretical vocabulary for comparing spoken and written expository texts between a group of Swedish university students and 12-year-olds.
In this case the program Vocab (developed by Leif Grönqvist, Department of Linguistics, Göteborg University) was used for calculating theoretical vocabulary. The results show that lexical diversity is higher in writing than in speech for both the adults and the 12-year-olds. The adults have higher lexical diversity than the 12-year-olds. Vocab was also used by Wengelin 2002 to compare written texts in various genres from three populations: a group of adult controls, a group of congenitally deaf adults, and a group of adults with reading and writing difficulties.
The adult controls had higher diversity than the other groups.
Some of the written texts had spoken equivalents, and Wengelin was able to show that the control group had a greater difference between their spoken and written texts than the group with reading and writing difficulties.
VocD

In order to compare texts of different lengths, a measure independent of sample size is required. One such measure is the D measure developed by Brian Richards and David Malvern (Richards & Malvern 1997; Malvern et al. 2004; MacWhinney). The D measure is based on the predicted decline of the TTR as the sample size increases. This mathematical curve is compared with empirical data from a text sample. For calculating D, information from the whole text sample is used (the minimum length of the text is 50 words, however). A higher value of D indicates higher lexical diversity, and thus a richer vocabulary. The D measure is implemented in the most recent versions of CLAN (MacWhinney), under the name VocD. The measure VocD is described at length in Malvern et al. 2004, with many examples and references to previous studies about lexical measures.
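The curve-fitting idea behind D can be sketched as follows. The model equation TTR = (D/N)(sqrt(1 + 2N/D) - 1) is the one commonly given for this measure; the grid search, the function names, the search bounds and the sample sizes of 35 to 50 tokens used in the check are illustrative assumptions, and the sketch is not the CLAN implementation.

```python
import math


def model_ttr(n_tokens, d):
    """TTR predicted by the D model for a sample of n_tokens tokens."""
    return (d / n_tokens) * (math.sqrt(1.0 + 2.0 * n_tokens / d) - 1.0)


def fit_d(observed, d_min=1.0, d_max=200.0, step=0.1):
    """Grid search for the D that minimises the squared error.

    `observed` is a list of (sample size, mean empirical TTR) pairs.
    """
    best_d, best_err = d_min, float("inf")
    n_steps = int(round((d_max - d_min) / step))
    for i in range(n_steps + 1):
        d = d_min + i * step
        err = sum((model_ttr(n, d) - ttr) ** 2 for n, ttr in observed)
        if err < best_err:
            best_d, best_err = d, err
    return best_d


# Hypothetical check: points generated from the model itself with D = 60
# should lead the search back to (approximately) that value.
points = [(n, model_ttr(n, 60.0)) for n in range(35, 51)]
fitted = fit_d(points)
print(round(fitted, 1))
```

In practice the observed points would be mean TTR values from repeated random samples of the text, so the fitted D summarises how quickly the TTR of that text falls as the sample grows.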
Although Malvern and Richards claim that VocD permits comparisons between texts of unequal length, not everybody is convinced that the text length factor is completely eliminated by using VocD. The D measure has been criticised, for instance by Daller et al. 2003, who instead prefer the index of Guiraud. Malvern and Richards' D measure is severely criticised by McCarthy and Jarvis 2007 for not being insensitive to text length. McCarthy and Jarvis compare D to 13 alternative methods for measuring lexical diversity. They conclude that D (or VocD) performs better than most alternatives, but that there are better options. However, another conclusion is that the length of the texts one wants to compare should determine which measure one uses, since some measures are more effective within certain ranges.
Their analysis shows that D is the second best of all measures within the text length of 100-400 word tokens, which is also what is claimed in Malvern et al. 2004. McCarthy and Jarvis 2007:483 finish by questioning "whether a single index has the capacity to encompass the construct of lexical diversity".
Strömqvist et al. 2002 used VocD to compare spoken and written expository and narrative texts produced by adults from four countries.
The results show strong differences between speech and writing, where writing has a much higher lexical diversity.
However, a conclusion from this study is that one should be careful when using the measure to compare data from different languages.
The morphological structure of the language highly influences the outcome of the comparison.
The definition of lexical diversity in this article

To conclude, there are several ways to compare lexical diversity between texts of different lengths. In spite of some criticism, VocD seems to be the most accurate instrument to use. For the calculations of lexical diversity below, I will consequently use the measure D.
Lexical density

Lexical density is the term most often used for describing the proportion of content words (nouns, verbs, adjectives, and often also adverbs) to the total number of words. By investigating this, we get a notion of how much information a text conveys: a text with a high proportion of content words contains more information than a text with a high proportion of function words (prepositions, interjections, pronouns, conjunctions and count words).
Several variants of lexical density have been proposed. A popular 'minor variant' is to calculate the noun density, the number of nouns divided by the total number of tokens in the text. Other options are, for instance, verb, adjective or adverb types per total lexical words. Various options are described and discussed in Wolfe-Quintero, Inagaki & Kim 1998.
Introducing the concept of lexical density, Ure 1971 distinguishes between words with lexical properties, and those without. According to Ure, items that do not have lexical properties can be described "purely in terms of grammar", meaning that such words (or items) possess a more grammatical-syntactic function than the lexical items. Lexical density is then defined as the total number of words with lexical properties divided by the total number of orthographic words. The result is a percentage for each text in the corpus. Ure concludes that a large majority of the spoken texts have a lexical density of under 40%, while a large majority of the written texts have a lexical density of 40% or higher. One remark here is that these numbers ought to be highly language dependent: a language with more bound morphology would probably show a higher proportion of lexical items.
In a later article, Ure defines lexical density as "the proportion of words carrying lexical values (members of open-ended sets) to the words with grammatical values (items representing terms in closed sets). Since all words have grammatical values, this is a part : whole relation" (Ure & Ellis 1977).
Ure and Ure & Ellis correctly maintain that the matter of lexicality is important when discussing the concept of lexical density. Traditionally, nouns, verbs and adjectives are the three word classes considered to have lexical properties (although this is not stated clearly in Ure 1971 or Ure & Ellis 1977). Often these items are called content words or open class words (because of the possibility to easily include new members of the class), while the more grammatical parts of speech are called closed classes, since new prepositions or pronouns seldom enter the language.
The concept of lexical density is developed and further refined by Halliday 1985. He points out the importance of discriminating between lexical items and grammatical items. An item may consist of more than one word. Thus, Halliday counts turn up as one lexical item, while Ure 1971 counts it as one lexical item and one grammatical item. A lexical item is defined by Halliday as an item that "function[s] in lexical sets not grammatical systems: that is to say, they enter into open not closed contrasts" (Halliday 1985). The lexical item is part of an open set that can be contrasted with a number of items in the world. A grammatical item, on the other hand, enters into a closed system, according to Halliday. Characteristic of the grammatical system is that the classes belonging to it have a fixed set of items, where it is impossible to add new members.
According to Halliday, child language gives evidence for the existence of two classes, one with lexical and one with grammatical items.
In the beginning of their linguistic development, children often construct sentences where all grammatical items are missing.
Halliday further emphasises that there is a continuum from lexis into grammar, and that there are - and always will be - intermediate cases.
For instance, he claims that English prepositions and certain classes of adverbs are on the borderline between lexical and grammatical items.
The adverbs that he gives as examples are the modal adverbs, such as always and perhaps.
When comparing e.g. speech and writing, the important thing is to be consistent in drawing the line between 'lexical' adverbs and 'grammatical' adverbs, but it matters less where the line is drawn. The definition of lexical density given by Halliday is thus "the number of lexical items, as a proportion of the number of running words" (Halliday 1985).
The difference between Halliday's and Ure's definitions of lexical density is that Halliday counts some adverbs as lexical items.
The definition of lexical density in this article

This article follows Halliday's definition of lexical density. Thus, grammatical adverbs are included in the closed class items, while non-grammaticalised adverbs (including all adverbs derived from adjectives) are counted as lexical items. In our data, lexical density was calculated by dividing the number of lexical items by the total number of words in each text.
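The calculation just described can be sketched on a tagged text. The tag names below (including the split between "LEX_ADV" and "GRAM_ADV" and the separate "AUX" tag for auxiliaries) are hypothetical labels chosen for the illustration, not the tag set used in the study.

```python
# Hypothetical tags counted as lexical items: nouns, lexical verbs,
# adjectives and non-grammaticalised adverbs.
LEXICAL_TAGS = {"NOUN", "VERB", "ADJ", "LEX_ADV"}


def lexical_density(tagged_tokens, lexical_tags=LEXICAL_TAGS):
    """Number of lexical items divided by the total number of words."""
    if not tagged_tokens:
        raise ValueError("lexical density is undefined for an empty text")
    n_lexical = sum(1 for _, tag in tagged_tokens if tag in lexical_tags)
    return n_lexical / len(tagged_tokens)


# Hypothetical tagged sentence: 4 lexical items among 6 words.
sample = [
    ("the", "DET"),
    ("girl", "NOUN"),
    ("quickly", "LEX_ADV"),
    ("helped", "VERB"),
    ("her", "PRON"),
    ("friend", "NOUN"),
]
density = lexical_density(sample)
print(f"{density:.1%}")
```

Where the boundary between lexical and grammatical adverbs is drawn only changes the contents of the tag set; the important point is that the same set is applied to every text being compared.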
Data

To compare lexical diversity and lexical density in a developmental perspective, I have used material from the Swedish part of an international study on developing literacy, the so-called Spencer project¹ (for more details on data collection, see Berman and Verhoeven 2002, or Johansson).
The Spencer study aimed at investigating the development of literacy in both speech and writing in two different genres: narrative and expository.
The Swedish data consist of 316 texts distributed evenly on written and spoken narrative and expository texts.
Four age groups participated in the study: 10-year-olds, 13-year-olds, 17-year-olds, and adults (university students with at least 2 years of university education, during which they had produced at least one major paper).
All participants were monolingual Swedish speakers², with no known reading or writing difficulties.
Each group consisted of 20 persons, except the adult group which had only 19 members.
The text length range was 50-650 words.
¹ The project was supported by the Spencer Foundation Major Grant for the Study of Developing Literacy to Ruth Berman. Apart from Sweden, six other countries participated: Israel, the Netherlands, France, Spain, Iceland and California, USA.

² 'Monolingual speaker' here means that both parents had Swedish as their first language, and that Swedish was the main language used both at home and at school. At the time of the recording, all subjects had at least started to learn English in school, however, and some of the participants in the adult group might have spent a long time abroad.

After watching a wordless elicitation movie showing scenes from a school day (e.g., cheating, fighting, bullying, stealing), the participants were asked to produce four texts each. The experimental tasks were balanced for order. The text types and the topic for each task were as follows (with the elicitation question rephrased):
Spoken narrative (NS): Tell me about one time when you helped somebody in/was helped by somebody out of a predicament.
Written narrative (NW): Write about one time when you helped somebody in/was helped by somebody out of a predicament.
Spoken expository (ES) (a speech): Give a speech where you discuss the problems you just saw in the film. Don't describe the film, but instead say something about the cause of the problems, and possible solutions.
Written expository (EW) (an essay): Write an essay where you discuss the problems you just saw in the film. Don't describe the film, but instead say something about the cause of the problems, and possible solutions.
Correlating lexical diversity and lexical density

Before exploring each lexical measure individually, a correlation test will give a hint on whether or not the two measures are connected in the data. Not surprisingly, given that both measures have been proposed to show lexical development, there proved to be a highly significant correlation between lexical diversity and lexical density (r = 0.733).
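A correlation test of this kind can be sketched with a product-moment correlation coefficient. Which coefficient was used in the study is not stated here, so Pearson's r is an assumption, and the per-text scores below are invented purely for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equally long lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for five texts: D values and lexical densities.
d_values = [35.0, 42.5, 51.0, 60.2, 74.8]
densities = [0.31, 0.36, 0.35, 0.41, 0.45]
r = pearson_r(d_values, densities)
print(f"r = {r:.3f}")
```

A coefficient close to 1 indicates that texts with high diversity also tend to have high density, without implying that the two measures capture the same property.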
Overall patterns of age, modality and genre

Having established that lexical diversity and lexical density are correlated in the data, I used multivariate ANOVA to explore overall patterns of age, modality and genre for each lexical measure. To summarise the results below, the general effects were significant for almost all factors, including an interaction of genre, age and modality. To investigate the main effects of genre and modality and the interactions between these factors, a within-subject factor test was used, while a between-subjects test was used to look for main effects of age. Table 1 shows an overview of the results of the post hoc tests.
Lexical diversity: multivariate analyses

Multivariate analyses of lexical diversity show a significant main effect of genre (F = 4.236), of modality (p < 0.01), and of age (F = 3302.206).
Table 1. Results of the post hoc comparisons for lexical diversity and lexical density.

Lexical measure    Subset 1                    Subset 2              Subset 3
Lexical diversity  10-year-olds, 13-year-olds  17-year-olds          Adults
Lexical density    10-year-olds, 13-year-olds  17-year-olds, Adults

A significant interaction of modality and age is also found (F = 11.664, p < 0.01), as is an interaction of genre and modality (p < 0.05). However, there is no significant interaction of genre and age group.
Tukey's post hoc analyses show no significant difference between the two youngest age groups (10-year-olds and 13-year-olds), but a significant difference between the two youngest age groups and the two oldest ones. Further, there was a significant difference between the two oldest groups, in that the adults had higher lexical diversity than the 17-year-olds (see the subsets from the post hoc tests in Table 1).
Lexical density: multivariate analyses

Multivariate analyses of lexical density show a main effect of modality (F = 651.744, p < 0.01) and of age (p < 0.01), but no effects of genre. Further, a significant interaction of genre and age is found (p < 0.01), as well as of modality and age group (p < 0.05), but there is no interaction between genre and modality. Tukey's post hoc analyses show no significant differences between 10-year-olds and 13-year-olds, nor between 17-year-olds and adults. However, there is a significant difference between the two younger age groups on the one hand, and the two oldest age groups on the other (see the subsets from the post hoc test in Table 1).
Conclusion: overall patterns of age, modality and genre

Table 1 shows the homogeneous subsets from the post hoc tests, and in that way summarises the differences between the lexical measures. 17-year-olds and adults differ significantly for the lexical diversity measure, but their texts appear to be equally lexically dense. Thus, the progression of development is more extended when we use lexical diversity.
One conclusion is that although the correlation test shows a strong correlation between the lexical measures, and the multivariate analyses show a significant main effect of age, the developmental pattern varies depending on the lexical measure under investigation. Both measures showed a modality effect, in that they are significantly higher for the written discourses. This confirms the results from Ure 1971. However, we only find an effect of genre for lexical diversity; the lexical density measure seems to be indifferent to genre. From this it follows that although the lexical measures are correlated, we might get different insights depending on which measure we look at. Compared with lexical density, lexical diversity proved to be more genre sensitive, as well as more sensitive to development.

Comparing text types within each age group

In the following, I will compare each measure within each text type (narrative written, narrative spoken, expository written or expository spoken) as well as within each age group. A multivariate ANOVA will be used to compare the differences for each text type within each age group, with the aim of investigating genre and modality differences within each age group. If such differences can be established, a paired sample t-test will be used to find differences between pairs within a factor, e.g., differences between expository spoken texts and narrative spoken texts.

Lexical diversity

Table 2 shows the means of lexical diversity broken down by age group and text type, and Figure 1 illustrates this graphically. There is a trend for lexical diversity to increase with age, and the striking difference between speech and writing for all age groups, independent of text type, is salient in the figure.

Table 2. Lexical diversity: means broken down by age group and text type.

Figure 1. Lexical diversity broken down by age group and text type.

10-year-olds

The 10-year-olds show a significant effect of genre (p < 0.05), in that the expository texts are more lexically diverse than the narrative ones. The highest lexical diversity is found in the written expository texts, and the lowest lexical diversity is found in the spoken narrative texts. Furthermore, there is a significant effect of modality (p < 0.01), so that the written texts have higher lexical diversity than the spoken ones. A paired sample t-test shows no significant differences between the two spoken genres or the two written genres.

13-year-olds

The 13-year-olds show a significant effect of modality (p < 0.01), but no effect of genre. This means that the 13-year-olds have higher lexical diversity in their written texts, but there is no difference between narrative written and expository written texts. However, a paired sample t-test shows a significant difference between the narrative spoken texts and the expository spoken texts. Thus, for the spoken texts there is a genre effect, where the expository spoken texts are more lexically diverse.
17-year-olds

The 17-year-olds show patterns similar to the 13-year-olds'. Thus, there is a significant effect of modality (F = 132.124, p < 0.01), but no significant effect of genre. This means that the 17-year-olds have higher lexical diversity in their written texts, but there is no difference between narrative and expository texts.

Adults

Like the younger age groups, the adults also show a significant effect of modality (F = 86.502, p < 0.01). Again, there is no significant effect of genre. The written texts thus have higher lexical diversity than the spoken texts, but there is no difference between narrative and expository texts. A paired sample t-test shows that the adults, just like the 13-year-olds, differ between their spoken narrative texts and their spoken expository texts (t = 3.378): the expository spoken texts have higher lexical diversity.
Lexical density

Table 3 presents the means of lexical density broken down by age group and text type. Figure 2 gives a graphic overview of the same data. Just as for lexical diversity, the graph of lexical density shows a difference between the spoken and the written texts, independent of genre. We also find a trend for lexical density to increase with age, although the trend seems less salient than for lexical diversity.

Table 3. Lexical density: means broken down by age group and text type.

Figure 2. Lexical density broken down by age group and text type.

10-year-olds

The 10-year-olds show a significant effect of modality, in that the written texts have higher lexical density than the spoken ones. However, there are no genre effects.

13-year-olds

The 13-year-olds show a significant modality effect (p < 0.01), with the highest lexical density in the written texts. Furthermore, there is a genre effect, where the narrative texts have higher lexical density than the expository texts (F = 9.942). Thus, the narrative written texts show the highest lexical density, while the lowest lexical density is found in the expository spoken texts. A paired sample t-test shows a significant difference between the narrative spoken texts and the expository spoken texts (t = -7.730). Thus, there is a genre effect for the spoken texts, where the narrative spoken texts are more lexically dense than the expository ones.

17-year-olds

The 17-year-olds show a significant effect of modality (p < 0.01), where, again, the written texts have higher lexical density than the spoken texts. There are no effects of genre.

Adults

The adults have a significant effect of modality (F = 173.284), where the lexical density is higher in the written texts. No genre effects are found.
Conclusion: lexical measures within each age group

After comparing the text types within each age group, some conclusions can be drawn. First, the 10-year-olds show a modality difference in all tests, but no genre differences. Thus, this group seems to be highly sensitive to modality, but less able to adapt their lexicon to genre. The 13-year-olds, on the other hand, are the odd group here: they show both modality and genre differences. The modality difference is not so difficult to explain: independent of age group, a more diverse and dense language is generally required in writing compared to speaking, due to the decontextualised conditions in writing. In the same way it is problematic to produce a lexically diverse or dense text in speech: repetitions are necessary (which makes the diversity lower), and pronouns are both more adequate and more easily accessible in speech (which makes the density lower).
More interesting are the genre effects found in the 13-year-olds' texts. Their spoken expository texts have higher lexical diversity than the narrative equivalents, but the spoken narrative texts have higher lexical density than the equivalent expository spoken texts. This means that more content words are used in their spoken narratives, but that the vocabulary is more varied in the spoken expositories. One factor which would increase diversity but decrease density is an extensive use of pronouns (such as 'they', 'we', 'I', or man, the Swedish generic pronoun 'one') to express degrees of generalisation, in combination with consecutive conjunctions ('because', 'so that', 'therefore') to express connections between problems and solutions. Another factor is that the elicitation of the expository task invited a more context-bound discourse: all subjects knew that the experiment leader had seen the elicitation movie. In addition they were explicitly told not to describe the movie. Together, these factors may invoke a more extensive use of pronouns, and thereby decrease lexical density, while the diversity remains higher than in the narratives since the variation of pronouns is great.
The 17-year-olds show a modality difference for all the lexical measures, but no genre differences at all. In the light of the 13-year-olds' pattern we could interpret the results so that the 17-year-olds increase their number of lexical items in the expository texts, and in that way even out genre differences for lexical density here. The genre effect of lexical diversity can be explained by the 17-year-olds' immense use of fillers and empty phrases: they are making strong efforts not to be silent during their spoken production. On the other hand, this decreases the lexical diversity.

The adults show, like all groups, modality differences, but the genre differences in this group are especially interesting in a developmental perspective. To summarise, at the upper end of the developmental scale, the adults use a more varied vocabulary in their spoken expository texts than in the spoken narrative ones. One conclusion is that the adults are able to use knowledge acquired and practised in writing also when they speak, and that this is most noticeable in the cognitively more demanding expository genre.
Comparing text types between age groups

Following strong indications of significant differences between genres, between modalities, and finally between age groups, ANOVA will be used to examine how the differences distribute over age in each text type. Thus, for each lexical measure I will look for differences in age groups in each text type (Expository spoken, Expository written, Narrative spoken and Narrative written). Since each participant only produced one text of each text type, an ordinary ANOVA can be used to compare the four age groups with each other. Results from Tukey's post hoc tests will be used to explore significant age group differences within a text type.
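The ordinary (one-way, between-groups) ANOVA mentioned above boils down to comparing the variance between group means with the variance within groups. The sketch below computes the F statistic only; it does not compute p-values or Tukey's post hoc tests, and the scores for the four age groups are invented for illustration.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    if k < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    n_total = sum(len(g) for g in groups)
    if n_total <= k:
        raise ValueError("need more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("F is undefined when there is no within-group variance")
    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical D values for one text type, one list per age group.
scores_by_age = [
    [41.0, 45.5, 39.2, 47.1],   # 10-year-olds (invented)
    [44.3, 48.0, 42.7, 50.2],   # 13-year-olds (invented)
    [58.9, 61.4, 55.0, 63.3],   # 17-year-olds (invented)
    [70.1, 66.8, 74.5, 69.0],   # adults (invented)
]
f_value, df_b, df_w = one_way_anova_f(scores_by_age)
print(f"F({df_b}, {df_w}) = {f_value:.2f}")
```

A large F indicates that the age groups differ more from each other than the texts within each group differ among themselves; identifying which groups differ is the job of the post hoc tests.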
Lexical diversity: text types and age Figure 1 showed that lexical diversity increased with age.
This is confirmed by the findings previously presented.
The results from Tukey's post hoc tests presented in Table 4 show how the age groups can be divided into homogeneous subsets in the various text types.
The table indicates that there are no differences between 10-year-olds and 13-year-olds.
Nor do 17-year-olds and adults differ in the written conditions.
However, the spoken conditions show a more drawn-out developmental pattern.
The 17-year-olds use a more varied vocabulary in the narrative spoken texts than the youngest age group, indicating that the familiarity of that text type facilitates a more lexically diverse text production.
The adults find it even easier to vary their lexicon in their spoken narratives.
It is also notable that the 17-year-olds are not more lexically diverse in their spoken expositories than the 10-year-olds and the 13-year-olds, while the adults outperform them all.

Table 4. Lexical diversity: results of the post hoc comparisons, presented for each text type separately.
Text type           Subset 1                                  Subset 2              Subset 3
Narrative spoken    10-year-olds, 13-year-olds                17-year-olds          Adults
Narrative written   10-year-olds, 13-year-olds                17-year-olds, Adults
Expository spoken   10-year-olds, 13-year-olds, 17-year-olds  Adults
Expository written  10-year-olds, 13-year-olds                17-year-olds, Adults

Lexical density: text types and age
If we end the analysis by looking at how the age groups divide into homogeneous subsets for the text types in lexical density, we find a more complicated pattern.
Table 5 gives an overview of the results of the post hoc tests for lexical density.
It shows that the narrative spoken texts are equally dense for all age groups.
Thus, the adults use the same proportion of content words as the 10-year-olds!
In the narrative written condition the pattern is more stretched out, indicating that the 17-year-olds and the adults differ from the 10-year-olds.
Further, the adults differ from the 13-year-olds.
For the expository spoken texts, again, the adults stand out.
They use more content words than the other age groups.
In the written expository texts, however, we find no difference between adults and 17-year-olds, indicating that the 17-year-olds can compete with equally lexically dense texts in writing, but not in speech.
As has been proposed before, one explanation might be that the adults take time to think before they formulate their spoken texts, while the 17-year-olds repeat the same phrases while thinking, decreasing the ratio of content words to the total number of words.

Table 5. Lexical density: results of the post hoc comparisons, presented for each text type separately.
Text type           Subset 1                                          Subset 2                    Subset 3
Narrative spoken    10-year-olds, 13-year-olds, 17-year-olds, Adults
Narrative written   10-year-olds, 13-year-olds                        13-year-olds, 17-year-olds  17-year-olds, Adults
Expository spoken   10-year-olds, 13-year-olds                        13-year-olds, 17-year-olds  Adults
Expository written  10-year-olds, 13-year-olds                        17-year-olds, Adults

Conclusion
This study has shown that although both lexical density and lexical diversity can be used to account for modality differences and developmental differences, a closer analysis where both measures are used on the same material reveals that they are not interchangeable.
Interestingly enough, for both measures, there is no age difference between the 10-year-olds and the 13-year-olds.
In the same way, we do not find differences between 13-year-olds and 17-year-olds for all text types.
This indicates that although there is an age factor involved in the growth of the lexicon (independent of measure), these patterns will not always be salient if we do not look at long-term development.
One should be careful not to use these measures alone when comparing texts produced by children with small age differences.
Another conclusion is that we perceive a more noticeable developmental trend for lexical diversity than for density.
This suggests that lexical diversity is a better measure to use for detecting differences between age groups.
Finally, much development takes place between the last years of high school and university.
The main differences, independent of measure, have been found in the spoken conditions between the adults and the other age groups.
I would like to propose that the adults' more extensive use of written language (both reading and writing) has given them a vocabulary platform, which not only facilitates their written language but also has a strong influence on their spoken productions.
The 17-year-olds are in many ways able to compete with the adults in writing (when it comes to a varied, lexically dense vocabulary), when the time constraints of speech are removed, but this varied vocabulary is less accessible in speech.
References