Dynamic computational networks
John Goldsmith, University of Chicago
April 2005
Work done in collaboration with Gary Larson (Wheaton College) Other work by Bernard Laks and his students (Paris X)
Two models in neurocomputing:
1. In space: lateral inhibition. Work done jointly with Gary Larson. Discrete unit modeling.
2. In time: neural oscillation.
The structure of linguistic time
- Language is not merely a sequence of one unit after another, at any level.
- In phonology, we have known since classical times about syllables and feet, whatever they are. What are they?
One view of syllables
The Panini-Saussure view: language is uttered in waves of increasing and decreasing sonority. Syllables are units that begin with a sequence of rising sonority, and end with a sequence of falling sonority.
Pike-Hockett-Fudge-Selkirk
- The alternative to the wave view of the syllable was proposed by Pike (Pike and Pike 1947), who proposed to apply Bloomfield's syntactic model of immediate constituents to phonology. Bloomfield was not amused.
- Hockett 1955 (among others) took this as a central fact about phonology: that all apparent phonological sequences were really hierarchical structure.
Accent
Metrical theory (Liberman 1975) came in two flavors:
- Hierarchical theory (Liberman and Prince 1977)
- Metrical grid (Prince 1982)
The grid model emphasized the rise and fall of an unnamed quantity. Halle (and collaborator) attempted to integrate constituency and the grid.
Immediate constituents (ICs)
- So the granddaddy of the constituent theory of syllables and feet is the structuralist theory of ICs.
- ICs were reformulated by Harris and by Chomsky as Phrase-Structure Grammars.
- What's the central idea of PSGs? (And why should we care?)
PSGs
Basic message: the structure in language does not pass from one terminal element to another, but flows up and down a tree. The structural link between two adjacent elements is expressible directly iff they are sisters:
a ‘det’ can be followed by an N because there is a rule NP → det N; the generalization is through the mother category. This relationship is unchanged if there is a linearly intervening element:
PSGs are not designed to deal with relationships between adjacent terminal elements. That’s a hypothesis about the nature of syntax.
PSGs
- are designed to deal with structurally defined positions that can be recursively elaborated indefinitely.
- They are unnecessary for accounting for material that can be indefinitely expanded in a linear sense (i.e., flat structure).
PSGs
Not good at dealing with distinct functions assigned to the same distributional categories in different positions (i.e., marking pre-verbal NPs as subjects, post-verbal NPs as objects; distinguishing the functions of post-verbal NPs; etc.)
Note what GPSG did:
- Split up PS rules into mother-daughter relations (immediate constituency) and left-right relationships.
- And in phonology?
What kind of structure do phonological representations need?
- Proposal: they need to be able to identify local peaks and global peaks of two quantities: sonority and accent.
- We need to build a model in which that computation is accomplished, and no other.
Original motivation for this particular model
- Dell and Elmedlaoui's analysis of Tashlhiyt Berber.
- There, the generalization appears to be that segments compete with their neighbors with respect to sonority, so to speak.
- In most cases, a segment is a syllable nucleus if and only if its sonority is greater than that of both of its neighbors.
- We take that to be the central operation: search for elements at which a function takes on a peak value w.r.t. its neighbors (discrete versions of the 1st and 2nd derivative).
- To this, we add another hypothesis: that the value of the function may be influenced by its context.
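That central operation can be sketched in a few lines. This is a minimal illustration, not the talk's program: an element counts as a nucleus iff its value exceeds that of both neighbors, with word edges treated as zero-sonority neighbors (an assumption), and the sonority values are invented for the example.

```python
# Sketch of the peak-finding operation: an element is a nucleus iff its
# value is greater than both neighbors' (the discrete analogue of a
# 1st/2nd-derivative test). Word edges count as zero-sonority neighbors.
def peaks(values):
    nuclei = []
    for i, v in enumerate(values):
        left = values[i - 1] if i > 0 else 0
        right = values[i + 1] if i < len(values) - 1 else 0
        if v > left and v > right:
            nuclei.append(i)
    return nuclei

# A toy sonority profile for a CVCVC string: vowels peak at positions 1, 3.
assert peaks([1, 8, 2, 8, 3]) == [1, 3]
```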
Its context?
- Sonority: the inherent sonority of a segment may be influenced by its environment. The segments to its left and right may increase or decrease its sonority. Derived sonority is a function of both the inherent sonority and the sonority of the neighbors.
- Accent:
Accent
- The accent on an element is a function of both its inherent accentability (weight of the syllable = sum of the sonorities of the syllable or the coda) and its context.
- Context? A stressed syllable destresses syllables on either side; an unstressed syllable stresses syllables on either side.
- All part of the same computational system.
Syllabification and accent are not part of a general, all-purpose phonological computational engine.
Dynamic computational nets
1. Brief demonstration of the program
2. Some background on (some aspects of) metrical theory
3. This network model as a minimal computational model of the solution we're looking for
4. Its computation of familiar cases
5. Interesting properties of this network: inversion and learnability
6. Link to neural circuitry
Let’s look at the program --
Dynamic computational nets
1. Brief demonstration of the program
2. Some background on (some aspects of) metrical theory
3. This network model as a minimal computational model of the solution we're looking for
4. Its computation of familiar cases
5. Interesting properties of this network: inversion and learnability
6. Link to neural circuitry
(Figure: initial activation vs. final activation.)
P(i) is the positional activation assigned to a syllable by virtue of being the (first or last) syllable of the word. That activation does not “go away” computationally.
Beta = -0.9: rightward spread of activation
Alpha = -0.9: leftward spread of activation
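The spreading dynamics can be sketched as a relaxation loop: each unit's derived activation is its inherent activation plus beta times its left neighbor's activation plus alpha times its right neighbor's. This is my reading of the model, not the original program; note that the strong -0.9 weights above need not settle under this naive update for longer words, so the sketch defaults to the smaller alpha = -0.4, beta = -0.2 values that appear later in the talk.

```python
# Hedged sketch of the dynamic net's relaxation to equilibrium:
# s_i <- u_i + beta * s_(i-1) + alpha * s_(i+1), edges see 0.
def relax(inherent, alpha=-0.4, beta=-0.2, tol=1e-9, max_iters=1000):
    s = list(inherent)
    n = len(s)
    for _ in range(max_iters):
        new = []
        for i in range(n):
            left = s[i - 1] if i > 0 else 0.0
            right = s[i + 1] if i < n - 1 else 0.0
            new.append(inherent[i] + beta * left + alpha * right)
        if max(abs(a - b) for a, b in zip(new, s)) < tol:
            return new
        s = new
    return s

# e.g. word-initial positional activation on a 3-syllable word:
s = relax([1.0, 0.0, 0.0])
```

At equilibrium the derived values satisfy the update equation exactly, which is what the demonstrations above rely on.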
Dynamic computational nets
1. Brief demonstration of the program
2. Some background on (some aspects of) metrical theory
3. This network model as a minimal computational model of the solution we're looking for
4. Its computation of familiar cases
5. Interesting properties of this network: inversion and learnability
6. Link to neural circuitry
Examples (Hayes)
Pintupi (Hansen and Hansen 1969, 1978; Australia): “syllable trochees”: stress odd-numbered syllables (counting rightward); extrametrical ultima: Ss Ss.Ss.Sss
Weri (Boxwell and Boxwell 1966, Hayes 1980, HV 1987)
- Stress the ultima, plus
- stress all odd-numbered syllables, counting from the end of the word.
Warao (Osborn 1966, HV 1987)
Stress penult syllable; plus all even-numbered syllables, counting from the end of the word. (Mark last syllable as extrametrical, and run.)
Maranungku (Tryon 1970)
- Stress first syllable, and
- all odd-numbered syllables from the beginning of the word.
Garawa (Furby 1974) (or Indonesian, …)
- Stress on initial syllable;
- stress on penult;
- stress on all even-numbered syllables, counting leftward from the end; but
- “initial dactyl effect”: no stress on the second syllable permitted.
Two other potential parameters to explore:
- Penult activation
- Bias = uniform activation for all units
Why penult? Why not? In most cases,
- negative Penult activation = positive Final activation,
- positive Penult activation = negative Final activation.
But…
Two reasons to consider Penult… (in addition to the fact that it's easily learnable):
1. One source for antepenult patterns
2. Explanation of two patterns of cyclic stress assignment
Two kinds of cyclic assignment
Indonesian type: stress the penult: … s s. S s
add a suffix: … s s s S ] s
add a suffix: … s s ] S ] s
[ S s S ] òtogògráfi
versus
[ [ S s s s S s ] ] kòn tin u a sí ña (A. Cohn)
I = 0.65, Pen = -1.0, alpha = -0.4, beta = -0.2
[ [ s s s ] ]: 0.31, -0.72, -0.85
Greek
a. [ s1 s2 s3 ]
   Inherent: 0, 0, -1
   Derived: -0.2, 0.5, -1
b. [ s1 s2 s3 s4 ]
   Inherent: 0, 0, 0, -1
   Derived: 0.06, -0.2, 0.5, -1
c. [[ s1 s2 s3 ] s4 ]
   Inherent: 0, 0, 0, -1
   Derived: -0.12, 0.25, -0.5, -1
Other type (Greek, …): … s s S s
add a suffix: … s s S s ] s (stress doesn't shift)
add another suffix: … s s s s ] S ] s
Q-sensitive: Latin stress rule Stress penult if it is heavy; otherwise, stress the antepenult.
Q-sensitive systems: an example: ultima or penult
An analysis: a = -0.2, b = 0.8; heavy syllables get D = 2.0.
If bias > 0: Yapese. If bias < 0: Rotuman.
Dynamic computational nets
1. Brief demonstration of the program
2. Some background on (some aspects of) metrical theory
3. This network model as a minimal computational model of the solution we're looking for
4. Its computation of familiar cases
5. Interesting properties of this network: inversion and learnability
6. Link to neural circuitry
Network M
Input (underlying representation) is a vector U.
Dynamics: (1) S(t+1) = U + M·S(t)
Output is S*, the equilibrium state of (1), which by definition satisfies: S* = U + M·S*
Hence: S* = (I - M)^(-1) U. Quite a surprise!
Please note… This is not a system where you input a vector U, and watch in the limit
Inversion, again -- note the near-eigenvector property.
Dynamics: S(t+1) = U + M·S(t); iterating S0 = U, S1 = U + M·S0, S2 = U + M·S1, …, S* = lim Sn.
The output S* is the equilibrium state, which by definition satisfies S* = U + M·S*.
Hence: U = (I - M)·S* (I is the identity matrix).
Fast recoverability of underlying form This means that if you take the output S* of a network of this sort, and make the output undergo the network effect once — that’s M S* — [M’s a matrix, S a vector] and subtract that from S* — that’s (I-M) S* — you reconstruct what that network’s input state was. (This would be a highly desirable property if we had designed it in!)
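The recoverability claim can be checked numerically. This is a hedged sketch with a 3-unit net and illustrative weights alpha = -0.4, beta = -0.2 (my choices, not values fixed by the talk): relax to the equilibrium S*, apply the network effect M once, subtract, and the input U comes back.

```python
# (I - M)·S* = S* - M·S* recovers the underlying form U in one step,
# since the equilibrium satisfies S* = U + M·S*.
ALPHA, BETA = -0.4, -0.2

def m_times(s):
    """M·s for the tridiagonal coupling matrix M (edges see 0)."""
    n = len(s)
    out = []
    for i in range(n):
        left = s[i - 1] if i > 0 else 0.0
        right = s[i + 1] if i < n - 1 else 0.0
        out.append(BETA * left + ALPHA * right)
    return out

def equilibrium(u, iters=200):
    """Iterate S <- U + M·S to (numerical) equilibrium."""
    s = list(u)
    for _ in range(iters):
        s = [ui + mi for ui, mi in zip(u, m_times(s))]
    return s

u = [1.0, 0.0, -0.5]
s_star = equilibrium(u)
recovered = [si - mi for si, mi in zip(s_star, m_times(s_star))]  # (I-M)·S*
assert all(abs(r - ui) < 1e-6 for r, ui in zip(recovered, u))
```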
Learnability
Work done with Gary Larson, reported in his 1993 dissertation (University of Chicago). In a word: very learnable. Why? Because of the continuous character of the mapping from parameter space to prediction space: a small change in parameters leads to a small change in predictions…
a small change in parameters leads to a small change in predictions… in a sense, the opposite of a theory constructed to have a rich deductive structure. Because of continuity, ….
We used a variant of simulated annealing. Simulated annealing is usually used to find an optimal value in state space (with a given set of parameters learned during the learning phase). Its attractiveness is its ability to escape from local optima that aren’t globally optimal. We used a variant during the learning phase….
We establish an initial “temperature” of 100 degrees. Think of temperature as a measure of uncertainty: 100 = no knowledge; 0 = no need to change one's mind.
Training: present forms with correct stress patterns. If the stress patterns are what the model predicts, decrease the temperature by 1 degree. If not, change the parameters (the parameter vector) in a random direction, for a distance proportional to the current temperature. Stop when it's freezing.
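The training loop above can be sketched directly. This is a schematic stand-in for the Larson-style learner, not the original program: `predicts` and the parameter-vector representation are placeholders, and the step size is an invented constant.

```python
import random

# Hedged sketch of the annealing-style learner: right answers cool the
# system; wrong answers kick the parameters in a random direction,
# with a jump proportional to the current temperature.
def anneal(parameters, training_data, predicts, t0=100.0, step=0.01):
    temperature = t0
    params = list(parameters)
    while temperature > 0:
        word, correct_stress = random.choice(training_data)
        if predicts(params, word) == correct_stress:
            temperature -= 1.0   # prediction correct: cool by one degree
        else:
            # prediction wrong: random jump scaled by temperature
            params = [p + random.uniform(-1, 1) * step * temperature
                      for p in params]
    return params
```

With a model that always predicts correctly, the temperature simply cools from 100 to 0 and the parameters are returned unchanged.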
We seek regions, not settings
A setting P of parameters is a point, of measure zero. That setting maps onto a “phenomenological” characterization (i.e., linguist-speak) HP. How big is the region that maps to HP? Picture…
The accessibility of a metrical system is the measure of the region it maps from (its inverse image): in simple terms, its area (as a proportion of total area). That region is the inverse image of HP: the set of values that are realized as that kind of stress system. HP: penult stress, with alternating stress from right to left.
Dynamic computational nets
1. Brief demonstration of the program
2. Some background on (some aspects of) metrical theory
3. This network model as a minimal computational model of the solution we're looking for
4. Its computation of familiar cases
5. Interesting properties of this network: inversion and learnability
6. Link to neural circuitry
The challenge of language:
- For the hearer: he must perceive the (intended) objects in the sensory input despite the extremely impoverished evidence of them in the signal -- a task like (but perhaps harder than) visual pattern identification.
- For the speaker: she must produce and utter a signal which contains enough information to permit the hearer to perceive it as a sequence of linguistic objects.
Never was there a better use of the phrase, “I have a story to tell…. ” Let’s try it anyway.
Let’s interrogate the visual system to see if any of its basic components offer means to do the computation we’re taking a look at today.
Visual context: edge detection Mach bands
Edge detection through lateral inhibition
In a 1- or 2-dimensional array of neurons, neurons:
a. excite very close neighbors;
b. inhibit neighbors in a wider neighborhood;
c. do not affect cells further away.
(Figure: excitation at the center, surrounded by a region of inhibition.)
DOGs
Center-surround structures are often modeled as a “difference of Gaussians”: take two Gaussian distributions of different variances (widths), subtract one from the other, and you get a sombrero.
Difference of gaussians = sombrero? See white board! The Web failed me.
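In lieu of the whiteboard, the sombrero is easy to compute: a narrow excitatory Gaussian minus a wide inhibitory one, positive at the center and negative in the surround. The variance values here are illustrative choices, not values from the talk.

```python
import math

# 1-D difference-of-Gaussians ("sombrero") kernel.
def gaussian(x, sigma):
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

def dog(x, sigma_center=1.0, sigma_surround=3.0):
    return gaussian(x, sigma_center) - gaussian(x, sigma_surround)

# Excitatory at the center, inhibitory in the surround:
assert dog(0.0) > 0
assert dog(2.5) < 0
```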
Old news in phonology? Stress on the initial syllable or penult has a demarcative function in phonology, marking the word-edge for the hearer.
A brief run-through on lateral inhibition…
- Hartline and Ratliff 1957, in the horseshoe crab (Limulus).
- Lateral inhibition leads to contrast enhancement and edge detection, under a wide range of parameter settings.
- Early models used non-recurrent connections; later models preferred recurrent patterns of activation…
Recurrent lateral inhibition
Recurrent models include loops of activation which retain traces of the input over longer micro-periods (Wilson and Cowan 1972; Grossberg 1973; Amari). Recurrent inhibitory loops also lead to circuits that perform (temporal) frequency detection.
Recurrent lateral inhibition
- …also leads to winner-take-all computations, when the weight of the lateral inhibition is great.
- Most importantly for us, as noted by Wilson and Cowan 1973, lateral inhibition circuits respond characteristically to spatial frequencies.
Evolution of thinking about the visual cell's receptive field: from simple characteristic field (Hubel & Wiesel) to spatial frequency detector (J. P. Jones and L. A. Palmer 1987. An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex. J. Neurophysiol. 58(6): 1233-1258).
Initially lateral inhibition gives rise to edge detection, and classic Mach band phenomena. Observe how a recurrent (feedback) competitive network of lateral inhibition gives rise to a pattern of spatial waves.
DOGs suggest…
Each syllable unit (in this model) has a receptive field. Looking at just its 1st neighbor to left and right is the crudest simplification of such a model. The next step would be to add a second term for the “over-neighbor”…
Over-neighbor term: if DOG < 0, then we have the sombrero pattern.
DOG pattern
Gives rise to interesting new patterns: for example, over a wide range of negative values for the DOG ratio, with a > 0 and F = -1, we get a robust antepenult stress pattern (demo). (This appears to be an edge-transient effect, like a large part of the effects seen in this model.)
To wrap up: things not spoken of
1. This has been a theory of stress lapse, not a theory of stress clash. Save that for another day. (A 2nd model that the brain might use.)
2. Tone languages
3. Constituents, mainly feet
Addendum: May 19, 1999
- Let's bring time and dynamical systems into the picture: by which I mean, computational time = real time.
- That excludes right-to-left systems, but leaves open very many complex systems.
Quantity-sensitive L->R alternation
As in Yup'ik: in a sequence of light (CV or CVC) syllables, stress even-numbered syllables:
da dá
0  1
but you cannot skip a heavy (CVV) syllable:
da dá dáa da dá
0  1  1   0  1
Note that you reset the timing. Some systems reset the timing starting with the heavy syllable itself; others reset with the next syllable.
2 oscillators
- One for stress (= Foot), one for syllables.
- The syllable oscillator is driven by the phonological substance (the consonants and vowels).
- We need a system with the following properties:
- If we plot the frequencies of Foot and Syllable against each other, we want to find that 1:1 and 1:2 are attractor states;
- when they are in a 1:1 relationship, every syllable is stressed; in 1:2, every second syllable is stressed.
- This sounds very familiar, but…
(Figure: Foot frequency vs. syllable frequency, with attractor lines f = s (1:1) and f = 1/2 s (1:2).)
Entrainment with m:n ratios is common enough; but what is different about this system is that time is of the essence! We don't have 20+ cycles to hop from one attractor to the other: we have to do that in much less than one cycle.
(Figure: Foot frequency vs. syllable frequency, with attractor lines f = s (1:1) and f = 1/2 s (1:2).)
Simulation --
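The talk's simulation isn't reproduced here, but a standard toy model of frequency locking, the sine-circle map (my substitution, not the talk's oscillator system), shows the kind of m:n attractor at issue: with a bare frequency ratio near 1/2 and moderate coupling, the rotation number locks onto exactly 1:2, the regime where every second syllable is stressed.

```python
import math

# Sine-circle map: theta(n+1) = theta(n) + Omega - (K/2pi) sin(2pi theta(n)).
# The rotation number (average advance per step) locks to rational values
# inside the Arnold tongues; Omega = 0.5, K = 1 sits in the 1:2 tongue.
def rotation_number(omega, k, n_iter=10000, transient=1000):
    theta = 0.1
    total = 0.0
    for i in range(n_iter + transient):
        delta = omega - (k / (2 * math.pi)) * math.sin(2 * math.pi * theta)
        theta += delta
        if i >= transient:
            total += delta
    return total / n_iter

rho = rotation_number(0.5, 1.0)
assert abs(rho - 0.5) < 1e-3   # locked onto the 1:2 attractor
```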
Hayes's generalizations
- Culminativity: each word or phrase has a single strongest syllable bearing the main stress. TRUE IF THAT SYLLABLE IS USED TO MAP A TONE MELODY (ETL).
- Rhythmic distribution: syllables bearing equal levels of stress tend to occur spaced at equal distances.
- Stress hierarchies (Liberman/Prince): several levels of stress.
- Lack of assimilation as a natural process.
Metrical grid (columns of grid marks, x). The height of the grid marks rhythmic prominence. Each level may represent a possible rhythmic analysis (“layer”).
Goldsmith-Larson (dynamic computational) model Model syllables as units with an activation level; the strength of the activation level roughly corresponds to the height of the column on the metrical grid.
Some generalizations about prosodic systems of the world
Very crude distinction between tone and non-tone languages. It's easier to say what a tone language is; it is not clear that non-tone languages form a homogeneous group. They have accent/stress…
Light editing of Hayes' typology of accentual systems…
“Free versus fixed stress”: when is it predictable which syllable is accented? When it is predictable, what kinds of computation are necessary to make the prediction?
Word-based generalizations (i.e., not sensitive to word-internal morphological structure): rhythmic versus non-rhythmic systems. In rhythmic systems, there are upper limits on how many consecutive unstressed syllables there may be. The usual limit is no more than 1. And the usual limit is no more than 1 consecutive stressed syllable.
Hayes's typologies
- Free vs. fixed stress (predictable or not by rule)
- Rhythmic versus morphological stress
  - Morphological: boundary-induced versus use of morphological information to resolve competition
- Bounded versus unbounded stress (length of span of unstressed syllables)
Is the height of a metrical column a value of a variable?
If so, this would explain the Continuous Column Constraint: a grid is ill-formed if a grid mark on level n+1 is not matched by a grid mark on level n in the same column (an effect that shows up in several environments: in stress shift, in immobility of strong beats, in main stress placement, in destressing).
Is constituency in metrical structure strongly motivated?
#(x .) … #  á a á a …
… (x .)#  … á a á a #
We could think of assigning trochaic feet, either from left to right or from right to left.
Syllable weight
Syllables are divided into heavy and light, primarily by the sum of the sonority of the post-nuclear material in the syllable.
Latin stress rule:
- No stress on final syllables;
- stress on antepenult if penult is light; else
- stress on (heavy) penult.
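The Latin rule as stated is small enough to write out. A sketch, with syllables given as weight symbols 'H' (heavy) or 'L' (light); this representation and the function name are my assumptions, and the function returns the 0-based index of the stressed syllable.

```python
# Latin stress rule: no stress on the final syllable; heavy penult
# attracts stress; otherwise stress falls on the antepenult.
def latin_stress(weights):
    n = len(weights)
    if n == 1:
        return 0          # monosyllable: stress it
    if n == 2:
        return 0          # disyllable: stress the penult (never the ultima)
    return n - 2 if weights[-2] == 'H' else n - 3

assert latin_stress(['L', 'H', 'L']) == 1   # heavy penult: penult stress
assert latin_stress(['L', 'L', 'L']) == 0   # light penult: antepenult
```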
Hayes' parametric theory
- Choice of foot type:
  - i. size (maximum: unary/binary/ternary/unbounded)
  - ii. Q-sensitivity parameter
  - iii. trochaic vs. iambic (S/W, W/S)
- Direction of parsing: rightward, leftward
- Iterative foot assignment?
- Location (create new metrical layer)
- Extrametricality…
Extrametricality
- Units (segments, syllables, feet, …) can be marked as extrametrical…
- if they are peripheral (at the correct periphery)…
- and enough remains after they become invisible.
Dynamic computational networks (Goldsmith, Larson)
Goal: to find (in some sense) the minimum computation that gets maximally close to the data at hand. What structure is required in the empirically robust cases?
Network M
Input (underlying representation) is a vector U.
Dynamics: (1) S(t+1) = U + M·S(t)
Output is S*, the equilibrium state of (1), which by definition satisfies: S* = U + M·S*
Hence: S* = (I - M)^(-1) U. Quite a surprise!
Learnability n Larson (1992) showed that these phonological systems were highly learnable from surface data.
A spatial sine wave…
A spatial square wave…
Initially lateral inhibition gives rise to edge detection, and classic Mach band phenomena. Observe how a recurrent (feedback) competitive network of lateral inhibition gives rise to a pattern of spatial waves.
1. Introduction and overview: the cognitive task of language, generating and perceiving linguistic objects
2. Linguistics: metrical stress theory; Goldsmith-Larson model of metrical accentuation
3. Neuro-computation: lateral inhibition in computational neurobiology
4. Neuro-computation: neural oscillators
5. Linguistics: quantity-sensitivity as phase-locking attractor states of a
Present two models today:
- Dynamic computational networks. Work done jointly with Gary Larson. Discrete unit modeling.
- Coupled harmonic oscillators, to deal with certain types of quantity-sensitive stress assignment (left-to-right only); utilizes attractor states of the dynamical system. Continuous modeling.
Moras and syllables (sequence of CVCVCV…) Moras, Syllables, and Stress