The scope of this project can be adapted to PhD, MPhil, or Honours.
Background
Human speech and language are complex traits with significant phenotypic diversity, yet we know surprisingly little about the normal variation in these traits across the population. Understanding the genetic architecture of speech and language and their relationship with physical and mental health is crucial for uncovering the underlying biological mechanisms. This project aims to fill this knowledge gap by leveraging advanced computational techniques and extensive twin datasets to characterise the diversity and heritability of speech and language phenotypes and to explore their genetic correlations with health outcomes.
Approaches, skills and techniques that will be developed
The student will combine computational, statistical, and genomic methods to achieve the project's objectives, developing skills in the following areas:
- Machine Learning and Natural Language Processing: The student will develop and apply machine learning models and natural language processing (NLP) techniques to process and characterise lexical phenotypes from a dataset of over 10,000 Australian twins. These skills are crucial for analysing and understanding the complexities of human speech and language.
- Phenotypic Characterisation and Heritability Estimation: Using the twin dataset, the student will characterise the phenotypic diversity of speech and language traits and estimate their heritability to understand the extent of genetic influence. Skills in statistical genetics and twin study methodologies will be developed and refined.
- Genome-Wide Association Studies (GWAS): The student will conduct GWAS to identify genetic variants associated with speech-related traits. This involves managing large genomic datasets, performing quality control, and applying sophisticated statistical analyses to uncover genetic architecture.
- Correlational Analysis: The student will investigate phenotypic and genetic correlations between speech-related traits and physical and mental health. This will include using statistical techniques to assess the relationship between different phenotypes and leveraging genomic data to explore genetic correlations.
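As a rough illustration of the first two steps above, the sketch below derives one simple lexical measure from text and then applies Falconer's formula, a classical approximation that estimates heritability from twin correlations as h2 = 2(r_MZ - r_DZ). It is a minimal sketch under stated assumptions, not the project's actual pipeline: the function names, the choice of type-token ratio as the phenotype, and the use of a plain Pearson correlation in place of an intraclass correlation or a full structural equation (ACE) model are all illustrative choices.

```python
from math import sqrt


def type_token_ratio(text):
    """Hypothetical lexical phenotype: unique words divided by total words."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens)


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def falconer_estimates(mz_pairs, dz_pairs):
    """Falconer's approximation from (twin1, twin2) phenotype pairs.

    Assumption: Pearson r stands in for the twin correlation; real analyses
    typically use intraclass correlations or model fitting instead.
    """
    r_mz = pearson_r([a for a, _ in mz_pairs], [b for _, b in mz_pairs])
    r_dz = pearson_r([a for a, _ in dz_pairs], [b for _, b in dz_pairs])
    return {
        "h2": 2 * (r_mz - r_dz),  # additive genetic variance share
        "c2": 2 * r_dz - r_mz,    # shared environment share
        "e2": 1 - r_mz,           # unique environment share (plus error)
    }
```

In use, each twin's text sample would be reduced to a phenotype value (for example with `type_token_ratio`), and the monozygotic and dizygotic pairs would be passed separately to `falconer_estimates`. Estimates from this formula are noisy in small samples and can fall outside the range 0 to 1, which is one reason model-based approaches are usually preferred for a dataset of this size.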
Outcomes
- Characterisation of the phenotypic diversity of speech and language traits in the Australian twin population, providing insights into normal variation.
- Accurate estimates of the heritability of speech and language traits, contributing to our understanding of the genetic basis of these complex traits.
- Identification, through GWAS, of genetic variants associated with speech and language traits, revealing new insights into the genetic architecture and biological pathways involved.
- Discovery of phenotypic and genetic correlations between speech-related traits and physical and mental health, offering potential implications for health outcomes and therapeutic interventions.
Suitable background
The ideal candidate should possess the following skills and qualifications:
- Knowledge of genetic principles and computational biology techniques is essential. Experience with GWAS and statistical genetics is highly desirable.
- Familiarity with programming languages such as Python or R is necessary. Skills in machine learning and natural language processing, which are central to processing and analysing lexical data, would be a plus.
- Experience with managing and analysing large datasets, including genomic data, is important. The candidate should be proficient in data cleaning, quality control, and statistical analysis.
- The ability to critically analyse complex data and interpret results is essential. The candidate should be capable of integrating findings from various sources and drawing meaningful conclusions.
- Strong written and verbal communication skills are important for presenting research findings and collaborating with a multidisciplinary team.