Brief Description of position:
The Group: Morningstar’s Research group provides independent analysis on individual securities, funds, markets, and portfolios. The Research group also provides data on hundreds of thousands of investment offerings, including stocks, mutual funds, and similar vehicles, along with real-time global market data on millions of equities, indexes, futures, options, commodities, and precious metals, in addition to foreign exchange and Treasury markets. Morningstar is one of the largest independent sources of fund, equity, and credit data and research in the world, and our advocacy for investors’ interests is the foundation of our company.
The Role: As a Data Scientist, you will be a leading contributor to the implementation of Artificial Intelligence (AI) within Data Collections software applications, APIs, and other data products. This role requires significant interaction with both upstream and downstream stakeholders across Technology, Data, Products, Sales/Service, and Research.
A suitable candidate is expected to have:
- Ability to analyze business problems and cut through data challenges.
- Ability to work through a raw corpus and develop data/ML models that provide business analytics (not just EDA), machine-learning-based document processing, and information retrieval.
- Ability to develop POCs quickly and turn them into highly scalable, production-ready code.
- Experience extracting data from complex unstructured documents using NLP-based technologies.
- Good to have: Document analysis using image processing/computer vision and geometric deep learning
- Conceptual understanding of classic ML algorithms such as support vector machines, decision trees, clustering, random forests, CART, and ensemble methods
- Python as the primary programming language:
- Must Have: Hands-on experience with data structures (list, tuple, dictionary, collections, iterators), pandas, NumPy, and object-oriented programming
- Good to have: Design patterns/system design, Cython
- ML libraries:
- Must Have: Scikit-learn, XGBoost, imblearn, SciPy, Gensim
- Good to have: Matplotlib/Plotly, LIME/SHAP
- Data extraction and handling:
- Must Have: Dask/Modin, BeautifulSoup/Scrapy, multiprocessing
- Good to have: Data augmentation, PySpark, Accelerate
- NLP/Text analytics:
- Must Have: Bag of words, text ranking algorithms, Word2vec, language models, entity recognition, CRF/HMM, topic modelling, sequence-to-sequence models
- Good to have: Machine comprehension, translation, Elasticsearch
- Deep learning:
- Must Have: TensorFlow/PyTorch, Neural nets, Sequential models, CNN, LSTM/GRU/RNN, Attention, Transformers, Residual Networks
- Good to have: Knowledge of optimization, Distributed training/computing, Language models
- Software peripherals:
- Must Have: REST services, SQL/NoSQL, UNIX, Code versioning
- Good to have: Docker containers, data versioning
What is it like to work with Data Collections at Morningstar?
You get to work on:
- Research work coupled with business value
- The machine learning development lifecycle, i.e. end-to-end project development (not just POCs)
- Exposure to an advanced workspace in a cloud environment
- Encouragement for innovation and ideation