Senior Data Science Consultant for NLP, forecasting, predictive modelling, Bayesian modelling in finance, healthcare and energy sectors
Suyash Mishra, Surrey, United Kingdom
I'm offering
Data-driven solution designer with experience in energy, building technology, steel and healthcare. I focus on Bayesian learning, stochastic optimization, non-linear time series forecasting and natural language processing, and aim to identify, curate and nurture impactful ideas in pharma R&D, supply chain, energy and mobility.
Markets
United Kingdom
Language
English
Fluently
Ready for
Larger project
Ongoing relation / part-time
Available
My experience
2018 - ?
freelance
Data Science Consultant
ZS Associates, Advanced Data Science Group.
Prime responsibilities:
* Leading a team of 7 consulting a pharma client on improving sales and marketing strategy.
* Identifying the potential of NLP for clinical trial research; a couple of POCs completed.
* Time-inhomogeneous Markov model of the drug journey (to predict drug launch date).
* Agent-based simulation to understand the interaction of multiple agents in the pharma market driving profit, market share and rebating strategies.
* Leading a data science team to deliver a dynamic targeting app for physicians in one disease area.
* Exploring the COPD/asthma market to understand which drug combination (clinical trial feature) will best suit client demand in competitive settings.
* Developing the capability to exploit genetic data with a Model AI tool to extract insights for clinical design.
* Leading a team to scale a near-real-time dynamic targeting solution across the clients' drug portfolio.
* Sequence data modelling (autoencoders, embeddings, LSTMs, GRUs, panel-based models with mixed effects) to monitor the impact of the clinical and expense journey on a patient's health profile.
• Techniques and methodology used in past and ongoing projects:
* NLP and its integration with structured data
o NLP on short text.
* Agent-based simulation and Nash negotiation equilibrium
* Markov model with hierarchical learning
* Time series forecasting (along with a Bayesian approach)
o Developed a pipeline to handle non-stationary and heterogeneous time series with integration of external events.
* Network analysis.
* Autoencoders, embeddings, LSTMs, GRUs, panel-based models with mixed effects
2) Siemens Research in Digitalization & Automation Centre (RDA), October 2016 to December 2017 (Lead Research Engineer)
• Machine Learning / Data Analysis / Data Visualization / Statistical Modeling
o Energy forecasting and anomaly detection for energy theft detection at junctions.
* Target: Detect theft in real time with higher accuracy, deploy the algorithm at an edge node on an IoT device, and visualize the information in real time.
* Solution: Evaluated several algorithms, including random forest, feed-forward neural network, LSTM (RNN) and NuPIC (a brain-inspired algorithm), set up the right evaluation metrics and carried out field testing.
o Natural Language Processing: Clustering, classification and integration of user feedback in the model to understand service requests from CT scan and X-ray machines.
o Coal Mortgage Planning:
* Target: Evaluate the estimated life of turbine components and determine what load adjustment could be made so as to retire the plant in the next year while keeping customer profitability in mind.
* Solution: A basic hidden Markov model was used to predict the life of turbine blades, and a financial model was then coupled with performance data to find the year-wise load adjustment. Linear programming was used for this exercise.
o Fault detection, classification and localization on transmission lines:
* Target: Detection, mapping and localization of transmission faults were the key challenges of the project.
* Solution: A blend of physical modeling and machine learning was used to formalize the problem. Energy spectrum analysis was coupled with LSTM modeling to classify the cause (for example a sudden rise in demand, higher power consumption, or a mishap at transmission-line level). (In progress)
o SAMS remodeling and blending with machine learning: SAMS models are legacy Siemens models developed to predict life based on load, starts and fueling for a turbine series. The legacy code was ported from Fortran to Python, and machine learning was used to predict parameters such as heat transfer coefficient, heat release and combustion parameters, which were traditionally obtained in the model by linear modeling. This exercise helped us integrate these models with a TSDB such as Influx, which makes querying fast and yields acceptable results.
• AWS Cloud and IoT experience:
o Redshift vs Athena vs Spectrum vs Hybrid View evaluation for query performance:
* Target: This task was created to overcome challenges faced by different business units with Redshift query performance (caused by bad data distribution across slices).
• Athena was successfully deployed and a first set of TRACE P queries was run on both Redshift and Athena. Preliminary results have been published to give a first sense of Athena's performance compared with Redshift.
• A hybrid view evaluation followed the Athena activity to understand how feasible it is to query hot (3 years of data) and cold data in combination with enhanced performance.
• Data pipeline for UOM conversion
o Target: A data pipeline comprising different AWS components was created to convert units of measurement (UOM) in raw files. This was done by spawning an EMR cluster and running Spark code on top of it.
o Solution: The pipeline ran every 2 hours: a CloudWatch event triggers a Lambda that copies data from the landing zone to the processing zone, where the UOM conversion is executed. Once the conversion completes successfully, an SNS message triggers another Lambda that terminates the EMR cluster and puts the files in the processed zone. Flow: CloudWatch > Lambda > S3 > Data Pipeline > EMR > SNS > S3
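The conversion step itself can be illustrated with a minimal plain-Python sketch. The pipeline described above ran Spark code on EMR; the record layout, the conversion table and the names `CONVERSIONS`, `TARGET_UNIT` and `convert_record` below are hypothetical and chosen only for illustration.

```python
# Hypothetical conversion table: (from_unit, to_unit) -> (factor, offset),
# applied as value * factor + offset. Not taken from the original project.
CONVERSIONS = {
    ("degF", "degC"): (5.0 / 9.0, -160.0 / 9.0),
    ("psi", "kPa"): (6.894757, 0.0),
    ("lb", "kg"): (0.45359237, 0.0),
}

# Hypothetical target unit for each source unit.
TARGET_UNIT = {"degF": "degC", "psi": "kPa", "lb": "kg"}


def convert_record(record):
    """Return a copy of the record with its value converted to the target unit.

    Records whose unit has no target mapping are returned unchanged.
    """
    unit = record["uom"]
    target = TARGET_UNIT.get(unit)
    if target is None:
        return dict(record)
    factor, offset = CONVERSIONS[(unit, target)]
    return {**record, "value": record["value"] * factor + offset, "uom": target}


# Made-up sample records standing in for rows of a raw file.
raw = [
    {"sensor": "t1", "value": 212.0, "uom": "degF"},
    {"sensor": "p1", "value": 10.0, "uom": "psi"},
    {"sensor": "f1", "value": 3.0, "uom": "m3"},
]
converted = [convert_record(r) for r in raw]
```

Keeping the factors in a lookup table means a new unit can be supported by adding a row rather than changing the conversion logic.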
• Shell script execution with AWS Lambda
o Target: Creation of an AWS Lambda package that executes a specific shell script residing on an EC2 machine. Execution should go through a FIFO blocking queue, i.e. only one instance of the Lambda runs at a given moment and all other requests are queued. Services used: S3 -> Lambda -> CloudWatch -> SQS -> EC2
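The FIFO blocking behaviour described above (one execution at a time, all other requests queued in arrival order) can be sketched in-process with Python's standard library. This is only an illustration of the queuing idea, not the SQS and Lambda setup itself; the names `pending`, `run_script` and the request IDs are hypothetical.

```python
import queue
import threading

# FIFO queue of pending requests; queue.Queue is thread-safe and preserves
# insertion order.
pending = queue.Queue()
completed = []  # order in which requests were actually executed


def run_script(request_id):
    """Stand-in for executing the shell script (the real command is not
    given in the source, so this only records the request)."""
    completed.append(request_id)


def worker():
    # A single worker thread means at most one execution is in flight;
    # get() blocks until the next request arrives.
    while True:
        request_id = pending.get()
        if request_id is None:  # sentinel: stop the worker
            pending.task_done()
            break
        run_script(request_id)
        pending.task_done()


thread = threading.Thread(target=worker)
thread.start()

for request_id in ["req-1", "req-2", "req-3"]:
    pending.put(request_id)  # callers enqueue and return immediately

pending.put(None)
thread.join()
```

Because only one worker consumes the queue, requests are executed strictly in the order they were enqueued.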
• AWS Aurora update based on S3 bucket update:
o Target: A Lambda function was created to update an Aurora database on an event (file landing in an S3 bucket), followed by an SNS notification. S3 > Lambda > Aurora > SNS
• A pipeline was set up to transfer data from an edge device to the AWS Cloud. Protocol selection (MQTT vs TCP):
1. Throughput measurement (based on optimization, compressed and uncompressed data packets).
2. Minimum network bandwidth required to get maximum throughput without any data loss. Throughput was measured both at AWS cloud level and at insertion into Cassandra.
3. Components: Raspberry Pi, temperature/humidity sensor, AWS EC2 machine, AWS IoT stack, Cassandra, Grafana.
4. A similar exercise was performed replacing Cassandra with Influx, and improved throughput was achieved with Influx. Visualization was carried out in Grafana.
Lambda, Pharma, NLP, Cassandra, Localization, Testing, Development, Health, AI, Digitalization, Fortran, App, Redshift, Natural, Feature, Science, Power, Visualization, Energy, Processing, Hybrid, Agent, UP, Basic, Database, Design, Python, Data Analysis, Marketing Strategy, Machine learning, Data Science, AWS, Research, Forecasting, Automation, Deployment, Marketing, Cloud, Event, IoT, Integration, Sales, Node, Service, Network, Spark, Raspberry Pi, It
2013 - 2016
job
Associate Engineer (R&D)
Caterpillar India Private Limited Chennai.
Experienced in validation, certification, simulation and analysis for CAT and Perkins engines and machines.
o Python scripts to create and update a MySQL database with field machine data. Developed the capability to produce daily, weekly and monthly statistical analysis reports with better visualization (graphs).
o Modelling quoted prices for industrial tube assemblies: how a supplier will quote the price of a tube from the large fleet available for different CAT machines. Solution provided: developed a Python script based on an XGB model to quote pricing with an accuracy of 78%.
o Predicting machine failure in North Asia based on 1) component failure, 2) drivability pattern, 3) fuel used and 4) geographical location, and highlighting the next strategy for the R&D and NPI teams to take corrective measures: developed a model based on random forest and SVM (kernel) to classify machines into failure groups and then identify correlations between parameters to understand the actual reason for failure.
MySQL, Python, R, Database, Visualization, NPI, Asia
2013 - 2013
internship
Best Graduate Engineer Trainee
Hero MotoCorp Private Limited.
5) SOFTWARE AND PROGRAMMING SKILLS:
• Experience with image, text and numerical data sets at large scale
o Python scripting (6 years' experience). Libraries: numpy, pandas, hdf5, pybrain, PyPI, statsmodels, SciPy and scikit-learn. Visualization libraries: matplotlib, seaborn, bokeh, plotly
o Deep learning frameworks (PyTorch, TensorFlow, Keras)
o Statistical modelling
o SQL
o Extensive work with Aurora and MySQL databases (relational DB)
o Time series DBs: Influx, Druid
o Visualization tool: Grafana (for real-time and large-volume data)
o Introductory Spark SQL and core (MapReduce paradigm, transformations etc.)
o Blockchain enthusiast: Hyperledger Fabric used for a demo (beginner)
o Databricks platform with PySpark programming (data pre-processing and machine learning)
o AWS cloud computing: extensive experience with Lambda, SNS, SQS, EC2, S3, CloudWatch, Data Pipeline, Greengrass, EMR, Redshift, Athena, Spectrum.
Blockchain, Framework, Processing, Demo, Visualization, Lambda, Redshift, Software, Keras, Transformation, Spark, MySQL, TensorFlow, Cloud, Database, Deep learning, Scripting, AWS, Machine learning, Python, SQL
My education
Indian Institute of Management
N/a, Executive Programme on Business Analytics and Intelligence
National Institute of Technology
Bachelors, Mechanical Engineering