Graph Day Returns to Data Day Texas

Yes! Graph Day returns to Data Day Texas! Every time we host Graph Day, it gets bigger. This one will be a full house. If you want to take advantage of discount tickets, don't dawdle. Purchase yours now. Expect surprises. Check this page for updates.

Confirmed Graph Day Sessions

Keynote: The State of JanusGraph 2018

Ted Wilmes - Expero

Graph database adoption increased at a rapid clip in 2017 and shows no sign of slowing down as we begin 2018. When coupled with the right problem set, it's a compelling solution and word has spread from the startup world all the way to the Fortune 500. JanusGraph, an Apache TinkerPop compatible fork of the popular Titan graph database, was one of the newcomers into the marketplace last year. Its future was uncertain, but a dedicated community coalesced around it and three releases later and with an ever growing list of contributors, it is here to stay. This talk will introduce JanusGraph, discuss where it fits into the existing graph database ecosystem, and then review the progress made over the past year along with an eye to what exciting things are coming up in 2018.

Improving Graph Based Entity Resolution using Data Mining and NLP

David Bechberger - Gene by Gene

“Hey, here are those new data files to add. I ‘cleaned’ them myself so it should be easy. Right?”
Words like these strike fear into the heart of all developers but integrating ‘dirty’ unstructured, denormalized and text heavy datasets from multiple locations is becoming the de facto standard when building out data platforms.
In this talk we will look at how we can augment our graph’s attributes using techniques from data mining (e.g. string similarity/distance measures) and Natural Language Processing (e.g. keyword extraction, named entity recognition). We will then walk through an example using this methodology to demonstrate the improvements in the accuracy of the resulting matches.
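As a rough illustration of the string similarity measures mentioned above, the following Python sketch scores pairs of names with a normalized Levenshtein (edit) distance and keeps the pairs above a threshold as candidate matches. It is not taken from the talk: the function names, the sample records, and the 0.8 threshold are all assumptions chosen for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale edit distance to a 0..1 score, where 1.0 means identical."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def candidate_matches(names, threshold=0.8):
    """Return (name, name, score) for every pair at or above the threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            score = similarity(names[i], names[j])
            if score >= threshold:
                pairs.append((names[i], names[j], round(score, 2)))
    return pairs


# Hypothetical records; in a graph these scores could become edge attributes.
records = ["Jon Smith", "John Smith", "Jane Doe"]
matches = candidate_matches(records)
print(matches)
```

Comparing every pair is quadratic in the number of records, so a real pipeline would normally narrow the candidates first (for example by blocking on a shared attribute) before scoring them.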

Data Science Tools: Cypher for Data Munging

Ryan Boyd - Neo4j

Running data analysis using tools like Pandas, Scikit-Learn, or Apache Spark requires that your data be in a clean format. However, as data scientists, we're often forced to bring data in from many different sources and understand the relationships between the data before running our analysis.
This session will discuss and show how we can use the power of the Cypher query language to bring data in from a variety of different sources, clean it, and prepare it for analysis in a variety of tools. We'll also show how we can supplement the native functionality available in Cypher with APOC - an amazing library of hundreds of utility functions for cleaning, refactoring and analyzing data.
While Cypher is currently used in databases like Neo4j and SAP HANA to query graph structures, it can now be used on Apache Spark with the CAPS alpha project. We'll show how Cypher can be used for Data Prep in both of these scenarios.

Everything is not a graph problem (but there are plenty)

Dr. Denise Gosnell - DataStax

As the reality of the graph hype cycle sets in, the graph pragmatists have shown up to guide the charge. What we are seeing and experiencing is an adjustment in mindset: the convergence to multi-model database systems parallels the mentality of using the right tool for the problem. With graph databases, there is an intricate balance to find where the rubber meets the road between theorists and practitioners.
Before hammering away on the keyboard to insert vertices and edges, it is crucial to iterate and drive the development life cycle from definitive use cases. Too many times the field has seen monoglot system thinking pressure the construction of the one graph that can rule it all, which can result in some impressive scope creep. In this talk, Dr. Gosnell will walk through common solution design considerations that can make or break a graph implementation and suggest some best practices for navigating common misconceptions.

How to Destroy Your Graph Project with Terrible Visualization

Corey Lanum - Cambridge Intelligence

We are all using graphs for a reason - in many cases, it's because the graph model presents an intuitive view of the data. Unfortunately, the most elegant graph data models can often be stymied by bad visualizations that obscure rather than enlighten. In this talk, Corey Lanum will discuss a number of bad practices in graph visualization that are surprisingly common. He will then outline graph visualization best practices to help create visual interfaces to graph data that convey useful insight into the data.

Real-time deep link analytics: The next stage of graph analytics

Dr. Victor Lee - TigerGraph

Graph databases are the fastest growing category in data management, according to DB-Engines. However, most graph queries only traverse two hops in big graphs due to limitations in most graph databases. Real-world applications require deep link analytics that traverse far more than three hops. To support real-time deep link analytics, we need the power of combining real-time data updates, big datasets, and deep link traversals.
Dr. Victor Lee offers an overview of TigerGraph’s distributed Native Parallel Graph platform. He discusses the techniques behind the platform, including how it partitions graph data across machines, supports fast updates, and is still able to perform fast graph traversal and computation. He also shares a subsecond real-time fraud detection system that manages 100 billion graph elements to detect risk and fraudulent groups.
(Product Showcase)

Understanding People Using Three Different Kinds of Graphs

Misty Nodine - Spiceworks

There are various ways that we can learn about people using graph-based approaches.
Social graphs – These graphs help understand people via the connections they have with other people. They are characterized by having one node type (person) and one edge type (whatever social relationship the graph is representing). Typical questions we ask in this space are: How important is this person in this relationship? How well-connected are the people? What are the interesting groups?
Knowledge graphs – These graphs represent information we have about a user, what things we can know about them. For instance, it may have nodes not only for people but for places, or companies. There are also a variety of edge types, like ‘lives_in’ between a person and a city. Knowledge graphs typically take two forms: RDF or entity-relationship. The RDF representations also are related to ontologies and the semantic web. Knowledge graphs enable you to leverage existential knowledge or knowledge related to other people to understand a person. Hence, these are graphs that we reason over. Example questions that a knowledge graph might answer include: How big a company does this person work for?
Probabilistic graphical models – Probabilistic graphical models (PGMs) allow us to infer information about a person from things we have observed directly about them, using probabilistic relationships. In a PGM, the nodes represent specific things you can observe (variables), and each edge represents the conditional dependency between the two variables it connects. In real life, we observe actual values for some subset of the nodes and can then infer the probabilities for the values of the unobserved variables.
This talk will provide an overview of these three different kinds of graphs and their desirable properties, and the algorithms and approaches that you use over those graphs to understand more about a person.
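To make the probabilistic graphical model idea concrete, here is a minimal Python sketch of the smallest possible case: one unobserved variable, one observed variable, and a single edge carrying the conditional probabilities. It applies Bayes' rule to update the belief about the unobserved variable. The variable names and all probabilities are invented for illustration and do not come from the talk.

```python
def posterior(prior, likelihood, observed):
    """Return P(hidden | observed) for a two-node model.

    prior:      {hidden_value: P(hidden_value)}
    likelihood: {hidden_value: {observed_value: P(observed_value | hidden_value)}}
    observed:   the value actually seen for the observed variable
    """
    unnormalized = {h: prior[h] * likelihood[h][observed] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


# Hypothetical numbers: is a user an IT professional, given what they read?
prior = {"it_pro": 0.3, "other": 0.7}
likelihood = {
    "it_pro": {"reads_server_reviews": 0.8, "does_not": 0.2},
    "other": {"reads_server_reviews": 0.1, "does_not": 0.9},
}

belief = posterior(prior, likelihood, "reads_server_reviews")
print(belief)
```

With these made-up numbers, observing the behavior raises the probability of "it_pro" from 0.3 to roughly 0.77. Larger models chain many such edges together and need dedicated inference algorithms rather than a single application of Bayes' rule.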

Graph Convolutional Networks for Node Classification

Steve Purves - Expero

We describe a method of classifying nodes in an information network by application of a non-Euclidean convolutional neural network. The convolutional layers are kernelized to operate directly on the natural manifold of the information space, and thus produce output more accurate than analysis on information arbitrarily embedded in a Euclidean geometry. First, we describe the benefits of operating in a non-Euclidean geometry. We then sketch out how graph convolutional networks work. Finally, we demonstrate the application of this technique by predicting the credit-worthiness of applicants based on their population characteristics and their relationships to other individuals.

Writing Distributed Graph Algorithms

Andrew Ray - Sam's Club

Distributed graph algorithms are an important concept for understanding large scale connected data. One such algorithm, Google’s PageRank, changed internet search forever. Efficient implementations of these algorithms in distributed systems are essential to operate at scale.
This talk will introduce the main abstractions for these types of algorithms. First we will discuss the Pregel abstraction created by Google to solve the PageRank problem at scale. Then we will discuss the PowerGraph abstraction and how it overcomes some of the weaknesses of Pregel. Finally we will turn to GraphX and how it combines some of the best parts of Pregel and PowerGraph to make an easier-to-use abstraction.
For all of these abstractions we will discuss the implementations of three key examples: Connected Components, Single Source Shortest Path, and PageRank. For the first two abstractions this will be in pseudocode and for GraphX we will use Scala. At the end we will discuss some practical GraphX tips and tricks.
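For readers unfamiliar with the vertex-centric style, the sketch below imitates Pregel-like supersteps for PageRank on a single machine. It is written in plain Python purely for illustration (the talk itself uses pseudocode and Scala), and it is not the Pregel, PowerGraph, or GraphX API: the function name, the damping factor of 0.85, and the fixed number of supersteps are assumptions. In each superstep every vertex sends its rank, split evenly, along its outgoing edges, and then every vertex recomputes its rank from the messages it received.

```python
def pagerank(edges, damping=0.85, supersteps=20):
    """Vertex-centric PageRank over a list of (source, destination) edges."""
    vertices = sorted({v for edge in edges for v in edge})
    out_neighbors = {v: [] for v in vertices}
    for src, dst in edges:
        out_neighbors[src].append(dst)

    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}

    for _ in range(supersteps):
        # Message phase: each vertex sends an equal share of its rank
        # to every out-neighbor.
        inbox = {v: 0.0 for v in vertices}
        for v in vertices:
            targets = out_neighbors[v]
            if targets:
                share = rank[v] / len(targets)
                for u in targets:
                    inbox[u] += share
            else:
                # Dangling vertex: spread its rank over all vertices.
                for u in vertices:
                    inbox[u] += rank[v] / n

        # Compute phase: each vertex updates its rank from its inbox.
        rank = {v: (1 - damping) / n + damping * inbox[v] for v in vertices}

    return rank


# Hypothetical four-edge graph.
ranks = pagerank([("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")])
print(ranks)
```

In a distributed system the two phases run in parallel across partitions and the inbox becomes network messages, which is where the abstractions discussed in the talk differ from each other.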

Fishing Graphs in a Hadoop Data Lake

Claudius Weinberger - ArangoDB

Hadoop clusters can store nearly everything in your data lake cheaply and blazingly fast. Answering questions and gaining insights from this ever-growing stream becomes the decisive part for many businesses. Increasingly, data has a natural structure as a graph, with vertices linked by edges, and many questions about the data involve graph traversals or other complex queries for which there is no a priori bound on the length of paths.
Spark with GraphX is great for answering relatively simple graph questions which are worth starting a Spark job for because they essentially involve the whole graph. But does it make sense to start one for every ad-hoc query or is it suitable for complex real-time queries?

Confirmed Graph Day Speakers

Ted Wilmes (Oklahoma City) @trwilmes

Ted Wilmes, Data Architect at Expero, is a graduate of Trinity University where he studied computer science and art history. He started his professional career at a not-for-profit research and development institution where he performed contract software development work for a variety of government and commercial clients. During this time he worked on everything from large enterprise systems to smaller, cutting edge research and development projects. One of the most rewarding parts of each of these projects was the time spent collaborating with the customer.
As Ted’s career continued, he moved on to an oil and gas startup and continued to dig deeper into the data side of software development, gaining an even deeper interest in how databases work and how to eke as much performance out of them as possible. During this time he became interested in the application of graph databases to certain problem sets. Today, at Expero, Ted enjoys putting his deep knowledge of transactional graph computing to work as he helps customers of all types navigate the burgeoning property graph database landscape.
Outside of work, Ted enjoys spending time with his family out-of-doors, listening to and playing loud music, and contributing to the Apache TinkerPop project as a committer and PMC member.
Ted will be giving the Graph Day keynote: The State of JanusGraph 2018

Jans Aasman (SF Bay)

Jans Aasman (Wikipedia / LinkedIn) is a Ph.D. psychologist and expert in Cognitive Science - as well as CEO of Franz Inc., an early innovator in Artificial Intelligence and provider of the graph database, AllegroGraph. As both a scientist and CEO, Dr. Aasman continues to break ground in the areas of Artificial Intelligence and Knowledge Graphs as he works hand-in-hand with numerous Fortune 500 organizations as well as US and foreign governments. Jans recently authored an IEEE article on “Enterprise Knowledge Graphs”.
Dr. Aasman spent a large part of his professional life in telecommunications research, specializing in applied Artificial Intelligence projects and intelligent user interfaces. He gathered patents in the areas of speech technology, multimodal user interaction, and recommendation engines while developing precursor technology for tablets and personal assistants. He was also a professor in the Industrial Design department of the Technical University of Delft. Dr. Aasman is a noted conference speaker at such events as Smart Data, NoSQL Now, International Semantic Web Conference, GeoWeb, AAAI, Enterprise Data World, Text Analytics, and TTI Vanguard to name a few.
Jans will be giving the following Graph Day presentation: Navigating Time and Probability in Knowledge Graphs.

Dave Bechberger (Houston)

Dave Bechberger is a Sr. Architect at Gene by Gene, a genetic genealogy and bioinformatics company, where he works extensively on developing their next-generation data architecture. Dave has spent his career engaging in full stack software development but specializes in building data architectures in complex data domains such as bioinformatics, oil and gas, and supply chain management. He uses his knowledge of graph and other big data technologies to build out highly performant and scalable systems. Dave has previously spoken at a variety of international technical conferences including NDC Oslo, NDC London, and Graph Day Texas.
Dave will co-present the following Graph Day session: Improving Graph Based Entity Resolution using Data Mining and NLP.

Ryan Boyd (SF Bay) @ryguyrg

Ryan Boyd (LinkedIn) is an SF-based software engineer focused on helping developers understand the power of graph databases. Previously he was a product manager for architectural software, built applications and web hosting environments for higher education, and worked in developer relations for twenty products during his 8 years at Google. He enjoys cycling, sailing, skydiving, and many other adventures when not in front of his computer.
Ryan will present the following Graph Day session: Data Science Tools: Cypher for Data Munging.

Dr. Denise Koessler Gosnell (Charleston) @DeniseKGosnell

In August 2017, Dr. Denise Gosnell transitioned into a Solutions Architect position with DataStax where she aspires to build upon her experiences as a data scientist and graph architect to further their established line of graph solutions. Prior to her role with DataStax, Dr. Gosnell was a Data Scientist and Technology Evangelist at PokitDok. During her three years with PokitDok, she built software solutions for and spoke at over a dozen conferences on permissioned blockchains, machine learning applications of graph analytics, and data science within the healthcare industry.
Dr. Gosnell earned her Ph.D. in Computer Science from the University of Tennessee. Her research on how our online interactions leave behind unique identifiers that form a “social fingerprint” led to presentations at major conferences from San Diego to London and drew the interest of such tech industry giants as Microsoft Research and Apple. Additionally, she was a leader in addressing the underrepresentation of women in her field and founded a branch of Sheryl Sandberg’s Lean In Circles.
Dr. Gosnell will be giving the following Graph Day presentation: Everything is not a graph problem (but there are plenty).

Michael Grove (Washington, DC) @mikegrovesoft

Michael Grove is VP of Engineering and co-founder of Stardog where he oversees the development of the Stardog Knowledge Graph Platform. Michael studied Computer Science at the University of Maryland and is an alumnus of its well-regarded MIND Lab which specialized in semantic technologies. Before Stardog, he worked at Fujitsu Research on the use of graphs and semantic technologies in pervasive computing environments. Michael is an expert in large scale database and reasoning systems and has worked with graphs and graph databases for nearly fifteen years.

Gunnar Kleemann (Berkeley / Austin)

Gunnar Kleemann is a Data Scientist with the Berkeley Data Science Group (BDSG). He is interested in how data science facilitates biological discovery and lowers the barrier to high-throughput research, particularly in small, independent labs. In addition to his work with BDSG, he is also involved in the development and implementation of technologies like the ATX Hackerspace Biology Laboratory.
Gunnar holds a PhD in Molecular Genetics from Albert Einstein College of Medicine and a Master’s in Data Science from UC Berkeley. He did post-doctoral research on the genomics of aging at Princeton University, where his research focused on developing high-throughput robotic assays to understand how genetic changes alter lifespan and reproductive biology.

Corey Lanum (Boston) @corey_lanum

Corey Lanum (LinkedIn) has a distinguished background in graph visualization. Over the last 15 years he has managed technical and business relationships with dozens of the largest defense and intelligence agencies in North America, in addition to working with many security and anti-fraud organizations in private industry. Prior to joining Cambridge Intelligence as their US Manager, Corey was helping the customers of i2 (now IBM) and SS8 to solve their most complex graph data challenges.
Corey is the author of Visualizing Graph Data from Manning Publications.
Corey will co-present the following Graph Day session: How to Destroy Your Graph Project with Terrible Visualization.

Victor Lee (Kent, Ohio)

Dr. Victor Lee is Senior Product Manager at TigerGraph, bringing together a strong academic background, decades of experience in the technology sector, and a strong commitment to quality and serving customer needs. His first stint in Silicon Valley was as an IC designer and technology transfer manager, before returning to school for his computer science PhD, focusing on graph data mining. He received his BS in Electrical Engineering and Computer Science from UC Berkeley, MS in Electrical Engineering from Stanford University, and PhD in Computer Science from Kent State University. Before joining TigerGraph, Victor was a visiting professor at John Carroll University.
Dr. Lee will be giving the following Graph Day presentation: Real-time deep link analytics: The next stage of graph analytics

William Lyon (SF Bay) @lyonwj

William Lyon is a software developer at Neo4j, the open source graph database. As an engineer on the Developer Relations team, he works primarily on integrating Neo4j with other technologies, building demo apps, helping other developers build applications with Neo4j, and writing documentation. Prior to joining Neo, William worked as a software developer for several startups in the real estate software, quantitative finance, and predictive API fields. William holds a Master's degree in Computer Science from the University of Montana. You can find him online at lyonwj.com.

Misty Nodine (Austin)

Misty Nodine (LinkedIn / GitHub) has a long history of being interested in trying to understand, organize, and make sense of complexity. She is a respected researcher and developer in the areas of natural language processing, information and knowledge management, agent-based information systems, communications system management, and collaboration management. More recently, she has focused on data pipelines and data architectures, specifically for developing a comprehensive understanding of users to improve recommendations and ad targeting.
Misty received her Ph.D. in Computer Science from Brown University in 1993. She received her S.B. and S.M. in EECS from Massachusetts Institute of Technology. She has 30+ years of experience in computer and data science, both in industrial research and in startup companies. She is currently the Data Architect at Spiceworks.
Misty will be giving the following Graph Day presentation: Understanding People Using Three Different Kinds of Graphs

Jason Plurad (Raleigh-Durham) @pluradj

Jason Plurad is a software developer on IBM's Open Technologies team. He is a committer on Apache TinkerPop, an open source graph computing framework. Jason engages in full stack development (including front end, web tier, NoSQL databases, and big data analytics) and promotes adoption of open source technologies into enterprise applications, services, and solutions. He has spoken previously at IBM conferences (Innovate, Insight) and Triangle Hadoop Users Group meetups.
Jason will be presenting the following Graph Day session: Powers of Ten Redux

Steve Purves (Tenerife, Islas Canarias) @stevejpurves

Steve Purves, Senior Software Developer at Expero, describes himself as an engineer first and foremost. He is comfortable working full-stack, cross-platform in a range of languages and is happiest when there is some mathematical or scientific analysis sprinkled in. He graduated in electrical engineering specializing in signal and image processing, which he took into the scientific computing field in the Oil and Gas industry.
During that time his work was largely split into three: development of low-level number-crunching libraries (C, C++, CUDA) and the cross-platform desktop application with 3D visualization to drive it; applied research in signal processing, numerical analysis algorithm development for 3D seismic analysis, during which he was an IEEE journal geek; and finally management of R&D and Product development teams as CTO, championing practices like TDD, BDD and Agile to get it done.
Around 5 years ago, the excitement of daily binary builds wore thin and Steve got hooked on building applications for the web, starting out with web-desktop integration work for seismic analysis on the iPad. Since then activities have included working on full-stack web applications, with and without desktop integration, for startups in sectors such as Dental, TV Production and Software Micro-Consulting.
Today, he builds reactive web applications with Expero, which feeds his desire to learn and work on industrial-strength projects. Steve waits patiently, with ES6 JavaScript and Jupyter Notebooks at the ready, for the imminent explosion of scientific computing on the web.
Steve will be giving the following Graph Day presentation: Graph Convolutional Networks for Node Classification

Andrew Ray (Bentonville, Arkansas)

Andrew Ray is a Senior Technical Expert at Sam’s Club Technology. He is passionate about big data and has extensive experience working with Apache Spark and Hadoop. Andrew is an active contributor to the Apache Spark project including SparkSQL and GraphX. At Walmart Andrew built an analytics platform on Hadoop that integrated data from multiple retail channels using fuzzy matching and distributed graph algorithms. Andrew also led the adoption of Spark at Walmart from proof-of-concept to production. Andrew earned his Ph.D. in Mathematics from the University of Nebraska, where he worked on extremal graph theory.
Andrew will be giving the following Graph Day presentation: Writing Distributed Graph Algorithms

Denis Vrdoljak (SF Bay)

Denis Vrdoljak is Co-Founder and Managing Director at the Berkeley Data Science Group (BDSG). Denis is a Berkeley-trained Data Scientist and a Certified ScrumMaster (CSM), with a background in Project Management. He has experience working with a variety of data types, from intelligence analysis to electronics QA to business analytics. In Data Science, his passion and current focus is Machine Learning based Predictive Analytics and Network Graph Analysis. He holds a Master's in Data Science from UC Berkeley and a Master's in International Affairs from Texas A&M.

Claudius Weinberger (Köln, Germany) @weinberger

Claudius Weinberger is the CEO and Co-founder of ArangoDB GmbH - the company behind the identically named NoSQL multi-model database. Claudius has been a serial entrepreneur for the majority of his life. Together with his co-founder, he has been busy building databases for more than 20 years. He started with in-memory to mostly memory databases, moved to K/V stores, multi-dimensional cubes and ultimately graph databases. Throughout the years he focused mostly on product and project management, further sharpening his vision of the database market. He co-founded ArangoDB in 2012. Claudius studied economics with business informatics as a key aspect at the University of Cologne. He spends all his free time with his two little daughters, is a judo enthusiast and occasionally enjoys gardening.
Claudius will present the following Graph Day session: Fishing Graphs in a Hadoop Data Lake.