GENGO.CC


HOME / CODE / LAB

Profile

Masaya Taniguchi is a researcher position in Natural Language Understanding team at RIKEN Advanced Intelligence Project. Additionally, he is a member of TohokuNLP, led by Dr. Kentaro Inui, at Tohoku University. He earned his Bachelor of Science in Mathematics from Gakushuin University and his Master of Science and Ph.D. in Information Science from Japan Advanced Institute of Science and Techonology. Masaya Taniguchi's research primarily focuses on the formal semantics of natural language and programming languages. His keen interests lie in exploring and advancing the understanding of these domains.

Research Interests

Publications and Talks

  1. 青木洋一, 工藤慧音, 栗林樹生, 谷口雅弥, 曾根周作, 坂口慶祐, 乾健太郎, 算術推論問題における自己回帰型言語モデルの内部機序, 言語処理学会第30回年次大会, 2024.3, Kobe, Japan Domestic Conference
  2. 松﨑孝介, 谷口雅弥, 乾健太郎, 坂口慶祐, J-UniMorph: 日本語の形態論における意味分類の体系化, 言語処理学会第30回年次大会, 2024.3, Kobe, Japan Domestic Conference
  3. 三浦東子, 谷口雅弥, 坂口慶祐, 乾健太郎, 検出器の判断に基づく大規模言語モデルの生成テキストの特徴分析, 言語処理学会第30回年次大会, 2024.3, Kobe, Japan Domestic Conference
  4. 青木洋一, 工藤慧音, 曾根周作, 栗林樹生, 谷口雅弥, 坂口慶祐, 乾健太郎, 言語モデルの思考連鎖的推論における探索戦略の動的変化, 言語処理学会第30回年次大会, 2024.3, Kobe, Japan Domestic Conference
  5. Masaya Taniguchi, Mathematical Logic Approach for Incremental Parsing, AIP Retreat, 2024.3, Tokyo, Japan Domestic Conference
  6. Masaya Taniguchi, Selling your research: what we did to get people to use categorical grammar, Discovery Evening, RIKEN, 2024.2, Saitama, Japan Talk
    Mathematical linguistics is an interdisciplinary field between linguistics and mathematics that emerged in the second half of the 20th century. It was known as a natural language processing technology before machine learning became popular. We have promoted this field because we believe that mathematical linguistics can be useful in today's world of generative AI and large language models. In this talk, we will present our overseas research activities to promote our research results and our methods for expanding our network. In addition, recent results in mathematical linguistics will be presented.
  7. Masaya Taniguchi, Context-sensitive Extension of Lambek Calculus, MLG数理論理学研究集会, 2024.2, Miyagi, Japan Domestic Conference
  8. Masaya Taniguchi, Lambek Calculus for Context-sensitive Language, Ookayama Mathematical Logic Seminar, 2024.2, Talk
    This talk is about making a certain kind of grammar, called mildly context-sensitive grammar, more formal. We're looking at formal languages, and they're related to ideas in mathematical logic. In particular, the way we formalize context-free grammar has a lot to do with results in mathematical logic. Back in 1958, Joachim Lambek organised categorical grammar using sequent calculus. Then Martin Pentus showed that Lambek's system was the same as context-free grammar. In the 1990s, Mark Steedman also introduced combinatorial categorical grammar using combinatorial logic, providing a new way of formalising categorical grammar. This grammar has more expressive power than context-free grammar. Thus, there are two ways to formalise and develop categorical grammar. The aim of this study is to extend Lambek's system so that it can express things in the same powerful way as Steedman's system. First, we introduce categorial grammar and show the context-free grammar and the context-sensitive grammar. Then we formalise the grammar in Lambek calculus and show our extension. Finally, we discuss the logical property from the point of view of mathematical logic.
  9. Masaya Taniguchi, Mathematics in Context Sensitive Language, AIP Math retreat, 2024.2, Kanagawa, Japan Domestic Conference
  10. 谷口雅弥, 定理証明支援系によるプログラミング支援:言語学から工学への橋渡し, 第9回 山陰 数学と基礎論研究集会, 2024.1, Tottori, Japan Domestic Conference
    言語学 のモチベーションで登場した文法理論を自然言語処理 (Natural Language Processing, NLP)の構文解析アルゴリズムへ応用する。 このときに、言語学上の議論されていた諸性質が捨象されていないことを定理証明支援系で確認する一連の取り組みを紹介する。 生成文法理論では内省的文法性判断という文の正しさを直ちに判断することができるという能力を提唱している。 この能力は文の正しさを有限時間で判定することができるということを暗黙的に仮定している。 本発表では文の正しさを有限時間で判定することができる、つまり構文解析のアルゴリズムの決定可能性に注目する。 部分構造論理の一つであるランベック計算はカット除去定理が成立し証明探索が有限のステップで終了することが知られている。 一方で、同等の計算能力を持つ範疇文法の派生体系では証明探索の観点に基づく証明論的分析が共有されてこなかった。 本発表では範疇文法の証明探索を定理証明支援系 Isabelle/HOL 上に再現し、そのアルゴリズムの停止性を証明する。
  11. 谷口雅弥, 範疇文法の計算論的解釈, 言語学フェス, 2024.1, Online Domestic Conference
  12. 谷口雅弥, 数理言語学から見るLambek, Hasegawa Laboratory Seminar, 2023.12, Kyoto, Japan Talk
  13. 谷口雅弥, 範疇文法の決定可能性, Matsuzaki Laboratory Seminar, Applied Mathematics Department, Tokyo University of Science, 2023.12, Tokyo, Japan Talk
  14. Masaya Taniguchi, Mathematics in Natural Language, Guest Lecture (Lecture for Natural Language Processing), Information Science Department, Tohoku University, 2023.12, Miyagi, Japan Talk
  15. 谷口雅弥, Computational Complexity of Combinatory Categorical Grammar, 若手による数理論理研究集会, 2023.12, Tokyo, Japan Domestic Conference
    我々の話すことば (自然言語) は主語・述語のように構造を内包している。この構造の複雑さは計算可能性理論によって分類できる。 とくに自然言語は線形拘束オートマトン (LBA) で受理判定できると目されている。LBA は決定可能な手続きが存在することが知られている。 自然言語の文法とその構文解析器も決定可能であることが期待される。本発表では自然言語の構文解析に用いられている組合せ範疇文法を対象にする。 これは組み合わせ論理に基づいた文法である。この文法の構文解析について決定可能な手続きは知られていなかった。 我々は証明論の観点からことばの構造を分析し決定可能な手続きを部分的に明かにした [Taniguchi 2022]。 本発表では決定可能な手続きの詳細とその後の成果について紹介する。 [Taniguchi 2022]: Masaya Taniguchi, “Decidable Algorithm for CG with Type-raising”, 4th The Proof Society Autumn School and Workshop, 2022.
  16. 谷口雅弥, 速習ランベック計算, 数学基礎論若手の会, 2023.12, Chiba, Japan Domestic Conference
  17. 谷口雅弥, Structural Connection between Directional Implications and Type-Raising, 証明論と計算論の最前線, 2023.12, Kyoto, Japan Domestic Conference
    Within the context of the CG natural deduction system, which is sensitive to the order and quantity of assumptions, we introduce two implications (←, →). We derive two rules from these additions: (a ← x) → a and a ← (x → a) from x. Our discussion revolves around the methodology to streamline the complexity of proof diagrams generated by these rules, reducing them to problems associated with higher-order unification. Through this reduction, we reorient the focus from termination judgments in proof search to termination judgments in unification, thereby enhancing the clarity of proof determination in this system.
  18. 谷口雅弥, Bridging the Gap: Combinatorial Categorical Grammars and Proof Theory, Computational Logic Seminar, Technische Universität Wien, 2023.7, Vienna, Austria Talk
  19. Kosuke Matsuzaki, Masaya Taniguchi, Keisuke Sakaguchi, Kentaro Inui, 日本語学習のための形態意味中心の動詞活用, YANS, 2023.7, Tokyo, Japan Domestic Conference
  20. 谷口雅弥, Proof Theoretic Linguistics, Sendai Logic Group Seminar, 2023.5, Miyagi, Japan Talk
  21. 谷口雅弥, ランベック計算LCと範疇文法CGの等価性, 第8回山陰基礎論と数学およびその周辺の研究集会, 2023.1, Tottori, Japan Domestic Conference
    二種類の implication をもつ部分構造論理の体系にランベック計算がある。最も基本的なランベック計算で は、カットを除く一切の構造化規則が許されない。この性質を利用して、ランベック計算を自然言語の文法規 則の記述言語として用いることができる。なぜなら、一般に自然言語では語順の自由度が低く (非可換)、さら に、単語を省略することができず (非冪等性)、また、任意の単語を挿入することができない (非単調性) から である。このような試みは、ランベック計算の研究の初期 [1] から行われており、言語学の文法理論の一つで ある範疇文法から発想を得ていた。範疇文法はチョムスキー階層における文脈自由文法に属する文法理論であ る。その文法理論から発想を得ていたランベック計算も後に文脈自由文法に属することが証明された [2]。と ころで、初期のランベック計算には product が implication の他に追加されているが、現在のランベック計算 では product-free なランベック計算を用いることが多い。そこで、本発表では product-free なランベック計 算について、カット除去定理を証明し、さらに、その計算体系に対応する言語クラスを範疇文法と照らし合わ せることで明らかにする。 [1] Lambek, Joachim. ”The mathematics of sentence structure.” The American Mathematical Monthly 65.3 (1958): 154-170. [2] Pentus, Mati. ”Lambek grammars are context free.” [1993] Proceedings Eighth Annual IEEE Sym- posium on Logic in Computer Science. IEEE, 1993.
  22. Masaya Taniguchi, Decidable Parsing Algorithm for Categorial Grammar with Type-raising, The Proof theory Society workshop, 2022.11, Utrecht, Netherlands International Conference refereed
    One of the combinatory rules is the T combinator called a type-raising rule to correspond with swapping a head in X-bar theory, where the head takes another component. In CG and its variants, we parse a sentence by proving the theorem Γ ⊢ S where Γ is a sequence of categories, e.g., “He walks” is given by NP, NP\S ⊢ S. This proof system is known as a non-cut-free system. The non-cut-free proof is a problem for the decidability of the sequent calculus. Hence, there is a limitation in the usage of the rule in most CCG parsing algorithms. For example, a parser allows the type raising only for the noun phrase. In the present paper, we eliminate the limitation of the type-raising rule by the proof-theoretic analysis.
  23. Masaya Taniguchi, Satoshi Tojo, Left-branching tree in CCG with D combinator, Logic and Engineering of Natural Language Semantics, 2022.11, Tokyo, Japan International Conference refereed
    The essence of incremental parsing is to construct a partial syntactic structure stepwise from the head of a sentence. Thus, the parsing algorithm is preferable when we analyze the process in which human listen to/ read a sentence in the temporal order, and beneficial to reveal an intermediate state of natural language understanding and to parse a long natural language sentence. In categorial grammar (CG) and its variants, there are incremental parsing algorithms, but not all grammatical sentences could be parsed by these algorithms. It is empirically known that all the grammatical sentences might be parsed incrementally by CG with combinator rules. As the previous researches were software simulations of the incremental parsing, which did not show that they work sentences of any length. Thus far, we have empirically shown 4 that any grammatical sentences could be incrementally parsed, however, no proofs to this is known to us. In this paper, we show the main theorem; For the certain parsing tree generated by CG, we can generate the equivalent parsing tree without binary nodes on the right branch by the combintor rules 𝐵, 𝐷, 𝑇, where the condition is to except the backward long reference.
  24. 谷口雅弥, CG for Ungrammatical Sentences: Proving the Unprovability, Michinoku Communicatoin Science Seminar, TohokuNLP, Tohoku University, 2022.10, Miyagi, Japan Talk
  25. Masaya Taniguchi, Satoshi Tojo, Losing a Head in Grammar Extraction, Knowledge and Systems Engineering, 2022.10, Nha Trang, Vietnam International Conference refereed
    The treebank corpus is a collection of the tree that represents a sentence constituency and dependency relation. We are motivated to extract grammar rules from the treebank, that is to decompose the tree data structure and to find grammar rules. After the extraction, we need to validate the adequacy of the grammar so that we inspect the generative power of the obtained grammar. In this phase, the syntactic head is a significant feature, however, in the obtained grammar the head information is missing. Hence, we propose to supplement the lost head information with the type-raising rule of categorial grammar (CG). We extend the same issue to combinatory categorial grammar (CCG) and solve it using the generalized type-raising. Furthermore, we verify our grammar by the formal proof written in the proof assistant system, Isabelle/ HOL.
  26. Masaya Taniguchi, Unprovability of Continuation-Passing Style Transformation in Lambek Calculus, Student Session of European Summer School in Logic, Language and Information, 2022.8, Galway, Ireland International Conference refereed
    We aim at constructing a partial semantics structure incrementally from the beginning word of a sentence. To achieve this property, we employ combinatory categorial grammar (CCG), which enables us to acquire a semantic structure and a parsing tree simultaneously. Therefore, we introduce new rules as a natural extension of existing formalisms, and in addition, we implement an interactive parser that can externalize a parsing tree for each word input. As a result, we find a left-branching tree without employing extra memory for floating tokens that are un-treed words. This is because categorial grammar and its derivations have a confluence property in the simply-typed lambda calculus; that is, the reduction steps do not affect the resultant sequence of semantics terms. Generally, we build a constituency tree structure from a sentence using the parser with some given grammar rules, i.e., the building process is a sequence of unification between applicable Grammar rules and a given Sentence. However, we can execute the parser in the reversal way, as in definite clause grammar; when we execute the unification between an input Sentence and a parsed Tree, we can replicate a set of innate Grammar rules. This system takes a sentence as input and displays the possible parsing results in CCG. For each word input, the partial sentence is processed as follows; 1. The tree and the sentence are used to extract the grammar rules employed in the parser. 2. Based on the extracted grammar, the parser outputs all possible parsing results in CCG. The type of the partial tree is not uniquely determined, and so all the possible grammar rules are exhaustively consulted.
  27. Masaya Taniguchi, Satoshi Tojo, Koji Mineshima, Interactive CCG Parsing with Incremental Trees, Bridges and Caps between Formal and Computational Linguistics of European Summer School in Logic, Language and Information, 2022.8, Galway, Ireland International Conference refereed
    Combinatory categorial grammar (CCG) has been able to accommodate various linguistic phenomena with added combinators to categorial grammar (CG). In particular, the type-raising rule realized by such a combinator in CCG and has solved the scope change of quantifiers, and the rule is generalized as continuation-passing style (CPS) transformation. However, there is concern that CPS may exceedingly accept ungrammatical sentences. In this paper, we analyze the expanded grammar rules using the Lambek calculus, that is a formal system of CG, to restrict CPS transformation. First, we show that Barker’s CPS transformation is provable and Plotkin’s CPS transformation is unprovable in Lambek calculus. Second, we show that a subset of Plotkin’s CPS transformations which is provable in Lambek calculus. Due to the complexity of proving unprovability, we formalize the proof in Isabelle/HOL and verify it. We regard this subset represents the grammatical class in terms of Lambek calculus, and call it type-restricted CPS transformation.
  28. 谷口雅弥, Incremental Parsing: Categorial Grammars and Lambek-style Calculi, Substructural Logic Workshop, 2022.5, Ishikawa, Japan Domestic Conference
  29. Masaya Taniguchi, Satoshi Tojo, Interactive Grammar Extraction from a TreeBank, Journal of Intelligent Informatics and Smart Technology, 2022. Warning: Undefined array key "month" in php-wasm run script(7) : eval()'d code on line 69 , Journal refereed
  30. 谷口雅弥, Formalization of Categorial Grammar, Theorem Proving and Provers meeting, 2021.11, Hokkaido, Japan Domestic Conference
    範疇文法において型の繰り上げ規則などを導入することで,より広い言語クラスの文法を 表現することがSteedmanやMoortgatらによって提案されてきた.我々は範疇文法を定理証明支援系 Isabelleによって形式化し,型の繰り上げ規則などが体系から定理として自然に導出できることを示す.
  31. Masaya Taniguchi, Satoshi Tojo, Interactive Grammar Extraction from a Treebank, Knowledge, Information and Creativity Support Systems, 2021.11, Bangkok, Thailand International Conference refereed
    The notion of normative grammar is helpful for tagging sentences of a large corpus, that is to annotate each word by a part of speech (POS). In this research, we aim at obtaining categorial grammar rules, where the category is a generalized notion of POS. However, to find a proper set of grammar rules is computationally exponential regarding the length of sentence, and thus, a reliable but exhaustive search method is in demand. Here, we present a support system for the annotation by the CCG parser based on the bidirectionality and the non-determinism in logic programming. Contrary to the common usage of the parser, we extract a set of grammar rules from a syntax tree, and moreover, we retrieve all the probable readings with the system. In this paper, we show the parsing algorithm and employ it for the grammar extraction. Further, we consider the application of the algorithm to generate a sentence and to fill a masked sentence.
  32. Masaya Taniguchi, Satoshi Tojo, Incremental derivations with Q combinator in CCG, Logic and Engineering of Natural Language Semantics, 2021.11, Online International Conference refereed
    We show that there exists a left-recursive parsing tree for any given sentence generated by Combinatory Categorial Grammar (CCG). In order to do this, we add four binary deduction rules Q that correspond to Geach’s rule based on the mixed composition, which was introduced in the early study. These rules correspond to Q combinator in Combinatory Logic, and it corresponds to the exchange of the head in linguistics. This paper gives a procedure of translation from a given derivation to a left-recursive one. Our contribution includes the visualization of the transition of semantics in the incremental readings.
  33. 谷口雅弥, Continuations and Polymorphic Lambek Calculus, 論理・言語・代数系と計算機科学の周辺領域, 2021.2, Online Domestic Conference
    We introduced the notion of continuation in lambda calculus for Lambek calculus and showed that the continuation-passing style transformation could be naturally derived from the rules of Lambek calculus. Furthermore, since the answer category of a continuation is given when the whole sentence is determined, we introduced a polymorphic category and generalized the continuation-passing style transformation in the polymorphic Lambek calculus.
  34. 谷口雅弥, CPS変換と多相範疇文法, Symbolic Logic And Computer Science, 2020.12, Online Domestic Conference
  35. Masaya Taniguchi, Satoshi Tojo, Generic Framework to Uncross Dependency, Artifical Life and Robotics, 2020.1, Beppu, Japan International Conference refereed
    Combinatory Categorial Grammar (CCG) called Mildly Context-Sensitive Grammar (MCSG) deals with dependencies. CCG uses the Categorial Grammar for the syntax types, and it uses the extension of simply-typed lambda calculus for the semantics types of each word. Most of the natural languages stay in this grammar as the syntax type. However, we propose the new grammar, in which the semantic type is the extension of System F to analyze the cross-serial dependencies that are the relationship of word reference over the other references. Our grammar has the CPS-transformation rules instead of the Dutch forward crossed composition rule in original CCG. That rule works as the generic framework to uncross dependency.
  36. 谷口雅弥, Subjunctive Markers and Delimited Continuations, Symbolic Logic And Computer Science, 2019.12, Kyoto, Japan Domestic Conference
  37. Song Yang, Masaya Taniguchi, Satoshi Tojo, 4-valued Logic for Agent Communication with Private / Public Information Passing, International Conference on Agents and Artificial intelligence, 2019.2, Prague, Czech Republic International Conference refereed
    Thus far, the agent communication has often been modeled in dynamic epistemic logic, where each agent changes his/her belief, restricting the accessibility to possible worlds in Kripke semantics. Prior to the message passing, in general, the sender should be required to believe the contents of the message. In some occasions, however, the recipient may not believe what he/she has heard since he/she may not have enough background knowledge to understand it or the information may be encrypted and he/she may not know how to decipher it. In this paper, we generalize those messages that require special knowledge as private information and formalize that the recipient does not change his/her belief receiving such private messages. Then, we distinguish the validity of the information from the belief change of the recipient; that is, even though the communication itself is held and the information is logically contradictory to his/her original belief, the recipient may not change his/her belief. For this purpose, we employ 4-valued logic where each proposition is given 2 (usual true and false) times 2 (private or public information or not) truth value.
  38. 谷口雅弥, Introduction to Montague Grammar, 数学基礎論若手の会, 2018.11, Okinawa, Japan Domestic Conference
  39. Hiroki Sudo, Masaya Taniguchi, Satoshi Tojo, Finding Grammar in Music by Evolutionary Linguistics, Knowledge, Information and Creativity Support Systems, 2018.11, Phatthaya, Thailand International Conference refereed
    In this paper, we assume that the progression rules of music are in a subclass of context-free language, and we let computers find them autonomously. We employ the Iterated Learning Model (ILM) by Simon Kirby, and ask if the computer can find a music knowledge that is common to us, and also if the computers can compose music independently of our music knowledge. In this research, we have shown an example set of rules found in the 25 études of Burgmüller by beat. Although many of categories in the tree seem redundant and futile, some of them reflect probable progressions, which well match with our human intuition. This experiment has several virtues compared with other grammar-based formalism for music. One is that we do not need to provide a dictionary beforehand. The other is that we can exclude the human-biased intuition, which had hindered the definition of creativity.

Academic work

Non-academic work

Teaching

I have an experience to teach the formal language theories and undergraduate level mathematics.

Degree

Supervisor

Social Contribution Activities

Financial Support

Research Experience

Research Grant and Research Budget

Awards

Physical Skills

Extra Skills

Social Media

Software products

I am an OSS developer joining many projects such as the interpreter of programming language, utilities for daily work, and some libraries. You can see these works on GitHub.

Acknowledgement

My OSS is supported by Gitpod and GitHub.

Gitpod provides Professional Open Source license to me. GitHub provides GitHub Copilot to me.